Question

I was reading a tutorial on linear regression by Avi Kak. On p. 19 there is a part about the geometric interpretation of linear regression. The passage reads:

The optimum solution for $\tilde{\beta}$ that minimizes the cost function $C(\tilde{\beta})$ in Eq. (14) possesses the following geometrical interpretation: Focusing on the equation $\tilde{y} = X\tilde{\beta}$, the measured vector $\tilde{y}$ on the left resides in a large $N$ dimensional space. On the other hand, as we vary $\tilde{\beta}$ in our search for the best possible solution, the space spanned by the product $X\tilde{\beta}$ will be a $(p+1)$-dimensional subspace (a hyperplane, really) in the $N$ dimensional space in which $\tilde{y}$ resides. The question now is: which point in the hyperplane spanned by $X\tilde{\beta}$ is the best approximation to the point $\tilde{y}$ which is outside the hyperplane. For any selected value for $\tilde{\beta}$, the “error” vector $\tilde{y} - X\tilde{\beta}$ will go from the tip of the vector $X\tilde{\beta}$ to the tip of the $\tilde{y}$ vector. Minimization of the cost function $C$ in Eq. (14) amounts to minimizing the norm of this difference vector.

I could not understand how to relate the $N$-dimensional space and the $(p+1)$-dimensional subspace. The vector $\tilde{\beta}$ defines a $(p+1)$-dimensional subspace, but I could not understand why the $N$-dimensional space contains that subspace. As I understand it, in the $(p+1)$-dimensional space each dimension corresponds to a feature, whereas in the $N$-dimensional space each dimension corresponds to a data point. I'm quite confused by this idea.

Is there any other resource that explains the idea in more detail? Or could anyone explain how these spaces relate?

grcalia1 · Accepted Answer

The matrix $X$ is an $N \times (p+1)$ matrix. Its row space has dimension at most $N$ and its column space has dimension at most $p+1$. In your notes, it is assumed that $N > p+1$ and that the columns of $X$ are linearly independent, so the rank of $X$ is $p+1$. A theorem in linear algebra says that the row space and the column space have the same dimension, so the row space is also of dimension $p+1$.

By matrix multiplication, the vector $y = X\beta$ is an $N \times 1$ matrix. This is why they say $y$ is in a large $N$-dimensional space (since there are $N$ rows).

On the other hand, $y = X\beta$ implies that the vector $y$ is a linear combination of the column vectors of $X$, and hence lies in the column space of $X$, which is of dimension $p+1$. Each column of $X$ has $N$ entries, so the column space is a subspace of $\mathbb{R}^N$.

Here is an example. Let

$$
X = \begin{pmatrix}
1 & 0 & 1 \\
0 & 1 & 1 \\
0 & 0 & 1 \\
1 & 1 & 1
\end{pmatrix},
$$

so $N = 4$ and $p = 2$. For any $\beta = (b_1, b_2, b_3)^T$, the vector $y = X\beta$ is a vector in the large space $\mathbb{R}^4$, which is of dimension $N = 4$. It is also a vector in a subspace of $\mathbb{R}^4$ of dimension $p + 1 = 3$, namely the space spanned by the column vectors of $X$.
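To make the picture concrete, here is a minimal NumPy sketch built on the example matrix from this answer. The measured vector `y` below is a made-up illustration (it is not from the tutorial), chosen so that it lies outside the column space of `X`. The sketch checks that `X` has rank $p+1 = 3$, finds the least-squares coefficients, and confirms that the error vector is orthogonal to every column of `X`, which is what makes $X\hat{\beta}$ the closest point to `y` in the subspace.

```python
import numpy as np

# Design matrix from the example: N = 4 rows (data points), p + 1 = 3 columns.
X = np.array([
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
])

# The columns are linearly independent, so the column space is a
# 3-dimensional subspace of R^4.
rank = np.linalg.matrix_rank(X)

# Hypothetical measured vector in R^4 (an assumption for illustration only).
y = np.array([1.0, 2.0, 3.0, 4.0])

# Least-squares solution: the beta that minimizes ||y - X beta||.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# X @ beta_hat is the point of the column space closest to y.
y_hat = X @ beta_hat

# The error vector runs from the tip of X @ beta_hat to the tip of y.
residual = y - y_hat

# At the minimum, the error vector is orthogonal to every column of X.
orthogonality = X.T @ residual

print("rank of X:", rank)
print("beta_hat:", beta_hat)
print("residual norm:", np.linalg.norm(residual))
print("X^T residual (approximately zero):", orthogonality)
```

Note that `beta_hat` has 3 entries (one per column of `X`), while `y`, `y_hat`, and `residual` each have 4 entries (one per data point). The coefficients live in $\mathbb{R}^{p+1}$, but the vectors being compared all live in $\mathbb{R}^N$.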
