Dimensionality of datasets in multiple regression $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \varepsilon$





Dimensionality of datasets in multiple regression
As an example, let's say that a linear regression is performed of the form
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \varepsilon$$
where $Y$ is a vector of 10,000 measurements of peak acceleration of different car models, and the regressors correspond to different technical features of the cars.
From a linear algebra standpoint, $Y$ lives in $\mathbb{R}^{10000}$, and the coefficients are found by minimizing the sum of squared distances, i.e. by projecting this vector onto a hyperplane.
Now, from the point of view of dimension being the number of linearly independent vectors that span a space, this vector $Y$ spans just 1 dimension.
If it is truly 1-dimensional within the ambient space $\mathbb{R}^{10000}$, the Euclidean projection onto the hyperplane that underpins finding the coefficients does not have any dimensionality issues (collinearity between the regressors being a separate topic). Otherwise, $L^2$ norms in high dimensions do pose problems.
So is $Y$ (the vector of 10,000 observations) 1-dimensional or high-dimensional?

Answer & Explanation

Baron Coffey

Beginner, 2022-09-05, added 5 answers

Consider the function
$$f(X_1, X_2) = \beta_0 + \beta_1 X_1 + \beta_2 X_2.$$
This is a plane in 3 dimensions no matter how many times you evaluate the function. Thus, your problem "lives" in a 2-dimensional space.
As for using the $L^2$ norm, you are correct.
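
To make the "no matter how many times you evaluate the function" point concrete, here is a minimal sketch in Python with NumPy. The data are simulated, and the coefficients (1, 2, -3), the noise level, and the helper name `fit_plane` are assumptions made up for illustration; they are not part of the original question. The sketch fits the same plane with 10 and with 10,000 observations and prints the shape and rank of the design matrix.

```python
import numpy as np

def fit_plane(n, seed=0):
    """Fit y = b0 + b1*x1 + b2*x2 by least squares on n simulated points."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    # Hypothetical "true" coefficients, chosen only for illustration.
    y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(scale=0.1, size=n)
    # Design matrix: a column of ones (intercept) plus the two regressors.
    X = np.column_stack([np.ones(n), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X, beta

for n in (10, 10_000):
    X, beta = fit_plane(n)
    # Rows grow with n; the number of columns and of coefficients does not.
    print(n, X.shape, np.linalg.matrix_rank(X), beta.shape)
```

The design matrix gains rows as n grows, but it always has three columns and the fit always returns three coefficients, so the fitted surface is the same kind of object (a plane over $(X_1, X_2)$) in both cases.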
Leonel Schwartz

Beginner, 2022-09-06, added 2 answers

The issue of dimensionality in the context of regression analysis is the ratio between $n$, the number of observations, and $p$, the number of estimated parameters. The closer $n$ is to $p$, the less reliable your estimated model is. Assume that your model is
$$y = \beta_0 + \sum_{j=1}^{p} x_j \beta_j + \epsilon,$$
hence in order to find the OLS estimators of $\beta = (\beta_0, \ldots, \beta_p)$ you project the vector $y$ onto the space spanned by $(1, x_1, \ldots, x_p)$, which has dimension $p + 1$ (the $p$ regressors plus the intercept) when these vectors are linearly independent. The number of observations, $n$, does not count as a dimension. If you have a continuous stochastic process, then you can sample from it infinitely many times, i.e., $n \to \infty$, which is usually a good feature because you can safely use asymptotic results. Notably, in such a case there is another problem of artificially low p-values, but this is unrelated to the dimension of the model or of the embedding space.
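
Both claims above can be checked numerically. What follows is a rough sketch in Python with NumPy on simulated data. The helper names `projection_rank` and `coefficient_error`, the all-ones coefficient vector, and the unit noise variance are assumptions for illustration only. The first helper reports the dimension of the space that $y$ is projected onto, counting the intercept column. The second averages the squared error of the OLS coefficients over repeated simulated datasets.

```python
import numpy as np

def projection_rank(n, p, seed=0):
    """Rank of the design matrix: the dimension of the space y is projected onto."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
    return int(np.linalg.matrix_rank(X))

def coefficient_error(n, p, trials=200, seed=0):
    """Average squared error of the OLS coefficients over simulated datasets."""
    rng = np.random.default_rng(seed)
    beta_true = np.ones(p + 1)  # hypothetical coefficients, intercept first
    errors = []
    for _ in range(trials):
        X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
        y = X @ beta_true + rng.normal(size=n)
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        errors.append(np.sum((beta_hat - beta_true) ** 2))
    return float(np.mean(errors))

p = 5
# The projection space does not grow with the number of observations.
print(projection_rank(50, p), projection_rank(5000, p))
# The estimates are much noisier when n is close to the number of parameters.
print(coefficient_error(p + 3, p), coefficient_error(500, p))
```

With Gaussian regressors the rank is $p + 1$ for both sample sizes, while the average coefficient error is far larger when $n$ is only slightly above the number of parameters than when $n$ is in the hundreds.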
