jhenezhubby01ff

2022-09-04

Dimensionality of datasets in multiple regression

As an example, let's say that a linear regression is performed of the form

$Y={\beta}_{0}+{\beta}_{1}{X}_{1}+{\beta}_{2}{X}_{2}+\cdots +{\beta}_{n}{X}_{n}+\epsilon $

where $Y$ is a vector of $10,000$ measurements of peak acceleration of different car models, and the regressors correspond to different technical features of the cars.

From a linear algebra standpoint $Y$ lives in ${\mathbb{R}}^{10000}$, and the coefficients are found by minimizing the squared Euclidean distance from this vector to a hyperplane.

Now, if dimension means the number of linearly independent vectors that span a space, this single vector $Y$ spans only a $1$-dimensional subspace.

If $Y$ is truly a $1$-dimensional object in a ${\mathbb{R}}^{10000}$ ambient space, then the Euclidean projection onto the hyperplane that underpins finding the coefficients has no dimensionality issues (collinearity between the regressors being a separate topic). Otherwise, ${L}^{2}$ norms in high dimensions do pose problems.

"So is $Y$ (the vector of $10,000$ observations) $1$-dimensional or high dimensional?"
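The projection described in the question can be sketched numerically. The data below are made up (standing in for the car measurements), with $n = 10{,}000$ observations and $3$ regressors plus an intercept; the fitted values are the projection of the single vector $y \in {\mathbb{R}}^{10000}$ onto the low-dimensional column space of the design matrix, so the residual is orthogonal to every column:

```python
import numpy as np

# Hypothetical data standing in for the car example: n observations, p regressors.
rng = np.random.default_rng(0)
n, p = 10_000, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design matrix with intercept
y = X @ np.array([1.0, 2.0, -0.5, 0.3]) + rng.normal(size=n)

# OLS: project y (a single vector in R^n) onto the (p+1)-dimensional column space of X.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta

# The residual is orthogonal to every column of X -- the defining property of a projection.
print(np.allclose(X.T @ (y - y_hat), 0, atol=1e-6))  # True
```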


Baron Coffey

Beginner · 2022-09-05 · Added 5 answers

Consider the function

$f({X}_{1},{X}_{2})={\beta}_{0}+{\beta}_{1}{X}_{1}+{\beta}_{2}{X}_{2}.$

This is a plane in 3 dimensions no matter how many times you evaluate the function. Thus, your problem "lives" in a $2$-dimensional space.

As for using the ${L}^{2}$ norm, you are correct.

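The point above can be checked directly: however many times you evaluate $f({X}_{1},{X}_{2})$, the points $({X}_{1},{X}_{2},f)$ lie on a plane, so the centered point cloud has rank $2$. A minimal sketch with made-up coefficients:

```python
import numpy as np

# Evaluate f(X1, X2) = b0 + b1*X1 + b2*X2 many times; the points still lie on a plane.
rng = np.random.default_rng(1)
b0, b1, b2 = 1.0, 2.0, -3.0  # arbitrary illustrative coefficients
X1, X2 = rng.normal(size=500), rng.normal(size=500)
pts = np.column_stack([X1, X2, b0 + b1 * X1 + b2 * X2])

# Center the cloud; a plane through the centroid is a 2-dimensional subspace.
centered = pts - pts.mean(axis=0)
print(np.linalg.matrix_rank(centered))  # 2
```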

Leonel Schwartz

Beginner · 2022-09-06 · Added 2 answers

The issue of dimensionality in the context of regression analysis is the ratio between $n$, the number of observations, and $p$, the number of estimated parameters. The closer $n$ is to $p$, the less reliable your estimated model is. Assume that your model is

$y={\beta}_{0}+\sum _{j=1}^{p}{x}_{j}{\beta}_{j}+\epsilon ,$

hence in order to find the OLS estimators of $\beta =({\beta}_{0},\dots ,{\beta}_{p})$ you project the vector $y$ onto the affine space spanned by $(1,{x}_{1},\dots ,{x}_{p})$, which is a $p$-dimensional space. The number of observations, $n$, does not count as a dimension. If you have a continuous stochastic process, you can sample from it infinitely many times, i.e., $n\to \mathrm{\infty}$, which is usually a good feature because you can safely use asymptotic results. Notably, in such a case there is another problem of artificially low p-values, but this is unrelated to the dimension of the model or the embedding space.

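The effect of the $n$-to-$p$ ratio can be illustrated with a small simulation (the numbers below are assumptions for illustration, not from the answer): with $p=10$ regressors, OLS slope estimates are far more variable when $n$ is barely above $p$ than when $n \gg p$:

```python
import numpy as np

# Illustrative simulation: spread of OLS estimates when n is close to p vs. n >> p.
rng = np.random.default_rng(2)
p = 10
beta_true = np.ones(p)

def slope_spread(n, reps=200):
    """Std. dev. of the first estimated slope across repeated simulated samples."""
    estimates = []
    for _ in range(reps):
        X = rng.normal(size=(n, p))
        y = X @ beta_true + rng.normal(size=n)
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        estimates.append(b[0])
    return np.std(estimates)

# With n = 12 barely above p = 10, the estimate is far noisier than with n = 1000.
print(slope_spread(12) > slope_spread(1000))  # True
```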
