Recent questions in Multiple Regression

Inferential Statistics · Answered question

shiya43 2022-10-29

Can you use bivariate analysis in a multiple regression problem

I have one response variable and around 10 potential explanatory variables. I am looking for simple ways to visualize and explore my data before modelling. As well as carrying out PCA and looking for correlations between my explanatory variables, I wanted to look at bivariate scatter plots of the response with each explanatory variable - but I have been told this won't work.

This is the way I am picturing it:

Imagine you have n independent, uncorrelated explanatory variables.

You propose to use these variables in a multiple regression. You want to simplify the analysis by not including more variables than is necessary.

So, you want to know whether each explanatory variable has any correlation with the response variable, to decide whether to include it in the regression.

My feeling is 1) that you can look at bivariate plots of the response variable against each explanatory variable to see if there is a correlation.

2) If a single explanatory variable has no bivariate correlation with the response, it cannot have any effect on the response if included in the multiple regression.

(The only way this variable could affect the response variable would be through an interaction with another explanatory variable - but it has already been determined that all the explanatory variables are uncorrelated.)

I have been told that the above is incorrect and a variable showing no correlation in a bi-variate plot can affect the response variable in a multiple regression.

Could someone help me see where I am going wrong? Many Thanks.
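One way to see the flaw in step 2) is a small numerical sketch (hypothetical data, using numpy): construct a response whose bivariate correlation with one predictor is essentially zero, yet that predictor carries a coefficient of exactly -1 in the multiple regression. The catch is that the two predictors are correlated with each other, and in real data the sample correlations between predictors are never exactly zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
u = rng.normal(size=n)
v = rng.normal(size=n)
x1, x2 = u, u + v          # x1 and x2 are correlated with each other
y = v                      # note y = -x1 + x2 exactly, yet cor(y, x1) = 0

r_y_x1 = np.corrcoef(y, x1)[0, 1]          # marginal correlation, near zero

X = np.column_stack([np.ones(n), x1, x2])  # intercept, x1, x2
b, *_ = np.linalg.lstsq(X, y, rcond=None)  # multiple regression fit
```

The bivariate plot of y against x1 would look like pure noise, but dropping x1 from the multiple regression would discard a coefficient of -1. This "suppressor" behaviour is exactly why screening predictors by marginal correlation can fail.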


Inferential Statistics · Answered question

Amiya Melendez 2022-10-24

Questions about multiple linear regression

I have a couple of true/false questions; one of them is this:

In the multiple linear regression model the coefficient of multiple determination gives the proportion of total variability due to the effect of a single predictor

I know the coefficient of multiple determination indicates the amount of total variability explained by the model, but I'm not sure about the "single predictor" part. I don't think this is true, because it uses x1, x2... as predictors, no?

The other question is this:

In the multiple linear regression model

${y}_{i}={\beta}_{0}+{\beta}_{1}{x}_{i,1}+{\beta}_{2}{x}_{i,2}+{\beta}_{3}{x}_{i,3}+{\epsilon}_{i}$

the parameter ${\beta}_{1}$ represents the variation in the response corresponding to a unit increase in the variable ${x}_{1}$

I don't think this statement is true, but I can't really explain why.

All help would be greatly appreciated
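For the first statement, a quick simulated check (hypothetical data) shows that the coefficient of multiple determination measures the variability explained by all predictors jointly, and is always at least as large as the $R^2$ of any single-predictor sub-model:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

def r_squared(y, cols):
    """R^2 of an OLS fit with an intercept and the given predictor columns."""
    X = np.column_stack([np.ones(len(y))] + cols)
    yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

R2_full = r_squared(y, [x1, x2])   # proportion explained by *all* predictors jointly
R2_x1   = r_squared(y, [x1])       # proportion explained by x1 alone
```

On the second statement, the usual objection is the missing qualifier: $\beta_1$ is the change in the mean response for a unit increase in $x_1$ *holding $x_2$ and $x_3$ fixed*, not the total variation in the response.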


Inferential Statistics · Answered question

Vincent Norman 2022-10-15

Variable and its dynamics in one multiple regression model

I am trying to find the dependence between bank default rates and macroeconomic variables with linear regression. To do so I wrote code which estimates every possible model - every combination of variables is tested. As an output I obtain R-squared, statistics for the Chow, Breusch-Pagan, Breusch-Godfrey, RESET and Shapiro-Wilk tests, as well as VIF. The only model which passes all tests, has a satisfactory R-squared and low VIF is as follows: ${y}_{t}={\beta}_{0}+{\beta}_{1}{x}_{t}+{\beta}_{2}\Delta {x}_{t}$ where $\Delta {x}_{t}={x}_{t}-{x}_{t-1}$. Although using a variable and its dynamics in one model seems a bit strange, I did not find any reason to reject the model. I would be grateful if someone could help me motivate accepting or rejecting such a model.
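One way to motivate the specification: including both $x_t$ and $\Delta x_t$ is just a reparametrization of a distributed-lag model in $x_t$ and $x_{t-1}$, since $\beta_1 x_t + \beta_2 \Delta x_t = (\beta_1+\beta_2)x_t - \beta_2 x_{t-1}$. A sketch with a simulated (hypothetical) series confirms the two fits are the same model in different coordinates:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.cumsum(rng.normal(size=200))        # hypothetical macro series (random walk)
dx = np.diff(x)                             # Δx_t = x_t − x_{t−1}
xt, xlag = x[1:], x[:-1]
y = 0.5 + 0.3 * xt + 0.8 * dx + 0.1 * rng.normal(size=199)

# model with the level and its difference
X1 = np.column_stack([np.ones(199), xt, dx])
b1 = np.linalg.lstsq(X1, y, rcond=None)[0]

# equivalent distributed-lag form: y = b0 + (β1+β2) x_t − β2 x_{t−1}
X2 = np.column_stack([np.ones(199), xt, xlag])
b2 = np.linalg.lstsq(X2, y, rcond=None)[0]
```

Since the two design matrices span the same column space, the fitted values are identical; the level-plus-difference form is just easier to read as "long-run level effect plus short-run change effect".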


Inferential Statistics · Answered question

Paola Mayer 2022-10-14

Multiple linear regression linear relationship or not

How should a multiple linear regression be interpreted that has statistically significant predictors, but an R-squared value of $0.004$? Does that mean that there is a significant linear relationship (because of the statistically significant predictors), even though there is close to no linear relationship (an ${R}^{2}$ of $0.004$ indicates close to no linear relationship)?
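Both readings are compatible: with a large sample, a predictor can be highly significant (the slope is reliably nonzero) while explaining almost nothing (the slope is tiny relative to the noise). A simulated sketch (hypothetical data, slope chosen so that $R^2$ lands near $0.004$):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x = rng.normal(size=n)
y = 0.06 * x + rng.normal(size=n)        # true slope tiny relative to noise sd

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ b
R2 = 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

s2 = np.sum((y - yhat) ** 2) / (n - 2)   # residual variance estimate
se_b1 = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))
t_stat = b[1] / se_b1                    # very large despite the tiny R^2
```

Significance answers "is the effect distinguishable from zero?"; $R^2$ answers "how much of the variation does it account for?". They can disagree because the standard error shrinks with $n$ while $R^2$ does not.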


Inferential Statistics · Answered question

Hunter Shah 2022-10-13

2-dimensional representation of a multiple regression function

Supposing I have a multiple regression population function of the form:

${Y}_{i}={\beta}_{1}+{\beta}_{2}{X}_{2i}+{\beta}_{3}{X}_{3i}+{u}_{i}$

with ${X}_{3i}$ a dummy variable (only takes values $0$ and $1$).

I am given a sample of points. Although the latter takes place in 3-dimensional space, the question states "its results can be represented in $Y$ vs ${X}_{2}$ space". I don't understand how graphing $Y$ vs ${X}_{2}$ will give us a 2-dimensional representation of our population regression function. Isn't ${X}_{3i}$ being completely omitted?
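${X}_{3}$ is not omitted: because it only takes the values $0$ and $1$, its whole effect shows up in $Y$-vs-${X}_{2}$ space as two parallel lines with intercepts $\beta_1$ and $\beta_1+\beta_3$ and common slope $\beta_2$. A quick sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400
x2 = rng.normal(size=n)
x3 = rng.integers(0, 2, size=n)            # dummy: 0 or 1
y = 1.0 + 2.0 * x2 + 3.0 * x3 + 0.1 * rng.normal(size=n)

X = np.column_stack([np.ones(n), x2, x3])
b = np.linalg.lstsq(X, y, rcond=None)[0]

# In Y-vs-X2 space the fit is two parallel lines:
intercept_0 = b[0]            # line for the X3 = 0 group
intercept_1 = b[0] + b[2]     # line for the X3 = 1 group, shifted by beta_3
slope = b[1]                  # common slope in X2
```

Plotting y against x2 with the two groups in different colours would show exactly these two lines, so the 2-dimensional picture carries all the information in the 3-dimensional fit.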


Inferential Statistics · Answered question

Cindy Noble 2022-10-05

How to write an equation where both independent variables and dependent variables are log transformed in a multiple regression?

How to write the multiple regression model when both the dependent variable and independent variables are log-transformed?

I know that without any log transformation the linear regression model would be written as

$y={\beta}_{0}+{\beta}_{1}({x}_{1})+{\beta}_{2}({x}_{2})+\dots $

But now I have log-transformed both my dependent variable and my independent variables. So is it correct to write $\mathrm{log}(y)={\beta}_{0}+{\beta}_{1}\cdot \mathrm{log}({x}_{1})+{\beta}_{2}\cdot \mathrm{log}({x}_{2})+\dots $

Or, since I am transforming both sides of the equation, can I write it as

$\mathrm{ln}(y)={\beta}_{0}+{\beta}_{1}({x}_{1})+{\beta}_{2}({x}_{2})+\dots $
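The first form is the one that matches "both sides log-transformed": every variable that was transformed appears as its log, and the coefficients become elasticities. The second form is a different model (log-linear, not log-log). A sketch with synthetic data generated from a multiplicative model shows the log-log regression recovering the exponents:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2000
x1 = rng.uniform(1, 10, size=n)
x2 = rng.uniform(1, 10, size=n)
# multiplicative model: y = 3 * x1^1.5 * x2^(-0.5) * noise
y = 3.0 * x1 ** 1.5 * x2 ** -0.5 * np.exp(rng.normal(scale=0.05, size=n))

# log-log form: log(y) = log(3) + 1.5 log(x1) - 0.5 log(x2) + error
X = np.column_stack([np.ones(n), np.log(x1), np.log(x2)])
b = np.linalg.lstsq(X, np.log(y), rcond=None)[0]
```

Here $b_1 \approx 1.5$ and $b_2 \approx -0.5$: each coefficient is the percent change in $y$ per one-percent change in the corresponding $x$, which is the usual reason for choosing the log-log form.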


Inferential Statistics · Answered question

trkalo84 2022-09-27

What is $Var[b]$ in multiple regression?

Assume a linear regression model $y=X\beta +\epsilon$ with $\epsilon \sim N(0,{\sigma}^{2}I)$ and $\hat{y}=Xb$ where $b=({X}^{\prime}X{)}^{-1}{X}^{\prime}y$. Besides, $H=X({X}^{\prime}X{)}^{-1}{X}^{\prime}$ is the linear projection from the response space onto the span of $X$, i.e., $\hat{y}=Hy$.

Now I want to calculate $Var[b]$ but what I get is a $k\times k$ matrix, not an $n\times n$ one. Here's my calculation:

$\begin{aligned} Var[b] &= Var[(X'X)^{-1}X'y] \\ &= (X'X)^{-1}X'\,\underbrace{Var[y]}_{=\sigma^{2}I}\,X(X'X)^{-1} \qquad \text{(here you can already see this will be } k\times k\text{)} \\ &= \sigma^{2}\,\underbrace{(X'X)^{-1}X'X}_{I}\,(X'X)^{-1} \\ &= \sigma^{2}(X'X)^{-1} \in \mathbb{R}^{k\times k} \end{aligned}$

What am I doing wrong?

Besides, are $E[b]=\beta $, $E[\hat{y}]=HX\beta $, $Var[\hat{y}]={\sigma}^{2}H$, $E[y-\hat{y}]=(I-H)X\beta $, $Var[y-\hat{y}]=(I-H){\sigma}^{2}$ correct (this is just on a side note, my main question is the one above)?
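The $k\times k$ result is in fact correct: $b$ has one entry per coefficient, so its covariance matrix is $k\times k$, not $n\times n$ (it is $\hat{y}$ and $y-\hat{y}$ that have $n\times n$ covariances). A simulation sketch (hypothetical design) comparing the empirical covariance of $b$ across replications with $\sigma^2(X'X)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, sigma = 200, 3, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # fixed design
beta = np.array([1.0, -2.0, 0.5])

theory = sigma ** 2 * np.linalg.inv(X.T @ X)   # k x k, as in the derivation

bs = []
for _ in range(5000):                           # redraw the noise many times
    y = X @ beta + rng.normal(scale=sigma, size=n)
    bs.append(np.linalg.lstsq(X, y, rcond=None)[0])
empirical = np.cov(np.array(bs).T)              # empirical k x k covariance of b
```

On the side questions: $E[b]=\beta$, $Var[\hat{y}]=\sigma^2 H$ and $Var[y-\hat{y}]=\sigma^2(I-H)$ are right, but since $HX=X$, the expectations simplify: $E[\hat{y}]=HX\beta=X\beta$ and $E[y-\hat{y}]=(I-H)X\beta=0$.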


Inferential Statistics · Answered question

Parker Pitts 2022-09-27

Multiple Regression Forecast

"Part C: asks what salary would you forecast for a man with 12 years of education, 10 months of experience, and 15 months with the company."

This is straightforward enough, just reading off the coefficients table. $y=3526.4+(722.5)(1)+(90.02)(12)+(1.269)(10)+(23.406)(15)=5692.92$

but

"Part D: asks what salary would you forecast for men with 12 years of education, 10 months of experience, and 15 months with the company."

I know that the answer to this must be different from C, but I have no idea why; I would have just done exactly the same as in part C.

What is wrong with my train of thought or intuition and how might I go about calculating the salary for men, rather than a man?
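The usual reading of this C-vs-D distinction is that the point forecast is identical in both parts; what differs is the uncertainty attached to it. "A man" asks about one new individual (a prediction interval, which includes the individual error variance), while "men" asks about the mean salary of everyone with that profile (a confidence interval for the mean response). A sketch with made-up data (not the textbook's) showing the two standard errors:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([5.0, 1.0, -1.0]) + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
s2 = np.sum((y - X @ b) ** 2) / (n - X.shape[1])   # residual variance

x0 = np.array([1.0, 0.5, 0.5])        # a new covariate profile
point = x0 @ b                         # same point forecast either way
h = x0 @ np.linalg.inv(X.T @ X) @ x0
se_mean  = np.sqrt(s2 * h)             # "men": mean response at x0
se_indiv = np.sqrt(s2 * (1 + h))       # "a man": one new individual
```

The individual standard error is always the larger of the two, because it adds the residual variance on top of the estimation uncertainty: $se_{indiv}^2 = se_{mean}^2 + s^2$.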


Inferential Statistics · Answered question

jhenezhubby01ff 2022-09-04

Dimensionality of datasets in multiple regression

As an example, let's say that a linear regression is performed of the form

$Y={\beta}_{0}+{\beta}_{1}{X}_{1}+{\beta}_{2}{X}_{2}+\cdots +{\beta}_{n}{X}_{n}+\epsilon $

where $Y$ is a vector of $10,000$ measurements of peak acceleration of different car models, and the regressors correspond to different technical features of the cars.

From a linear algebra standpoint $Y$ lives in ${\mathbb{R}}^{10000}$, and the coefficients are found by minimizing the sum of the square distances of this vector on a hyperplane.

Now, from the point of view of dimension being the number of linearly independent vectors that span a space, this vector $Y$ is just $1$ dimension.

If it is truly $1$-dimension of a ${\mathbb{R}}^{10000}$ ambient space, the Euclidean projection on the hyperplane that underpins the process of finding the coefficients does not have any dimensionality issues (collinearity between the regressors being a separate topic). Otherwise, ${L}^{2}$ norms in high dimensions do pose problems.

"So is $Y$ (the vector of $10,000$ observations) $1$-dimensional or high dimensional?"


Inferential Statistics · Answered question

oliadas73 2022-09-03

Conflicting Results from t-test and F-based stepwise regression in multiple regression.

I currently am tasked with building a multiple regression model with two predictor variables to consider. That means there are potentially three terms in the model, Predictor A (PA), Predictor B (PB) and PA*PB.

In one instance, I made a LS model containing all three terms, and did simple t-tests. I divided the parameter estimates by their standard errors to calculate t-statistics, and determined that only the intercept and PA*PB coefficients were significantly different from zero.

In another instance, I did stepwise regression by first creating a model with only PA, and then fit a model to PA and PB, and did an F-test based on the Sum of Squares between the two models. The F-test concluded that PB was a significant predictor to include in the model, and when I repeated the procedure, the PA*PB coefficient was found to reduce SSE significantly as well.

So in summary, the t-test approach tells me that only the cross-product term PA*PB has a significant regression coefficient when all terms are included in the model, but the stepwise approach tells me to include all terms in the model.

Based on these conflicting results, what course of action would you recommend?


Inferential Statistics · Open question

umm82i 2022-08-28

What is J in while calculating SST in multiple regression?

I am a little confused about what the $J$ actually is in the formulas for SST and SSR in multiple regression

SST $={Y}^{T}[I-\frac{1}{n}J]Y$

SSR $={Y}^{T}[H-\frac{1}{n}J]Y$
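In the standard convention, $J$ is the $n\times n$ matrix of all ones (sometimes written $J = \mathbf{1}\mathbf{1}^T$), so $\frac{1}{n}JY$ replaces every entry of $Y$ by the mean $\bar{Y}$. Assuming that convention, a quick numerical check that the matrix form reproduces the usual sum of squared deviations:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 50
Y = rng.normal(size=n)

J = np.ones((n, n))                       # n x n matrix of all ones
I = np.eye(n)
SST_matrix = Y @ (I - J / n) @ Y          # Y'(I - J/n)Y
SST_direct = np.sum((Y - Y.mean()) ** 2)  # usual sum of squared deviations
```

The same $\frac{1}{n}J$ term in the SSR formula subtracts the mean from the fitted values $HY$, so SSR measures the variation of the fit around $\bar{Y}$.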


Inferential Statistics · Open question

dammeym 2022-08-21

Multiple Linear Regression in Matrix Form

I am currently studying for my exams and came across the following question:

State the multiple linear regression equation in matrix form. Write down the order of each matrix and explain what the elements of each matrix and vector stand for. Write down the standard assumptions for the multiple linear regression.

I understand that the order of a matrix normally depends on the number of rows and columns, but I don't understand it in terms of the multiple linear regression equation. If you could include what the elements of each matrix and vector stand for as well, that would be much appreciated.

I am perfectly fine with the assumptions.

Thank you
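For the orders, with $n$ observations and $k$ predictors the matrix form of the model is usually written as:

```latex
\underbrace{Y}_{n\times 1}
  \;=\; \underbrace{X}_{n\times(k+1)}\,\underbrace{\beta}_{(k+1)\times 1}
  \;+\; \underbrace{\varepsilon}_{n\times 1}
```

Here $Y$ holds the $n$ observed responses, each row of $X$ is one observation (a leading $1$ for the intercept followed by its $k$ predictor values), $\beta$ holds the intercept and the $k$ slope coefficients, and $\varepsilon$ holds the $n$ random errors.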


Inferential Statistics · Open question

lollaupligey9 2022-08-18

Finding Multiple regression coefficients

If I have a multiple regression like this $Y=a+{b}_{1}{X}_{1}+{b}_{2}{X}_{2}$, how can I calculate the values of ${b}_{1}$ and ${b}_{2}$? I have searched on the web but couldn't find an answer.
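Assuming ordinary least squares, the coefficients come from the normal equations $b = (X'X)^{-1}X'Y$, where $X$ has a column of ones prepended for the intercept $a$. A numpy sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 500
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 2.0 + 1.5 * X1 - 0.7 * X2 + 0.1 * rng.normal(size=n)

# stack a column of ones for the intercept a, then solve the normal equations
X = np.column_stack([np.ones(n), X1, X2])
coef = np.linalg.solve(X.T @ X, X.T @ Y)   # [a, b1, b2]
```

In practice a least-squares routine (e.g. `np.linalg.lstsq`) is preferred over forming $X'X$ explicitly, but the closed form above is the textbook answer.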


Inferential Statistics · Open question

Landen Miller 2022-08-17

Is there such a thing as a weighted multiple regression?

I'm new to linear algebra, but I know how multiple linear regressions work. What I want to do is something slightly different.

As an example, let's say that I have a list of nutrients I want to get every day. Say I have a list of foods, and I want to know how much of each food to eat to get the best fit for my nutrition plan. Assume I'm fine with using a linear model.

However, some nutrients are more important than others. The errors on protein and calcium might be equal in a typical linear regression, but that's no use. Protein has higher priority than calcium (in this model), so I'd want a model that is better fitting to the higher priority points than to the lower ones.

I tried putting weights on the error function, and I end up with a matrix of matrices. At that point, I'm not sure if I'm minimising for the weights or for the coefficients on the nutrients. I think both, but I wasn't sure how to minimise for both at the same time.

Is it possible to solve this with linear algebra, or does this require some numerical approximation solution?
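Yes, this is solvable with linear algebra: it is weighted least squares. Putting a diagonal weight matrix $W$ on the squared errors gives the closed form $x=(A^{T}WA)^{-1}A^{T}W\,t$; the weights are fixed inputs, so you only minimise over the food quantities, and no matrix-of-matrices appears. A tiny made-up example (hypothetical nutrient numbers) where up-weighting protein shrinks its error relative to the unweighted fit:

```python
import numpy as np

# Hypothetical toy data: rows = nutrients (protein, calcium, fiber),
# columns = foods; A[i, j] = amount of nutrient i per unit of food j.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
target = np.array([1.0, 1.0, 0.0])   # daily nutrient goals
w = np.array([10.0, 1.0, 1.0])       # protein errors penalized 10x

W = np.diag(w)
# weighted least squares: minimize (target - Ax)' W (target - Ax)
x_w = np.linalg.solve(A.T @ W @ A, A.T @ W @ target)
x_u = np.linalg.solve(A.T @ A, A.T @ target)   # unweighted, for comparison

resid_w = target - A @ x_w
resid_u = target - A @ x_u
```

The weighted fit leaves a much smaller protein error than the unweighted one, at the cost of larger errors on the low-priority nutrients; choosing the weights encodes exactly the priority trade-off described above.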


Inferential Statistics · Open question

ferdysy9 2022-08-16

Multiple linear regression with interaction

I'm doing a multiple linear regression with interacting variables. I'll give you an example:

$y$=value, ${x}_{1}$=material, ${x}_{2}$=weight, ${x}_{3}$=color

${x}_{1}$ and ${x}_{2}$ are interacting variables but ${x}_{3}$ is not. Right now I'm using something like:

$y={a}_{0}+{a}_{1}{x}_{1}+{a}_{2}{x}_{2}+{a}_{3}{x}_{3}+{a}_{12}{x}_{1}{x}_{2}+u$

I'm pretty new to regression analysis so I wonder if there is any way to convert this formula to something like

$y={a}_{0}+{a}_{1}{x}_{1}+{a}_{2}{x}_{2}+{a}_{3}{x}_{3}+u$

so I can see how much effect ${x}_{1}$ and ${x}_{2}$ have simply by looking at ${a}_{1}$ and ${a}_{2}$? What I want to do is to just be able to look at the equation and understand how much 1 kg of extra weight adds in value without needing to calculate y. Splitting up the interaction term ${a}_{12}$ and distributing the effect over ${a}_{1}$ and ${a}_{2}$ if you guys understand what I mean. Maybe it's not possible or maybe there is a better regression method that is more suited for this, I don't know. I'd love to get some pointers from you guys.

Thanks.
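With an interaction term there is no single constant "effect of weight" to fold into ${a}_{2}$: the marginal effect of ${x}_{2}$ is ${a}_{2}+{a}_{12}{x}_{1}$, which depends on the material. The standard practice is to report that marginal effect at chosen values of ${x}_{1}$ (e.g. its mean), rather than trying to redistribute ${a}_{12}$. A sketch with hypothetical coefficients:

```python
# Hypothetical fitted coefficients from
#   y = a0 + a1*x1 + a2*x2 + a3*x3 + a12*x1*x2 + u
a2, a12 = 4.0, 0.5

def weight_effect(x1):
    """Marginal effect of one extra unit of x2 (1 kg) at a given x1."""
    return a2 + a12 * x1

effect_at_x1 = weight_effect(2.0)   # value added per kg when material x1 = 2
```

So instead of one number for "value per kg", you get a small table or line of numbers, one per material level; dropping the interaction term to force a single number would simply be fitting the second (smaller) model, which is only appropriate if ${a}_{12}$ is negligible.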


Inferential Statistics · Answered question

zabuheljz 2022-08-14

variance of multiple regression coefficients

If I consider universal kriging (or multiple spatial regression) in matrix form as:

$\mathbf{V}\mathbf{=}\mathbf{X}\mathbf{A}\mathbf{+}\mathbf{R}$

where $\mathbf{R}$ is the residual and $\mathbf{A}$ are the trend coefficients, then the estimate of $\hat{\mathbf{A}}$ is:

$\hat{\mathbf{A}}=({\mathbf{X}}^{\mathbf{T}}{\mathbf{C}}^{\mathbf{-}\mathbf{1}}\mathbf{X}{\mathbf{)}}^{\mathbf{-}\mathbf{1}}{\mathbf{X}}^{\mathbf{T}}{\mathbf{C}}^{\mathbf{-}\mathbf{1}}\mathbf{V}$

(as I understand it), where $\mathbf{C}$ is the covariance matrix, if it is known. Then, the variance of the coefficients is:

$\text{VAR}(\hat{\mathbf{A}})=({\mathbf{X}}^{T}{\mathbf{C}}^{-1}\mathbf{X}{)}^{-1}$???

How does one get from the estimate of $\hat{\mathbf{A}}$, to its variance? i.e. how can I derive that variance?
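Assuming $\operatorname{Var}(\mathbf{V})=\mathbf{C}$ and treating $\mathbf{X}$ and $\mathbf{C}$ as fixed, the derivation is one application of the rule $\operatorname{Var}(M\mathbf{V})=M\operatorname{Var}(\mathbf{V})M^{T}$ with $M=(\mathbf{X}^{T}\mathbf{C}^{-1}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{C}^{-1}$:

```latex
\operatorname{Var}(\hat{\mathbf{A}})
 = (\mathbf{X}^{T}\mathbf{C}^{-1}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{C}^{-1}
   \,\underbrace{\operatorname{Var}(\mathbf{V})}_{=\,\mathbf{C}}\,
   \mathbf{C}^{-1}\mathbf{X}\,(\mathbf{X}^{T}\mathbf{C}^{-1}\mathbf{X})^{-1}
 = (\mathbf{X}^{T}\mathbf{C}^{-1}\mathbf{X})^{-1}
```

The middle collapses because $\mathbf{C}^{-1}\mathbf{C}\mathbf{C}^{-1}=\mathbf{C}^{-1}$, leaving $(\mathbf{X}^{T}\mathbf{C}^{-1}\mathbf{X})^{-1}(\mathbf{X}^{T}\mathbf{C}^{-1}\mathbf{X})(\mathbf{X}^{T}\mathbf{C}^{-1}\mathbf{X})^{-1}$, which is the stated result.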


Inferential Statistics · Answered question

Jazmin Clark 2022-08-13

Multiple regression degrees of freedom $f$-test.

I'm finding conflicting information from college textbooks on calculating the degrees of freedom for a global $F$-test on a multiple regression. To be absolutely clear, assume there are 50 observations and 3 independent variables. Can you please tell me the df for the numerator and denominator? I have found 2 sets of numbers in college texts. One indicating the numerator is equal to $P$, in this case 3, and alternatively $P-1$. For the denominator I am finding $n-p$, which in this case would be 47, and alternatively, $n-p-1$. Perhaps I am misunderstanding the material and there are circumstances when one vs. the other formula applies. I've not done any regression analysis in more than 25 years and now find I'm stuck on a Christmas vacation project I wanted to do with my son. So any help that would explain, in a gentle way, (I can't get through the quadratic explanation, or something that will bury me in calculus) how to determine the df would be appreciated. Concrete examples would be very beneficial. Also, if there is a good practical walk-through of multiple regression/ANOVA that will show some examples and explain concepts (but please do not recommend Regression for Dummies) I'd appreciate a referral to that as well. Thanks for your help.
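The two conventions differ only in whether $p$ counts the intercept. If $p$ is the number of predictors excluding the intercept (here $p=3$), the global $F$-test has numerator df $=p=3$ and denominator df $=n-p-1=46$; texts that write $p-1$ and $n-p$ are counting the intercept as one of the $p$ parameters, so the numbers agree once the conventions are matched. A concrete sketch of the bookkeeping (with a hypothetical $R^2$ plugged in):

```python
n, p = 50, 3          # 50 observations, 3 predictors (intercept NOT counted in p)
df_num = p            # numerator df for the global F-test
df_den = n - p - 1    # denominator df: 46, since the intercept also uses one df

# Global F computed from R-squared, e.g. for a hypothetical R^2 = 0.40:
R2 = 0.40
F = (R2 / df_num) / ((1 - R2) / df_den)
```

With your 50 observations, a denominator of 47 would only be right if one of the 3 "parameters" being counted were the intercept itself, i.e. only 2 actual predictors.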


Inferential Statistics · Answered question

Crancichhb 2022-08-13

Multiple linear regression ${b}_{0}=0$

I am trying to calculate the coefficients ${b}_{1},{b}_{2},...$ of a multiple linear regression, with the condition that ${b}_{0}=0$. In Excel this can be done using the RGP function and setting the constant to FALSE.

How can this be done with a simple formula?

Thank you in advance!
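Assuming ordinary least squares, forcing ${b}_{0}=0$ just means leaving the column of ones out of the design matrix; the same formula $b=(X'X)^{-1}X'y$ then gives the through-the-origin fit (which, as far as I understand, is what RGP/LINEST does with the constant set to FALSE). A numpy sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(10)
n = 300
X = rng.normal(size=(n, 2))               # no column of ones: forces b0 = 0
y = X @ np.array([1.5, -0.5]) + 0.1 * rng.normal(size=n)

# through-the-origin least squares: b = (X'X)^{-1} X'y
b = np.linalg.solve(X.T @ X, X.T @ y)
```

The only change from the usual multiple regression formula is the missing intercept column, so any tool that lets you supply the design matrix directly can do this.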


Inferential Statistics · Answered question

cofak48 2022-08-11

Matrix derivative in multiple linear regression model

The basic setup in multiple linear regression model is

$\begin{array}{rl}Y& =\left[\begin{array}{c}{y}_{1}\\ {y}_{2}\\ \vdots \\ {y}_{n}\end{array}\right]\end{array}$

$\begin{array}{rl}X& =\left[\begin{array}{cccc}1& {x}_{11}& \dots & {x}_{1k}\\ 1& {x}_{21}& \dots & {x}_{2k}\\ \vdots & \dots & \dots \\ 1& {x}_{n1}& \dots & {x}_{nk}\end{array}\right]\end{array}$

$\begin{array}{rl}\beta & =\left[\begin{array}{c}{\beta}_{0}\\ {\beta}_{1}\\ \vdots \\ {\beta}_{k}\end{array}\right]\end{array}$

$\begin{array}{rl}\epsilon & =\left[\begin{array}{c}{\epsilon}_{1}\\ {\epsilon}_{2}\\ \vdots \\ {\epsilon}_{n}\end{array}\right]\end{array}$

The regression model is $Y=X\beta +\epsilon$

To find the least squares estimator of the $\beta $ vector, we need to minimize $S(\beta )=\sum_{i=1}^{n}{\epsilon}_{i}^{2}={\epsilon}^{\prime}\epsilon =(y-X\beta {)}^{\prime}(y-X\beta )={y}^{\prime}y-2{\beta}^{\prime}{X}^{\prime}y+{\beta}^{\prime}{X}^{\prime}X\beta $

$\frac{\mathrm{\partial}S(\beta )}{\mathrm{\partial}\beta}=0$

My question: how does one get $-2{X}^{\prime}y+2{X}^{\prime}X\beta $?
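The two matrix-calculus rules needed (assuming the usual denominator-layout convention) are:

```latex
\frac{\partial}{\partial \beta}\,a'\beta = a
\qquad\text{and}\qquad
\frac{\partial}{\partial \beta}\,\beta' A \beta = (A + A')\beta = 2A\beta
\ \text{ for symmetric } A .
```

The constant term $y'y$ differentiates to zero; the linear term $-2\beta'X'y$ is $a'\beta$ with $a=X'y$ (a scalar equals its transpose), giving $-2X'y$; and the quadratic term uses $A=X'X$, which is symmetric, giving $2X'X\beta$.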



Multiple regression is a form of predictive analysis used to measure the impact of multiple variables on a desired outcome. It is a powerful tool for data analysis that can be used to determine how much effect each variable has on the outcome. This technique can be used to identify relationships between variables, develop predictive models and measure the impact of changes in variables. Multiple regression can also be used to assess the impact of different factors on a given outcome and help to develop strategies for improving outcomes. By understanding the relationships between different variables, businesses can apply multiple regression to help make better decisions and improve their operations.