Start with PCA and then multiple regression, or start with multiple regression and then PCA?



Answered question


I would like to ask something simple but very important.
Imagine I have a clean data set with no missing values (0 NA), and I have to do a PCA on it. The data set has a fair number of individuals and variables (95 individuals and 10 variables).
I have to do both a multiple regression and a PCA.
Should I start with the multiple regression, possibly delete some individuals whose Cook's distance exceeds the limit, and then do the PCA on the "new" data set?
Or should I start with the PCA on the complete data set and then do the multiple regression?
In short, which order should I use:
- PCA, then multiple regression
- multiple regression, then PCA
Thanks for helping me!

Answer & Explanation



Beginner, 2022-07-17, added 18 answers

Regression should be the final step, not the first. With PCA you can reduce dimension (i.e., the number of explanatory variables) by discarding "unimportant" principal components, that is, those with small variance. You can also use PCA for whitening, i.e., transforming the explanatory variables so that they are uncorrelated and have unit variance, which removes multicollinearity among the predictors. Note that if you are interested in point prediction or R squared, regressing on all of the principal components yields the same results as regressing on the original features; the two differ only once you drop components. In other words, use PCA to further reduce and tidy your data (if possible), not just for the sake of doing PCA.
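To illustrate the point about R squared, here is a minimal sketch in Python using only NumPy. The data are synthetic stand-ins with the same shape as in the question (95 individuals, 10 variables), and the helper names `r_squared` and `pca_scores` are made up for this example. It fits ordinary least squares on the original features, on all principal components, and on only the first three components.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept included)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def pca_scores(X, k=None):
    """Project centered X onto its first k principal components (all if k is None)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

# Synthetic stand-in for the 95 x 10 data set in the question (an assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(95, 10)) @ rng.normal(size=(10, 10))  # correlated predictors
y = X @ rng.normal(size=10) + rng.normal(size=95)

r2_original = r_squared(X, y)
r2_all_pcs = r_squared(pca_scores(X), y)         # all 10 components
r2_three_pcs = r_squared(pca_scores(X, k=3), y)  # only the first 3 components

print(r2_original, r2_all_pcs, r2_three_pcs)
```

The first two values should agree up to rounding, because keeping every component only rotates the predictors. The third can be no larger, since dropping components restricts the model; how much is lost depends on whether the discarded low-variance directions happen to carry information about the response.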
