Brodie Beck

2022-09-07

Suppose I run a regression of earnings on age (and suppose I do not know the distribution of earnings). Would it be possible for the residuals to be normally distributed?
I am thinking it would not be possible: earnings only take on positive values, while the support of the normal runs from $-\infty$ to $\infty$, so they could not be normal. However, since residuals are errors, they can be both positive and negative, so I am starting to question my hypothesis here.

Tanya Anthony

If earnings are always positive then no, the residuals cannot be exactly normally distributed, even though many may be negative: the magnitude of the negative residuals is bounded by the highest predicted earnings on the regression line.
That may not be the major issue: more important might be the skewness of the earnings distribution at any given age, or a non-linear relationship between earnings and age.
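To spell out the bound on the negative residuals (a sketch, writing $y_i$ for observed earnings, $\hat{y}_i$ for the fitted value and $e_i$ for the residual, notation that is not in the question):

$$e_i = y_i - \hat{y}_i > -\hat{y}_i \ge -\max_j \hat{y}_j \quad \text{because } y_i > 0.$$

A normal distribution puts positive probability below any finite bound, so residuals that can never fall below $-\max_j \hat{y}_j$ cannot be exactly normal, although they may still be close to normal in practice.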

Isaac Barry

You can always skew-zero transform the $y$-variable (earnings) if transforming skewed $x$-variables does not result in normally distributed residuals. Van der Waerden scores would do a good job here, so to begin:
1. Determine the percentile value, $\mathrm{pct}_i$, of each $y$-value from its rank position, $R(y_i)$, after an ascending sort.
2. Obtain the van der Waerden scores by plugging the percentile values into the inverse standard normal CDF, i.e., $Z_i = \Phi^{-1}(\mathrm{pct}_i)$.
3. Then regress $Z$ on age, provided age is not too skewed.
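By construction, van der Waerden scores are approximately standard normal, $\mathcal{N}\left(0,1\right)$, so the residuals are much more likely to be close to normal. Marginal normality of $Z$ does not strictly guarantee this, so it is still worth checking the residuals after the fit.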
To interpret the coefficient on age in the original units, just deconvolve, that is, map the fitted $Z$ values back to the earnings scale.
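The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the percentile is taken as $R(y_i)/(n+1)$, which is a common convention but is not specified in the answer; the age and earnings data are simulated; and the helper names `van_der_waerden_scores` and `ols_line` are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def van_der_waerden_scores(y):
    """Return Phi^{-1}(rank / (n + 1)) for each value (assumed convention).

    Ties are not averaged in this sketch; tied values get consecutive ranks.
    """
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])  # indices in ascending order of y
    std_normal = NormalDist()  # mean 0, standard deviation 1
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = std_normal.inv_cdf(rank / (n + 1))
    return scores


def ols_line(x, y):
    """Simple least-squares fit of y on x; returns (intercept, slope)."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Simulated, purely hypothetical data: positive, right-skewed earnings.
random.seed(0)
age = [random.uniform(20, 65) for _ in range(500)]
earnings = [math.exp(10 + 0.02 * a + random.gauss(0, 0.6)) for a in age]

z = van_der_waerden_scores(earnings)         # steps 1 and 2
intercept, slope = ols_line(age, z)          # step 3
residuals = [zi - (intercept + slope * ai) for ai, zi in zip(age, z)]
```

The `residuals` list can then be passed to whatever normality check you prefer, such as a Q-Q plot. Note that `slope` is in units of normal scores per year of age, not in earnings units.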
