Elisabeth Wiley
2022-08-13
The effect of changing sample sizes on outliers

I know that the size of a sample is inversely proportional to the width of a confidence interval, and that outliers tend to increase the width of the interval as well. So that must mean that increasing the sample size reduces the effect of outliers on a confidence interval, and decreasing the sample size amplifies the effect, correct? How can I show this using formulae instead of words for, say, a confidence interval for a one-sample z-test?

Also, as a side note, does changing the sample size change how outliers affect the p-value of a hypothesis test? I'm inclined to say yes, but I'm not sure how to justify that conclusion.
Answer & Explanation
kunstboom8w
Beginner · 2022-08-14 · Added 8 answers
Step 1. The width of a z CI for a normal mean $\mu$ when $\sigma$ is known is inversely proportional to $\sqrt{n}$, not $n$. Because $\sigma$ is known, the sample SD $S$ plays no role in the width of the CI. [If you are talking about unknown $\sigma$ and t CIs, then the width depends on $n$, the appropriate quantile of t for $n - 1$ degrees of freedom, and the size of the sample SD (a random variable).]

Step 2. If data are normal, there may be outliers. Normal tails extend to $\pm\infty$, even though extreme values far out in the tails are rare. After a certain point, as the sample size increases, so does the likelihood of getting an outlier. (Boxplots are not really intended for samples with very small $n$, so I'm talking about samples that are not too small. The definition of 'quartile' is a bit sketchy in a sample of size 5, or 11.)

Step 3. I do not know of any 'formula' that directly links Steps 1 and 2, mostly because I know of no formula for the number of outliers. I have seen simulation studies that show the results, but outliers depend on the quartiles, and the distributions of the quartiles with increasing $n$ are a bit messy.

Step 4. Simulations were used to estimate the average number of outliers per sample in normal samples of three sizes (among them $n = 20$ and $n = 100$). By averaging the number of outliers over many samples of each size, one can approximate the expected number of outliers. Results for the expected number of outliers per sample are about 0.33, 0.58, and 0.92, respectively.

Step 5. While many extreme outliers may be a signal that a sample is not from a normal distribution, we see from these simulations that there is nothing 'abnormal' about getting some outliers in a normal sample. About a quarter of samples of size 20 have them, and a normal sample of size 100 is more likely to have some outliers than not. The values from t tables that are used to make t CIs (when $\sigma$ is estimated by $S$) allow for the effect of such inherently normal outliers.
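As a hedged illustration of Step 1 in symbols, assume $\sigma$ is known, the confidence level is $1 - \alpha$ with critical value $z_{\alpha/2}$, the null value is $\mu_0$, and a single observation is displaced by a fixed amount $\delta$. The symbols $z_{\alpha/2}$, $\mu_0$, and $\delta$ are notation introduced here, not taken from the answer. This sketch covers one fixed displacement only. It is not a formula for the number of outliers, which Step 3 says is not available.

$$
\text{CI: } \bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},
\qquad
\text{width} = 2\,z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
$$

$$
x_i \to x_i + \delta
\;\Longrightarrow\;
\bar{x} \to \bar{x} + \frac{\delta}{n},
\qquad
z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \to z + \frac{\delta}{\sigma\sqrt{n}}
$$

Under these assumptions the width of the z interval does not depend on the data at all, so one displaced value moves only the center of the interval, by $\delta/n$. The z statistic, and hence the p-value, is affected through a shift of $\delta/(\sigma\sqrt{n})$. Both shifts shrink as $n$ grows.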
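The kind of simulation described in Step 4 can be sketched in Python. This is a minimal illustration and not the answerer's code. It assumes that "outlier" means a point beyond the usual boxplot fences (1.5 IQR below the lower quartile or above the upper quartile) and that quartiles are computed with the `inclusive` method of `statistics.quantiles`. The function names, the number of replications, and the seed are arbitrary choices made here. Because quartile conventions differ between software packages, the estimates need not match the figures quoted in the answer exactly.

```python
import random
import statistics


def count_boxplot_outliers(sample):
    """Count values outside the 1.5 * IQR boxplot fences of one sample."""
    # Assumption: quartiles by linear interpolation ("inclusive" method).
    q1, _, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    return sum(1 for x in sample if x < lower or x > upper)


def simulate_outliers(n, reps=5000, seed=0):
    """Estimate the mean number of outliers per standard normal sample of
    size n, and the fraction of samples that contain at least one."""
    rng = random.Random(seed)
    total = 0
    samples_with_outliers = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        k = count_boxplot_outliers(sample)
        total += k
        if k > 0:
            samples_with_outliers += 1
    return total / reps, samples_with_outliers / reps


if __name__ == "__main__":
    # Sample sizes 20 and 100 are the two sizes named in Step 5.
    for n in (20, 100):
        mean_count, frac = simulate_outliers(n)
        print(f"n={n}: mean outliers per sample = {mean_count:.2f}, "
              f"fraction of samples with an outlier = {frac:.2f}")
```

Running the script prints, for each sample size, the estimated expected number of outliers and the share of samples with at least one outlier, which can be compared with the qualitative claims in Step 5.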