The effect of changing sample sizes on outliers

Question

I know that the size of a sample is inversely proportional to the width of a confidence interval, and that outliers tend to increase the width of the interval as well. So that must mean that increasing the sample size reduces the effect of outliers on a confidence interval, and decreasing the sample size amplifies the effect, correct?

How can I show this using formulae instead of words for, say, a confidence interval for a one-sample z-test?

Also, as a side note, does changing the sample size change how outliers affect the p-value of a hypothesis test? I'm inclined to say yes, but I'm not sure how to justify that conclusion.

kunstboom8w · Accepted Answer

(1) The width of a z CI for a normal mean $\mu$ when $\sigma$ is known is inversely proportional to $\sqrt{n}$, not $n$. Because $\sigma$ is known, the sample SD $S$ plays no role in the width of the CI. [If you are talking about unknown $\sigma$ and t CIs, then the width depends on $n$, the appropriate quantile of t for df $n - 1$, and the size of the sample SD (a random variable).]

(2) If data are normal, there may be outliers. Normal tails extend to $\pm\infty$, even though values beyond $\mu \pm 3\sigma$ are rare. After a certain point, as the sample size increases, so does the likelihood of getting an outlier. (Boxplots are not really intended for use with samples with very small $n$, so I'm talking about $n \ge 15$ or so. The definition of 'quartile' is a bit sketchy in a sample of size 5 or 11.)

(3) I do not know of any 'formula' that directly links (1) and (2), mostly because I know of no formula for the number of outliers. I have seen simulation studies that show the results, but outliers depend on the quartiles, and their distributions with increasing $n$ are a bit messy.

(4) Simulations can estimate the average number of outliers per sample in normal samples of various sizes: $n = 20$, $n = 50$, and $n = 100$. By averaging the outlier counts over many samples of each size, one can approximate the expected number of outliers. Results for the expected number of outliers per sample are about 0.33, 0.58, and 0.92, respectively.

While many extreme outliers may be a signal that a sample is not from a normal distribution, we see from these simulations that there is nothing 'abnormal' about getting some outliers in a normal sample. About a quarter of samples of size 20 have them, and a normal sample of size 100 is more likely to have some outliers than not. The values from t tables that are used to make t CIs (when $\sigma$ is estimated by $S$) allow for the effect of such inherently normal outliers.
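The simulation code itself is not shown in this answer, so the following is only a minimal Python sketch of how points (1) and (4) could be checked, not the original method. It assumes the usual boxplot rule (a value is an outlier if it lies more than 1.5 IQR beyond a quartile) and the "inclusive" quartile convention of Python's `statistics.quantiles`. The function names `z_ci_width`, `count_boxplot_outliers`, and `mean_boxplot_outliers` are hypothetical, chosen for illustration. Because quartile conventions and random seeds differ, a rerun should not be expected to reproduce 0.33, 0.58, and 0.92 exactly.

```python
import random
from statistics import NormalDist, quantiles


def z_ci_width(sigma, n, conf=0.95):
    """Full width of a z CI for the mean when sigma is known: 2 * z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return 2 * z * sigma / n ** 0.5


def count_boxplot_outliers(sample):
    """Count values beyond the 1.5 * IQR fences (assumed boxplot outlier rule)."""
    # Assumption: 'inclusive' quartiles; other conventions shift the counts slightly.
    q1, _, q3 = quantiles(sample, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return sum(1 for x in sample if x < lower or x > upper)


def mean_boxplot_outliers(n, reps=5000, seed=0):
    """Average number of boxplot outliers per standard normal sample of size n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += count_boxplot_outliers(sample)
    return total / reps


if __name__ == "__main__":
    for n in (20, 50, 100):
        width = z_ci_width(sigma=1.0, n=n)
        avg = mean_boxplot_outliers(n)
        print(f"n={n}: z CI width={width:.3f}, mean outliers per sample={avg:.2f}")
```

Note that `z_ci_width` never looks at the data: with $\sigma$ known, the width shrinks like $1/\sqrt{n}$ whether or not the sample contains outliers, while the average outlier count per sample grows with $n$.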
