2k1ablakrh0

Answered question

2022-09-22

Why can we use the empirical standard deviation when computing confidence intervals for the mean?
I'm reviewing some basic statistics, and I'm asking myself questions on things I used to take for granted when I first saw them years ago. I'm going to state things as I understand them, so there might be a mistake in the following.
Consider a random variable $X$ following a distribution with mean $\mu$ and standard deviation $\sigma$. We measure $n$ samples of $X$ and observe an empirical mean $\bar{x}$ and an empirical standard deviation $s$.
Because we're observing samples, $\bar{x}$ and $s$ are random variables themselves. We should therefore not use $\bar{x}$ and $\mu$ interchangeably, and the same goes for $s$ and $\sigma$. Instead, people compute a confidence interval for $\mu$ based on $\bar{x}$. By the central limit theorem, if $n$ is large enough, we can say that approximately $\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$, where the second argument is the standard deviation.
When computing confidence intervals, we usually use $\sigma_{\bar{x}} = \frac{s}{\sqrt{n}}$. If all of this is correct, my question is the following: since both $s$ and $\bar{x}$ are random variables, why does it seem to be OK to treat $s = \sigma$ when computing confidence intervals for $\mu$?

Answer & Explanation

Elias Keller

Beginner, 2022-09-23, added 11 answers

Step 1
Nearly everything you wrote is accurate, but the conclusion you draw at the end rests on a misapprehension: namely, that the standard error of the mean when the variance is unknown, the quantity $s/\sqrt{n}$, is equal to the standard deviation of the sampling distribution of the mean, which is $\sigma/\sqrt{n}$. These are not equal, since the former is a statistic and the latter is a function of the unknown parameter $\sigma$.
That said, the former estimates the latter, in the same way that x ¯ estimates μ.
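To make the distinction concrete, here is a minimal simulation sketch using only the Python standard library. The population values (`mu`, `sigma`, `n`) and the helper name `standard_errors` are hypothetical choices for illustration, and the samples are assumed to be normal. It draws many samples and computes $s/\sqrt{n}$ for each one, so the statistic can be compared with the fixed quantity $\sigma/\sqrt{n}$.

```python
import math
import random
import statistics


def standard_errors(mu, sigma, n, repeats, seed=0):
    """Return the estimated standard error s / sqrt(n) for each of `repeats` samples."""
    rng = random.Random(seed)
    result = []
    for _ in range(repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        s = statistics.stdev(sample)  # empirical std with the n - 1 denominator
        result.append(s / math.sqrt(n))
    return result


# Hypothetical population values, chosen only for illustration.
mu, sigma, n = 10.0, 2.0, 10
true_se = sigma / math.sqrt(n)  # a fixed number, but unknown in practice
estimates = standard_errors(mu, sigma, n, repeats=5000)

print(f"sigma / sqrt(n)       = {true_se:.4f}")
print(f"mean of s / sqrt(n)   = {statistics.mean(estimates):.4f}")
print(f"spread of s / sqrt(n) = {statistics.stdev(estimates):.4f}")
```

The estimates scatter around $\sigma/\sqrt{n}$ from one sample to the next, which is what it means for $s/\sqrt{n}$ to be a statistic rather than a parameter.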
The source of this confusion comes from the notion that the construction of the confidence interval is based on some "recipe" in which the confidence limits may be expressed in the form
$$\bar{x} \pm t_{n-1,\alpha} \frac{s}{\sqrt{n}},$$
where $t_{n-1,\alpha}$ is some "critical value" that depends on the desired level of confidence and the sample size. While the formula is correct, the origin of this formula isn't by analogy with the corresponding parameters; that is to say, the interval estimate isn't constructed from some expression like
$$\mu \pm t_{n-1,\alpha} \frac{\sigma}{\sqrt{n}}.$$
Step 2
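That construction would be incorrect. The interval estimate comes from inverting a hypothesis test. Specifically, we want to control the Type I error $\alpha$ so that
$$\Pr\left[ \frac{|\bar{X} - \mu_0|}{s/\sqrt{n}} > c \;\middle|\; H_0 \right] = \alpha,$$
for the hypothesis
$$H_0 : \mu = \mu_0 \quad \text{vs.} \quad H_a : \mu \ne \mu_0,$$
where $c$ is some critical value that depends on $\alpha$. The statistic on the left-hand side of the inequality (without the absolute value) is Student t-distributed with $n-1$ degrees of freedom when $H_0$ is true (exactly so when the samples are normally distributed, and approximately so for large $n$ otherwise), thus $c$ is a quantile of the Student t distribution. Inverting the test, the values $\mu_0$ that are not rejected are exactly those with $|\bar{x} - \mu_0| \le c \, s/\sqrt{n}$, which gives the interval $\bar{x} \pm c \, s/\sqrt{n}$. The randomness of $s$ is therefore already accounted for in the choice of $c$; at no point is $s$ assumed to equal $\sigma$.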
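The effect of the critical value can be checked with a small coverage simulation. This is a sketch under stated assumptions, not a definitive implementation: the samples are normal, $n = 5$, the nominal level is 95%, and the helper name `coverage` is hypothetical. The value 2.776 is the tabulated 0.975 quantile of the Student t distribution with 4 degrees of freedom, hard-coded here because the Python standard library has no t quantile function.

```python
import math
import random
import statistics


def coverage(critical_value, mu, sigma, n, repeats, seed=0):
    """Fraction of intervals x_bar +/- critical_value * s / sqrt(n) that contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.mean(sample)
        half_width = critical_value * statistics.stdev(sample) / math.sqrt(n)
        if abs(x_bar - mu) <= half_width:
            hits += 1
    return hits / repeats


n = 5
# Normal quantile: this is what treating s as if it were sigma would suggest.
z_crit = statistics.NormalDist().inv_cdf(0.975)
# Table value (assumption): 0.975 quantile of Student t with n - 1 = 4 degrees of freedom.
t_crit = 2.776

# The same seed is used for both calls, so both intervals see the same samples.
cov_z = coverage(z_crit, mu=0.0, sigma=1.0, n=n, repeats=20000)
cov_t = coverage(t_crit, mu=0.0, sigma=1.0, n=n, repeats=20000)
print(f"normal critical value: coverage = {cov_z:.3f}")
print(f"t critical value:      coverage = {cov_t:.3f}")
```

With a sample this small, the interval built from the normal quantile tends to cover $\mu$ noticeably less often than 95%, while the one built from the t quantile stays close to the nominal level. As $n$ grows the two critical values converge, which is why treating $s$ as $\sigma$ looks harmless for large samples.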
