Why does the standard deviation change from confidence intervals to hypothesis tests?
Hugh Soto
2022-09-13
When considering two-sample data that involves a difference of proportions, both a confidence interval and a hypothesis test can be done. The standard deviation used for a difference of proportions in creating a confidence interval is $\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$. However, the standard deviation used for hypothesis tests is $\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$, where $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$, $\hat{p}_1 = \frac{x_1}{n_1}$, and $\hat{p}_2 = \frac{x_2}{n_2}$. What I don't understand is why these are different. They're both the standard deviation of the same difference of proportions, so why should they differ?
Answer & Explanation
Clarence Mills
Beginner | 2022-09-14 | Added 18 answers
Step 1 In hypothesis testing you are making an assumption (the null hypothesis) that $p_1 = p_2$. If they are truly equal, then we can call the common parameter $p$. You need to use something as a value for $p$ when it shows up in the standard deviation expression. If you fully follow the assumption that $p_1$ and $p_2$ are equal, then neither sample proportion is your best guess for the value of $p$. Rather, the pooled proportion $\hat{p}$ is better because it uses more individuals. If $p_1$ really does equal $p_2$, using all individuals gives a better estimate for $p$, aka $p_1$, aka $p_2$ (by assumption). Since we are now using a sample statistic in place of a population parameter, we are working with a standard error rather than a standard deviation.

Step 2 On the other hand, with a confidence interval for $p_1 - p_2$, we make no assumption that $p_1 = p_2$; if we did, we'd be done! $p_1 - p_2 = 0$ and that's that. So we use the best guess we have for each $p_i$ separately. I'm assuming that you already understand standard deviation, variance, and how variance is additive, which is why the big messy square root arises in the first place.
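The additivity of variance mentioned above can be written out as a short derivation. This is only a sketch, assuming two independent samples of sizes $n_1$ and $n_2$ whose success counts $x_1$ and $x_2$ are binomial; the symbols $x_1$, $x_2$ and $H_0$ are notation assumed for this illustration.

```latex
\begin{align*}
\operatorname{Var}(\hat{p}_1 - \hat{p}_2)
  &= \operatorname{Var}(\hat{p}_1) + \operatorname{Var}(\hat{p}_2)
   = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}
   && \text{(independent samples)} \\
\operatorname{Var}(\hat{p}_1 - \hat{p}_2)
  &= p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)
   && \text{(under } H_0\colon p_1 = p_2 = p\text{)}
\end{align*}
```

Plugging $\hat{p}_1 = x_1/n_1$ and $\hat{p}_2 = x_2/n_2$ into the first line and taking the square root gives the standard error used for the confidence interval. Plugging the pooled estimate $\hat{p} = (x_1 + x_2)/(n_1 + n_2)$ into the second line gives the standard error used for the hypothesis test.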
moidu13x8
Beginner | 2022-09-15 | Added 2 answers
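Step 1 Below, I will use $\hat{p}$ to indicate sample proportions and $p$ to indicate true values (population parameters). I will use 95% intervals for demonstration. The test of differences in proportions starts with the null hypothesis $p_1 = p_2 = p$. Under this assumption, $\hat{p}_1 - \hat{p}_2$ is approximately normal with variance $p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$ and mean 0. When this is true, the 95% probability interval (the interval within which, if the null hypothesis is true, the value $\hat{p}_1 - \hat{p}_2$ will fall 95% of the time) is approximately

$$0 \pm 1.96\sqrt{p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},$$

with $p$ estimated from the entire data set by the pooled sample proportion.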
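The test under the null hypothesis can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `two_proportion_z_test`, the success counts `x1` and `x2`, and the example numbers are all hypothetical, and the normal approximation is assumed to hold.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 == p2 using the pooled standard error.

    x1, x2 are success counts and n1, n2 are sample sizes
    (hypothetical names chosen for this sketch).
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Under H0 there is a single common p, estimated from all the data.
    p_pooled = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se_pooled
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value, se_pooled


if __name__ == "__main__":
    # Made-up example data: 60/100 successes versus 45/100 successes.
    z, p_value, se = two_proportion_z_test(60, 100, 45, 100)
    print(f"pooled SE = {se:.4f}, z = {z:.3f}, p-value = {p_value:.4f}")
```

The pooled estimate appears only because the null hypothesis says both samples share one proportion; without that assumption there would be no single $p$ to estimate.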
The alternate formula (with $\hat{p}_1$ and $\hat{p}_2$ rather than $p$) is a less efficient estimator of the standard deviation of the sample proportion difference, since the entire data set is not used to estimate $p$ and instead $p_1$ and $p_2$ are estimated separately. On the other hand, the 95% confidence interval is the set of all potential true values of $p_1 - p_2$ for which a sample value at least as extreme as the observed $\hat{p}_1 - \hat{p}_2$ would be generated from the same sampling procedure at least 5% of the time. In constructing this interval, one cannot assume $p_1 = p_2$, since asking about the potential true values of $p_1 - p_2$ while making an assumption about that value is meaningless. Step 2 Without the assumption of equivalence, our best estimate of the standard deviation of the difference is the alternate formula, and the approximate 95% confidence interval is given by:

$$(\hat{p}_1 - \hat{p}_2) \pm 1.96\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
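As a rough illustration of the confidence interval without the equal-proportions assumption, here is a small Python sketch. The function name `diff_proportion_ci`, the success counts `x1` and `x2`, and the example data are hypothetical, and the critical value 1.96 assumes a 95% interval under the normal approximation.

```python
import math


def diff_proportion_ci(x1, n1, x2, n2, z_crit=1.96):
    """Approximate confidence interval for p1 - p2 (unpooled standard error).

    x1, x2 are success counts and n1, n2 are sample sizes
    (hypothetical names chosen for this sketch).
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # No assumption that p1 == p2, so each proportion is estimated separately.
    se_unpooled = math.sqrt(
        p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2
    )
    diff = p1_hat - p2_hat
    return diff - z_crit * se_unpooled, diff + z_crit * se_unpooled


if __name__ == "__main__":
    # Made-up example data: 60/100 successes versus 45/100 successes.
    lower, upper = diff_proportion_ci(60, 100, 45, 100)
    print(f"approximate 95% CI for p1 - p2: ({lower:.4f}, {upper:.4f})")
```

The interval is centered on the observed difference rather than on 0, which is what separates it from the probability interval used in the test.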