Sample Standard Deviation vs. Population Standard Deviation
link223mh
2022-10-26
I have an HP 50g graphing calculator and I am using it to calculate the standard deviation of some data. In the statistics calculation there is a type setting that can take two values: Sample or Population. I left it unchanged, but I kept getting the wrong results for the standard deviation. When I changed it to "Population", I started getting the correct results! Why is that? As far as I know, there is only one type of standard deviation, which is the root-mean-square of the values. Did I miss something?
Answer & Explanation
Ostrakodec3
2022-10-27
Step 1: There are, in fact, two different formulas for standard deviation here: the population standard deviation $\sigma$ and the sample standard deviation $s$. If $x_1, x_2, \ldots, x_N$ denote all $N$ values from a population, then the (population) standard deviation is
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2},$$
where $\mu$ is the mean of the population. If $x_1, x_2, \ldots, x_N$ denote $N$ values from a sample, however, then the (sample) standard deviation is
$$s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})^2},$$
where $\bar{x}$ is the mean of the sample.

Step 2: The reason for the change in formula with the sample is this. When you calculate $s$, you are normally using $s^2$ (the sample variance) to estimate $\sigma^2$ (the population variance). The problem is that if you don't know $\sigma^2$, you generally don't know the population mean $\mu$ either, so you have to use $\bar{x}$ in the formula where you would normally use $\mu$. Doing so introduces a slight bias into the calculation: since $\bar{x}$ is calculated from the sample itself, the values $x_i$ are on average closer to $\bar{x}$ than they would be to $\mu$, so the sum of squares $\sum_{i=1}^{N}(x_i-\bar{x})^2$ turns out to be smaller on average than $\sum_{i=1}^{N}(x_i-\mu)^2$. (In fact it can never be larger, because $\bar{x}$ is the value of $c$ that minimizes $\sum_{i=1}^{N}(x_i-c)^2$.) It just so happens that this bias can be corrected by dividing by $N-1$ instead of $N$. (Proving this is a standard exercise in an advanced undergraduate or beginning graduate course in statistical theory.) The technical term here is that $s^2$ (because of the division by $N-1$) is an unbiased estimator of $\sigma^2$.

Another way to think about it is that with a sample you have $N$ independent pieces of information. However, since $\bar{x}$ is the average of those $N$ pieces, the residuals $x_i - \bar{x}$ always sum to zero, so if you know $x_1 - \bar{x}, \ldots, x_{N-1} - \bar{x}$, you can figure out what $x_N - \bar{x}$ is. So when you're squaring and adding up the residuals $x_i - \bar{x}$, there are really only $N-1$ independent pieces of information there. In that sense, dividing by $N-1$ rather than $N$ makes sense. The technical term here is that there are $N-1$ degrees of freedom in the residuals $x_i - \bar{x}$.
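The two formulas and the bias argument above can be checked numerically. The Python sketch below is illustrative only. The data set and the small population are hypothetical, and `population_sd`, `sample_sd`, and `average_variance_estimates` are helper names invented here. The bias check assumes each sample is drawn independently with replacement, which is the setting in which $s^2$ is unbiased. The sketch compares the hand-written formulas with Python's `statistics.pstdev` and `statistics.stdev`, shows that the residuals sum to zero, and then averages both variance estimates over every possible sample of size 2 using exact fractions. Presumably the calculator's "Population" and "Sample" settings correspond to $\sigma$ and $s$ respectively, which would match what the asker saw, but that mapping is an assumption, not something checked against HP's documentation.

```python
import math
import statistics
from fractions import Fraction
from itertools import product


def population_sd(values):
    """sigma: sqrt of the mean squared deviation from the mean (divide by N)."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)


def sample_sd(values):
    """s: same sum of squares around the sample mean, divided by N - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample standard deviation needs at least two values")
    xbar = sum(values) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))


def average_variance_estimates(population, n):
    """Average both variance estimates over every ordered sample of size n.

    Assumption: each sample is n independent draws with replacement from
    `population`, so all len(population) ** n ordered samples are equally
    likely. Exact fractions avoid any floating-point rounding.
    """
    samples = list(product(population, repeat=n))
    sum_div_n = Fraction(0)
    sum_div_n_minus_1 = Fraction(0)
    for sample in samples:
        xbar = Fraction(sum(sample), n)
        ss = sum((Fraction(x) - xbar) ** 2 for x in sample)  # sum of squared residuals
        sum_div_n += ss / n
        sum_div_n_minus_1 += ss / (n - 1)
    return sum_div_n / len(samples), sum_div_n_minus_1 / len(samples)


# Hypothetical data set: mean 5, sum of squared deviations 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_sd(data), statistics.pstdev(data))  # both 2.0
print(sample_sd(data), statistics.stdev(data))       # both about 2.138 (sqrt(32/7))

# Degrees of freedom: the residuals sum to zero, so the last is fixed by the rest.
xbar = sum(data) / len(data)
residuals = [x - xbar for x in data]
print(residuals[-1], -sum(residuals[:-1]))            # 4.0 4.0

# Hypothetical population with sigma^2 = 5/4, sampled with n = 2.
population = [1, 2, 3, 4]
mu = Fraction(sum(population), len(population))
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / len(population)
biased, unbiased = average_variance_estimates(population, n=2)
print(sigma2, biased, unbiased)                       # 5/4 5/8 5/4
```

Averaged over all 16 equally likely samples, dividing by $N$ gives $5/8$, only half the true $\sigma^2 = 5/4$, while dividing by $N-1$ recovers $5/4$ exactly. Exact fractions are used there so the comparison is not blurred by rounding.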