
ritualizi6zk

2022-11-05

How to refine population statistic when more data is available
Suppose I have two pieces of data about two populations. The first piece of data is the national accident rate, denoted A. The second piece of data is the national safety rate, a related, but not exactly inverse, piece of data, denoted S.
Now if I were given an additional piece of data, say a particular city's safety rating, and asked for the best guess of that city's accident rate, how would I approach this problem?
Also, I am not sure what this kind of situation/problem is called; if someone could point out the branch of statistics this falls under, that would also be helpful.

Answer & Explanation

partatjar6t9

2022-11-06

Step 1
In favorable circumstances, you might be able to handle this as a regression model. Make a plot, one point for each city, with S on the horizontal axis and A on the vertical axis. One hopes that the points would mostly and approximately lie along a straight line or smooth curve.
If you have hundreds of cities, you might be able to predict the A value of a new city, for which you know S, by looking at other cities with nearly the same S and seeing whether their A values are nearly the same.
If you have fewer cities (or want more than a crude approach), you need a model so that the information on S and A in all the prior cities can be used to predict A for a new city. The easiest case is when the points on your plot fall pretty much along a straight line.
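The "look at cities with nearly the same S" idea is essentially what is often called a k-nearest-neighbour average. Below is a minimal sketch under stated assumptions: the function name `nearest_neighbour_guess`, the default k = 3, and the six cities' values are all hypothetical. With real data you would want far more cities, as the answer says, for this crude approach to be trustworthy.

```python
def nearest_neighbour_guess(safety, accidents, new_s, k=3):
    """Crude guess of a new city's accident rate A from its safety rating S.

    Averages the A values of the k cities whose S values are closest to new_s.
    The default k=3 is an arbitrary illustrative choice.
    """
    if len(safety) != len(accidents):
        raise ValueError("safety and accidents must have the same length")
    if not 1 <= k <= len(safety):
        raise ValueError("k must be between 1 and the number of cities")
    # Indices of cities ordered by how far their safety rating is from new_s.
    nearest = sorted(range(len(safety)), key=lambda i: abs(safety[i] - new_s))[:k]
    return sum(accidents[i] for i in nearest) / k


# Hypothetical, made-up data for six cities (S = safety rating, A = accident rate).
S = [62.0, 70.0, 75.0, 81.0, 88.0, 93.0]
A = [9.1, 7.8, 7.0, 6.1, 4.9, 4.2]
print(nearest_neighbour_guess(S, A, new_s=78.0))
```

Larger k smooths out city-to-city noise but pulls in cities whose S is less similar; there is no single right choice.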
Step 2
Then what statisticians call 'simple linear regression' would give you the 'regression line' $A = b_0 + b_1 S$ that best fits the data points for the purpose of predicting A from S. The data determine the slope $b_1$ and the intercept $b_0$ of the regression line. Plug in the new city's S and read off the corresponding A. With enough data, and if the usual conditions hold (roughly: a linear trend, independent cities, and scatter about the line that is similar at all values of S and approximately normal), you might even be able to get 'prediction intervals' (not confidence intervals) as a guide to the accuracy of an A value obtained in this way.
Even if a straight line does not fit the data, some smooth curve might. The best fitting smooth curve of that type would require more computation to obtain, but once obtained might also be used for prediction. This is not the place to go into the additional complexities of fitting a curve.
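The answer leaves the choice of curve open. As one illustration only, here is a minimal sketch that fits a quadratic $A = c_0 + c_1 S + c_2 S^2$ by least squares, solving the 3×3 normal equations with Gaussian elimination in plain Python. The quadratic form and the helper names `fit_quadratic` and `predict_quadratic` are assumptions, not something the answer prescribes; in practice `numpy.polyfit(x, y, 2)` does the same job more robustly (note that it returns the coefficients highest power first).

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x**2; returns [c0, c1, c2].

    Solves the 3x3 normal equations (X^T X) c = X^T y, where the columns of X
    are 1, x and x**2, by Gaussian elimination with partial pivoting.
    """
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three paired points to fit a quadratic")
    power_sums = [sum(x ** k for x in xs) for k in range(5)]  # sums of x^0 .. x^4
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [power_sums[i + j] for j in range(3)] + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(3)
    ]
    for col in range(3):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, 4):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution, from the last unknown upward.
    coeffs = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        known = sum(aug[i][j] * coeffs[j] for j in range(i + 1, 3))
        coeffs[i] = (aug[i][3] - known) / aug[i][i]
    return coeffs


def predict_quadratic(coeffs, x):
    """Evaluate c0 + c1*x + c2*x**2 at x."""
    c0, c1, c2 = coeffs
    return c0 + c1 * x + c2 * x ** 2


# Hypothetical, made-up (S, A) data with some curvature.
S = [62.0, 70.0, 75.0, 81.0, 88.0, 93.0]
A = [10.2, 7.9, 6.9, 6.0, 5.3, 5.0]
coeffs = fit_quadratic(S, A)
print(predict_quadratic(coeffs, 78.0))
```

For rating-sized values such as S around 80, the sums of S⁴ get large and the normal equations become poorly conditioned; subtracting the mean of S before fitting improves numerical stability.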
Most elementary statistics books explain simple linear regression. They will likely use the letter x for your S and y for your A. Be aware that the line for predicting y from x is different from the line for predicting x from y. (That is because 'best fitting' means minimizing prediction error in only one of the two directions: the usual least-squares line for predicting y minimizes the vertical distances from the points to the line.)
Although this is not a statistical consulting service, I might be able to give you better advice on how to get started if you could say how many cities you have data for, make a plot, and give me some idea whether the points fall roughly along a straight line. (This is not a Newtonian physics experiment, so you are unlikely to get anything like a perfect fit.)
