Determining sample size of a set of boolean data where the probability is not 50%

atgnybo4fq

2022-11-04

I'll lay out the problem as a simplified puzzle of what I am attempting to calculate. Some of this may seem fairly straightforward to many, but I'm starting to get a bit lost while trying to think it through.
Let's say I roll a 1000-sided die until it lands on the number 1, and it takes me 700 rolls to get there. I want to prove that the first 699 rolls were not 1s. Obviously, the only way to do this deterministically is to include those 699 failures in the result to show they were in fact "not 1".
However, that is a lot of data: I would have to include all 700 rolls. Instead, I want to demonstrate probabilistically that I rolled 699 "not 1s" before rolling a 1. To do this, I will randomly sample my "not 1" rolls, reducing the set to a smaller, more manageable number that is still statistically significant. That should be good enough to demonstrate that I very probably did not roll a 1 before roll 700.
Here are my current assumptions about the problem:
- My initial experiment of rolling until success follows a geometric distribution.
- However, my goal is to demonstrate to a third party that I am not lying, so the skeptical third party is not concerned with the geometric distribution and would view this simply as a binomial distribution problem.
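For the event that matters here, the geometric and binomial views give the same number. A minimal Python sketch, assuming a fair die so that a 1 has probability 1/1000 (the helper names are hypothetical): the chance that the first 1 arrives after roll 699 (geometric survival) equals the chance of zero 1s in 699 independent rolls (binomial).

```python
import math

P_SUCCESS = 1 / 1000  # assumed: chance of rolling a 1 on a fair 1000-sided die


def geometric_survival(k: int, p: float) -> float:
    """P(first success occurs after trial k) for a geometric distribution."""
    return (1 - p) ** k


def geometric_pmf(k: int, p: float) -> float:
    """P(first success occurs exactly on trial k)."""
    return (1 - p) ** (k - 1) * p


def binomial_pmf(successes: int, trials: int, p: float) -> float:
    """P(exactly `successes` successes in `trials` independent Bernoulli(p) trials)."""
    return math.comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)


survive = geometric_survival(699, P_SUCCESS)
no_success = binomial_pmf(0, 699, P_SUCCESS)
first_at_700 = geometric_pmf(700, P_SUCCESS)

print(f"P(no 1 in first 699 rolls), geometric view: {survive:.4f}")
print(f"P(0 successes in 699 rolls), binomial view: {no_success:.4f}")
print(f"P(first 1 on exactly roll 700): {first_at_700:.6f}")
```

The two views only differ on the final roll: the probability of the first 1 landing exactly on roll 700 is the survival probability multiplied by 1/1000.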
Many sample size calculators exist on the web, and from what I can tell they are all based on the binomial distribution. Here is the formula I am considering:
$$n = \frac{N \times X}{X + N - 1}$$
$$X = \frac{Z_{\alpha/2}^2 \times p \times (1 - p)}{MOE^2}$$
$n$ is the sample size
$N$ is the population size
$Z_{\alpha/2}$ is the critical value ($\alpha = 1 - \text{confidence level}$, expressed as a probability)
$p$ is the sample proportion
$MOE$ is the margin of error
As an aside, the website where I got this formula says it implements a "finite population correction". Is this desirable for my requirements?
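One way to see what the finite population correction does is to evaluate the formula with and without it. A minimal sketch, assuming the question's own inputs ($Z_{\alpha/2} = 2.58$, $p = 0.001$, $MOE = 0.005$); the function names are hypothetical. X is the uncorrected size and n the corrected one; the correction pulls n well below X when N is small, and n approaches X as N grows.

```python
def uncorrected_sample_size(z: float, p: float, moe: float) -> float:
    """X = z^2 * p * (1 - p) / MOE^2, the sample size ignoring population size."""
    return z**2 * p * (1 - p) / moe**2


def corrected_sample_size(population: int, z: float, p: float, moe: float) -> float:
    """n = N * X / (X + N - 1), i.e. X with the finite population correction applied."""
    x = uncorrected_sample_size(z, p, moe)
    return population * x / (x + population - 1)


Z, P, MOE = 2.58, 0.001, 0.005  # inputs taken from the question
X = uncorrected_sample_size(Z, P, MOE)
results = {}
for population in (699, 10_000, 1_000_000):
    results[population] = corrected_sample_size(population, Z, P, MOE)
    print(f"N = {population:>9,}: X = {X:.2f}, corrected n = {results[population]:.2f}")
```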
Here is the math executed on my above numbers. I will use $Z_{\alpha/2} = 2.58$ for $\alpha = 0.01$, $p = 0.001$ and $MOE = 0.005$. As stated above, $N = 699$ on account of there being 699 failure cases that I would like to sample with a certain level of confidence.
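The 2.58 figure can be derived from $\alpha$ rather than looked up. A minimal sketch using Python's standard-library `statistics.NormalDist` (the helper name is hypothetical); the unrounded value is about 2.5758.

```python
from statistics import NormalDist


def two_sided_critical_value(alpha: float) -> float:
    """Z_{alpha/2}: standard-normal quantile with alpha/2 of the probability above it."""
    return NormalDist().inv_cdf(1 - alpha / 2)


z = two_sided_critical_value(0.01)
print(f"Z_(alpha/2) for alpha = 0.01: {z:.4f}")
```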
As I understand it, this math recommends a sample size that will show, with 99% confidence, that the sample result is within 0.5 percentage points of reality.
Doing the math gives $X = 265.989744$ and $n = 192.8722086653 \approx 193$, implying that a sample size of 193 fulfills this confidence level and interval.
My main question is whether my assumption that $p = \frac{1}{1000}$ is valid. If it's not, and I use the conservative $p = 0.5$, then my sample size shoots up to 692. So I would like to know whether my assumptions about what the sample proportion actually is are correct.
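Both sample sizes quoted above (193 and 692) can be reproduced with a short script. A minimal sketch, assuming the formula exactly as stated in the question and rounding up with `math.ceil` (a common convention, assumed here); the helper name is hypothetical.

```python
import math


def sample_size(population: int, z: float, p: float, moe: float) -> float:
    """Question's formula: X = z^2 p (1 - p) / MOE^2, then n = N X / (X + N - 1)."""
    x = z**2 * p * (1 - p) / moe**2
    return population * x / (x + population - 1)


N, Z, MOE = 699, 2.58, 0.005
sizes = {}
for p in (0.001, 0.5):  # assumed proportion vs. the conservative worst case
    n = sample_size(N, Z, p, MOE)
    sizes[p] = math.ceil(n)
    print(f"p = {p}: n = {n:.4f} -> sample {sizes[p]} rolls")
```

Since $p(1-p)$ peaks at $p = 0.5$, that choice always gives the largest sample size for a given $N$, $Z_{\alpha/2}$ and $MOE$.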
More broadly, am I on the right track at all? From my attempt at demonstrating this probabilistically to my current thought process, is any of this accurate?

Answer & Explanation

Frances Dodson

2022-11-05

Explanation:
If the probability of success is $S = 0.001$, then the probability of failure is $F = 0.999$, and the probability of 699 failures without a success is $F^{699} \approx 0.4969$.
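The figure is easy to verify numerically. A minimal sketch; for comparison it also prints the small-$S$ approximation $(1 - S)^{699} \approx e^{-699 S}$, which is an addition here and not part of the original answer.

```python
import math

S = 0.001  # probability of rolling a 1
F = 1 - S  # probability of not rolling a 1

exact = F**699               # probability of 699 failures in a row
approx = math.exp(-699 * S)  # exponential approximation, good when S is small
print(f"F^699 = {exact:.4f}, exp(-699 S) = {approx:.4f}")
```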

Bayobusalue

2022-11-06

Step 1
If the method you have chosen to analyze your die-rolling problem is to select a sample from a finite population, taking the finite population correction factor into account, then you are sampling "without replacement". There can then be a difference between $p = .001$ holding for all rolls and $p$ increasing as your sample size increases, which is "not desirable for your requirements".
Step 2
However, analyzing it as a one-proportion z-test with $n = 699$, $x = 0$, and $p_0 = .001$, the p-value is .4029, versus .4969 by Daniel Mathias's method for a die roll. In both cases, such a high p-value indicates that getting 699 failures is not statistically significant for either a proportion or a probability of .001.
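A minimal sketch of a one-proportion z-test in Python, assuming the usual normal approximation with the standard error computed under $H_0$ and a two-sided p-value (these choices are assumptions, though they reproduce the .4029 figure). The helper name is hypothetical.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(x: int, n: int, p0: float) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: proportion = p0, via the normal approximation."""
    p_hat = x / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / se
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


z, p_value = one_proportion_z_test(x=0, n=699, p0=0.001)
print(f"z = {z:.4f}, two-sided p-value = {p_value:.4f}")
```

With only $n p_0 \approx 0.7$ expected successes, the normal approximation is rough here; the exact binomial probability of $x = 0$ is the $F^{699} \approx 0.4969$ figure from the first answer.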
