Determining the correct sample size is critical to the statistical analysis of any cohort. Sample size is the number of observations that make up a statistical sample. The conclusions that you draw about a cohort will not be meaningful if your sample size is too small, and the project itself may be too onerous if your sample size is too large. Sample size is affected by several factors, including cohort size, sampling error and your desired level of certainty. Statistical formulas can help you to determine how many observations you need to include in your sample.
Things you need
- Standard Normal (Z) Table or Online Z-Score Calculator
Determine your confidence level. Confidence level refers to the frequency with which a study identical to yours will include your cohort's true value within its confidence interval (your estimate plus or minus its margin of error). For example, a confidence level of 0.95 means that 95 per cent of the time, the confidence interval produced by a study identical to yours will include the true value for your cohort. By convention, confidence level is often set to 95 per cent, but you can choose whatever value best suits your needs.
Translate your confidence level into a Z-score. A Z-score represents the number of standard deviations that separate a data point from the mean of a normal distribution. To determine your Z-score, consult a standard normal table. Subtract your confidence level from 1 and divide the result by 2. Then add this value to your confidence level to get the cumulative probability you need to look up.
For a confidence level of 0.95, you would search the table for a value of 0.975, or 0.95 + ((1 - 0.95)/2). Find the value that most closely approximates this number in the body of the table, then add the number in the leftmost column of the same row to the number in the top row of the same column to determine your Z-score. For example, a confidence level of 0.95 corresponds to a Z-score of 1.96 (1.9 from the row plus 0.06 from the column). Alternatively, you can use a reputable online calculator to perform this step.
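The table lookup can also be reproduced in code. The following is a minimal sketch in Python, assuming the standard library's `statistics.NormalDist` (Python 3.8 or later) as a stand-in for the printed table; the function name `z_score_for_confidence` is a hypothetical helper chosen for illustration.

```python
from statistics import NormalDist


def z_score_for_confidence(confidence_level: float) -> float:
    """Return the two-sided Z-score for a confidence level such as 0.95."""
    if not 0 < confidence_level < 1:
        raise ValueError("confidence_level must be between 0 and 1")
    # Same arithmetic as the table method: confidence + (1 - confidence) / 2
    cumulative_probability = confidence_level + (1 - confidence_level) / 2
    # inv_cdf plays the role of searching the body of the Z table
    return NormalDist().inv_cdf(cumulative_probability)


z = z_score_for_confidence(0.95)
print(round(z, 2))  # 1.96
```

The unrounded value is slightly below 1.96, so results computed with it may differ a little from results computed with the rounded table value.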
Estimate the prevalence of the characteristic you are seeking to measure in your cohort. For instance, if you wish to measure the proportion of left-handed people in your cohort and you anticipate, based on trends in the population at large, that 11 per cent of your cohort is left-handed, the estimated prevalence of this characteristic would be 0.11. This value is called p.
Subtract the value, p, from 1, and call the result q. If p equals 0.11, then q equals 0.89.
Square the Z-score you obtained in Step 2 and multiply it by two values: p (which you estimated in Step 3) and q (which you calculated in Step 4). Using the example numbers above, you would arrive at a result of approximately 0.376, or ((1.96)(1.96)(0.11)(0.89)).
Divide the figure from Step 5 by I-squared, where I is the margin of error you are willing to accept (the "plus or minus" half-width of your confidence interval). If your margin of error is 0.05, then using the example figure from the previous step, your sample size should be approximately 150, or (0.376/((0.05)(0.05))).
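Steps 3 through 6 can be combined into one calculation. The sketch below is illustrative only; the function name `sample_size` and its parameter names are assumptions, and it uses the article's example figures (Z-score of 1.96, p of 0.11, and I of 0.05).

```python
def sample_size(z_score: float, p: float, margin_of_error: float) -> float:
    """Sample size before any adjustment for cohort size: (Z^2 * p * q) / I^2."""
    q = 1 - p                              # Step 4
    numerator = z_score ** 2 * p * q       # Step 5
    return numerator / margin_of_error ** 2  # Step 6


n_prime = sample_size(z_score=1.96, p=0.11, margin_of_error=0.05)
print(round(n_prime, 1))  # 150.4
```

The result is a fraction of an observation, which is why the worked example reports it as approximately 150.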
Perform all the steps of Part 1 without taking cohort size into consideration.
Identify the total size of your cohort, and call this value N. Insert N into the following formula, where n' is the sample size calculated in Step 6 of Part 1: 1 + ((n' - 1)/N). Using the example figures from above and a total cohort size of 2000, this value would be 1.0745, or (1 + ((150 - 1)/2000)).
Divide the value from Step 6 of Part 1, n', by the number you have just calculated in Step 2 of Part 2, to determine the sample size appropriate for your smaller cohort. Using the example figures from above, the sample size would be approximately 140, or (150/1.0745) rounded to the nearest whole number.
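The cohort-size adjustment in Part 2 can be sketched the same way. The function name `adjust_for_cohort_size` is a hypothetical helper, and the inputs are the article's example figures (n' of 150 and N of 2000).

```python
def adjust_for_cohort_size(n_prime: float, cohort_size: int) -> float:
    """Shrink the unadjusted sample size n' for a cohort of known size N."""
    correction = 1 + (n_prime - 1) / cohort_size  # Step 2 of Part 2
    return n_prime / correction                   # Step 3 of Part 2


adjusted = adjust_for_cohort_size(n_prime=150, cohort_size=2000)
print(round(adjusted))  # 140
```

As the cohort grows, the correction factor approaches 1 and the adjusted figure approaches the unadjusted one, so the adjustment matters mainly for smaller cohorts.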