# 3.1.1 Sample Size Needed to Benchmark an Individual State’s Group Performance

The actual number of inspections needed to draw conclusions about a universe from a smaller “sample” depends on the following four factors:

- Universe Size: The required sample size INCREASES as the total number of SQGs in the state increases.

- Confidence Level: The required sample size INCREASES as the desired level of certainty that the sample IS representative of the population as a whole (the “confidence level”) increases. Fewer inspections would be required if states were willing to base decisions on results with a one-in-ten chance that the sample is not representative (a 90% confidence level) than if they could tolerate only a one-in-twenty chance (a 95% confidence level).

- Confidence Interval: The required sample size INCREASES as the required precision of the results increases. When drawing conclusions about a population from a smaller sample, the true performance must be expressed as a range around the value “observed” in the sample. This range is called the “confidence interval.” For example, if inspectors observed that 70% of the sampled SQGs complied with labeling requirements and the confidence interval was ±10 percentage points, the true compliance rate for the entire population of SQGs would lie between 60% and 80% (at the chosen confidence level). The less precision required, the fewer inspections are needed: fewer inspections would be needed, for example, if states felt they could base decisions on a confidence interval of ±20% than if they needed one of ±5%. A wider interval (a higher percentage) is less precise. Confidence intervals and confidence levels are also related: for a given sample size, the higher the confidence level, the wider the confidence interval. For example, one can be 99% certain of correctly estimating a person’s age by guessing that they are somewhere between 1 and 100 years old, but perhaps only 90% certain (a one-in-ten chance of being wrong) by guessing that they are between 20 and 50.

- Observed Performance: The required sample size DECREASES the closer the actual performance is to either end of the scale. Because a score cannot exceed 100% or fall below 0%, there is less room for variation in the result near either end of the scale. Therefore:

  - A 50% compliance rate requires the largest sample size.
  - 70% or 30% compliance rates require a smaller sample size.
  - 1% or 99% compliance rates require the smallest sample size.
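The four factors can be seen together in the standard formula for the sample size needed to estimate a proportion (Cochran's formula with a finite population correction). It is shown here only as a general illustration; this section does not state which formula the Massachusetts calculator uses, so any correspondence with it is an assumption.

$$
n_0 = \frac{z^2\,p\,(1-p)}{e^2}, \qquad n = \frac{n_0}{1 + \dfrac{n_0 - 1}{N}}
$$

Here $N$ is the universe size (the number of SQGs in the state), $z$ is the standard normal value for the chosen confidence level (about 1.645 for 90% and 1.960 for 95%), $e$ is the half-width of the confidence interval expressed as a proportion (0.10 for ±10 percentage points), and $p$ is the expected compliance rate. Each factor behaves as described above: $n$ grows toward $n_0$ as $N$ grows; $n_0$ grows with $z^2$; halving $e$ quadruples $n_0$; and $p(1-p)$ is largest at $p = 0.5$ (0.25), smaller at 0.3 or 0.7 (0.21), and smallest at 0.01 or 0.99 (0.0099). For instance, at 95% confidence with $p = 0.5$ and $e = 0.05$, $n_0 = 1.96^2 \times 0.25 / 0.05^2 \approx 384$.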

The “sample-size calculator”⁵ developed for the Massachusetts ERP Program was used to calculate the sample sizes required to benchmark each state’s performance.
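To make the arithmetic concrete, the sketch below computes sample sizes in Python using the textbook normal-approximation formula for a proportion (Cochran's formula) with a finite population correction. It is a minimal sketch under stated assumptions, not a reproduction of the Massachusetts ERP calculator: the function names `sample_size` and `margin_of_error` are hypothetical, and the formula choice is ours, not necessarily the calculator's.

```python
"""Hypothetical sketch: inspections needed to estimate a compliance rate.

Assumption: Cochran's normal-approximation formula with a finite
population correction. This is NOT the Massachusetts ERP calculator.
"""
import math
from statistics import NormalDist


def z_value(confidence_level: float) -> float:
    """Two-sided standard normal critical value (0.95 -> about 1.96)."""
    return NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)


def sample_size(universe_size: int, confidence_level: float,
                half_width: float, expected_rate: float = 0.5) -> int:
    """Inspections needed so the observed rate is within +/- half_width
    of the true rate at the given confidence level.

    universe_size    -- number of SQGs in the state (N)
    confidence_level -- e.g. 0.90 or 0.95
    half_width       -- e.g. 0.10 for +/-10 percentage points
    expected_rate    -- anticipated compliance rate; 0.5 gives the
                        largest (most conservative) sample
    """
    z = z_value(confidence_level)
    # Sample size for an infinitely large universe.
    n0 = z ** 2 * expected_rate * (1 - expected_rate) / half_width ** 2
    # Finite population correction: smaller universes need fewer inspections.
    n = n0 / (1 + (n0 - 1) / universe_size)
    # Round up to whole inspections; min() is only a safeguard,
    # since the corrected n cannot exceed N when N >= 1.
    return min(universe_size, math.ceil(n))


def margin_of_error(sample: int, universe_size: int,
                    confidence_level: float, observed_rate: float) -> float:
    """Confidence-interval half-width for a fixed number of inspections.

    Inverse of sample_size (same finite population correction).
    Assumes 1 <= sample <= universe_size and universe_size > 1.
    """
    z = z_value(confidence_level)
    standard_error = math.sqrt(observed_rate * (1 - observed_rate) / sample)
    fpc = math.sqrt((universe_size - sample) / (universe_size - 1))
    return z * standard_error * fpc
```

For example, with a hypothetical universe of 10,000 SQGs, an expected compliance rate of 50% and a confidence interval of ±5 percentage points, `sample_size(10_000, 0.95, 0.05)` returns 370, while `sample_size(10_000, 0.90, 0.05)` returns 264, showing how a lower confidence level reduces the number of inspections. `margin_of_error` runs the calculation in reverse for a fixed number of inspections, which makes the trade-off between confidence level and interval width visible for a given sample size.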

5 The “sample-size calculator” is an Excel-based tool that may be obtained by contacting Susan.Peck@state.ma.us.

## The States Common Measures Project Final Report
