Statistical formula with historical estimates
As I tell my clients, the statistical answer to the sample-size question is: "We first need a prior estimate of the inherent
variability, the variance under exactly the same conditions to be used, an estimate of the alpha risk level, the beta risk
level, and the size of the difference to be detected."
The formula for the sample size for a difference from a mean is:
$$n = \frac{(t_\alpha + t_\beta)^2 S^2}{d^2} \qquad (1)$$
where $t_\alpha$ and $t_\beta$ are the one-sided t distribution values for the selected α and β risk levels, $S^2$ is the variance
of the total product, process, or method, and $d$ is the difference to be detected.
Four values are needed to calculate the sample size. The alpha and beta risks are standardized for most scientific and industrial
applications at α = 0.05 and β = 0.05 or 0.10. Thus, the t values are taken from a standard t table for the chosen α and β, using the degrees of freedom of the data used to estimate the variance. The other two values,
the variance and the difference to be detected, are more difficult to obtain.
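As a minimal sketch, the calculation can be written in a few lines of Python. It assumes the sample-size formula has the form n = (t_α + t_β)² S² / d², which is what the symbol definitions above imply. The function name `sample_size` and the numbers for S and d are hypothetical. The t values are one-sided table values for 30 degrees of freedom and are typed in by hand, as one would do from a printed t table.

```python
import math

def sample_size(t_alpha, t_beta, variance, d):
    """Sample size from n = (t_alpha + t_beta)^2 * S^2 / d^2, rounded up.

    t_alpha, t_beta: one-sided t-table values for the chosen risk levels
    variance:        prior estimate of the variance, S^2
    d:               difference to be detected, in the same units as S
    """
    if d <= 0:
        raise ValueError("the difference to be detected must be positive")
    if variance < 0:
        raise ValueError("the variance cannot be negative")
    n = (t_alpha + t_beta) ** 2 * variance / d ** 2
    return math.ceil(n)  # a fraction of a sample cannot be tested

# One-sided t-table values for 30 degrees of freedom.
T_05_DF30 = 1.697  # alpha or beta = 0.05
T_10_DF30 = 1.310  # beta = 0.10

# Hypothetical inputs: S = 1.5 and d = 2.0, both in the units of the test result.
n_strict = sample_size(T_05_DF30, T_05_DF30, variance=1.5 ** 2, d=2.0)
n_relaxed = sample_size(T_05_DF30, T_10_DF30, variance=1.5 ** 2, d=2.0)
print(n_strict, n_relaxed)
```

Relaxing β from 0.05 to 0.10 lowers the required number only slightly in this example. The variance and the difference to be detected have far more influence on the result.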
According to FDA, "The number may vary depending upon the variability of the particular test ..." (2). This prior estimate
of the variance of the method for a given product may be difficult to obtain. If the product, process or method has been changed,
the data must be limited to that last change to be representative. Also, some products are made only a few times a year. There
may be only three or four batches and thus three or four values. This amount is not enough to get a good estimate of the variance.
If sufficient data exist in historical records, then perhaps the estimate can be made. A sample size of 30 or more
is preferred to obtain a reasonable estimate of the standard deviation.
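To illustrate why three or four values are too few, the following sketch simulates how much a standard deviation estimate wanders when it is based on 4 values versus 30. It is only an illustration under stated assumptions: the data are taken to be normally distributed with a true standard deviation of 1.0, and the function name `sd_estimate_spread`, the number of trials, and the seed are all hypothetical choices.

```python
import random
import statistics

def sd_estimate_spread(n_values, true_sd=1.0, n_trials=2000, seed=1):
    """Return the approximate 5th and 95th percentiles of simulated SD estimates.

    Each trial draws n_values results from a normal distribution with the
    given true_sd and computes the usual sample standard deviation.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    estimates = []
    for _ in range(n_trials):
        sample = [rng.gauss(0.0, true_sd) for _ in range(n_values)]
        estimates.append(statistics.stdev(sample))
    estimates.sort()
    low = estimates[int(0.05 * n_trials)]
    high = estimates[int(0.95 * n_trials) - 1]
    return low, high

for n_values in (4, 30):
    low, high = sd_estimate_spread(n_values)
    print(f"n = {n_values:2d}: about 90% of SD estimates fall between {low:.2f} and {high:.2f}")
```

With only four values the estimate can easily be far below or far above the true value. Because the variance enters the sample-size formula directly, that uncertainty carries straight through to the calculated number of retests.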
The size of the difference to be detected is difficult to determine in advance because one does not know in advance how far
out of specification any future OOS result may be.
If the specification is 95% and the OOS is 89%, then the difference to be detected is 6%. But if the OOS is 94.4%, the difference
to be detected is 0.6%. These would give very different sample sizes.
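How different the two sample sizes are can be seen in a short sketch. It assumes the formula n = (t_α + t_β)² S² / d², α = β = 0.05 with one-sided t-table values for 30 degrees of freedom, and a standard deviation of 2% chosen purely for illustration. The standard deviation and the function name are hypothetical, not values from any real method.

```python
import math

def sample_size(t_alpha, t_beta, variance, d):
    """Sample size from n = (t_alpha + t_beta)^2 * S^2 / d^2, rounded up."""
    return math.ceil((t_alpha + t_beta) ** 2 * variance / d ** 2)

T_05_DF30 = 1.697   # one-sided t-table value, 0.05 risk, 30 degrees of freedom
SPEC = 95.0         # specification, in percent
ASSUMED_SD = 2.0    # hypothetical standard deviation, in percent

results = {}
for oos in (89.0, 94.4):
    d = abs(SPEC - oos)  # difference to be detected
    results[oos] = sample_size(T_05_DF30, T_05_DF30, ASSUMED_SD ** 2, d)
    print(f"OOS result {oos}%: d = {d:.1f}%, n = {results[oos]}")
```

Because d is squared in the denominator, a difference ten times smaller calls for roughly one hundred times as many results. Under these assumptions the first case needs 2 results and the second needs 128.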
Thus, there seems to be an inherent and unintended conflict within the industry on sample size. One is not allowed to adjust
the number of retests depending on the results obtained, but that is the very information we need to statistically and scientifically
determine the sample size.
To determine the sample size in advance without knowing how far out of specification the OOS result will be, one would need
to decide on a difference to detect in advance. But how to select this difference? Should it be the best guess of the analysts?
How does one justify that guess? Should it be the bias in the method from the validation, if it exists? If the bias is large,
the sample size would be small. If the bias is very small, the sample size will be large, as can be seen from the equation.
This seems to be the opposite of what industry wants to achieve.
Statistical formula with sample
Equation 1 can also be used if a first sample (e.g., of seven results) is available to estimate the variance. With this variance
estimate and the difference between the specification and the OOS result, the sample size needed can be recalculated. If the
recalculated size is greater than seven, additional samples would be taken to reach it.
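The two-step approach can be sketched as follows. The seven retest values, the specification, and the OOS result are hypothetical numbers made up for the example. The formula is assumed to be n = (t_α + t_β)² S² / d², and the t values are one-sided table values for 6 degrees of freedom (seven results minus one), with α = 0.05 and β = 0.10.

```python
import math
import statistics

def sample_size(t_alpha, t_beta, variance, d):
    """Sample size from n = (t_alpha + t_beta)^2 * S^2 / d^2, rounded up."""
    return math.ceil((t_alpha + t_beta) ** 2 * variance / d ** 2)

# One-sided t-table values for 6 degrees of freedom.
T_05_DF6 = 1.943  # alpha = 0.05
T_10_DF6 = 1.440  # beta = 0.10

# Hypothetical first sample of seven retest results, in percent.
first_sample = [94.1, 95.3, 93.8, 96.0, 94.7, 95.5, 93.9]
spec = 95.0  # hypothetical specification
oos = 94.4   # hypothetical original OOS result

variance = statistics.variance(first_sample)  # sample variance, n - 1 divisor
d = abs(spec - oos)                           # difference to be detected

n_required = sample_size(T_05_DF6, T_10_DF6, variance, d)
additional = max(0, n_required - len(first_sample))
print(f"required: {n_required}, additional beyond the first seven: {additional}")
```

One caveat on this sketch: once additional results are in hand, the degrees of freedom change and so do the t values, so a careful analyst might repeat the calculation. Whether such a sequential adjustment is acceptable is exactly the compliance question this column raises.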
Equation 1 assumes a continuous response that is normally distributed. Some data, such as for a limulus amebocyte lysate test,
may be skewed, and colony counts are both discrete and skewed, so a different model and formula must be used to get the estimate.
There are books and computer programs dedicated to determining the sample size in different situations.
Further, from a laboratory management point of view, should a different number of OOS retests be pursued for each method?
Do the statistical and scientific advantages of different sample sizes outweigh the need for consistency for the analysts
to prevent confusion and mistakes? Are we out of compliance if the analyst does eight retests when the method calls for seven?
To conclude, there seems to be an inherent conflict in the industry's position on sample size. Given this discussion, the
seven-out-of-eight criterion given in the Barr case (1) may be as good as any.
Lynn D. Torbeck is a statistician at PharmStat Consulting, 2000 Dempster, Evanston, IL 60202, tel. 847.424.1314, LDTorbeck@PharmStat.com
1. United States vs. Barr Laboratories, Inc. Civil Action No. 92-1744, US District Court for the District of New Jersey: 812 F. Supp. 458. 1993 US Dist. Lexis 1932; 4
Feb. 1993, as amended 30 Mar. 1993.
2. FDA, Investigating Out-of-Specification (OOS) Test Results for Pharmaceutical Production (Rockville, MD, Oct. 2006).
The author would like to extend an open-ended invitation to those interested in this issue to send their comments and solutions.
Given adequate response, the information will be shared in a future column.