Thus for stability testing, if the goal is to estimate a characteristic of the batch throughout the shelf life of the product,
regression analysis can estimate that characteristic with better precision than individual measurements can. The same analysis
can also be used to forecast the characteristic and predict potential product failure. The predicted value from a regression
analysis draws on all measurements made on the batch, thus providing a superior estimate of the batch average and its
performance over time.
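As a minimal sketch of this point, the following Python fragment fits a straight line to a set of stability results and compares
the standard error of the fitted batch mean with the residual standard deviation of a single result. The time points, assay
values, and function names are hypothetical and chosen only for illustration. A simple linear model with independent,
constant-variance errors is assumed.

```python
import math


def fit_line(times, values):
    """Ordinary least-squares fit of values = intercept + slope * time."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(values) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    residuals = [y - (intercept + slope * t) for t, y in zip(times, values)]
    # Residual standard deviation: the scatter of a single result about the line.
    resid_sd = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    return intercept, slope, resid_sd, t_bar, sxx


def fitted_mean_and_se(times, values, t0):
    """Fitted batch mean at time t0 and the standard error of that fitted mean."""
    intercept, slope, resid_sd, t_bar, sxx = fit_line(times, values)
    n = len(times)
    mean = intercept + slope * t0
    se = resid_sd * math.sqrt(1 / n + (t0 - t_bar) ** 2 / sxx)
    return mean, se, resid_sd


# Hypothetical stability data: months on station and assay (% of label claim).
months = [0, 3, 6, 9, 12, 18, 24]
assay = [100.2, 99.6, 99.9, 99.1, 98.8, 98.3, 97.4]

mean_12, se_12, sd_single = fitted_mean_and_se(months, assay, 12)
print(f"fitted mean at 12 months: {mean_12:.2f}")
print(f"standard error of fitted mean: {se_12:.3f}")
print(f"residual SD of a single result: {sd_single:.3f}")
```

Within the range of the tested time points, the standard error of the fitted mean is smaller than the scatter of a single
result, because every measurement contributes to the estimate. The same function can be evaluated at a later time point to
forecast the characteristic, although the standard error grows as the prediction moves away from the observed data.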
In a product optimization, a process validation, or an OOS investigation, similar issues are present. A proper analysis of
data stemming from a multifactor experimental design can facilitate the acquisition of key process information.
Using statistical analysis to address multiplicity is standard practice in the clinical and preclinical testing of a product.
Well-known examples include the definition of a primary end-point for clinical trials and the multiplicity correction of
p-values used in safety toxicology assessments. Similar adjustments can be made in the CMC environment. For example, statistical
concepts could be used to mitigate the risk of incorrectly concluding nonhomogeneous slopes among stability batches in shelf-life
estimation to obtain a more accurate estimate of shelf life (16, 17). In addition, multiplicity adjustment might be performed
in setting acceptance criteria for multiple release parameters or in establishing a shelf-life specification, where multiple
measurements will be made on a batch throughout its shelf life.
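One simple way to picture such an adjustment is sketched below in Python. The Bonferroni and Sidak corrections shown here are
standard textbook methods and are offered only as an illustration; they are not taken from the cited references. The function
names are hypothetical, and the Sidak correction and the family-wise error formula assume that the tests are independent.

```python
def familywise_error(alpha_per_test, n_tests):
    """Chance of at least one false signal across n_tests independent tests."""
    return 1 - (1 - alpha_per_test) ** n_tests


def bonferroni_alpha(alpha_family, n_tests):
    """Per-test significance level that keeps the family-wise error at or below alpha_family."""
    return alpha_family / n_tests


def sidak_alpha(alpha_family, n_tests):
    """Per-test level giving exactly alpha_family overall when tests are independent."""
    return 1 - (1 - alpha_family) ** (1 / n_tests)


# Hypothetical case: 10 release parameters, each tested at the 5% level.
n_parameters = 10
print(f"unadjusted family-wise error: {familywise_error(0.05, n_parameters):.3f}")
print(f"Bonferroni per-test level: {bonferroni_alpha(0.05, n_parameters):.4f}")
print(f"Sidak per-test level: {sidak_alpha(0.05, n_parameters):.4f}")
```

Correlated parameters, which are common in practice, would call for a different adjustment than the independence-based
formulas used in this sketch.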
Conclusion
As industry begins to embrace the new quality paradigm for the 21st century introduced by FDA, manufacturers will obtain more
data. Additional samples will be taken for current tests, and additional parameters will be tested to obtain a better understanding
of products and processes. When large amounts of additional data are gathered, the underlying multiplicity issue creates a
problem for the current system of specifications with zero tolerance for results outside limits (i.e., a go/no-go approach).
This makes it increasingly difficult to focus on the true underlying quality of batches. Acting in the traditional manner
will trigger many OOS investigations that search for special causes when none exist. The degree and frequency of nonconforming
results must be considered as part of the evaluation of the larger sets of data gathered in this new paradigm. Doing so should
not come at the expense of understanding the true batch quality; rather, it requires improved estimates of batch parameters.
Risk assessments are a part of the judicious handling of large data sets, and extended use of statistical techniques and
philosophy plays a key role in this. There are real costs associated with poor risk management. This is as true of failing to
react to real signals as it is of overreacting to false signals.
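The effect of additional data on a zero-tolerance specification can be illustrated with a short Python sketch. It assumes that
individual results are independent and normally distributed about the true batch mean; the batch mean, standard deviation,
limits, and function names are hypothetical values chosen for illustration.

```python
from statistics import NormalDist


def prob_single_result_outside(mean, sd, lower, upper):
    """Chance that one result from a normal distribution falls outside the limits."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(lower) + (1 - dist.cdf(upper))


def prob_any_result_outside(p_single, n_results):
    """Chance that at least one of n_results independent results falls outside the limits."""
    return 1 - (1 - p_single) ** n_results


# Hypothetical batch: true mean 100, total SD 2, limits 95 to 105 (% of label claim).
p_single = prob_single_result_outside(mean=100.0, sd=2.0, lower=95.0, upper=105.0)

for n_results in (1, 10, 30, 100):
    p_any = prob_any_result_outside(p_single, n_results)
    print(f"{n_results:>3} results: chance of at least one outside limits = {p_any:.3f}")
```

Under these assumptions the batch itself does not change, yet the chance of observing at least one result outside the limits
rises steadily with the number of results collected.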
The emphasis in the new paradigm must change from individual test results to improved estimates of the true batch parameters
needed to support quality decisions. The conformance of a singlet determination under such circumstances is neither necessary
nor sufficient to characterize whether a batch is poor or failing. Indeed, situations will exist in which batch parameters
indicate that processes and products are satisfactory, whereas some individual singlet determinations may not comply on their
own. Current requirements outlined in ICH Q1E are an example of this, because decisions are made based upon the batch average
and not individual test results (5).
The acceptance criteria and the amount of data should be linked. Together they define the test characteristics to meet the
objective of making a decision on batch quality. This must be recognized and continuously emphasized to successfully manage
the risks. Examples of this thinking are becoming more common and represent the future desired state (8, 18–21). These ideas
must be extended to include the composite testing case (i.e., assays and degradation products) and tests with correlated end-points
(e.g., dissolution, particle-size distribution). The batch parameter estimates should take on greater importance than individual
test results, and regulatory guidance should be updated to recognize the importance of these estimates.
It must be understood that the only way to guarantee that all units comply with a limit is to test every unit (100% testing)
and have a perfect measurement system with no error in testing. A sample from the population, no matter how large, cannot
provide this guarantee. Although nondestructive testing can help overcome the limits on the size of samples, it cannot be
expected to operate under the same requirements for results (acceptance criteria) as for the small sample case. The current
expectation that every test result and replicate determination made must meet acceptance criteria is no longer a useful concept
in this environment. In the new paradigm, there should be rewards for developing criteria that allow for improved knowledge
about the quality of products and processes. The penalties associated with historical practices must be removed.
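The statement that no sample can guarantee full compliance can be made concrete with a small calculation. The Python sketch
below computes the one-sided upper confidence bound on the nonconforming fraction when every unit in a random sample complies.
It assumes a random sample from a very large batch and a measurement system without error; the function name and the sample
sizes are hypothetical.

```python
def upper_bound_nonconforming(n_tested, confidence=0.95):
    """Upper confidence bound on the nonconforming fraction when all n_tested units comply.

    Solves (1 - p) ** n_tested = 1 - confidence for p, the largest nonconforming
    fraction still consistent with seeing zero failures at the stated confidence.
    """
    if n_tested < 1:
        raise ValueError("n_tested must be at least 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    return 1 - (1 - confidence) ** (1 / n_tested)


# Hypothetical sample sizes, all with zero nonconforming units observed.
for n_tested in (10, 59, 300, 3000):
    bound = upper_bound_nonconforming(n_tested)
    print(f"{n_tested:>5} units tested, none failing: up to {bound:.2%} may not comply")
```

The bound shrinks as the sample grows but never reaches zero, which is why only 100% testing with a perfect measurement
system could support a claim that all units comply.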