It has been demonstrated that the existing FDA dose content uniformity test has very poor statistical relevance, resulting in both the acceptance of poor quality batches and the rejection of good quality batches. Using Bayesian Inference, a much improved test has been produced that determines the quality of a batch of drug product accurately, using a number of samples matched to the quality of the batch.
Much has recently been written regarding the US Food and Drug Administration's (FDA's) dose content uniformity (DCU) tests contained in its draft Guidance for Industry.1 This article is not intended to be merely another overview of the poor statistical relevance of the test, nor is it simply a commendation of the International Pharmaceutical Aerosol Consortium on Regulation and Science (IPAC-RS) for the work it has done to raise the profile of this issue and propose an alternative. Instead, the authors intend to propose a novel approach that optimizes the level of testing required for each batch and reduces the risk of high quality batches of drugs being discarded unnecessarily.
Before alternative approaches can be discussed, it is necessary to highlight the existing testing procedures and also to examine the proposal presented by the IPAC-RS.
FDA's 'Guidance for Industry - Metered Dose Inhalers (MDI) and Dry Powder Inhalers (DPI) Drug Products' was released in draft version in November 1998.1 Within this document, FDA outlines the recommended method for assessing the DCU of inhalation drug products and describes this test as "Providing an overall performance evaluation of a batch."
Figure 1: The results of the FDA test, applied to batches of varying quality. The poor discrimination of the test can clearly be seen.
The FDA test examines two aspects of DCU - the DCU for doses from multiple containers within a batch and the DCU for doses within the same container. This article focusses on the former.
The DCU test is familiar to many in the industry because it forms the backbone of the quality assurance (QA) testing for inhalation drug products. However, for those who are less familiar with it, it can be summarized as follows:
Dose content uniformity test. For each of 10 containers, determine one dose. The test is passed if:
The test is failed if:
If the test is not passed or failed, 20 additional containers are tested in a second tier. The test is passed if:
Otherwise the test is failed. The test may not be repeated on the same batch.
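The two-tier procedure above can be sketched in Python. The numeric limits used here (individual doses within 80-120% of label claim with at most 1 of 10, or 3 of 30, outside that range; no dose outside 75-125%; mean within 85-115%) are assumptions, taken to be the values usually quoted for the 1998 draft guidance, and should be checked against the guidance itself before any use.

```python
from statistics import mean

# ASSUMED limits, in % of label claim; verify against the draft guidance.
INNER = (80.0, 120.0)  # range for individual doses (limited exceptions allowed)
OUTER = (75.0, 125.0)  # no dose may fall outside this range
MEAN = (85.0, 115.0)   # allowed range for the mean dose


def _outside(doses, limits):
    """Count doses lying outside the (low, high) limits."""
    lo, hi = limits
    return sum(1 for d in doses if not lo <= d <= hi)


def _basic_ok(doses):
    """No dose outside OUTER and the mean inside MEAN."""
    lo, hi = MEAN
    return _outside(doses, OUTER) == 0 and lo <= mean(doses) <= hi


def fda_dcu_tier1(doses10):
    """Return 'pass', 'fail' or 'tier2' for the first 10 doses (% LC)."""
    if len(doses10) != 10:
        raise ValueError("tier 1 needs exactly 10 doses")
    if not _basic_ok(doses10):
        return "fail"
    n_out = _outside(doses10, INNER)
    if n_out <= 1:
        return "pass"
    if n_out <= 3:
        return "tier2"  # 2 or 3 doses outside INNER: test 20 more containers
    return "fail"


def fda_dcu_tier2(doses30):
    """Second tier on all 30 doses; the batch may not be retested afterwards."""
    if len(doses30) != 30:
        raise ValueError("tier 2 needs all 30 doses")
    if _basic_ok(doses30) and _outside(doses30, INNER) <= 3:
        return "pass"
    return "fail"
```

The sketch makes the pass/fail logic explicit, which also shows why the test is easy to apply: it only counts doses against fixed limits and checks the mean, with no use of the spread of the data beyond those counts.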
The FDA test has been devised so that it is simple to understand and easy to implement, thus reducing confusion when the test is interpreted. For any given set of delivered dose data and the corresponding label claim (LC), it is possible to quickly analyse the data and to pass or fail the batch.
However, such a simple approach has resulted in a test with poor statistical relevance. The test fails to accurately distinguish between high and low quality batches of drug and, as a result, leads to a high frequency of either poor batches being accepted or good batches being rejected. It is this poor statistical relevance that has been the driving force behind recommendations to change the test.
Figure 2: The 10/30 IPAC-RS test applied to the same batches as in Figure 1. The data drawn from each batch are identical to those used in Figure 1.
Another criticism often directed at the test is that it does not permit drug manufacturers to reduce the level of testing required if they produce a high quality product. The test demands the same level of QA testing for all batches of product, however close they are to the pass/fail criterion. It would appear to be in everyone's interest to encourage the production of high quality drugs by reducing the burden of QA testing on such a product.
During November 2001, IPAC-RS submitted its response to the FDA draft Guidance for Industry detailing its recommendations for an alternative DCU test.2 The test is a parametric tolerance interval (PTI) test containing three acceptance criteria:
There is no doubt that the IPAC-RS proposal makes significant strides towards a more rational and statistically relevant DCU test. Figure 1 and Figure 2 contain graphs comparing the existing FDA test with the 10/30 test plan proposed by IPAC-RS. The data clearly demonstrate the poor differentiation provided by the FDA test, with unacceptable batches being passed and, most significantly, a high percentage of acceptable batches being failed. It is evident from the graphical representation of the IPAC-RS test that the proposed PTI test improves the discrimination of the testing process and more accurately assesses the quality of the drug product. Fewer acceptable batches were rejected and fewer low quality batches were accepted.
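Equation 1: $P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$
Equation 2: $P(\text{theory} \mid \text{data}) = \dfrac{P(\text{data} \mid \text{theory})\,P(\text{theory})}{P(\text{data})}$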
Another key benefit of the proposed PTI test is the ability of a manufacturer to determine its own testing programme. IPAC-RS has proposed six test plans with increasing levels of sampling, from a sample size of only 10 doses per batch up to a first tier sample size of 24. The smaller the test sample, the higher the probability of an anomalous dose resulting in the batch failing the acceptance criteria. However, manufacturers with a high quality product will be confident of passing the acceptability criteria, even with smaller sample sizes. The PTI test allows these manufacturers to benefit from their high quality product by reducing the volume of QA testing and, therefore, the cost associated with QA testing.
The IPAC-RS proposal offers significant improvements compared with the current FDA DCU test programme; however, it still has a number of features that are undesirable. For example, a manufacturer must decide what level of testing it will undertake before it commences QA testing. The manufacturer may decide to perform a test plan requiring a high level of testing and yet when the first ten results come back, it may be evident that the batch is of excellent quality and further testing will not be required.
Figure 3: Central section of prior on batch mean.
Conversely, a low sample quantity test plan may be adopted and one erroneous result may cause the test to fail. The manufacturer is then faced with discarding a potentially high value batch of drug because it is forbidden to retest the batch and discard earlier results. An ideal DCU test would optimize the level of testing required depending on the quality of the drug product shown in the results.
To optimize the level of testing required, the authors have applied Bayesian mathematics to the DCU test. The resulting test addresses the problems identified with the FDA and IPAC-RS tests whilst building on the improvements made by the IPAC-RS PTI test.
The proposed test is based on Bayes' theorem of conditional probability. By applying this theorem to the data collected during a DCU test, an accurate assessment of the batch quality is achievable.
Much has been written about the differences between frequentist and Bayesian statistics, and the debate will continue for years to come. However, to give a broad overview of the proposed test, a brief explanation of Bayesian statistics is required for those unfamiliar with the principles.
Figure 4: Cumulative distribution function for lower part of prior on batch standard deviation.
Bayes' theorem on conditional probability can be expressed as Equation 1. This theorem can be used to calculate the probability of a theory being true given a certain data set (Equation 2).
Therefore, if we hypothesize that a batch of product falls within given acceptability criteria, the probability of this theory being true can be calculated. The only uncertainty lies in determining P(theory), the initial probability that the theory is true, otherwise known as the 'prior.' Standard frequentist statistics can calculate P(data|theory), the probability of the data set occurring if the theory that the batch is acceptable is true. However, only Bayesian Inference allows us to answer the real question - what is the probability that this batch of product is acceptable?
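With only two exhaustive hypotheses, 'the batch is good' and 'the batch is bad', the denominator of Equation 2 expands as P(data) = P(data|good)P(good) + P(data|bad)P(bad), so the posterior needs only the two likelihoods and the prior. A minimal sketch follows; the likelihood values and the 50:50 prior are made-up numbers for illustration.

```python
def posterior_good(p_data_given_good, p_data_given_bad, prior_good):
    """P(good | data) by Bayes' theorem for two exhaustive hypotheses.

    P(data) is expanded as P(data|good)P(good) + P(data|bad)P(bad).
    """
    if not 0.0 <= prior_good <= 1.0:
        raise ValueError("prior_good must be a probability")
    joint_good = p_data_given_good * prior_good
    joint_bad = p_data_given_bad * (1.0 - prior_good)
    evidence = joint_good + joint_bad  # P(data)
    if evidence == 0.0:
        raise ValueError("data are impossible under both hypotheses")
    return joint_good / evidence


# Made-up example: the data are 20 times more likely if the batch is good.
print(posterior_good(0.20, 0.01, 0.5))  # about 0.952
```

When the two likelihoods are equal the data carry no information and the posterior simply equals the prior; the more the likelihoods differ, the more the data move the answer away from the prior.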
Many would argue that arbitrarily assigning a value for the prior is unjustified and will result in a distorted final probability. However, it can be shown that if a suitable common prior is assigned to all DCU tests, then, after more than a few samples have been taken, the prior selected has only minimal influence.
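The claim that the choice of prior matters little once a few samples are in can be illustrated with the simplest conjugate case. The sketch assumes normally distributed doses with a known standard deviation and a normal prior on the batch mean, a simplification of the priors on batch mean and standard deviation shown in Figures 3 and 4. Two analysts start from priors centred at 90% and 110% of label claim; with the prior standard deviation equal to the dose standard deviation, the gap between their posterior means works out to 20/(n + 1) percentage points, whatever the data.

```python
import random


def posterior_mean(data, prior_mean, prior_sd, sigma):
    """Posterior mean of a normal mean with known SD `sigma` (normal prior)."""
    precision = 1.0 / prior_sd**2 + len(data) / sigma**2
    weighted = prior_mean / prior_sd**2 + sum(data) / sigma**2
    return weighted / precision


random.seed(1)
sigma = 5.0  # ASSUMED known dose-to-dose SD, % of label claim
doses = [random.gauss(100.0, sigma) for _ in range(200)]  # simulated readings

for n in (1, 5, 10, 30, 100, 200):
    a = posterior_mean(doses[:n], prior_mean=90.0, prior_sd=5.0, sigma=sigma)
    b = posterior_mean(doses[:n], prior_mean=110.0, prior_sd=5.0, sigma=sigma)
    print(f"n={n:3d}  prior 90 -> {a:6.2f}  prior 110 -> {b:6.2f}  gap {abs(a - b):.2f}")
```

The two starting points differ by 20 points, but after 10 readings the posteriors differ by under 2, and after 100 by about 0.2.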
Figure 5: The results of the Bayesian test applied to the same batches as in Figure 1.
As discussed earlier, by using Bayesian Inference we are able to accurately determine the probability that a tested batch falls within a given acceptability region. If a limit is then set on this calculated probability, a pass/fail criterion is defined.
The IPAC-RS test contains acceptability coefficients that are calculated so that a bad batch will be rejected with a probability of at least 0.95. For the Bayesian test, we arranged to pass a batch if the probability that it is good is at least 0.95; that is, IPAC-RS promises that P(reject|bad) ≥ 0.95, whereas the Bayesian test accepts if P(good|data) ≥ 0.95.
Performing the Bayesian test on a batch is straightforward. An initial measurement is performed and P(batch is good|data) is calculated using the standard prior agreed by regulatory authorities. This result is then fed back into Bayes' theorem and, if necessary, combined with further readings to establish new values of P(batch is good|data). When P(batch is good|data) is found to be 0.95 or greater, the test is passed.
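Feeding each posterior back in as the prior for the next group of readings gives the same answer as analysing all the readings at once, which is what makes the stepwise procedure legitimate. A minimal sketch, assuming normally distributed doses with a known standard deviation of 6% of label claim, a flat prior over a grid of candidate batch means, made-up readings, and a hypothetical definition of 'good' as a batch mean within 90-110% of label claim:

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density, used as the likelihood of one dose reading."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def update(prior, grid, data, sigma):
    """Apply Bayes' theorem once: prior weights on the grid -> posterior weights."""
    unnorm = []
    for p, mu in zip(prior, grid):
        like = 1.0
        for x in data:
            like *= normal_pdf(x, mu, sigma)
        unnorm.append(p * like)
    total = sum(unnorm)  # P(data), the normalising constant
    return [u / total for u in unnorm]


grid = [80.0 + 0.5 * i for i in range(81)]   # candidate batch means, 80-120% LC
prior = [1.0 / len(grid)] * len(grid)       # hypothetical flat prior
sigma = 6.0                                 # ASSUMED known dose SD, % LC
first = [98.1, 101.4, 99.7, 103.2, 96.8]    # made-up readings
second = [100.9, 97.5, 102.6, 99.0, 101.8]

step1 = update(prior, grid, first, sigma)   # posterior after the first group...
step2 = update(step1, grid, second, sigma)  # ...fed back in as the new prior
at_once = update(prior, grid, first + second, sigma)

# Hypothetical 'good' region: batch mean within 90-110% LC.
p_good = sum(p for p, mu in zip(step2, grid) if 90.0 <= mu <= 110.0)
print(round(p_good, 4), max(abs(a - b) for a, b in zip(step2, at_once)))
```

The second printed value is the largest difference between the stepwise and all-at-once posteriors; it is at the level of floating-point rounding.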
Figure 6: The number of measurements made on each batch during the Bayesian testing of Figure 5.
(In the work below, measurements were performed in groups of 10 and analysis only done between groups; however, this was an arbitrary choice.)
For high quality batches of drug product with repeatable dose weights, a probability of 0.95 will be reached after only a few tests. For lower quality drug product, a large number of tests will be required to reach it. If the batch quality lies outside the acceptance region, testing will continue for a long time and it is likely that a probability of 0.95 will never be achieved.
The test does not define a limiting number of tests beyond which no further testing is permitted. A manufacturer may continue testing its product for as long as it wants. However, further testing will not influence the fraction of passed batches that are good - rather, it will confirm the quality of poor quality batches. It is expected that each manufacturer will determine the maximum number of tests it is prepared to perform before discarding a product; it may also decide to cease testing once P(batch is good|data) falls below some small value such as 10⁻⁵, as the probability that further testing will lead to a pass is then correspondingly small.
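The testing loop, together with the stopping rules a manufacturer might add, can be sketched as follows. Everything specific here is an assumption for illustration: doses are modelled as normal; the prior is flat over a coarse grid of batch means (70-130% of label claim) and standard deviations (1-25%); doses are analysed in groups of 10; the batch passes once P(good|data) reaches 0.95 and is abandoned if it drops below 1e-5 or 200 doses have been used. A batch is called good if at least 90% of its doses would fall within 80-120% of label claim; that definition is a hypothetical stand-in, not the authors' acceptance criteria, and a regulator-agreed prior would replace the flat grid.

```python
import math
import random

# Hypothetical flat grid prior over batch mean and SD, both in % of label claim.
MEANS = [70.0 + i for i in range(61)]  # 70, 71, ..., 130
SDS = [1.0 + j for j in range(25)]     # 1, 2, ..., 25
GRID = [(m, s) for m in MEANS for s in SDS]


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def is_good(mu, sd, lo=80.0, hi=120.0, coverage=0.90):
    """HYPOTHETICAL 'good batch': at least 90% of doses lie in 80-120% LC."""
    return norm_cdf((hi - mu) / sd) - norm_cdf((lo - mu) / sd) >= coverage


GOOD = [is_good(m, s) for m, s in GRID]


def p_good(log_post):
    """Normalise log-posterior weights and return the mass on good batches."""
    top = max(log_post)
    w = [math.exp(v - top) for v in log_post]
    return sum(wi for wi, g in zip(w, GOOD) if g) / sum(w)


def sequential_test(draw_dose, group=10, pass_at=0.95,
                    give_up_below=1e-5, max_doses=200):
    """Measure doses in groups, re-evaluating P(good|data) between groups.

    Returns (verdict, doses_used, final P(good|data)).
    """
    log_post = [0.0] * len(GRID)  # flat prior, stored as log weights
    p = p_good(log_post)
    n = 0
    while n < max_doses:
        for _ in range(group):
            x = draw_dose()
            n += 1
            # Add each grid point's normal log-likelihood (constants dropped).
            for k, (m, s) in enumerate(GRID):
                log_post[k] += -math.log(s) - 0.5 * ((x - m) / s) ** 2
        p = p_good(log_post)
        if p >= pass_at:
            return "pass", n, p
        if p < give_up_below:
            return "fail", n, p  # futility stop: a pass is now very unlikely
    return "fail", n, p          # manufacturer's maximum number of doses reached


random.seed(3)
print(sequential_test(lambda: random.gauss(100.0, 4.0)))   # tight, on-target batch
random.seed(4)
print(sequential_test(lambda: random.gauss(75.0, 10.0)))   # sub-potent batch
```

The tight batch passes after one or two groups, while the sub-potent batch is stopped early by the futility rule or the dose cap; the exact counts depend on the hypothetical prior and good-batch definition.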
To demonstrate the benefits offered by the Bayesian DCU test compared with the existing FDA test, a computer simulation was created to model the two tests. The program simulates the testing procedure for batches of varying quality and identifies which batches would pass the tests and which would fail. As the Bayesian test allows testing to continue indefinitely, any batch that required more than 2000 samples before meeting the acceptance criteria was counted as a "fail." In reality, a manufacturer is likely to give up testing before this point is reached because it will be clear that the batch is not of adequate quality.
The acceptance criteria applied to the Bayesian Inference test were as follows:
Figures 1, 2, 5 and 6 are based on a simulated set of batches of varying quality, prepared as in Figure 22 of the IPAC-RS proposal document,4 with the IPAC-RS results obtained using the 10/30 test plan from that proposal. The graphs clearly show the benefits of the proposed Bayesian test. High quality batches are passed after examining small numbers of samples (an average of 29 samples per good batch for the Bayesian test, compared with 10.5 for the FDA test and 34 for IPAC-RS 10/30). Lower quality but still acceptable batches are passed after examination of more samples, and unacceptable batches are passed in sufficiently small numbers that they constitute less than 5% of the population that passed (0.07% in this experiment).
The other tests, in contrast, failed many acceptable batches, with consequential effects on yield and sales price. In particular, the Bayesian test did not fail any good batches, whereas the FDA test failed 25.2% of good batches and the IPAC-RS 10/30 test failed 16.5%. The number of bad batches passed, however, was similar for the three tests: of the 36 bad batches among the 3000 tested, the Bayesian test passed 2, the FDA test 3 and the IPAC-RS 10/30 test 1.
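The 0.07% figure can be checked from the counts reported for the simulation, and the same arithmetic gives the corresponding share for the other two tests. A small sketch, assuming the 3000 simulated batches split into exactly 36 bad and 2964 good ones:

```python
TOTAL_BATCHES = 3000
BAD_BATCHES = 36
GOOD_BATCHES = TOTAL_BATCHES - BAD_BATCHES  # 2964, assuming every batch is good or bad


def bad_share_of_passes(good_fail_rate, bad_passed, good_batches=GOOD_BATCHES):
    """Fraction of passed batches that are bad, given the reported rates."""
    good_passed = good_batches * (1.0 - good_fail_rate)
    return bad_passed / (good_passed + bad_passed)


# (fraction of good batches failed, number of bad batches passed), from the text
reported = {
    "Bayesian": (0.0, 2),
    "FDA": (0.252, 3),
    "IPAC-RS 10/30": (0.165, 1),
}

for name, (fail_rate, bad_passed) in reported.items():
    print(f"{name:14s} {bad_share_of_passes(fail_rate, bad_passed):.3%}")
```

On these assumptions the bad batches make up about 0.07% of passes for the Bayesian test, about 0.14% for the FDA test and about 0.04% for IPAC-RS 10/30, so all three stay well under 5%; the difference between the tests lies in how many good batches they throw away.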
The IPAC-RS parametric tolerance interval test offers significant improvements on the existing FDA guidance for DCU testing. However, there are still a number of issues that the IPAC-RS test does not address. Manufacturers who produce a high quality drug product should benefit from reduced QA testing but this should not be at the expense of increased risk of a batch being failed by an anomalous result. The proposed Bayesian Inference test allows testing to continue until a given level of confidence in the batch quality has been achieved.
For the FDA and IPAC-RS tests, in contrast, only a single application of a single test plan is permitted - if a batch fails, the only option is to discard it, not to retest it. The reason is that if retesting were permitted, the probability of a bad batch passing would rise above the specified 0.05; indeed, if unlimited retesting were permitted, the probability of any bad batch eventually passing would approach 1.
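The inflation caused by retesting is easy to quantify under a simple assumption: each retest of a bad batch is independent and passes it with probability alpha, where 0.05 is the most a test with the specified error rate allows. The chance that at least one of k attempts passes is then 1 - (1 - alpha)^k.

```python
def p_bad_batch_passes(alpha, attempts):
    """Chance that a bad batch passes at least once in `attempts` tries.

    ASSUMES each attempt is independent and passes a bad batch with
    probability `alpha` (0.05 being the worst case a single test allows).
    """
    return 1.0 - (1.0 - alpha) ** attempts


for k in (1, 2, 5, 10, 50):
    print(k, round(p_bad_batch_passes(0.05, k), 3))
```

Two attempts already nearly double the risk, ten attempts push it to about 0.4, and the probability tends to 1 as the number of attempts grows, which is why single-shot frequentist tests must forbid retesting.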
The FDA and IPAC-RS tests answer the question: 'Given the theory that this batch is bad, what is the probability of achieving this set of data?' The actual question that should be asked is 'Given this data set, what is the probability that this batch of drug product is good?' This question can only be answered using Bayesian Inference.
The Bayesian Inference test requires a higher level of computation to establish the acceptability of a batch of drug product. This computational cost is more than offset by the reduction in the level of testing that will be required for good drug product and the reduced frequency of acceptable batches having to be discarded because they failed the DCU test.
Finally, before such a Bayesian test can be implemented, some empirical information on what distributions should be used in practice for the priors needs to be gathered and agreement needs to be reached on this subject between industry and regulators. The benefits can then be precisely quantified - the authors believe that both consumers and manufacturers will be shown to benefit significantly.