Assessment of Large-Sample Unit-Dose Uniformity Tests

Published on: October 2, 2011

Pharmaceutical Technology, Volume 35, Issue 10

The authors describe the concept of the limiting discriminatory threshold (LDT) as an objective means of evaluating the inherent quality requirement of a large-sample content-uniformity test.

On-line measurement methods are often used to implement the process analytical technology (PAT) approach to process understanding and control and offer many advantages over traditional pharmaceutical analytical methods. PAT tends to be fast and nondestructive, and the tools can be integrated into the manufacturing process stream relatively easily. PAT helps improve process understanding and enhance control of product quality. A well-developed PAT method can be applied to perform real-time release testing on a large number of samples per batch. It is not surprising that the application of PAT to the testing of the uniformity of dosage units (UDU) has been the subject of extensive discussion between industry groups and regulatory agencies.

The harmonized UDU test (hUSP) currently described in US Pharmacopeia <905> is the result of the International Conference on Harmonization's (ICH) effort to harmonize the US, EU, and Japanese pharmacopoeia tests and to ensure the consistency of the dosage units within a narrow range around the label claim (1). The hUSP flow chart is summarized in Figure 1.

Figure 1: The harmonized uniformity of dosage units (UDU) (hUSP) test flowchart. (ALL FIGURES ARE COURTESY OF THE AUTHORS)

Using operating characteristic curves to evaluate the performance of the hUSP test

The characteristics of the hUSP test can be summarized by a set of operating characteristic (OC) curves. A UDU test's OC curve is conventionally presented as the probability of lot acceptance (P) versus one key distribution parameter (Q). Assuming the test is characterized by its sample size (n), with possibly other distributional parameters, this relationship can be mathematically defined by the following equation:
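A generic form consistent with this description is sketched below. The symbols $f$ (the acceptance-probability function) and $\theta$ (the remaining distributional parameters) are assumed here for illustration:

```latex
P = f(Q \mid n, \theta)
```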

Often Q = standard deviation. However, it is common to replace the standard deviation by coverage, which is defined as the percentage of dosage units within an acceptable potency range. It is assumed that the acceptable range is between 85% and 115% of the label claim (LC).
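As a minimal sketch of how the two choices of Q relate, the snippet below converts a batch mean and standard deviation into coverage of the 85–115% LC range, assuming normally distributed unit contents. The function name `coverage` and its default limits are illustrative and not taken from the article.

```python
from statistics import NormalDist

def coverage(mean, sd, lower=85.0, upper=115.0):
    """Fraction of units within [lower, upper] % LC for a normal population."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)

# For a fixed batch mean, a larger standard deviation means lower coverage.
for sd in (5.0, 7.5, 10.0):
    print(f"mean = 100% LC, sd = {sd}: coverage = {coverage(100.0, sd):.4f}")
```

Because the mapping is monotonic for a fixed mean, an OC curve drawn against standard deviation can be redrawn against coverage without changing its meaning.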

Representative OC curves obtained assuming a normal distribution and a target assay of 100% LC are shown in Figure 2. Under these assumptions, the performance of the hUSP test is symmetrical around 100% LC batch mean. Thus, only the range of 90% to 100% LC of the batch mean needs to be evaluated. It is evident from Figure 2 that the probability of passing the test is quite similar for batches with mean assay between 94% and 100% LC because of the design characteristics of the hUSP test.

Figure 2: Operating-characteristic curves of the harmonized dose-uniformity test for a range of batch means.
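Points on OC curves such as those in Figure 2 can be approximated by simulation. The sketch below is an illustration, not the authors' code. It assumes the constants of USP <905> as commonly stated (10 units with k = 2.4 at Stage 1, 30 units with k = 2.0 at Stage 2, L1 = 15, L2 = 25, and a reference value M clamped to 98.5–101.5% LC), none of which is restated in this article, and it assumes normally distributed unit contents. All function names are hypothetical.

```python
import random
import statistics

L1, L2 = 15.0, 25.0              # acceptance limits (assumed from USP <905>)
K_STAGE1, K_STAGE2 = 2.4, 2.0    # acceptability constants for 10 and 30 units

def reference_value(xbar):
    """Reference value M when the target is at most 101.5% LC."""
    return min(max(xbar, 98.5), 101.5)

def acceptance_value(units, k):
    """AV = |M - mean| + k * s for the units tested so far."""
    xbar = statistics.fmean(units)
    return abs(reference_value(xbar) - xbar) + k * statistics.stdev(units)

def husp_passes(rng, mean, sd):
    """Simulate one two-stage test on normally distributed unit contents."""
    units = [rng.gauss(mean, sd) for _ in range(10)]
    if acceptance_value(units, K_STAGE1) <= L1:
        return True                                   # accepted at Stage 1
    units += [rng.gauss(mean, sd) for _ in range(20)]
    if acceptance_value(units, K_STAGE2) > L1:
        return False                                  # Stage 2 AV too large
    m = reference_value(statistics.fmean(units))
    low, high = (1 - 0.01 * L2) * m, (1 + 0.01 * L2) * m
    return all(low <= x <= high for x in units)       # zero-tolerance check

def acceptance_probability(mean, sd, trials=2000, seed=1):
    """Monte Carlo estimate of the probability of lot acceptance."""
    rng = random.Random(seed)
    return sum(husp_passes(rng, mean, sd) for _ in range(trials)) / trials

for sd in (4.0, 6.0, 8.0, 10.0):
    print(f"mean = 100% LC, sd = {sd}: P(accept) ~ {acceptance_probability(100.0, sd):.3f}")
```

Evaluating `acceptance_probability` over a grid of standard deviations (or the equivalent coverage values) for one batch mean traces a single OC curve.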

Large sample size UDU test proposals

The hUSP test is designed to accommodate a small sample size (i.e., as many as 30 units), but PAT-associated on-line methods permit a much larger sample size from each commercial batch. An alternative "large n" UDU test is thus needed. Several proposals have appeared in recent years, and a brief review is provided below.

Sandell proposal. Sandell et al. proposed a nonparametric counting test for sample sizes between 100 and 10,000 units (2). The maximum allowed number of defective units (i.e., units whose content falls outside the range of 85–115% LC) is equal to the median of the binomial distribution defined by the sample size (n) and a defect probability of 1 – 0.952 = 0.048. The authors demonstrated that the test has a lower acceptance probability than the hUSP test as long as the acceptance probability is at or below 50%. However, the choice of the binomial median and a defect probability of 0.048 is somewhat arbitrary and remains a point of controversy.

Bergum and Vukovinsky proposal. Bergum and Vukovinsky suggested a nonparametric test with a defect-count limit equal to the largest integer less than or equal to 3% of the total sample size (3). The authors used OC curves to make various comparisons with the Sandell and hUSP tests. However, as with Sandell's proposal, the choice of 3% seems somewhat arbitrary.
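The two counting limits described above can be computed directly. The sketch below is illustrative only: it takes the binomial median to be the smallest count whose cumulative probability reaches 50%, which is one common convention and an assumption here, and the function names are hypothetical.

```python
import math

def binomial_median(n, p):
    """Smallest k with P(X <= k) >= 0.5 for X ~ Binomial(n, p)."""
    log_p, log_q = math.log(p), math.log1p(-p)
    cdf = 0.0
    for k in range(n + 1):
        # Work in log space so that large n does not overflow.
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                   - math.lgamma(n - k + 1) + k * log_p + (n - k) * log_q)
        cdf += math.exp(log_pmf)
        if cdf >= 0.5:
            return k
    return n

def sandell_limit(n, defect_probability=0.048):
    """Maximum number of units outside 85-115% LC under the Sandell proposal."""
    return binomial_median(n, defect_probability)

def bergum_vukovinsky_limit(n):
    """Largest integer not exceeding 3% of n (integer arithmetic avoids rounding error)."""
    return (3 * n) // 100

for n in (100, 500, 1000, 10000):
    print(n, sandell_limit(n), bergum_vukovinsky_limit(n))
```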

Diener proposals. Diener et al. proposed several parametric alternatives for sample sizes in the range of 31 to 99 (4). As with Sandell's approach, the authors demonstrated with OC curves that their tests were more stringent than the hUSP test for batches with at most 95.2% coverage. But, again, it is not clear that the inherent quality levels that the tests represent are widely accepted.

The key to any large sample UDU test is to define the underlying quality threshold that distinguishes an acceptable batch. Each of the above proposals makes assumptions about the acceptable coverage or defect rate. For example, both Sandell's and Diener's approaches chose 95.2% coverage as the quality requirement, while the Bergum and Vukovinsky proposal led (in the large sample limit) to a coverage of 97% as the quality requirement. The quality requirement of a large-sample UDU test should ensure good product quality and should be at least as stringent as the quality requirement of the hUSP test. OC curves are often used to evaluate large-sample UDU tests, but they depend on test characteristics (e.g., sample size) and parameters of the population distribution, thus requiring the generation and unwieldy comparison of a large number of OC curves. It would seem desirable to summarize the performance of any proposed large-sample UDU test using a single quality criterion.

Limiting discriminatory threshold

Formally, a limiting discriminatory threshold (LDT) is defined as the large sample limit of the inverse of the probability of acceptance function in Eq. 1. The concept of LDT for a large-sample UDU test is motivated by the following ideas and assumptions:

1. The key critical quality parameter of content uniformity is coverage.

2. A coverage threshold of less than 100% that unambiguously distinguishes between acceptable and unacceptable batches should be agreed upon.

3. UDU tests exhibit both Type I (i.e., rejection of acceptable batches) and Type II (i.e., acceptance of unacceptable batches) errors. As sample sizes increase, the test becomes more discriminatory. In the ideal limit of infinite sample size (i.e., the true coverage of each lot is known with absolute certainty), the error probabilities are zero, and the OC curve is a step function. In this article, the coverage at which this transition occurs is called the LDT, defined in the following equation as the converging limit with increasing sample size:

4. The LDT of a given UDU test represents its inherent quality-discriminating threshold in the ideal limit of complete knowledge (i.e., infinite sample size) and can be determined algebraically or by computer simulation.

5. To be considered equally stringent, UDU tests, at a minimum, should have the same LDT. Tests with the same LDT may approach the LDT at different rates as sample size increases. When comparing two large-sample UDU tests that have the same LDT, the test that approaches the LDT more rapidly can be considered more efficient.
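The limit described in item 3 can be sketched as follows. The notation is assumed here for illustration: $f^{-1}$ is the inverse, with respect to Q, of the acceptance-probability function of Eq. 1, and $\theta$ stands for any other distributional parameters. Because the OC curve becomes a step function in the limit, the result does not depend on which fixed acceptance probability $0 < P < 1$ is chosen:

```latex
\mathrm{LDT} = \lim_{n \to \infty} f^{-1}(P \mid n, \theta), \qquad 0 < P < 1
```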

An LDT of 100%, while certainly desirable, is not realistic because of inherent analytical uncertainty. The LDT and the rate of approach to the LDT will depend not only on the test itself, but, in some cases (e.g., parametric tests), also on the distribution of the population from which samples are drawn.

The LDT concept need not be limited to a single coverage range, but it is applied to the 85–115% LC coverage here for illustration. In principle, the concept could be extended to multivariate quality metrics (e.g., joint coverage of 75–125% and 85–115% LC ranges), although that approach is not considered here.

It is legitimate and useful to determine the LDT for compendial tests (e.g., hUSP test) that employ fixed sample sizes. Such tests rarely define a quality requirement (i.e., the required coverage to pass the test) explicitly. When a definitive quality metric, such as coverage, can be identified, LDT provides a reasonable way to reverse engineer the intended standard of quality. Multistage tests are often designed such that the acceptability range is widened as the fixed sample size (i.e., stage) increases because the estimation of the population information has improved. This design reduces the risk of the Type I error and maintains a nominal Type II error rate. Ultimately, the criteria applied at the final stage set the standard for expected quality and decision error rates. Consequently, when evaluating the LDT of fixed-sample-size tests, the acceptance criteria of the final stage should be kept constant with changing sample size. If the content of all individual units were known, the coverage would be known exactly and could be compared with the LDT that is inherent to the final stage of the test. This procedure effectively assumes that batch acceptance would be based on the perfect knowledge of the batch uniformity, had it been available.

LDT estimation for the hUSP zero-tolerance criterion

Eq. 2 is used to estimate the LDT of the nonparametric zero-tolerance (ZT) criterion in Stage 2 of the hUSP test. Assuming that a batch true mean is at the target potency of 100% LC, the ZT criterion requires all tested units to be within the range of 75–125% LC. If Q represents the true 75–125% LC coverage of the batch, the probability of acceptance (Eq. 1) is expressed in the following equation:

The above equation may be inverted to solve for Q, thus yielding the following result:

Substituting the above equation into Eq. 2 and solving for the limit yields the following result:
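The three steps above can be written out as a short derivation. It assumes that the n tested units are independent draws from the batch, so that each falls within 75–125% LC with probability Q. The subscript ZT is a label introduced here:

```latex
\begin{align}
P &= Q^{\,n} \\
Q &= P^{1/n} \\
\mathrm{LDT}_{ZT} &= \lim_{n \to \infty} P^{1/n} = 1 \quad (\text{i.e., } 100\%), \qquad 0 < P < 1
\end{align}
```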

Thus, the hUSP ZT criterion inherently would require a 75–125% LC coverage of 100%, which, as argued above, cannot be attained with existing technologies. The ZT criterion is useful as a failsafe in the current hUSP test to protect against an accidental serious failure that is not otherwise detected because of the small sample size. However, the ZT criterion in large-sample-size UDU tests may be unnecessarily stringent and dissuade personnel from performing testing.
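To give a sense of how quickly a zero-tolerance rule tightens with sample size, the sketch below evaluates the coverage needed for a given acceptance probability. It assumes independent units, so that all n units fall within range with probability Q^n. The function name is illustrative.

```python
def zt_required_coverage(p_accept, n):
    """Coverage Q of 75-125% LC such that all n units are in range with probability p_accept."""
    return p_accept ** (1.0 / n)

for n in (30, 100, 1000, 10000):
    q10 = zt_required_coverage(0.10, n)
    q90 = zt_required_coverage(0.90, n)
    print(f"n = {n:5d}: coverage for 10% acceptance = {q10:.5f}, for 90% acceptance = {q90:.5f}")
```

Both required coverage values move toward 100% as n grows, which is the behavior the limit above describes.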

LDT estimation for the hUSP test without the ZT criterion

The hUSP test does not explicitly define the inherent quality requirement. Because the test only applies to a small sample size, it will exhibit nonzero Type I and Type II errors. The test effectively establishes an acceptable standard for the Type I error probability that is determined by the Stage 2 criteria. Therefore, the LDT for the hUSP test without the ZT requirement, referred to as hUSP (–ZT), can be determined solely based on the Stage 2 acceptance-value requirement. This LDT can serve as a benchmark of the quality level that is expected in production batches.

Assuming a target potency (T) of 100% LC, the Stage 2 AV requirement can be expressed by the following equation:
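In the notation of USP <905>, the requirement can be sketched as below. The Stage 2 constants k = 2.0 and L1 = 15 and the piecewise reference value M are taken from that chapter as commonly stated and are assumptions here, since this text does not restate them:

```latex
AV = \lvert M - \bar{X} \rvert + kS \le L1, \qquad k = 2.0,\; L1 = 15,
\qquad
M =
\begin{cases}
98.5 & \bar{X} < 98.5 \\
\bar{X} & 98.5 \le \bar{X} \le 101.5 \\
101.5 & \bar{X} > 101.5
\end{cases}
```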

Figure 1 contains definitions of L, M, S, and X.

The batch means above 100% LC are the mirror images of the ones below 100%, and thus are not considered in Eq. 7. The limiting coverage at large sample size is expressed in the following equation:
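One way to write this limiting coverage, using the symbols defined in the next sentence, is sketched below. The expression for $\sigma_{LDT}$ is an assumption: it is the largest standard deviation that satisfies the Stage 2 AV requirement once the sample mean and standard deviation have converged to their population values $\mu$ and $\sigma$, with k = 2.0 and L1 = 15 taken from USP <905>:

```latex
\mathrm{LDT}(\mu) = \int_{85}^{115} N(x \mid \mu, \sigma_{LDT})\, dx,
\qquad
\sigma_{LDT} = \frac{L1 - \lvert M - \mu \rvert}{k}
```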

in which N(x|μ,σ_LDT) is the normal probability density function. The LDT for a normal population distribution (Eq. 8) is shown in Figure 3 as a function of population mean.

Figure 3: Limiting discriminatory threshold (LDT) coverage of the hUSP(-ZT) test as a function of batch mean, assuming a normal population distribution and target assay of 100% label claim (LC).

The LDTs are 95.45% and 95.96% for batch means of 100% LC and 96% LC, respectively. The dip in the LDT curve at a batch mean of 98.5% LC results from the indifference zone of the hUSP test. Clearly, the most stringent coverage requirement of about 96% occurs at a batch mean near 96% LC. This requirement is more conservative than that of Sandell (i.e., 95.2%), but less conservative than that of Bergum and Vukovinsky (i.e., 97%).
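The values quoted above can be reproduced with a short calculation. The sketch below assumes the Stage 2 constants of USP <905> (k = 2.0, L1 = 15, and a reference value M clamped to 98.5–101.5% LC), which are not restated in this article. In the large-sample limit the sample mean and standard deviation equal the population values, so the largest acceptable standard deviation is (L1 − |M − μ|)/k, and the LDT is the normal coverage of 85–115% LC at that standard deviation. The function names are illustrative.

```python
from statistics import NormalDist

K, L1 = 2.0, 15.0   # assumed Stage 2 constants from USP <905>

def sigma_ldt(mu):
    """Largest population SD that still satisfies AV <= L1 at infinite sample size."""
    m = min(max(mu, 98.5), 101.5)       # reference value M
    return (L1 - abs(m - mu)) / K

def ldt_normal(mu, lower=85.0, upper=115.0):
    """Limiting coverage of [lower, upper] % LC for a normal population with mean mu."""
    dist = NormalDist(mu, sigma_ldt(mu))
    return dist.cdf(upper) - dist.cdf(lower)

for mu in (92.0, 94.0, 96.0, 98.5, 100.0):
    print(f"batch mean {mu:5.1f}% LC -> LDT {100 * ldt_normal(mu):.2f}%")
```

With these assumptions, batch means of 100% LC and 96% LC give 95.45% and 95.96%, in agreement with the text.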

Determining the efficiency of the hUSP (–ZT) test

Monte Carlo simulation was used to determine the approach to the LDT for the hUSP (–ZT) test as sample size increases. In these simulations, half of the total units (n) were tested at Stage 1 and the other half at Stage 2, if necessary. Therefore, a test with sample size n may use only half of its total units. The coverage required to achieve a specified probability of acceptance, given a batch mean and a normal population distribution, was determined iteratively. Figure 4 illustrates the coverage required to achieve 10% and 90% probability of acceptance (P10 and P90, respectively) for two cases of batch means (96% LC and 100% LC). P10 and P90 coverage are plotted against the inverse of the square root of sample size.

The P10 and P90 coverage values of the hUSP test (n = 30, including the ZT requirement) are also given in Figure 4. A batch with 89% coverage has a 10% probability of passing the hUSP test, while a coverage of 98% is needed to pass the hUSP test with 90% probability. These coverage values are essentially the same for the two batch means, as implied by the overlapping OC curves for batch means between 94% and 100% LC.

Other data points in Figure 4 are simulated without the ZT criterion. With increasing sample sizes, P10 and P90 coverage converge to the LDTs identified in Figure 3, thus indicating increasing discriminating power with increasing sample sizes. With a batch mean of 100% LC, P10 and P90 converge to an LDT of 95.4%. This LDT is the inherent quality-level requirement of the hUSP (–ZT) test in the ideal state where the content of all units in a batch is known. Figure 4 also shows that the LDT is 96% for a batch mean of 96% LC, thus matching the data in Figure 3. The differences in the LDTs for various batch means indicate the hUSP test is not totally independent of the batch mean.

Figure 4: Batch coverage to achieve 10% or 90% probability of acceptance for the hUSP(-ZT) test. Coverage values from the hUSP test are provided for reference. A normal distribution is assumed.

The choice of acceptance probabilities of 10 and 90% (i.e., P10 and P90) to represent the rate of convergence is arbitrary. It is desirable to choose probabilities that are extreme enough to illustrate convergence yet are not so extreme as to require excessive computer simulation time. Although all coverage lines should converge to the same LDTs, it is possible that other probability pairs (e.g., 5% and 95% or 20% and 80%) could lead to different conclusions about test efficiency. The chosen probability pairs should be consistent across the tests being compared.

Together, Figures 3 and 4 establish the inherent quality requirements of the hUSP test for the content range of 85–115% LC, as well as the convergent rates of the hUSP test toward the inherent quality requirements (LDTs). Jointly, these figures serve as useful tools for assessing large-sample UDU tests. A satisfactory large-sample UDU test should have LDTs no less than those of the hUSP test and should converge relatively quickly toward the LDTs. These two assessment criteria can be demonstrated using Sandell's proposal as an example.

Evaluation of the LDT of Sandell's proposal

As stated above, the Sandell limit is equal to the median of the binomial distribution defined by a sample size n and a defect probability p of 1 – 0.952 = 0.048. For a large n, this binomial distribution approaches (by the central limit theorem) the normal distribution with mean (median) of np. Thus, at infinite sample size, an acceptable batch will have no more than 4.8% of all units outside 85–115% LC. Thus, the LDT coverage of Sandell's proposal is 95.2%.
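This limit can be checked numerically. The sketch below is an illustration under stated assumptions: the binomial median is taken as the smallest count whose cumulative probability reaches 50%, units are independent, and the function names are hypothetical. For a counting test the acceptance probability is the binomial probability of observing no more than the allowed number of defective units, so the coverage giving 10% or 90% acceptance (P10 and P90) can be found by bisection.

```python
import math

def binom_cdf(c, n, q):
    """P(X <= c) for X ~ Binomial(n, q), with each term computed in log space."""
    log_q, log_1mq = math.log(q), math.log1p(-q)
    total = 0.0
    for k in range(c + 1):
        total += math.exp(math.lgamma(n + 1) - math.lgamma(k + 1)
                          - math.lgamma(n - k + 1) + k * log_q + (n - k) * log_1mq)
    return min(total, 1.0)

def binomial_median(n, p):
    """Smallest c with P(X <= c) >= 0.5; the search starts just below n * p."""
    c = max(0, int(n * p) - 1)
    while binom_cdf(c, n, p) < 0.5:
        c += 1
    return c

def coverage_for_acceptance(p_accept, n, limit):
    """Coverage at which a counting test with this defect limit is accepted with probability p_accept."""
    lo, hi = 1e-12, 0.5                  # bracket for the defect rate
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_cdf(limit, n, mid) > p_accept:
            lo = mid                     # acceptance still too likely, so raise the defect rate
        else:
            hi = mid
    return 1.0 - (lo + hi) / 2

for n in (100, 1000, 10000):
    limit = binomial_median(n, 0.048)
    p10 = coverage_for_acceptance(0.10, n, limit)
    p90 = coverage_for_acceptance(0.90, n, limit)
    print(f"n = {n:5d}: limit/n = {limit / n:.4f}, P10 = {100 * p10:.2f}%, P90 = {100 * p90:.2f}%")
```

As n increases, the allowed defect fraction approaches 4.8% and the P10 and P90 coverage values close in on 95.2% from below and above.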

As a nonparametric test, the Sandell LDT is independent of assumed batch mean. Figure 5a superimposes the LDT of Sandell's proposal as a horizontal line on the LDTs from the hUSP (–ZT) test for a normal population distribution. Clearly, Sandell's proposal is not uniformly as stringent as the hUSP test.

Figure 5: Sandell’s proposal assessment: (a) Sandell’s limiting discriminatory threshold (LDT) is fixed at 95.2% coverage; (b) Sandell’s proposal coverage for 10% and 90% probability of acceptance versus the coverage from the hUSP(-ZT) test (batch mean = 100% label claim [LC]) (2).

The P10 and P90 coverage lines for Sandell's proposal are shown in Figure 5b. For comparison, the coverage lines from hUSP (–ZT) test assuming a batch mean of 100% LC are also plotted. The angle of convergence of Sandell's proposal is only slightly greater than that of the hUSP (–ZT) test. For tests that include as many as 1000 samples, the P90 coverage line from Sandell's test is at or above the corresponding line from the hUSP (–ZT) test, thus indicating that Sandell's approach has a higher risk of Type I errors than that of the hUSP (–ZT) test. Furthermore, the P10 coverage line from Sandell's proposal is lower than that of the hUSP (–ZT) test, thus indicating that Sandell's approach has a higher risk of Type II error as well. Sandell's coverage lines can be compared with those of the hUSP test for other batch means as well, although this article will not discuss this comparison.

In summary, a nonparametric test that does not rely on the batch mean must have at least a 96% LDT coverage for the test to be considered as stringent as the hUSP test for all batch means. Furthermore, the P10 and P90 coverage lines of a proposed test should be contained within those of the hUSP (–ZT), as shown in Figure 5b. A P90 coverage line above that of the hUSP (–ZT) test indicates a higher risk of Type I error, with potential negative effect on the business operation. Similarly, a P10 coverage line below that of the hUSP (–ZT) test indicates a higher risk of Type II error, potentially harming the marketed product's quality.

Comparison of hUSP (–ZT) test and Sandell proposal with a non-normal population distribution

Given the concern over greatly deviating units, it is of interest to determine the LDT assuming population distributions with "fatter tails" than the normal distribution. Such distributions might be present in batches for which the unit-dose potency variance is not constant during manufacturing. The nonstandard t-distribution, indexed by μ, σ, and df (i.e., degrees of freedom), is such a distribution. In this case, the limiting coverage at large sample size is expressed in the following equation:
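Using the symbols defined in the next sentence, this limiting coverage can be sketched as the counterpart of the normal-distribution case, with the integration limits set to the 85–115% LC range used throughout:

```latex
\mathrm{LDT}(\mu, df) = \int_{85}^{115} t(x \mid \mu, \sigma_{LDT}, df)\, dx
```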

in which t(x|μ, σ_LDT, df) is the nonstandard t probability density function.

Figure 6 shows how LDT coverage varies as a function of the assumed batch mean. The LDT for the hUSP (–ZT) was calculated for various nonstandard t distributions (indexed by the t-distribution parameter df) using Eq. 9. The LDT for the nonparametric Sandell test does not depend on assumptions about the population distribution and thus remains constant. The hUSP (–ZT) LDT for the normal distribution is also shown for comparison.

Figure 6: Limiting discriminatory threshold (LDT) coverage for the hUSP(-ZT) and Sandell tests when used to test units from nonstandard t-distributions. A target mean of 100% label claim (LC) is assumed (2).

With increasing degrees of freedom, the LDT coverage profile of a nonstandard t distribution approaches that of the normal population distribution, as expected. With lower degrees of freedom, the LDTs are less stringent for batches with means at or above 96% LC, but more stringent for batches with lower means. Between 96% and 100% LC batch means, the effect of df on the LDTs is surprisingly minimal, thus indicating the robustness of the hUSP test with respect to the changes in the greatly deviating units. Again, Sandell's proposal does not provide as stringent a requirement as the hUSP test does under the assumption of a nonstandard t distribution.
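The qualitative behavior described above can be explored with a short numerical sketch. It rests on assumptions not stated in the article. The Stage 2 constants k = 2.0 and L1 = 15 and the clamped reference value M are taken from USP <905>. The limiting standard deviation (L1 − |M − μ|)/k is interpreted as the population standard deviation, so the scale of the t distribution is reduced by sqrt((df − 2)/df), which requires df > 2. If the limiting value were instead used directly as the scale parameter, the numbers would differ. The function names are illustrative, and the integral is evaluated with composite Simpson's rule.

```python
import math

K, L1 = 2.0, 15.0   # assumed Stage 2 constants from USP <905>

def sigma_ldt(mu):
    """Largest population SD that satisfies AV <= L1 at infinite sample size."""
    m = min(max(mu, 98.5), 101.5)       # reference value M
    return (L1 - abs(m - mu)) / K

def t_pdf(x, mu, scale, df):
    """Density of a location-scale (nonstandard) t distribution."""
    z = (x - mu) / scale
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(z * z / df)) / scale

def ldt_t(mu, df, lower=85.0, upper=115.0, steps=2000):
    """Limiting coverage for a t population whose standard deviation equals sigma_ldt(mu)."""
    scale = sigma_ldt(mu) * math.sqrt((df - 2) / df)   # assumption: SD-matched scale, df > 2
    h = (upper - lower) / steps                        # steps must be even for Simpson's rule
    total = t_pdf(lower, mu, scale, df) + t_pdf(upper, mu, scale, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(lower + i * h, mu, scale, df)
    return total * h / 3

for df in (5, 10, 30, 1000):
    row = ", ".join(f"{100 * ldt_t(mu, df):.2f}%" for mu in (92.0, 96.0, 100.0))
    print(f"df = {df:4d}: LDT at batch means 92, 96, 100% LC = {row}")
```

Under these assumptions, a small df lowers the LDT slightly at a batch mean of 100% LC and raises it at a batch mean of 92% LC, which matches the direction of the effect described in the text.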

Discussion and conclusion

This article introduced and illustrated the LDT criterion for evaluating the performance and efficiency of large-sample UDU test proposals. The intent was not to introduce a new proposal or to provide a definitive evaluation of any specific large-sample UDU test, but merely to suggest an objective way of comparing test proposals and of ensuring that large-sample UDU tests will meet or exceed the quality standard set by the hUSP test.

The LDTs for parametric tests generally depend on the parameters of the assumed population distribution. For illustration, LDTs were examined for normal and t distributions. In principle, LDTs could be evaluated against other distributions that include skewness (e.g., log normal) or bimodality (e.g., mixed normal). The authors used 85–115% LC as the range of interest for defining coverage, although other ranges (e.g., 75–125% LC) may also be of interest. Furthermore, the LDT approach could be extended to quality measures other than coverage, or to multivariate measures.

It may not always be possible to calculate the LDT of a test directly by algebra and intuitive arguments. However, the LDT can always be obtained by extrapolation to a large sample size using simulation. This article showed how coverage corresponding to the 10% and 90% acceptance probabilities varied as a function of the sample size, which can be used to evaluate and compare the power and efficiency of competing tests.

To ensure that quality standards are maintained at an appropriate level, it is imperative that common quality criteria be identified and adopted. This article presents the LDT approach to assist in reaching this common goal.

Yanhui Hu* is principal process development engineer, and David LeBlond is senior statistician, both at Abbott Laboratories, D050Z, AMJ23, Abbott Park, IL 60064, tel. 847.938.8885, yanhui.hu@abbott.com.

*To whom all correspondence should be addressed.

Submitted: Apr. 4, 2011. Accepted: June 13, 2011.

References

1. USP 33–NF 28 Reissue, General Chapter <905>, "Uniformity of Dosage Units" (US Pharmacopeial Convention, Rockville, MD, 2010).

2. D. Sandell et al., Drug Inf. J. 40 (3), 337–344 (2006).

3. J. Bergum and K.E. Vukovinsky, Pharm. Technol. 34 (11), 72–79 (2010).

4. M. Diener et al., Drug Inf. J. 43 (3), 287–298 (2009).