Predicting Meaningful Process Performance

Published on: April 2, 2015
Pharmaceutical Technology, Volume 39, Issue 4

Process design experimental data and risk assessments are used to predict expected process performance and establish process performance qualification acceptance criteria.


FDA's current process validation guidance (1) has presented an implementation challenge for many pharmaceutical organizations since it was published in 2011. This revision of the original 1987 process validation guidance became necessary because of concerns about poor-quality drugs reaching the market from supposedly validated processes, and about drug shortages caused by unreliable commercial processes producing product that could not meet release specifications. The 2011 Guidance for Industry, Process Validation: General Principles and Practices (1) implements a product lifecycle concept that aligns with and encourages concepts from guidances published by the International Conference on Harmonization (ICH), namely ICH Q8 Pharmaceutical Development (2), ICH Q9 Quality Risk Management (3), and ICH Q10 Pharmaceutical Quality System (4).

Per ICH Q8, the aim of pharmaceutical development is to design a quality product and to design a manufacturing process that consistently delivers that quality product. This approach reflects the principle that quality cannot be tested into products; quality must be built in by design. This concept of quality by design is hardly revolutionary. Design controls for medical devices specifically address how upfront design impacts the quality performance of the device in the hands of a patient. In another example, aseptic processing does not depend solely on the use of end-point sterility testing, but on the design of equipment and facilities and the control of processes and personnel. Any validation engineer quickly realizes that a piece of equipment that is not designed for qualification is difficult, if not impossible, to qualify. Drug product manufacturing processes are no different; a process must be designed from the start to produce quality product.

The first stage of FDA's process validation guidance is process design, and its purpose is to define a process and its necessary controls to reliably produce a quality product (i.e., a process control strategy). The second stage is process qualification; its purpose is to confirm that the process as designed (in Stage 1) with its defined process control strategy will not only produce quality product for the duration of the qualification, but also reproducibly continue to do so into the future. This article explores how to use data generated during process design to establish meaningful and statistically justifiable acceptance criteria for the process performance qualification (PPQ) in Stage 2.

Specifications and acceptance criteria
For users of the process validation guidance, the focus has been primarily on the justification of the number of lots (since three lots may or may not be sufficient) and how to implement enhanced sampling with "statistical confidence of quality both within a batch and between batches" (1). Critical process parameters (CPPs) are no longer required to be tested to the extremes of their operating ranges during PPQ; those limits have been justified through data or risk assessments performed during process design. Less effort has been put into the determination of well-defined acceptance criteria for PPQ. In most cases, the acceptance criteria are simply the end-product release specification limits and the in-process control limits. PPQ may involve more lots, and it definitely involves more samples to be tested. In these cases, however, process qualification only requires that each PPQ lot pass its end-product release specifications and in-process control limits, with no additional acceptance criteria.

There are several flaws with this approach. The first is a "goal post" mentality. As long as the product is within the specification limits (the goal posts), quality is good; if the product is outside the limits, quality is bad. For those trained in the concepts of Six Sigma and Lean Manufacturing, the loss of quality is best described as a Taguchi Loss Function where the quality gradually decreases (or the loss of quality increases) as the result for the quality attribute moves away from its intended target. The best quality is found as far from the specification limit as possible (i.e., as close to the target value as possible). Product near the specification limit is not "good" but rather "barely acceptable." The goal should be to design processes that can achieve the best quality for customers and patients.

The second flaw comes from understanding the statistical relationship between a population (the lot, or group of lots) and a sample from that population (the material tested). One can test a group of samples, determine a sample mean (the average of the samples tested), and statistically infer the value of the population mean (the true mean of the lot, which is unknown). When the sample mean is at the edge of, but still within, the specification limit, the lot is accepted. Sample variation (and testing variation) can occur, however, so there is a real likelihood that a second sampling could fall outside the specification. This risk is small only when the lot has been shown to be extremely uniform.
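This sampling risk is easy to see in a short simulation (a sketch with hypothetical numbers, not data from the article): if a lot's true mean sits exactly at the upper specification limit, roughly half of all repeat samplings of that lot will fail.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical lot whose true (population) mean sits exactly at an
# upper specification limit of 40; unit-to-unit standard deviation of 1.5
true_mean, true_sd, usl = 40.0, 1.5, 40.0

# Draw 1,000 independent 10-unit samples from the same lot and
# check how often the sample mean passes the specification
sample_means = rng.normal(true_mean, true_sd, size=(1000, 10)).mean(axis=1)
frac_passing = np.mean(sample_means <= usl)
print(f"fraction of re-samples that would pass: {frac_passing:.2f}")
```

A lot accepted on a sample mean at the edge of the specification has essentially a coin-flip chance of passing a second sampling.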

The third flaw comes from believing that a quality product (passing product specifications for the number of PPQ lots) means a good process. Certainly, it is necessary for PPQ lots to produce quality product (pass specifications). Bad processes, however, also can make good product; the difference is that bad processes are unreliable and unpredictable. Product specifications are designed to judge the quality, safety, efficacy, identity, and strength of the product. Specifications define what the patient needs and are thus described as "the voice of the customer." Processes have to be judged by how predictably they will produce that quality product. Statistical concepts such as statistical control (predictable data that are normally distributed) and process capability (measurements of process mean and variation relative to specification limits) are described as "the voice of the process." To assess the process as well as the product during PPQ, acceptance criteria, in addition to specification limits, are needed.

Linking Stage 1 and Stage 2
The output of process design (Stage 1) is a defined commercial manufacturing process with a process control strategy to ensure product quality. Process qualification (Stage 2) is the qualification of the process by demonstrating that the process control strategy can reproducibly produce quality product. Per the FDA process validation guidance (1), Stage 2 will "confirm the process design" and ensure that the process "performs as expected." For a process to perform as expected, it needs to be reproducible and, therefore, predictable. Assessing predictability is complicated by the fact that manufacturing processes are rarely deterministic. In a deterministic process, all process inputs are fixed so that an exact process output can be calculated. Real-life processes are probabilistic and are affected by numerous random factors. Even so, when a process is in statistical control, its output will have a predictable data distribution, with the normal distribution being the most common.

The primary work in the process design stage is to understand the relationships between CPPs (and critical material attributes [CMAs]) and the outputs of the process (the quality attributes). These relationships may be derived from design space models produced using design of experiments (DOE), first principles, or prior knowledge of existing unit operations and process equipment. The collection and statistical analysis of data from process design should consider the eventual need of using these data to support PPQ acceptance criteria.



Acceptance criteria using a prediction model
In the first example, a solid oral-dosage form is produced by applying an extended-release coating with a fluidized spray coater. One of the critical quality attributes (CQAs) for the process is the % dissolution at 4 h. The specification for this attribute is 20% to 40% using the United States Pharmacopeia (USP) dissolution method. Many of the process parameters for the spray coating (such as air temperature, dew point temperature, air flow, and coating solution flow rate) have either been determined to be non-critical process parameters or are fixed set points that are well controlled with little measured variation. A series of design-of-experiment studies at small scale, with verification at commercial scale, was conducted.

Using the first principles of mass balance of the dried coating solution on the pellets and the mechanisms of dissolution, it is expected that the rate of dissolution is driven by the surface area and coating thickness of the final pellet. Statistical analysis of the experimental design confirmed that the process input of the average diameter of the uncoated pellet was statistically significant. Therefore, the average diameter of the uncoated pellet is a CMA for the spray-coating process. Note that this parameter can also be considered as an output, or in-process control, of the previous process step.

Figure 1 shows the combined results of several process design experiments in which various uncoated particle sizes were used. A simple linear model is fitted to relate the uncoated particle size to the resulting dissolution. Despite some variability about the model, the R-squared value of 83.8% indicates that this material attribute dominates the resulting dissolution over any other factor.

Figure 1: Linear prediction model for dissolution at 4 h (CI = confidence interval, PI = prediction interval, R-Sq = R-squared).

The model (solid black line) is the best fit, on average. The "true" model line, however, most probably lies between the 95% confidence interval lines (dashed red lines); these bound where the dissolution results will lie, on average, for a given particle size. To predict where the dissolution result for any single future run might lie, one uses the 95% prediction interval lines (dotted green lines). This model therefore predicts that if an uncoated average particle size of 436 μm is used, the dissolution result should be between 22% and 29% approximately 95% of the time. If the process control strategy is properly applied, this prediction model should hold not only for PPQ lots, but for all future lots produced with this control strategy. If the model breaks down, it indicates either that process control has failed or that an unexpected event has occurred.

Using this model's prediction intervals as acceptance criteria for dissolution in the PPQ lots confirms that the commercial process is following a statistical prediction for a CQA. Using the dissolution specification limits (20-40%) as the sole acceptance criteria essentially ignores the process knowledge and prediction model developed during process design and could lead to an unpredictable process being qualified.

This example used a simple linear model with one factor, but the same approach can be applied to more complex multi-factor models as long as the variability of individual lots around the model's best-fit line is taken into account.
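The prediction-interval calculation behind Figure 1 can be sketched as follows; the particle-size and dissolution numbers here are hypothetical stand-ins for the Stage 1 data, not values from the article.

```python
import numpy as np
from scipy import stats

# Hypothetical Stage 1 data: uncoated pellet diameter (um) vs. % dissolution at 4 h
x = np.array([420.0, 425.0, 430.0, 435.0, 440.0, 445.0, 450.0, 455.0])
y = np.array([28.5, 27.9, 27.0, 26.2, 25.1, 24.6, 23.5, 22.8])

n = len(x)
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
resid = y - (intercept + slope * x)
s = np.sqrt(np.sum(resid**2) / (n - 2))   # residual standard error

x0 = 436.0                                # a new lot's average particle size
pred = intercept + slope * x0
t_crit = stats.t.ppf(0.975, df=n - 2)

# 95% prediction interval for a single future observation at x0
sxx = np.sum((x - x.mean()) ** 2)
half = t_crit * s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / sxx)
print(f"predicted {pred:.1f}%, 95% PI: [{pred - half:.1f}, {pred + half:.1f}]")
```

The `1` inside the square root is what widens the prediction interval beyond the confidence interval for the mean response; dropping it would give the confidence-interval lines instead.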

Acceptance criteria using process capability
As discussed, a process that is in statistical control will produce a predictable output, which frequently is shown as a normal distribution. This distribution is compared to the specification limits to calculate a process capability index (Cpk). A higher capability index indicates a lower likelihood of producing out-of-specification product. Statistical control is a prerequisite for calculating process capability, because processes that are not in control are not predictable for future performance.


In this example, a capsule filling process is assessed for the in-process control of filling weight, which is a determining factor for the CQA content uniformity. The specification for the filled capsule weight is a target of 150 mg ±12 mg. Capsule filling weight was collected during Stage 1 small-scale clinical builds and during filler speed runs performed as part of the performance qualification of the capsule filler. Consistent filling performance was confirmed over several runs.

Figure 2: Capsule filling weight X-bar and S charts (UCL/LCL = upper/lower control limit, X-bar= sample average, X-2bar= average of sample averages, StDev = standard deviation,  S-bar= average of sample standard deviations).

Figure 2 and Figure 3 report the data collected from a full-scale run at the worst-case filling speed, with 10 samples collected every 60 minutes. Figure 2 shows the X-bar (mean of samples at each timepoint) and S (standard deviation of samples at each timepoint) control charts. These control charts indicate good statistical control, with no trends, shifts, or points beyond the upper and lower control limits (UCL and LCL). Because the process is in statistical control, the process capability index can be calculated.
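The X-bar and S limits in Figure 2 follow the standard Shewhart formulas; a minimal sketch with simulated fill weights (the data and subgroup count are assumptions, not the article's actual run):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical run: 10 capsules weighed every 60 min over 8 timepoints
subgroups = rng.normal(loc=150.0, scale=2.0, size=(8, 10))

xbar = subgroups.mean(axis=1)            # per-timepoint sample means
s_i = subgroups.std(axis=1, ddof=1)      # per-timepoint sample std devs
x2bar, s_bar = xbar.mean(), s_i.mean()   # X-double-bar and S-bar

# Standard Shewhart control-chart constants for subgroup size n = 10
A3, B3, B4 = 0.975, 0.284, 1.716

xbar_ucl, xbar_lcl = x2bar + A3 * s_bar, x2bar - A3 * s_bar
s_ucl, s_lcl = B4 * s_bar, B3 * s_bar
print(f"X-bar limits: [{xbar_lcl:.1f}, {xbar_ucl:.1f}] mg")
print(f"S limits:     [{s_lcl:.2f}, {s_ucl:.2f}] mg")
```

Points falling inside these limits, with no trends or shifts, support the claim of statistical control that licenses the capability calculation.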

Figure 3: Capsule filling weight capability (USL/LSL = upper/lower specification limit, N = number of samples, StDev = standard deviation, CL = 95% confidence limit, PPU/PPL = upper/lower overall process capability, Ppk = overall process capability [minimum of PPU and PPL]).

Figure 3 is a histogram, which graphically displays the data distribution relative to the specification limits. As expected, the data are a good fit to a normal distribution. Capability indices can be calculated either as a Cpk (which assumes no shifts or drifts between subgroups) or as a process performance index (Ppk) (based on the overall distribution of the data). The more conservative Ppk is calculated in this example. The Ppk is 1.80, but this value is an estimate based on the sample standard deviation. To account for the uncertainty in estimating the population standard deviation, the 95% confidence limits for the Ppk are calculated as 1.53 to 2.08. Using the lower confidence limit is especially valuable when the amount of data available to calculate the capability is limited.
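The Ppk and its confidence limits can be reproduced in a few lines. This sketch uses simulated fill weights and Bissell's approximation for the confidence interval; the data are hypothetical, and the approximation, while common in statistical software, is an assumption here, so the numbers will not match Figure 3.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical capsule fill weights (mg); specification 150 mg +/- 12 mg
weights = rng.normal(loc=150.0, scale=2.0, size=100)
lsl, usl = 138.0, 162.0

n = len(weights)
mean, s = weights.mean(), weights.std(ddof=1)

# Overall process performance index: minimum of upper and lower capability
ppu = (usl - mean) / (3 * s)
ppl = (mean - lsl) / (3 * s)
ppk = min(ppu, ppl)

# Approximate 95% confidence interval for Ppk (Bissell's formula)
z = stats.norm.ppf(0.975)
half = z * np.sqrt(1 / (9 * n) + ppk**2 / (2 * (n - 1)))
print(f"Ppk = {ppk:.2f}, 95% CI: [{ppk - half:.2f}, {ppk + half:.2f}]")
```

The width of the interval shrinks with sample size, which is why the lower confidence limit is a more honest acceptance criterion than the point estimate when data are limited.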

For PPQ runs, an assessment of the statistical control charts for capsule filling weight, together with an acceptance criterion that the Ppk be not less than the lower 95% confidence limit of 1.53, can be used as additional acceptance criteria. This criterion ensures that the filling is predictable and that the process performs as predicted by the earlier studies. In addition, there is now a very low statistical probability of filled capsules near the specification limits, if any. If only the specification limits of 138-162 mg had been used, no assessment of statistical control could have been made and no prediction of future performance could have been calculated.

Acceptance criteria using statistical tolerance intervals
The final example examines the use of statistical tolerance intervals. A tolerance interval defines the limits within which a defined proportion of the distribution (called the coverage) will fall, at a defined confidence level. Unlike a confidence interval, which bounds an estimate of a parameter such as the mean, a tolerance interval bounds a stated proportion of the individual results. Tolerance intervals can be calculated with the expectation of a normal distribution or can be nonparametric, with no assumption about the distribution of the data.

In this example, the data for the CQA, assay, are collected over five process design experimental runs with four samples from each run. Runs representative of the typical conditions (i.e., not extreme conditions) for parameters that impact assay are selected. The specification limits for assay are 20–40 mg/mL. The tolerance interval is constructed using multiple Stage 1 lots with multiple samples from each lot, to capture both within lot and lot-to-lot variation. An analysis of variance (ANOVA) confirms that the lot-to-lot variance component is not significant, which allows the data from the five lots to be combined to calculate a tolerance interval.
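The ANOVA pooling check can be sketched with a one-way test; the assay values below are hypothetical illustrations of five lots with four samples each, not the article's data:

```python
from scipy import stats

# Hypothetical assay results (mg/mL): five Stage 1 lots, four samples each
lots = [
    [29.1, 30.2, 28.7, 29.8],
    [30.5, 29.4, 31.0, 30.1],
    [28.9, 29.6, 30.3, 29.0],
    [30.2, 29.8, 28.9, 30.4],
    [28.5, 29.3, 30.0, 29.7],
]

# One-way ANOVA: is the lot-to-lot (between-lot) variation significant?
f_stat, p_value = stats.f_oneway(*lots)
if p_value > 0.05:
    print(f"p = {p_value:.2f}: no significant lot effect; pooling is reasonable")
else:
    print(f"p = {p_value:.2f}: significant lot effect; do not pool the lots")
```

Only when the lot effect is not significant can the 20 results be treated as one population for the tolerance-interval calculation.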

Statistical software (5) or tables from ISO standard 16269-6 (6) can be used to calculate the limits of the tolerance interval. Figure 4 shows the calculation of the tolerance interval for the data set using 95% confidence and 95% coverage. Included is the Anderson-Darling normality test on the data set. Because the p-value is above 0.05, the data are consistent with a normal distribution, and the normal tolerance interval limits of 24.5 and 34.6 (results are rounded) can be used. It is important that the tolerance interval limits fall within the specification limits of 20-40. If they did not, there would be an appreciable probability of the process producing out-of-specification assay results.
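Absent the software or ISO tables, a two-sided normal tolerance factor can be approximated in a few lines using Howe's approximation, a common and accurate shortcut. The 20 pooled assay values here are hypothetical, so the resulting limits will not match Figure 4.

```python
import numpy as np
from scipy import stats

# Hypothetical pooled assay data (mg/mL): 5 lots x 4 samples = 20 results
assay = np.array([
    29.1, 30.2, 28.7, 29.8,
    30.5, 29.4, 31.0, 30.1,
    28.9, 29.6, 30.3, 29.0,
    30.2, 29.8, 28.9, 30.4,
    28.5, 29.3, 30.0, 29.7,
])

n = len(assay)
mean, s = assay.mean(), assay.std(ddof=1)

# Two-sided normal tolerance factor k (Howe's approximation):
# 95% confidence that the interval covers 95% of the population
coverage, confidence = 0.95, 0.95
z = stats.norm.ppf((1 + coverage) / 2)
chi2_low = stats.chi2.ppf(1 - confidence, n - 1)
k = np.sqrt((n - 1) * (1 + 1 / n) * z**2 / chi2_low)

lower, upper = mean - k * s, mean + k * s
print(f"95%/95% tolerance interval: [{lower:.1f}, {upper:.1f}] mg/mL")
```

Widening the coverage to 99% (as in Figure 5) only changes the z term, which is how a single data set yields tiered acceptance criteria for higher-risk CQAs.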

Figure 4: Tolerance interval for assay for 95% confidence/95% coverage  (N = number of samples, StDev = standard deviation, AD = Anderson-Darling statistic)

Because PPQ lots should represent the same process population as the supporting Stage 1 data for assay, the PPQ lots' assay results should fall between 24.5 and 34.6 mg/mL 95% of the time, with 95% confidence. This acceptance criterion is tighter than the specification limits (20-40 mg/mL) and represents the observed process variation (both within and between lots) from the Stage 1 studies.

To evaluate PPQ lots for within-lot variation, one should select a sample size per lot with sufficient statistical power (e.g., 0.8 to 0.9) for the amount of variation to be detected. Detecting small variations within lots with sufficient statistical power may require a substantial sample size.

The same approach used to calculate the Stage 1 tolerance interval can be applied post hoc to the actual PPQ lot assay results. In this case, the acceptance criterion will "demonstrate with 95% confidence that at least 95% of the assay results are within the specification limits." First, one must perform an ANOVA and confirm that the between-lot variance component is not significant in order to combine the data sets. When the PPQ lots' tolerance interval limits are calculated, they must fall within the specification limits.

Multiple levels of tolerance intervals can be applied to a single CQA, or wider coverage and higher confidence levels can be used for CQAs that pose a higher risk to patients. Figure 5 shows a wider (99% coverage, 95% confidence) tolerance interval of 23.0 to 36.1 mg/mL (results are rounded). Assay results from PPQ lots should fall within this wider interval 99% of the time, with 95% confidence.

Figure 5: Tolerance interval for assay for 95% confidence/99% coverage. (N = number of samples, StDev = standard deviation, AD = Anderson-Darling statistic)

Tolerance intervals are also useful when re-qualifying legacy products or when setting action limits for Stage 3, continued process verification. The tolerance interval for a CQA can be calculated from a series of historical lots. Newly manufactured lots should fall within the tolerance interval limits at the defined coverage and confidence. If CQA results fall outside the coverage limits, it may indicate that the new lots are not part of the same population as the historical lots.

This article described the potential flaws of using end-product testing and in-process specification limits as the sole acceptance criteria for PPQ lots. That approach indicates whether the PPQ lots have acceptable product quality, but does not predict whether future lots will continue to do so. Because pharmaceutical companies are required to qualify the process as it was designed (in Stage 1) and demonstrate its reproducibility, they must establish additional acceptance criteria to demonstrate that the process is predictable for manufacturing future product. Statistical methodologies of prediction models, process capability, and statistical tolerance intervals can be used to develop more meaningful PPQ acceptance criteria and to demonstrate that a designed process and its control strategy can reliably produce quality product throughout its lifecycle.

1. FDA, Guidance for Industry, Process Validation: General Principles and Practices, Revision 1 (Rockville, MD, January 2011).
2. ICH, Q8 (R2) Pharmaceutical Development (August 2009).
3. ICH, Q9 Quality Risk Management (June 2006).
4. ICH, Q10 Pharmaceutical Quality System (April 2009).
5. Minitab Inc., Minitab Statistical Software, version 16.
6. ISO, 16269-6:2014, Statistical interpretation of data, Part 6: Determination of statistical tolerance intervals (2014).

Article Details
Pharmaceutical Technology, Vol. 39, Issue 4, pp. 44-46
When referring to this article, please cite it as M. Mitchell, "Predicting Meaningful Process Performance," Pharmaceutical Technology 39 (4) 2015.