Statistical Considerations in Design Space Development (Part II of III)

Pharmaceutical Technology, 08-02-2010, Volume 34, Issue 8

The authors discuss the statistical tools used in experimental planning and strategy and how to evaluate the resulting design space and its graphical representation.

Part I of this article appeared in the July 2010 issue of Pharmaceutical Technology and discussed experimental design planning (1). This article, Part II, addresses design and analysis in statistical design of experiments (DoE). Part III, to be published in the September 2010 issue of Pharmaceutical Technology, will cover how to evaluate a design space.

Design space is part of the US Food and Drug Administration's quality initiative for the 21st century, which seeks to move toward a new paradigm for pharmaceutical assessment as outlined in the International Conference on Harmonization's quality guidelines Q8, Q9, and Q10. The statistics required for design-space development play an important role in ensuring the robustness of this approach.

This article provides concise answers to frequently asked questions (FAQs) related to the statistical aspects of determining a design space as part of quality-by-design (QbD) initiatives. These FAQs reflect the experiences of a diverse group of statisticians who have worked closely with process engineers and scientists in the chemical and pharmaceutical development disciplines, grappling with issues related to the establishment of a design space from a scientific, engineering, and risk-based perspective. Questions 1–7 appeared in Part I of this series (1). The answers provided herein, to Questions 8–22, constitute basic information regarding statistical considerations and concepts and will be beneficial to a scientist working to establish a design space in collaboration with a statistician.

Statistical experimental design

The following questions address types of experiments and appropriate design choices. The selection of a design depends on the development stage, available resources, and the goal of the experiment; in particular, the type and size of the design depend on the questions the study needs to answer. In general, there are three types of experiments: screening experiments, to select factors for further experimentation or to demonstrate robustness; interaction experiments, to further study interactions between factors of interest; and optimization experiments, to map a region of interest more carefully.

Q8: Why are one-factor-at-a-time (OFAT) designs misleading?

A: OFAT experimentation is effective if the measurement error is small compared with the difference one wishes to detect and if the factors do not interact with one another. If either condition is violated, the OFAT methodology will require more resources (experiments, time, and material) to estimate the effect of each factor of interest. In general, interactions between factors cannot be estimated from OFAT experiments. Even when there are no interactions, a fractional factorial experiment often requires fewer resources and may provide information on variability. Experimenters may be able to estimate interactions from a series of experiments that were not originally designed for that purpose, but this approach may require more sophisticated statistical analyses.
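To make the OFAT problem concrete, the short Python sketch below uses a hypothetical, noise-free response (the function `response` and its coefficients are invented for illustration) that contains an A × B interaction. It compares the effects an OFAT experimenter would see from a baseline at the low levels of both factors with the main and interaction effects estimated from a 2^2 factorial, where an effect is defined as the mean response at the high level minus the mean response at the low level.

```python
# Hypothetical noise-free response with an A x B interaction (coded units -1/+1).
def response(a: int, b: int) -> int:
    return 50 + 5 * a + 3 * b + 4 * a * b


# OFAT: start at a baseline and change one factor at a time (3 runs).
baseline = (-1, -1)
y0 = response(*baseline)
ofat_effect_a = response(1, -1) - y0   # A changed, B held at -1
ofat_effect_b = response(-1, 1) - y0   # B changed, A held at -1

# 2^2 full factorial (4 runs).
runs = [(a, b) for a in (-1, 1) for b in (-1, 1)]
ys = {run: response(*run) for run in runs}


def effect(contrast) -> float:
    """Mean response where contrast(run) is +1 minus mean where it is -1."""
    high = [y for run, y in ys.items() if contrast(run) == 1]
    low = [y for run, y in ys.items() if contrast(run) == -1]
    return sum(high) / len(high) - sum(low) / len(low)


effect_a = effect(lambda r: r[0])
effect_b = effect(lambda r: r[1])
effect_ab = effect(lambda r: r[0] * r[1])   # interaction contrast A*B

print(f"OFAT:      A = {ofat_effect_a}, B = {ofat_effect_b}")
print(f"Factorial: A = {effect_a}, B = {effect_b}, AB = {effect_ab}")
```

With these assumed numbers, OFAT suggests that raising B lowers the response (an apparent effect of -2), whereas the factorial shows that B raises the response by 6 on average and that its effect depends strongly on A (AB = 8). The OFAT conclusion is an artifact of the chosen baseline.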

Q9: If I have a model based on mechanistic understanding, should I use experimental design?

A: Experimental design can provide an efficient and effective means for estimating mechanistic model parameters and can be used to minimize the number of runs required to estimate the model coefficients. The underlying functional form of the surface, including interactions, may be known, so that interest is focused on estimating the model coefficients. Random variation should also be estimated and incorporated into mechanistic models, a step that is often omitted. General mechanistic understanding can still be used to select factors for evaluation, assist with identifying interactions, and ensure that the model is appropriate. The appropriate design to accomplish these goals may differ from a factorial design and may use factor levels that are not equally spaced.
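As a minimal sketch of how a mechanistic model can lead to unequally spaced levels, assume a hypothetical first-order decay model C(t) = C0·exp(–k·t) in which C0 is known and only the rate constant k is to be estimated. For a single parameter with constant measurement error, the information an observation at time t carries about k is proportional to the squared sensitivity (∂C/∂k)², so the code scores candidate sampling times using a prior guess of k (a "locally optimal" design). The values of `C0` and `K_GUESS` are assumptions for illustration.

```python
import math

# Hypothetical mechanistic model: first-order decay C(t) = C0 * exp(-k * t).
C0 = 100.0          # assumed known initial level
K_GUESS = 0.2       # assumed prior guess for the rate constant (1/h)


def sensitivity(t: float, k: float = K_GUESS) -> float:
    """Derivative of C(t) with respect to k, evaluated at the guessed k."""
    return -t * C0 * math.exp(-k * t)


# With constant measurement error, the Fisher information about k from one
# observation at time t is proportional to sensitivity(t) ** 2.
candidate_times = [0.5 * i for i in range(49)]      # 0 to 24 h in 0.5-h steps
information = {t: sensitivity(t) ** 2 for t in candidate_times}
best_time = max(information, key=information.get)

print(f"most informative sampling time: {best_time} h (1/k = {1 / K_GUESS} h)")
# Equally spaced times place many runs where the curve says little about k.
for t in (1.0, 5.0, 15.0, 24.0):
    relative = information[t] / information[best_time]
    print(f"t = {t:4.1f} h  relative information = {relative:.2f}")
```

For this model the most informative time is t = 1/k (here 5 h), so a design that concentrates runs near that time, plus a few other points to check the assumed model form and to estimate random error, can estimate k with fewer runs than an equally spaced schedule. The result depends on the guessed k, which is why such designs are called locally optimal.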

Another advantage of using an experimental design is that an empirical model can be fit to the data and compared with the mechanistic model as part of model verification (see Questions 10–13).

Q10: What is a screening experiment?

A: A screening experiment provides broad but not deep information and generally requires fewer experimental runs than an interaction or optimization experiment. In general, a screening DoE is more efficient than changing one factor at a time. There may be some interest in studying interactions between factors, although this is usually not the focus. A risk assessment is typically recommended to identify candidate factors for a screening study.

Screening experiments are often used to identify which factors have the largest effect on the responses. These few significant factors can then be studied in a subsequent, more comprehensive DoE. Another use of screening experiments is to demonstrate that the factors produce no practically important change in the responses across the ranges studied.

Designs used in these experiments generally include fractionated or highly fractionated factorial experiments with each factor at two levels, as well as related two-level designs such as Plackett–Burman designs. Center points (i.e., runs with each factor set at the center of its range) are often added to measure replication variability and to perform an overall test of curvature over the experimental region.
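The Python sketch below builds one example of a two-level screening design: a regular 2^(4–1) fractional factorial for four hypothetical factors (A to D) using the generator D = ABC, plus three replicated center points. It is a regular fraction rather than a Plackett–Burman design, and it also checks the aliasing that fractionation introduces. The factor labels and the number of center points are illustrative assumptions; in practice the run order would also be randomized (see Question 18).

```python
from itertools import product

# 2^(4-1) fractional factorial: full 2^3 in A, B, C, with D set by D = ABC.
factorial = [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]
center_points = [(0, 0, 0, 0)] * 3          # replicated center runs
design = factorial + center_points          # 11 runs (randomize before use)

for run in design:
    print(run)


def column(runs, *factors):
    """Elementwise product of factor columns (0 = A, 1 = B, 2 = C, 3 = D)."""
    values = []
    for run in runs:
        value = 1
        for f in factors:
            value *= run[f]
        values.append(value)
    return values


# Aliasing in the factorial portion: D equals ABC, and AB equals CD.
print("D aliased with ABC:", column(factorial, 3) == column(factorial, 0, 1, 2))
print("AB aliased with CD:", column(factorial, 0, 1) == column(factorial, 2, 3))
```

Because D = ABC, the defining relation is I = ABCD, so each two-factor interaction is aliased with another (AB with CD, AC with BD, AD with BC). This is the information given up in exchange for running 8 factorial points instead of 16.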

Boundaries of the design space can be defined using a screening design; however, complete process understanding may not be achieved by studying only main effects, in which case an interaction or optimization design is required. In some cases, the risk/benefit may not be great enough to proceed beyond a screening design.

Q11: What is an interaction experiment?

A: As underlying scientific phenomena become more complex, so too does the underlying mathematical model. The goal is to study interactions between factors along with the main effects. An interaction between two factors means that the effect of one factor on the response depends on the level of the other factor. For example, the effect of mixing time on a response may depend on the mixing speed. Interaction designs generally consist of factorial or fractional-factorial designs that assess factors at two levels, along with a replicated center point. Other designs can also evaluate interactions (e.g., unbalanced designs produced using D-optimal algorithms).

The advantage of the factorial family of designs, including screening and interaction designs, is that additional points can be added to the design sequentially to obtain more information about the interactions. The analysis is fairly straightforward and can be performed with most statistical software packages. Some designs (e.g., Plackett–Burman) appear similar to factorial experiments but do not have the same properties and cannot easily be augmented sequentially.

Boundaries of the design space can be defined using an interaction design. Complete process understanding, however, may not be achieved by only studying main effects and interactions. In this case, an optimization design may be necessary.

Q12: What is an optimization experiment?

A: Again, as the underlying scientific phenomena become more complex, so too does the underlying mathematical model. Optimization experiments are used to map a response surface to understand the intricacies of factors and their interactions, as well as nonlinearities and their combined effect on the responses of interest. These experiments can be used to find the combination of factors that predicts the optimum response (a maximum or minimum), to find a region of factor combinations that predict acceptable results, or to predict process performance in the region studied. Optimization experiments are generally larger, requiring three or more levels to estimate quadratic or higher-order terms. In addition, more factor levels may be required to capture the response more accurately, especially when the factor ranges are large. Common designs used for optimization experiments are central composite, Box–Behnken, three-level factorial, and mixture designs. Analysis of designs used in optimization experiments can be more complicated than analysis of those used in screening experiments.
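A common step in analyzing an optimization experiment is to locate the stationary point of the fitted second-order model and classify it from the eigenvalues of the matrix of quadratic coefficients (canonical analysis). The sketch below applies this with NumPy, purely as an illustration, to the assay model reported in Question 20 (Assay = 96.9 + 3.2A – 1.7B + 3.0AB – 2.2A², coded units).

```python
import numpy as np

# Second-order model written as y = b0 + x'b + x'Bx (coded units), using the
# assay model from Question 20: 96.9 + 3.2A - 1.7B + 3.0AB - 2.2A^2.
b0 = 96.9
b = np.array([3.2, -1.7])                 # linear terms (A, B)
B = np.array([[-2.2, 3.0 / 2],            # pure quadratic terms on the diagonal,
              [3.0 / 2, 0.0]])            # half the interaction off the diagonal

# Stationary point: gradient b + 2Bx = 0, so x_s = -0.5 * B^{-1} b.
x_s = -0.5 * np.linalg.solve(B, b)
y_s = b0 + x_s @ b + x_s @ B @ x_s
eigenvalues = np.linalg.eigvalsh(B)       # B is symmetric

if np.all(eigenvalues < 0):
    kind = "maximum"
elif np.all(eigenvalues > 0):
    kind = "minimum"
else:
    kind = "saddle point"

print(f"stationary point (A, B) = {np.round(x_s, 3)}, predicted assay = {y_s:.2f}")
print(f"eigenvalues = {np.round(eigenvalues, 3)} -> {kind}")
```

For this model the eigenvalues have opposite signs, so the stationary point (near A ≈ 0.57, B ≈ –0.24) is a saddle point rather than a maximum. The highest predicted assay values therefore lie on the boundary of the region studied, which is one reason contour and overlay plots are used to explore a fitted surface.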

Q13: What are the differences among screening, interaction, and optimization experiments, and why choose one over the other?

A: The design type is selected based on the experimental goals, number of factors, timing, and resources; planning is a crucial part of experimental design. As discussed, screening designs are usually used to select a few significant factors out of many for more intense future study, or to support a decision about robustness for the combination of factors under study. Interaction experiments usually include fewer factors than screening designs and yield more information about the factors and their interactions. Optimization designs are used to obtain a complete picture of the factor effects, the interactions, and any curvature or quadratic effects.

Factorial designs encourage sequential experimentation. For instance, one could start with a screening design and then either create a new screening design with different factor ranges or augment the existing design with additional runs to form an interaction design. If needed, the design can be augmented again to form an optimization design. If the timing of results does not permit this additional experimental time, then one larger experiment may be more appropriate.

Screening experiments should be performed before running an optimization experiment. It is possible that a response surface is not needed. If the goal is to show that acceptable results can be obtained over a defined experimental region, then a screening or interaction experiment may be all that is necessary. A common mistake is to run an optimization experiment too early in the process and find out that the ranges were not well chosen, thus resulting in wasted resources. One economical approach is to run a screening or interaction design, and if the results are reasonable, to then add points to the design to allow fitting a response surface.
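The sketch below illustrates the economical sequence just described, assuming two factors in coded units: a 2^2 factorial with center points is run first, and axial (star) points at ±α plus additional center points are added later to form a central composite design. Choosing α = F^(1/4), where F is the number of factorial points, gives a rotatable design; for two factors α = √2 ≈ 1.41, consistent with the star points at ±1.41 in the example of Question 20. The numbers of center points are assumptions.

```python
from itertools import product

k = 2  # number of factors (coded units)

# Stage 1: 2^2 factorial plus center points.
factorial = [tuple(float(x) for x in run) for run in product((-1, 1), repeat=k)]
stage1_centers = [(0.0,) * k] * 3

# Stage 2: axial (star) points at +/- alpha on each axis plus more center points.
alpha = len(factorial) ** 0.25          # rotatable choice: alpha = F^(1/4)
axial = []
for i in range(k):
    for sign in (-1.0, 1.0):
        point = [0.0] * k
        point[i] = sign * alpha
        axial.append(tuple(point))
stage2_centers = [(0.0,) * k] * 2       # common points for comparing the stages

ccd = factorial + stage1_centers + axial + stage2_centers
print(f"alpha = {alpha:.3f}, total runs = {len(ccd)}")
for run in ccd:
    print(tuple(round(x, 3) for x in run))
```

The center points in the second stage also serve as common points for checking that results have not shifted between stages.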

There are pros and cons to performing studies in a sequential manner. Performing pre-experimental runs can provide a good start on the potential ranges to consider for the studies. Running center points during screening can provide information not only on the replication error, but also on the presence of significant curvature in the experimental region. Although center points can identify curvature, they cannot, by themselves, identify which factor(s) contribute to it. If the center points do not provide evidence of curvature, the risk/benefit is unlikely to be great enough to proceed to an optimization experiment. If there is curvature, however, the risk/benefit, based on a priori knowledge and the magnitude and direction of the curvature, should be used to determine whether to develop a response surface model. In general, if there is practically important curvature, either scientific knowledge can be used to help model the curvature, or an optimization experiment should be performed.

The risk/benefit should also be considered when two-level screening or interaction experiments with center points are used to define the design space. In any case, if there are unacceptable experimental results among the factorial points, additional runs may be required to confirm the model predictions of acceptable performance. If designs are done in stages, at least one point (often the center point) should be replicated at each stage. Such common points allow a comparison of the data sets to ensure that changes between stages are within experimental error.
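The curvature check mentioned above can be sketched as a single-degree-of-freedom test that compares the mean of the factorial points with the mean of the center points, using the replicated center points to estimate pure error. The data below are hypothetical, the critical value F(0.95; 1, 3) ≈ 10.13 is taken from standard tables rather than computed, and a full analysis in statistical software would normally pool this with other residual information.

```python
from statistics import mean, variance


def curvature_test(factorial_y, center_y):
    """Compare the factorial-point mean with the center-point mean.

    Returns (difference in means, F statistic, numerator df, denominator df).
    Pure error is estimated from the replicated center points only.
    """
    n_f, n_c = len(factorial_y), len(center_y)
    diff = mean(factorial_y) - mean(center_y)
    ss_curvature = n_f * n_c * diff ** 2 / (n_f + n_c)   # 1 degree of freedom
    ms_pure_error = variance(center_y)                    # n_c - 1 degrees of freedom
    return diff, ss_curvature / ms_pure_error, 1, n_c - 1


# Hypothetical responses from a 2^2 factorial with four center points.
factorial_y = [82.1, 85.3, 83.0, 86.2]
center_y = [87.9, 88.4, 87.6, 88.1]
F_CRIT = 10.13   # F(0.95; 1, 3) from tables

diff, f_stat, df1, df2 = curvature_test(factorial_y, center_y)
print(f"mean difference = {diff:.2f}, F = {f_stat:.1f} on ({df1}, {df2}) df")
print("evidence of curvature" if f_stat > F_CRIT else "no evidence of curvature")
```

A large F indicates curvature somewhere in the region but, as noted above, not which factor(s) cause it.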

Q14: If I conduct several small experiments, can they be analyzed together to produce a design space?

A: During the development process, a scientist may run several experimental designs, with the information obtained from one design helping to determine the next. Results from each design may be analyzed separately, or they may be combined and analyzed together. However, the results could be misleading if the analysis does not properly account for the structure of the studies. In the planning stage of each experiment, the combined set of factor levels across the designs should be considered so that, when combined, a meaningful response surface model can be constructed. In many cases, a sequence of designs has the same factors (or a subset), but the factor levels have changed. Ideally, there should be common points from one design to the next to verify that the results have not shifted, as well as common factors with similar levels across the designs. In addition, if the designs differ in some respect, such as scale, these differences should be modeled appropriately during the analysis to obtain meaningful and accurate conclusions. Although combining data from various experimental studies is generally advantageous in terms of power (i.e., the ability to detect an effect) and information, it may require a more complex analysis, and additional planning of the analysis approach is needed to help ensure success. Also, caution should be used when extrapolating outside the range of the experimental space.
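One way to analyze several small designs together is to fit a single regression model that includes a term for the stage (or block) in which each run was made, so that a shift between stages is estimated rather than absorbed into the factor effects. The sketch below does this with NumPy least squares on simulated data; the stage layout, the assumed true model, and the shift of 0.5 are all hypothetical, and treating stage as a fixed effect is only one option (a mixed model with a random stage effect is another).

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical two-stage study in coded units (stage 1 coding). Stage 2 moves
# the ranges but repeats the center point (0, 0), giving common points.
stage1_runs = [(-1, -1), (1, -1), (-1, 1), (1, 1), (0, 0), (0, 0)]
stage2_runs = [(0, 0), (0, 0), (1, 0), (2, 0), (1, 1), (1, -1)]


def simulate(runs, shift):
    """Assumed true model: y = 10 + 2*A + 1*B + shift, plus random error."""
    return [10 + 2 * a + b + shift + rng.normal(0, 0.1) for a, b in runs]


runs = stage1_runs + stage2_runs
y = np.array(simulate(stage1_runs, 0.0) + simulate(stage2_runs, 0.5))
stage = np.array([0] * len(stage1_runs) + [1] * len(stage2_runs))

# Combined model with a stage (block) indicator: y = b0 + bA*A + bB*B + d*stage
X = np.column_stack([
    np.ones(len(runs)),
    [a for a, _ in runs],
    [b for _, b in runs],
    stage,
])
(b0, b_a, b_b, stage_shift), *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"A: {b_a:.2f}, B: {b_b:.2f}, stage shift: {stage_shift:.2f}")
```

The repeated center points in both stages help make the shift estimable without relying entirely on the assumed model form; a difference in scale could be handled with a similar indicator or covariate.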

Q15: In a multiple step process, should I use a separate design for each step or a design that includes factors from multiple steps to see interactions between steps?

A: Most manufacturing processes have multiple steps (e.g., blend, mill, blend, compress; or dissolve, sonicate). Each step may have several factors that affect responses either during that step or in a subsequent step. If a factor in one step interacts with a factor from another step, it is better to include factors from multiple steps in the same design so that the between-step interaction can be evaluated. Suppose that temperature is a factor in one step and speed is a factor in the next step. If the effect of speed in the second step depends on the temperature in the previous step, then the design should include both steps. If, however, the effects of the second-step factors do not depend on the levels of the first-step factors, then each step can have its own design. The possibility of cross-step interactions should be considered during the risk assessment or the planning stage, where weighing a cross-step design against single-step designs may be beneficial. Another approach is to use a separate design for each step but to include the most important factors from each step in a design that spans multiple steps. Sometimes, a response at one step can be used as a factor in the subsequent step (see Question 16).

Q16: Can the experimental design used in a later step use a response, rather than factors, in an earlier step?

A: As discussed in Question 15, experiments can be performed at each step separately, or they can incorporate several steps in the same experiment. If experiments are performed at each unit operation or step separately, a response at an early stage may be usable as a factor in a later stage. Using a response from an earlier step to control a later response can be desirable; for example, it could allow equipment interchangeability. The assumption is that the earlier-step response used for control contains all of the information required to predict the later response. At a minimum, the experiment performed in the subsequent step should include the response from the first step as a factor. For example:

  • In a roller-compaction formulation for tablets, ribbon thickness (and scientific understanding of ribbon density) from the compactor may depend on several factors in the roller-compaction process. If dissolution is an important response for the final tablets, the experimenter may want to control the ribbon thickness to ensure acceptable dissolution rather than controlling the factors in the compaction step that affect the ribbon thickness. Ribbons of different thicknesses (e.g., thin, medium, thick) should be generated and used as a factor in the design for the next stage. One should be able to demonstrate that, after removing the effect of ribbon thickness, the other factors used in making the ribbons no longer have an effect on the response at the later stage.

  • When developing an active pharmaceutical ingredient, if the focus is to understand impurity rejection at a subsequent step, one could use the impurity level from a previous step as a "factor" while making sure that its range is challenged. If the input material for the current step cannot be spiked with the impurity(ies) of interest, one would make a batch with the highest achievable impurity level in the previous step and use it as a level in the subsequent step.
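The ribbon-thickness check in the first example can be sketched in two steps: model the later-stage response (dissolution) on the earlier-stage response (ribbon thickness), then check whether the compaction factors still explain what is left over. Everything in the code is hypothetical: the 2 × 2 design in roll pressure and roll gap, the simulated thicknesses, and the assumption that these factors act on dissolution only through thickness.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Hypothetical 2 x 2 design in roll pressure and roll gap (coded units),
# three ribbons per combination.
pressure = np.repeat([-1, -1, 1, 1], 3)
gap = np.repeat([-1, 1, -1, 1], 3)
n = pressure.size

# Assumed behavior: the compaction factors change ribbon thickness (mm), and
# dissolution (%) depends only on thickness; both carry random variation.
thickness = 1.0 + 0.15 * gap - 0.10 * pressure + rng.normal(0, 0.05, n)
dissolution = 90 - 20 * (thickness - 1.0) + rng.normal(0, 0.5, n)

# Step 1: regress the later-stage response on the earlier-stage response.
slope, intercept = np.polyfit(thickness, dissolution, 1)
residuals = dissolution - (intercept + slope * thickness)

# Step 2: do the compaction factors still explain the residuals?
X = np.column_stack([np.ones(n), pressure, gap])
(_, pressure_effect, gap_effect), *_ = np.linalg.lstsq(X, residuals, rcond=None)

print(f"dissolution change per mm of thickness: {slope:.1f}")
print(f"remaining pressure effect: {pressure_effect:.2f}, gap effect: {gap_effect:.2f}")
```

Remaining effects close to zero, relative to their uncertainty, are consistent with thickness carrying the information from the compaction step. In a real study this would be judged with a formal test (for example, an extra-sum-of-squares F test) rather than by inspection.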

Q17: When developing a design space, one tends to increase batch size from small laboratory batches to medium and large production batches. Because production-size batches are more expensive, how can one minimize the resources used to develop the final design space?

A: Establishing a detailed design space using large-scale batches requires a significant amount of resources. A laboratory-scale model is useful for understanding the chemistry and the effects of scale-independent factors on the desired responses. Most of the effort to define ranges and evaluate factors should be done at the laboratory scale(s). One strategy is to reduce the number of factors and the number of batches as the batch size increases while retaining the most important factors. Using laboratory-scale models has certain advantages and challenges (see Table II).

Table II: Advantages and challenges in the use of laboratory-scale models.

One challenge when using a laboratory-scale model is the inability to simulate the impact of scale-dependent factors. Some factors may be known to depend on batch size. In other cases, mechanistic knowledge may enable a scale-independent factor to be identified in place of a more obvious scale-dependent one. Increasing the batch size from laboratory scale to a medium size may reveal other scale-dependent factors that need to be studied at production scale.

There are a few options for addressing scale-up:

  • (a) Identify the least desirable, yet acceptable, combination(s) of the factors that have been studied in the laboratory. For each critical quality attribute (CQA), the least desirable combination is the one that, based on scientific knowledge and laboratory data, is likely to give the least desirable response. In addition, identify the operating target combination obtained from the laboratory-scale model. Run the worst-case combination(s), the operating target combination, and preferably the best-case combination(s) in duplicate at large scale. If the combined experimental runs provide results that pass the specifications, then the ranges studied at large scale can be used to establish the ranges of the design space for the factors.

  • (b) Identify the scale-dependent factors and their potential ranges based on laboratory and medium-size batches. Perform a very small designed experiment with these factors (at two levels), with the other factors at their worst-case combination(s), so that any potential interaction between the scale-dependent and scale-independent factors can be studied at large scale.

  • (c) Generate scale-dependent data at intermediate scales using approach (b) and perform approach (a) at production scale. This approach might be useful when material is expensive.

  • (d) Augment a design space over time using information from batches made at large scale. The data should include deviation data and data from other small designed studies that may have been performed to gain more knowledge about the process in real time.

  • (e) Use mechanistic models that provide a good understanding of scale-dependent rates. These models allow factor settings to be adjusted at large scale to ensure the desired response; typical examples include heat transfer and mass transfer in chemical reactions. Equipment and scale effects can be studied using rate kinetics studies in the laboratory or by performing in silico experiments on a mechanistic model, which can suggest an optimal combination for large-scale production.
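For the first option above (identifying worst-case combinations), candidate worst-case and best-case combinations can be screened with a laboratory-scale model before large-scale batches are committed. The sketch below uses, purely for illustration, the Degradate 1 model given in Question 21, together with hypothetical proposed ranges and a hypothetical operating target. Because that model is linear in each factor when the other is held fixed, its extremes over a rectangular region occur at corners, so only the corners need to be checked. In practice each CQA would have its own worst case, and scale-dependent factors are not captured by such a model.

```python
from itertools import product


def degradate(a: float, b: float) -> float:
    """Degradate 1 model from Question 21 (coded units), used for illustration."""
    return 0.82 - 0.30 * a + 0.49 * b + 0.72 * a * b


SPEC_MAX = 1.00                 # NMT 1.00%
a_range = (-0.5, 0.5)           # hypothetical proposed range for Factor A
b_range = (-0.5, 0.0)           # hypothetical proposed range for Factor B
target = (0.0, -0.25)           # hypothetical operating target

# The model is linear in each factor when the other is held fixed, so its
# largest and smallest values over the rectangle occur at corners.
corners = list(product(a_range, b_range))
worst = max(corners, key=lambda c: degradate(*c))   # highest predicted degradate
best = min(corners, key=lambda c: degradate(*c))    # lowest predicted degradate

for label, (a, b) in (("worst case", worst), ("target", target), ("best case", best)):
    print(f"{label:10s} A={a:+.2f} B={b:+.2f} predicted degradate={degradate(a, b):.3f}%")
print("worst case predicted within specification:", degradate(*worst) <= SPEC_MAX)
```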

Q18: Are there any other statistical principles to consider when designing the experiment that might reduce the number of batches required or improve identification and/or estimation of important factors and their effects?

A: There are several statistical principles that can be used to ensure that the important factors are identified and their effects are estimated without ambiguity, as outlined below.

  • Randomization refers to running experiments in such a way that the order in which the experimental runs are performed does not induce systematic bias in the results. To accomplish randomization, each run should be treated as though it is the first; that is, all factors should be reset before each run. Any departure from complete randomization made for convenience may change how the data must be analyzed. Such departures can often be economically desirable or physically necessary, but they should be discussed with a statistician before the experiment is run. For example:

      • Splitting the batch into sub-batches. In a DoE, some factors could be applied to the parent batch and other factors to the sub-batches. An example would be to manufacture tablet batches with different combinations of factors (e.g., mixing time, lubricant level, mixing speed) and then split the core tablets from each batch into sub-batches for coating, which would have its own set of factors (e.g., spray rate, pan speed). These designs are called split-plot designs and are analyzed differently from a factorial experiment because the error structure is more complicated.

      • Performing an experiment in a convenient manner. It may be convenient to make all of the high-temperature batches first and the low-temperature batches afterward. However, this arrangement can introduce a split-plot structure and other biases into the experiment and lead to misleading conclusions.

      • Not planning the order of analytical (or other) testing. It is important to ensure that measurement sources of variability are not confounded with the effects being studied. Consider the following examples of the impact of analytical variability on samples from the experimental phase. Suppose that six batches were manufactured in the experiment and sent to the analytical department for dissolution testing, and that each batch is tested in its own analytical run of six tablets. The batch results are now confounded with the analytical run. A better approach would be to perform six analytical runs with six tablets per run, using one tablet from each batch within each run. This approach separates the batch-to-batch variability from the analytical variability, thereby improving the comparison between batches. Analytical runs could also be confounded with factor effects. If samples from all of the high levels of a factor (e.g., temperature) are tested in one analytical run and those from the low levels in another, a significant difference between the high and low levels may be due not to a temperature effect but to day-to-day variation in the analytical method (i.e., the factor temperature is confounded with analytical days). Instead, one might randomly assign the samples from the experimental study to the different analytical runs.

  • The sampling strategy, which should be defined before the experiment is run, determines where samples should be taken, and how many, to maximize information. Collecting samples throughout a given experimental run provides significant information that can be used to understand mechanisms across the run. Preliminary experiments are useful for developing the sampling strategy; this exercise also helps identify which analytical methods would be beneficial for obtaining accurate and precise information about the process.

  • Replication refers to running one or more independent repeats of the same experimental condition or setting to provide reliable estimates and/or conclusions from the data. Replication should not be confused with repeated measurements of the same sample or with measurements of multiple samples taken from a single run of an experimental condition. To have adequate power to distinguish factor effects and to estimate the coefficients of the statistical model with good precision, it is essential to have sufficient replication of the design points in the study. Most often, the center point of the design is replicated. However, there may be situations in which replicating the extreme design points provides information on precision across the design. In general, the number and location of replicates depend on the size of the effects to be detected, the desired statistical power, and the range of the factors in the study.

  • Statistical thinking refers to recognizing that all work occurs in a series of interconnected processes. Before running a designed experiment, the process and potential sources of variability should be discussed, understood, and controlled as much as possible. Such diligence helps ensure more accurate and precise data. The scientist should be well trained and knowledgeable about the process flow and the equipment; performing one or more preliminary experiments can help in this respect. Learning how to operate the equipment while conducting the experiment adds a confounding factor to the experiment, which may adversely affect the conclusions. In addition, because the analysis usually depends on analytical results, the methods should have adequate accuracy and precision before the experiment is performed.
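The randomization and analytical-testing points above can be sketched with Python's standard `random` module. The manufacturing order of a small hypothetical design is shuffled (the factor names `temperature` and `speed` are illustrative), and the dissolution testing of six batches is laid out so that every analytical run contains one tablet from each batch, in a random position. Recording the seed lets the plan be reproduced and documented; the seed value itself is arbitrary.

```python
import random

rng = random.Random(2010)   # arbitrary fixed seed so the plan can be reproduced

# Hypothetical 2^2 design with two center points (factor names are illustrative).
runs = [("low", "low"), ("high", "low"), ("low", "high"), ("high", "high"),
        ("center", "center"), ("center", "center")]
run_order = runs[:]
rng.shuffle(run_order)      # randomized manufacturing order; reset all factors before each run
for i, (temperature, speed) in enumerate(run_order, start=1):
    print(f"run {i}: temperature={temperature}, speed={speed}")

# Dissolution testing of six batches, six tablets each: every analytical run
# gets one tablet from every batch, in a random position within the run.
batches = [f"batch {j}" for j in range(1, 7)]
analytical_runs = []
for _ in range(6):
    order = batches[:]
    rng.shuffle(order)
    analytical_runs.append(order)

for r, order in enumerate(analytical_runs, start=1):
    print(f"analytical run {r}: " + ", ".join(order))
```

Because each analytical run contains every batch, run-to-run analytical differences affect all batches equally and can be separated from batch-to-batch variability in the analysis.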

Analysis of DoE

This section provides an overview and interpretation of the analysis of DoE data, beginning with the simple case of a single response and a single factor, progressing to a single response with multiple factors, and concluding with multiple responses and multiple factors. Analysis of variance (ANOVA) and partial least squares (PLS) analyses are discussed.

Q19: How do I analyze the data and define the design space for one response and one factor?

A: How to analyze an experiment studying the effect of a single material attribute or process factor, X, on a single response, Y, depends on whether X and Y are categorical or continuous and whether the relationship is described by a well-defined mechanistic model or by an empirical model. Typically, a design space involves more than one CQA and more than one factor, but this simple case is useful for understanding more complex situations. Figure 2 shows an example where X is categorical. Figure 3 shows an example where X is continuous and the relationship between X and Y is described by a mechanistic model.

Figure 2: Categorical Factor X and continuous product attribute Y.

When X is categorical, as shown in Figure 2, statistical intervals on the response are constructed for the different values of the categorical factor. Figure 2 shows a box-plot display of the response for each of the X values (A, B, C, D). Box plots summarize the data for each X value, showing the spread and center of the response at that value of X. The design space would include all of the X values for which the statistical limits on Y indicate acceptable quality.
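One way to construct the statistical intervals described above is a two-sided prediction interval for a single future observation at each level of X, mean ± t·s·√(1 + 1/n), keeping only the levels whose interval lies entirely within the specification. In the sketch below, the data, the specification limits of 95.0 to 105.0, and the choice of a 95% prediction interval are all hypothetical; a tolerance interval, which covers a stated proportion of future results, is a common alternative. The t quantile (2.776 for 4 degrees of freedom) is entered from tables.

```python
from statistics import mean, stdev

# Hypothetical results (five batches per level of categorical factor X).
data = {
    "A": [99.1, 100.2, 99.8, 100.5, 100.4],
    "B": [101.0, 102.2, 101.5, 100.8, 101.5],
    "C": [103.5, 104.8, 103.0, 104.2, 104.5],
    "D": [96.0, 94.5, 97.2, 95.1, 96.2],
}
LOWER, UPPER = 95.0, 105.0      # assumed specification limits for Y
T_QUANTILE = 2.776              # t(0.975, df = 4) from tables, for n = 5 per level


def prediction_interval(values, t_quantile):
    """Two-sided interval for one future observation: mean +/- t*s*sqrt(1 + 1/n)."""
    n = len(values)
    half_width = t_quantile * stdev(values) * (1 + 1 / n) ** 0.5
    return mean(values) - half_width, mean(values) + half_width


design_space = []
for level, values in data.items():
    low, high = prediction_interval(values, T_QUANTILE)
    ok = LOWER <= low and high <= UPPER
    status = "acceptable" if ok else "not acceptable"
    print(f"X = {level}: interval ({low:.2f}, {high:.2f}) -> {status}")
    if ok:
        design_space.append(level)

print("levels in the design space:", design_space)
```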

When X is continuous, a functional relationship between X and Y is fitted, as shown in Figure 3. When the relationship between X and Y is not well defined, or is too complicated, empirical models may be used to approximate the underlying relationship between X and Y. First-order and second-order polynomial models are frequently used and have proved very useful for approximating the relationship between X and Y over limited ranges of X.

Figure 3: Arrhenius relationship.
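For the mechanistic case in Figure 3, the Arrhenius relationship k = A·exp(–Ea/(R·T)) becomes a straight line when ln k is plotted against 1/T, so its parameters can be estimated by linear least squares. The sketch below uses NumPy and hypothetical, noise-free rate constants generated from Ea = 80 kJ/mol and A = 10^11 per day; real data would include replicate variability, which should be estimated as noted in Question 9.

```python
import numpy as np

R = 8.314  # gas constant, J/(mol*K)

# Hypothetical degradation rate constants (1/day) at four temperatures,
# generated noise-free from Ea = 80 kJ/mol and A = 1e11 /day for illustration.
temps_c = np.array([40.0, 50.0, 60.0, 70.0])
temps_k = temps_c + 273.15
rates = 1e11 * np.exp(-80_000 / (R * temps_k))

# Arrhenius: ln k = ln A - (Ea / R) * (1 / T), a straight line in 1/T.
slope, intercept = np.polyfit(1 / temps_k, np.log(rates), 1)
ea_hat = -slope * R                  # activation energy, J/mol
a_hat = np.exp(intercept)            # pre-exponential factor, 1/day

# Predict inside the studied temperature range (interpolation).
k_55 = a_hat * np.exp(-ea_hat / (R * (55.0 + 273.15)))
print(f"Ea = {ea_hat / 1000:.1f} kJ/mol, A = {a_hat:.3g} /day, k(55 C) = {k_55:.3g} /day")
```

The prediction is made at 55 °C, inside the studied range; predicting at conditions outside the range studied would be an extrapolation and warrants the caution noted in Question 14.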

Figure 4 illustrates an empirical model of Y on X. A linear regression equation is used to model how the level of a degradate changes with the level of Factor B. If the specification for the degradate is not more than (NMT) 1.00%, then –1.00 to 0.49 may be defined as a design space for Factor B, because in this range the predicted values from the regression equation meet the specification. This is shown as the green region in Figure 4. Note that there is some uncertainty associated with the region just defined. Evaluation of the design space is discussed in Question 25, which will be covered in Part III of this series.
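The Factor B range quoted above follows from solving the fitted equation in Figure 4 for the specification limit. The short sketch below reproduces that arithmetic; it assumes that the studied range of Factor B is –1 to +1 in coded units and, like the green region in Figure 4, it uses only the mean prediction and ignores uncertainty.

```python
INTERCEPT, SLOPE = 0.72, 0.57    # Degradate (%) = 0.72 + 0.57 * Factor B (coded)
SPEC_MAX = 1.00                  # NMT 1.00%
B_LOW, B_HIGH = -1.0, 1.0        # studied range of Factor B, assumed to be -1 to +1

# Because the slope is positive, predicted degradate rises with B and reaches
# the limit where 0.72 + 0.57 * B = 1.00.
b_limit = (SPEC_MAX - INTERCEPT) / SLOPE
upper = min(B_HIGH, b_limit)
print(f"Factor B range meeting the limit (mean prediction only): {B_LOW:.2f} to {upper:.2f}")
```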

Figure 4: Empirical model of degradate (Y) on Factor B (Degradate = 0.72 + 0.57 * Factor B).

Q20: Once I have data from the designed experiment, how do I define a design space for one response and multiple factors?

A: The design space can be built using the statistical model that results from the executed DoE. The particular model will vary depending on the design and will be subject to the appropriate caveats. For example, a screening design may produce a model with only linear terms that assumes no curvature, and an interaction design may assume no quadratic effects. The model from an optimization experiment is an approximation to a more complicated response function. As described in Questions 8–18, a risk-based approach is used to determine the DoE structure, which in turn determines the analysis.

The modeling step of the analysis is often an iterative process in which the statistically significant model terms that contribute to explaining the response are determined. The final fitted model can be used to generate interaction profiles and contour plots to help visualize and understand the effect of the factors on the response. An interaction profile shows how the response changes as one factor changes at given levels of another factor. A contour plot is a two-dimensional graph of two factors and the fitted response. The vertical and horizontal axes of a contour plot represent factors from the DoE, and the lines on the plot connect points in the factor plane that have the same response value (y), producing a display similar to a topographic map. The contour lines show peaks and valleys for the quality characteristic over the region studied in the DoE. When there are more than two factors in the experiment, contour plots can be made for several levels of the other factors (see Figures 5 and 6).

The same fitted equation that is used to draw Figures 5 and 6,

$$\text{Assay} = 96.9 + 3.2A - 1.7B + 3.0AB - 2.2A^2,$$

can then be used to determine the factor ranges that would produce acceptable results, that is, results that meet specifications. Ignoring variability, Figure 7 shows the region (in yellow) of Factors A and B where the assay values are predicted to be at least 95%. The levels of Factors A and B are presented in coded (standardized) form, with -1 representing the low level and +1 the high level of each factor in the study. This coding places the factors on a common scale that does not depend on units of measure. Because this DoE is a response surface design, it includes star points that extend from the center of the design to the -1.41 and +1.41 levels.

Figure 5: Assay interaction profile depicting the interaction effect between Factors A and B on assay. The subgraph on the left shows how the assay (y-axis) changes from the low to high level of A (x-axis). The red line indicates the assay values for the low level of B (–1.41 coded units) and the blue line indicates assay values for the high level of B (1.41 coded units). The subgraph on the right shows levels of B on the x-axis and the low and high levels of A coded as red and blue lines. This subgraph provides insight into how assay is related to level of factor A and that this relationship is dependent on the level of factor B.

Figure 6: Assay contour plot showing the expected assay responses for different combinations of Factors A and B. Red points are the experimental design points; note that although the axes extend from –1.5 to +1.5 coded units, the experimental factors only spanned -1.41 to +1.41 coded units. Color indicates the assay value, with red representing lower values and blue representing higher values.
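The red design points in Figures 6 to 8 span -1.41 to +1.41 because the response surface design adds star points beyond the ±1 factorial levels. One common layout with that property is a two-factor central composite design (CCD) with alpha = sqrt(2), about 1.41; the sketch below generates its coded runs. The article does not state how many center points were run, so `n_center=5` is a placeholder assumption, as is the function name.

```python
import itertools


def central_composite_design(k=2, n_center=5, alpha=None):
    """Coded runs of a central composite design (CCD) for k factors.

    - 2**k factorial corners at -1/+1
    - 2*k axial ("star") points at -alpha/+alpha on one axis, 0 elsewhere
    - n_center replicated center points
    alpha defaults to (2**k) ** 0.25, the rotatable choice (about 1.414 for k = 2).
    """
    if alpha is None:
        alpha = (2 ** k) ** 0.25
    corners = [list(p) for p in itertools.product([-1.0, 1.0], repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            axial.append(point)
    center = [[0.0] * k for _ in range(n_center)]
    return corners + axial + center


if __name__ == "__main__":
    for run, point in enumerate(central_composite_design(), start=1):
        print(run, [round(v, 2) for v in point])
```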

As in Question 19, there can be variability and some degree of uncertainty associated with the edges of a design space region (see Figure 7). Using a probability region to define a design space, which protects against the uncertainty of meeting specifications at the edges of the region shown in Figure 7, is discussed in Question 26, which will appear in Part III of this series.

Figure 7: Assay design space for Factors A and B. The red points are the experimental design points; note that although the axes extend from –1.5 to +1.5 coded units, the experimental factors only spanned -1.41 to +1.41 coded units.

Q21: How do I construct a design space when I have multiple important responses?

A: If each response can be adequately modeled using univariate analyses, the simplest approach to dealing with multiple responses is to overlay contour plots of the fitted model for each response. The overlay plot will indicate regions where each mean response is within its required bounds.

To illustrate, the example used in Question 19 can be extended to include two responses: Assay and Degradate 1. The design space can be constructed using the results from the DoE with two factors (A and B) and two quality characteristics. Analysis of the DoE data found that the responses could be adequately modeled using the following equations and that they were not highly correlated:

$$\text{Assay} = 96.9 + 3.2A - 1.7B - 2.2A^2 + 3.0AB$$

$$\text{Degradate 1} = 0.82 - 0.30A + 0.49B + 0.72AB$$

When there is more than one quality characteristic in the design space, overlay plots are helpful. An overlay plot is created by superimposing the contour plots for each quality characteristic with the required quality bounds (e.g., specifications). A potential design space can often be found where all the mean quality characteristics are simultaneously within the requirements. Figure 8 shows a potential window of operability for Assay and Degradate 1. For this overlay plot, the requirement for assay is 95.0% to 105.0%, and Degradate 1 must be not more than (NMT) 1.00%. The yellow region indicates the settings of Factors A and B that meet both requirements simultaneously. The red points on the plot are the experimental design points; note that the axes extend from -1.5 to +1.5 coded units, although the experimental space is from -1.41 to +1.41 coded units.
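The overlay idea reduces to a logical AND of the individual requirement checks. The sketch below evaluates the two fitted equations from this question on a grid (NumPy assumed; grid size and names are illustrative) and keeps the points where predicted assay is within 95.0% to 105.0% and predicted Degradate 1 is NMT 1.00%. Like Figure 8, it uses fitted means only.

```python
import numpy as np


def assay(a, b):
    """Fitted assay model from Q21 (coded units)."""
    return 96.9 + 3.2 * a - 1.7 * b - 2.2 * a ** 2 + 3.0 * a * b


def degradate1(a, b):
    """Fitted Degradate 1 model from Q21 (coded units)."""
    return 0.82 - 0.30 * a + 0.49 * b + 0.72 * a * b


def overlay_region(limit=1.41, n=141):
    """Grid points where both predicted means meet their requirements at once.

    Requirements from the text: assay 95.0%-105.0%, Degradate 1 NMT 1.00%.
    No allowance is made for variability.
    """
    grid = np.linspace(-limit, limit, n)
    a, b = np.meshgrid(grid, grid)  # a changes across columns, b down rows
    y_assay = assay(a, b)
    assay_ok = (y_assay >= 95.0) & (y_assay <= 105.0)
    degradate_ok = degradate1(a, b) <= 1.00
    return a, b, assay_ok & degradate_ok


if __name__ == "__main__":
    a, b, ok = overlay_region()
    print(f"{ok.mean():.0%} of the studied square meets both requirements")
```

For example, at the corner A = B = +1.41 the predicted assay is about 100.6% but the predicted Degradate 1 is about 2.5%, so that corner falls outside the overlay region even though it would pass on assay alone.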

Figure 8: Overlay plot of assay (%) and degradate 1 (%) for Factors A and B where the bounds for assay are 95.0%–105.0% and the upper bound for degradate 1 is 1.00%. The red points represent the observed results.

With multiple important responses, it is valuable to understand the correlation structure of the responses. A correlation analysis can help determine whether each response should be assessed separately or whether the responses should be analyzed together. If a set of responses is highly correlated, it may be possible to eliminate some of them from the analysis, recognizing that the correlated responses carry essentially the same information. One response can then be chosen to represent the set and analyzed using univariate methods. If the responses are moderately correlated, methods that take the correlation structure into account may be used, such as multivariate analysis of variance (MANOVA), the Bayesian interval approach referenced in Question 26b (in Part III of this series), or principal component analysis (PCA) if the linear combinations make scientific sense (e.g., particle-size data). These multivariate techniques can provide effective analyses but come at the cost of increased complexity. If, for all practical purposes, the responses are uncorrelated, univariate analyses can be used, and simple combinations of the univariate models (such as overlays and desirability functions) can be used to construct the design space, noting the need to account for uncertainty due to variability. In any case, potential correlation between responses must be explored for its scientific meaning.
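A first look at the correlation structure can be as simple as a pairwise correlation matrix. In the sketch below (NumPy assumed), the cut-offs used to label a pair as "high", "moderate" or "low" (0.9 and 0.3) are arbitrary assumptions for illustration, not recommendations from this article, and the data in the usage block are simulated.

```python
import numpy as np


def correlation_screen(responses, names, high=0.9, low=0.3):
    """Pairwise Pearson correlations between response columns, with rough labels.

    `responses` is a runs-by-responses array. The `high` and `low`
    cut-offs are illustrative assumptions only.
    """
    r = np.corrcoef(responses, rowvar=False)
    report = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            rij = r[i, j]
            if abs(rij) >= high:
                label = "high: one response may represent the set"
            elif abs(rij) >= low:
                label = "moderate: consider a method that models correlation"
            else:
                label = "low: univariate models with overlays"
            report.append((names[i], names[j], rij, label))
    return r, report


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=20)
    # Simulated data for illustration only: y2 nearly duplicates y1, y3 is unrelated
    data = np.column_stack([x, x + 0.05 * rng.normal(size=20), rng.normal(size=20)])
    _, report = correlation_screen(data, ["y1", "y2", "y3"])
    for name_i, name_j, rij, label in report:
        print(f"{name_i} vs {name_j}: r = {rij:+.2f} ({label})")
```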

Q22: Can I use PCA/PLS to analyze data from a statistically designed experiment?

A: The first choice of statistical method for analyzing data from a DoE with multiple responses would be a multiple linear regression (MLR) model. Mathematically, partial least squares (PLS) can be used in the analysis of a DoE, but its benefit is open to question. PLS is a "latent variable" approach for analyzing multivariate data when there is a correlation structure within the data. If the responses are independent (i.e., uncorrelated) and the data originated from an orthogonal DoE (e.g., full-factorial or fractional-factorial), then PLS has no mathematical advantage over performing the analysis one response at a time using MLR. In fact, there may be a disincentive to using PLS, because if the response variables are all independent, PLS will require as many latent variables as there are responses.
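As a sketch of the MLR route (not the authors' code): with ordinary least squares, fitting all responses jointly gives exactly the same coefficients as fitting each response on its own, because each column of the coefficient matrix solves its own least-squares problem. The example fits a full quadratic model to a two-factor central composite design. The five center points, the noise levels, and the simulated responses (built from the Q21 equations) are assumptions made only for illustration, and PLS itself is not implemented here.

```python
import itertools

import numpy as np


def ccd_points(alpha=2 ** 0.5, n_center=5):
    """Coded runs of a two-factor central composite design (n_center is an assumption)."""
    corners = list(itertools.product([-1.0, 1.0], repeat=2))
    axial = [(-alpha, 0.0), (alpha, 0.0), (0.0, -alpha), (0.0, alpha)]
    return np.array(corners + axial + [(0.0, 0.0)] * n_center)


def model_matrix(x):
    """Full quadratic model in two coded factors: 1, A, B, AB, A^2, B^2."""
    a, b = x[:, 0], x[:, 1]
    return np.column_stack([np.ones_like(a), a, b, a * b, a ** 2, b ** 2])


def fit_mlr(X, Y):
    """Ordinary least-squares coefficients; Y may hold one response or several columns."""
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = ccd_points()
    X = model_matrix(x)
    a, b = x[:, 0], x[:, 1]
    # Simulated responses: the Q21 fitted equations plus made-up noise levels
    y_assay = 96.9 + 3.2 * a - 1.7 * b - 2.2 * a ** 2 + 3.0 * a * b + rng.normal(0, 0.5, len(a))
    y_deg = 0.82 - 0.30 * a + 0.49 * b + 0.72 * a * b + rng.normal(0, 0.05, len(a))
    Y = np.column_stack([y_assay, y_deg])
    joint = fit_mlr(X, Y)
    one_at_a_time = np.column_stack([fit_mlr(X, Y[:, j]) for j in range(Y.shape[1])])
    print("identical coefficients:", np.allclose(joint, one_at_a_time))
    print(np.round(one_at_a_time, 2))
```

In practice the full quadratic would then be reduced to its significant terms, which is the iterative modeling step described earlier for the assay response.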

If the responses are correlated, PLS could be used, but there are several other preferred approaches:

  • Perform the analysis on each response separately. This is the easiest and most interpretable approach. If some responses are highly correlated, the factors that are significant and their interpretations will be similar.

  • Perform PCA on the responses and analyze the principal components separately. The individual components may serve as an interpretable summary of the original responses (e.g., particle-size distribution data). Furthermore, the new components are uncorrelated with each other. An exploratory PCA on the responses may also be useful to identify those responses that are likely to have similar univariate models under the first approach.

  • Use other multivariate methods, such as MANOVA or the Bayesian interval approach referenced in Question 26b, which will appear in Part III of this series.
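The PCA option in the second bullet can be sketched with a singular value decomposition (NumPy assumed; the function name and simulated data are illustrative). Standardizing the responses first is an assumption that is often sensible when responses sit on different scales; the component scores returned are uncorrelated by construction, which is what makes analyzing them one at a time reasonable.

```python
import numpy as np


def pca_on_responses(Y, standardize=True):
    """PCA of a runs-by-responses matrix via the SVD.

    Columns are centered and, by default, scaled to unit standard deviation.
    Returns component scores, loadings (rows of Vt), and variance fractions.
    """
    Y = np.asarray(Y, dtype=float)
    Z = Y - Y.mean(axis=0)
    if standardize:
        Z = Z / Z.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * s  # same as Z @ Vt.T; columns are mutually orthogonal
    explained = s ** 2 / np.sum(s ** 2)
    return scores, Vt, explained


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    # Simulated, illustrative data: three responses driven by one common signal
    common = rng.normal(size=15)
    Y = np.column_stack([common + 0.1 * rng.normal(size=15) for _ in range(3)])
    scores, loadings, explained = pca_on_responses(Y)
    print("variance explained:", np.round(explained, 3))
    print("score correlation:", np.round(np.corrcoef(scores[:, :2], rowvar=False)[0, 1], 6))
```

Each column of `scores` could then be modeled with MLR exactly as an original response would be.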

1 "Factor" is synonymous with "x," "input," and "variable." A process parameter can be a factor, as can an input material. For simplicity and consistency, "factor" is used throughout the paper. "Response" is synonymous with "y" and "output." Here, "response" is either the critical quality attribute (CQA) or a surrogate for the CQA. For consistency, "response" is used throughout the paper.

2 For continuity throughout the series, Figures and Tables are numbered in succession. Figure 1 and Table I appeared in Part I of this article series.

Additional reading

1. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use, Q8(R1), Pharmaceutical Development, Step 5, November 2005 (core) and Annex to the Core Guideline, Step 5, November 2008.

2. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use, Q9, Quality Risk Management, Step 4 , November 2005.

3. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use, Q10, Pharmaceutical Quality System, Step 5, June 2008.

4. A Posterior Predictive Approach to Multiple Response Surface Optimization, John Peterson, 2004.

5. Potter, C., et al. (2006). "A Guide to EFPIA's Mock P.2. Document," Pharmaceutical Technology.

6. Glodek, M., Liebowitz, S., McCarthy, R., McNally, G., Oksanen, C., Schultz, T., Sundararajan, M., Vorkapich, R., Vukovinsky, K., Watts, C., and Millili, G. Process Robustness: A PQRI White Paper, Pharmaceutical Engineering, November/December 2006.

7. Box, G.E.P., W.G. Hunter, and J.S. Hunter (1978). Statistics for Experimenters: An Introduction to Design, Analysis and Model Building. John Wiley and Sons.

8. Montgomery, D.C. (2001). Design and Analysis of Experiments. John Wiley and Sons.

9. Box, G.E.P., and N.R. Draper (1969). Evolutionary Operation: A Statistical Method for Process Improvement. John Wiley and Sons.

10. Cox, D.R. (1992). Planning of Experiments. John Wiley and Sons.

11. Cornell, J. (2002). Experiments with Mixtures: Designs, Models, and the Analysis of Mixture Data, 3rd Edition. John Wiley and Sons.

12. Duncan, A.J. (1974). Quality Control and Industrial Statistics, Richard D. Irwin, Inc., Homewood, IL.

13. Myers, R.H. and Montgomery, D.C. (2002). Response Surface Methodology: Process and Product Optimization Using Designed Experiments. John Wiley and Sons.

14. Montgomery, D.C. (2001). Introduction to Statistical Quality Control, 4th Edition. John Wiley and Sons.

15. del Castillo, E. (2007). Process Optimization: A Statistical Approach. Springer, New York.

16. Khuri, A. and Cornell, J.A. (1996). Response Surfaces, 2nd Edition. Marcel Dekker, New York.

17. MacGregor, J. F. and Bruwer, M-J. (2008). "A Framework for the Development of Design and Control Spaces", Journal of Pharmaceutical Innovation, 3, 15-22.

18. Miró-Quesada, G., del Castillo, E., and Peterson, J.J. (2004). "A Bayesian Approach for Multiple Response Surface Optimization in the Presence of Noise Variables", Journal of Applied Statistics, 31, 251-270.

19. Peterson, J. J. (2004). "A Posterior Predictive Approach to Multiple Response Surface Optimization", Journal of Quality Technology, 36, 139-153.

20. Peterson, J. J. (2008). "A Bayesian Approach to the ICH Q8 Definition of Design Space", Journal of Biopharmaceutical Statistics, 18, 958-974.

21. Stockdale, G. and Cheng, A. (2009). "Finding Design Space and a Reliable Operating Region using a Multivariate Bayesian Approach with Experimental Design", Quality Technology and Quantitative Management (in press).

Stan Altan is a senior research fellow at Johnson & Johnson Pharmaceutical R&D in Raritan, NJ. James Bergum is associate director of nonclinical biostatistics at Bristol-Myers Squibb Company in New Brunswick, NJ. Lori Pfahler is associate director, and Edith Senderak is associate director, scientific staff, both at Merck and Co. in West Point, PA. Shanthi Sethuraman is director of chemical product R&D at Lilly Research Laboratories in Indianapolis. Kim Erland Vukovinsky* is director of nonclinical statistics at Pfizer, MS 8200-3150, Eastern Point Rd., Groton, CT 06340, tel. 860.715.0916, kim.e.vukovinsky@pfizer.com. At the time of this writing, all authors were members of the Pharmaceutical Research and Manufacturers of America (PhRMA) Chemistry, Manufacturing, and Controls Statistics Experts Team (SET).

*To whom all correspondence should be addressed.

Submitted: Jan. 12, 2010. Accepted: Jan. 27, 2010.

Reference

1. S. Altan et al., Pharm. Technol. 34 (7) 66–70 (2010).

Acknowledgments

The authors wish to thank Raymond Buck, statistical consultant; Rick Burdick, Amgen; Dave Christopher, Schering-Plough; Peter Lindskoug, AstraZeneca; Tim Schofield and Greg Stockdale, GSK; and Ed Warner, Schering-Plough, for their advice and assistance with this article.

Note

See Part III of this article series