Statistical Considerations in Design Space Development (Part III of III)

Published in: Pharmaceutical Technology, Volume 34, Issue 9 (September 2, 2010)

In the final article of a three-part series, the authors discuss how to present a design space and evaluate its graphical representation.

Parts I and II of this article appeared in the July and August 2010 issues, respectively, of Pharmaceutical Technology and discussed experimental design planning and design and analysis in statistical design of experiments (DoE) (1, 2). This article, Part III, covers how to present and evaluate a design space.

Design space is part of the US Food and Drug Administration's quality initiative for the 21st century, which seeks to move toward a new paradigm for pharmaceutical assessment as outlined in the International Conference on Harmonization's quality guidelines Q8, Q9, and Q10. The statistics required for design-space development play an important role in ensuring the robustness of this approach.

This article provides concise answers to frequently asked questions (FAQs) related to the statistical aspects of determining a design space as part of quality-by-design (QbD) initiatives. These FAQs reflect the experiences of a diverse group of statisticians who have worked closely with process engineers and scientists in the chemical and pharmaceutical development disciplines, grappling with issues related to the establishment of a design space from a scientific, engineering, and risk-based perspective. Questions 1–7 appeared in Part I of this series (1). Questions 8–22 appeared in Part II (2). The answers provided herein, to Questions 23–29, constitute basic information regarding statistical considerations and concepts to consider when finalizing a design space and will be beneficial to a scientist working to develop and implement a design space in collaboration with a statistician.

Presenting a design space

This section reviews the presentation of the design space, including tabular displays of summary information and graphical displays such as contour plots, three-dimensional surface plots, overlay plots, and desirability plots. The authors also discuss presenting the design space either as a system of multifactor equations or as a multidimensional rectangle. Traditionally, one would evaluate the design space before finalizing the presentation. In this article, however, the presentation of the design space is provided first in order to explain the graphics used in the evaluation stage.

Q23: How can a design space be presented?

A: When a design space has been developed from a statistical design of experiments (DoE), there are many effective displays, including tabular and graphical summaries. An example of a tabular summary is presented in Table III* from a two-factor central composite design described in Part II of this article series (2). Table III summarizes the quality characteristic and its specification or requirements, the range of the data realized in the experiment, the regression model, and the root mean square error (RMSE). The RMSE is an estimate of the standard deviation in the data (after model fit) that is not accounted for by the model. If there is an expected amount of variation for the quality characteristic, then this estimate can be used to determine whether the model fits the data appropriately. If the RMSE is too small, then the model may be over-fitting the data; if too large, the model may be missing some terms. The range of the observed data is provided to indicate where the model is applicable. Predictions outside this range (extrapolation) should be confirmed with additional experimentation. This summary is just one possibility for a tabular approach.

Table III: Summary of design space for assay and degradate 1.
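As a minimal sketch of the RMSE calculation described above, the following Python function computes the residual standard deviation from observed and model-predicted values. The function name `rmse` and the data values are hypothetical and are not taken from Table III. The sketch assumes the usual definition, in which the residual sum of squares is divided by the number of runs minus the number of fitted model terms.

```python
import math


def rmse(observed, predicted, n_model_terms):
    """Root mean square error: sqrt(SSE / (n - p)).

    n is the number of runs and p is the number of fitted model terms
    (including the intercept).
    """
    n = len(observed)
    if n != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    dof = n - n_model_terms
    if dof <= 0:
        raise ValueError("need more runs than model terms")
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sse / dof)


# Hypothetical assay results (%) and model predictions for five runs
observed = [98.2, 99.1, 97.5, 98.8, 99.4]
predicted = [98.0, 99.0, 97.9, 98.6, 99.5]

print(round(rmse(observed, predicted, n_model_terms=2), 3))
```

Comparing this value with the variation expected for the quality characteristic is one way to judge whether the model is over-fitting or missing terms.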

Some effective graphical displays include contour plots, three-dimensional surface plots, overlay plots, and desirability plots. Each graph has strengths and weaknesses. It is anticipated that multiple graphs or graph types may be needed to clearly display the design space.

A contour plot is a two-dimensional graph of two factors and the fitted model for the response**. A contour plot is defined by vertical and horizontal axes that represent factors from the DoE. The lines on the contour plot connect points in the factor plane that have the same value of the response, producing a surface similar to a topographic map. The contour lines show peaks and valleys for the response over the region studied in the DoE. When there are more than two factors in the experiment, the contour plots can be made for several levels of the other factors. Figure 9* shows the contour plots for the models presented in Table III. In Figure 9, the red points included on these plots are the experimental design points; note that the axes extend from -1.5 to +1.5 coded units although the experimental space is from -1.41 to +1.41 coded units.

Figure 9: Response surface contour plot of assay and degradate 1 for Factors A and B. All figures are courtesy of S. Altan et al.
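The ±1.41 limits of the experimental space are consistent with the axial points of a rotatable two-factor central composite design, for which the axial distance is the fourth root of the number of factorial points. The sketch below generates such design points in coded units. The function name and the number of center points are assumptions made for illustration; the article does not state how many center points were run.

```python
from itertools import product


def central_composite_design(n_factors=2, n_center=1):
    """Return coded design points for a rotatable central composite design."""
    # Rotatable axial distance: (number of factorial points) ** (1/4)
    alpha = (2 ** n_factors) ** 0.25

    # Full two-level factorial portion at -1 and +1
    factorial = [list(p) for p in product((-1.0, 1.0), repeat=n_factors)]

    # Axial (star) points: one factor at -alpha or +alpha, the others at 0
    axial = []
    for i in range(n_factors):
        for sign in (-1.0, 1.0):
            point = [0.0] * n_factors
            point[i] = sign * alpha
            axial.append(point)

    # Center points (the count is an assumption, not taken from the article)
    center = [[0.0] * n_factors for _ in range(n_center)]
    return factorial + axial + center


for point in central_composite_design():
    print([round(v, 2) for v in point])
```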

Another useful display of the design space is the three-dimensional surface plot (see Figure 10). Figure 10 shows a three-dimensional plot of Factors A and B, with the assay response surface on the left and the degradate 1 response surface on the right; note that the contour plot is projected at the bottom of the graphic. Three-dimensional plots are ideal for showing the process shape; contour plots, however, are more useful for determining or displaying acceptable operating ranges for process parameters.

Figure 10: Three-dimensional surface plot for assay and degradate 1 for Factors A and B.

When there is more than one quality characteristic in the design space, the use of overlay plots is helpful. Question 21 and Figure 8 in Part II of this article series provide an example of an overlay plot (2).

Multiple response optimization techniques can also be used to construct a design space for multiple independent or nearly independent responses. Each response is modeled separately and the predictions from the models are used to create an index (called a desirability function) that indicates whether the responses are within their required bounds. This index is formed by creating functions for each response that indicate whether the response should be maximized, minimized, or be near a target value. The individual response functions are combined into an overall index usually using the geometric mean of the individual response functions.

Figure 11 is a desirability contour plot in which both the assay and degradate 1 best meet their specifications. If the desirability index is near one, then both responses are well within their requirements. If the desirability is near or equal to zero, one or both responses are outside of their requirements. The most desirable simultaneous regions for these responses are the upper right and lower left. The desirability index, however, combines the responses into a single number that may hide some of the information that could be gained from looking at each response separately.

Figure 11: Desirability contours for assay and degradate 1 by Factors A and B (multiple response optimization).
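The desirability index described above can be sketched in a few lines of Python. The example uses one-sided linear desirability functions (one common choice) combined through the geometric mean. The function names and the assay limits of 95% and 100% are hypothetical; only the 1.00% upper limit for the degradate comes from this article, and the article does not specify the exact functional form behind Figure 11.

```python
import math


def desirability_maximize(y, low, high, weight=1.0):
    """0 at or below `low`, 1 at or above `high`, rising in between."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** weight


def desirability_minimize(y, low, high, weight=1.0):
    """1 at or below `low`, 0 at or above `high`, falling in between."""
    if y <= low:
        return 1.0
    if y >= high:
        return 0.0
    return ((high - y) / (high - low)) ** weight


def overall_desirability(individual):
    """Geometric mean of the individual desirabilities."""
    if any(d == 0.0 for d in individual):
        return 0.0  # any response outside its requirement gives zero
    return math.prod(individual) ** (1.0 / len(individual))


# Hypothetical predictions at one setting of Factors A and B
d_assay = desirability_maximize(97.5, low=95.0, high=100.0)   # assumed limits
d_degradate = desirability_minimize(0.50, low=0.0, high=1.00)  # 1.00% spec
print(overall_desirability([d_assay, d_degradate]))
```

Because the geometric mean is zero whenever any individual desirability is zero, a single response outside its requirement drives the overall index to zero, which matches the interpretation given for Figure 11.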

Q24: Why are some design spaces multidimensional rectangles and others not?

A: The shape of the design space depends on the relationships between the factors and the responses as well as any physical or computational restrictions. Historically, the proven acceptable region has been best understood as a rectangle. For example, in Figure 12, a rectangular design space is presented two ways, in blue and black. This concept can be extended to more than two dimensions, creating a multidimensional rectangle.

Figure 12: Overlay plot displaying functional and rectangular design spaces (Factors A, B).

In most cases, some feasible areas of operation will be excluded if the design space is specified by several ranges for individual factors. This approach will result in a multidimensional rectangle that will not be equal to the non-rectangular design space. Defining the design space as functional equations with restrictions as shown in Table III will enable one to take advantage of the largest possible design space region. The yellow region in Figure 12 is the space defined by equations with restrictions or specifications on the quality characteristics. The blue and black rectangles can be used as design space representations but neither one provides the largest design space possible. For ease-of-use in manufacturing, it may be practical to use a multidimensional rectangle within the design space as an operating region.
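To illustrate why a rectangle excludes feasible operating area, the sketch below compares a design space defined by an equation with a rectangle inscribed in it, using a grid over the coded region from -1 to +1. The two-factor degradate model and the rectangle limits are hypothetical and are not the models in Table III; only the 1.00% specification is taken from this article.

```python
SPEC_UPPER = 1.00  # degradate specification (%)


def predicted_degradate(a, b):
    # Hypothetical two-factor model in coded units (not from Table III)
    return 0.72 + 0.30 * a + 0.57 * b


def in_functional_space(a, b):
    """Design space defined by the model equation and the specification."""
    return predicted_degradate(a, b) <= SPEC_UPPER


def in_rectangle(a, b, a_range, b_range):
    """Design space defined by independent ranges for each factor."""
    return a_range[0] <= a <= a_range[1] and b_range[0] <= b <= b_range[1]


def grid(n=81, low=-1.0, high=1.0):
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


points = [(a, b) for a in grid() for b in grid()]
a_range, b_range = (-1.0, 0.0), (-1.0, 0.45)  # hypothetical rectangle

functional = [p for p in points if in_functional_space(*p)]
rectangle = [p for p in points if in_rectangle(*p, a_range, b_range)]

# The rectangle is a valid representation only if every point in it
# also satisfies the functional definition.
rectangle_is_valid = all(in_functional_space(*p) for p in rectangle)

print("fraction of region in functional design space:",
      round(len(functional) / len(points), 3))
print("fraction of region in rectangle:",
      round(len(rectangle) / len(points), 3))
print("rectangle lies inside functional space:", rectangle_is_valid)
```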

Evaluating a design space


This section addresses variability within the design space and the implications on the probability of passing specifications, in particular, when the process operates toward the edge of the design space. Alternative methods to specify the design space to account for this variability are discussed using the example data provided in previous sections of this article (1, 2). Suggestions on the placement of the normal operating region (NOR) and confirmation of design space are provided as well.

Q25: How do I take into account uncertainty in defining the design space for one factor and one response?

A: The regression line (shown in Figure 13) provides an estimate of the average degradate level. When Factor B is equal to 0.49, the regression line indicates that, on average, degradate will be 1.00%. This means that when operating at 0.49, there is roughly a 50% chance of observing degradate levels failing the specification by exceeding the upper limit of 1.00%. If one were to define the design space as the entire green region below 0.49 in Figure 13, it would come with a significant risk of failing specification at the edge, and thus not provide assurance of quality.

Figure 13: Empirical model of degradate (Y) on Factor B. Solid line represents Degradate = 0.72 + 0.57 × Factor B. Dotted line represents the upper statistical limit.

Figure 13 also illustrates an approach that makes use of a statistical interval to protect against uncertainty. In the region described by the striped rectangle in Figure 13, the probability of passing the specification increases from 50%, at B = 0.49, to a higher probability, at B = –0.08. Thus, a range for Factor B that protects against the uncertainty and provides higher assurance of the degradate being less than or equal to 1.00% is from –1 to –0.08. This range corresponds to the solid green region in Figure 13. The width of the interval and the increased probability will change based on the interval that is selected.

There are multiple ways to establish intervals that can be calculated to protect against uncertainty. Question 26b provides more details on this increased assurance of quality.
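The roughly 50% chance of failure at the edge can be reproduced with a short calculation. The sketch below uses the regression line from the Figure 13 caption and treats batch-to-batch results as normally distributed around it. The residual standard deviation of 0.15% is a hypothetical value chosen for illustration, and the model coefficients are treated as exactly known, which ignores the estimation uncertainty that a statistical interval would account for.

```python
from statistics import NormalDist

INTERCEPT, SLOPE = 0.72, 0.57  # regression line from the Figure 13 caption
SPEC_UPPER = 1.00              # degradate specification (%)
ASSUMED_SD = 0.15              # hypothetical residual SD, not from the article


def mean_degradate(b):
    """Average degradate predicted by the regression line."""
    return INTERCEPT + SLOPE * b


def prob_pass(b, sd=ASSUMED_SD):
    """Probability that a single result is at or below the specification."""
    return NormalDist(mu=mean_degradate(b), sigma=sd).cdf(SPEC_UPPER)


# Factor B setting at which the mean prediction equals the specification
edge_b = (SPEC_UPPER - INTERCEPT) / SLOPE

for b in (-1.0, -0.08, edge_b):
    print(f"B = {b:6.2f}  mean = {mean_degradate(b):.3f}  "
          f"P(pass) = {prob_pass(b):.3f}")
```

Whatever standard deviation is assumed, the probability of passing at the setting where the mean equals the specification is one half, which is why a boundary based on the mean alone does not provide assurance of quality.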

Q26a: How much confidence do I have that a batch will meet specification acceptance criteria?

A: A design space is determined based on the knowledge gained throughout the development process. The goal of defining a design space is to demonstrate a region for the relevant parameters such that all specification acceptance limits are met. The size of the design space is represented by a region that is defined by parameter boundaries. Those parameter boundaries are determined by results of multifactor experiments that demonstrate where a process can operate to generate an acceptable quality drug product. Parameter boundaries of a design space do not necessarily represent the edge of failure (i.e., failure to meet specification limits). Frequently, those boundaries simply reflect the region that has been systematically evaluated.

Different approaches may be used to define the size of the design space. The approaches are based on the type of statistical design that was used; the accuracy of the model used to define the operating ranges; whether the boundaries represent limits or edges of failure; and the magnitude of other sources of variability such as analytical variance. Each approach provides a certain level of confidence that future batches will achieve acceptable quality or a certain level of risk that future batches will not meet acceptable quality. The approaches include using a statistical model based on regression, using an interval-based approach, or using mechanistic models.

Q26b: Which statistical approaches ensure higher confidence?

A: Because of the inherent variability in the process and in the analytical testing, the boundary of the mean predicted response region also has inherent variability and thus is probably not exactly equal to the underlying true mean value of the responses. Building a buffer into the system may provide increased confidence in reliably meeting the specification. Two types of intervals can provide a buffer: a statistical interval around the normal operating range, and an interval on the edge of the design space. Figure 13 illustrates a design space for Factor B and its relation to the degradate specification of 1.00%. At the 0.50 value for Factor B, three of the six observations (50%) are outside of the specification. Using an interval approach and moving the upper boundary of the operating region from 0.5 to –0.08 provides more assurance that responses from future batches will meet the specification. Figures 12 and 14 illustrate this same principle for a two-parameter design space. The yellow region in Figure 12 shows the design space based on the predicted value that provides about a 50% probability of passing the specification if the factors are set at the boundary of the design space. Figure 14 shows a yellow region based on an interval that, although smaller in size, increases the probability that batches manufactured within the region will meet specifications. Quantification of these probabilities is helpful in determining where to operate the process.

Figure 14: Two-factor (A, B) design space with confidence of passing specification.

As discussed in Question 19 and shown in Figures 13 and 14, an interval can be developed to quantify the level of assurance of passing specifications at the boundary of the design space, and can also quantify assurance throughout the design space, not just at the boundary. Possible intervals include prediction, tolerance, and Bayesian intervals.

  • A prediction interval captures a specified proportion of future batches.

  • A tolerance interval can be developed to include a specific proportion of batches with a specific confidence level. This approach may be more restrictive than a prediction interval because it will most likely ensure greater confidence.

  • A Bayesian interval accounts for uncertainty in estimating the model parameters and calculates a probability of meeting all specifications simultaneously. Figure 15 shows the Bayesian probability contour plot for the two-factor example.

Figure 15: Two-factor design space based on Bayesian contours for passing specification.
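As one hedged illustration of the interval approach, the sketch below fits a straight line to hypothetical one-factor data and compares the design space edge based on the mean prediction with the edge based on a one-sided 95% upper prediction bound for a single future batch. The data, the function names, and the choice of a prediction bound (rather than a tolerance or Bayesian interval) are assumptions made for illustration. The Student t quantiles are hard-coded from standard tables because the Python standard library does not provide them.

```python
import math

# One-sided 95% Student t quantiles from standard tables, keyed by
# residual degrees of freedom
T_95 = {4: 2.132, 5: 2.015, 6: 1.943, 8: 1.860, 10: 1.812}

SPEC_UPPER = 1.00  # degradate specification (%)


def fit_line(x, y):
    """Least-squares line; returns (intercept, slope, s, x_bar, sxx)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard deviation (RMSE)
    return intercept, slope, s, x_bar, sxx


def upper_prediction_bound(x0, fit, n):
    """One-sided 95% upper prediction bound for one future result at x0."""
    intercept, slope, s, x_bar, sxx = fit
    t = T_95[n - 2]
    half_width = t * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return intercept + slope * x0 + half_width


# Hypothetical degradate data (%) at coded levels of Factor B
x = [-1.0, -1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 1.0]
y = [0.18, 0.12, 0.46, 0.70, 0.75, 1.02, 1.27, 1.31]
fit = fit_line(x, y)
n = len(x)

# Scan coded settings for the largest B that still meets the specification
grid = [i / 100 for i in range(-100, 101)]
mean_edge = max(b for b in grid if fit[0] + fit[1] * b <= SPEC_UPPER)
interval_edge = max(b for b in grid
                    if upper_prediction_bound(b, fit, n) <= SPEC_UPPER)

print("edge based on mean prediction:", mean_edge)
print("edge based on upper prediction bound:", interval_edge)
```

The interval-based edge sits inside the mean-based edge, which is the buffer effect described in this answer. Fewer runs or noisier data would widen the bound and pull the edge further in.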

These intervals can be thought of as providing some "buffer" around the region that is based on mean results. Although an interval will decrease the size of the acceptable mean response region, there is greater confidence that future batches within the reduced region would meet the specifications (providing the assurance required for a design space), especially if the mean response is closer to the specification. Use of intervals may not reduce the acceptable mean response region on all boundaries. There may be cases, for example, in which the responses within the experimental region are not close to their specifications. In such cases, the predicted response at the boundary of the region where the experiments were conducted may be well within specification, and using the predicted response may be appropriate. For example, if the degradate in the example were never greater than 0.3% and the specification were 1.0%, then the use of intervals would be unlikely to reduce the region.

The use of an interval approach is a risk-based decision. Proper specification setting and use of control strategies should also be used to increase confidence in the design space and the strategy employed should fit the entire quality system.

Q26c: Are there any other considerations when using an interval approach?

A: The interval approach incorporates the uncertainty in the measurements and the number of batches used in the experiments. As discussed in Question 7 (see Part I of this article series (1)), increasing the number of batches through additional design points or replicating the same design point increases the power of the analysis. If a small experiment is used to define the acceptable mean response region, then the predicted values will not reflect the small sample size. However, the interval approach does reflect the small sample size, resulting in a smaller region. In a similar way, if the variation in the data is large, then the interval approach will reduce the size of the region. A large difference between a region based on the acceptable mean response and one based on intervals indicates uncertainty in the mean response region. Often, an assumption when performing the statistical analysis is that the variability of the response is similar throughout the experimental region. In the situation where this assumption is not true, replication of batches near the boundary of the design space may be needed to increase the confidence that future batches will meet specifications, and the variability may need to be explicitly modeled. Prior knowledge from other products or other scales may be incorporated into the estimates to increase confidence.

Q27: Where should the NOR be inside the design space? How close can the NOR be to the edge of design space?

A: Once the design space is established, there is often a desire to map out the area where one would operate routinely. Typically, the NOR is based on target settings considering variability in process parameters. The target settings could be based on the optimality of quality, yield, throughput, cycle-time, cost, and so forth. The NOR around this target could be set as a function of equipment and control-systems variability. However, how close the NOR can be to the edge of the design space depends on how the design space is developed. For example:

  • If the design space is constructed based on the predicted values, then historically, the normal operating range is developed as a small interval around a set point. This NOR can move throughout the design space, but should include a buffer to keep the NOR from the edge and allow the NOR to be sufficiently within the design space. The sources of variability to consider in developing the buffer between the NOR and the design space edge are: variability associated with inputs such as raw/starting materials; variability in process controls, including set point tolerances and set point drifts; any operator-to-operator variability; measurement error (whether these are at-line or off-line); and any error associated with the modeling of the surface (e.g., amount of data, the factors and levels chosen, scale-up uncertainty).

  • If the design space description includes an interval-based approach that points to an area of higher assurance, then, depending on the data, there may be no buffer between the NOR and the interval boundary. The interval boundary may need to be updated as more manufacturing data become available. The design space boundary may stay constant. In any case, every company should have sound quality systems in place to ensure appropriate oversight on any changes to NORs.
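For the first case above, one simple way to size a buffer is to combine the listed sources of variability and keep the set point several combined standard deviations away from the design space edge. The sketch below does this with a root-sum-of-squares combination. It assumes the sources are independent and expressed in the same coded units as the factor; the component values, the multiplier of 3, and the function names are hypothetical and are not recommendations from this article.

```python
import math


def combined_sd(component_sds):
    """Root-sum-of-squares of independent standard deviations."""
    return math.sqrt(sum(sd ** 2 for sd in component_sds))


def set_point_limits(edge_low, edge_high, component_sds, k=3.0):
    """Range of set points that stays k combined SDs inside both edges."""
    buffer = k * combined_sd(component_sds)
    low, high = edge_low + buffer, edge_high - buffer
    if low > high:
        raise ValueError("buffer leaves no room for a normal operating range")
    return low, high


# Hypothetical SDs in coded units of Factor B: set point tolerance,
# set point drift, and operator-to-operator variability
components = [0.05, 0.03, 0.02]

# Design space for Factor B based on predicted values (-1 to 0.49)
low, high = set_point_limits(-1.0, 0.49, components)
print(f"combined SD = {combined_sd(components):.3f}")
print(f"set points between {low:.2f} and {high:.2f} keep the buffer")
```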

Q28: I didn't run experiments along my design space boundary. How do I demonstrate that the boundary is acceptable?

A: The design space will only be as good as the mathematical or scientific models used to develop the design space. These models can be used to produce predictions with uncertainty bands at points of interest along the edge of the design space, which is contained within the experimental region. If these values are well within the specifications and there is significant process understanding in the models, then the prediction may be sufficient.

Q29: If I use production-size batches to confirm my design space, how should I choose the number of batches to run, and what strategy should I apply to select the best points?

A: There is no single recipe for choosing the points to run in order to verify a design space that was developed subscale. Several options are provided in the answer to Question 17 (see Part II of this article series (2)). Briefly, using either mechanistic or empirical models along with performing replicates could provide some idea of the average response along with an estimate of the magnitude of the variability. Alternatively, using existing models and running a few points at the most extreme predicted values may be a reasonable approach if the design space truly provides assurance that the critical quality attribute requirements will be met. Finally, a highly fractionated factorial (supersaturated) experiment of production-size batches matched to the subscale batches is another way to confirm the design space.

* For continuity throughout the series, Figures and Tables are numbered in succession. Figure 1 and Table I appeared in Part I of this article series. Figures 2–8 and Table II appeared in Part II.

** "Factor" is synonymous with "x," input, variable. A process parameter can be a factor as can an input material. For simplicity and consistency, "factor" is used throughout the paper. "Response" is synonymous with "y" and output. Here, "response" is either the critical quality attribute (CQA) or the surrogate for the CQA. For consistency, "response" is used throughout the paper.

See Part I of this article series

Acknowledgments: The authors wish to thank Raymond Buck, statistical consultant; Rick Burdick, Amgen; Dave Christopher, Schering-Plough; Peter Lindskoug, AstraZeneca; Tim Schofield and Greg Stockdale, GSK; and Ed Warner, Schering-Plough, for their advice and assistance with this article.

Stan Altan is a senior research fellow at Johnson & Johnson Pharmaceutical R&D in Raritan, NJ. James Bergum is associate director of nonclinical biostatistics at Bristol-Myers Squibb Company in New Brunswick, NJ. Lori Pfahler is associate director, and Edith Senderak is associate director, scientific staff, both at Merck and Co. in West Point, PA. Shanthi Sethuraman is director of chemical product R&D at Lilly Research Laboratories in Indianapolis. Kim Erland Vukovinsky* is director of nonclinical statistics at Pfizer, MS 8200-3150, Eastern Point Rd., Groton, CT 06340, tel. 860.715.0916. At the time of this writing, all authors were members of the Pharmaceutical Research and Manufacturers of America (PhRMA) Chemistry, Manufacturing, and Controls Statistics Experts Team (SET).

*To whom all correspondence should be addressed.

Submitted: Jan. 12, 2010. Accepted: Jan. 27, 2010.


1. S. Altan et al., Pharm. Technol. Part I, 34 (7) 66–70 (2010).

2. S. Altan et al., Pharm. Technol. Part II, 34 (8) 52–60 (2010).

Additional reading

1. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use, Q8(R1), Pharmaceutical Development, Step 5, November 2005 (core) and Annex to the Core Guideline, Step 5, November 2008.

2. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use, Q9, Quality Risk Management, Step 4, November 2005.

3. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use, Q10, Pharmaceutical Quality System, Step 5, June 2008.

4. A Posterior Predictive Approach to Multiple Response Surface Optimization, John Peterson, 2004.

5. Potter C., et al., A Guide to EFPIA's Mock P.2 Document, Pharm. Technol., 2006.

6. Glodek, M., Liebowitz, S, McCarthy, R., McNally, G., Oksanen, C., Schultz, T., Sundararajan, M., Vorkapich, R., Vukovinsky, K., Watts, C., and Millili, G. Process Robustness: A PQRI White Paper, Pharmaceutical Engineering, November/December 2006.

7. Box, G.E.P, W.G. Hunter, and J.S. Hunter (1978). Statistics for Experimenters: An Introduction to Design, Analysis and Model Building. John Wiley and Sons.

8. Montgomery, D.C. (2001). Design and Analysis of Experiments. John Wiley and Sons.

9. Box, G.E.P., and N.R. Draper (1969). Evolutionary Operation: A Statistical Method for Process Improvement. John Wiley and Sons.

10. Cox, D.R. (1992). Planning of Experiments. John Wiley and Sons.

11. Cornell, J. (2002). Experiments with Mixtures: Designs, Models, and the Analysis of Mixture Data, 3rd Edition. John Wiley and Sons.

12. Duncan, A.J. (1974). Quality Control and Industrial Statistics, Richard D. Irwin, Inc., Homewood, IL.

13. Myers, R.H. and Montgomery, D.C. (2002). Response Surface Methodology: Process and Product Optimization Using Designed Experiments. John Wiley and Sons.

14. Montgomery, D.C. (2001). Introduction to Statistical Quality Control, 4th Edition. John Wiley and Sons.

15. del Castillo, E. (2007). Process Optimization: A Statistical Approach. Springer, New York.

16. Khuri, A. and Cornell, J.A. (1996). Response Surfaces, 2nd Edition, Marcel Dekker, New York.

17. MacGregor, J. F. and Bruwer, M-J. (2008). "A Framework for the Development of Design and Control Spaces", Journal of Pharmaceutical Innovation, 3, 15-22.

18. Miró-Quesada, G., del Castillo, E., and Peterson, J.J. (2004). "A Bayesian Approach for Multiple Response Surface Optimization in the Presence of Noise Variables", Journal of Applied Statistics, 31, 251-270.

19. Peterson, J. J. (2004). "A Posterior Predictive Approach to Multiple Response Surface Optimization", Journal of Quality Technology, 36, 139-153.

20. Peterson, J. J. (2008). "A Bayesian Approach to the ICH Q8 Definition of Design Space", Journal of Biopharmaceutical Statistics, 18, 958-974.