Experimental design planning
Developing a design space requires substantial planning. A critical aspect of QbD is to understand which factors
affect the responses associated with the CQAs and to specify the operating regions that meet product requirements.
Data mining and risk assessment contribute to the selection of factors and responses and to the planning of a DoE. In the
following FAQ section, the authors explore the role of historical data, preliminary runs, DoE, responses, factors, and factor
ranges in developing a design space.
Q1a: What is the role of prior knowledge, such as historical data, in developing a design space?
A: A starting point for statistical experimental design is the review of available historical information. Historical data might
consist of information obtained from previously commercialized products and processes, from the literature, and from fundamental
scientific understanding. Possible sources of useful information include exploratory laboratory-trial data, analytical data,
stability data, batch-release data, regular production batches, and deviation data, such as factors that fell outside proven
acceptable ranges or operating ranges, or responses that failed to meet control limits or specifications.
Q1b: How and what type of information can be gleaned from historical data?
A: Historical data can provide valuable input to the planning of designed studies. As part of planning, it is critical to identify
appropriate responses (that is, those linked to the quality attributes) and the factors that could contribute to variability in
the identified quality attributes. Historical data can also reveal relationships between factors and responses that can be used
to set up experimental studies. For example, information from unit operations or on equipment capability could provide a starting
point for the factors and factor ranges to include in the designed studies. Although analysis of historical data may reveal
relationships, the structure of the data may affect the conclusions of the analysis. In the context of historical data, a
relationship can do no more than point to an association; it cannot be interpreted as causal. When the historical database is
extensive, a variety of tools can be applied to explore relationships, including Principal Component Analysis (PCA) and Partial
Least Squares (PLS).
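As a rough illustration of how PCA can expose structure in a historical data set, the sketch below autoscales a batch-by-variable
matrix and computes principal components with NumPy's singular value decomposition. The data are simulated and hypothetical:
temperature and pH are made to drift together, while feed rate varies independently. The function name `pca` and all variable
names are illustrative assumptions, not part of any particular software package; PLS, which also uses the responses, is not shown.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA of a batch-by-variable matrix via SVD (illustrative helper).

    Columns are autoscaled (mean-centred, unit variance) so that variables
    measured on different scales contribute comparably.
    Returns (scores, loadings, explained_variance_ratio).
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Z = U @ diag(S) @ Vt; rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components].T          # shape (n_variables, n_components)
    return scores, loadings, explained[:n_components]

# Hypothetical historical batches: temperature and pH drifted together
rng = np.random.default_rng(0)
n = 200
temperature = rng.normal(37.0, 0.5, n)
ph = 7.0 + 0.4 * (temperature - 37.0) + rng.normal(0.0, 0.02, n)
feed_rate = rng.normal(1.0, 0.1, n)          # independent of the others
X = np.column_stack([temperature, ph, feed_rate])

scores, loadings, ratio = pca(X)
print("explained variance ratio:", np.round(ratio, 2))
print("loadings (rows: temperature, pH, feed_rate):")
print(np.round(loadings, 2))
```

Under these assumptions the first component would be expected to carry roughly two-thirds of the variance, with temperature and pH
loading on it almost equally. That pattern flags the two factors as having moved together in the record; it does not say which of
them, if either, drives any response.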
Q1c: What problems can arise when using historical data?
A: Caution is needed when mining historical data, because historical data are restricted to the variation in factor levels that
happened to occur, rather than the deliberate, controlled variation imposed by a planned statistical design. Gaps in knowledge are
therefore likely. Problems can arise from the observational nature of the data, and no analysis can make up for deficiencies in
the structure and quality of the data. Below are a few critical issues that can arise when analyzing historical data.
- Multicollinearity: In observational data in which two or more factors are highly correlated, it is difficult to identify which
factor(s) affect the response. No analysis can remedy this problem.
- Missing factors: Important factors may not be recorded in the data. If the recorded factors are correlated with unrecorded
causal factors, an effect may be mistakenly attributed to the recorded factors. A common mistake is to attribute causality when
only associational relationships can be proposed until a confirmatory experiment is carried out.
- Missing data and imbalance: Historical data frequently have a relatively large proportion of missing information. Prediction
in regions where there is no information, or extrapolation to regions where no data exist, can be highly misleading.
- Precision of the data: Excessively rounded data can lead to misleading conclusions.
- Range of factor levels: Data spanning a wider range of factor levels are more likely to reveal a relationship. If the inputs
haven't been varied across an appropriate range, the relationship won't be detected.
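For the multicollinearity issue listed above, one common diagnostic is the variance inflation factor, VIF_j = 1 / (1 - R_j^2),
where R_j^2 comes from regressing factor j on the other factors. The sketch below computes it with NumPy least squares on
simulated, hypothetical batch data; the function name `variance_inflation_factors` is illustrative. A commonly cited rule of thumb
treats values above roughly 5 to 10 as a warning sign. A VIF only diagnoses the problem: it cannot separate the effects of factors
that always moved together.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return VIF_j = 1 / (1 - R_j^2) for each column j of X, where R_j^2 is
    from an ordinary least-squares fit of column j on the other columns
    plus an intercept (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

# Hypothetical historical batches: temperature and pH drifted together
rng = np.random.default_rng(0)
n = 200
temperature = rng.normal(37.0, 0.5, n)
ph = 7.0 + 0.4 * (temperature - 37.0) + rng.normal(0.0, 0.02, n)
feed_rate = rng.normal(1.0, 0.1, n)
X = np.column_stack([temperature, ph, feed_rate])

vifs = variance_inflation_factors(X)
print(dict(zip(["temperature", "pH", "feed_rate"], np.round(vifs, 1))))
```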
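The missing-factors issue listed above can be made concrete with a small simulation, sketched below under purely hypothetical
assumptions: an unrecorded factor (labelled here as raw-material quality) drives both a recorded process factor and the response,
while the recorded factor itself has no effect. A regression on the recorded factor alone still shows a strong slope. Once the
hidden factor is included, which is possible only in a simulation or when that factor is controlled in a designed experiment, the
apparent effect of the recorded factor largely disappears.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Hypothetical unrecorded causal factor (e.g. raw-material quality)
hidden = rng.normal(0.0, 1.0, n)
# Recorded factor that tracks the hidden one but has no effect of its own
recorded = 0.9 * hidden + rng.normal(0.0, 0.3, n)
# The response depends only on the hidden factor
response = 2.0 * hidden + rng.normal(0.0, 0.5, n)

# Naive analysis of the historical record: regress response on recorded factor
slope_naive = np.polyfit(recorded, response, 1)[0]

# Analysis that accounts for the hidden factor
A = np.column_stack([np.ones(n), recorded, hidden])
coef, *_ = np.linalg.lstsq(A, response, rcond=None)
slope_adjusted = coef[1]

print(f"apparent effect of recorded factor: {slope_naive:.2f}")
print(f"effect after including hidden factor: {slope_adjusted:.2f}")
```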
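The precision and range issues listed above both describe ways a real effect can go undetected. The sketch below illustrates each
with hypothetical numbers: a response that rises by 0.03 units per unit of a factor, an effect that vanishes when values are
recorded to whole units, and a simulated linear relationship whose correlation is strong over a wide factor range but weak when
the factor is held in a narrow band, as routine production often does.

```python
import numpy as np

# --- Precision: over-rounding hides a small effect ---
x = np.arange(5, dtype=float)                # factor levels 0, 1, 2, 3, 4
y_true = 10.0 + 0.03 * x                     # hypothetical true response
slope_2dp = np.polyfit(x, np.round(y_true, 2), 1)[0]   # recorded to 0.01
slope_int = np.polyfit(x, np.round(y_true, 0), 1)[0]   # recorded to whole units
print(f"slope, 2 decimals: {slope_2dp:.3f}; slope, whole units: {slope_int:.3f}")

# --- Range: a narrow operating band hides a real relationship ---
rng = np.random.default_rng(2)
n = 2000
f = rng.uniform(0.0, 10.0, n)                # factor varied widely (hypothetical)
r = 1.0 + 0.5 * f + rng.normal(0.0, 1.0, n)  # same linear relationship everywhere
r_wide = np.corrcoef(f, r)[0, 1]

band = (f > 4.5) & (f < 5.5)                 # narrow band, as in routine production
r_band = np.corrcoef(f[band], r[band])[0, 1]
print(f"correlation, wide range: {r_wide:.2f}; narrow band: {r_band:.2f}")
```

With these assumptions the whole-unit slope is essentially zero, and the narrow-band correlation is far weaker than the
wide-range one, even though the underlying relationship is identical in both cases.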
Relationships suggested by historical-data analysis should be confirmed by appropriate DoEs before a design space is defined or
expanded.
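As one hedged illustration of a confirmatory study, the sketch below lays out a two-level full factorial design with centre points
for factors whose ranges were suggested by historical data. The factor names, ranges, and the helper `full_factorial_2level` are
hypothetical; depending on the number of factors and the relationships to be confirmed, a fractional factorial or response-surface
design may be more appropriate.

```python
from itertools import product

def full_factorial_2level(factor_ranges, center_points=1):
    """Hypothetical helper: two-level full factorial design in actual units.

    factor_ranges: dict mapping factor name -> (low, high).
    Returns a list of runs (dicts); centre points are appended at the
    midpoint of every range.
    """
    names = list(factor_ranges)
    runs = []
    # Every combination of coded levels -1 (low) and +1 (high)
    for levels in product((-1, 1), repeat=len(names)):
        run = {}
        for name, coded in zip(names, levels):
            low, high = factor_ranges[name]
            mid, half = (low + high) / 2, (high - low) / 2
            run[name] = mid + coded * half
        runs.append(run)
    for _ in range(center_points):
        runs.append({name: sum(factor_ranges[name]) / 2 for name in names})
    return runs

design = full_factorial_2level(
    {"temperature": (36.0, 38.0), "pH": (6.8, 7.2)}, center_points=3
)
for run in design:
    print(run)
```

Centre points allow a check for curvature and provide an estimate of pure error. In practice, the run order would also be
randomized before execution.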