Looking ahead, embedding data-based process descriptors within monitoring and optimization algorithms will make it viable
to move toward processes that inherently adopt a quality by design (QbD) concept. The real challenge in using statistical
tools to extract process descriptors is that, in the development laboratory, data are available only from a limited set of
experiments. Statistical awareness in assessing the experimental results is therefore crucial to guide the development program.
In production, there is a need to ensure consistency of processes and provide tools that give an early warning of operational
and process deviations to allow timely corrective action. The knowledge extraction philosophy for production differs
from that for development. In production, the spread of data coverage is limited, and the objective is to look for occasional
deviations from predominantly "consistent" behavior. In development, the spread of data is significantly larger, and the
objective is to search for robust and effective design and operational areas. Consequently, the tools required for turning data into
knowledge demand different capabilities, and the implementation of different fundamental algorithms is often a prerequisite.
The knowledge extraction "toolbox" must, therefore, possess multiple capabilities.
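
The production objective described above, looking for occasional departures from predominantly consistent behavior, can be
illustrated with a minimal sketch. It assumes a single process variable and simple limits of the mean plus or minus three
standard deviations; the readings, the function names, and the choice of three standard deviations are all hypothetical and
are not taken from the article. A real monitoring scheme would normally be multivariate.

```python
from statistics import mean, stdev


def control_limits(reference, k=3.0):
    """Return (lower, upper) limits as mean +/- k sample standard deviations."""
    centre = mean(reference)
    spread = stdev(reference)
    return centre - k * spread, centre + k * spread


def flag_deviations(values, limits):
    """Return the indices of values that fall outside the limits."""
    lower, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical readings of one variable from batches judged "consistent"
reference = [7.02, 6.98, 7.01, 6.99, 7.00, 7.03, 6.97, 7.00]
limits = control_limits(reference)

# Hypothetical new readings; any index returned is an early warning
new_readings = [7.01, 6.99, 7.15, 7.00]
alarms = flag_deviations(new_readings, limits)
```

The limits are derived only from reference data considered normal, which reflects the narrow data coverage typical of
production; the same approach would be of little use in development, where the data are deliberately spread widely.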
The methods available for data analysis can address a significant proportion of industrial problems (2). A common strategy
for data analysis is compression, the goal of which is twofold: first, to describe the variation in the data more effectively
from a statistical perspective and, second, to enable the graphical representation of the data in the compressed
subspace, which allows the analyst to interrogate the data. The idea here is that patterns inherent within 1000 variables,
for example, cannot be easily visualized individually, but using multivariate statistical projection-based techniques, the
problem can be reduced to a limited number of variation indicators. This enables the visualization of the latent patterns.
A multivariate statistical method that has been widely applied is principal component analysis. This technique has been extended
to take account of both batch and nonlinear behavior. Other visualization strategies to support the compression methods include
parallel coordinates plots (3).
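
The compression idea can be made concrete with a small sketch of principal component analysis. It is a simplified
illustration under stated assumptions: the data are hypothetical, only the first component is extracted, and power iteration
stands in for the full eigen-decomposition or singular value decomposition that a statistics library would normally use.

```python
import math


def mean_centre(rows):
    """Subtract each column mean so variation is measured about the average."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centred):
    """Sample covariance matrix of mean-centred rows."""
    n, p = len(centred), len(centred[0])
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_loading(cov, iterations=200):
    """Dominant eigenvector of cov (first loading vector) by power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical data: five observations of three strongly correlated variables
data = [
    [1.0, 2.0, 3.1],
    [2.0, 4.1, 5.9],
    [3.0, 5.9, 9.2],
    [4.0, 8.0, 11.8],
    [5.0, 10.1, 15.0],
]
centred = mean_centre(data)
cov = covariance(centred)
loading = first_loading(cov)

# Scores: each observation compressed to one "variation indicator"
scores = [sum(x * l for x, l in zip(row, loading)) for row in centred]

# Fraction of the total variance captured by the first component
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = sum(s * s for s in scores) / (len(scores) - 1) / total_variance
```

Plotting the scores instead of the original variables is what makes latent patterns visible. In practice, variables
measured in different units are usually also scaled to unit variance before compression, a step omitted here for brevity.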
There is a further challenge related to combining data from multiple sources logged at different frequencies. This raises
data pretreatment issues that must be carefully addressed if original data features are not to be masked or lost. In the production
environment, where deviations from consistent behavior are sought, these compression methods allow changes in the original
process variables to be interrogated and understood.
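
The pretreatment issue raised by data logged at different frequencies can be sketched as follows. The example assumes two
hypothetical signals, a temperature logged every minute and an offline assay reported every three minutes, and pairs each
fast sample with the most recent slow value. This "hold the last value" rule is only one possible choice and is an
assumption of the sketch, not a method prescribed by the article.

```python
from bisect import bisect_right


def last_value_at(times, values, t):
    """Most recent logged value at or before time t (None if none exists yet)."""
    i = bisect_right(times, t)
    return values[i - 1] if i > 0 else None


def align(fast_times, fast_values, slow_times, slow_values):
    """Pair each fast-rate sample with the latest available slow-rate value."""
    return [(t, v, last_value_at(slow_times, slow_values, t))
            for t, v in zip(fast_times, fast_values)]


# Hypothetical signals; times are in minutes and must be sorted
temp_t = [0, 1, 2, 3, 4, 5, 6]
temp_v = [36.9, 37.0, 37.1, 37.0, 36.8, 37.0, 37.2]
assay_t = [0, 3, 6]
assay_v = [1.2, 1.9, 2.7]

aligned = align(temp_t, temp_v, assay_t, assay_v)
```

The alternative of averaging the fast signal down to the slow rate would discard short-lived excursions, which is exactly
the kind of masking of original data features that the pretreatment step needs to avoid.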
In the development environment, further analysis is required to understand the behavior of the process in the operating space.
Data compression is the first step in constructing data-based process models, which leads to the application of methods such as
partial least squares (PLS). Variants are also required to capture both batch and nonlinear behavior. These models of process behavior can
then be used in knowledge-based experimental design strategies to design the most informative experiments.
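
To show how compression leads into model construction, the following is a minimal sketch of partial least squares with a
single response and a single latent component. The data, function names, and the restriction to one component are
assumptions made for illustration; batch and nonlinear variants are not covered.

```python
import math


def fit_pls1(X, y):
    """Fit a one-component PLS model for a single response variable."""
    n = len(X)
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    Xc = [[x - m for x, m in zip(row, x_mean)] for row in X]
    yc = [v - y_mean for v in y]

    # Weight vector: the direction in X with the greatest covariance with y
    w = [sum(row[j] * yv for row, yv in zip(Xc, yc)) for j in range(len(x_mean))]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]

    # Scores (compressed X) and the regression of y on those scores
    t = [sum(a * b for a, b in zip(row, w)) for row in Xc]
    q = sum(a * b for a, b in zip(yc, t)) / sum(a * a for a in t)
    return x_mean, y_mean, w, q


def predict(model, x):
    """Predict the response for one new observation x."""
    x_mean, y_mean, w, q = model
    score = sum((a - m) * b for a, m, b in zip(x, x_mean, w))
    return y_mean + q * score


# Hypothetical data: two correlated process variables and one quality attribute
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
y = [3.0, 6.0, 9.0, 12.0]

model = fit_pls1(X, y)
prediction = predict(model, [5.0, 10.0])
```

Unlike principal component analysis, the compression direction here is chosen for its covariance with the response, which
is why such models are suited to exploring how operating conditions relate to process outcomes.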
Considering the solutions required, it might first be thought that "off-the-shelf" data analysis products can provide the
answers. There are, however, many hurdles to overcome between identifying problems and implementing solutions. These range
from data availability and configuration (which can typically take 33% of the overall job time) to tool identification,
where combined technical competencies in statistics, engineering, and biochemistry are necessary. There are also problems
that require new fundamental data analysis approaches and, in particular, the area of process development is one where real
benefits can be derived through their implementation. More efficient use of data and its appropriate analysis can help to
reduce development times. Although this is an important business opportunity, the impact of small data sets raises further challenges.
Extracting knowledge across products is one way of supplementing the limited information during early-stage development, but,
again, the data-analysis challenges are significant. One possible solution is to combine data-based analysis with other information
sources, making hybrid information structures increasingly important.
For most applications, the tools to deliver a solution are available, but there is a vast difference between having a set
of tools and knowing how to use them effectively. The overall strategy adopted, the selection of appropriate methods, and the
decisions required to progress from recognizing a process problem to quickly and effectively attaining a solution come only
with experience. Those who believe that the purchase of a data-analysis package is a panacea for all problems need to be