OR WAIT null SECS
The authors present a simple-to-use Microsoft Excel-based statistical tool that uses cumulative sum techniques to aid retrospective understanding of data trends.
Submitted: August 21, 2019; Accepted: November 21, 2019
Process variability must be assessed over time to ensure that pharmaceutical product quality is maintained. While the application of prospective statistical process control has been widely published, much less emphasis has been given to retrospective statistical analysis and associated methods and approaches. The authors present a simple-to-use Microsoft Excel-based statistical tool that uses cumulative sum techniques to aid retrospective understanding of data trends. Practical recommendations and experience of applying the tool from a pharmaceutical manufacturing context are also provided, including the teamwork needed to fully exploit these approaches through combining input from multiple disciplines.
In pharmaceutical development and manufacturing, scientists are often involved in assessing process variability over time. For example, having developed a new analytical method, the analyst needs to review data to ensure that the method continues to perform adequately over time; or scientists responsible for manufacturing processes need to demonstrate that product quality is maintained and that there is no evidence of systematic or special cause variability in processes. It can be misleading to apply traditional prospective techniques when analyzing data retrospectively (see below), and a useful alternative is cumulative sum techniques.
The tool presented in this paper incorporates such cumulative sum techniques and furthermore uses the Autocorners algorithm to automatically identify statistical changes in the average over time, which can aid problem-solving and process improvement.
The techniques and approaches described in this paper apply to any process where data are gathered in time-ordered sequence and one wishes to understand the process to ensure statistical control and variability within acceptable limits. For many processes, Shewhart control charts work well for prospective monitoring. The implementation of effective Shewhart charts involves a set-up phase prior to the subsequent run phase. The set-up phase includes capturing data over a pre-defined period of time exhibiting statistical control (absent of special cause variability) and calculating control limits based on the mean and standard deviation. These summary statistics are then applied to calculate control limits to be used subsequently in the run phase. Often, analysts want to review data retrospectively (e.g., in manufacturing, analyze data from previous batches of product to assess process robustness or to identify a root cause of a problem). In these cases, the traditional Shewhart chart does not apply in the conventional way. In particular, care needs to be taken in constructing a Shewhart chart from historical data where the limits applied are based on the same data being analyzed. In this case, the limits might be derived from data exhibiting special cause variability, which would inflate the limits calculated. An effective alternative to Shewhart charts in the situation of retrospective data analysis is CuSum charts and the Autocorners algorithm described in this paper.
Cumulative sum (CuSum) charts and Autocorners can be applied both in prospective and retrospective situations. Many texts (1–7) describe the improved sensitivity in identifying small shifts in the average or variance when using CuSum techniques compared with Shewhart methods. Table I distinguishes the benefits of Shewhart charts for prospective monitoring versus CuSums/Autocorners analysis.
• Primary goal: Identifying
special cause variation at
the time it occurs to
the causes and hence
reduce the frequency in
• Ideal for prospective
• Suitable when process is
• Primary goal: Visualising,
reducing the frequency
of step changes (in mean
• Able to pick up subtle
step changes and able to
identify any step change
• Ideal for retrospective
• Flexible in dealing with
unstable processes with
frequent step changes (in
mean or variation).
In constructing a CuSum chart, the data are first suitably ordered (e.g., by manufacturing date) and the following steps applied.
Suppose there is a set of results in sequence, denoted by x1, x2, … xn. The differences of each result from a target or reference value, T, are calculated, so the ith difference = xi-T. The cumulative sum of these differences can be calculated as follows:
S2 = cumulative sum of the first two differences in series = (x1-T) + (x2-T) = x1 + x2 – 2T
S3 = x1 + x2 + x3 – 3T
Sr=(∑rk=1(Xk)) - rT
These cumulative sums, abbreviated to CuSums, are plotted in time sequence to produce a CuSum chart. In many situations, the target or reference value T can be set to equal the average of the raw data, and in this case, the last point in the CuSum chart equals zero.
The effect of the CuSum is to produce a smoother picture of changes in data over time-a shift up or down in the raw data will appear as a change in slope in the CuSum.
Having plotted the CuSum data as a chart, it may be useful to assess whether an observed change in slope demonstrates a genuine signal (special or assignable cause) or whether it represents typical noise in the process (common cause variation). Several approaches to making this assessment are described by Taylor et al. (8). The basis for finding significant changepoints programmed into the Manhattan tool is the automatic search algorithm, Autocorners, described by Woodward and Goldsmith (6).
The tool has been applied successfully in a variety of different pharmaceutical applications, including:
The Autocorners algorithm identifies where statistical changes in slope occur in the CuSum chart. These change points are highlighted in a plot of the original raw data by splitting the data into stages and plotting the average per stage.
In the retrospective analysis of process data, Woodward and Goldsmith describe several approaches that can be applied to decide whether a change in the slope of a CuSum chart is real or due to noise. These include two manual approaches (the “span” method and the decision interval method) and one automatic search by computer. It is this latter approach, called “Autocorners”, that has been programmed into the Manhattan tool. The data in Figures 1–3 are used to illustrate the approach described as follows:
In the above steps, numerous tests of differences between adjacent segments are made. The overall significance level (α) has been explored by varying between 1, 5, and 10%, so that the algorithm would find increasing numbers of corners with higher α. In the authors’ experience, applying a 1% significance level works well for most of the data sets, and this is used as the default setting in the tool. The overall adjustment is applied for multiple comparisons described in BS 5703: Part 2. This standard has been superseded by BS ISO 7870-4 (3); however, this more recent version does not include the multiple comparisons details; two alternative references (9,10) have been provided for readers without access to the original standard.
As explained in the references (4,9,10), for retrospective analysis using CuSum charts, the individual tests should have significance level α√m /2N, where m=length of segment, N=total number of observations. The t-test with an ad hoc adjustment has significance level given by Lewis et al. (9), α/(2*√m). Where multiple tests are carried out in a series of total length N, the adjustment described by Osanaive et al. (10) also applies: (m/N)α. This leads to a combined adjustment of α (m/N)/(2*√m) = α√m /2N.
Having identified points of change or corners in the raw data, then a convenient representation of the data is provided in Figure 3. The raw data are shown as green triangles and connected with a blue line. (In situations where an unusual value has been identified, as in step 1 described previously, then such a point is connected to its neighbors with a green rather than blue line, such as shown in Figure 4.) In Figures 3, 4, and 5, the black line shows the CuSum plot. The red line shows the means across the regions between the break points, and shifts in this line indicate where the significant change points occur. This plot is called the Manhattan plot as it can look like the skyline of Manhattan. The changepoints in the Manhattan plot coincide with where the slope changes in the CuSum plot.
The Autocorners functionality described in this paper has been written in to the Excel tool using VBA.
Data entry and tool output. Raw data are entered into the spreadsheet. There is some additional functionality that is available including the ability to refresh the spreadsheet by deleting existing data. The tool is flexible to handling different numbers of response variables.
When the analysis is run, for each variable, two new worksheets are produced:
The choice of whether to include the CuSum chart or not depends on the nature and amount of data entered. In the following example, the CuSum is particularly helpful to visualize alongside the Manhattan plot to help confirm the key change points in the data. When viewing the Manhattan plot, some pragmatic interpretation is required (e.g., it is crucial to take account of the scale of the y-axis and to avoid over-interpreting changes which are not of practical importance). If changes are made to the data entry sheet and the analysis re-run, then the new sheets produced will over-write the previous ones.
Example of application-capsule dissolution investigation. The dissolution test for this product involves taking a random sample of six capsules from a batch and submitting them to the dissolution test. The dissolution test gives the percent dissolved at 90, 300, and 480 minutes, from which the rate of change between 300 and 480 is calculated. The average dissolution rate for the six capsules across multiple time-ordered batches has been recorded and analyzed using this tool. See Figure 4 for an example of the graphical output with the following features:
There are some other changes that can be seen in Figure 4 between batches 40 and 63, and around batch 275; however, these are more subtle and not sustained over a long time.
Various other processing factors were also recorded for these batches. These were examined to assess their variability and to seek a possible link with dissolution. One such variable is spray time, which has been analyzed and presented in Figure 5. It is notable how similar the pattern in the CuSum is for spray time compared to dissolution. This provides a plausible hypothesis for a causal relationship which could be further investigated in a designed experiment.
From practical experience, the following recommendations are made to fully exploit the tool and approaches:
° A corner identified using the automated algorithm may represent an unimportant change, scientifically.
° While step-changes are common in manufacturing processes (e.g., due to switch of batch of raw material, change in operator, change to procedure), other types of change can occur (e.g., drift or cycling)-these may not adequately be picked up by this tool.
The tool presented in this paper provides an automated approach to retrospective data analysis and interpretation. The tool is user-friendly and accessible, only requiring the user to paste in data and click a button to run the analysis. The output is easy to interpret comprising a CuSum plot, identification of significant change points and subsequent visualization using the Manhattan plot, and summary statistics of each stage between adjacent change points.
Important areas of application include manufacturing and engineering where some insightful applications of the tool have been delivered, particularly regarding trends in critical quality attributes (responses) and relating these to in-process parameters.
Access to tool. The tool can be downloaded from Box here: http://goto.az/manhattan. (Click the “Download” button to open in Excel or save the file. As it opens in Excel, click the button “Enable Content” and “Enable Editing” to proceed. Data can be entered to the worksheet “Data for Monitoring” following the instructions in the Help tab.)
Editor’s Note: The link is provided by the author, and Pharmaceutical Technology does not assume any liability for the contents of the linked file.
The authors wish to thank Linda McKinnon for her instrumental delivery of Excel VBA code.
1. G.B. Wetherill and D.W. Brown, Statistical Process Control (Chapman & Hall: London, 1991).
2. D.C. Montgomery, Statistical Quality Control, Vol 7. (Wiley, New York 2009).
3. BS ISO 7870-4:2011 Control Charts–Part 4: Cumulative Sum Charts (British Standards Institution: London, 2011).
4. BSI, BS5703: Part 2 1980, Guide to Data Analysis and Quality Control Using CuSum Techniques–Part 2: Decision Rules and Statistical Tests for Cusum Charts and Tabulations (British Standards Institution, London, 1980–1982).
5. V.V. Koshti, “Cumulative Sum Control Chart,” International Journal of Physics and Mathematical Sciences ISSN: 2277-2111 (Online) (2011).
6. R.H. Woodward and P.L. Goldsmith, Cumulative Sum Techniques, ICI Monograph No. 3 in the ‘mathematical and statistical techniques for industry’ series (Edinburgh: Oliver and Boyd, 1964).
7. Colin D. Lewis, International Journal of Quality & Reliability Management, 14 (2), pp.160-175 (1997).
8. A.L. Taylor, et al., Pharmaceut. Statist., 1: 25-34 (2002).
9. C.D. Lewis and K.A. Yeomans, Journal of the Operational Research Society, 46 (12), pp. 1471-1480 (1995).
10. P.A. Osanaiye and C.O. Talabi, Journal of the Royal Statistical Society, Series D (The Statistician), 38 (4), pp. 251-257 (1989).
Vol. 44, No. 3
When referring to this article, please cite it as R. Shaw and M. South, "Implementation of Autocorners Algorithm for Retrospective Process Monitoring," Pharmaceutical Technology 44 (3) 2020.