Implementation of Autocorners Algorithm for Retrospective Process Monitoring

Robert Shaw

Robert Shaw is principal scientist statistics, Oral Product Development, Operations, AstraZeneca Pharmaceuticals, Macclesfield, Cheshire, United Kingdom, SK102NA, Robert.Shaw@astrazeneca.com.

Marie South

Marie South is associate director, Discovery Sciences Statistics, AstraZeneca Pharmaceuticals, Alderley Park, Cheshire, United Kingdom, SK104TG (until 2015).

Pharmaceutical Technology, Pharmaceutical Technology-03-02-2020, Volume 44, Issue 3
Pages: 34–39

The authors present a simple-to-use Microsoft Excel-based statistical tool that uses cumulative sum techniques to aid retrospective understanding of data trends.

Peer-Reviewed

Submitted: August 21, 2019; Accepted: November 21, 2019

Abstract

Process variability must be assessed over time to ensure that pharmaceutical product quality is maintained. While the application of prospective statistical process control has been widely published, much less emphasis has been given to retrospective statistical analysis and its associated methods. The authors present a simple-to-use Microsoft Excel-based statistical tool that uses cumulative sum techniques to aid retrospective understanding of data trends. Practical recommendations and experience of applying the tool in a pharmaceutical manufacturing context are also provided, including the teamwork needed to fully exploit these approaches by combining input from multiple disciplines.

 

In pharmaceutical development and manufacturing, scientists are often involved in assessing process variability over time. For example, having developed a new analytical method, the analyst needs to review data to ensure that the method continues to perform adequately over time; similarly, scientists responsible for manufacturing processes need to demonstrate that product quality is maintained and that there is no evidence of systematic or special cause variability in processes. It can be misleading to apply traditional prospective techniques when analyzing data retrospectively (see below), and cumulative sum techniques are a useful alternative.

The tool presented in this paper incorporates such cumulative sum techniques and furthermore uses the Autocorners algorithm to automatically identify statistical changes in the average over time, which can aid problem-solving and process improvement.

Retrospective vs prospective monitoring

The techniques and approaches described in this paper apply to any process where data are gathered in a time-ordered sequence and one wishes to understand the process to ensure that it is in statistical control and that variability is within acceptable limits. For many processes, Shewhart control charts work well for prospective monitoring. Implementing an effective Shewhart chart involves a set-up phase followed by a run phase. In the set-up phase, data are captured over a pre-defined period during which the process exhibits statistical control (free of special cause variability), and the mean and standard deviation are calculated. These summary statistics are then used to calculate the control limits applied in the run phase. Often, however, analysts want to review data retrospectively (e.g., in manufacturing, to analyze data from previous batches of product to assess process robustness or to identify the root cause of a problem). In these cases, the traditional Shewhart chart does not apply in the conventional way. In particular, care needs to be taken in constructing a Shewhart chart from historical data when the limits are based on the same data being analyzed: the limits might then be derived from data exhibiting special cause variability, which would inflate them and make real changes harder to detect. An effective alternative to Shewhart charts for retrospective data analysis is the combination of CuSum charts and the Autocorners algorithm described in this paper.
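The inflation of limits described above can be illustrated with a small sketch. The data are made up, and the limits are assumed to be the mean plus or minus three sample standard deviations; the function name `shewhart_limits` and the factor `k` are illustrative and are not taken from the tool.

```python
from statistics import mean, stdev


def shewhart_limits(values, k=3.0):
    """Return (lower, centre, upper) as the mean -/+ k sample standard deviations."""
    centre = mean(values)
    spread = stdev(values)
    return centre - k * spread, centre, centre + k * spread


# Made-up data: a stable stretch followed by the same pattern shifted up by 4 units.
stable = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.0]
shifted = [value + 4.0 for value in stable]

limits_stable = shewhart_limits(stable)
limits_mixed = shewhart_limits(stable + shifted)

width_stable = limits_stable[2] - limits_stable[0]
width_mixed = limits_mixed[2] - limits_mixed[0]
print(f"limit width, stable data only:     {width_stable:.2f}")
print(f"limit width, step change included: {width_mixed:.2f}")
```

In this illustration the limits calculated from the mixed series are so wide that every point, including those after the step change, falls inside them, so the chart would give no signal.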

Cumulative sum (CuSum) charts and Autocorners can be applied both in prospective and retrospective situations.  Many texts (1–7) describe the improved sensitivity in identifying small shifts in the average or variance when using CuSum techniques compared with Shewhart methods. Table I distinguishes the benefits of Shewhart charts for prospective monitoring versus CuSums/Autocorners analysis.

Table I: Shewhart chart versus CuSum chart/Autocorners.

Shewhart chart

  • Primary goal: Identifying special cause variation at the time it occurs to facilitate understanding the causes and hence reduce the frequency in future.

  • Ideal for prospective analyses.

  • Suitable when process is predominantly stable.

CuSum chart/Autocorners

  • Primary goal: Visualizing, understanding, and reducing the frequency of step changes (in mean or variation).

  • Able to pick up subtle step changes and able to identify any step change more quickly.

  • Ideal for retrospective analyses.

  • Flexible in dealing with unstable processes with frequent step changes (in mean or variation).

Overview of CuSum chart construction

In constructing a CuSum chart, the data are first suitably ordered (e.g., by manufacturing date) and the following steps applied.

Suppose there is a set of results in sequence, denoted by $x_1, x_2, \ldots, x_n$. The differences of each result from a target or reference value, $T$, are calculated, so the $i$th difference is $x_i - T$. The cumulative sum of these differences can be calculated as follows:

$S_2$ = cumulative sum of the first two differences in the series = $(x_1 - T) + (x_2 - T) = x_1 + x_2 - 2T$

$S_3 = x_1 + x_2 + x_3 - 3T$

and, in general,

$$S_r = \sum_{k=1}^{r} x_k - rT$$

These cumulative sums, abbreviated to CuSums, are plotted in time sequence to produce a CuSum chart. In many situations, the target or reference value T can be set to equal the average of the raw data, and in this case, the last point in the CuSum chart equals zero.

The effect of the CuSum is to produce a smoother picture of changes in the data over time: a shift up or down in the raw data appears as a change in slope in the CuSum.
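As a minimal sketch of the construction described above, the following computes the CuSums for a short made-up series. The function name `cusum` and the data are illustrative only; the reference value defaults to the mean of the data, as suggested in the text.

```python
from itertools import accumulate
from statistics import mean


def cusum(values, target=None):
    """Cumulative sums of (x_k - T); T defaults to the mean of the data."""
    reference = mean(values) if target is None else target
    return list(accumulate(value - reference for value in values))


# Made-up series with an upward shift after the fifth result.
results = [10, 9, 11, 10, 10, 12, 13, 12, 13, 12]
sums = cusum(results)
for position, (result, total) in enumerate(zip(results, sums), start=1):
    print(f"{position:2d}  x = {result:2d}  CuSum = {total:6.2f}")
```

With the mean as reference value, the CuSum falls while results sit below the mean, turns at the fifth result, rises afterwards, and returns to zero (up to rounding) at the last point.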

Having plotted the CuSum data as a chart, it may be useful to assess whether an observed change in slope demonstrates a genuine signal (special or assignable cause) or whether it represents typical noise in the process (common cause variation). Several approaches to making this assessment are described by Taylor et al. (8). The basis for finding significant changepoints programmed into the Manhattan tool is the automatic search algorithm, Autocorners, described by Woodward and Goldsmith (6).

The tool has been applied successfully in a variety of different pharmaceutical applications, including:

  • Monitoring stability of in-vitro assays in research

  • Assessing animal model performance by monitoring control groups over time

  • Aiding understanding of variability in critical quality attributes and processing parameters in manufacturing processes.

The Autocorners algorithm identifies where statistical changes in slope occur in the CuSum chart. These change points are highlighted in a plot of the original raw data by splitting the data into stages and plotting the average per stage.

 

 

Autocorners algorithm description

In the retrospective analysis of process data, Woodward and Goldsmith (6) describe several approaches that can be applied to decide whether a change in the slope of a CuSum chart is real or due to noise. These include two manual approaches (the “span” method and the decision interval method) and one automatic search by computer. It is this last approach, called “Autocorners”, that has been programmed into the Manhattan tool. The data in Figures 1–3 are used to illustrate the approach, which proceeds as follows:

  1. The first step is to inspect the data for unusual observations that may be individual outliers. These are identified by testing whether an observation differs from both of its neighbors by more than 4.12 times the mean range (4.12 is the appropriate 0.1% significance point; see Wetherill and Brown (1), Table 5.2). If any are found, they are replaced in subsequent calculations by the average of the two adjacent values in the series. It is important to note that these unusual values are not excluded from the final interpretation and are represented in the output plot; the replacement simply ensures that outliers do not overly distort the CuSum chart or skew the overall trends and associated calculations.

  2. The overall mean of the data series is calculated, and CuSums are formed using the overall mean as reference value.

  3. A forward search is carried out through the data series starting at the first point as follows: a chord is joined from each CuSum back to zero, and the maximum difference between the chord and corresponding CuSums is found (see Figure 1):
    a.         The position of this largest difference is used to split the intervening original observations into two groups, and a t-test is carried out between the mean values of the two groups.
    b.         When the value of t is not significant, the program moves to the next CuSum and repeats the process. When the value of t becomes significant, the program, subject to further tests, declares this position to represent a significant change point (called a “corner”).

  4. A forward search is carried out through the data series from each corner as follows: a chord is joined from each CuSum back to the last corner, and the maximum difference between the chord and corresponding CuSums is found (see Figure 2). The same criteria a. and b. are applied as in step 3 above.

  5. When this forward search is completed, the program reverses the order and does a backward search.

  6. Finally, the two lists of change points are amalgamated, and a list of significant change points is made.
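The first few steps above can be sketched as follows. This is a simplified illustration, not the tool's VBA code: it shows the unusual-value screen, the CuSum about the overall mean, and a single chord-and-split step of the forward search. Several details are assumptions made for the sketch: the "mean range" is taken to be the mean absolute difference between successive results, the t-test is taken to be a pooled-variance two-sample test, and all function names are hypothetical. The comparison of t against a critical value, the "further tests", the backward search, and the amalgamation of corners are not shown.

```python
from itertools import accumulate
from math import sqrt
from statistics import mean, variance


def replace_unusual_values(values, factor=4.12):
    """Swap isolated unusual values for the average of their two neighbours.

    Assumption: the "mean range" is the mean absolute successive difference.
    """
    limit = factor * mean([abs(b - a) for a, b in zip(values, values[1:])])
    cleaned = list(values)
    for i in range(1, len(values) - 1):
        left, middle, right = values[i - 1], values[i], values[i + 1]
        if abs(middle - left) > limit and abs(middle - right) > limit:
            cleaned[i] = (left + right) / 2
    return cleaned


def cusum_from_origin(values):
    """CuSums about the overall mean, with a starting value of zero prepended."""
    reference = mean(values)
    return [0.0] + list(accumulate(v - reference for v in values))


def largest_gap_position(sums, start, end):
    """Position strictly between start and end where the CuSum lies furthest
    from the straight chord joining sums[start] and sums[end]."""
    slope = (sums[end] - sums[start]) / (end - start)

    def gap(r):
        return abs(sums[r] - (sums[start] + slope * (r - start)))

    return max(range(start + 1, end), key=gap)


def t_statistic(first, second):
    """Pooled-variance two-sample t statistic (an assumed form of the t-test).

    Each group needs at least two values and the pooled variance must be > 0.
    """
    n1, n2 = len(first), len(second)
    pooled = ((n1 - 1) * variance(first) + (n2 - 1) * variance(second)) / (n1 + n2 - 2)
    return (mean(second) - mean(first)) / sqrt(pooled * (1 / n1 + 1 / n2))


# Made-up series with an upward shift after the fifth result.
results = replace_unusual_values([10, 9, 11, 10, 10, 12, 13, 12, 13, 12])
sums = cusum_from_origin(results)
split = largest_gap_position(sums, 0, len(results))
t_value = t_statistic(results[:split], results[split:])
print(f"candidate corner after observation {split}, t = {t_value:.2f}")
```

Whether the candidate is declared a corner depends on comparing the t value with a critical value at the adjusted significance level, which would need a t-distribution quantile from a statistics library or tables.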

In the above steps, numerous tests of differences between adjacent segments are made. The overall significance level (α) has been explored by varying it between 1, 5, and 10%; the algorithm finds increasing numbers of corners with higher α. In the authors’ experience, a 1% significance level works well for most data sets, and this is used as the default setting in the tool. An overall adjustment for multiple comparisons is applied, as described in BS 5703: Part 2 (4). This standard has been superseded by BS ISO 7870-4 (3); however, the more recent version does not include the multiple comparisons details, so two alternative references (9,10) are provided for readers without access to the original standard.

 

 

As explained in the references (4,9,10), for retrospective analysis using CuSum charts, the individual tests should have significance level $\alpha\sqrt{m}/(2N)$, where $m$ is the length of the segment and $N$ is the total number of observations. The t-test with an ad hoc adjustment has the significance level given by Lewis and Yeomans (9), $\alpha/(2\sqrt{m})$. Where multiple tests are carried out in a series of total length $N$, the adjustment described by Osanaiye and Talabi (10) also applies: $(m/N)\alpha$. This leads to a combined adjustment of $\alpha\,(m/N)/(2\sqrt{m}) = \alpha\sqrt{m}/(2N)$.
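The arithmetic of the combined adjustment can be checked with a short sketch. The function name `per_test_alpha` and the example values of α, m, and N are illustrative only.

```python
from math import isclose, sqrt


def per_test_alpha(alpha, m, n_total):
    """Significance level for an individual test: alpha * sqrt(m) / (2 * N)."""
    return alpha * sqrt(m) / (2 * n_total)


# Illustrative values: overall alpha of 1%, a segment of 25 points in a series of 100.
alpha, m, n_total = 0.01, 25, 100

# The two separate adjustments quoted in the text, applied together.
t_test_part = 1 / (2 * sqrt(m))   # factor from alpha / (2 * sqrt(m))
series_part = m / n_total         # factor from (m / N) * alpha
combined = alpha * series_part * t_test_part

adjusted = per_test_alpha(alpha, m, n_total)
print(f"per-test significance level: {adjusted:.5f}")
```

For these values the individual tests are run at a much stricter level than the overall 1%, and the level relaxes as the segment length m grows relative to N.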

Once the change points (corners) in the raw data have been identified, the data can be conveniently represented as in Figure 3. The raw data are shown as green triangles connected by a blue line. (Where an unusual value has been identified, as in step 1 described previously, that point is connected to its neighbors by a green rather than a blue line, as shown in Figure 4.) In Figures 3, 4, and 5, the black line shows the CuSum plot. The red line shows the means across the regions between the change points, and shifts in this line indicate where the significant change points occur. This plot is called the Manhattan plot because it can look like the skyline of Manhattan. The change points in the Manhattan plot coincide with the points where the slope changes in the CuSum plot.
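The stepped line of stage means can be sketched as below, assuming the change points are already known. The function name `manhattan_line` and the convention that each corner is the zero-based index of the first observation in a new stage are assumptions made for this illustration.

```python
from statistics import mean


def manhattan_line(values, corners):
    """Return, for every observation, the mean of the stage it belongs to.

    Assumption: each corner is the zero-based index where a new stage starts.
    """
    edges = [0] + sorted(corners) + [len(values)]
    line = []
    for start, end in zip(edges, edges[1:]):
        stage_mean = mean(values[start:end])
        line.extend([stage_mean] * (end - start))
    return line


# Made-up series with one corner: a new stage starts at the sixth result.
results = [10, 9, 11, 10, 10, 12, 13, 12, 13, 12]
stage_means = manhattan_line(results, corners=[5])
print(stage_means)
```

Plotting `stage_means` against the observation order alongside the raw data gives the stepped profile described above.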

Using the tool

The Autocorners functionality described in this paper has been written into the Excel tool using VBA.

Data entry and tool output. Raw data are entered into the spreadsheet. Some additional functionality is available, including the ability to refresh the spreadsheet by deleting existing data. The tool can handle different numbers of response variables.

When the analysis is run, for each variable, two new worksheets are produced:

  • One sheet consists of a plot of the raw data with the Manhattan plot overlaid, together with the CuSum chart (if selected).

  • The other sheet provides summary statistics and details of the different stages.

The choice of whether to include the CuSum chart or not depends on the nature and amount of data entered. In the following example, the CuSum is particularly helpful to visualize alongside the Manhattan plot to help confirm the key change points in the data. When viewing the Manhattan plot, some pragmatic interpretation is required (e.g., it is crucial to take account of the scale of the y-axis and to avoid over-interpreting changes which are not of practical importance). If changes are made to the data entry sheet and the analysis re-run, then the new sheets produced will over-write the previous ones.

Example of application: capsule dissolution investigation. The dissolution test for this product involves taking a random sample of six capsules from a batch and submitting them to the dissolution test. The test gives the percent dissolved at 90, 300, and 480 minutes, from which the rate of change between 300 and 480 minutes is calculated. The average dissolution rate for the six capsules across multiple time-ordered batches has been recorded and analyzed using this tool. See Figure 4 for an example of the graphical output, which has the following features:

  • One value, batch 84, is deemed to be an unusual value. It is connected to the values on either side by a green line. The algorithm automatically identified this point (see step 1 in Autocorners algorithm description) and replaced it in calculations by the average of the two adjacent values.

  • The CuSum plot shows an important change in slope at batch 315. Prior to this point, the slope is predominantly negative, while subsequently the slope is consistently positive.

  • The change in slope of the CuSum corresponds to an upward shift in the raw data and an upward step-change in the Manhattan plot.

There are some other changes that can be seen in Figure 4 between batches 40 and 63, and around batch 275; however, these are more subtle and not sustained over a long time.

 

 

Various other processing factors were also recorded for these batches. These were examined to assess their variability and to seek a possible link with dissolution. One such variable is spray time, which has been analyzed and presented in Figure 5. It is notable how similar the pattern in the CuSum is for spray time compared to dissolution. This provides a plausible hypothesis for a causal relationship which could be further investigated in a designed experiment.

Practical considerations

From practical experience, the following recommendations are made to fully exploit the tool and approaches:

  • Ensure appropriate staff are involved in the review process, including both technical staff and, depending on the nature of the problem, operators as they are closest to the day-to-day running of the process.

  • Statisticians should be involved in the data review and interpretation. They should also take responsibility for raising awareness of the techniques and for raising the skill level of scientists and engineers in basic data manipulation, visualization, and statistical analysis. In-house statistical training should be provided to scientists and engineers, covering understanding variability, statistical process control, and retrospective data analysis including CuSums.

  • While the tool operates in an automated fashion, it is important to not treat it as a black box, but to make practical interpretation of the trends in the raw data and change-points identified, for example:

°           A corner identified using the automated algorithm may represent an unimportant change, scientifically.

°           While step-changes are common in manufacturing processes (e.g., due to switch of batch of raw material, change in operator, change to procedure), other types of change can occur (e.g., drift or cycling); these may not be adequately picked up by this tool.

  • Further statistical analysis may be required in some situations (e.g., more in-depth data visualization and multivariate data analysis).

Conclusion

The tool presented in this paper provides an automated approach to retrospective data analysis and interpretation. The tool is user-friendly and accessible, requiring the user only to paste in data and click a button to run the analysis. The output is easy to interpret and comprises a CuSum plot, identification of significant change points with subsequent visualization using the Manhattan plot, and summary statistics for each stage between adjacent change points.

Important areas of application include manufacturing and engineering where some insightful applications of the tool have been delivered, particularly regarding trends in critical quality attributes (responses) and relating these to in-process parameters. 

Access to tool. The tool can be downloaded from Box here: http://goto.az/manhattan. (Click the “Download” button to open in Excel or save the file. As it opens in Excel, click the buttons “Enable Content” and “Enable Editing” to proceed. Data can be entered into the worksheet “Data for Monitoring” following the instructions in the Help tab.)

Editor’s Note: The link is provided by the author, and Pharmaceutical Technology does not assume any liability for the contents of the linked file.

Acknowledgments

The authors wish to thank Linda McKinnon for her instrumental delivery of Excel VBA code.

References

1. G.B. Wetherill and D.W. Brown, Statistical Process Control (Chapman & Hall: London, 1991).
2. D.C. Montgomery, Statistical Quality Control, Vol 7. (Wiley, New York 2009).
3. BS ISO 7870-4:2011 Control Charts–Part 4: Cumulative Sum Charts (British Standards Institution: London, 2011).
4. BSI, BS5703: Part 2 1980, Guide to Data Analysis and Quality Control Using CuSum Techniques–Part 2: Decision Rules and Statistical Tests for Cusum Charts and Tabulations (British Standards Institution, London, 1980–1982).
5. V.V. Koshti, “Cumulative Sum Control Chart,” International Journal of Physics and Mathematical Sciences ISSN: 2277-2111 (Online) (2011).
6. R.H. Woodward and P.L. Goldsmith, Cumulative Sum Techniques, ICI Monograph No. 3 in the ‘mathematical and statistical techniques for industry’ series  (Edinburgh: Oliver and Boyd, 1964).
7. C.D. Lewis, International Journal of Quality & Reliability Management, 14 (2), pp. 160-175 (1997).
8. A.L. Taylor, et al., Pharmaceut. Statist., 1: 25-34 (2002).
9. C.D. Lewis and K.A. Yeomans, Journal of the Operational Research Society, 46 (12), pp. 1471-1480 (1995).
10. P.A. Osanaiye and C.O. Talabi, Journal of the Royal Statistical Society, Series D (The Statistician), 38 (4), pp. 251-257 (1989).

Article Details

Pharmaceutical Technology
Vol. 44, No. 3
March 2020
Pages: 34–39

Citation

When referring to this article, please cite it as R. Shaw and M. South, "Implementation of Autocorners Algorithm for Retrospective Process Monitoring," Pharmaceutical Technology 44 (3) 2020.

 

 

 
