A Statistical Decision System for Out-of-Trend Evaluation

Published in: Pharmaceutical Technology, Volume 41, Issue 1 (01-02-2017)
Pages: 34–43

Submitted: Feb. 22, 2016.
Accepted: Apr. 7, 2016.

The authors present a set of statistical decision rules based on linear regression models that can be implemented in an automated trend system to assist stability studies. The models combine historical stability and analytical method data with data from stability studies, and allow the responsible person to routinely evaluate stability results based on statistical tools, without the need for expert statistical assistance. The system provides a fast and standardized framework for evaluating parameters that approximately follow a linear degradation path or are constant.

Evaluation of data from stability studies is a central part of the control strategy of pharmaceutical products and is a GMP requirement (1). The purpose is to ensure the safety and efficacy of the product by confirming that the stability is as expected and that it will continue to meet quality specifications until expiry. Stability studies can be part of the development program for new products or the ongoing stability program for marketed products. The studies are typically conducted both at long-term storage conditions and at accelerated conditions.

For stability studies on marketed products, the objective is to confirm that the stability profile follows the trend of earlier batches. Unexpected results may indicate either that a single result is out-of-trend (OOT) or that the batch as a whole is OOT. A typical approach to evaluating the data is to consider the following three questions (2):

  • Is the latest result within the expected range, or is the result substantially different from what is expected? The latter is known as an analytical alert and would usually be related to the analytical procedure or the handling of the stability sample, and more rarely to the actual stability of the product.

  • Does the stability of the batch follow the expected trend compared to historical stability data? Or are there indications that the batch degrades in a different manner than observed earlier, which could indicate a special cause event has occurred in the production of the batch? This is known as a process control alert and will often lead to the conclusion that the batch is OOT.

  • Will the product comply with specifications throughout the shelf life? In the event of a process control alert, the batch is known to deviate from the historical expectations. The stability should be examined and evaluated to ensure that the batch stays within the specifications. When this stability is questionable, a compliance alert is raised.

The evaluation can be performed subjectively by an analyst, but it requires long experience with the analytical method and the product and its distinct properties. Also, different analysts or different laboratories may conduct the trending; they may have different experience and evaluate the data differently as a result. However, an objective evaluation requires data from different sources to be combined, namely the precision of the analytical method and the stability trend of historical batches and their associated uncertainty, and this is a burden both practically and statistically.

Statistical tools can control both the risk of a false alarm, which occurs when the product and result are actually within the expected range, and the risk of overlooking an OOT. The factors of uncertainty that need to be considered are:

  • How much historical data are available, and how well is the expected slope determined?

  • Are there historical batch-to-batch variations in the slope?

  • What is the intermediate precision of the analytical method and how well is this determined?

  • Is the variation in the current stability study comparable to historical intermediate precision, and if so, what is the combined estimated precision?

  • How much data are available in the current study?

  • How much confidence is there in the predicted value of the batch when extrapolating to end-of-shelf life?

Unless a system is in place that facilitates the combination and statistical evaluation of data in an automated and standardized manner, the evaluation of stability data will be laborious and may require expert statistical assistance, which is usually not readily available at all the facilities where data are generated and evaluated.


A number of different approaches for evaluating stability data from a statistical perspective have been proposed in recent years (2-5). In this paper, the authors consider only parameters that follow a linear stability trend (or are constant). In this approach, the analysis is based on linear regression models that combine the efficiency of a parametric statistical model with the practical aspect of being relatively simple and intuitive.

From the authors’ experience, the vast majority of parameters that are followed in stability are approximated well by zero or first-order kinetic reactions, which lend themselves to linear regression analyses. Parameters that do not develop linearly must be evaluated, for instance, by tolerance interval methods by time point (3), or by more advanced kinetic models of the stability profile. These methods will not be considered here.

An overview of the system is provided in the following sections. Statistical details are deferred to the appendix.

System setup

The system is illustrated in Figure 1. The system supports a workflow where the stability responsible person routinely evaluates and releases results in a stability study as they become available. Stability data are stored in a laboratory information management system (LIMS). To evaluate the trend questions discussed in the previous section, historical data and data on the analytical variability of the method are needed. These data are stored in a database with tables for each product.

The combination of the two data sources and the statistical analysis and presentation of results is implemented in a computer program (JMP, SAS Institute) (6), but other systems for data analysis and visualization can be used. The evaluation of results and alerts is conducted on a computer screen.


The parameter table with historical data

Historical stability data are summarized in a parameter table (see Table I) for each product. The table should be established by statistical analysis of historical stability data, using batches and results that are representative of the current product and analytical methods. For new products, typically data from the new drug application (NDA) stability studies and other development stability studies will be used. For marketed products, the body of historical routine stability data can be used.

The analysis of the historical data should be based on a regression analysis, in which the average stability trend is determined. In the model, each batch should have its own intercept to account for batch-to-batch variation in the starting level. If the stability slope varies slightly from batch-to-batch due to random variations, for instance in raw materials or input factors, a mixed model with random slopes can be used (5).

The intermediate precision of the analytical method should preferably be estimated as the residual variation in historical stability data, because this estimate will cover long-term variation in the method and also any other variation in stability studies, for instance, due to sampling and handling of the samples. Alternatively, method validation data or variation in control samples can be used.
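The regression described above can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' JMP implementation: it fits a common slope with a separate intercept for each batch by ordinary least squares and takes the residual standard deviation as the estimate of intermediate precision. The function name, the batch labels, and the data are hypothetical, and the random-slope mixed model mentioned above is not covered.

```python
from math import sqrt

def fit_common_slope(batches):
    """Fit y = a_batch + b * t with one intercept per batch and a common slope.

    batches: dict mapping a batch id to a list of (months, result) pairs.
    Returns (slope, se_slope, resid_sd, df).
    """
    sxx = sxy = 0.0
    centered = []
    for points in batches.values():
        n = len(points)
        t_mean = sum(t for t, _ in points) / n
        y_mean = sum(y for _, y in points) / n
        # Centering within each batch removes the batch-specific intercept.
        for t, y in points:
            sxx += (t - t_mean) ** 2
            sxy += (t - t_mean) * (y - y_mean)
            centered.append((t - t_mean, y - y_mean))
    slope = sxy / sxx
    rss = sum((dy - slope * dt) ** 2 for dt, dy in centered)
    # Degrees of freedom: results minus one intercept per batch minus the slope.
    df = len(centered) - len(batches) - 1
    resid_sd = sqrt(rss / df)
    se_slope = resid_sd / sqrt(sxx)
    return slope, se_slope, resid_sd, df

# Hypothetical assay results (% of label claim) for two historical batches.
historical = {
    "A": [(0, 100.0), (3, 99.4), (6, 98.9), (9, 98.1)],
    "B": [(0, 101.0), (3, 100.5), (6, 99.7), (9, 99.2)],
}
slope, se_slope, resid_sd, df = fit_common_slope(historical)
print(f"slope={slope:.4f} per month, SE={se_slope:.4f}, "
      f"residual SD={resid_sd:.4f}, df={df}")
```

The slope, its standard error, and the residual standard deviation are the kind of quantities a parameter table would need to hold for each parameter and storage condition.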

The construction of the parameter table is typically a large task and may require a cross-functional team of analytical chemists, product responsible chemists, and statisticians. It is advisable to ensure careful documentation and control of the parameter table because it is the cornerstone of the stability trend evaluation.

Generally, the parameter table need only be established once for each product, but it may be necessary to update the table over time if there are changes to the stability profile of the product or to the analytical methods, or if the initial parameter table is based on a relatively small body of stability data and more precise estimates are obtained over time.

The parameter table summarizes all the historical knowledge of the product and the analytical methods in a single table. Thus, there is a wealth of information in the table, and the creation of the table ensures that the expectation of the stability study is clear across the organization. By using the same parameter table for trending, consistency in the evaluation of the data across persons, departments, and sites is ensured, which is an important benefit of the system.


Routine trend evaluation

When conducting routine trending, stability data are retrieved from the LIMS and combined with the parameter table. The system processes the data and presents a graph for each parameter, batch, and storage condition. The graphs illustrate the data and summarize the statistical evaluation of the three trend questions.

Is the latest result comparable with the results previously seen for the same batch in the study?
This trend is evaluated by a prediction interval based on the stability results for each batch, excluding the latest result. If the latest result falls within the prediction interval, it can be concluded that it follows the trend seen so far, within the expected uncertainty range.

Typically, a 99% prediction interval will be used to have a reasonably low risk (1%) of a false alarm. This interval corresponds approximately to ±3 standard deviations around the expected value.

The historical stability slope in the parameter table is not used in this evaluation, but the historical intermediate precision of the method is used to calculate the variance of the result.

The result of the analysis is indicated graphically by plotting the data with the regression line, calculated with the latest result excluded, and by overlaying ±3 standard deviation error bars on the latest result. This approach provides a simple visual check for whether the result is within the expected range. The conclusion of the statistical analysis is illustrated visually by plotting the latest result with a red symbol, if the result is outside the 99% prediction interval. An example is provided in Figure 2.
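As a rough sketch of this check, the Python below fits a line to the earlier results of one batch and tests whether the latest result falls inside a 99% prediction interval. It makes a simplifying assumption that is not the authors' method: the historical standard deviation from the parameter table is treated as known, so a normal quantile replaces the t-quantile and no variance is pooled with the current study. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def analytical_alert(times, results, sigma_hist, level=0.99):
    """Compare the latest result with a prediction interval from earlier results.

    Assumes at least two earlier time points and a known standard deviation
    sigma_hist (simplification: normal quantile instead of a t-quantile).
    Returns (alert, (lower, upper)).
    """
    t_prev, y_prev = times[:-1], results[:-1]
    t_new, y_new = times[-1], results[-1]
    n = len(t_prev)
    t_mean = sum(t_prev) / n
    y_mean = sum(y_prev) / n
    sxx = sum((t - t_mean) ** 2 for t in t_prev)
    slope = sum((t - t_mean) * (y - y_mean)
                for t, y in zip(t_prev, y_prev)) / sxx
    predicted = y_mean + slope * (t_new - t_mean)
    # Standard error of (new result - prediction): measurement noise of the
    # new result plus the uncertainty of the fitted line at t_new.
    se_pred = sigma_hist * sqrt(1 + 1 / n + (t_new - t_mean) ** 2 / sxx)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower, upper = predicted - z * se_pred, predicted + z * se_pred
    return not (lower <= y_new <= upper), (lower, upper)

# Hypothetical batch: the 12-month result drops well below the earlier trend.
times = [0, 3, 6, 9, 12]
results = [100.0, 99.4, 98.9, 98.1, 95.0]
alert, (lower, upper) = analytical_alert(times, results, sigma_hist=0.3)
print(f"alert={alert}, 99% prediction interval=({lower:.2f}, {upper:.2f})")
```

The slope in this sketch comes only from the current batch, in line with the statement that the historical slope is not used for this question.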

Is the development of the parameter comparable to the development of the same parameter in historical studies?
This trend is addressed by a regression analysis, in which the estimated slope of the current batch is compared with the expected slope from the parameter table. Based on a t-test, the statistical significance of any difference can be assessed, accounting for the uncertainty of both the current estimated slope and the expected slope. The uncertainty of the expected slope can express both estimation uncertainty and, if relevant, random batch-to-batch variation in the slope (5). Typically, a significance level of 1% will be used to avoid too many false alarms, corresponding to the 99% intervals used above.

The result of the analysis is indicated graphically by plotting the regression line for the batch (the green line in Figure 3) as well as a line with the expected slope (dotted line in Figure 3). If a statistically significant difference is observed, all points can be plotted with a separate color to provide the stability responsible person with a clear visual indication that this statistically significant difference needs to be evaluated and possibly investigated further.
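The slope comparison can be sketched as follows. This is an illustration under stated assumptions rather than the authors' test: the residual standard deviation is treated as known, so a normal (z) approximation stands in for the t-test, and the standard error of the historical slope is assumed to come from the parameter table. All names and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def process_control_alert(times, results, sigma, slope_hist, se_slope_hist,
                          alpha=0.01):
    """Compare the slope of the current batch with the historical slope.

    sigma is the (assumed known) residual standard deviation; se_slope_hist
    may include batch-to-batch slope variation if that is relevant.
    Returns (alert, slope, p_value).
    """
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(results) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    slope = sum((t - t_mean) * (y - y_mean)
                for t, y in zip(times, results)) / sxx
    se_slope = sigma / sqrt(sxx)
    # Uncertainty of the difference combines both standard errors.
    z = (slope - slope_hist) / sqrt(se_slope ** 2 + se_slope_hist ** 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha, slope, p_value

# Hypothetical batch degrading faster than the historical -0.20 per month.
times = [0, 3, 6, 9, 12]
results = [100.0, 99.0, 97.9, 97.1, 96.0]
alert, slope, p_value = process_control_alert(
    times, results, sigma=0.3, slope_hist=-0.20, se_slope_hist=0.02)
print(f"alert={alert}, slope={slope:.3f}, p={p_value:.5f}")
```

With only a few time points the standard error of the current slope dominates the denominator, so large differences are needed before an alert is raised.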

Can compliance with the specification limits be expected to be maintained until the end of study?
This analysis is conducted following the principles of ICH Q1E (7) by evaluating whether the 95% confidence interval for the batch mean intersects the specification limit before the end of shelf life. A one- or two-sided confidence interval is used depending on whether the specification is one- or two-sided, respectively.

If the batch is confirmed to be OOT and there is less than 95% confidence that it will comply with the specification during shelf life, a compliance alert is raised (see Figure 4). The evaluation of criticality is not only a statistical exercise, but the statistical result may be used to evaluate the effect of reducing shelf life or other mitigations.
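A minimal sketch of the confidence-bound check is given below for the common case of a decreasing parameter with a lower specification limit. It is a simplification under stated assumptions: the standard deviation is treated as known (normal quantile instead of a t-quantile), and the bound is checked only at the end of shelf life, which is sufficient when the fitted slope is negative and shelf life lies beyond the mean time point. The function covers only the statistical part; whether the batch has been confirmed OOT is decided separately. Names and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def compliance_alert(times, results, sigma, shelf_life, lower_spec,
                     confidence=0.95):
    """One-sided lower confidence bound for the batch mean at end of shelf life.

    Returns (alert, lower_bound); alert is True when the bound falls below
    the lower specification limit.
    """
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(results) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    slope = sum((t - t_mean) * (y - y_mean)
                for t, y in zip(times, results)) / sxx
    predicted = y_mean + slope * (shelf_life - t_mean)
    # Standard error of the fitted mean (not of a single future result).
    se_mean = sigma * sqrt(1 / n + (shelf_life - t_mean) ** 2 / sxx)
    z = NormalDist().inv_cdf(confidence)  # one-sided quantile
    lower_bound = predicted - z * se_mean
    return lower_bound < lower_spec, lower_bound

# Hypothetical batch with 12 months of data and a 24-month shelf life.
times = [0, 3, 6, 9, 12]
results = [100.0, 99.0, 97.9, 97.1, 96.0]
for spec in (95.0, 90.0):
    alert, bound = compliance_alert(times, results, sigma=0.3,
                                    shelf_life=24, lower_spec=spec)
    print(f"lower spec {spec}: bound at 24 months={bound:.2f}, alert={alert}")
```

Rerunning the function with a shorter shelf_life gives a quick view of how reducing shelf life would affect the conclusion.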



Practical use of the system

In the practical use of the system, all data for a given time point are evaluated, and a graphical overview of the different parameters, batches, and storage conditions is presented. The graphical illustrations of alerts make it easy to get an overview of the data. If one or more alerts are identified, summary tables with estimates and statistical details are available to interpret the findings.

When evaluating alerts, the trend responsible person should be aware of a number of pitfalls and understand the limitations of the methods used:

  • Rounded and truncated results: The trend analysis requires data with sufficient resolution. In particular, impurity data are often rounded to one decimal and truncated when they are below the limit of quantification. It is important that a database with the unrounded results is available for the trend analysis; if not, the trend system may not analyze impurity data correctly.

  • Non-linear trend: The system assumes a linear trend over time (or no trend). This approach is typically reasonable, but complex biological reactions or physical parameters are not necessarily linear. In this case, the results of the system should be interpreted with much care, and trending may need to be conducted by other methods, for instance, the by-time-point method (2).

  • Multiplicity: A number of statistical tests are conducted for each time point. For instance, if three batches are followed at three different storage conditions and five parameters are evaluated for each, a total of 45 tests are conducted. With a significance level of 1% for each test, and assuming the tests are independent, there is a risk of 1 − 0.99^45 ≈ 36% of at least one false alert. Because there is no correction for this risk, it is important that the stability responsible person is aware of the risk of a false alert and uses good judgement when evaluating alerts.

  • Independent results: It is an assumption in the analysis that all results are independent. When this is not the case, for instance, if two determinations are obtained in the same analytical run, there is a risk of over-interpreting findings and getting too many false alarms. The correlation between multiple results can be handled statistically using random effects models, but this method is difficult to automate in a system like this.

  • Only the latest result is evaluated: Previous OOT results in the same study should be excluded before the analysis; otherwise, these previous OOT results may mask new OOT results. The system supports a workflow where the OOT evaluation is conducted routinely after each result, and, therefore, only the latest result is evaluated.

  • Patterns across batches: The system analyzes each batch, parameter, and storage condition separately, giving a relatively simple framework, but it means that patterns across similar batches or storage conditions are not discovered. These patterns must be evaluated subjectively or by more advanced statistical analyses in specific cases.

  • Number of results available: The system can, in principle, estimate the stability slope based on two results, using the historical standard deviation as an estimate of the residual variation in the data. But clearly, the analysis will have low sensitivity until more time points are available.
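The multiplicity figure quoted in the list above can be reproduced with a short calculation. The sketch assumes that the individual tests are independent, which is an idealization; the function name is hypothetical.

```python
def family_false_alert_risk(n_tests, alpha=0.01):
    """Probability of at least one false alert among independent tests."""
    return 1 - (1 - alpha) ** n_tests

# Three batches x three storage conditions x five parameters.
n_tests = 3 * 3 * 5
risk = family_false_alert_risk(n_tests)
print(f"{n_tests} tests at 1% each: {risk:.0%} risk of at least one false alert")
# prints: 45 tests at 1% each: 36% risk of at least one false alert
```

The same function shows how quickly the risk grows as more batches, conditions, or parameters are trended at each time point.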

The computer system should be validated for GMP use. However, by building the system on existing validated computer systems, the validation effort is smaller than if the system were built from scratch.

Comparison with other methods
As discussed, the methods presented rely on linear trend models with normally distributed errors. They are, therefore, less general than OOT methods that do not rely on these assumptions, such as the “change-from-previous” type methods and by-time-point methods presented in references 2 and 3, but they provide a simpler and more efficient setup when the assumptions are fulfilled.

The methods can be compared with other published regression methods as follows:

  • Analytical alert: The method presented here is very similar to the regression control chart method (3-5) based on a prediction interval. A difference, however, is that the authors’ method uses a pooled variance based on the historical variance and the variance in the present study. This approach will increase the power of detecting an OOT, provided that the variation in the historical data is comparable to that of the current study. If the historical variance is not entered in the parameter table, the authors’ test simplifies to the regression control chart method.

  • Process control alert: The method presented compares the slope of the current batch with the average slope of historical batches by a t-test, allowing for uncertainty in the estimated slopes and random batch-to-batch variation in the historical slopes. As such, the interpretation of the test is similar to the slope-control chart method (3), though the statistical framework is slightly different. If the standard error of the historical slope accounts for uncertainty in the slope only, the method is similar to the test for poolability of batches (7), except for the fact that all the historical batches are pooled before comparison with the current batch and the fact that a pooled variance is used in the test. If the standard error of the historical slope includes random variation between batches, the framework is similar to the random coefficient regression (6), where the model is used to set limits for individual results.

  • Compliance alert: The method presented is the same as used in reference 7, where batches are not pooled and each batch is thus considered individually.
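The pooled variance mentioned under the analytical alert can be illustrated with the textbook formula, which weights each variance by its degrees of freedom. This is a sketch only; the exact weighting used by the authors is given in their appendix and may differ. The function name and the numbers are hypothetical.

```python
from math import sqrt

def pooled_sd(sd_hist, df_hist, sd_curr, df_curr):
    """Pool a historical and a current-study standard deviation.

    Each variance is weighted by its degrees of freedom (textbook formula).
    """
    pooled_var = (df_hist * sd_hist ** 2 + df_curr * sd_curr ** 2) / (df_hist + df_curr)
    return sqrt(pooled_var)

# Historical SD of 0.30 on 40 df combined with a current-study SD of 0.40 on 3 df.
combined = pooled_sd(0.30, 40, 0.40, 3)
print(f"pooled SD = {combined:.3f} on {40 + 3} df")

# With no historical information (0 df), only the current study contributes.
current_only = pooled_sd(0.30, 0, 0.40, 3)
print(f"without historical variance: {current_only:.3f}")
```

The second call mirrors the case in which no historical variance is entered in the parameter table, so the estimate rests on the current study alone.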

Conclusions and further development

The trend analysis system provides the trend responsible person with exact and reproducible results for evaluating stability data. It makes the evaluation of data objective and standardized, and provides greater flexibility in terms of who does the trending.

The system provides valuable summary measures for each batch, such as the estimated slope with confidence limits, a statistical test for whether the batch is comparable with historical batches, and the expected shelf life based on extrapolation of confidence intervals. The system makes it easy to account for the different sources of uncertainty in the evaluation of the data and thus provides control over the risk of false alarms and the risk of overlooking an OOT.

The system is relatively simple to implement, validate, and maintain, and can be based on a statistical software package such as JMP and existing database solutions, such as LIMS. The statistical methods strike a reasonable compromise: they are relatively simple, being based on a linear regression model for each batch, yet sufficiently flexible to handle, for instance, mixed effect models with random variation in the slope between batches.

Generating the database of parameter tables for all products requires analyses of historical data. Though this effort is a prerequisite for conducting a trend analysis, whether a trend system is used or not, the practical work of establishing, documenting, and maintaining the parameter tables in a system such as this should not be underestimated.

The system is not designed to encompass all parameters, and some level of “manual” trending should, therefore, be expected even with this system. Parameters that do not follow a linear pattern, as well as ordinal responses, cannot currently be analyzed by the system. Also, impurity data that are truncated below the limit of quantification may need to be trended by other methods. One could extend the system, for instance, by including functionality for transforming responses to linearize the trend, or by including tolerance interval methods. Still, it is important that the results of the analyses are intuitive and easy to interpret, and this feature should be a cardinal point when extending the system.

About the Authors

Niels Væver Hartvig, PhD, is a principal specialist, nvha@novonordisk.com, tel.: +45 30790913, and Liselotte Kamper is a chemist, both at Novo Nordisk A/S, Smørmosevej 17-19, DK-2880 Bagsværd, Denmark.


Acknowledgments

The system has been developed by a team of stability responsible persons in Novo Nordisk. In particular, the authors acknowledge valuable discussions and input from Marika Ejby Reinau, Jens Krogh Rasmussen, Helle Lindgaard Madsen, Lone Steenholt, Carsten Berth, and Karin Bilde.


References

1. EudraLex, The Rules Governing Medicinal Products in the European Union, Volume 4, Chapter 6.
2. PhRMA CMC Statistics and Stability Expert Teams, Pharm. Technol., 29 (10), 66 (2005).
3. PhRMA CMC Statistics and Stability Expert Teams, Pharm. Technol., 27 (4), 38–52 (2003).
4. A. Torbovska and S. Trajkovic-Jolevska, Pharm. Technol., 37 (6), 48 (2013).
5. ECA, Laboratory Data Management Guidance; Out of Expectation (OOE) and Out of Trend (OOT) Results (draft, Aug. 15, 2015).
6. JMP Statistical Software, SAS Institute Inc.
7. ICH, Q1E, Evaluation for Stability Data (Step 4 version, 2003).


When referring to this article, please cite it as N. Hartvig and L. Kamper, “A Statistical Decision System for Out-of-Trend Evaluation,” Pharmaceutical Technology 41 (1) 2017.