A Review of Statistical Outlier Methods

November 2, 2006
Pharmaceutical Technology, Volume 30, Issue 11

Outliers may provide useful information about the development and manufacturing process. Analysts use various statistical methods to evaluate outliers and to reduce their impact on the analysis. This article describes some of the more commonly used identification methods.

Statistical outlier detection has become a popular topic as a result of the US Food and Drug Administration's out-of-specification (OOS) guidance and the increasing emphasis on the OOS procedures of pharmaceutical companies. When a test fails to meet its specifications, the initial response is to conduct a laboratory investigation to seek an assignable cause. As part of that investigation, an analyst looks for an observation in the data that could be classified as an outlier. The FDA guidance "Investigating Out of Specification (OOS) Test Results for Pharmaceutical Production" and the US Pharmacopeia are clear that a chemical test result cannot be omitted on the basis of an outlier test, but that a bioassay result can (1). Outlier tests are specifically prohibited in two areas: content uniformity and dissolution testing.

An outlier is defined as an observation that "appears" to be inconsistent with other observations in the data set (2). An outlier has a low probability that it originates from the same statistical distribution as the other observations in the data set. On the other hand, an extreme value is an observation that might have a low probability of occurrence but cannot be statistically shown to originate from a different distribution than the rest of the data.

Why study outliers

Outliers can provide useful information about the process. An outlier can be created by a shift in the location (mean) or in the scale (variability) of the process. Though an observation in a particular sample might be a candidate as an outlier, the process might have shifted. Sometimes, the spurious result is a gross recording error or a measurement error. Measurement systems should be shown to be capable for the process they measure. Outliers also come from incorrect specifications that are based on the wrong distributional assumptions at the time the specifications are generated.

How to handle outliers

Once an observation is identified, by graphical or visual inspection, as a potential outlier, root cause analysis should begin to determine whether an assignable cause can be found for the spurious result. If no root cause can be determined and a retest can be justified, the potential outlier should be recorded for future evaluation as more data become available. Often, values that seem to be outliers are the right tail of a skewed distribution. When reporting results, it is prudent to report conclusions with and without the suspected outlier in the analysis. A statistical test alone, without an assignable cause, is not sufficient justification for discarding data. Robust or nonparametric statistical methods are alternative methods for analysis. Robust statistical methods such as weighted least-squares regression reduce the effect of an outlying observation (3).

There are various approaches to outlier detection depending on the application and number of observations in the data set. Iglewicz and Hoaglin provide a comprehensive text about labeling, accommodation, and identification of outliers (4). Visual inspection alone cannot always identify an outlier and can lead to mislabeling an observation as an outlier. Applying a specific function of the observations gives a more reliable outlier labeling rule. Because classical estimators such as the mean are highly sensitive to outliers, statistical methods were developed to accommodate outliers and to reduce their impact on the analysis. Some of the more commonly used identification methods are discussed in this article.

Box plot

A box plot is a graphical representation of the dispersion of the data. The graphic shows the lower quartile (Q1) and upper quartile (Q3) along with the median. The median is the 50th percentile of the data, the lower quartile is the 25th percentile, and the upper quartile is the 75th percentile. The upper and lower fences usually are set at a fixed multiple of the interquartile range (Q3 – Q1) beyond the quartiles. Figure 1 shows the upper and lower fences set at 1.5 times the interquartile range, that is, at Q1 – 1.5(Q3 – Q1) and Q3 + 1.5(Q3 – Q1). Any observation outside these fences is considered a potential outlier. Even when data are not normally distributed, a box plot can be used because it depends on the median and quartiles rather than the mean of the data.

Figure 1
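The fence rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data set is hypothetical (it is not the article's Table I), the function name `box_plot_fences` is invented for this sketch, and the quartiles use the "inclusive" interpolation convention of Python's `statistics.quantiles`. Other quartile conventions give slightly different fences.

```python
import statistics

# Hypothetical data for illustration only (not the article's Table I).
data = [4.1, 5.2, 5.8, 6.1, 6.5, 7.0, 7.4, 8.2, 9.3, 16.3]


def box_plot_fences(values, k=1.5):
    """Return (q1, median, q3, lower_fence, upper_fence) for a sample.

    The fences sit k interquartile ranges beyond the quartiles.
    """
    # "inclusive" interpolates between order statistics of the sample;
    # this is one of several common quartile conventions (an assumption).
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1, median, q3, q1 - k * iqr, q3 + k * iqr


q1, median, q3, lower, upper = box_plot_fences(data)

# Observations outside the fences are only *potential* outliers.
flagged = [x for x in data if x < lower or x > upper]

print(f"Q1={q1:.3f}  median={median:.3f}  Q3={q3:.3f}")
print(f"fences=({lower:.4f}, {upper:.4f})  potential outliers={flagged}")
```

With this hypothetical sample, only the value 16.3 falls outside the fences, so it would be carried forward for investigation rather than discarded.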

Trimmed means

A trimmed mean is calculated by discarding a certain percentage of the lowest and the highest scores and then computing the mean of the remaining scores. The trimmed mean has the advantage of being relatively resistant to outliers. When outliers are present in the data, trimmed means are robust estimators of the population mean that are relatively insensitive to the outlying values. A potential outlier might first be identified by viewing the box plot. Removing the upper and lower 5% of the data yields a 10% trimmed mean. An example of trimmed means is the recent change in the Olympic scoring system for ice skating, in which the highest and lowest scores are eliminated, and the mean of the remaining scores is used to assess skaters' scores. If a trimmed mean is presented, the untrimmed mean should be presented for comparison.
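A trimmed mean is simple to compute by hand or in code. The following minimal sketch uses a hypothetical data set (not the article's Table I) and a helper named `trimmed_mean`, which is an illustrative name rather than a library function. The parameter is the proportion removed from each tail, so 0.10 per tail on 10 observations drops one value from each end, much like the skating example.

```python
# Hypothetical data for illustration only (not the article's Table I).
data = [4.1, 5.2, 5.8, 6.1, 6.5, 7.0, 7.4, 8.2, 9.3, 16.3]


def trimmed_mean(values, proportion_per_tail):
    """Mean after dropping the given proportion of values from each tail."""
    if not 0 <= proportion_per_tail < 0.5:
        raise ValueError("proportion_per_tail must be in [0, 0.5)")
    ordered = sorted(values)
    # Number of observations dropped from each end (rounded down).
    k = int(len(ordered) * proportion_per_tail)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


untrimmed = trimmed_mean(data, 0.0)   # ordinary mean
trimmed = trimmed_mean(data, 0.10)    # drops the lowest and highest value

# Report both, as recommended in the text.
print(f"untrimmed mean = {untrimmed:.4f}")
print(f"trimmed mean   = {trimmed:.4f}")
```

For this sample the trimmed mean is noticeably lower than the untrimmed mean because the single high value no longer pulls the average upward.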

Extreme studentized deviate

The extreme studentized deviate (ESD) test is quite good at identifying a single outlier in a normal sample. The maximum deviation from the mean, expressed in units of the sample standard deviation:

$$\mathrm{ESD} = \max_{i} \frac{|x_i - \bar{x}|}{s}$$

is calculated and compared with a tabled critical value (see Table II). Here $\bar{x}$ is the sample mean and $s$ is the sample standard deviation. If the ESD exceeds the tabled value, then the corresponding observation is removed, and the procedure is repeated. If no observation exceeds the tabled value, then we cannot claim there is a statistical outlier. A drawback of this method is that it requires the assumption of a normal data distribution. Usually, this assumption holds true as the sample size gets larger, though a formal test such as the Anderson–Darling method can be used to test the assumption (5). This approach can be generalized to investigate multiple outliers simultaneously. Table I shows an example with 10 observations (raw data). Based on Table II, the critical value for N = 10 at an α level of 0.05 is 2.29. Therefore, the data value 16.3 is an outlier because it corresponds to a studentized deviation of 2.49, which exceeds the 2.29 critical value.

Table I: Example of extreme studentized deviate test.

Table II: Critical values for the extreme studentized deviate test (Reference 4).
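One step of the ESD procedure can be sketched as follows. The data set here is hypothetical and is not the article's Table I, so the statistic it produces differs from the 2.49 reported in the text. The critical value 2.29 (N = 10, α = 0.05) is the one quoted in the text, and the function name `esd_statistic` is an illustrative choice. Only a single pass is shown, because repeating the test after removing a point requires the critical value for the reduced sample size.

```python
import statistics

# Hypothetical data for illustration only (not the article's Table I).
data = [4.1, 5.2, 5.8, 6.1, 6.5, 7.0, 7.4, 8.2, 9.3, 16.3]

# Critical value quoted in the text for N = 10 at alpha = 0.05.
CRITICAL_N10_ALPHA_05 = 2.29


def esd_statistic(values):
    """Return (statistic, suspect) for the most extreme observation.

    The statistic is max |x_i - mean| / s, with s the sample
    standard deviation (n - 1 in the denominator).
    """
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    suspect = max(values, key=lambda x: abs(x - mean))
    return abs(suspect - mean) / s, suspect


esd, suspect = esd_statistic(data)
is_outlier = esd > CRITICAL_N10_ALPHA_05

print(f"suspect={suspect}  ESD={esd:.3f}  critical={CRITICAL_N10_ALPHA_05}")
print("statistical outlier" if is_outlier else "no statistical outlier")
```

For this hypothetical sample the statistic is about 2.56, which exceeds 2.29, so 16.3 would be labeled a statistical outlier, provided the normality assumption is reasonable.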

Dixon-type tests

Dixon-type tests are based on ratios of ranges. These tests are flexible enough to allow for specific observations to be tested. They also perform well with small sample sizes. Because they are based on order statistics, there is no need to assume normality of the data. Depending on the number of suspected outliers, different ratios are used to identify potential outliers. The first class of ratios, r10, is used when the suspected outlier is the largest or smallest observation. The second set of ratios, r11, is used to test the largest or smallest observation while avoiding a suspect observation at the opposite end of the data. Situations like these arise because of masking. Masking occurs when several observations are close together, but the group of observations is still outlying from the rest of the data. Masking is a common phenomenon, especially for bimodal data (i.e., data from two distributions). There are additional sets of ratios depending on how many masked points are excluded (6). The following equations are the r10 and r11 ratios:

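Testing the largest observation as an outlier:

$$r_{10} = \frac{x_{(n)} - x_{(n-1)}}{x_{(n)} - x_{(1)}}$$

Testing the smallest observation as an outlier:

$$r_{10} = \frac{x_{(2)} - x_{(1)}}{x_{(n)} - x_{(1)}}$$

Testing the largest observation as an outlier avoiding the smallest observation:

$$r_{11} = \frac{x_{(n)} - x_{(n-1)}}{x_{(n)} - x_{(2)}}$$

Testing the smallest observation as an outlier avoiding the largest observation:

$$r_{11} = \frac{x_{(2)} - x_{(1)}}{x_{(n-1)} - x_{(1)}}$$

Here $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$ denote the observations sorted in ascending order.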

If the distance between the potential outlier and its nearest neighbor is large enough relative to the range of the data, the observation is considered an outlier. Table III shows the critical values for r10 and r11 ratios.

Table III: Critical value for Dixon tests (α = 0.05) (Reference 6).

Using the data set in Table I, the Dixon-type test can be used to determine whether 16.3 is a potential outlier. For r10, the test statistic is (16.3 – 9.3)/(16.3 – 4.1), which is equal to 0.574 and is greater than the tabled value of 0.412. Therefore, for a sample size of 10, 16.3 is a statistical outlier.
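The r10 and r11 ratios can be computed directly from the sorted data. In the sketch below, the smallest, second largest, and largest values (4.1, 9.3, and 16.3) are taken from the worked example in the text. The other seven values are hypothetical fillers, because Table I is not reproduced here. The function name `dixon_ratios` and the dictionary keys are illustrative, and 0.412 is the r10 critical value quoted in the text for a sample size of 10.

```python
# 4.1, 9.3, and 16.3 come from the worked example in the text; the other
# seven values are hypothetical and are NOT the article's Table I.
data = [4.1, 5.2, 5.8, 6.1, 6.5, 7.0, 7.4, 8.2, 9.3, 16.3]

# r10 critical value quoted in the text for n = 10 (alpha = 0.05).
R10_CRITICAL_N10 = 0.412


def dixon_ratios(values):
    """Return the r10 and r11 ratios for both ends of the sample."""
    x = sorted(values)
    if len(x) < 4:
        raise ValueError("need at least 4 observations")
    return {
        # Gap to nearest neighbor divided by the full range.
        "r10_largest": (x[-1] - x[-2]) / (x[-1] - x[0]),
        "r10_smallest": (x[1] - x[0]) / (x[-1] - x[0]),
        # Same gaps, with the opposite extreme left out of the range.
        "r11_largest": (x[-1] - x[-2]) / (x[-1] - x[1]),
        "r11_smallest": (x[1] - x[0]) / (x[-2] - x[0]),
    }


ratios = dixon_ratios(data)
largest_is_outlier = ratios["r10_largest"] > R10_CRITICAL_N10

for name, value in ratios.items():
    print(f"{name}: {value:.3f}")
print(f"largest value flagged by r10: {largest_is_outlier}")
```

Because r10 for the largest value depends only on the smallest, second largest, and largest observations, this sketch reproduces the 0.574 from the text even though the middle values are invented. The r11 values shown depend on the hypothetical second smallest value and should not be read as results for Table I.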

Outliers in regression

Regression analysis or least-squares estimation is a statistical technique to estimate a linear relationship between two variables. This technique is highly sensitive to outliers and influential observations. Outliers in regression can overstate the coefficient of determination (R²), give erroneous values for the slope and intercept, and in many cases lead to false conclusions about the model. Outliers in regression usually are detected with graphical methods such as residual plots including deleted residuals.

A common statistical test is Cook's distance measure, which provides an overall measure of the impact of an observation on the estimated regression coefficients (7). Just because the residual plot or Cook's distance measure identifies an observation as an outlier does not mean one should automatically eliminate the point. One should fit the regression equation with and without the suspect point, compare the coefficients of the two models, and review the mean-squared error and R² from each.
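The comparison of fits with and without a suspect point can be sketched for simple linear regression. Everything in this example is an assumption made for illustration: the data are synthetic (points on the line y = 2x + 1 with the last response shifted upward by 6), the function names are invented, and Cook's distance is computed with the usual textbook formula for a two-parameter model, D_i = e_i² / (p · MSE) · h_ii / (1 – h_ii)², with p = 2.

```python
def fit_line(x, y):
    """Least-squares fit; returns (slope, intercept, residuals, mse)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - 2)
    return slope, intercept, residuals, mse


def cooks_distances(x, y):
    """Cook's distance for each point in a simple linear regression."""
    n, p = len(x), 2  # p = number of coefficients (slope, intercept)
    _, _, residuals, mse = fit_line(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    distances = []
    for xi, e in zip(x, residuals):
        leverage = 1 / n + (xi - x_bar) ** 2 / sxx
        distances.append(e * e / (p * mse) * leverage / (1 - leverage) ** 2)
    return distances


# Synthetic data: exact line y = 2x + 1, last response shifted by +6.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.0 * xi + 1.0 for xi in x]
y[-1] += 6.0

distances = cooks_distances(x, y)
suspect = max(range(len(x)), key=distances.__getitem__)

# Fit with all points, then again with the suspect point left out.
slope_all, intercept_all, _, mse_all = fit_line(x, y)
x_wo = x[:suspect] + x[suspect + 1:]
y_wo = y[:suspect] + y[suspect + 1:]
slope_wo, intercept_wo, _, mse_wo = fit_line(x_wo, y_wo)

print(f"suspect index={suspect}  Cook's D={distances[suspect]:.3f}")
print(f"with:    slope={slope_all:.3f} intercept={intercept_all:.3f} MSE={mse_all:.3f}")
print(f"without: slope={slope_wo:.3f} intercept={intercept_wo:.3f} MSE={mse_wo:.3f}")
```

In this synthetic case the shifted point has the largest Cook's distance, and removing it changes the slope from 2.5 back to 2.0. In practice such a comparison informs the investigation; it does not by itself justify deleting the point.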

Table IV: Comparison of methods.

Summary

Several outlier detection methods may be used at different stages of an analysis. The first step in outlier detection is to plot the data using methods such as histograms, scatter plots, or box plots. The extreme studentized deviate test is an excellent test for data sets that have more than 10 observations from a normally distributed sample. For fewer than 10 observations, the Dixon test is a good method and does not require distributional assumptions. When performing regression analysis, always review residual plots to ensure no outliers are affecting the model coefficients and inflating the R² value. Table IV summarizes these methods and their ease of use.

Steven Walfish is the president of Statistical Outsourcing Services, 403 King Farm Blvd., Suite 201, Rockville, MD 20850, tel. 301.325.3129, fax 301.330.2143, steven@statisticaloutsourcingservices.com, www.statisticaloutsourcingservices.com.

Submitted: June 12, 2006. Accepted: Aug. 14, 2006.

Keywords: analytical testing, manufacturing, statistics

References

1. USP 29–NF 24 (US Pharmacopeial Convention, Rockville, MD, 2006).

2. V. Barnett and T. Lewis, Outliers in Statistical Data (John Wiley & Sons, 2d ed., New York, NY, 1985).

3. P.J. Rousseeuw and A.M. Leroy, Robust Regression and Outlier Detection (John Wiley & Sons, New York, NY, 1987).

4. B. Iglewicz and D.C. Hoaglin, How to Detect and Handle Outliers (American Society for Quality Control, Milwaukee, WI, 1993).

5. M.A. Stephens, "Tests Based on EDF Statistics," in Goodness-of-Fit Techniques, R.B. D'Agostino and M.A. Stephens, Eds. (Marcel Dekker, New York, NY, 1986).

6. CRC Standard Probability and Statistics, W.H. Beyer, Ed. (CRC Press, Boca Raton, FL).

7. J. Neter et al., Applied Linear Statistical Models (Irwin, Chicago, IL, 1985).