Pitfalls in Statistics

April 2, 2011
Lynn D. Torbeck
Pharmaceutical Technology
Volume 35, Issue 4

The hardest errors to spot are the ones that don't look like errors at all.

We all have a scrupulous eye for subjects we are passionate about. Foodies insist on using the most authentic ingredients for their favorite dishes. Classic-car collectors require the smallest details to be as close to the original as possible. And statisticians cajole nonstatisticians to avoid classic pitfalls in applied statistics. In this latter case, however, it is not a matter of taste or authenticity. Incorrect statistical practices can result in erroneous calculations and poor conclusions. Some errors are small, but others can be monumental, and since we never know which way the apple will fall, we should treat them all the same.


Although there is always a large scope for error in a statistical project, some mistakes are more common than others. Those that appear to be correct on the surface are what we call pitfalls.

The most common and deadliest pitfalls

This section highlights some of the most common challenges facing statisticians.

Reportable values. A frequent problem is that the reportable value or result is never defined for the data and the analysis (1). By definition, the reportable value is the end result of the complete measurement method as documented. It is the value compared with the specification and the official value most often used for statistical analysis. If different people or departments use different definitions, confusion reigns and out-of-tolerance and out-of-specification investigations multiply.

Averages. The average of a set of averages is correct only if the sample sizes are the same. Otherwise, the averages must be weighted by their sample sizes (2). In addition, avoid averaging standard deviations, even when the sample sizes are the same. Average the variances instead: the variance is the standard deviation squared, and variances can be averaged when the sample sizes are the same. If the sample sizes are not the same, a weighting formula is used (2). Take the square root of the averaged variance to return to a standard deviation.
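
As a rough illustration of the point about averages, the sketch below compares a naive average of averages with a sample-size-weighted one, and pools standard deviations through their variances. The group summaries are hypothetical, and the pooling formula shown (variances weighted by degrees of freedom, n - 1) is an assumption about which weighting formula applies; it presumes the groups share a common underlying variance.

```python
from math import sqrt

# Hypothetical group summaries: (sample size, average, standard deviation)
groups = [(5, 98.0, 1.0), (10, 100.0, 2.0), (25, 102.0, 3.0)]

def naive_mean(groups):
    """Unweighted average of the group averages (only valid for equal n)."""
    return sum(avg for _, avg, _ in groups) / len(groups)

def weighted_mean(groups):
    """Average of the group averages, weighted by sample size."""
    total_n = sum(n for n, _, _ in groups)
    return sum(n * avg for n, avg, _ in groups) / total_n

def pooled_sd(groups):
    """Pool the variances, weighted by degrees of freedom, then take the root."""
    numerator = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    denominator = sum(n - 1 for n, _, _ in groups)
    return sqrt(numerator / denominator)

print(naive_mean(groups))     # 100.0
print(weighted_mean(groups))  # 101.0
print(pooled_sd(groups))      # larger than the naive average SD of 2.0
```

With unequal sample sizes, the naive and weighted averages differ by a full unit here, and the pooled standard deviation is noticeably larger than the simple average of the three standard deviations.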

The percentage relative standard deviation. The percentage relative standard deviation (%RSD) is not a substitute for the standard deviation because they measure different aspects of variation. Report both, along with the sample size. Also, avoid trying to average %RSDs or calculate %RSD on data expressed as percentage recovery. Such data are already percentages, so the average and the standard deviation will also be percentages.
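
A small sketch may help show why %RSD and the standard deviation are not interchangeable. The two data sets below are hypothetical; they have identical standard deviations but very different %RSDs because their averages differ. The function name `summarize` is illustrative only.

```python
from statistics import mean, stdev

def summarize(values):
    """Report sample size, average, standard deviation, and %RSD together."""
    avg = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    return {"n": len(values), "mean": avg, "sd": sd, "rsd_pct": 100 * sd / avg}

# Hypothetical results: same spread, different level
low_level = [9.0, 10.0, 11.0]
high_level = [99.0, 100.0, 101.0]

print(summarize(low_level))   # sd is 1.0, %RSD is 10.0
print(summarize(high_level))  # sd is 1.0, %RSD is 1.0
```

Reporting only the %RSD would suggest the first method is ten times more variable than the second, while reporting only the standard deviation would suggest they are the same. Both statements are true of different aspects of the variation.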

Sample size. Always report the sample size. Remember the famous rat study where one third of the rats got better, one third got worse and the third rat ran away?

Summaries. Avoid gratuitous summary statistics without a clear purpose; they cloud the interpretation. Likewise, a massive printout of all possible summary statistics for all possible sets, subsets, and sub-subsets isn't worth the paper it is printed on.

Values and ranks. Give absolute values when looking at relative changes. For example, 2 out of 100 million versus 1 out of 100 million is a 100% increase, but it is still only 2 out of 100 million. Trust your reader to understand the practical implications of the data.
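
The arithmetic behind that example can be written out as a short sketch. The function name `change_summary` is hypothetical; the numbers are the ones used above.

```python
def change_summary(events_before, events_after, population):
    """Return the relative change (%) alongside the absolute rates."""
    relative_pct = 100 * (events_after - events_before) / events_before
    rate_before = events_before / population
    rate_after = events_after / population
    return relative_pct, rate_before, rate_after

relative_pct, rate_before, rate_after = change_summary(1, 2, 100_000_000)
print(relative_pct)  # 100.0
print(rate_before)   # 1 in 100 million
print(rate_after)    # 2 in 100 million
```

Reporting all three numbers together lets the reader judge whether a dramatic relative change matters in practice.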

Ranking anything without giving absolute values or some sense of practical importance can be misleading. For example, suppose we rank schools using a metric that puts one school at the bottom of the list and another at the top, and then learn that both schools produced Nobel Prize winners. Does the ranking have any meaning?

Charts and graphs. Avoid pie charts unless your goal is to deliberately confuse your reader. Graph the data before starting a formal statistical analysis. Common graphs include histograms, time plots, and scatter plots. Attempt to get cause and effect on the same page (3).

Underlying data. Attempt to determine the underlying distribution of the data before starting a formal statistical analysis. While the normal distribution is the most common, many other distributions, such as the log-normal, are also found.

If the data are symmetrical around the average, use the average and the standard deviation. If the data are skewed, the median and interquartile range are preferred. This is not a hard and fast rule, just good practice.
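
To see why skewed data call for the median and interquartile range, consider the minimal sketch below. The data set is hypothetical, with a single large value creating the skew, and the quartiles use the default method of Python's `statistics.quantiles`; other software may compute quartiles slightly differently.

```python
from statistics import mean, median, quantiles, stdev

# Hypothetical right-skewed data: one large value dominates
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 50]

avg = mean(data)
sd = stdev(data)
med = median(data)
q1, _, q3 = quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1

print(f"mean={avg}, sd={sd:.2f}")
print(f"median={med}, IQR={iqr}")
```

Here the average lies above the third quartile, so it describes almost none of the observations, and the standard deviation is inflated by the single large value. The median and interquartile range stay close to where most of the data sit.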

Assay results. Correcting or adjusting one assay result with the result of a second assay that has the same or a larger variance than the first will result in more total variability, not less, because the variances add up. This is known as the "weight to run" problem, in which tablet weight is adjusted using an assay test for potency. This problem can lead to rejecting good lots of materials and products. In most cases, setting the weight equal to the target results in less variation.
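
The claim that variances add can be checked with a small simulation. The sketch below assumes a deliberately simplified additive model, in which the "corrected" result is the first assay result minus the independent error of the second assay. The standard deviations, target value, and seed are hypothetical, and real correction schemes (for example, ratio adjustments) will differ in detail.

```python
import random
from statistics import variance

rng = random.Random(1)  # fixed seed so the sketch is repeatable
n = 20_000
sd_first, sd_second = 1.0, 1.0  # hypothetical assay standard deviations

# First assay results scattered around a target of 100
first = [rng.gauss(100.0, sd_first) for _ in range(n)]
# Independent measurement error of the second (correcting) assay
second_error = [rng.gauss(0.0, sd_second) for _ in range(n)]
# "Corrected" results carry the error of both assays
corrected = [a - e for a, e in zip(first, second_error)]

var_first = variance(first)
var_corrected = variance(corrected)
print(f"uncorrected variance: {var_first:.2f}")     # close to 1
print(f"corrected variance:   {var_corrected:.2f}")  # close to 1 + 1 = 2
```

Under these assumptions the correction roughly doubles the variance, which is the mechanism by which good lots end up rejected.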

Population parameters. Recognize that the population parameters, such as mu (µ), the population mean, and sigma (σ), the population standard deviation, are single values, whereas sample estimates of the population mean and sample estimates of the population standard deviation are random results from a distribution. Every additional sample gives slightly different results, so problems arise when sample estimates are treated as if they are population values. This leads to treating other sample estimates such as %RSD and Cpk as if they are without variation. Confidence intervals should be calculated for these statistics to estimate their uncertainty.
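
The sketch below illustrates how much sample estimates move from one sample to the next. It is a simulation under assumed conditions (a normal population with hypothetical values µ = 100 and σ = 2, and samples of size 10), not a confidence interval calculation; it only shows the spread that a confidence interval is meant to capture.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)  # fixed seed so the sketch is repeatable
MU, SIGMA, N = 100.0, 2.0, 10  # hypothetical population values and sample size

def one_sample_estimates():
    """Draw one sample and return its average, standard deviation, and %RSD."""
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    avg, sd = mean(sample), stdev(sample)
    return avg, sd, 100 * sd / avg

estimates = [one_sample_estimates() for _ in range(1000)]
sds = sorted(sd for _, sd, _ in estimates)

# Roughly the middle 95% of the simulated sample standard deviations
low, high = sds[24], sds[974]
print(f"true sigma = {SIGMA}; sample SDs mostly fall between {low:.2f} and {high:.2f}")
```

With only 10 observations per sample, the estimated standard deviation (and therefore the %RSD) can easily be off by a third or more in either direction, even though the population value never changes.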

Definitions and intervals. Define in exact detail what the phrase "within the variation of the method" means for specific applications because there is no universally accepted definition. Try not to use confidence intervals to set specification criteria. Instead, use tolerance intervals to get a starting point. Checking whether confidence intervals overlap is not a significance test.
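
The point about overlapping confidence intervals can be shown with a small numerical sketch. It assumes two independent averages with known standard errors and uses the normal (z) approximation rather than the t distribution; the means and standard errors are hypothetical, and the helper names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

def confidence_interval(avg, se):
    """Two-sided 95% interval for an average with a known standard error."""
    return (avg - Z95 * se, avg + Z95 * se)

def intervals_overlap(a, b):
    return a[0] <= b[1] and b[0] <= a[1]

def two_sided_p(avg_a, se_a, avg_b, se_b):
    """z-test p-value for the difference between two independent averages."""
    z = abs(avg_a - avg_b) / sqrt(se_a ** 2 + se_b ** 2)
    return 2 * (1 - NormalDist().cdf(z))

# Hypothetical averages and standard errors
ci_a = confidence_interval(0.0, 1.0)
ci_b = confidence_interval(3.0, 1.0)
p_value = two_sided_p(0.0, 1.0, 3.0, 1.0)

print(intervals_overlap(ci_a, ci_b))  # True
print(p_value < 0.05)                 # True
```

The two 95% intervals overlap, yet the difference between the averages is statistically significant at the 5% level. The standard error of a difference is smaller than the sum of the two individual standard errors, so eyeballing overlap is not equivalent to a test.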

The most egregious pitfall of all is calculating the sample average plus and minus three times the sample standard deviation without considering the sample size and distribution, and then using it for confidence intervals, setting specification criteria, identifying outliers, and all manner of other ad hoc comparisons. It is not a universal statistical tool. In fact, this equation came about in the 1920s, when Dr. Walter Shewhart used it to define control charts, and is no longer of much use today.
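
One concrete way to see why sample size cannot be ignored is a known result that is not part of this column (Samuelson's inequality): in a sample of size n, no value can lie more than (n - 1)/sqrt(n) sample standard deviations from the sample average. The sketch below, with hypothetical data, shows that with 10 or fewer values the "average plus or minus three standard deviations" rule can never flag an outlier, however extreme the value.

```python
from math import sqrt
from statistics import mean, stdev

def max_possible_z(n):
    """Largest possible |x - average| / s in a sample of size n."""
    return (n - 1) / sqrt(n)

def largest_z(values):
    """Largest observed distance from the average, in sample SD units."""
    avg, sd = mean(values), stdev(values)
    return max(abs(x - avg) / sd for x in values)

# Hypothetical worst case: nine identical values and one wildly different one
data = [0.0] * 9 + [1000.0]

print(f"{largest_z(data):.3f}")      # 2.846, still inside three SDs
print(f"{max_possible_z(10):.3f}")   # 2.846
print(f"{max_possible_z(11):.3f}")   # 3.015
```

The extreme value inflates the standard deviation so much that it hides itself, which is one reason the three-standard-deviation rule fails as a general outlier test for small samples.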

Of course, there are many more potential pitfalls, and when in doubt, contact your local statistician.

Lynn D. Torbeck is a statistician at Torbeck and Assoc., 2000 Dempster Plaza, Evanston, IL 60202, tel. 847.424.1314, Lynn@Torbeck.org, www.torbeck.org.


References

1. L. D. Torbeck, "Analytical Validation" supplement to Pharm. Technol. 23, 21-23 (1999).

2. M. Spiegel and L. Stephens, Schaum's Outline of Statistics, 4th ed. (McGraw-Hill, New York, NY, 2008).

3. E. R. Tufte, The Visual Display of Quantitative Information (Graphics Press, Cheshire, CT, 1983).