Pitfalls in Statistics - Pharmaceutical Technology

Pitfalls in Statistics
The hardest errors to spot are the ones that don't look like errors at all.
 Apr 2, 2011 Pharmaceutical Technology Volume 35, Issue 4, pp. 40-42

 Lynn D. Torbeck
We all have a scrupulous eye for subjects we are passionate about. Foodies insist on using the most authentic ingredients for their favorite dishes. Classic-car collectors require the smallest details to be as close to the original as possible. And statisticians cajole nonstatisticians to avoid classic pitfalls in applied statistics. In this latter case, however, it is not a matter of taste or authenticity. Incorrect statistical practices can result in erroneous calculations and poor conclusions. Some errors are small, but others can be monumental, and since we never know which way the apple will fall, we should treat them all the same.

Although there is always a large scope for error in a statistical project, some mistakes are more common than others. Those that appear to be correct on the surface are what we call pitfalls.

The most common and deadliest pitfalls

This section highlights some of the most common challenges facing statisticians.

Reportable values. A frequent pitfall is that the reportable value or result is never defined for the data and the analysis (1). By definition, the reportable value is the end result of the complete measurement method as documented. It is the value compared with the specification and the official value most often used for statistical analysis. If different people or departments have different definitions, confusion reigns and out-of-tolerance and out-of-specification investigations multiply.

Averages. The average of a set of averages is correct only if the sample sizes are the same. Otherwise, the averages must be weighted by their sample sizes (2). In addition, avoid averaging standard deviations, even when the sample sizes are the same. Average the variances instead: the variance is the standard deviation squared, and variances can be averaged directly when the sample sizes are the same. If the sample sizes are not the same, a weighting formula is used (2).
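
As a rough illustration of the point about averages, the following Python sketch compares a naive average of averages with a sample-size-weighted one, and pools variances rather than averaging standard deviations. The batch values are hypothetical, and the pooled variance shown (each variance weighted by its degrees of freedom, n - 1) is one common weighting formula, assumed here for illustration; it is not necessarily the formula given in reference (2).

```python
from statistics import mean, variance

# Hypothetical assay results from three batches with unequal sample sizes.
groups = [
    [98.0, 99.0, 100.0, 101.0, 102.0],  # n = 5
    [95.0, 97.0],                        # n = 2
    [103.0, 104.0, 105.0],               # n = 3
]

sizes = [len(g) for g in groups]
means = [mean(g) for g in groups]
variances = [variance(g) for g in groups]  # sample variances (n - 1 divisor)

# Naive average of averages: every batch counts equally, whatever its size.
naive_mean = mean(means)

# Weighted average: each batch average is weighted by its sample size.
weighted_mean = sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

# The weighted average matches the average of all individual observations.
grand_mean = mean([x for g in groups for x in g])

# Pooled variance: each variance is weighted by its degrees of freedom (n - 1).
# This is an assumed, commonly used weighting, not taken from the article.
total_df = sum(n - 1 for n in sizes)
pooled_variance = sum((n - 1) * v for n, v in zip(sizes, variances)) / total_df
pooled_sd = pooled_variance ** 0.5

print(f"naive mean    = {naive_mean:.2f}")     # 100.00
print(f"weighted mean = {weighted_mean:.2f}")  # 100.40
print(f"grand mean    = {grand_mean:.2f}")     # 100.40
print(f"pooled SD     = {pooled_sd:.3f}")
```

With these made-up numbers, the naive average of averages (100.00) differs from the average of all ten observations (100.40), while the weighted average reproduces it.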

The percentage relative standard deviation. The percentage relative standard deviation (%RSD) is not a substitute for the standard deviation because they measure different aspects of variation. Report both, along with the sample size. Also, avoid trying to average %RSDs or calculate %RSD on data expressed as percentage recovery. The data are already percentages, so the average and the standard deviation will also be percentages.
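
To make the distinction concrete, here is a minimal Python sketch that reports the sample size, average, standard deviation, and %RSD together. The function name `summarize` and both data sets are hypothetical, chosen so that the two sets share the same standard deviation but have very different %RSDs.

```python
from statistics import mean, stdev

def summarize(values):
    """Return sample size, average, sample standard deviation, and %RSD."""
    avg = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 divisor)
    return {
        "n": len(values),
        "mean": avg,
        "sd": sd,
        "rsd_percent": 100.0 * sd / avg,  # %RSD = 100 * SD / average
    }

# Two hypothetical data sets with the same spread at different levels.
low_level = summarize([9.0, 10.0, 11.0])
high_level = summarize([99.0, 100.0, 101.0])

for label, s in (("low", low_level), ("high", high_level)):
    print(f"{label}: n={s['n']}, mean={s['mean']:.1f}, "
          f"SD={s['sd']:.2f}, %RSD={s['rsd_percent']:.1f}")
```

Both sets have a standard deviation of 1.00, yet the %RSD is 10.0 for the first and 1.0 for the second, which is why reporting only one of the two can mislead.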

Sample size. Always report the sample size. Remember the famous rat study where one third of the rats got better, one third got worse and the third rat ran away?

Summaries. Avoid gratuitous summary statistics without a clear purpose; they cloud the interpretation. Likewise, a massive printout of all possible summary statistics for all possible sets, subsets, and sub-subsets isn't worth the paper it is printed on.

Values and ranks. Give absolute values when looking at relative changes. For example, 2 out of 100 million versus 1 out of 100 million is a 100% increase, but it is still only 2 out of 100 million. Trust your reader to understand the practical implications of the data.
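
The arithmetic in that example can be sketched in a few lines of Python. The function name `change_summary` and its parameters are hypothetical; the counts are the ones used in the text.

```python
def change_summary(baseline_events, new_events, population):
    """Report a change in both relative and absolute terms."""
    baseline_rate = baseline_events / population
    new_rate = new_events / population
    return {
        # Relative change: how large the change is compared with the baseline.
        "relative_change_percent": 100.0 * (new_rate - baseline_rate) / baseline_rate,
        # Absolute change: the actual difference, expressed per million.
        "absolute_change_per_million": (new_rate - baseline_rate) * 1_000_000,
        "new_rate_per_million": new_rate * 1_000_000,
    }

result = change_summary(baseline_events=1, new_events=2, population=100_000_000)
print(f"relative change: {result['relative_change_percent']:.0f}%")
print(f"absolute change: {result['absolute_change_per_million']:.2f} per million")
print(f"new rate:        {result['new_rate_per_million']:.2f} per million")
```

The relative change is 100%, but the absolute change is only 0.01 per million, which is the figure a reader needs to judge practical importance.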

Ranking anything without giving absolute values or some sense of practical importance can be misleading. For example, suppose we ranked schools using a metric that places one school at the bottom of the list and another at the top, and then realized that both schools had produced Nobel Prize winners. Does the ranking have any meaning?

Charts and graphs. Avoid pie charts unless your goal is to deliberately confuse your reader. Graph the data before starting a formal statistical analysis. Common graphs include histograms, time plots, and scatter plots. Attempt to get cause and effect on the same page (3).

Underlying data. Attempt to determine the underlying distribution of the data before starting a formal statistical analysis. While the normal distribution is the most common, many other distributions, such as the log-normal, are also found.

If the data are symmetrical around the average, use the average and the standard deviation. If the data are skewed, the median and interquartile range are preferred. This is not a hard-and-fast rule, just good practice.
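
As a hedged illustration of why skewed data call for different summaries, the Python sketch below computes both pairs of statistics for a small, hypothetical right-skewed sample. The function name `describe` is an assumption, and the quartiles come from the standard library's `statistics.quantiles` with its default method; other software may use slightly different quartile conventions.

```python
from statistics import mean, median, quantiles, stdev

def describe(values):
    """Return mean/SD and median/IQR so the two summaries can be compared."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points, default method
    return {
        "n": len(values),
        "mean": mean(values),
        "sd": stdev(values),
        "median": median(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }

# Hypothetical right-skewed sample: one large value dominates the upper tail.
skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]
summary = describe(skewed)

print(f"n = {summary['n']}")
print(f"mean = {summary['mean']:.2f}, SD = {summary['sd']:.2f}")
print(f"median = {summary['median']}, IQR = {summary['iqr']:.2f}")
```

Here the single large value pulls the average to 6.7, well above the median of 3, and inflates the standard deviation, while the median and interquartile range still describe where most of the observations lie.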
