Never Mind the Statistics; Just Tell Me What the Answer Is!

How confident are you in your lab results? This article explores this question and, by way of a worked example, provides a methodology to answer it.

Many analytical chemists have faced challenges from managers who demand answers from analytical results without any accompanying explanation of their statistical significance. For the analytical chemist, however, results are never just about getting an answer. The reportable value of any analytical procedure is only a most probable estimate of the true value, with an associated interval of confidence. Unfortunately, this fact is not always readily appreciated by management. Perhaps they would prefer to have precise ‘lies’ than the inexact ‘truth.’

So, what might be the root cause of this problem? One suggestion is that some members of management don’t feel comfortable with the idea of uncertainty at all. This attitude probably stems from a fundamental lack of understanding that all measurements and their derived values are subject to error, and that this error is normal (usually). I hope that I may be excused the pun!

Sources of error

Suppose that management can be convinced of the reality of error and uncertainty in analytical measurements. The next requirement is then to minimize or remove as much of that error as is economically feasible. What can be done? As a first step, the potential sources of error within the analytical process must be identified and their individual contributions to the overall error quantified, as illustrated in Figure 1.

Gross errors are such that only abandonment of the analysis and a fresh start is an adequate cure. For example, if a portion of the sample is spilt or an instrument has been improperly set up, then no statistical methodology can be applied. Random errors occur on either side of the reported result and affect precision (repeatability, intermediate precision, and reproducibility). This type of error may be thought of as the noise element in the measurement. Systematic errors relate to all the results being biased in one direction, either too high or too low, and hence affect accuracy (closeness to the true value). Random and systematic errors are amenable to statistical evaluation.
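To make the distinction concrete, the short Python sketch below (using made-up numbers) simulates ten replicate results that carry both a fixed bias and random noise; the spread of the replicates reflects the random error, while the offset of their mean from the true value reflects the systematic error.

```python
import numpy as np

rng = np.random.default_rng(1)
true_value = 0.1000   # hypothetical true concentration, mol/L
bias       = 0.0006   # systematic error: shifts every result the same way
random_sd  = 0.0004   # random error: scatters results about their mean

replicates = true_value + bias + rng.normal(0.0, random_sd, 10)

print(f"mean = {replicates.mean():.4f} mol/L  (offset from 0.1000 reveals the bias)")
print(f"sd   = {replicates.std(ddof=1):.4f} mol/L  (spread reflects the random error)")
```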

Process mapping as an aid to error evaluation

It has long been standard practice to draw process maps for many business processes. Strangely, however, this is not so common for laboratory analyses. If one can map a process, one gains an objective understanding of the technical and procedural steps involved. Process mapping tools, such as flow charts and the associated Ishikawa diagrams, have been available since the early days of Six Sigma evaluations.

These tools and techniques are less well applied in the laboratory, which is unfortunate. However, with the rise of regulators’ data integrity concerns and the need to assure demonstrable control in analysis, their use is increasing.

The principle of starting simple to understand the process is a good one; it is advisable to learn to walk before attempting to run. This article illustrates the mapping concept with a simple manual volumetric titration involving the standardization of a sodium hydroxide (NaOH) solution with a certified reference material (CRM) of potassium hydrogen phthalate (KHP).

First, a basic process flow can be drawn, as shown in Figure 2. The question then must be asked: is this sufficient for the required purpose? Ideally, the process would need to be drawn in more detail to identify the possible error sources and error types. A more detailed process flow has therefore been drawn in Figure 3, which incorporates the error structures in Figure 1 and expands the basic process flow from Figure 2.

With the details now clear, it is reasonable to suppose that the manual operations conducted by the analyst pose the greatest potential for error. The assumption can be made that no gross errors have occurred, that the proper precautions have been taken to protect the NaOH solution from possible contamination with carbon dioxide (CO2), and that the reference standard is 100% pure.

Consideration of the manual procedure clearly shows that there could be systematic errors by the analyst in reading the burette, as well as in determining the endpoint visually. In addition, there could be systematic errors in the burette itself, despite the burette being Grade A certified. Classical volumetric analysis requires analyst skill and experience, which is not as readily available as it was in the days before instrumental analysis became widespread.

Using the data provided in Figure 3, one can calculate the concentration of the NaOH solution (CNaOH) as 0.1019 mol L⁻¹. However, one must ask, how accurate is this manual result, and what confidence can be ascribed to its value?

The answer to these questions will be explored in the latter part of this article.
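For reference, the underlying calculation follows the standard titration relationship used in worked example A2 of the EURACHEM/CITAC Guide (2); the form below assumes that model:

$$c_{\mathrm{NaOH}} = \frac{1000 \cdot m_{\mathrm{KHP}} \cdot P_{\mathrm{KHP}}}{M_{\mathrm{KHP}} \cdot V_{\mathrm{T}}}$$

where m_KHP is the mass of KHP taken (g), P_KHP its purity expressed as a mass fraction, M_KHP its molecular weight (g mol⁻¹), V_T the titrant volume (mL), and the factor of 1000 converts milliliters to liters.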

Automated titrimetry is the way forward, isn’t it?

Autotitrators are now commonly available in many laboratories and should be used to mitigate the manual operational risks of analyst systematic error. For the purposes of this article, it is assumed that management is convinced by the argument for automated titrimetry and has thus approved the purchase of an autotitrator.

Based on that scenario, Figure 2 can be redrawn to produce an automated version, which is now shown in Figure 4. As the concept of uncertainty is discussed later in this article using International Organization for Standardization error budget concepts, a new term must be introduced here to denote the output from an analytical process; that term is the measurand (1). This term simply means the particular quantity to be measured; in this instance, the quantity to be measured is the CNaOH.

Assuming that the instrument has been properly qualified, the titration process can be automated to minimize the titration procedural errors. Note that the word used here is “minimize,” not “remove.” The process now relies on an autotitrator with a digitally controlled piston burette under software control. As will be demonstrated, the use of this autotitrator makes the estimation of uncertainty in the measurand a straightforward process. This particular process was chosen because it is used as worked example A2 in Quantifying Uncertainty in Analytical Measurement (QUAM:2012.P1) (2) for the classical error budget approach that will be described later.

Having drawn the automated process flow, an Ishikawa (fishbone) diagram, shown in Figure 5, can be used to examine in more detail the four process elements that contribute to the uncertainty, namely:

  • the molecular weight of the KHP CRM itself
  • the purity of the KHP CRM
  • the weight taken of the KHP CRM
  • the volume of the NaOH titrant required to neutralize the weight of the KHP CRM.

As this is an exercise, the uncertainties in the atomic masses of carbon, hydrogen, oxygen, and potassium will be considered, and the uncertainty of the molecular weight of KHP will be calculated. As will become apparent, this error source would not need to be considered routinely; however, without these data, there will be no basis for ignoring the error source in the future. With the setup in place, it will now be possible to estimate the uncertainty in the measurand. An error budget must first be constructed from the Ishikawa diagram, however.

Error budgets

The basics of the measurement uncertainty approach and error budgets have been described in previous papers (3,4), and the detailed formal calculation approach is described in the EURACHEM/CITAC Guide (2). However, it is best to visualize the components as a structured model, as shown in Figure 6, because the visualization makes clear the interrelationships between the error sources that are needed when performing the calculations.

There are two distinct calculation methods available. The first requires approximations of the error distributions and an Excel spreadsheet, and it gives a single value, as described in the EURACHEM/CITAC Guide (2). The second requires no approximations and produces a full distribution of values; it does, however, require simulation software (5). The starting point is the same for both methods.

Setting up the equations is made easier if Figure 3 is updated to reflect the autotitrator process and the uncertainty contribution data are added, as shown in Figure 7.

For example, the corrected volume of the titrant, VT, is a function of the accuracy of the autotitrator, the temperature of the laboratory, and the measured volume. The accuracy of the autotitrator is known to be ±0.03 mL. This range is assumed to follow a triangular distribution because values nearer the center of the range are more likely.

The laboratory temperature is controlled between 17 °C and 23 °C, a range that has a uniform or rectangular distribution. In the first calculation approach (2), these ranges are converted to standard deviations, using approximations, by dividing the half-widths by factors of √6 (triangular) and √3 (rectangular). The basis for these approximation factors is described in a technical appendix (2).
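As a minimal sketch of these conversions (in Python, using only the two half-widths quoted above), the standard uncertainties work out as follows; note that turning the temperature term into a volume uncertainty additionally requires the thermal expansion coefficient of water, which is not shown here.

```python
import math

# Convert a half-width (±a) into a standard uncertainty using the
# approximation factors described in the EURACHEM/CITAC Guide (2).
def u_triangular(a):
    return a / math.sqrt(6)   # values near the center of the range are more likely

def u_rectangular(a):
    return a / math.sqrt(3)   # any value in the range is equally likely

print(u_triangular(0.03))    # autotitrator accuracy, ±0.03 mL -> ~0.0122 mL
print(u_rectangular(3.0))    # laboratory temperature, 20 ±3 °C -> ~1.73 °C
```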

The overall uncertainty of the measurand using the method fully described in the EURACHEM/CITAC Guide (2) is given as Equation 1:
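In the notation used here, and assuming the standard relative-uncertainty combination for this multiplicative model (the form used in worked example A2 of the Guide), Equation 1 can be written as:

$$u_c(c_{\mathrm{NaOH}}) = c_{\mathrm{NaOH}} \cdot \sqrt{\left(\frac{u(P_{\mathrm{KHP}})}{P_{\mathrm{KHP}}}\right)^{2} + \left(\frac{u(m_{\mathrm{KHP}})}{m_{\mathrm{KHP}}}\right)^{2} + \left(\frac{u(M_{\mathrm{KHP}})}{M_{\mathrm{KHP}}}\right)^{2} + \left(\frac{u(V_{\mathrm{T}})}{V_{\mathrm{T}}}\right)^{2}}$$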

This equation may be mathematically elegant, but it is time-consuming to carry out and not particularly user-friendly for routine laboratory use.

However, there is another, more visual way. The second method, which uses Monte Carlo Simulation (MCS), will be described in more detail later in this article, where a comparison of the results from the two approaches will be given. The classical calculation, illustrated in Equation 1, approximates the distributions and gives a single value for the uncertainties. MCS yields many values that are then used to calculate the required results and their associated uncertainties. The MCS approach is much quicker and easier to perform, but it requires large amounts of computing power and a specialist software application.

Simulation modeling of the error budget

MCS statistical modeling has been around for more than 70 years (6), but because it requires large amounts of computational power, it remained a specialist area for roughly 60 of those years. With the advent of cloud computing and user-friendly application software, however, MCS modeling is now readily achievable without the need for programming knowledge. In this article, Minitab Workspace software (5) is used for the automated titration example.

Having identified 12 inputs (X) for the error budget, one can set up the contributions using drop-down menus for the distributions from Figure 6 and Figure 7 and then write the output equations. The 12 inputs are shown in Figure 8. The distribution type can be a fixed value (constant), as in the case of the observed mass of KHP and the observed volume of titrant. The distribution shape is displayed automatically based on the selected distribution.

Based on these inputs, simple equations can be written for the five outputs (Y) needed to calculate the uncertainty of the measurand, using only the X input variables already defined, as shown in Figure 9.
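Outside Minitab Workspace, the same structure can be sketched in a few lines of Python with numpy; the input values below are illustrative assumptions (the figures actually used are in Figures 7 and 8), but the mechanics are the same: sample each X input from its distribution, evaluate the Y output equations, and summarize the resulting distribution of the measurand.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 500_000  # number of Monte Carlo iterations

# X inputs -- all numerical values here are illustrative assumptions.
m_obs = 0.3888   # observed mass of KHP, g (constant)
V_obs = 18.64    # observed titrant volume, mL (constant)

purity = rng.uniform(0.9995, 1.0005, N)        # KHP purity, rectangular
m_rep  = rng.normal(0.0, 0.00013, N)           # balance repeatability, g
V_cal  = rng.triangular(-0.03, 0.0, 0.03, N)   # burette accuracy ±0.03 mL, triangular
V_temp = rng.uniform(-0.012, 0.012, N)         # temperature effect on volume, mL, rectangular
V_rep  = rng.normal(0.0, 0.004, N)             # endpoint repeatability, mL

# Molecular weight of KHP (C8H5KO4) from atomic masses with rectangular
# uncertainties of the order quoted in the Guide.
a_C = rng.uniform(12.0107 - 0.0008, 12.0107 + 0.0008, N)
a_H = rng.uniform(1.00794 - 0.00007, 1.00794 + 0.00007, N)
a_O = rng.uniform(15.9994 - 0.0003, 15.9994 + 0.0003, N)
a_K = rng.uniform(39.0983 - 0.0001, 39.0983 + 0.0001, N)
M_KHP = 8 * a_C + 5 * a_H + 4 * a_O + a_K

# Y outputs
m_KHP  = (m_obs + m_rep) * purity          # corrected mass of KHP, g (purity folded in)
V_T    = V_obs + V_cal + V_temp + V_rep    # corrected titrant volume, mL
c_NaOH = 1000.0 * m_KHP / (M_KHP * V_T)    # the measurand, mol/L

low, high = np.percentile(c_NaOH, [2.5, 97.5])
print(f"mean c(NaOH)         = {c_NaOH.mean():.4f} mol/L")
print(f"standard uncertainty = {c_NaOH.std(ddof=1):.5f} mol/L")
print(f"95% interval         = {low:.4f} to {high:.4f} mol/L")
```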

Before the simulation is run, there is a dendrogram display of the relationships between the 12 (X) inputs in blue and the five (Y) output equations in orange, which should be checked for model correctness and for completeness, as shown in Figure 10.

The purity of the KHP input has been included in the mass of KHP output, and the RFP factor has been included in the volume output for computational convenience; these inputs yield the same numerical answers. However, the structure in Figure 6 is necessary when the comparison of the two approaches is considered.

Now the MCS model can be run. How many iterations (N) are needed to get reliable outputs? Usually, 10,000 to 50,000 iterations are sufficient; however, Minitab Workspace currently allows up to one million (note that this would not be practical in Excel).

N = 500,000 was selected for this example. How long does it take to do the calculations half a million times, compute the statistics, and plot the output files in the cloud? There is no single answer because much of the time depends on the speed of the Internet connection. For this example, the simulation took about 10 seconds, which was quick enough, plus another few seconds to download and export the file to PowerPoint. The total time to set up the X inputs and Y outputs was approximately 30 minutes. The output (see Figure 11) gave a value for the measurand of 0.1021, close to the manual value of 0.1019. When a 95% confidence range is calculated from the modeling, the result comes out to 0.1019–0.1023 by both approaches.

Before looking at a comparison between the two approaches in more detail, it is interesting to note the uncertainty of the molecular weight of the KHP. The atomic masses of carbon, hydrogen, oxygen, and potassium were taken from the EURACHEM/CITAC Guide, which in turn used the International Union of Pure and Applied Chemistry values current at the time, with their uncertainties modeled as rectangular distributions. The molecular weight of KHP, as defined by the first Y output equation in Figure 9, was calculated 500,000 times from values selected at random from these rectangular distributions. The result is shown in Figure 12.

Therefore, the relative standard deviation in the molecular weight of KHP due to uncertainties in the atomic masses is 0.0019%. Now that this value is known, it won’t be necessary to include it next time.
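The same check can be made with the M_KHP array from the numpy sketch above (bearing in mind that its atomic mass inputs were illustrative):

```python
rsd_percent = 100 * M_KHP.std(ddof=1) / M_KHP.mean()
print(f"RSD of M(KHP) from atomic mass uncertainties: {rsd_percent:.4f} %")
```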

Comparison of the two approaches

The results from the classical QUAM calculation approach and the MCS approach are shown in Table I. Whilst there are some small individual differences, which would be expected, the uncertainty values for the measurand are comfortingly close. However, the real power of using a modeling approach is the comparison of the sizes of the error sources. This comparison is graphically shown in Figure 13.

Pictures speak louder than words! Both approaches are comparable and provide the same information: having removed the analyst contributions, the remainder is dominated by the autotitrator contributions to the accuracy and precision of the titrant volume. At 95% confidence, the uncertainty for the molar concentration of the NaOH solution is 0.20%, which is, for most purposes, good enough.

Summary

Analytical chemists need to be comfortable with the error structure of their procedures and need to be able to communicate measurement uncertainty to management. To do this effectively and efficiently, they should consider employing the process mapping tools and error budget calculation approaches described here. Data integrity and data quality are increasingly subject to regulatory scrutiny, and scientifically sound and defensible approaches are therefore required.

References

1. JCGM. International Vocabulary of Metrology—Basic and General Concepts and Associated Terms (VIM); 3rd ed., 2008.
2. EURACHEM/CITAC Guide. Quantifying Uncertainty in Analytical Measurement; 3rd ed., 2012.
3. Burgess, C. The Basics of Measurement Uncertainty in Pharma Analysis. Pharm. Tech. 2013, 37 (9).
4. Burgess, C. Measurement Uncertainty without the Math. Pharm. Tech. 2016, 40 (2), pp. 36–40.
5. Minitab. Minitab Workspace. www.minitab.com (accessed on Sept. 14, 2022).
6. Metropolis, N.; Ulam, S. The Monte Carlo Method. J. Am. Stat. Assoc. 1949, 44 (247), pp. 335–341.

About the author

Christopher Burgess, PhD, is managing director, Burgess Analytical Consultancy Limited, Barnard Castle, Co Durham, UK.