When making an identity assessment based on spectral data, the unknown measurement is compared with one or more
reference spectra. A common approach for spectral comparison is to calculate the wavelength correlation, which is equivalent
to measuring the cosine of the angle between the two spectra. The resulting correlation coefficient, r, is 1 when the two spectra are in perfect correspondence and 0 when they are orthogonal. Although useful for quick similarity
assessments, the correlation coefficient is relatively insensitive to discrepancies between the two spectra of interest.
More problematically, a correlation coefficient other than 0 or 1 has no direct interpretation in the context of spectral identity
testing, because r can be interpreted transparently as a test statistic only when the data are random normal variates, which
FTIR, Raman, and NIR spectra clearly are not. Despite these difficulties, regulatory guidance on selecting
a correlation threshold states, "Unless otherwise justified, a [correlation] threshold below 0.95 is not acceptable..." (13).
Designating a correlation threshold arbitrarily in this manner is risky because the choice is supported neither by basic
statistics nor by experimental demonstration, a point that other researchers have also made (14).
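As a minimal sketch of the comparison just described, the Python function below computes r as the cosine of the angle between two spectra after mean-centering (the Pearson form). Some software uses the uncentered cosine instead, and which form a given instrument uses is an assumption here. The `passes_threshold` helper is hypothetical and simply applies a fixed cutoff such as the 0.95 value quoted from the guidance. Note that r is unchanged when a spectrum is scaled by a positive factor, so overall intensity differences do not affect it.

```python
import math
from typing import Sequence


def wavelength_correlation(test: Sequence[float], ref: Sequence[float]) -> float:
    """Correlation coefficient r between two spectra on the same wavelength axis.

    Computed as the cosine of the angle between the mean-centered spectra
    (Pearson form). Whether a given instrument centers first is an assumption.
    """
    if len(test) != len(ref) or len(test) < 2:
        raise ValueError("spectra must share an axis with at least two points")
    mean_t = sum(test) / len(test)
    mean_r = sum(ref) / len(ref)
    dt = [t - mean_t for t in test]
    dr = [r - mean_r for r in ref]
    dot = sum(a * b for a, b in zip(dt, dr))
    norms = math.sqrt(sum(a * a for a in dt)) * math.sqrt(sum(b * b for b in dr))
    if norms == 0.0:
        raise ValueError("correlation is undefined for a flat spectrum")
    return dot / norms


def passes_threshold(r: float, threshold: float = 0.95) -> bool:
    """Hypothetical fixed-threshold rule: accept the match if r >= threshold."""
    return r >= threshold


if __name__ == "__main__":
    ref = [0.1, 0.4, 1.0, 0.4, 0.1, 0.3, 0.8, 0.3]  # made-up intensities
    scaled = [2.5 * x for x in ref]                  # same shape, 2.5x brighter
    r = wavelength_correlation(scaled, ref)
    print(r, passes_threshold(r))  # r equals 1 up to rounding
```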
Figure 3 shows a Raman reference spectrum for pure glycerin and a Raman spectrum measured for an "unknown" substance, in this
case glycerin contaminated with 20% diethylene glycol by volume. Contamination of glycerin with diethylene glycol is a problem
of current interest, as evidenced by several news reports and a recent FDA guidance (15). Despite clear differences in
the highlighted regions of the spectra in Figure 3, the correlation coefficient is very favorable (0.96), which means
that this material would pass as acceptable against the 0.95 threshold unless a higher-than-typical threshold were applied.
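To see how a clearly different band can still leave r above 0.95, the toy sketch below mixes two made-up Gaussian bands; they are not the glycerin or diethylene glycol spectra of Figure 3. It also assumes, hypothetically, that a 20% contaminant contributes 20% of the signal through simple linear mixing. Even though the contaminant band has no counterpart in the reference at all, the mixture's correlation with the reference comes out near 0.97 for these parameters.

```python
import math


def gaussian_band(n: int, center: float, width: float, height: float = 1.0) -> list[float]:
    """Toy spectral band: a Gaussian peak on an n-channel axis (hypothetical shape)."""
    return [height * math.exp(-((i - center) ** 2) / (2 * width ** 2)) for i in range(n)]


def correlation(x: list[float], y: list[float]) -> float:
    """Pearson correlation, i.e. cosine of the angle between mean-centered spectra."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


N = 200  # number of spectral channels (arbitrary choice for the sketch)
reference = gaussian_band(N, center=60, width=5)     # stands in for the pure material
contaminant = gaussian_band(N, center=140, width=5)  # band absent from the reference
# Linear mixing model (an assumption): 80% reference signal, 20% contaminant signal.
mixture = [0.8 * r + 0.2 * c for r, c in zip(reference, contaminant)]

r_mix = correlation(mixture, reference)
print(f"r(mixture, reference) = {r_mix:.3f}")
print("passes 0.95 threshold:", r_mix >= 0.95)
```

Because the extra band carries only a small share of the mixture's total signal, it moves r only slightly. Raising the contaminant fraction lowers r, but where it crosses a fixed threshold depends on band shapes and intensities rather than on any statistical criterion.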
An alternative to wavelength correlation, used by the handheld units in this study, is to evaluate whether the measured
spectrum lies within the multivariate domain of the reference spectrum (or spectra), which is defined by the uncertainty characteristics
of each measurement, including exposure settings, instrument and environment properties (e.g., temperature, dark current,
ambient lighting), and the optical properties of the sample itself. When comparing spectra in this manner, the analyst looks
for spectral features that contradict the reference spectrum given the uncertainty of the measurement, rather than at how well
the bulk spectrum matches. For identity testing, the critical question is whether the measured spectrum can be considered
consistent with the reference spectrum given the multivariate uncertainty of the measurement conditions. Like most statistical
tests, the analysis is distilled to a p-value: in this case, the probability of observing differences between the test and reference spectra at least as large as those measured if they arose by
chance from the uncertainty of the measurement alone. A high p-value indicates that any differences are small relative to the uncertainty of the measurement; in such cases, the measured
spectrum is deemed consistent with the reference spectrum, and the instrument declares "pass." If the p-value falls below the significance threshold (0.05 by default on the device), the discrepancies between the measured and reference
spectra are unlikely to have arisen from measurement uncertainty alone, and the device declares "fail." The system
logic just described is illustrated in Figure 4. The earlier example for the spectra in Figure 3 yielded a p-value of 3.2 × 10⁻³, which is well below the 0.05 default and therefore indicates a discrepancy.
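The pass/fail logic can be illustrated with a deliberately simplified model. The sketch below is not the handheld units' algorithm, which is not described here. It assumes, hypothetically, that the reference spectrum is known exactly and that each channel of the measured spectrum carries independent Gaussian noise with a known standard deviation `sigma`. Under that assumption, the sum of squared, sigma-scaled residuals follows a chi-square distribution with one degree of freedom per channel, and its upper-tail probability is a p-value for the hypothesis that the differences arose from measurement uncertainty alone. A realistic model would need a full covariance structure covering exposure, instrument, environment, and sample effects. To stay within the Python standard library, the tail probability uses the closed form that holds for an even number of degrees of freedom. The function names are illustrative only.

```python
import math
from typing import Sequence


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X >= x) for a chi-square variable X with a positive even df.

    Uses exp(-x/2) * sum_{i < df/2} (x/2)**i / i!, with each term evaluated in
    log space so that large x does not overflow.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    total = 0.0
    for i in range(df // 2):
        total += math.exp(-half + i * math.log(half) - math.lgamma(i + 1))
    return min(total, 1.0)  # guard against rounding slightly above 1


def consistency_p_value(measured: Sequence[float], reference: Sequence[float],
                        sigma: Sequence[float]) -> float:
    """p-value for H0: measured = reference + independent N(0, sigma**2) noise per channel."""
    if not (len(measured) == len(reference) == len(sigma)):
        raise ValueError("measured, reference, and sigma must have equal length")
    stat = sum(((m - r) / s) ** 2 for m, r, s in zip(measured, reference, sigma))
    return chi2_sf_even_df(stat, df=len(measured))


def identity_test(measured: Sequence[float], reference: Sequence[float],
                  sigma: Sequence[float], alpha: float = 0.05) -> str:
    """Declare "pass" if the p-value is at least alpha, otherwise "fail"."""
    p = consistency_p_value(measured, reference, sigma)
    return "pass" if p >= alpha else "fail"


if __name__ == "__main__":
    ref = [1.0, 2.0, 3.0, 2.0]  # made-up reference intensities
    sig = [0.1] * 4             # assumed per-channel noise level
    print(identity_test(ref, ref, sig))                   # no difference at all
    print(identity_test([1.0, 2.0, 3.0, 3.0], ref, sig))  # one channel off by 10 sigma
```

With these made-up values, a measured spectrum identical to the reference gives p = 1 and "pass", while shifting one channel by 1.0 (ten standard deviations) gives a statistic of 100 and a p-value on the order of 10⁻²⁰, so the rule returns "fail".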