Statistical limitations of the current method
One statistical limitation of the current method is that the VRL is determined based on observed data without describing a
relationship between observations and the experimental parameters. Suppose that in a VC verification study, a residue is spiked
at levels of 0, 0.5, 1.0, 2.0, 3.0, and 4.0 μg/cm^{2}. If all four inspectors detect residue at 2.0 μg/cm^{2} and only three inspectors detect residue at 1.0 μg/cm^{2}, then the VRL would be set at 2.0 μg/cm^{2}. This choice implicitly assumes that no residue level between 1.0 and 2.0 μg/cm^{2} would be detected by all inspectors. Such a VRL is inappropriate because that assumption was never tested: a residue level between 1.0 and 2.0 μg/cm^{2} could have been detected by all inspectors.
To predict the number of observers that would detect residue at levels other than those spiked (e.g., 1.5 μg/cm^{2}), the observed data must be incorporated into a reasonable model that describes a relationship between an outcome and a set
of independent variables. The results obtained from spiking studies for verifying the VC criterion are binary (i.e., only
two values are possible) rather than continuous. However, the regulatory guidelines and the available literature do not explain
how to establish a modeling procedure, based on these discrete responses, that could be used to derive VRLs.
Another important parameter in determining appropriate VRLs is the sample size (i.e., number of inspectors and total number
of observations) for VC verification studies. Most published studies are based on relatively small sample sizes. Forsyth's
studies are based on only four observers (4, 10, 11). Because the VRL depends on the proportion of detection (i.e., the ratio
of the number of detections of residue to the number of inspections), a small sample size increases the width of the confidence
interval and the margin of error. For example, a 0.8 proportion of detection with a sample size of five would result in a 95%
exact confidence interval of 0.2836–0.9949 and a margin of error of approximately 35.57%. At the same proportion of detection,
a sample size of 25 would result in a 95% exact confidence interval of 0.5930–0.9317 and a margin of error of approximately
16.94%. Because no consensus has been established about the appropriate number of observers for VC verification studies, the
sample size of the study could cause over- or underestimation of VRLs.
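The intervals quoted above can be checked with a short calculation. The sketch below assumes that "exact" refers to the Clopper-Pearson interval and that the margin of error is half the interval width; both are interpretations rather than statements from the cited studies. The function names are illustrative.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def bisect_root(f, lo=0.0, hi=1.0, iters=60):
    """Root of a monotone function f on [lo, hi] that changes sign once."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, conf=0.95):
    """Exact (Clopper-Pearson) interval for x detections in n inspections."""
    alpha = 1 - conf
    # Lower limit: the p at which P(X >= x) equals alpha/2.
    lower = 0.0 if x == 0 else bisect_root(
        lambda p: 1 - binom_cdf(x - 1, n, p) - alpha / 2)
    # Upper limit: the p at which P(X <= x) equals alpha/2.
    upper = 1.0 if x == n else bisect_root(
        lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper

# A 0.8 proportion of detection with 5 and with 25 observations.
for x, n in [(4, 5), (20, 25)]:
    lower, upper = clopper_pearson(x, n)
    margin = (upper - lower) / 2  # assumed definition: half the interval width
    print(f"n={n}: {lower:.4f} to {upper:.4f}, margin of error {margin:.2%}")
```

Increasing the sample size from 5 to 25 at the same observed proportion roughly halves the margin of error, which is the point made in the paragraph above.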
Logistic regression
The objective of VC verification studies is to prove that the VC criterion would ensure cleanliness if implemented in the
manufacturing setting. During VC verification, spiking studies are performed and inspectors state whether they can detect
residue visually under controlled viewing conditions. VC verification studies thus provide a basis for the establishment of
VRLs and the determination of appropriate viewing conditions. The lowest concentration of residue that is visually detected
by all the observers is then used as the VRL. In the current method, the VRL could be defined mathematically as the lowest
residue concentration for which the ratio of the number of observers able to detect the residue to the total number of observers
is equal to 1. As discussed earlier, the observed data alone provide no knowledge about the outcome in future situations unless
the data are fitted with the most conservative model that explains them.
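The current rule is simple enough to state in a few lines of code. The sketch below applies it to the four-inspector spiking example described earlier. Only the counts at 1.0 and 2.0 μg/cm^{2} come from that example; the counts at the other levels are assumed for illustration, and the function name is hypothetical.

```python
def current_method_vrl(detections, n_observers):
    """Lowest spiked level at which every observer detected the residue.

    detections maps a spiked level (in μg/cm^2) to the number of observers
    who detected residue at that level. Returns None if no level qualifies.
    """
    fully_detected = [level for level, count in detections.items()
                      if count == n_observers]
    return min(fully_detected) if fully_detected else None

# Counts at 1.0 and 2.0 follow the example in the text; the rest are assumed.
detections = {0.0: 0, 0.5: 1, 1.0: 3, 2.0: 4, 3.0: 4, 4.0: 4}
vrl = current_method_vrl(detections, n_observers=4)
print(vrl)
```

The rule can only ever return one of the spiked levels, so it says nothing about an unspiked level such as 1.5 μg/cm^{2}.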
One of the most common examples of modeling is the linear regression technique. However, linear regression is not suitable
for binary data. If we represent the binary responses "Yes" and "No" with values of 1 and 0, respectively, then the mean is
the proportion of cases with a value of 1 and can be interpreted as the proportion or probability of detection. Although
proportions and probabilities cannot exceed 1 or fall below 0, fitting the data with linear regression could give predicted
values of the response variable above 1 or below 0. Clearly, linear regression is not appropriate when the data must lie between
0 and 1 because predictions from the model are not similarly constrained. Other problems that arise when fitting binary data
with linear regression are that the variance of the error term is not constant and that the error term is not normally distributed.
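A small numerical check makes the range problem concrete. The sketch below fits an ordinary least-squares line to binary responses coded as 1 and 0. The detection counts are hypothetical (five observers per level, not the values of Table III), and the helper names are illustrative. With these counts the fitted line exceeds 1 at the highest spiked level; a prediction below 0 would arise in the same way for data with a steeper fitted slope or on extrapolation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Hypothetical counts: spiked level -> observers (out of 5) who detected residue.
n_observers = 5
counts = {0.0: 0, 0.5: 1, 1.0: 3, 2.0: 5, 3.0: 5, 4.0: 5}

# Expand the counts into individual binary responses (1 = "Yes", 0 = "No").
xs, ys = [], []
for level, detected in counts.items():
    xs += [level] * n_observers
    ys += [1] * detected + [0] * (n_observers - detected)

intercept, slope = fit_line(xs, ys)
prediction_at_4 = intercept + slope * 4.0
print(f"predicted 'probability' at 4.0: {prediction_at_4:.3f}")
```

The printed value is greater than 1, which cannot be interpreted as a probability of detection.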
Table III: The outcome of visually clean verification studies.

The most suitable modeling technique that could be applied to describe a relationship between explanatory variables (e.g.,
experimental parameters such as residue concentration, viewing distance, viewing angle, and light intensity) and the binary response
variable is logistic regression. Logistic regression allows scientists to predict probabilities of detection of residue based
on experimental variables, which is an advantage compared with other prediction techniques. Logistic regression is a flexible
and easily applied modeling technique that can be used effectively to model data with continuous or discrete explanatory variables.
In addition, the technique accommodates response variables that are not normally distributed.
Figure 1: Plot of probability of detection against the residue concentration for the data presented in Table III.

The modeling procedure based on logistic regression is explained here using a hypothetical data set (see Table III). This
data set may represent an ideal case and is certainly within the experience of those involved in conducting VC verification
studies (4, 11, 12). The response variable is binary and indicates two different outcomes (i.e., "Yes" or "No") based on the
detection of residue by the observers at a specific viewing condition. The continuous explanatory variable is the measure
of the theoretical residue concentration spiked on the model surface. At each spiking level, the five replicates correspond
to the five observers who visually inspected the surface. The observed proportion (i.e., the observed probability of detection)
for each spiking level is the ratio of the number of observers that detected the residue to the total number of observers. Figure 1 shows
a plot of these observed probabilities of detection. It suggests that the probability of detection increases with the spiked
residue concentration. The relationship is nonlinear, however, and the probability of detection changes little at the high
extreme of spiked residue. This pattern is typical because proportions and probabilities cannot lie outside the range of 0
to 1.
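
The S-shaped relationship described above is what a logistic model captures. The following sketch fits a one-variable logistic regression, P(detect) = 1 / (1 + exp(-(b0 + b1 * concentration))), by maximum likelihood. It rests on several assumptions: the detection counts are hypothetical and are not the values of Table III, the Newton-Raphson solver with step halving is one convenient choice rather than a procedure prescribed by the source, and all function names are illustrative.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def softplus(z):
    """Stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def log_lik(b0, b1, levels, detected, n):
    """Binomial log-likelihood, up to an additive constant."""
    return sum(y * (b0 + b1 * x) - n * softplus(b0 + b1 * x)
               for x, y in zip(levels, detected))

def fit_logistic(levels, detected, n, max_iter=100, tol=1e-10):
    """Maximum-likelihood fit by Newton-Raphson with step halving."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(levels, detected):
            p = sigmoid(b0 + b1 * x)
            w = n * p * (1.0 - p)
            g0 += y - n * p          # score for the intercept
            g1 += x * (y - n * p)    # score for the slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:             # information matrix nearly singular
            break
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        step = 1.0
        current = log_lik(b0, b1, levels, detected, n)
        # Halve the step until the log-likelihood no longer decreases.
        while step > 1e-8 and log_lik(b0 + step * d0, b1 + step * d1,
                                      levels, detected, n) < current:
            step /= 2
        b0 += step * d0
        b1 += step * d1
        if max(abs(step * d0), abs(step * d1)) < tol:
            break
    return b0, b1

# Hypothetical data: five observers per spiked level (μg/cm^2).
levels = [0.0, 0.5, 1.0, 2.0, 3.0, 4.0]
detected = [0, 1, 3, 5, 5, 5]
b0, b1 = fit_logistic(levels, detected, n=5)

def prob_detect(concentration):
    """Fitted probability of detection at a given residue concentration."""
    return sigmoid(b0 + b1 * concentration)

print(f"intercept={b0:.3f}, slope={b1:.3f}")
print(f"fitted probability of detection at 1.5: {prob_detect(1.5):.3f}")
```

Unlike a straight line, the fitted curve stays between 0 and 1 at every concentration, and it yields a probability of detection for levels that were not spiked, such as 1.5 μg/cm^{2}.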
