Materials of Construction Based on Recovery Data for Cleaning Validation

October 2, 2007
Pharmaceutical Technology
Volume 31, Issue 10

The material of construction is a factor in the recovery of residue in cleaning validation. An analysis of existing recovery data showed that recovery factors for drug products on various materials of construction may be categorized into several groupings.

Residue assays are a critical quality attribute in establishing a validated cleaning program. They are essential for accurately determining amounts of residual active pharmaceutical ingredient (API) or formulation component in comparison with the acceptable residue limit (ARL) for a given process or equipment train (1). The residue assays are validated for several parameters: linearity, precision, sensitivity, specificity, and recovery. From an analytical standpoint, recovery is from the cleaning-test sample or the swab. From the cleaning-program standpoint, the concern is the recovery of the residue from the manufacturing equipment (1, 2), which is determined through experiments in which sample equipment materials spiked with known amounts of the substance of interest are swabbed and tested. The swab and the swabbing solvent must be capable of recovering a sufficient amount of material to allow an accurate and precise measurement of the spiked component.

It is a common practice to set baseline limits for a minimum acceptable recovery (e.g., a minimum product recovery of 70%). These limits, however, are sometimes set without scientific justification. The most important aspects for product-recovery factors are that the data be consistent and reproducible and that they provide an adjusted ARL that is within the method limit of quantitation. ARLs must be achievable and practical. If recoveries are too low, either the methods will need to be optimized or the equipment will need to be dedicated to the process.
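As an illustration of how a recovery factor interacts with the ARL and the limit of quantitation, the following minimal Python sketch scales an ARL by the recovery factor and checks it against a quantitation limit. The function names, the units (µg per swab), and the numerical values are hypothetical, and the simple proportional adjustment is an assumption rather than a procedure stated in this article.

```python
def adjusted_arl(arl_ug: float, recovery_pct: float) -> float:
    """Amount expected in the swab sample when the surface is exactly at the ARL.

    Assumes recovery is proportional: only recovery_pct percent of the
    residue on the surface ends up in the assayed sample.
    """
    if recovery_pct <= 0:
        raise ValueError("recovery_pct must be positive")
    return arl_ug * recovery_pct / 100.0


def is_quantifiable(arl_ug: float, recovery_pct: float, loq_ug: float) -> bool:
    """True if the recovery-adjusted ARL is at or above the method LOQ."""
    return adjusted_arl(arl_ug, recovery_pct) >= loq_ug


def corrected_residue(measured_ug: float, recovery_pct: float) -> float:
    """Scale a measured swab result up to account for incomplete recovery."""
    if recovery_pct <= 0:
        raise ValueError("recovery_pct must be positive")
    return measured_ug * 100.0 / recovery_pct


if __name__ == "__main__":
    # Hypothetical values: ARL of 100 ug/swab, 70% recovery, LOQ of 5 ug/swab.
    print(adjusted_arl(100.0, 70.0))
    print(is_quantifiable(100.0, 70.0, 5.0))
    print(corrected_residue(35.0, 70.0))
```

With a low recovery and a low ARL, the adjusted limit can fall below the quantitation limit, which is the situation in which method optimization or dedicated equipment would be considered.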

Although reducing the amount of testing for a cleaning-validation program is desirable, efficiencies must have technical merit and be scientifically justified. For each new drug product manufactured, quantitative studies determine acceptable swab recoveries from each material of construction that requires surface sampling. One way to reduce the amount of testing is to test a new substance on only a subset of materials. A justification based on low risk for product carryover would include testing representative materials and excluding materials that have limited product contact (i.e., <5% of total surface area). An alternative justification groups materials based on similarity in composition (e.g., metals, plastics, and glass). The former justification ignores potential residue buildup, and the latter discounts individual material characteristics.

Several parameters affect the recovery of residue from equipment surfaces (3). Residue solubility, swab and solvent type, and material of construction are the three most influential parameters. These three parameters are interrelated. Each residue has an inherent cleanability, which may be related to its solubility (4). The cleanability of the residue affects the choice of cleaning procedure. The harder a residue is to clean, the higher the anticipated amount of residue to be swabbed. The amount of anticipated residue affects the choice of swab and solvent and the validation of the analytical testing method.

The materials used to recover the residue also affect recovery. The swab material must be able to absorb sufficient residue and solvent to remove the residue from the equipment material surface. The type and amount of recovery solvent must solvate the residue sufficiently for removal without leaving residue or solvent behind. The swabbing technique should be standardized to minimize subjectivity and should recover enough residue consistently so that a precise measurement is ensured. The combination of swab material and recovery solvent should not interfere with the subsequent sample assay.

Finally, the material of construction of the manufacturing equipment needs to be considered. Most equipment in the pharmaceutical industry with drug-product contact is composed of stainless steel. Component parts of equipment, however, may be made from other materials such as Teflon and rubber. The swab recovery of residue from each material of construction should be determined to accurately quantify residue levels and assess material cleanliness.

The goals for this study were to gather and statistically analyze all available historical data to achieve the following:

  • Group materials of construction according to recovery performance

  • Select representative materials from each grouping

  • Identify the physical characteristics of the materials and parameters that influence recovery-data results.

The result of this analysis may reduce the number of studies required for new substances in a cleaning-validation program.


Multiple manufacturing sites conducted recovery studies based on site manufacturing equipment and the current product manufacturing matrix. For this study, 16 sites manufacturing drug products accumulated a diverse database over several years. The data set consisted of 1262 recovery-factor (RF) values for 48 different drug substances, formulations, or detergents on 29 different materials of construction.

The swab material and solvents were defined for each residue. The analytical methods were validated for the specific residue of interest. The swabbing technique at each site was consistent, but there were differences in procedures from site to site.

The two primary factors for statistical analysis were the material of construction and the test substance. The data were analyzed to determine if there were any trends or relationships that might be leveraged to reduce testing without compromising quality. The statistical methods used included analysis of variance to compare recovery factors across substances and materials. Variance components analysis estimated variance across sites and repeated tests. Least squares means estimated average recovery factors adjusted for different materials and substances and confidence intervals around those averages. Exploratory data analysis identified the primary causes of differences in recovery factors.
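To make the analysis-of-variance step concrete, here is a minimal pure-Python sketch of a one-way ANOVA F statistic comparing recovery factors across materials. It is a simplification under stated assumptions: the study analyzed two factors (material and substance) on unbalanced data, whereas this sketch handles a single factor, and the material labels and RF values are invented for illustration.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict of label -> list of values."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    k = len(groups)          # number of groups (materials)
    n = len(all_values)      # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Variation of individual results around their own group mean.
    ss_within = sum(
        (x - mean(values)) ** 2
        for values in groups.values()
        for x in values
    )
    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical RF values for two hypothetical materials.
    rf_by_material = {
        "material A": [80, 84, 82],
        "material B": [60, 64, 62],
    }
    print(one_way_anova_f(rf_by_material))
```

A large F value relative to the F distribution with the returned degrees of freedom suggests that mean recovery differs among the materials; looking up the p-value is left out to keep the sketch dependency-free.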

Results and discussion

Assumptions in recovery-data analysis. Although standardized methods for recovery exist throughout the company, each site that conducted recovery studies implemented these procedures with different equipment, personnel, and slight modifications to the methods and swab technique. For the purposes of this analysis, these differences were all combined into a site-to-site variance component. Specific causes of variation among sites were not explored.

Limitations. This data analysis was performed with available data from routine recovery studies at each site. There was no attempt to collect data according to a structured design, so the data were not balanced across materials, substances, or sites. For example, a particular product may have been studied on only one material at one site. This lack of balance made it difficult to separate recovery differences based on materials from recovery differences attributed to substances or sites. Analysis of variance on unbalanced data is analogous to building a table with legs of unequal height. Although differences may be detected between substances or materials, they are more difficult to support if the data structure is not well balanced across test conditions.

Analysis of data from such undesigned data sets can identify differences between substances and materials, but it does not provide strong support for cause-and-effect relationships. Follow-up designed experiments will test the hypotheses generated from this initial data exploration.

Recovery range. The data set used for this analysis, shown in Table I, consisted of 1262 RF values obtained for 48 different substances tested on 29 different materials. The mean RF was 80, and individual values ranged from 3 to 154. Although the majority of the substances tested were products (APIs and formulated drug substances), detergents were also tested. There were 1072 RF values available for 42 products and 190 RF values available for 6 different detergents.

Table I: Detergents and products tested.

The average RF was 13 units higher for detergents than for products. This result agreed with the concept that detergents are typically formulated to be easily removed from surfaces. The standard deviations for the two groups were similar: 22 units for detergents and 19 for products. The RF ranged from 30 to 154 for detergents and from 3 to 107 for products, as shown in Figure 1, in which each color represents a different material group.

Figure 1: Product recoveries by material. (FIGURE 1: MERCK & CO. INC.)

The recovery range for detergents was higher than expected. All of the high results were assayed using total organic carbon (TOC) analyzers, whereas the product data were generated using high-performance liquid chromatography. Several detergents had a low carbon load, and at low assay levels, TOC can have high recovery data accompanied by high assay variability (5). Historically, detergent-residue assays have been far enough below the ARL that the high recoveries and increased variability were not considered factors requiring additional work.

Deviations. The RF varied more between different materials than it did between different products. The average RF by material ranged from 31.5 to 92.5; by product, it ranged from 47.7 to 102.3. The additional variability among sites was not significant relative to the variability for repeated measurements within each site. After allowing different RF means for each material and product, 97% of the remaining variability in RF was within sites, and only 3% additional variance was across sites. These results meant that across-site and within-site variability could be combined, and comparisons of materials and products across sites had essentially the same precision as comparisons made within a single site. This relationship was a very useful finding because it allows RF results to be leveraged among different sites without accounting for site-to-site differences.
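The split between within-site and across-site variance can be illustrated with a short Python sketch of a method-of-moments variance-components estimate. This is a simplified stand-in for the analysis described here: it assumes a balanced one-way layout (equal numbers of repeats per site, no adjustment for material or product), and the site labels and RF values are hypothetical.

```python
from statistics import mean


def variance_components(rf_by_site):
    """Estimate (across-site variance, within-site variance) for balanced data."""
    sizes = {len(values) for values in rf_by_site.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch only handles balanced data")
    n = sizes.pop()            # repeats per site
    k = len(rf_by_site)        # number of sites

    site_means = [mean(values) for values in rf_by_site.values()]
    grand_mean = mean(site_means)

    ms_between = n * sum((m - grand_mean) ** 2 for m in site_means) / (k - 1)
    ms_within = sum(
        (x - mean(values)) ** 2
        for values in rf_by_site.values()
        for x in values
    ) / (k * (n - 1))

    # E[MS_between] = n * var_site + var_within, so solve for var_site
    # and truncate at zero if the estimate comes out negative.
    var_site = max(0.0, (ms_between - ms_within) / n)
    return var_site, ms_within


if __name__ == "__main__":
    # Hypothetical repeated RF results at three hypothetical sites.
    data = {"site A": [70, 90], "site B": [82, 102], "site C": [58, 78]}
    var_site, var_within = variance_components(data)
    within_share = var_within / (var_site + var_within)
    print(f"across-site variance: {var_site:.1f}")
    print(f"within-site variance: {var_within:.1f}")
    print(f"within-site share:    {within_share:.0%}")
```

A within-site share close to 100%, as reported in this study, is what justifies pooling results from different sites.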

The standard deviation for individual API repeated measurements of RF was 14 units. To obtain a more precise estimate of RF for a new product or material, multiple measurements of RF can be obtained at a single site. The standard error of the average of multiple measurements is less than the standard deviation. If n measurements are taken, the standard error of the mean (σmeanRF) is the standard deviation (σRF) divided by the square root of n, as shown in Equation 1:

$$\sigma_{\mathrm{meanRF}} = \frac{\sigma_{\mathrm{RF}}}{\sqrt{n}} \qquad (1)$$

To obtain an estimate of RF with a confidence interval of ±10%, the recovery test should be repeated at least seven times. Three repeated tests will give a confidence interval of ±15%.
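The relationship between the number of repeated tests and the width of the confidence interval can be checked with a few lines of Python. The sketch below assumes a 95% confidence level with the normal-approximation multiplier z = 1.96; the article does not state which multiplier was used, so the computed half-widths only roughly match the rounded values quoted above.

```python
import math

SIGMA_RF = 14.0   # standard deviation of repeated RF measurements (from the text)
Z_95 = 1.96       # assumed multiplier for a 95% interval (normal approximation)


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean of n measurements."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sigma / math.sqrt(n)


def ci_half_width(sigma: float, n: int, z: float = Z_95) -> float:
    """Half-width of the confidence interval around the mean RF."""
    return z * standard_error(sigma, n)


if __name__ == "__main__":
    for n in (1, 3, 7):
        print(f"n = {n}: +/- {ci_half_width(SIGMA_RF, n):.1f} RF units")
```

Under these assumptions, three repeats give a half-width of about 15.8 units and seven repeats about 10.4 units.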

Effect of material of construction. To identify reasonable groups of materials, the RF means were divided into groups with ranges less than or equal to the standard deviation for repeated measurements (i.e., 14 units). This split was chosen to create groups with consistent RF values within the group and different average RF means between groups. Applying this strategy resulted in five material groups (see Table II). Materials within these groups are not significantly different from each other. Of the 1262 RF values used for this analysis, 1052 (83%) are from the largest group of materials, which has the second-highest recovery factors, ranging from 76 to 86. The other four groups of materials had only 28–115 results for each group. Because the largest group was composed of 17 of the 29 materials studied, it may be useful to consider this group as the primary reference for recovery studies.
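One way to form groups whose range does not exceed the repeat-measurement standard deviation is a simple greedy pass over the sorted material means, sketched below in Python. The article does not describe the exact grouping procedure, so this greedy rule is an assumption, and the material labels and mean RF values are hypothetical.

```python
MAX_RANGE = 14.0  # standard deviation for repeated measurements (from the text)


def group_by_range(mean_rf, max_range=MAX_RANGE):
    """Greedily group materials so each group's RF range is <= max_range.

    Materials are sorted from highest to lowest mean RF. A new group is
    started whenever the next material would stretch the current group's
    range beyond max_range.
    """
    ordered = sorted(mean_rf.items(), key=lambda item: item[1], reverse=True)
    groups = []
    group_top = None  # highest mean RF in the current group
    for material, rf in ordered:
        if group_top is None or group_top - rf > max_range:
            groups.append([material])
            group_top = rf
        else:
            groups[-1].append(material)
    return groups


if __name__ == "__main__":
    # Hypothetical mean RF values for hypothetical materials.
    means = {"A": 92, "B": 85, "C": 80, "D": 77, "E": 60, "F": 55, "G": 31}
    for i, group in enumerate(group_by_range(means), start=1):
        print(f"group {i}: {group}")
```

Because a greedy pass depends on where the first group starts, different starting points can yield different boundaries; a real analysis would also check that materials within a group are not significantly different.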

Table II: Material of construction grouping by recovery factor (RF).

The 1052 RF results from the largest material group were analyzed separately from the other results. Both detergents and products were included. When RF was compared across products, ignoring differences among materials within the group and ignoring sites, the standard deviation was less than 13 units. This result compared favorably to the 14-unit standard deviation for the entire data set when all products and materials were estimated separately.

Representative material from each group. This result implies that RF for any new product can be estimated from a material within this group with a standard deviation of 13 units. The mean RF for each material in the group is shown in Table III. The most conservative choice for a representative material is the one with the lowest mean recovery, which is nylon. The material closest to the mean or median recovery of the group is either brass or Nylaclast Oilon. These three materials, however, represent a very small percentage of the actual equipment. The most logical and practical representative material for recovery studies is stainless steel. Stainless steel is included in the group, is not on the extreme high end, and is the major component of manufacturing equipment. The RF from other material groups can be determined as exceptions to this group.
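The three selection criteria discussed above (most conservative, most central, and most prevalent) can be compared side by side in a short Python sketch. All numbers here are hypothetical placeholders, not the Table III values; only the qualitative ordering (nylon lowest, brass near the median, stainless steel dominant in equipment) follows the text, and "material X" and "material Y" are invented labels.

```python
from statistics import median

# Hypothetical mean RF values for materials in one recovery group.
group_mean_rf = {
    "stainless steel": 83.0,
    "nylon": 76.0,
    "brass": 81.0,
    "material X": 86.0,
    "material Y": 79.0,
}

# Hypothetical share of product-contact surface area for each material.
surface_share = {
    "stainless steel": 0.90,
    "nylon": 0.02,
    "brass": 0.02,
    "material X": 0.03,
    "material Y": 0.03,
}


def most_conservative(mean_rf):
    """Material with the lowest mean recovery."""
    return min(mean_rf, key=mean_rf.get)


def most_central(mean_rf):
    """Material whose mean recovery is closest to the group median."""
    mid = median(mean_rf.values())
    return min(mean_rf, key=lambda m: abs(mean_rf[m] - mid))


def most_prevalent(share):
    """Material covering the largest share of equipment surface."""
    return max(share, key=share.get)


if __name__ == "__main__":
    print("most conservative:", most_conservative(group_mean_rf))
    print("most central:     ", most_central(group_mean_rf))
    print("most prevalent:   ", most_prevalent(surface_share))
```

The criteria generally point to different materials, which is why the choice of stainless steel rests on practicality rather than on any single statistical rule.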

Table III: Second highest group mean recovery factor (RF).

Test case. As a test of the data analysis, a recent study measured the percent recovery of a development compound, MK-0524, and niacin. The study included five materials of construction from the largest group of recovery materials (see Tables II and III), including stainless steel. The study results, shown in Table IV and Figure 2, confirm the overall study conclusions. The average recoveries on the materials of construction were well within the expected Table III value ±15% range.

Table IV: Average recovery data.

The repeatability for these results was better than typical. The standard deviation estimate of 4.3 for repeated tests across different materials compared favorably to the data in the large data set, where standard deviations were typically 13 to 14 units. The tighter data set was most likely the result of study parameters. The study included a single site and a relatively large number of recoveries for each compound, conducted in the same time frame following the same procedure.

Figure 2: Recovery variability. (FIGURE 2: MERCK & CO. INC.)

The standard deviation was tighter for this data set, so it was possible to detect statistically significant differences among some materials for MK-0524; however, the confidence intervals for recovery estimated on the five different materials were all well within the ±15% expected range. The confidence intervals for niacin tested on five different materials were all overlapping.
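The two checks described in this paragraph, whether a confidence interval falls inside the expected ±15 range and whether two intervals overlap, can be written as a brief Python sketch. The means, sample size, and expected value below are hypothetical (Table IV values are not reproduced here), and the 95% normal-approximation multiplier of 1.96 is an assumption.

```python
import math


def confidence_interval(mean_rf, sd, n, z=1.96):
    """Return (low, high) for the mean RF; z = 1.96 is an assumed multiplier."""
    half_width = z * sd / math.sqrt(n)
    return mean_rf - half_width, mean_rf + half_width


def within_expected_range(interval, expected, tolerance=15.0):
    """True if the whole interval lies inside expected +/- tolerance."""
    low, high = interval
    return expected - tolerance <= low and high <= expected + tolerance


def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    # Hypothetical example: mean RF of 85 from 5 repeats, SD of 4.3 (from the
    # text), compared with a hypothetical expected group value of 82.
    ci = confidence_interval(85.0, 4.3, 5)
    print(ci, within_expected_range(ci, expected=82.0))
    print(intervals_overlap((81.0, 89.0), (86.0, 92.0)))
```

Non-overlapping intervals indicate a detectable difference between materials, yet both can still sit comfortably inside the expected range, which is the situation described for MK-0524.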


The analysis showed that a recovery study conducted at one manufacturing site, using stainless steel as a representative material of construction, could stand in for most materials used in drug-product manufacturing and could be applied across multiple sites.

Statistical approach for the recovery analysis.


The authors acknowledge the contributions of the 16 site representatives, without whom this study would not have been possible.

Richard J. Forsyth* is an associate director of worldwide GMP quality with Merck & Co., Inc., WP53C-307, West Point, PA 19486, tel. 215.652.7462, fax 215.652.7106, richard_forsyth@merck.com. Julia C. O'Neill is a senior scientific fellow of regulatory and analytical sciences, and Jeffrey L. Hartman is a validation manager of regulatory and analytical sciences at Merck & Co., Inc.

* To whom correspondence should be addressed.

Submitted: Mar. 26, 2007. Accepted: June 4, 2007.


References

1. FDA, Guide to Inspection of Validation of Cleaning Processes (FDA, Rockville, MD, July 1993).

2. R.J. Forsyth and D. Haynes, "Cleaning Validation in a Pharmaceutical Research Facility," Pharm. Technol. 22 (9), 104–112 (1998).

3. G.M. Chudzik, "General Guide to Recovery Studies Using Swab Sampling Methods for Cleaning Validation," J. Val. Technol. 5 (1), 77–81 (1999).

4. R. Sharnez et al., "In Situ Monitoring of Soil Dissolution Dynamics: A Rapid and Simple Method for Determining Worst-case Soils for Cleaning Validation," PDA J. Pharm. Sci. and Technol. 58 (4), 203–214 (2004).

5. C. Glover, "Validation of the Total Organic Carbon (TOC) Swab Sampling and Test Method," PDA J. Pharm. Sci. and Technol. 60 (5), 284–290 (2006).