The authors recommend a strategy for classifying similar nonstainless-steel surfaces into three groups based upon the analytical recovery that was observed in this study.
Cleaning validation and verification are based on the premise of risk management. Several regulatory and guidance documents make this clear. The International Conference on Harmonization's (ICH) guideline on risk management outlines several approaches to making and documenting risk-based decisions (1). It clearly states that risk management should be based on scientific knowledge and that personnel should evaluate the effect of potential failures on the patient. In addition, it notes that the levels of effort, formality (e.g., use of tools), and documentation of the quality risk-management process should be commensurate with the level of risk.
The US Code of Federal Regulations states that equipment and utensils shall be cleaned, maintained, and sanitized at appropriate intervals to prevent malfunctions or contamination that would alter the safety, identity, strength, quality, or purity of the drug product (2). In accordance with 21 CFR 211.67, ICH issued recommendations on equipment maintenance and cleaning (Q7A, Sections 5.20–5.26) for compliance and safety that include similar, but more detailed requirements (3).
The US Food and Drug Administration's 1993 guidance on cleaning inspections states that for a swab method, recovery should be established from the surface (4). The guidance contains no specific requirements about how to establish these recovery estimates, or the acceptance limits. It is up to the manufacturer to document the cleaning rationale (i.e., process and acceptance limits) for maintaining the quality and purity of the drug product being manufactured.
Cleaning validation and verification
Cleaning verification consists of routine monitoring (e.g., swab analysis) of equipment-cleaning processes. Cleaning validation confirms the effectiveness and consistency of a cleaning procedure and eliminates the need for routine testing (5). For example, cleaning limits are established to determine the maximum allowance of Product A that can carry over to Product B. The calculation of these limits is well documented and includes factors that increase the margin of safety to protect the patient (6, 7). Because it is not feasible to swab every square inch of the equipment, swabbing locations are chosen based upon factors such as how difficult the area is to clean, the size of the equipment, and the areas where product buildup is likely. All product-contact surfaces must be considered during cleaning verification to demonstrate that equipment is clean, and a recovery value is expected to be established for each product-contact surface during method validation. The recovery is used to correct the submitted swab result for incomplete removal from the surface and to compare it with the acceptance limit. This last aspect of risk management (i.e., establishing the surface recovery) is the focus of this article.
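The recovery correction described above can be illustrated with a minimal Python sketch. It assumes the common convention of dividing the measured swab result by the fractional recovery before comparing it with the acceptance limit; the article does not spell out the formula, and the function names and numerical values are hypothetical.

```python
def recovery_corrected_result(measured_ug: float, recovery_fraction: float) -> float:
    """Correct a swab result for incomplete removal from the surface.

    Assumption: corrected result = measured result / fractional recovery.
    """
    if not 0.0 < recovery_fraction <= 1.0:
        raise ValueError("recovery_fraction must be in (0, 1]")
    return measured_ug / recovery_fraction


def passes_limit(measured_ug: float, recovery_fraction: float, limit_ug: float) -> bool:
    """Compare the recovery-corrected result with the acceptance limit."""
    return recovery_corrected_result(measured_ug, recovery_fraction) <= limit_ug


# Hypothetical example: 3.0 ug found on the swab, 75% recovery, 5.0 ug/swab limit.
corrected = recovery_corrected_result(3.0, 0.75)
print(corrected)                     # 4.0
print(passes_limit(3.0, 0.75, 5.0))  # True
```

Under this convention, a lower recovery value inflates the corrected result, which is why assigning a surface to a lower-recovery group is the conservative choice.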
Analysts have many ways to establish the swab-recovery value for a particular product-contact surface. Stainless steel is the most common material in a manufacturing environment (see Figure 1). Some companies therefore establish a recovery value for stainless steel and apply that standard to all swab submissions. Other companies attempt to establish a recovery value for each product-contact surface for every compound. From an analytical standpoint, supporting this activity becomes arduous, if not impossible to sustain. For example, equipment in a clinical-trial materials (CTM) manufacturing area is used for many compounds in the company's portfolio. New equipment might have different product-contact surfaces. Each compound in the portfolio manufactured on a new piece of equipment would require a method revalidation to add a recovery factor for the new product-contact surface. As the number of materials of construction increases, the difficulty of sustaining that approach also increases. Grouping materials of construction for analytical-method development in support of cleaning verification and validation activities is an excellent opportunity to apply a quality risk-management approach, especially when the total product-contact surface area is considered. Stainless steel accounts for approximately 95% of the surface area in a CTM manufacturing and packaging environment. Other product-contact surfaces account for only 5% of the total surface area. When polymer surfaces are considered in a CTM packaging environment, the number of minor product-contact surfaces can grow significantly. A risk-management approach allows the majority of the time and effort to be spent on activities that ensure the cleanliness of the stainless-steel area while identifying, analyzing, evaluating, and communicating the risks associated with the small fraction of remaining surfaces. 
This strategy does not ignore the surfaces other than stainless steel, but divides them into three recovery groups to support analytical-method validation. By choosing representative recovery surfaces for those nonstainless-steel materials, the effort proportionally addresses the risk.
Figure 1: (ALL FIGURES ARE COURTESY OF THE AUTHORS)
Design of experiments
Several variables (i.e., roughness average, material of construction, active ingredient, and spiked amount) were evaluated in a randomized fashion to prevent systematic bias that could be introduced by going from the lowest to the highest acceptance limit, from the smoothest to the roughest surface, or from one material of construction to the next. The initial design of experiments included two active pharmaceutical ingredients (APIs), three spiked acceptance-limit levels (i.e., 0.5, 5.0, and 50 μg/swab), seven surface types, four target roughness averages (Ra < 25, 75, 125, and 150 μin.), and six replicates per surface. These Ras were targeted to evaluate whether surface recovery depended on the surface Ra. Coupons were divided into a group of polymers [i.e., Lexan (polycarbonate), acetal (Polyoxymethylene), and PTFE] and a group of metals (i.e., stainless steel 316L, bronze, Type III hard-anodized aluminum, and cast iron). These surfaces were chosen to represent a cross section of surfaces found in the CTM manufacturing and packaging areas and required 1008 swab determinations to complete the study. The remaining product-contact surfaces found in the clinical-trial manufacturing and packaging areas were evaluated according to the initial design of experiments. These surfaces included nickel, anodized aluminum, Rilsan (polyamide), Oilon (blended-oil nylon), and stainless steel 316L with a 4 × 4-in. area.
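As an illustration of the randomized design described above, the following minimal Python sketch enumerates the runs and shuffles their order. It assumes that every factor level was crossed with every other and that randomization was a simple shuffle; the study does not state how the run order was generated, and the seed and variable names are arbitrary.

```python
import itertools
import random

# Factor levels taken from the design described in the text.
apis = ["Compound A", "Compound B"]
spike_levels_ug = [0.5, 5.0, 50.0]
surfaces = [
    "Lexan", "acetal", "PTFE",
    "stainless steel 316L", "bronze",
    "Type III hard-anodized aluminum", "cast iron",
]
target_ra_uin = ["<25", "75", "125", "150"]
replicates = range(1, 7)

# Fully crossed design: one tuple per swab determination.
runs = list(itertools.product(apis, spike_levels_ug, surfaces, target_ra_uin, replicates))

# Shuffle the run order to avoid systematic bias (seed is arbitrary, for reproducibility).
rng = random.Random(2009)
rng.shuffle(runs)

print(len(runs))  # 1008
```

The count of 2 × 3 × 7 × 4 × 6 = 1008 matches the number of swab determinations reported for the study.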
The authors chose two APIs for this evaluation on the basis of their solubility profiles to represent the most- and least-soluble compounds a company would likely manufacture. Compound A, the less soluble, is slightly soluble in methanol and insoluble across the pH range, but Compound B is soluble in all solvents. In addition, Eli Lilly (Indianapolis, IN) identified Compound A as one of the most difficult compounds to clean from equipment, based on its low solubility and staining properties. A control (i.e., stainless steel 316L, 0.5 μg/swab, Compound A) was run each day that data were generated.
Equipment and operating conditions
The authors used an Agilent 1100 high-performance liquid chromatography (HPLC) analyzer (Agilent, Santa Clara, CA) for all experiments. The HPLC operating conditions were validated according to ICH standards for precision, linearity, limit of detection (LOD), limit of quantitation (LOQ) and specificity (see Table I) (8). Precision was 1.85% and 3.13% for Compounds A and B, respectively, and was determined at 0.025 μg/mL (i.e., 25% of the lowest spike). The method was linear across the equivalent range of 0.5 μg/swab to 5 μg/swab (R = 0.999). The LOQ was calculated to be 0.005 μg/mL for Compound A and 0.008 μg/mL for Compound B. The LOD was calculated to be 0.001 μg/mL for Compound A and 0.0024 μg/mL for Compound B. Swabs and solvents did not result in interfering peaks. The authors performed swabbing consistently using Texwipe Alpha large swabs (ITW Texwipe, Kernersville, NC). First, 10 vertical swipes, then 10 horizontal swipes were performed for the 2 × 2-in. surfaces. For the 4 × 4-in. surfaces, 20 swipes were executed in each direction. Methanol was used as the swabbing solvent. Spike amounts were 0.5, 5, and 50 μg per surface and were extracted into 5 mL of mobile phase, which corresponded to 0.1-, 1.0-, and 10-μg/mL standard concentrations, respectively. The authors used a Quanta FEG 200F field-emission scanning electron microscope (SEM, FEI, Hillsboro, OR) to generate the surface images.
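The relationship between spike amount and standard concentration stated above is simple arithmetic, sketched here in Python. The function name is illustrative, and the sketch assumes the entire spike is extracted into the 5-mL volume.

```python
import math

EXTRACTION_VOLUME_ML = 5.0  # volume of mobile phase used for extraction


def extract_concentration_ug_per_ml(spike_ug: float,
                                    volume_ml: float = EXTRACTION_VOLUME_ML) -> float:
    """Concentration of the extract if the whole spike is recovered into volume_ml."""
    return spike_ug / volume_ml


# Spike levels of 0.5, 5, and 50 ug correspond to 0.1, 1.0, and 10 ug/mL.
concentrations = [extract_concentration_ug_per_ml(s) for s in (0.5, 5.0, 50.0)]
print(concentrations)

# The precision level of 0.025 ug/mL is 25% of the lowest standard concentration.
precision_level = 0.25 * concentrations[0]
print(math.isclose(precision_level, 0.025))  # True
```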
Table I: High-performance liquid chromatography (HPLC) operating conditions.
Results and discussion
In this study, a single analyst evaluated the analytical swab recovery from a representative set of surfaces found in the CTM manufacturing and packaging areas. The surfaces were manufactured specifically for this study to have a broad range of Ras. In addition to Ra, the effect of the material of construction, acceptance limit, compound, and method variability also were evaluated. Based upon these data sets, the authors used a strategy involving three groups of materials to represent all of the surfaces in CTM operations. Merck and Co. used a similar strategy to establish five recovery groups (9). The authors expanded on Merck's strategy by adding a detailed study supporting the groups and an approach for determining the appropriate placement of new surfaces into pre-established groups.
Roughness average (Ra). The Ra targets listed above were difficult to achieve. The intermediate Ra values were significantly lower than the target values given in the design of experiments section above. Both intermediate surfaces, initially targeted for Ra values of 75 and 125 μin., were measured at approximately 40 μin. Although the machining process at each level yielded visually different surfaces, the measured Ra changed little from surface to surface. The authors decided to proceed with the surfaces and define smooth surfaces as Ra < 100 μin. and rough surfaces as Ra > 100 μin. This approach allowed for an assessment of the anticipated relationship between Ra and analytical recovery.
Recovery was expected to improve as Ra decreased, but Ra had little effect on the observed analytical swab recovery. Figure 2 shows roughness grouped by surfaces that had a measured Ra > 100 μin. and by surfaces that had a Ra < 100 μin. Only the 5- and 50-μg spikes are represented in Figure 2; the 0.5-μg spikes followed the same pattern, but their greater variability made the data slightly harder to interpret. As Figure 2 shows, the recovery within each roughness group was approximately the same for a given analyte on a given material and did not correlate to Ra. Therefore, Ra should not be used as a predictor of analytical recovery or as a grouping criterion.
Material of construction. Because Ra was eliminated as a factor contributing to recovery losses, the authors performed data analysis by combining all average recovery values and assessing the effect of the material of construction. The data in Figure 3 were first separated by API, and groups were generated to represent the logical separations in recovery. Figures 3(a–c) contain the data for the 0.5-μg spikes, the 5-μg spikes, and the 50-μg spikes, respectively. The data from the 0.5- and 5-μg spikes exhibited a trend similar to that of the 50-μg spikes. The variability in the results increased as the spiked amount decreased; the 50-μg spike results were substantially less variable than those of the other spike levels.
For both compounds, the Type III hard anodized aluminum exhibited the poorest recovery. The next logical break point grouped bronze and cast iron. The recovery of Compound B from bronze suggested that the material was representative of Group 1. The recovery of Compound A on bronze was lower and more variable, however, so the authors placed bronze into Group 2. For the majority of the surfaces, the recovery of Compound A was lower than that for Compound B at a given limit. In some cases, the recovery was approximately the same (i.e., of 5- and 50-μg spikes on cast iron, and of the 50-μg spike on Type III hard anodized aluminum). In addition, the predominant trend was that the average recovery of a compound increased as the spiked amount increased on a given material of construction. For example, the recovery of Compound B from stainless steel 316L was approximately 74%, 90%, and 95% at 0.5-μg, 5-μg, and 50-μg swabs, respectively.
Ra was originally considered a variable in the experiments previously outlined and did not affect swab recovery. To understand the surface attributes that might contribute to incomplete recovery for the different materials of construction, the authors acquired SEM images for Group 1, Group 2, and Group 3 surfaces (see Figure 4). Stainless steel is a relatively smooth surface with some striations from machining (see Figure 4a). Cast iron has a pitted surface that could provide opportunities for an API in solution to be trapped during a spiking experiment (see Figure 4b). The anodization process makes Type III hard anodized aluminum, the worst recovery surface, porous, thereby creating the greatest opportunity to lose analyte (see Figure 4c).
Note that polymers were grouped together with metals, although the two classes of material might not appear similar at first glance. The SEM image of Lexan in Figure 4d, however, illustrated that the polymer surface was smooth, albeit with some surface debris, and a smooth surface limits the loss of analyte. The polymer surface was therefore grouped with stainless steel in Group 1. The SEM images were good supporting evidence that the groupings were logical based upon surface characteristics.
Table II is based on the data shown in Figure 3. The top surface in Table II represents the surface that was validated for recovery in each group. This recovery value represented all others within a given group. The groupings were supplied to the CTM areas, and the group number was included on the swab submission to the analytical laboratory so that the correct recovery factor was applied to each surface. In addition, the table served as a tool for engineering to determine whether newly purchased equipment contained a new product-contact surface.
Table II: Grouping of material surface of construction.
Analytical methods. The model and worst-case compound evaluation did not replace any analytical-method validation activities. Analytical recovery must be established for each compound in the portfolio, but not on all surfaces. If multiple limits are to be considered, or if a range of reporting is required, the lowest limit may be evaluated, and that recovery can be applied to all acceptance limits as a conservative estimate. In the analytical method, three recovery factors were presented: Group 1, Group 2, and Group 3. Methods could be validated for any surface within a group and could be considered representative. The authors chose stainless steel 316L because it is the most prevalent, and cast iron because it is a common material on a tablet press. Type III hard anodized aluminum is the only surface in Group 3. This strategy did not ignore any uncommon surfaces. It grouped them appropriately, swabbed them, and applied a representative recovery factor.
Variability. Method variability was evaluated by running a control sample (Compound A, six replicates, 0.5-μg swab, stainless steel 316L, Ra = 3.5) each day. The mean recovery of the control over the entire experiment was 52%. These data suggested that the swabbing ability of the analyst did not change over time. The standard deviation within a day typically was less than 6 (in units of percent recovery). The pooled-within-run standard deviation was 3.99 over the course of the experiments. This value was used as a criterion for grouping new surfaces. The day-to-day standard deviation was 15.34.
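A pooled within-run standard deviation of the kind reported above can be computed as in the following Python sketch. The daily recovery values are hypothetical and are not the study data. The sketch uses the standard pooled-variance formula and, as a simplifying assumption, estimates day-to-day variability as the standard deviation of the daily means; the article does not state which estimator was used.

```python
import math
import statistics


def pooled_within_run_sd(runs):
    """Pooled SD: sqrt( sum((n_i - 1) * s_i^2) / sum(n_i - 1) ) over all runs."""
    numerator = sum((len(r) - 1) * statistics.variance(r) for r in runs)
    denominator = sum(len(r) - 1 for r in runs)
    return math.sqrt(numerator / denominator)


def sd_of_daily_means(runs):
    """Simple day-to-day estimate: sample SD of the per-day mean recoveries."""
    return statistics.stdev([statistics.mean(r) for r in runs])


# Hypothetical control recoveries (%), six replicates on each of three days.
control_runs = [
    [50, 54, 48, 52, 55, 47],
    [56, 60, 53, 58, 61, 54],
    [45, 49, 44, 47, 50, 41],
]

pooled_sd = pooled_within_run_sd(control_runs)
between_day_sd = sd_of_daily_means(control_runs)
print(round(pooled_sd, 2), round(between_day_sd, 2))
```

Pooling weights each day by its degrees of freedom, so the estimate reflects replicate-to-replicate scatter without being inflated by shifts in the daily mean.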
Incorporating new materials of construction into the grouping strategy. Periodically, new equipment will be introduced into the CTM area that incorporates a product-contact surface made of a material of construction that is not listed in Table II. This problem is often caused by alloys of metals that have already been evaluated and by polymers of proprietary composition. Because surface recovery must be evaluated, personnel need a way to incorporate new surfaces into the groupings outlined in Table II. When a new piece of equipment is purchased, CTM-engineering employees prepare the needed documentation to evaluate the equipment with regard to the cleaning program before use. If an identified sampling location is made of a new material of construction, the engineer asks the person responsible for the cleaning program and analytical development to perform the next steps, which are shown in Figure 5.
Suppose that new equipment incorporated three new materials: a crystalline thermoplastic polyester marketed under the trade name of Ertalyte (Quadrant Engineering, Reading, PA), stainless steel 420, and stainless steel 630. Without the grouping strategy in place, method revalidation would have to occur for all compounds handled by this piece of equipment. With the strategy in place, these surfaces are placed into groups based on model-compound recovery. No method revisions are required.
The analytical recovery of Compound A was evaluated for Ertalyte, stainless steel 420, and stainless steel 630 at the 5.0-μg/in.² level. The authors used the validated method to evaluate the recovery of the three new surfaces compared with a representative surface from Group 1 (i.e., stainless steel 316L), Group 2 (i.e., cast iron), and Group 3 (i.e., Type III hard anodized aluminum). Recovery was evaluated for both the group representative and the new surfaces on the same two days with three replicates on each day. As an alternative, six replicates may be performed on the same day as the controls because the comparison of recovery is relative.
The new surfaces are placed in one of the three groups or define a new lower group, based on how close their average is to that of the group control. The authors obtained a cutoff value of 3.0% in the following manner. Based upon the data for the control, the within-day standard deviation was calculated to be 3.99 and was used in Equation 1. A series of one-sided hypothesis tests with an error rate of α = 0.10 were performed to assess whether the new surface mean was less than a specified control-surface mean. The confidence limit half-width for the difference between the means of two surfaces was computed using the within-day standard deviation because the replicates for each surface had to be run on the same two days, with each day treated as a block. For these calculations, it was assumed that this standard deviation was known. The lower one-sided confidence limit for the difference in means was derived using the following steps:
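Equation 1 is not reproduced in this text. A minimal sketch of a calculation consistent with the values stated above is half-width = z(1 − α) × σ × sqrt(2/n). It assumes a known within-day standard deviation σ (hence a z-based limit), a one-sided α of 0.10, and n = 6 replicates per surface (three on each of two days). The function name is illustrative, and this form is an assumption rather than the authors' published equation.

```python
import math
from statistics import NormalDist


def cutoff_half_width(sigma_within_day: float, n_per_surface: int, alpha: float) -> float:
    """One-sided confidence-limit half-width for the difference between two means.

    Assumes sigma is known and both surfaces have n_per_surface replicates,
    so the standard error of the difference is sigma * sqrt(2 / n).
    """
    z = NormalDist().inv_cdf(1.0 - alpha)  # one-sided normal quantile
    return z * sigma_within_day * math.sqrt(2.0 / n_per_surface)


cutoff = cutoff_half_width(sigma_within_day=3.99, n_per_surface=6, alpha=0.10)
print(round(cutoff, 2))  # 2.95, i.e., approximately 3.0
```

Under these assumptions the half-width rounds to the 3.0% cutoff quoted in the text.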
This calculation did not indicate a difference between a control surface and the new surface under evaluation if the recovery differed by less than 3.0%. This approach was conservative because Eq. 1 categorized a new surface into Group 2 if its recovery was more than 3.0% lower than that of stainless steel, which could be viewed as a strict criterion. Because the results were obtained on the same days and runs, the authors believed that this approach was reasonable. When evaluating the recovery of a new surface, this strategy helps personnel to place each material into the appropriate group. The grouping starts with a comparison of the new surface average to that of Group 1 (stainless steel 316L) and continues sequentially. If the new surface recovery (NSR) is more than 3.0% less than that of the group reference, it is compared with the reference surface in the next lower group until a group is found whose reference recovery it does not fall below by more than 3.0%. If no such group is found, then the new surface forms a new, lower group. The procedure is as follows:
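The sequential comparison described above can be sketched in Python as follows. The recovery values are hypothetical, chosen only to mirror the sizes of the differences discussed in the text; they are not the Table III data, and the function name is illustrative.

```python
CUTOFF_PERCENT = 3.0  # cutoff derived from the within-day standard deviation


def assign_group(new_surface_recovery, reference_recoveries, cutoff=CUTOFF_PERCENT):
    """Return the 1-based group number for a new surface.

    reference_recoveries lists the mean recovery (%) of each group's reference
    surface, ordered from Group 1 (highest recovery) downward. The new surface
    joins the first group whose reference it does not fall below by more than
    the cutoff; otherwise it forms a new, lower group.
    """
    for group_number, reference in enumerate(reference_recoveries, start=1):
        if reference - new_surface_recovery <= cutoff:
            return group_number
    return len(reference_recoveries) + 1


# Hypothetical same-day reference means (%) for Groups 1, 2, and 3.
references = [80.0, 64.0, 40.0]

print(assign_group(77.52, references))  # 1: within 3.0% of the Group 1 reference
print(assign_group(76.0, references))   # 2: 4% below Group 1, above Group 2
print(assign_group(72.0, references))   # 2: 8% below Group 1, above Group 2
print(assign_group(30.0, references))   # 4: more than 3.0% below every reference
```

Because the comparison is one-sided, a surface that recovers better than the Group 2 reference but misses the Group 1 cutoff is still placed in Group 2, which is the conservative outcome the text describes.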
The data for the three new surfaces are outlined in Table III. Based on this approach, Ertalyte was placed into Group 1 because the difference between its recovery and that of stainless steel 316L (i.e., Group 1) was less than 3.0% (i.e., 2.48%). The recovery from stainless steel 420 and stainless steel 630 was more than 3.0% less than that from stainless steel 316L, but greater than that from cast iron. The authors placed stainless steel 420 and stainless steel 630 into Group 2.
Table III: Swab recovery of three new materials of construction compared with controls from each representative group.
The placement of these two grades of stainless steel into Group 2 highlighted the conservative nature of this approach because their recoveries were only 4% and 8% less than that from stainless steel 316L. The recoveries were 12% and 9% greater than that from cast iron for stainless steel 630 and 420, respectively. Table II was updated to reflect this placement. Because the grouping strategy was conservative, it prevented the underestimation of recovery factors when assay values were reported and prevented the formation of additional groups for method validation unless the recovery value for a new surface is sufficiently low to warrant such an addition.
The authors' data-driven risk-management approach to cleaning verification methods uses analytical-recovery values for a model compound to place product-contact surfaces into groupings for analytical-method validation. The data generated during the studies supported the formation of three recovery groups to validate analytical swab methods. Groups 1–3 were represented by stainless steel 316L, cast iron, and Type III hard anodized aluminum, respectively. This approach allowed all surfaces to be considered during analytical-method validation and provided an objective mechanism to incorporate new surfaces into the strategy.
The benefits of this strategy are numerous. First, only three surfaces must be validated for each compound, which drastically reduces the number of recovery values that must be established to support the entire portfolio. Second, the strategy includes a way to add new materials of construction to the cleaning program if new equipment is purchased. Traditionally, all swab methods would have to be revalidated to incorporate the new surface. With this strategy in place, a model compound is evaluated, the new surface is grouped, and no changes to existing methods are required. Third, the strategy allows for a constant state of compliance. A relative recovery value is known for any material of construction for all equipment.
Because the grouping strategy is applied to a small fraction of the total surface area, no surface material of construction is ignored, each molecule undergoes a typical method validation, and the strategy places surfaces into groups conservatively, the authors believe that the strategy controls risks appropriately. They also believe that the data set given in this study scientifically supports the strategy of grouping materials of construction to support analytical methods within the cleaning program.
The authors would like to acknowledge the following colleagues at Eli Lilly: Gifford Fitzgerald, intern, for generating the swab-recovery data; Ron Iacocca, research advisor, for the SEM data; Sarah Davison, consultant chemist; Mike Ritchie, senior specialist; Mark Strege, senior research scientist; Matt Embry, associate consultant chemist; Kelly Hill, associate consultant for quality assurance; Bill Cleary, analytical chemist; and Laura Montgomery, senior technician, for their contributions and insightful suggestions throughout the project. In addition, Leo Manley, associate consultant engineer, provided the roughness measurements in support of this project.
Brian W. Pack* is a research advisor for analytical sciences research and development, and Jeffrey D. Hofer is a research advisor for statistics, discovery and development, both at Eli Lilly and Company, Indianapolis, IN, tel. 317.422.9043, firstname.lastname@example.org.
*To whom all correspondence should be addressed.
Submitted: Oct. 12, 2009. Accepted: Dec. 22, 2009.
1. ICH, Q9 Quality Risk Management, Step 5 version (2005).
2. Code of Federal Regulations, Title 21, Food and Drugs (Government Printing Office, Washington, DC), Part 211.67.
3. ICH, Q7 Good Manufacturing Practice Guide for Active Pharmaceutical Ingredients, Step 5 version (2000).
4. FDA, Guideline to Inspection of Validation of Cleaning Processes (Rockville, MD, July 1993).
5. L. Ray et al., Pharm. Eng. 26 (2), 54–64 (2006).
6. PDA, Technical Report 29, "Points to Consider for Cleaning Validation" (PDA, Bethesda, MD, Aug. 1998).
7. G.L. Fourman and M.V. Mullen, Pharm. Technol. 17 (4), 54–60 (1993).
8. ICH, Q2 Validation of Analytical Procedures: Text and Methodology, Step 5 version (1994).
9. R.J. Forsyth, J.C. O'Neill, and J.L. Hartman, Pharm. Technol. 31 (10), 102–116 (2007).