The recovery mission – coping with validation failures

Stephan Krause

Pharmaceutical Technology Europe

May 1, 2008

Volume 20

Issue 5

Practical guidance on how to handle validation failures cannot be found in the existing literature because they are not supposed to happen.

Validation failures, just like out-of-specification (OOS) results, occasionally occur when a firm performs risk-based validation. Practical guidance on how to handle these failures cannot be found in the existing literature because validation failures are not supposed to happen. In reality, however, they occur at somewhat predictable frequencies, similar to the likelihood of obtaining OOS results, whenever true risk-based validations are performed. This article follows up on previous publications regarding this controversial topic.^1,2 It openly discusses the recovery process options for failed validations to stimulate the development of regulatory guidance. Although the principles apply to all validation projects, the discussion here is limited to analytical method validation (AMV) failures and is framed with respect to the recently published guidance documents (CDER/FDA guidance, Investigating OOS Test Results for Pharmaceutical Production, and ICH Q10, Pharmaceutical Quality System).

Challenging our systems by setting tighter acceptance criteria whenever needed is an important part of risk-based validation. In today's fast-paced and project-completion driven biopharmaceutical industry, failures are sometimes necessary to trigger much-needed process optimizations. Without intending to conceptually change the objectives of validation studies, this article provides practical guidance on how to handle occasional failures in a GMP environment. The setting of acceptable limits based on risks to patient and firm for all analytical method lifecycle steps that include test method selection, development, optimization, validation, transfers, maintenance and retirement/replacement has been extensively discussed elsewhere.¹

Before we can clarify how to handle AMV failures, we should first define them in the context and intent of this article. A validation failure differs from a validation discrepancy in that it results from failing to pass acceptance criteria (or preset test method performance specifications). This is more difficult to deal with than a protocol discrepancy, where we simply did not follow our approved protocol instructions.^1,2

Strictly speaking, failing to pass our protocol limits indicates that the to-be-validated test method is not suitable for its intended use. However, we cannot usually abandon the project and simply move on to another. Because the description of particular AMV failures may not be part of regulatory pre- or post-licensure submission, AMV failures can become a significant inspection risk to the firm. Similar to the handling of OOS results and unacceptable levels of discrepancies, the focus in regulatory inspections on how validation failures are managed may quickly shift towards the firm's quality systems if apparent deficiencies or violations exist.¹ Because of the similarity to the OOS investigation process, the principles applied here are similar to those discussed in the recently published CDER/FDA guidance for OOS results.³ The recently published ICH Q10 guidance addresses management's responsibilities in facilitating continuous improvements.⁴ Both documents support the need for 'open' recovery processes for failures, but they lack sufficient practical guidance.

The 'recovery mission'

AMV protocols for a critical test method can contain many acceptance criteria, often several for each validation characteristic. There is a predictable chance of not passing acceptance criteria, especially when sample manipulations such as spiking are performed or when interfering matrix components exist. If everything passes acceptance criteria, we would likely consider the project completed and hold back on implementing immediate optimizations. To stay within the focus of this article, let us assume that we failed protocol acceptance criteria and that we must now deal with this situation.

Figure 1

Figure 1 illustrates the management of an AMV failure and its recovery process. The failure to pass our protocol acceptance criteria is highlighted in grey to visually represent the fact that we have entered the 'grey zone' where processes may no longer be governed by detailed procedures. It is now critical for future inspections, as well as project completion, to make good decisions. Once a validation failure is observed, it should be noted in a formal investigation report. To make decisions that keep in mind the interest of all stakeholders — especially the patient — we should now answer the questions in Table 1. The answers should lead us to the best possible course of action. However, the remedy may involve accepting some level of project completion risk (upper half of Figure 1) or inspection risk (lower part of Figure 1).

Table 1 Checklist of most common questions and possible answers.^1,2

Assessing our answers

In a world with unrestricted time and resources, we would prefer to use the upper loop. The lower loop process steps are in grey because we may only achieve a 'non-optimal' validated status once the report is closed, but this inspection risk may be acceptable from a business perspective. The acceptance criteria could have been set too conservatively, in which case resetting the criteria may be justifiable. Of course, failing validations should remain significantly less frequent than passing ones.¹

Choosing the upper loop, the first choice in Figure 1 is to re-execute with the current protocol acceptance criteria based on having located and fixed the root cause. Correcting an unexpected root cause will not change or improve anything for the routine test method performance. For example, spiked proteins were partially adsorbed to glass containers before sample preparation causing low protein recoveries to be observed. Thus, after concluding that plastic containers should have been used, the validation studies are re-executed. This could be justified simply based on the fact that the original suitability requirements for the test method remain unchanged.¹

The second choice, 'Tightening of operational limits', would require us to run the test method system under more stringent operational limits. For example, if we failed the intermediate precision requirements, we could procedurally reduce the allowed sample preparation range or the overall testing time to minimize variations in degradation or other inconsistencies that impact the test results. Alternatively, once we understand which contributor to the observed overall test result variation is unacceptable, we could tighten the qualification requirements for that test system component. For example, if operators were found to contribute significantly to the failure, we would raise the requirements for demonstrating operator proficiency. We are not limited to improving only a single source of variation here. All implemented improvements usually have a positive effect on the reliability and certainty of test results. Although these restrictions may indicate that the test method is not very robust, they should nevertheless lead to improved intermediate precision results.¹

Choosing to optimize the test method may have the greatest effect on future test method performance, but it is usually the most expensive recovery process (when considering only the project completion impact) and may require significant time to complete. As timely completion of projects is vital for many firms, we should consider all aspects that may impact patient safety, process or product quality, compliance/inspections, completion time, costs, chances for improvement success, and short- and long-term benefits.¹

Case study: failing intermediate precision for a potency assay

In this simplified case study, an automated enzyme-linked immunosorbent assay (ELISA) test method is used as a potency targeting assay for filling. The assay results are used to adjust the final formulation buffer proportions with respect to final bulk drug substance (BDS) potencies, with the goal of obtaining results around the final container nominal potency of 100 IU/mL. The final container release specifications proposed are 80–125 IU/mL. As part of AMV studies, the method performance characteristic intermediate precision was formally evaluated. This test method performance characteristic is the most critical. In addition to test method imperfections at the final container stage, the observed variation in results caused by the test method will directly contribute to final container potency variation. Therefore, we should limit the measured variation at the critical BDS stage as only this step affects the actual efficacy/potency in the vials.

To evaluate intermediate precision, the AMV protocol required a design of experiments matrix in which the factors days, instruments and operators were varied. All factors were evaluated in sets of three testing an identical BDS sample. The variation in test results was assessed using a mixed linear model analysis from EURACHEM/CITAC Guide CG4.⁵

The AMV protocol had the following acceptance criteria for intermediate precision:

Individual factor coefficients of variation (CV) for instrument, operator and day to be ≤6%.

Overall CV to be ≤10%.
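The two criteria above can be expressed as a simple pass/fail check. What follows is only a minimal sketch and not the protocol's actual analysis: the function name is hypothetical, the CV values are assumed to have already been estimated by the mixed linear model, and the example inputs are illustrative placeholders rather than the Table 2 results.

```python
# Protocol limits quoted in the text (percent CV).
FACTOR_CV_LIMIT = 6.0
OVERALL_CV_LIMIT = 10.0


def check_intermediate_precision(factor_cvs, overall_cv):
    """Return a list of failed criteria; an empty list means all passed.

    factor_cvs: dict mapping factor name -> CV in percent.
    overall_cv: overall CV in percent, as reported by the model.
    """
    failures = []
    for factor, cv in factor_cvs.items():
        if cv > FACTOR_CV_LIMIT:
            failures.append(f"{factor} CV {cv:.1f}% > {FACTOR_CV_LIMIT:.1f}%")
    if overall_cv > OVERALL_CV_LIMIT:
        failures.append(f"overall CV {overall_cv:.1f}% > {OVERALL_CV_LIMIT:.1f}%")
    return failures


# Hypothetical illustration values only, NOT the article's Table 2 data.
example = check_intermediate_precision(
    {"instrument": 7.5, "operator": 2.0, "day": 1.5}, overall_cv=12.0
)
print(example)
```

Returning the list of failed criteria, rather than a single boolean, keeps the failing factor visible for the investigation report.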

The results for intermediate precision are shown in Table 2, and we can immediately assume that 'operator' and 'day' are not critical method components that will cause a significant variation (>6%) in test results during routine operations. This is a positive situation regarding training and operator proficiency qualification requirements, as there is a lower risk that test results will be affected when different operators are used over time. In addition to the demonstration of test result reliability over time (days), we were able to simultaneously verify that our BDS sample aliquots are stable for several days at the storage conditions required (per protocol). Conversely, we failed to pass the overall CV criterion and the individual CV criterion for 'instrument'. There is a significant amount of variability among the three different instruments used. Although this is somewhat typical of an automated potency assay with relatively minimal operator involvement, it is still something that will significantly contribute to overall assay variability. So, what are our options?

Table 2 Mixed linear model results for intermediate precision.

Using Figure 1, we decided not to go into the lower loop because of the critical nature of this process stage for the dosing range (80–125 IU/mL) and the required reliability (intermediate precision) of this test method. If all three instruments were not needed for future ELISA testing, at this stage we could drop the one instrument that is furthest from the mean result and redo the AMV part with the two remaining instruments.

If we needed to use all three instruments for release testing, we may want to consider evaluating which particular automation steps for this assay contribute to the variability by dissecting the automated process into smaller steps. Once these steps have been identified as the critical ones, we may be able to further tighten the operational or system suitability limits to reduce the differences among the instruments.

Similarly, the unidentified residual variation (CV=11.6%) could first be analyzed by thoroughly reviewing all analytical method development (AMD) data. Further component analysis could be done if the target is to improve the overall test method precision. The AMV studies could then be re-executed and the original risk-based acceptance criteria would likely be passed. Failing the critical acceptance criteria for intermediate precision may force us to improve the reliability of the BDS potency test results. This would lower the risks to the patient (more consistent batch-to-batch potencies), as well as to the firm (fewer OOS results at the final container stage).

Case study: failing accuracy for a purity assay

In this second case study, a sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) assay is used as our approved final container release test method for a biopharmaceutical drug. This test method has been used to quantitate a known product impurity in clinical Phase III studies. The firm intends to implement an automated capillary electrophoresis (CE) test method. To demonstrate that the CE method is equal to or better than the approved test method, the method performance characteristics accuracy, intermediate precision and quantitation limit were compared. Accuracy (or 'matching') is particularly important as we wish to avoid losing the direct link to the current process performance and clinical data or having to change release specifications. For simplicity, we will focus only on comparing the matching of test results for the protein impurities. The licensed release specification for the host cell impurity at the final drug product stage is ≤7.0%.

To demonstrate matching, we are using an equivalence comparability model similar to ICH E9.⁶ When comparing matching, an equivalence model is preferable to a superiority or noninferiority model simply because we want to avoid a shift in future test results. After a historical data review, a maximum difference (delta) of ±1.0% between the impurity levels measured by the reference and the new method was chosen as an appropriate protocol limit. Both methods are run side-by-side for 30 final product samples. Results are compared by two-sided matched-pair t-test statistics at the 95% confidence level.

The AMV protocol was set up for the equivalence part:

Sample size (n) for each method: 30

Desired difference for 95% confidence interval (CI) for mean results between methods: 0%

Minus Delta: –1.0%

Plus Delta: +1.0%

Specifications: ≤7.0%.
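As an illustration of the matched-pair comparison behind these protocol settings, the 95% CI for the mean paired difference could be computed roughly as follows. This is a sketch under stated assumptions: the function name is hypothetical, the critical value 2.045 is the standard two-sided 95% t value for 29 degrees of freedom (n=30 pairs) and must be replaced for any other sample size, and no data from the study are reproduced here.

```python
import math
import statistics

# Two-sided 95% t critical value for df = 29 (n = 30 pairs); assumption
# tied to the protocol's fixed sample size, not valid for other n.
T_CRIT_95_DF29 = 2.045


def paired_difference_ci(reference, candidate, t_crit=T_CRIT_95_DF29):
    """Return (mean difference, (ci_low, ci_high)) for paired results.

    reference, candidate: equal-length sequences of results obtained on
    the same samples by the reference and the new method.
    """
    if len(reference) != len(candidate):
        raise ValueError("paired comparison needs equal-length inputs")
    # Difference per sample: new method minus reference method.
    diffs = [c - r for r, c in zip(reference, candidate)]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (sample standard deviation).
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    half_width = t_crit * std_err
    return mean_diff, (mean_diff - half_width, mean_diff + half_width)
```

The resulting interval for the difference would then be compared against the minus and plus delta limits of the protocol.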

The results for equivalence were:

SDS-PAGE mean (n=30): 3.8%

Resulting allowable CE range for mean difference: 2.8–4.8%

Actual CE mean (n=30): 5.1%

95% CI of CE results (versus SDS-PAGE): 4.88–5.32%

The results for the testing of equivalence between the licensed SDS-PAGE test method and the new CE test method are graphically illustrated in Figure 2. In addition to the actual results for the 95% CI (4.88–5.32%), several other possible outcomes are shown and briefly discussed. Starting from the left in Figure 2, the first 95% CI does not overlap the hypothesized 0 difference mark, which indicates a statistically significant difference between both methods for the reported impurity. However, this 95% CI falls entirely within the allowable ±delta limits, so we would have passed the AMV protocol acceptance criteria for equivalence between test methods. In the next outcome to the right, the 95% CI overlaps 0 and falls entirely within ±delta. This would have been an ideal result. Further to the right, the 95% CI overlaps the +delta limit. This hypothetical result is inconclusive, as we would be unable to accept it as a passing or failing result. Usually more equivalence testing would then be performed, resulting in a greater sample size and a narrower 95% CI that may fall entirely inside or outside the +delta limit. In this particular case, it may not be possible to achieve that goal because 30 paired results have already been obtained, and the generation of more results may not sufficiently shrink and/or shift the 95% CI.
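The three kinds of outcome described for Figure 2 (pass, inconclusive, fail) can be sketched as a small classification routine. The function names are hypothetical, and the sketch assumes that the 95% CI is expressed on the same scale as the allowable range, as it is in the results listed above.

```python
def allowable_range(reference_mean, delta):
    """Allowable range for the new method's mean: reference mean +/- delta."""
    return (reference_mean - delta, reference_mean + delta)


def classify_equivalence(ci_low, ci_high, lower_limit, upper_limit):
    """Classify a 95% CI against the equivalence limits.

    'pass'         -> CI lies entirely within the limits
    'fail'         -> CI lies entirely outside the limits
    'inconclusive' -> CI overlaps one of the limits
    """
    if lower_limit <= ci_low and ci_high <= upper_limit:
        return "pass"
    if ci_high < lower_limit or ci_low > upper_limit:
        return "fail"
    return "inconclusive"


# Values reported in the case study: SDS-PAGE mean 3.8%, delta 1.0%,
# 95% CI of the CE results 4.88-5.32%.
low, high = allowable_range(3.8, 1.0)
outcome = classify_equivalence(4.88, 5.32, low, high)
print(outcome)
```

With the reported values, the whole CI lies above the upper limit of the allowable range, so the routine returns a failure, consistent with the outcome discussed in the text.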

Figure 2

Returning to our AMV failure, as illustrated in the 95% CI on the very right in Figure 2, what can we do? We demonstrated that a significant difference exists between both test methods when reporting host cell impurities. We should not perform more testing, as our equivalence studies were set up for a fixed sample size of n=30. We simply need to accept the results. Our options are:

We could accept this as a failure, close the report and move on to another project.

We could accept this difference (instead of 'failure'), close the report and assess this difference with respect to its impact on patients, clinical data, and historical product and process data. This test method change could be submitted without changing release specifications (≤7.0%). The request can be approved from a patient safety perspective because the final product specification for the host cell impurity would be proportionally lower for the new CE test method. However, this may be undesirable for the firm because it will result in a higher probability of OOS results, and all future purity results by the CE method will be statistically different from the historical data. Together, both may outweigh the benefits of higher product quality, technical gain and convenience in the quality control laboratory.

Another possibility may be to follow the option above but, instead of submitting this request as a test method replacement only, to submit it together with a request to change the release specification for the protein impurity. This can be justified because actual impurity levels are not changing, and the process and product quality remain the same. In addition, the link to clinical and historical data, as well as continuous process monitoring data, can be sustained if the sudden shift in future postimplementation impurity results is anticipated. Further to our existing AMV data, we may want to compare test results closer to the specification level (7.0%) to generate evidence as to whether this bias in testing is a constant difference of 1.3% (7.0% → 8.3%) or a relative difference, which would amount to approximately 2.4% at the specification level (7.0% → 9.4%). Following this option would allow us to sustain the current product quality without having to suffer from a significant increase in costly batch rejections.
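The two bias scenarios in this last option follow directly from the reported means (SDS-PAGE 3.8%, CE 5.1%). The short sketch below reproduces that arithmetic. The function names are hypothetical, and which of the two models actually holds would have to be established with data near the specification level, as noted above.

```python
def project_constant_bias(spec_limit, ref_mean, new_mean):
    """Shift the limit by the absolute difference between method means."""
    return spec_limit + (new_mean - ref_mean)


def project_relative_bias(spec_limit, ref_mean, new_mean):
    """Scale the limit by the ratio of the method means."""
    return spec_limit * (new_mean / ref_mean)


SPEC_LIMIT = 7.0      # licensed release specification (%)
SDS_PAGE_MEAN = 3.8   # reference method mean (%)
CE_MEAN = 5.1         # new method mean (%)

constant = project_constant_bias(SPEC_LIMIT, SDS_PAGE_MEAN, CE_MEAN)
relative = project_relative_bias(SPEC_LIMIT, SDS_PAGE_MEAN, CE_MEAN)
print(round(constant, 1), round(relative, 1))
```

Rounded to one decimal place, the constant model gives 8.3% and the relative model gives 9.4%, matching the figures quoted in the text.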

Acknowledgements

The author would like to thank the PTE team and the Tako-team (P. Bonaz and 'The Twins') for editing this article.

Stephan Krause is the QC Director at Favrille, a San Diego-based manufacturer of patient-specific cancer immunotherapies. Prior to joining Favrille, Stephan held leading positions in quality for major pharmaceutical firms. He has published numerous articles and presented at many conferences worldwide. His book on risk-based validation and implementation strategies won the 2008 Annual PDA Best Author Award, and he is PDA's Task Force Chair for analytical method validation for commercial biopharmaceutical products. Stephan completed his doctoral work at the University of Southern California (CA, USA) in the field of bioanalytical chemistry.

References

1. S.O. Krause, Validation of Analytical Methods for Biopharmaceuticals — A Guide to Risk-Based Validation and Implementation Strategies (PDA/DHI Publishing, Bethesda, MD, USA, 2007).

2. S.O. Krause, PDA Letter, 43(9), 11–20 (2007).

3. CDER/FDA Guidance for Industry — Investigating Out-of-Specification (OOS) Results for Pharmaceutical Production, October 2006. www.fda.gov/cder

4. ICH Q10 — Pharmaceutical Quality System, Draft Consensus Guideline, May 2007. www.ich.org

5. EURACHEM — EURACHEM / CITAC Guide CG4, Quantifying Uncertainty in Analytical Measurement, 2000. www.measurementuncertainty.org

6. ICH E9 — Statistical Principles for Clinical Trials, September 1998. www.ich.org
