Reduced method robustness testing of analytical methods driven by a risk-based approach - Pharmaceutical Technology


Reduced method robustness testing of analytical methods driven by a risk-based approach
A novel approach for assessing method robustness is described that uses risk-based assessment tools to identify, score, prioritise and then group method parameters. These parameters are then studied using reduced fractional factorial designs (termed Reduced Method Robustness) to evaluate the suitability of analytical methods prior to full validation. This simple approach helps to identify high risk method parameters earlier and can potentially save resource later in the development process.

Pharmaceutical Technology Europe
Volume 22, Issue 4

This article has been published simultaneously in the April issues of Pharmaceutical Technology Europe and Pharmaceutical Technology.

The development of robust and rugged analytical methods is a vital part of the drug development process.1,2 It is essential to ensure that levels of critical quality attributes (CQAs) present in batches of drug substance (this paper focuses on low level impurities in a starting material used in the manufacture of the API) or drug product are accurately and precisely quantified, thus assuring patient safety. The benefits of reliable methods in a manufacturing environment can be measured in terms of reduced process down time and reduced overall costs,3 as the number of atypical results/method incidents will be minimised. Furthermore, robust analytical methods form a key part of the control strategy in Quality by Design (QbD).4,5

The goal of robustness testing is to evaluate the effect of small deliberate changes in method parameters (e.g., temperature, % organic modifier and pH of mobile phase, which are internal to the written method) on the qualitative and quantitative abilities of the method. Ruggedness testing,6 however, evaluates whether noise factors4 (e.g., different analysts, different instruments and different lots of reagents, which are external to the written method) have an effect on the reproducibility of the method.

Experimental design approaches (Design of Experiment; DoE) have been used for analytical work (in particular for determining robustness of analytical methods) for more than three decades. Validation of analytical methods in late phase development has traditionally involved performing robustness testing as one of the last activities, after characteristics such as specificity, linearity, range, accuracy, precision and sensitivity have been studied. Although this is the standard approach reported in the literature and in ICH Q2 guidance,7,8 there are some risks associated with it. For instance, method parameters that give rise to non‑robust behaviour and poor method performance may only be discovered at the end of the validation study. Moreover, if there is a need to redevelop any part of the method, it is likely that the whole validation exercise would need to be repeated. In the time‑constrained environment of drug development where speed to market is an important criterion of success, delays of this type could be costly.9

DoE is not always used in the development of analytical methods;10 systematic approaches to HPLC method development, which involve the scouting of key components of a HPLC method (e.g., column, pH and organic modifier), are often used.11 DoE can be used to further optimise method conditions resulting from such scouting work, but this may not always be required. The effects of method parameters that have not been studied as part of method development are unknown/unquantified until robustness testing is performed. Therefore, it is desirable to check robustness for such methods in advance of final method validation.

Part of QbD involves identifying the highest risk parameters that affect method performance.4 Risk assessment tools, such as fishbone diagrams,12 failure mode and effects analysis13 and prioritisation matrices offer an easy scientific means of achieving this. A key part of the process involves applying scientific knowledge, past experience and judgement to identify, score and prioritise method risk parameters. The use of a risk-based approach to decide which parameters should be studied in a robustness exercise has been well documented.4 A variant of the prioritisation matrix has been used in this paper to rank method parameters in order of their likely impact on key method responses (e.g., resolution of a critical peak pair and sensitivity to detect a key impurity). The data from the risk assessment have then been used to construct a reduced form of a fractional factorial design or ‘Reduced Method Robustness’ (RMR) design.

RMR approach

Robustness studies typically utilise fractional factorial designs to meet validation requirements, although sometimes Plackett Burman designs are used.14 These designs help to estimate the effects of individual method parameters and their interactions with each other.15,16 Each parameter is varied over two levels: high or “level +1” and low or “level -1”, and the two levels of each parameter are then systematically combined to create the set of experiments. Full two-level factorial designs permit estimation of the effect of individual parameters and all their interactions. Although the maximum amount of information is obtained, full factorial designs are often not practical because of the large number of experiments to be performed. In fractional factorial designs, only a fraction of the full design is studied, which decreases the overall number of experiments and the statistical resolution because not all the single parameter effects or interactions are estimated independently. It is desirable to further reduce the number of experiments used for two reasons:

  • To enable the use of such studies to provide an early indication of robustness and a direction for further method improvement (if required).
  • To most effectively use resources for the demonstration of robustness.

The number of experiments in fractional factorial designs can be reduced in two ways, which can be combined:

  • Reducing the number of factors
  • Reducing the statistical resolution of the design.

Figure 1: Two ways of reducing the number of experiments (runs); illustration from DX7 software.
Figure 1, taken from the DoE software Design-Expert (DX7), shows how many experiments (runs, vertical axis) are required for various numbers of factors (horizontal axis) and for different statistical resolutions (explained later). It illustrates the two ways of reducing the number of experiments.

Reducing the number of factors

Prior to building the statistical design, a risk assessment is performed to help determine whether the number of factors and the resolution of the design can be reduced. A prioritisation matrix tabulates the method parameters and the method performance characteristics, and the impact of each parameter on these characteristics is assessed and scored. Different scoring scales can be used; in this paper, the following scale has been used: 1=very low impact, 3=slight impact, 5=possible impact, 7=likely impact and 9=strong impact.

The scores for each method parameter are then summed to give an importance score, which is used to rank each parameter with respect to risk.
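The scoring and ranking step can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the parameter names, characteristics and scores below are hypothetical and are not the values from the case study, and the function names are invented for this example.

```python
# Scores allowed by the scale used in this paper.
ALLOWED_SCORES = {1, 3, 5, 7, 9}


def importance_scores(matrix):
    """Sum each parameter's scores across all method characteristics."""
    for parameter, scores in matrix.items():
        invalid = [s for s in scores.values() if s not in ALLOWED_SCORES]
        if invalid:
            raise ValueError(f"{parameter}: {invalid} not on the 1/3/5/7/9 scale")
    return {parameter: sum(scores.values()) for parameter, scores in matrix.items()}


def rank_by_risk(matrix):
    """Return (parameter, importance score) pairs, highest risk first."""
    totals = importance_scores(matrix)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical prioritisation matrix: parameter -> {characteristic: score}.
matrix = {
    "injector temperature": {"resolution": 5, "% area impurity": 9, "LOQ": 3},
    "column flow": {"resolution": 7, "% area impurity": 3, "LOQ": 5},
    "detector temperature": {"resolution": 1, "% area impurity": 1, "LOQ": 3},
}

for parameter, total in rank_by_risk(matrix):
    print(f"{parameter}: {total}")
```

Parameters at the top of the ranking are candidates for single factors; those at the bottom are candidates for grouping or removal, subject to scientific judgement.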

The outcome of this prioritisation exercise determines whether any method parameters can be removed from the design, combined with other parameters, or should be included as a single factor. It also helps decide appropriate design resolution. An example of the prioritisation matrix is shown in the case study that follows.

The acceptability of removing a parameter partly depends on whether it is an early or final robustness assessment. If doubts remain as to whether or not a parameter should be removed, the parameter can be combined with other low‑risk parameters to determine whether these parameters studied together produce an effect (e.g., instrument‑related parameters with narrow ranges).

Combining parameters is a very efficient way of reducing the number of experiments, but caution is required. When two or more parameters are combined, they will be studied at the same level in the design; for example, whenever parameter A is set at “level +1”, parameter B will also be set to “level +1”, and whenever parameter A has “level -1”, parameter B will also be at “level -1”. Scientific judgement, therefore, must be used when allocating combinations to ensure that the effects of the parameters do not cancel each other out. For instance, if increasing the pH increases the resolution and the effect of the % organic on the resolution is unknown, it is unwise to group pH with % organic in case increasing the % organic decreases the resolution. As a rule, parameters can be combined if they have the same direction of effect (e.g., if increasing the pH and the % organic will increase the resolution) or if they affect different method performance characteristics. Categorical parameters (e.g., column batch and column age) can also be combined, though often the effect of these parameters is the hardest to predict. Studies using these concepts have been termed RMR because they do not investigate all method parameters separately.
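A minimal sketch of what grouping means in practice: every parameter in a group follows the coded level of its group factor. The group membership, the column flow and column loading ranges, and the helper name `expand_groups` are assumptions for illustration; only the 240 to 260 C injector range is taken from the case study.

```python
def expand_groups(design, groups, settings):
    """Translate coded factor levels (+1/-1) into actual parameter settings.

    design:   list of runs, each a dict of factor letter -> coded level
    groups:   factor letter -> list of parameters studied together
    settings: parameter -> {-1: low value, +1: high value}
    """
    runs = []
    for coded_run in design:
        actual = {}
        for factor, level in coded_run.items():
            # All parameters in the group move to the same coded level.
            for parameter in groups[factor]:
                actual[parameter] = settings[parameter][level]
        runs.append(actual)
    return runs


# Hypothetical grouping: factor A carries two parameters, factor B one.
groups = {
    "A": ["column flow (mL/min)", "column loading (uL)"],
    "B": ["injector temperature (C)"],
}
settings = {
    "column flow (mL/min)": {-1: 0.9, 1: 1.1},      # hypothetical range
    "column loading (uL)": {-1: 0.8, 1: 1.2},       # hypothetical range
    "injector temperature (C)": {-1: 240, 1: 260},  # range quoted in the case study
}
design = [{"A": a, "B": b} for a in (-1, 1) for b in (-1, 1)]

actual_runs = expand_groups(design, groups, settings)
for run in actual_runs:
    print(run)
```

Because the two grouped parameters are never varied independently, an effect of factor A can only be attributed to the group as a whole.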

The parameters with the highest risk are included as single factors in the design.

Reducing the statistical resolution of the design

Figure 2: Statistical resolutions available for a 7 factor design and scoping experiment.
Reducing the statistical resolution is another way of decreasing the number of experiments. Reducing the resolution means that not all the terms (single parameters or interactions) are estimated independently. The consequence of this is that some of the terms are statistically ‘aliased’ (i.e., statistically combined), where the estimated effect is the combination of the true effects of each of the aliased terms. Depending on the number of factors, several resolutions will be available (Figure 2).
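The statement that an estimated effect is the combination of the true effects of the aliased terms can be checked numerically. The sketch below assumes the textbook generators D=AB, E=AC, F=BC, G=ABC for an 8-run, 7-factor design (the generators used by the software are not stated in this article) and a synthetic, noise-free response invented for illustration.

```python
from itertools import product


def design_2_7_4():
    """8-run, 7-factor design; generators D=AB, E=AC, F=BC, G=ABC are assumed."""
    runs = []
    for a, b, c in product((-1, 1), repeat=3):
        runs.append({"A": a, "B": b, "C": c,
                     "D": a * b, "E": a * c, "F": b * c, "G": a * b * c})
    return runs


def effect(runs, responses, factor):
    """Mean response at level +1 minus mean response at level -1."""
    high = [y for run, y in zip(runs, responses) if run[factor] == 1]
    low = [y for run, y in zip(runs, responses) if run[factor] == -1]
    return sum(high) / len(high) - sum(low) / len(low)


runs = design_2_7_4()

# Synthetic model: true effect of G is 2.0 (coefficient 1.0) and the true
# effect of the A x F interaction is 3.0 (coefficient 1.5). No noise.
responses = [10 + 1.0 * run["G"] + 1.5 * run["A"] * run["F"] for run in runs]

print("Estimated effect of G:", effect(runs, responses, "G"))
print("Estimated effect of A:", effect(runs, responses, "A"))
```

Under these assumptions the contrast for G returns 5.0, the sum of the two true effects, because the AF column is identical to the G column in every run; the design alone cannot say how that total is split.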

In DX7 (Design‑Expert 7.1 software; Stat‑Ease Inc., MN, USA), the statistical resolution is indicated by both the colours and notation. In the notation $2^{k-p}$, 2 indicates that the design is a 2-level fractional factorial design; $k$ is the number of factors; and $p$ is the fraction exponent. Suppose there are 7 factors ($k = 7$):

  • When $p = 0$, $2^{k}$ is a full design and corresponds to $2^{7} = 128$ experiments (runs)
  • When $p = 1$, this corresponds to a fractional design. The fraction is $2^{-1} = 0.5$; the design is cut in half. This equates to $2^{7-1} = 64$ experiments (runs)
  • When $p = 2$, this also corresponds to a fractional design. The fraction is $2^{-2} = 0.25$; the design is a quarter of the full design. This equates to $2^{7-2} = 32$ experiments (runs)

And so on.
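The run counts in this series follow directly from the notation, as the short sketch below shows for 7 factors. The function name is invented for this example, and centre points are ignored.

```python
def runs_required(k, p=0):
    """Number of runs in a two-level design with k factors and fraction exponent p."""
    if not 0 <= p < k:
        raise ValueError("p must satisfy 0 <= p < k")
    return 2 ** (k - p)


k = 7
for p in range(5):
    fraction = 2 ** -p
    print(f"p={p}: fraction {fraction} of the full design, {runs_required(k, p)} runs")
```

For 7 factors the series ends at p = 4, which gives the 8-run design discussed later in this article.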

The most limited form of RMR is a scoping study.17,18 Typically, four experiments make up a scoping study; two centre points and two extreme sets of conditions. At one extreme, the parameters are set to their chosen levels (either “level +1 or -1”) which, for all parameters together, form the condition most likely to produce that extreme response. At the other extreme, the parameters are set to their alternative level. The extremes depend on prior knowledge of the parameters and are not necessarily set at all the low and all the high parameters’ settings - similar to that described earlier with respect to the effect of pH and % organic on the resolution. The centre points represent the nominal method conditions and give an indication of repeatability; the extremes are thought to lead to the lowest and highest expected results. Scoping experiments are often used to check the repeatability and the experimental design space (assessment of whether the factor range settings are appropriate) before committing resources to a more detailed study. A scoping study is extremely unlikely to be suitable for determining robustness of most analytical methods.
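A four-run scoping study of this kind can be laid out as follows. The sketch assumes the analyst supplies, for each parameter, the coded level expected to push the response towards its highest value; the parameter names and directions shown are hypothetical.

```python
def scoping_study(directions):
    """Build a four-run scoping study in coded units.

    directions: parameter -> +1 or -1, the level expected (from prior
    knowledge) to drive the response towards its highest value.
    Returns two centre points (level 0) and the two extreme conditions.
    """
    centre = {parameter: 0 for parameter in directions}
    expected_high = dict(directions)
    expected_low = {parameter: -level for parameter, level in directions.items()}
    return [centre, expected_high, expected_low, dict(centre)]


# Hypothetical prior knowledge: raising the pH and lowering the % organic
# are both expected to increase the response.
directions = {"pH": 1, "% organic": -1, "temperature": 1}

study = scoping_study(directions)
for run in study:
    print(run)
```

Note that the extremes are not simply "all high" and "all low": each parameter is set in the direction expected to reinforce the others.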

For RMR, a resolution III design is typically used. Although this design does not allow direct estimation of interactions, it still tests the whole design space. In addition, should an important interaction exist, this will generally be identified even though more experiments (e.g., by ‘folding over’ the design) may be required to clarify whether it is the interaction or an aliased main effect. As robustness is being tested, few (if any) important effects are expected, which reduces the likelihood of needing to clarify aliases. Scientific judgement can often be used to assign the likely main effect/interaction. In addition to this, if an interaction exists, usually one, if not both, of the main effects involved will also show (hierarchical evidence). Within each category of resolution, the extent of the aliasing increases as the number of experiments is reduced. In addition to accepting a higher degree of aliasing, careful allocation of parameters, or groups of parameters, to each main factor (denoted by a letter in DX7) can assist with retaining separation of important effects.
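The effect of folding over a resolution III design can be illustrated with a small sketch. It assumes a full fold-over (every sign reversed) of an 8-run design built with the textbook generators D=AB, E=AC, F=BC, G=ABC; the article does not state which generators or fold-over scheme were used, so treat these as assumptions.

```python
from itertools import product


def base_design():
    """8-run, 7-factor resolution III design (assumed generators)."""
    return [{"A": a, "B": b, "C": c,
             "D": a * b, "E": a * c, "F": b * c, "G": a * b * c}
            for a, b, c in product((-1, 1), repeat=3)]


def fold_over(runs):
    """Full fold-over: repeat the design with every level reversed."""
    return [{factor: -level for factor, level in run.items()} for run in runs]


def overlap(runs, main, pair):
    """Average product of a main-effect column and a two-factor interaction column.

    +1 or -1 means the two are fully aliased; 0 means they are orthogonal.
    """
    x, y = pair
    return sum(run[main] * run[x] * run[y] for run in runs) / len(runs)


original = base_design()
combined = original + fold_over(original)

print("A vs BD, original 8 runs:", overlap(original, "A", ("B", "D")))
print("A vs BD, after fold-over:", overlap(combined, "A", ("B", "D")))
```

In the original 8 runs A and BD are indistinguishable; in the combined 16 runs they are orthogonal, so the extra experiments separate the main effect from the interaction.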

Consider a specific example where there are 7 parameters to be evaluated and there is considered to be some risk of an interaction between two parameters, here allocated to factors A and F. Ideally the design would estimate AF separately from main effects. Table 1 illustrates some of the considerations and choices to be made for a specific example as follows:

  • Design A is a resolution IV design that separates all main effects from 2 factor interactions, but requires 16 runs (ignoring centre points).
  • Design B is a resolution III (so AF will be aliased with a main effect), but only requires 8 runs. If this design is used, AF should be aliased with the lowest risk parameter (letter G in this case).
  • If it is appropriate to group two parameters, this leads to 6 factors and design C which, despite being resolution III, separates AF from main effects.

Table 1: Example of resolution and aliasing considerations.
If no two parameter interactions were of particular risk, then design B rather than design C should be used because there would only be disadvantages from grouping parameters.
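The difference between designs B and C with respect to the AF interaction can be checked with a short sketch. It assumes both 8-run designs are built from the textbook generators D=AB, E=AC, F=BC and (for the seventh factor only) G=ABC; Table 1 is not reproduced here, so the generators are an assumption that happens to be consistent with AF being aliased with G.

```python
from itertools import product

# Assumed generators; the seventh factor G is only present in design B.
GENERATORS = {"D": "AB", "E": "AC", "F": "BC", "G": "ABC"}


def build(factors):
    """Build an 8-run design containing the requested factor letters."""
    runs = []
    for a, b, c in product((-1, 1), repeat=3):
        run = {"A": a, "B": b, "C": c}
        for factor in factors:
            if factor in GENERATORS:
                level = 1
                for parent in GENERATORS[factor]:
                    level *= run[parent]
                run[factor] = level
        runs.append(run)
    return runs


def mains_aliased_with(runs, x, y):
    """Main effects whose column equals (or mirrors) the x*y interaction column."""
    interaction = [run[x] * run[y] for run in runs]
    mirrored = [-value for value in interaction]
    return [factor for factor in runs[0]
            if [run[factor] for run in runs] in (interaction, mirrored)]


design_b = build("ABCDEFG")  # 7 factors, 8 runs
design_c = build("ABCDEF")   # 6 factors after grouping two parameters, 8 runs

print("Design B, AF aliased with:", mains_aliased_with(design_b, "A", "F"))
print("Design C, AF aliased with:", mains_aliased_with(design_c, "A", "F"))
```

With six factors the column that carried G is left unassigned, which is why AF no longer shares a column with any main effect even though the design remains resolution III.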

The literature has suggested the use of supersaturated designs (that is, designs where the number of main effects exceeds the number of runs) in process development.19 The approach described in this paper is a form of group factor screening design. Risk assessment combined with scientific understanding is used to provide the grouping scheme, an aspect that Lin20 suggests is seldom discussed, but crucial. Whilst supersaturated designs have been suggested for analytical methods,21,22 these focus on designs in which factors are partially correlated rather than grouped. Consequently, the statistical analysis is not straightforward and Dejaegher and Heyden22 even suggest that such designs should not be used for estimating effects in analytical method robustness testing. Whilst there may be situations where those designs are useful, the approach of grouping factors used in this paper has the following advantages:

  • Easy for analysts to use existing knowledge of fractional factorial design and software.
  • Careful grouping of parameters reduces ambiguities relating to cause of effects and any additional experimentation (e.g., folding over the original design) required to de‑alias effects or separate out effects of grouped parameters will be familiar to many analysts using fractional factorial design.

Case study: application of RMR to GC-FID analysis of N‑acetyl piperazine (NAP)

Table 2: NAP method conditions.
A gas chromatographic (GC) method with flame ionisation detection (FID) for the analysis of a starting material (N-acetyl piperazine; NAP) had been observed in the past to sometimes produce variable results although the cause for this variation was unknown.

Figure 3: Example chromatogram from NAP method.
Therefore, it was deemed necessary to conduct robustness testing to identify and control any parameters that were responsible for this variation. The robustness testing was planned and conducted using the reduced method robustness principles outlined above. Details of the chromatographic conditions and a typical sample chromatogram are shown in Table 2 and Figure 3, respectively.

Parameter risk assessment and prioritisation

Figure 4: Fishbone for GC-FID method parameters (with enlarged method section).
Parameters for the method were brainstormed and categorised on a fishbone using mind-mapping software as in Figure 4.

Table 3: Risk scoring of X parameters and RMR decision based upon risk scoring for GC-FID method parameters.
The fishbone categories were as follows: equipment; environment; measurement; method; materials; and people. The parameters were then given a variable classification (C, N or X) as described by Borman et al.4 The X (experimental) parameters are the parameters that will be considered for inclusion in the robustness study. These include instrumental parameters, such as flow rate and temperature, which are given a set point in the method, but will vary in line with the precision of the instrument. These parameters could also be related to the sample preparation (e.g., heating of NAP or sonication time) where variability is derived from how the operator follows the method procedure. To establish which parameters were most important to the method, GC experts scored and prioritised the X parameters using a prioritisation matrix. The method characteristics were: resolution of impurity A and impurity B; % area impurity A; % area impurity B; % area impurity C; limit of quantitation (LOQ); retention time (Rt) for impurity C; and Rt for NAP. Each parameter was assigned a score in the range 1-9 for each method characteristic, as described in the RMR approach section, and the overall importance score was calculated. Results are detailed in Table 3.

Parameters were grouped primarily based on those that are low risk and/or likely to affect different responses. This makes the analysis of the data at the end of the study simpler because there is less ambiguity around assigning significant factors.

Parameters such as column flow and column loading affect signal‑to‑noise ratio and are in the same group. Since the direction of their effects was thought to be the same, this combination was deemed acceptable. It will, however, be impossible to determine which of the two parameters is responsible for any effect observed (only the total effect can be assessed and whether it is large enough to cause concern).

The single factors are high‑risk factors that cannot be easily combined with other factors. The excluded factors are those with low risk, or, in the case of split time, their effect will be accounted by another factor (split ratio).

Figure 5: Reduction of factors reduces resource required by 50%.
The 19 parameters detailed in Table 3 were risk assessed and a study containing only 7 instrumental factors was created; this was achieved by eliminating some parameters, combining other low risk parameters and combining parameters that were thought to have similar effects or affect different responses. The combination of parameters was quite aggressive to obtain a 7‑factor study where the number of experiments required was reduced. The resolution of the design was also kept to the minimum resolution (III) to reduce the resources required to run the experiment. Taking these steps cuts the total number of experiments required by at least 50%: 16 experiments (8 for the instrument factors study and 8 for a further sample preparation factors study) instead of 32 (19 parameter resolution III design) or typically 40 (as sample preparation is usually evaluated separately, ignoring centre points). See Figure 5 for a visual representation of how this resource was reduced.
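The run counts quoted above can be reproduced with a small calculation. The sketch relies on a standard property of two-level designs (a resolution III design with N runs, N a power of two, can accommodate at most N - 1 factors); the function name is invented, and the 8-run sample preparation study is taken from the text rather than computed.

```python
def min_runs_resolution_iii(n_factors):
    """Smallest power-of-two run count able to hold n_factors at resolution III."""
    if n_factors < 1:
        raise ValueError("at least one factor is required")
    runs = 2
    while runs - 1 < n_factors:
        runs *= 2
    return runs


instrument_study = min_runs_resolution_iii(7)     # 7 grouped instrumental factors
sample_prep_study = 8                             # as quoted in the text
reduced_total = instrument_study + sample_prep_study

ungrouped_total = min_runs_resolution_iii(19)     # all 19 parameters as single factors

print("Reduced approach:", reduced_total, "runs")
print("19-parameter design:", ungrouped_total, "runs")
print("Saving:", f"{1 - reduced_total / ungrouped_total:.0%}")
```

Centre points are ignored here, as they are in the comparison made in the text.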

Figure 6: Alias list for 7 factors resolution III design.
In a resolution III study, the two factor interactions are aliased with single factors. Therefore, it is important to allocate the factors carefully to minimise the confusion this can cause during the analysis of the study if the factor is seen to have an effect. It is advisable to review the alias list prior to the finalisation of the design and assess whether the two‑factor interactions aliased with each single factor could have the same effect on any of the responses that are assessed in the study. If the same effect is likely, then factors should be re-allocated to avoid these situations as far as possible, which will make the analysis of the study easier and lower the likelihood of having to perform further experimentation to de-alias factors. The alias list for this study is shown in Figure 6.
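An alias list of this kind can be generated by comparing columns of the design. The sketch below assumes the textbook generators D=AB, E=AC, F=BC, G=ABC for the 8-run, 7-factor design, so its output may not match the software-generated list in Figure 6 letter for letter.

```python
from itertools import combinations, product


def design_2_7_4():
    """8-run, 7-factor resolution III design (assumed generators)."""
    return [{"A": a, "B": b, "C": c,
             "D": a * b, "E": a * c, "F": b * c, "G": a * b * c}
            for a, b, c in product((-1, 1), repeat=3)]


def alias_list(runs):
    """Map each main effect to the two-factor interactions sharing its column."""
    factors = list(runs[0])
    columns = {factor: [run[factor] for run in runs] for factor in factors}
    aliases = {factor: [] for factor in factors}
    for x, y in combinations(factors, 2):
        interaction = [run[x] * run[y] for run in runs]
        for factor in factors:
            # With the all-positive generators assumed here, aliased
            # columns are exact matches (no sign reversal to consider).
            if columns[factor] == interaction:
                aliases[factor].append(x + y)
    return aliases


aliases = alias_list(design_2_7_4())
for factor, interactions in aliases.items():
    print(f"{factor} = " + " = ".join(interactions))
```

Each main effect shares its column with three two-factor interactions, which is the symmetry referred to when factors are allocated to letters.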

Table 4: Allocation of factors.
Given the symmetry of the aliasing pattern, there is little opportunity for smart allocation of factors. However, from the risk assessment, it is known that some parameters have a higher risk than others; also, as a rule, three-factor interactions are less likely than two-factor interactions (which are less likely than single factors). The factors were allocated as in Table 4 to reduce the complexity of the analysis of the study.

A, B and D are low risk groups, making it easy to discard any two‑factor interaction containing any of these factors.

C, F and G are high risk and all the two‑factor interactions they are aliased with contain either A, B or D; therefore, there should not be any ambiguity in determining the cause of an effect.

Had A, B and C been allocated to the low risk parameters, on the other hand, then D, E and F would each have been aliased with one two-factor interaction that did not contain any of these low risk parameters, and is therefore more likely (Figure 6). In this latter example, the complexity of the analysis of the study would not have been reduced.
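The two allocations compared above can be checked mechanically. In this sketch each factor is represented by the set of base letters that generate its column, so multiplying two columns corresponds to taking the symmetric difference of the sets. The generators D=AB, E=AC, F=BC, G=ABC are an assumption; they reproduce the aliasing described in the text but are not confirmed by it.

```python
from itertools import combinations

# Each factor's column as a product of the base columns A, B and C (assumed).
WORDS = {
    "A": {"A"}, "B": {"B"}, "C": {"C"},
    "D": {"A", "B"}, "E": {"A", "C"}, "F": {"B", "C"},
    "G": {"A", "B", "C"},
}


def unscreened_aliases(low_risk):
    """For each non-low-risk factor, list the aliased two-factor
    interactions that contain no low-risk factor."""
    result = {}
    for factor in sorted(set(WORDS) - set(low_risk)):
        result[factor] = [
            x + y
            for x, y in combinations(sorted(WORDS), 2)
            if WORDS[x] ^ WORDS[y] == WORDS[factor]  # same column as the main effect
            and x not in low_risk and y not in low_risk
        ]
    return result


print("Low risk on A, B, D:", unscreened_aliases({"A", "B", "D"}))
print("Low risk on A, B, C:", unscreened_aliases({"A", "B", "C"}))
```

An empty list means every interaction aliased with that factor involves a low risk group and can reasonably be discounted; a non-empty list flags an interaction that would remain a plausible alternative explanation.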

Study results

The statistical design used in the case study had additional complications to that outlined because of the need to block runs (i.e., to reduce the number of column and liner changes) and missing results in the data. For these reasons, modelling of the data was complicated and has therefore not been included in this article because the main focus is the design setup. However, a brief summary of the conclusions from the study is given.

The study showed the method was robust for the responses collected (the method characteristics in Table 3). There was one control identified following the RMR study that was required to minimise variability in the level observed for one of the impurities (B). One of the factors that was most responsible for this variation was the combined injection temperature/initial oven hold time group factor (Group 4). The most likely cause of this variation is the temperature of the injector (240–260 °C) rather than the length of time the oven is kept at the minimum temperature of the gradient (140 °C) because the impurity is known to be thermally labile.

Figure 7: predicted effect of Group factors 4 and 5.
The plot in Figure 7 visually displays the predicted effect of increasing the temperature of the injector (Group 4) combined with the effect of the other most significant factor (the column - Group 5). Therefore, a tighter control was placed around the injector temperature to minimise the variability seen for this impurity prior to proceeding to full method validation. Without the careful combination of factors that preceded this study, it could have been harder to deduce which parameter was the cause of the variability.

Despite being considered lower risk, a difference was also noticed between the two columns tested (Figure 7), with some of the impurities being observed at slightly different levels on the two columns. This illustrates the importance of including parameters in robustness assessment, even if assessed as lower risk. As the age of the column and the column batch were grouped, it is impossible to identify whether the observed difference is derived from column batch-to-batch variability or the performance of a column varying with time. It was recommended that this issue be further assessed as part of a method ruggedness study.


Conclusion

The use of risk‑based assessment tools to identify, score and prioritise method parameters coupled with RMR is a novel adaptation of already established techniques. Reduced method robustness provides an effective means of reducing the number of experiments required to assess method suitability and performance. This approach has been successfully applied to assess robustness of a GC‑FID method used for the analysis of a key pharmaceutical starting material. The results from this study allowed the analyst to identify key method parameters by performing 16 experiments instead of the usual 32 or 40. This simple approach can easily be applied to any analytical method and provides an analyst with a checkpoint for progression of analytical methods in drug development. If all of the important parameters are accommodated and testing shows the method is robust then a further robustness study may not be needed when proceeding to full validation.


References

1. B.A. Olsen, Pharm. Technol., 29(3), Suppl. s14–s25 (2005).

2. T.K. Natishan, American Pharmaceutical Outsourcing, 7(3), 28–35 (2006).

3. L. Mockus and P. Basu, AIChE Annual Conference Proceedings, 1741–1745 (2004).

4. P. Borman et al., Pharm. Technol., 31(12), 142–152 (2007).

5. M. Schweitzer et al., Pharm. Technol., 34(2), 52–59 (2010).

6. L.D. Torbeck, Pharm. Technol., 20(3), 168–172 (1996).

7. G. Shabir, Pharm. Technol. Eur., 13(12), 72–76 (2001).

8. P.G. Muijselaar, Sep. Sci. and Technology, 9, 145–169 (2008).

9. C. Beaver, Drug Discovery World, 9(4), 21–27 (2008).

10. R. Bergman et al., Chemometric and Intelligent Lab Systems, 44(1–2), 271–286 (1998).

11. C. Ye et al., J. Pharm. Biomed. Anal., 50(3), 426–431 (2009).

12. V.R. Meyer, J. Chrom. Sci., 41(8), 439–443 (2003).

13. D.H. Stamatis, Failure Mode and Effect Analysis: FMEA from Theory to Execution (ASQ Quality Press, WI, USA, 2003).

14. R.L. Plackett and J.P. Burman, Biometrika, 33, 305 (1946).

15. V. Heyden, M.S. Khots and D.L. Massart, Anal. Chim. Acta., 276(1), 189–195 (1993).

16. R. Ragonese, M. Mulholland and J. Kalman, J. Chrom. A., 870(1–2), 45–51 (2000).

17. A.M-F. Laures et al., Rapid Commun. Mass Spectrom., 21, 529–535 (2007).

18. E. Champarnaud et al., Rapid Commun. Mass Spectrom., 23, 181–193 (2009).

19. D.R. Holcomb, D.C. Montgomery and W.M. Carlyle, Quality Engineering, 19(1), 17–27 (2007).

20. D.K.J. Lin, “Supersaturated Designs,” in F. Ruggeri, R. Kenett and F.W. Faltin, Eds, Encyclopedia of Statistics in Quality and Reliability, (Wiley, 2007).

21. Y.V. Heyden et al., Anal. Chem., 72(13), 2869–2874 (2000).

22. B. Dejaegher and Y.V. Heyden, Anal. Bioanal. Chem., 390, 1227–1240 (2008).


Acknowledgements

The authors wish to thank Luca Martini, Anna Nicoletti and Jill Trewartha who were involved in the GC–FID RMR study mentioned in this article.

