Multivariate data analysis easily retrieves insight from a wealth of data
Traditionally, studies are performed and analysed following a univariate (one variable at a time) approach. The big advantage of this is that simple 2D plots can be used to assess cause and effect relationships, and the corresponding statistics are straightforward too. The production environment, however, is never univariate, and interactions between parameters should be expected. In the pharmaceutical arena, this situation has been well recognised: guidelines such as ICH Q8 on Pharmaceutical Development and ICH Q10 on Pharmaceutical Quality System explicitly mention the multidimensional design space in which product performance should be tested to assure quality.1,2 In this context, trend analyses of the manufacturing process performance and its products have been mentioned as an important tool for innovation and continuous improvement.3 A potential complicating factor with multidimensional data, however, is that such data cannot be inspected visually in raw form, so other ways are needed to represent the results.
From dull tables to essential information
For a process running for several years, a wealth of data is usually stored in databases containing continuously measured data and routine-based quality control (QC) data, but it is extremely difficult to obtain useful information from such an intimidating amount of numbers and other data. Multivariate data analysis (MVA) is an approach that converts data into knowledge by using data exploration techniques, without restricting attention in advance to a few selected aspects. The resulting knowledge can be represented visually for human interpretation.
Historical data can be analysed using MVA to learn from the past, which can help to solve current problems, to avoid future ones, or to make a validation study of a similar production process or compound quicker and cheaper. Analysing historical data can also avoid, or shorten, new studies, which are often expensive. When visualised properly, extended sets of data, such as dull and perhaps confusing tables, can be changed into spatial representations that clearly depict essential information that is not visible a priori. The methods are widely applicable and can be used, for instance, for measuring the quality and authenticity of samples, or for monitoring a production process.
MVA for a pharmaceutical quality system
Case study: exploiting historical data for a pharmaceutical formulation
If data are highly correlated, only a few principal components (PCs; linear combinations of the original variables) are needed to reproduce the original data sufficiently. In this example, the first two PCs (PC1 and PC2) together describe 38% of the total variance in a data set that was originally described by 25 variables: PC1 explained 25% of the total variance and PC2 explained 13%. The PCA results are visualised in a biplot, in which both scores and loadings are plotted. Figure 2 presents these results for three selected variables: thickness, yield and water content. The dots reflect the scores and the red triangles reflect the loadings. The first step is to look at the scores.
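The share of variance attributed to each PC follows from the eigenvalues of the covariance matrix of the autoscaled data. The following is a minimal sketch of that calculation, not the analysis used in the case study: the batch values are made up for illustration, all function names are hypothetical, and a simple power iteration with deflation stands in for a dedicated PCA routine.

```python
import math

def standardise(rows):
    """Centre each column and scale it to unit sample variance (autoscaling)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of data that are already centred."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
            for i in range(p)]

def leading_eigenpair(matrix, iterations=500):
    """Power iteration: largest eigenvalue and its unit-length eigenvector."""
    p = len(matrix)
    v = [float(j + 1) for j in range(p)]  # arbitrary non-symmetric start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * matrix[i][j] * v[j]
                     for i in range(p) for j in range(p))
    return eigenvalue, v

def explained_variance_ratios(rows, n_components):
    """Fraction of the total variance captured by each of the first PCs."""
    cov = covariance(standardise(rows))
    p = len(cov)
    total = sum(cov[i][i] for i in range(p))
    ratios = []
    for _ in range(n_components):
        eigenvalue, v = leading_eigenpair(cov)
        ratios.append(eigenvalue / total)
        # Deflation: remove the component just found before seeking the next.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(p)]
               for i in range(p)]
    return ratios

# Hypothetical batches: (thickness, yield, water content); NOT the study data.
BATCHES = [
    (3.0, 95.0, 2.1), (3.1, 94.0, 2.5), (3.2, 96.0, 1.9),
    (4.0, 90.0, 2.4), (4.1, 89.0, 2.0), (4.2, 91.0, 2.2),
]

ratios = explained_variance_ratios(BATCHES, 2)
print([round(r, 2) for r in ratios], "cumulative:", round(sum(ratios), 2))
```

Summing the first two ratios gives the cumulative figure that corresponds to the 38% quoted for PC1 and PC2 in the case study.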
The biplot can help interpret the trends seen. Variables that point in the same direction show a high positive correlation, whereas variables that point in opposite directions show a high negative correlation. Variables that are plotted perpendicular (orthogonal) to each other are uncorrelated. The higher the loading, the more the variable contributes to the PC. For instance, the variable yield has a high loading on PC1; the loading for yield points in the direction of the LD (lowest dose) samples, which means that LD samples have higher values for yield than HD (highest dose) samples. Therefore, yield is negatively correlated with dose.
In the opposite direction, high loadings are found for thickness: the higher the dosage, the thicker the tablet. Therefore, thickness is positively correlated with dose. As a result, yield and thickness are negatively correlated.
Water content is located right in the middle, meaning that it contributes neither to the separation between doses nor to the grouping by kneading time. It also shows no correlation with yield or thickness. Therefore, in this data set, changes in water content are not associated with changes in yield.
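The reading rules for a biplot described above can be written down as a small sketch. The loading coordinates, the length threshold and the angular tolerance below are all hypothetical values chosen for illustration; in practice the angle between two loadings only approximates their correlation, and the approximation depends on how the loadings are scaled and on how much variance the plotted PCs capture.

```python
import math

def angle_degrees(u, v):
    """Angle between two 2D loading vectors, in degrees."""
    dot = u[0] * v[0] + u[1] * v[1]
    cosine = dot / (math.hypot(*u) * math.hypot(*v))
    # Clamp to [-1, 1] to guard against rounding before taking the arc cosine.
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))

def interpret(u, v, min_length=0.1, tolerance=15.0):
    """Apply the biplot reading rules to two loading vectors.

    min_length and tolerance are illustrative thresholds, not standard values.
    """
    if math.hypot(*u) < min_length or math.hypot(*v) < min_length:
        # A loading near the origin says little about these two PCs.
        return "contributes little to these PCs"
    angle = angle_degrees(u, v)
    if abs(angle - 90.0) <= tolerance:
        return "approximately uncorrelated"
    return "positively correlated" if angle < 90.0 else "negatively correlated"

# Hypothetical (PC1, PC2) loadings, loosely mimicking the pattern described.
LOADINGS = {
    "yield": (0.70, 0.10),
    "thickness": (-0.65, -0.15),
    "water content": (0.02, -0.03),
}

for name in ("thickness", "water content"):
    print("yield vs", name, "->", interpret(LOADINGS["yield"], LOADINGS[name]))
```

An angle between 90° and 180° is classified as a negative correlation here, while a loading close to the origin is flagged separately because its direction carries little information.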
So the visualised PCA results provide information on structures in the data and on correlations between variables such as yield, thickness and water content, as well as sample properties such as dosage or kneading time. This can help formulate new strategies to improve the production process. However, there may be a risk of jumping to conclusions too quickly.
For the lowest dose [LD; Figure 3(a)], the variables yield, water content and thickness point in the same direction, indicating that there is a very high correlation between the three. However, the loadings of these variables are orthogonal to PC1, meaning the variables have no contribution to the separation between short and long kneading time in PC1.
A different correlation structure is revealed for the mid-dose group [MD; Figure 3(b)]. Water content is orthogonal to yield and thickness, meaning there is no correlation between water content and the other two parameters. Yield has a positive loading on PC1, which is the direction of difference in kneading time. Therefore, for MD, it can be seen that long kneading times correspond to a high yield, which was not seen for LD.
Finally, a different correlation structure is seen for the highest dosage [HD; Figure 3(c)]. As with LD, yield and thickness point in the same direction, which means there is a positive correlation between the two parameters. Conversely, water content is partly negatively correlated with yield and thickness, as it points in the opposite direction. The angle is between 90° and 180°, so water content is not completely orthogonal to yield and thickness. The relation between thickness and kneading time is opposite to that seen for MD: here, a longer kneading time corresponds to thicker tablets than a short kneading time does.
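The contrast between the pooled analysis and the per-dose analyses can be illustrated with a short sketch. The numbers are made up so that thickness and yield are negatively correlated across all batches but positively correlated within each dose group; they are not taken from the study, and the function names are hypothetical.

```python
import math
from collections import defaultdict

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical batches: (dose group, thickness, yield); NOT the study data.
BATCHES = [
    ("LD", 3.0, 94.0), ("LD", 3.1, 95.0), ("LD", 3.2, 94.8), ("LD", 3.3, 96.0),
    ("HD", 4.0, 89.0), ("HD", 4.1, 90.2), ("HD", 4.2, 89.9), ("HD", 4.3, 91.0),
]

def correlation_by_group(batches):
    """Thickness-yield correlation computed separately for each dose group."""
    groups = defaultdict(list)
    for dose, thickness, yield_pct in batches:
        groups[dose].append((thickness, yield_pct))
    # zip(*pairs) splits the (thickness, yield) pairs into two sequences.
    return {dose: pearson(*zip(*pairs)) for dose, pairs in groups.items()}

pooled = pearson([b[1] for b in BATCHES], [b[2] for b in BATCHES])
per_dose = correlation_by_group(BATCHES)

print("pooled:", round(pooled, 2))
for dose, r in per_dose.items():
    print(dose, round(r, 2))
```

In this constructed example the pooled correlation is dominated by the difference between dose groups, which is why an analysis per dose can reveal a structure that the overall biplot hides.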
Straightforward, understandable and cheap
The above case study demonstrates the advantage of using historical data to better understand the correlation between variables in the process of pharmaceutical tablet production using PCA as an analysis tool. Based on only a few plots, which were generated in close cooperation between the statistician and the technological expert, insight was generated into the behaviour of the production process for the different doses and leads were identified as to how to increase the yield of production.
By enabling the visualisation of complex relationships, MVA allows processes to be better understood. As a consequence, new development strategies or adapted processing can be identified for product, process or quality improvement (corresponding to ICH Q10).
PCA is not the only tool that can be used. Regression and/or classification analyses using, for instance, Partial Least Squares (PLS) regression, are extremely useful in cases where process measurements have to be related to product quality (such as tablet dissolution). These multivariate analyses can be performed easily and quickly, with only minor data requirements.
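To give an impression of how PLS relates process measurements to a quality attribute, the sketch below fits a PLS model with a single component for one response (often called PLS1). It is a simplified illustration under stated assumptions: the process variables (kneading time, water content), the dissolution values and the function names are all hypothetical, the data are only centred rather than autoscaled, and a real analysis would normally use a validated chemometrics package with cross-validation to choose the number of components.

```python
import math

def fit_pls1_one_component(X, y):
    """Fit a one-component PLS1 model.

    Returns (x_means, y_mean, coefficients) for use with predict().
    """
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: the direction in X-space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores: projection of each centred sample onto the weight vector.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress the centred response on the scores.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    # With a single component the regression coefficients are simply q * w.
    return x_means, y_mean, [q * wj for wj in w]

def predict(model, row):
    """Predict the response for one new sample."""
    x_means, y_mean, coefficients = model
    return y_mean + sum(b * (x - m)
                        for b, x, m in zip(coefficients, row, x_means))

# Hypothetical data: (kneading time in min, water content in %) per batch,
# and a made-up dissolution value (%) as the quality attribute.
X = [(5.0, 2.1), (5.0, 2.4), (10.0, 2.0), (10.0, 2.3), (15.0, 2.2), (15.0, 2.5)]
y = [82.0, 80.5, 86.0, 85.0, 90.5, 89.0]

model = fit_pls1_one_component(X, y)
print("predicted dissolution:", round(predict(model, (12.0, 2.2)), 1))
```

Because the variables are not scaled here, the variable with the largest variance dominates the weight vector; autoscaling before fitting is the usual remedy when variables are measured in different units.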
The advantage of using historical data or routinely measured QC data is that they are readily available, and often mean that new studies are not required or can take place on a smaller scale. In addition, the outcomes are straightforward and intuitively understandable. Although in practice historical data sets may be missing data entries for specific batches or parameters, a great deal of information can still be obtained if this is accounted for.
This work has been performed under the framework of the Dutch Top Institute Pharma (project D6-203).
Carina Rubingh is Biostatistician in the group of Analytical Information Sciences at the Department of Analytical Research, TNO Quality of Life (The Netherlands).
Kees van de Voort Maarschalk is Director Oral and Polymeric Product Development at Schering Plough (The Netherlands) and Professor Industrial Pharmacy at the University of Groningen (The Netherlands).
Uwe Thissen is Project Manager and Senior Scientist in the group of Analytical Information Sciences, Department of Analytical Research, TNO Quality of Life, Business Unit Quality and Safety, PO Box 360, NL-3700 AJ Zeist (The Netherlands). Tel. +31 30 694 4002
1. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use, ICH Q8 Pharmaceutical Development. www.ich.org
2. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use, ICH Q10 Pharmaceutical Quality System. www.ich.org
3. W.R. Dillon and M. Goldstein, Multivariate Analysis, Methods and Applications (John Wiley & Sons, NY, USA, 1984).
4. H. Martens and T. Naes, Multivariate Calibration (John Wiley & Sons, Chichester, UK, 1989).
5. D.L. Massart et al., Handbook of Chemometrics and Qualimetrics: Part A (Elsevier, Amsterdam, The Netherlands, 1997).
6. B.G.M. Vandeginste et al., Handbook of Chemometrics and Qualimetrics: Part B (Elsevier, Amsterdam, The Netherlands, 1998).
7. D.L. Massart and Y. Vander Heyden, LCGC Europe, 17(11), 586–591 (2004).