Drinking from a fire hose

Pharmaceutical Technology Europe

Pharmaceutical Technology Europe, 09-01-2008, Volume 20, Issue 9

The pharmaceutical industry is facing a perfect storm. Rising healthcare costs, a changing regulatory environment and vigorous global competition all pose major threats to the industry, as does the increasing complexity of small-molecule and biotech drugs, which makes discovery and clinical trials more expensive and creates manufacturing challenges. The need to increase productivity, shorten development cycle times, reduce costs and get it right first time has never been more urgent. The old empirical paradigms of guess-and-test and learning by doing no longer work as they used to.

The multipronged strategy

In light of these challenges, regulatory agencies and the industry have radically rethought current practices in product and process development and manufacturing, and pursued a new multipronged strategy that centres around a predictive model-based approach that employs process analytical technology (PAT) and Quality by Design (QbD) concepts.1 These concepts have largely been captured in the ICH quality guidelines for QbD development (Q8), risk-based development and review (Q9) and integrated quality systems (Q10 step 2).2 The goal is to use the best science, whether it be new or prior knowledge, and to be guided by continual risk analysis and assessment to design and test products, and develop an optimized, flexible control strategy that can robustly produce products with the desired quality attributes. However, this new paradigm poses major modelling and informatics challenges that have not been fully recognized and appreciated by the majority of the pharmaceutical community.

Decision making in product development and manufacturing involves:

  • Integration of process modelling tools.
  • Effective use of laboratory-generated information.
  • Use of knowledge from the scientific literature.
  • Development of technical specifications.
  • Information-knowledge base to satisfy regulatory requirements.

The amount and complexity of the different types of information, ranging from raw experimental data and laboratory reports to sophisticated mathematical models that need to be stored, accessed, validated, manipulated, managed and used for decision making, is overwhelming. Voluminous information is generated in the form of:

  • Raw data from analytical instruments.
  • Pictures from scanning electron microscopes.
  • Experimental setups.
  • Experiment notes and reports.
  • Various calculations from simulation tools.
  • Chemometric models.

The information could also be in different formats, such as plain text files, Word documents, Excel worksheets, JPEG files, MPEG movies, PDFs, mathematical models and so on. Trying to absorb all this information can be likened to drinking from a fire hose.

Lightening the load

A systematic and integrated informatics framework based on explicit models of information is needed,3 as well as tools that support the rapid extraction of first-principles (i.e., mechanism-based) knowledge from raw data gathered from PAT-like techniques. These information models must be easily accessible by staff and software tools, and should provide a common understanding for information sharing. Only with such a framework can intelligent, model-based decision support systems be developed to assist real-time decision making for formulation design, scale-up, control, optimization and operations. Thus, getting to the promised land of ICH Q8–Q10 (and thriving there) requires significant investments in pharmaceutical informatics and modelling environments.


Previously, there have been several automation attempts to address various aspects of information management (IM) and decision making, such as expert systems,4–9 laboratory information management systems,10,11 electronic laboratory notebooks12 and content management systems.13 These methods all address different slices of the overall problem, leading to stand-alone systems with limited capabilities that pose problems for integration. Furthermore, little work has been done on supporting the development of mathematical models, which is central to QbD and continual improvement. As the appropriate theoretical frameworks and practical technologies become available, it is time to address the problem in its entirety by developing a comprehensive approach to pharmaceutical informatics.

Informatics: not just IT or IM

The current industrial response to all these challenges seems suboptimal. Even with rapid progress in information integration and sharing in business functions (enterprise resource planning systems), as well as on the plant floor (manufacturing execution systems), the area of process and product development has been largely neglected. There exist many individual islands of automation, but there is no comprehensive, model-based decision support environment that links these islands.

Instead, practitioners must make do with limited computer-based assistance and a great deal of human intervention to acquire, manage, analyse and interpret the complex product and processing information. This impedes the transparent transfer of information and knowledge upstream and downstream for timely decision making, such as capacity planning or ordering APIs for clinical supplies, and increases inefficiencies, uncertainties, costs, delays and product quality concerns throughout development and manufacturing. It also hampers interaction between process development and business or manufacturing functions, which reinforces the very silos that computerized systems are supposed to overcome.

It is important not to underestimate pharmaceutical informatics as just programming, or to confuse it with information technology (IT). IT in the pharmaceutical industry refers to the support services provided by an IT department (e.g., maintaining hardware, software, networks and cybersecurity). The looming challenges cannot be addressed by kicking the ball over the fence to the IT department and hoping that the IT personnel will somehow take care of it. They cannot because they lack the domain knowledge. It is the same as assuming that transport phenomena problems in chemical engineering that involve equations can be solved by mathematicians. They can provide us with generic concepts, tools and techniques, but it is up to us to suitably adapt these and blend them with our domain knowledge to solve the problem. We need to foster a similar attitude and approach towards developing models of domain knowledge in pharmaceutical engineering in conjunction with IT tools — this is beyond the capabilities of traditional IT.

Another area of potential confusion is IM, an important component of informatics. Aspects of informatics that involve analysing and using the managed information together with predictive models are beyond the role of IM as it is currently practised. For example, interfacing with sophisticated models that are both used for and refined by the monitoring of process data demands a level of complexity that IM alone does not provide. We need to ensure that there is transparency between IM systems and the many other systems that make up the informatics suite of resources.

This is an all too familiar gap that often goes unrecognized until an IM system is acquired, after which patches must be purchased or generated, with all of the associated problems. As the use of prior knowledge gains importance in both designing and justifying decisions for our products and processes, the retrieval and reconciliation of our information may become the most contentious problem we face in the QbD reality. IM cannot address the issue alone.

Ontology-based information and knowledge modelling

All these challenges provide us with great opportunities for making groundbreaking contributions and forming new business ventures in high-end modelling and informatics products/services. One such contribution is ontological informatics.

Information can be divided into two types: unstructured and structured. Information that can only be processed by humans, such as experimental results reported in Microsoft Word documents, is categorized as unstructured information. This information cannot be used directly by software tools for information processing or drawing inferences. For it to become machine processable, it has to be in a syntax that is semantically rich and understandable by machines and humans. This is structured information. Examples are metadata for files, such as the predefined set of terms to describe the title, subject, author and other information, and data generated from instruments, which are typically in tabular form with a specified meaning for each column. Formal information models are the foundation of structured information.
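The difference can be illustrated with a minimal Python sketch. The field names, sample values and column definitions are hypothetical and chosen only for illustration; they do not come from any particular instrument or system. The point is that once each term and column has a declared meaning, software can check and compute on the information directly.

```python
import csv
import io

# Hypothetical metadata record: a predefined set of terms describing a file.
metadata = {
    "title": "Particle size analysis, batch A",
    "subject": "particle size distribution",
    "author": "J. Smith",
}

# Hypothetical instrument output: tabular data in which every column has a
# specified meaning, which is what makes it machine processable.
column_meaning = {
    "time_s": "elapsed time in seconds",
    "d50_um": "median particle diameter in micrometres",
}
raw_table = "time_s,d50_um\n0,120.0\n60,95.5\n120,80.5\n"

# Parse the table into rows of numbers keyed by column name.
rows = [
    {name: float(value) for name, value in row.items()}
    for row in csv.DictReader(io.StringIO(raw_table))
]

# Every column in the data must have a declared meaning.
assert set(rows[0]) == set(column_meaning)

# Because the structure is explicit, software can compute on it directly.
mean_d50 = sum(r["d50_um"] for r in rows) / len(rows)
print(f"{metadata['title']}: mean d50 = {mean_d50:.1f} um")
```

The same numbers pasted into a free-text report would carry no such declared meaning, and a person would have to interpret them before any tool could use them.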

Figure 1

In our information-centric approach,3 information is modelled using an ontology (Figure 1), a formal and explicit specification of a shared abstract model of a phenomenon through identification of its relevant concepts.14 An ontology defines and semantically describes data, information and models. It is also the basis for modelling different forms of knowledge.

Figure 2

We have focused on two different types of knowledge: mathematical knowledge, which is concise, precise and abstract, and knowledge to guide decision making, which includes decision trees and heuristics that can be modelled as guidelines.

To illustrate these concepts, consider defining an ontology for pharmaceutical materials. Figure 2 provides a simple example of a material ontology through the description of a material, which has several properties and property values that are determined from experiments. By explicitly capturing the property and where it sits in relation to other properties, materials and uses, one can immediately see the potential contribution of the property and its value to any other part of the network. Compared with a database schema that targets physical data independence and an XML schema that targets document structure, an ontology targets agreed-upon semantics of the information, and directly describes the concepts and their relations. The Web Ontology Language (OWL) is used in this work to create ontologies.
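The idea of a material linked to properties, values and experiments can be sketched with plain subject-predicate-object triples in Python. This is only an illustration of the concept and not the OWL ontology used in this work: the relation names (hasProperty, hasValue, determinedBy), the identifiers and the numeric value are all hypothetical.

```python
from collections import defaultdict

# Hypothetical triples: (subject, predicate, object).
triples = [
    ("material_X", "is_a", "Material"),
    ("material_X", "hasProperty", "bulk_density_of_X"),
    ("bulk_density_of_X", "is_a", "Property"),
    ("bulk_density_of_X", "hasValue", "value_001"),
    ("value_001", "is_a", "PropertyValue"),
    ("value_001", "numericValue", 0.45),
    ("value_001", "unit", "g/cm3"),
    ("value_001", "determinedBy", "experiment_17"),
    ("experiment_17", "is_a", "Experiment"),
]

# Index the triples so that relations can be followed quickly.
index = defaultdict(list)
for s, p, o in triples:
    index[(s, p)].append(o)


def objects(subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return index.get((subject, predicate), [])


def property_values(material):
    """Follow material -> property -> value -> experiment links."""
    results = []
    for prop in objects(material, "hasProperty"):
        for val in objects(prop, "hasValue"):
            results.append({
                "property": prop,
                "value": objects(val, "numericValue")[0],
                "unit": objects(val, "unit")[0],
                "experiment": objects(val, "determinedBy")[0],
            })
    return results


for record in property_values("material_X"):
    print(record)
```

Because the relations are explicit, the same traversal works for any material, property or experiment added later, without changing the code.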

On the go...

Based on these concepts, we have developed the Purdue Ontology for Pharmaceutical Engineering (POPE) informatics infrastructure to support key decisions that span the entire process, including product portfolio selection, capacity allocation, pilot plant operation, drug product formulation, process simulation, production planning and scheduling, process safety analysis, and supply chain management for API and drug products.3,15 Figure 3 shows the overall organization of POPE.

Figure 3

Mathematical knowledge modelling

A large amount of knowledge used in pharmaceutical product development and manufacturing is in the form of mathematical equations, which we refer to as mathematical knowledge. Compared with other forms of knowledge, such as rules and guidelines, mathematical knowledge is more abstract and highly structured.2

Most mathematical knowledge is either embedded in specific software tools, such as unit operation models in simulation software, or entered into a mathematical tool, such as MATLAB or Mathematica, following a specific syntax. Users must therefore be familiar with the syntax of the solver being used. Moreover, much of this knowledge concerns specific applications and is expressed procedurally rather than declaratively. As a result, it is difficult to use knowledge that is so embedded for automated decision support. This may seem like a minor inconvenience, but information or knowledge that is shared across functions must be presented in a form that will be understood by the programme or person using it. Otherwise, at best, it will take significant effort to incorporate it into the decision making process and, at worst, the information will not be used at all. Such communication issues are often major roadblocks to achieving the ICH promise.

Thankfully, recent progress in information technologies is transforming how mathematical knowledge is modelled, communicated and applied. Mathematical knowledge management, a new interdisciplinary field of research, has attracted researchers from mathematics, computer science, library science and scientific publishing.16 Marchiori provides a general account of how technologies such as XML, RDF and OWL can foster the integration of mathematical representation and the semantic web.17 As a result, it is becoming possible to integrate various mathematical sources, search globally for existing models, associate metadata as context and integrate these with other forms of knowledge.

By exploiting these developments in our approach, the declarative and procedural parts of the mathematical knowledge are separated. The declarative part consists of the information required by the model to run, the information generated from the model and the model equations. The variables used in the model equations are formally defined. A model ontology formalizes information that is related to a model, including input, output, assumptions and equation sets. The model ontology allows representation of mathematical equations in a form that is independent of the solver (e.g., MATLAB, Mathematica and Excel). The solution of the model equation is governed by the context in which the model is used. A specific model is created as an instance of the model ontology. Mathematical Markup Language (MathML), which is based on XML, is used to describe mathematical equations.
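A minimal sketch of such a declarative model description is given below in Python. It is not the model ontology itself: the class name ModelInstance, its fields and the example equation (density equals mass divided by volume) are assumptions made purely for illustration. The equation is written in content MathML, so nothing in the description depends on a particular solver.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

MATHML_NS = "{http://www.w3.org/1998/Math/MathML}"


@dataclass
class ModelInstance:
    """Hypothetical stand-in for one instance of a model ontology."""
    name: str
    inputs: dict        # variable name -> description
    outputs: dict       # variable name -> description
    assumptions: list
    equations: list = field(default_factory=list)  # content MathML strings

    def identifiers(self):
        """Collect every variable name (<ci>) used in the equations."""
        found = set()
        for equation in self.equations:
            root = ET.fromstring(equation)
            for ci in root.iter(MATHML_NS + "ci"):
                found.add(ci.text.strip())
        return found

    def undeclared(self):
        """Variables used in equations but not declared as input or output."""
        return self.identifiers() - set(self.inputs) - set(self.outputs)


# Illustrative equation only: rho = m / V in content MathML.
DENSITY_EQUATION = """
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply><eq/>
    <ci>rho</ci>
    <apply><divide/><ci>m</ci><ci>V</ci></apply>
  </apply>
</math>
"""

density_model = ModelInstance(
    name="bulk density (illustrative)",
    inputs={"m": "mass of the sample", "V": "volume of the sample"},
    outputs={"rho": "density of the sample"},
    assumptions=["sample is homogeneous"],
    equations=[DENSITY_EQUATION],
)

print(sorted(density_model.identifiers()))
print(density_model.undeclared())
```

Checking that every variable in the equations is formally declared is one simple benefit of keeping the description declarative: the check needs no solver at all.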

The procedural part consists of solving the model equations. Mathematica is used as a general purpose solver. It has several features that can be used directly in the proposed approach:

  • Symbolic processing capability that handles equations in MathML formats without translating into procedures.
  • Extensibility with programming languages, such as Java, that facilitate communication between the Mathematica kernel and the engine.
  • Web enabled, which provides an environment to access the functionalities of the mathematical models through a web browser.
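To show what handling a MathML equation without hand-translating it into a procedure can look like, here is a deliberately tiny Python evaluator for a small subset of content MathML. It is a stand-in for illustration only and says nothing about how Mathematica works internally; the function names, the supported operator subset and the example equation are assumptions.

```python
import math
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/1998/Math/MathML}"

# Supported subset of content MathML operators (an illustrative choice).
OPERATORS = {
    "plus": lambda args: sum(args),
    "times": lambda args: math.prod(args),
    "minus": lambda args: -args[0] if len(args) == 1 else args[0] - args[1],
    "divide": lambda args: args[0] / args[1],
    "power": lambda args: args[0] ** args[1],
}


def evaluate(node, values):
    """Recursively evaluate a content MathML expression tree."""
    tag = node.tag.replace(NS, "")
    if tag == "cn":                      # numeric constant
        return float(node.text)
    if tag == "ci":                      # variable: look up its value
        return values[node.text.strip()]
    if tag == "apply":                   # operator applied to operands
        operator, *operands = list(node)
        name = operator.tag.replace(NS, "")
        return OPERATORS[name]([evaluate(child, values) for child in operands])
    raise ValueError(f"unsupported MathML element: {tag}")


def solve_explicit(mathml, values):
    """Evaluate an equation of the form: variable = expression."""
    root = ET.fromstring(mathml)
    equation = root.find(NS + "apply")
    eq_op, lhs, rhs = list(equation)
    if eq_op.tag != NS + "eq" or lhs.tag != NS + "ci":
        raise ValueError("expected an explicit equation: variable = expression")
    return lhs.text.strip(), evaluate(rhs, values)


# Illustrative equation only: rho = m / V.
DENSITY_EQUATION = """
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply><eq/>
    <ci>rho</ci>
    <apply><divide/><ci>m</ci><ci>V</ci></apply>
  </apply>
</math>
"""

name, value = solve_explicit(DENSITY_EQUATION, {"m": 10.0, "V": 4.0})
print(name, "=", value)
```

The equation text is never rewritten as code; only the variable values change between runs. A general purpose solver would of course also handle implicit, differential and symbolic problems, which this sketch does not attempt.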

Each model is an instance of the model class of the model ontology (Figure 4).

Figure 4

This approach provides a systematic method for model creators to describe the models in terms of equations with the help of the intuitive and visual equation editor, and the variables that are described using the ontology and linked to the information resources. The MathML description of the equations and ontologies provides an open and solid foundation for the reasoning engine to understand the equations and variables, along with the links to access the values of the variables during execution of the model.

Ironically, mathematical knowledge management, which many view as the most forbidding part of the development landscape, may be the first piece integrated into the overall knowledge management goal of ICH Q10, and may serve as a template for tackling other required elements. Other sources of knowledge will be treated in a conceptually analogous manner. The details will, of course, differ, but the ontological approach provides the common framework.


Product development and manufacturing is a complex, information-intensive decision making process involving a staggering amount of different types of information. We believe the pharmaceutical industry has not fully recognized the modelling and informatics challenges we face as we embrace the new QbD framework based on predictive models. The new era requires a fresh approach and significant investments in model-based informatics environments.

We are trying to address this by developing a novel informatics framework, centred on ontologies, by exploiting recent progress in information and knowledge management methodologies and tools. This ontological informatics infrastructure is the dawn of a new paradigm for representing, analysing, interpreting, managing and using large amounts of complex and varied data, information and models. Considerable intellectual and implementation challenges lie ahead, but the potential rewards will completely transform how we perform pharmaceutical product development and manufacturing in the future.


1. FDA — Pharmaceutical Quality for the 21st Century: A Risk-Based Approach, Progress Report, May 2007. www.fda.gov

2. International Conference on Harmonisation, Guidelines reports, 2007. www.ich.org

3. C. Zhao et al., Journal of Pharmaceutical Innovation, 1(1), 25–35 (2006).

4. R.C. Rowe and R.J. Roberts, "Expert Systems in Pharmaceutical Product Development" in J. Swarbrick, Ed., Encyclopedia of Pharmaceutical Technology 2nd Edition (Marcel Dekker, New York, NY, USA, 2002) pp 1188–1210.

5. K.V. Ramani, M.R. Patel and S.K. Patel, Interfaces, 22(2), 101–108 (1992).

6. F. Podczeck, Proceedings of the 11th Pharmaceutical Technology Conference, 1, 240–264 (Manchester, UK, 1992).

7. B. Skingle, Proceedings of the 10th International Workshop on Expert Systems and Their Applications, 907–922 (Avignon, France, 1990).

8. M. Wood, Lab. Equip. Digest., 17–19 (1991).

9. R.C. Rowe, Manufacturing Chemist, 67(10), 21–23 (1996).

10. C. Paszko and C. Pugsley, Am. Lab., 9, 38–42 (2000).

11. Z. Grauer, Am. Lab., 9, 15–18 (2003).

12. M. Zall, Chem. Innovat., 31(2), 15–21 (2001).

13. M. Noga and F. Kruper, Lecture Notes in Computer Science, 2487, 252–267 (2002).

14. T.R. Gruber, Knowl. Acquis., 5(2), 199–220 (1993).

15. V. Venkatasubramanian et al., Computers and Chemical Engineering, CPC 7 Special Issue, 30(10–12), 1482–1496 (2006).

16. W.M. Farmer, ACM SIGSAM Bulletin, 38(2), 47–52 (2004).

17. M. Marchiori, Lecture Notes in Computer Science, 2594, 216–224 (2003).

Venkat Venkatasubramanian is Professor of Chemical Engineering at Purdue University (IN, USA).

Kenneth R. Morris is Professor of Pharmaceutics at University of Hawaii (USA).