Increased Data Output is a Double-Edged Sword in Drug Discovery and Manufacturing

Published in: Pharmaceutical Technology, February 2022 Issue, Volume 46, Issue 2
Pages: 36–39, 47

Automated analytical workflows require advanced technologies to manage the higher volume of data output.

Advancements in analytics have significantly increased the amount of data that is generated throughout the biologic drug lifecycle. This increased data flow presents challenges to managing the information. Managing the increased output is best achieved with the help of automated analytical workflows, which can organize data collection and streamline analyses.

Incorporating automation into the workflow

The biopharma industry has seen a significant increase over the past few years in the complexity and variety of biomolecular therapies. Manufacturers have often found themselves facing situations in which existing platforms and prior knowledge no longer apply to newer molecular formats, says Xiao Dong, marketing manager, Biopharmaceutical Business, Waters Corporation. As a result, larger-scale analytical testing is needed, especially during earlier-stage development, to minimize the risks associated with manufacturability and quality, Dong states.

“Automated analytical workflows allow manufacturers to address the gap between existing analytical capability and increased testing needs by reducing resource costs and improving turnaround time. In addition, automation improves data quality and reduces FTE [full-time employee] resource needs, leading to cost benefits as well as reduced compliance risks,” says Dong.

The drug discovery and development process involves a series of complex stages; bringing a drug to market takes 12–15 years and costs between $0.9 billion and $2 billion, with a low success rate, says Anis H. Khimani, PhD, senior strategy leader, Pharma Development, Life Science, PerkinElmer. “Workflows within every stage contribute to large volumes of data that require management and interpretation, for both small- and large-molecule drug development,” Khimani says.

Over the past two decades, the discovery and development of large-molecule therapeutics have seen significant progress in the areas of target characterization, screening, process optimization, automation, and analytics, Khimani explains. Implementation of automation and analytics across biologics workflows has played a key role in this success; together, they have enhanced productivity and are helping to overcome big data management challenges.

“Automation and data management deliver accuracy and reproducibility as well as enhance data quality and biological relevance. Furthermore, more recent advancements in artificial intelligence (AI) and machine learning have enabled predictability and process design,” Khimani points out.

In the drug target identification (ID) and drug discovery space, next-generation sequencing (NGS) has revolutionized target discovery across the genome, exome, and transcriptome paradigm, while bioinformatics capabilities have enabled panel design for specific disease areas and the screening of disorders at the population genomics level, Khimani emphasizes.

Moreover, automation solutions in liquid handling and nucleic acid sample preparation continue to empower scientists in achieving higher throughput, accuracy, and reproducibility along the target ID workflows, Khimani adds.

“Drug discovery involving large-molecule screening has benefited significantly from automation and leveraging analytics. Integrating a workflow through interface and protocol configurations enables each step involving related instrument(s), software, and consumables to offer a seamless process, data generation, data interpretation, and biological evaluation,” he states.

The implementation of automation workflows and AI tools has accelerated protein characterization and process development by enabling critical quality attributes (CQAs) and streamlining integration. According to Khimani, capillary electrophoresis (CE) platforms with higher throughput, for example, have enabled lower sample requirements, superior resolution of isolated/separated proteins, and reliable characterization.

For process development and scale up, it is important to remember that CQAs for biologics involve cell culture and production process parameters as well as storage conditions to ensure stability along the workflow. “In this context, an important data analytic to monitor is long-term stability on three production batches. Temperature mapping services (e.g., OneSource, PerkinElmer) can help biopharma operators monitor variations and maintain consistency supporting good laboratory practice (GLP)/good manufacturing practice (GMP) compliance,” according to Khimani.

Meanwhile, fully automated reports and advanced techniques, such as wireless temperature probes, reduce human error, while software solutions support data visualization of stability studies. Other quality control measures, such as impurities testing, can be achieved using analytical tools such as chromatography and atomic and molecular spectroscopy, Khimani says.

Managing increased data output

Automation technologies are also key in helping to collect and organize increased data output, which facilitates data analyses. The integration and automation of workflows result in enhanced data output at every step, from configuration of integration through metadata management and eventually data analysis, visualization, and interpretation, Khimani points out. Over the past decade, the adoption of electronic lab notebooks (ELNs) has facilitated the migration of data-related content from paper notebooks to electronic formats. These electronic formats manage unstructured and structured data and provide for subsequent analysis and visualization.

“The clustering and visualization of vast volumes of data enable scientists to gain biological insights into target characterization and validation. There has been increasing awareness that identifying and validating the right target upstream is critical to the screening and development of drug candidates downstream to further enhance the success rate of therapeutic molecules,” Khimani explains.

Advanced analytics technologies (e.g., TIBCO Spotfire, PerkinElmer) offer data analysis and visualization capabilities for this workflow; these capabilities play a critical role in decision-making. Additionally, ELN technologies (e.g., Signals Notebook, part of the new PerkinElmer Signals Research Suite) can enable audit trail and security features to support 21 Code of Federal Regulations (CFR) Part 11 (1) compliance within the biologics GMP environment. “Huge volumes of data either require database capabilities or a cloud environment that can be maintained on site or outsourced via third-party informatics solutions. Implementation of Biopharma 4.0 is being adopted and driven by the principles of Industry 4.0 to improve efficiency, facilitate real-time monitoring, reduce cost, enhance predictability, and allow for better decision-making within bioprocess workflows,” emphasizes Khimani.

Some of these advanced automated technologies are leveraged both upstream in drug discovery and development and downstream in bioprocess development and manufacturing; however, the requirements differ, Khimani points out. He explains that discovery workflows upstream seek flexibility and an open format to organize and develop protocols followed by management of structured data. “Research scientists may need to vary experimental conditions to optimize protocols into SOPs [standard operating procedures] as well as analyze a wide array of data for biological relevance,” he says.

An ELN (e.g., Signals Notebook, PerkinElmer), for instance, provides the latitude to manage metadata and analyze various groups of experimental data into visualization formats to enable decision-making. Biopharma research groups leverage the capabilities of ELNs to optimize workflows to enable discovery and validation of disease-specific targets as well as the development of large-molecule therapeutics.


Meanwhile, further downstream in drug development, these advanced technologies also facilitate validation and security requirements within the GLP and GMP environments. ELNs (e.g., Signals Notebook, PerkinElmer) can offer security-driven administration and audit capabilities. In the manufacturing environment, scale up for higher production volumes and GMP compliance are key. “Therefore, downstream it is important for analytics technologies to support automation, as well as provide electronic compliance supporting capabilities for biopharma to be able to meet regulatory requirements. Bioprocess workflows have continued to migrate from a batch process to continuous manufacturing; hence, there is a constant demand in this space for technologies to offer real-time monitoring and accuracy to maintain production that is devoid of errors and meets scale, sterility, and safety requirements,” Khimani says.

Gunnar Schaefer, co-founder and chief technology officer of Flywheel, a research data management platform, notes the importance of having end-to-end solutions to streamline data ingested from multiple sources and curate it to common standards. Focusing on the facilitation of large-scale organization, processing, and analyses of complex data, such as annotated biomedical images frequently collected as part of clinical trials, can enable real-world data collaboration, machine learning, and AI development, all of which can accelerate drug discovery, Schaefer says.

Historically, for instance, most data assets collected in clinical trials were archived and largely forgotten, often stored with partners offsite, Schaefer observes. “With the broad adoption of machine learning, these data assets have gained a second life. However, many data archives are offline or otherwise inaccessible, poorly organized, and seldom quality-controlled,” Schaefer states.

Addressing those problems head on goes a long way toward optimizing analytical workflows. Doing so starts with flexible and scalable data ingestion, and it also means establishing a de-identification framework and providing customizable, multi-modal data processing automation and structured annotation options. “In concert, these capabilities accelerate [the] path to meaningful data at scale,” Schaefer explains.
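As a rough illustration of what one step in a de-identification framework might look like, the Python sketch below drops direct identifiers from an ingested record and replaces the subject ID with a keyed pseudonym. The field names, the `deidentify` function, and the use of a keyed hash are assumptions made for this example; they are not drawn from Flywheel's platform or from any specific regulatory standard.

```python
import hashlib
import hmac

# Hypothetical field names; a real framework would follow the study's own data dictionary.
DIRECT_IDENTIFIERS = {"patient_name", "date_of_birth", "address"}


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record without direct identifiers and with a pseudonymous subject ID."""
    # Drop fields that identify the subject directly.
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # A keyed hash maps the same subject to the same pseudonym on every ingestion,
    # so records can still be linked without exposing the original ID.
    digest = hmac.new(
        secret_key, str(record["subject_id"]).encode("utf-8"), hashlib.sha256
    ).hexdigest()
    cleaned["subject_id"] = digest[:16]
    return cleaned
```

Because the pseudonym is stable for a given key, archived and newly ingested records for the same subject can be curated together, which is the kind of reuse Schaefer describes.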

Khimani adds that bottlenecks in data collection and processing exist at every step of the workflow, with greater challenges arising when higher volumes of data are generated from automation and real-time monitoring of the process. He points out that multiple factors contribute to data overload in the process development and scale-up environment, such as the evaluation of process parameters, raw material requirements, and monitoring along the workflow, including product stability, media components, and microbial presence. Additionally, data across the workflow is obtained and sorted manually using accessory tools or devices.

“Further contributing to bottlenecks is the use of complex spreadsheets. [In addition,] locally developed or third-party software may be utilized to collect and process data. These programs can be labor- and time-intensive and may result in errors,” Khimani says.

“Automation across workflows with centralization of informatics tools to manage data can alleviate these bottlenecks,” Khimani continues. “Subsequently, data can be structured, tracked, and processed for visualization, as well as archived for future reference.” Furthermore, adoption of the holistic automation tenets of Industry 4.0 with enhanced process technologies, as well as advanced digital solutions, can alleviate numerous bottlenecks. Having an automation roadmap such as this aims to deliver enhanced robustness and reproducibility of quality metrics and can significantly reduce cost, provide flexibility, and enhance productivity, Khimani concludes.

Facilitating implementation

Automating the entire analytical workflow often requires a manufacturer to integrate solutions from different vendors. This integration often involves a steep learning curve, cautions Dong. Thus, working closely with the different vendors and having a cross-functional team that can work on data and system integration are essential requirements.

“Automated platforms need to be flexible, because the number of samples can vary. The platforms themselves should not be complex or hard to operate as many of these scientists are not automation engineers,” explains Dong. “Easy-to-use platforms with simple, user-friendly informatic interfaces will be critical to successful implementation. Additionally, it is often a good practice to start implementing automation in non-GMP development settings before GMP manufacturing, which has higher compliance needs.”

Furthermore, automated analytical workflows require access to large-scale, standardized, quality-controlled, and expert-labeled datasets, adds Schaefer. At the same time, data privacy and security requirements are becoming ever stricter. A comprehensive data management and sharing framework would allow researchers to focus on new discoveries, rather than solving information technology problems, Schaefer notes.

Although the level of automation and data management across analytical workflows differs between a laboratory setting and a bioprocess environment in terms of the scale and compliance requirements, there are important common themes, says Khimani. These common themes include streamlined workflows, standardization of protocols, and reproducibility.

The main components required to allow the implementation of fully automated analytical workflows, either in the laboratory setting or in a commercial biomanufacturing setting, include workflow integration, informatics capabilities, standardization, quality control programs, and digitization, Khimani states.

Workflow integration includes a customized configuration using informatics capabilities, which translate protocols interfacing from one step—or from an analytical instrument of the workflow—to the next. This configuration is designed to enable walk-away automation with the least manual intervention, according to Khimani.

Khimani also explains that, at an enterprise level, an organization requires informatics solutions, whether cloud-based systems, local databases, or laboratory-level laboratory information management system (LIMS) platforms, for the management of various instrumentation and data communication. Standard software tools to enable data analysis and visualization are useful at the end-user level, Khimani points out. A laboratory informatics footprint allows security-based central monitoring of analytical workflows, data management and archiving, root cause analysis (RCA), and enhanced productivity and cost-savings.

Having reference standards programs across hardware calibration, reagents, and consumables can establish uniformity across workflows. Most importantly, such standards enable reproducibility, reliability, and monitoring of results for errors and deviation. Timely qualification and calibration of instruments and tools are critical to maintaining reliability. The use of appropriate reference standards or controls for assay and diagnostic reagents establishes quality control for the consumables and the entire assay, says Khimani.

Ensuring monitoring as well as identification and control of any deviations for quality control requires a disciplined systems approach that proactively establishes workflow and process design requirements (e.g., quality by design established within drug manufacturing) across laboratories and scale-up environments. Any further deviations can also be managed via RCA. Moreover, in biomanufacturing, RCA can point to whether critical process parameters (CPPs) and CQAs are being met.
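To make the idea of monitoring for deviations concrete, the minimal Python sketch below checks logged process readings against predefined limits and lists any excursions, which could then feed an RCA. The parameter names, the numeric ranges, and the `find_deviations` function are hypothetical placeholders chosen for illustration, not values or methods cited by the sources in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    low: float
    high: float


# Hypothetical parameter names and ranges, for illustration only.
CPP_LIMITS = {
    "temperature_c": Limit(36.5, 37.5),
    "ph": Limit(6.9, 7.3),
}


def find_deviations(readings, limits=CPP_LIMITS):
    """Return (index, parameter, value) for every reading outside its limit."""
    deviations = []
    for i, reading in enumerate(readings):
        for name, limit in limits.items():
            value = reading.get(name)
            if value is None:
                continue  # parameter not recorded at this time point
            if not (limit.low <= value <= limit.high):
                deviations.append((i, name, value))
    return deviations
```

In practice, the limits would come from the process design requirements established up front, so that the same definitions drive both real-time monitoring and any later investigation.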

In addition to informatics capabilities, empowering the laboratory—and the bioprocess ecosystem in particular—with digitization and AI capabilities enables continuous manufacturing, real-time monitoring, process predictability, and superior control of the workflows, emphasizes Khimani. He explains that data captured at every step from this streamlined digitization can be analyzed to develop both real-time and predictable models to ensure CPPs and delivery of CQAs.


Reference

1. CFR Title 21, Part 11 (Government Printing Office, Washington, DC).

About the author

Feliza Mirasol is the science editor for Pharmaceutical Technology.


When referring to this article, please cite it as F. Mirasol, “Increased Data Output is a Double-Edged Sword in Drug Discovery and Manufacturing,” Pharmaceutical Technology 46 (2) 36–39, 47 (2022).