Strategic Management is Growing Increasingly Important for Larger Quantities of Generated Data

Pharmaceutical Technology's In the Lab eNewsletter, November 2021, Volume 16, Issue 11

The increased flow of data from laboratory studies requires a data management strategy to optimize study success.

The use of data has grown increasingly crucial to both the molecular development and manufacturing process development of new biologic drugs. With the growing use of automated platforms that generate vast amounts of data, some in real time, the issue of managing data to arrive at meaningful evaluations is concurrently more challenging.

Automation in the research lab

In the industrial and clinical life sciences laboratory, reliance on manual lab protocols has become impractical and inconvenient. The growing use of automated platforms in the lab can streamline operations as well as lend stronger sterility assurance by minimizing the amount of human intervention in sampling and running analyses. Another meaningful benefit of automation in the lab is the reduction in time needed for drug discovery processes or data generation for product characterization. Automation can improve reproducibility, researcher efficiency, clinical translation, and safety (1).

With the use of automation, data generation happens much more quickly, and the vast amount of data generated requires more sophisticated data management. Integrating data systems through software has increasingly become a requirement. Software systems such as laboratory information management systems (LIMS), electronic laboratory notebooks (ELNs), sample management solutions, and hybrid systems have been made available to coordinate these data (2). Further, a growing number of laboratories now include automated computer-controlled instruments that generate data in various formats (2).

Although solutions exist for managing the breadth of data, they are limited by the fact that data are often in proprietary formats, making it difficult to connect with other systems. While vendors may sell end-to-end solutions, such integrative systems are typically costly to implement in terms of time, investment, and infrastructure (2). Most commercial systems are designed for bigger labs or production laboratories. In academic research labs, by contrast, study parameters are subject to change because of high staff turnover, which can affect how assays are performed and how observational outcomes are reported. Recently, ELNs have been shown to adapt better to a changing environment than LIMS, but they remain an incomplete solution to current data management needs. Particularly in research labs, data still need to be managed and shared even as standardized protocols are being developed (2).

In addition, the use of bioinformatics in defining experimental parameters poses further challenges to data management because data need to be shared prior to concluding study results. What is needed is management of data at both the production-laboratory level and research-laboratory level.

Generating the data

Scientific research has shifted in recent years from being process driven to being data driven, and scientists are now able to infer the results of their studies through data analysis because more data are generated nowadays (3). Managing the data generated from research remains a challenge. “R&D generates enormous amounts of not only operational data, but also high-value scientific data, all stored mostly in isolated databases throughout a vast array of locations and systems,” says Bob Voelkner, vice-president, Sales and Marketing, LabVantage Solutions. “The data, which can be in unstructured and structured formats, are very difficult to manage and maintain,” he confirms.

Voelkner also notes that accessing all the data and combining them into one system can be very complex. “R&D labs are also faced with the struggle of securing distributed system data stores and later finding and using that data,” he states.

The use of automated platforms in laboratories has helped by harmonizing and structuring the data and making them available, Voelkner points out. Making useful data accessible is an important factor in making informed data-driven decisions. Automation alone, however, is not the full solution to lab data management, Voelkner cautions. “There can be potential data integrity issues in accessing the data. An automated platform requires processes and controls around it for a complete solution,” he adds.

Significant challenges to data-driven research remain, however. First, generating data-rich research is time consuming: mundane data processing tasks can take up as much as 80% of a researcher’s time (3). Second, data exchange is inefficient when it relies on manual curation at one end and manual data extraction at the receiver’s end. Such data exchange processes create inefficiencies and open opportunities for transcription errors. Finally, a lack of reliability still exists. For instance, with the increasing use of artificial intelligence and machine learning models as a part of decision making, data errors can impact modeling results, potentially compromising reproducibility. A published report has suggested that 50% of deep learning experiments in pharmaceuticals were not reproducible. Of those, 25% were attributed to data integrity (3).
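As a rough illustration of how an automated hand-off could replace manual curation and re-keying, the Python sketch below attaches a checksum to a batch of results so that the receiver can detect any alteration in transit. It is a minimal sketch under stated assumptions, not a method described by the sources cited here: the function names, record fields, and sample values are all hypothetical.

```python
import hashlib
import json


def package_results(records):
    """Serialize records deterministically and attach a SHA-256 digest."""
    payload = json.dumps(records, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"payload": payload, "sha256": digest}


def unpack_results(package):
    """Recompute the digest on receipt and refuse data that has changed."""
    actual = hashlib.sha256(package["payload"].encode("utf-8")).hexdigest()
    if actual != package["sha256"]:
        raise ValueError("payload does not match its checksum")
    return json.loads(package["payload"])


# Hypothetical assay readings, used only for illustration.
records = [
    {"sample_id": "S-001", "assay": "titer", "value": 1.82, "unit": "g/L"},
    {"sample_id": "S-002", "assay": "titer", "value": 2.05, "unit": "g/L"},
]

package = package_results(records)
received = unpack_results(package)  # raises ValueError if the payload was altered
```

A checksum of this kind only shows that the data arrived unchanged. It does not show that the values were correct when they were recorded, so it complements rather than replaces the processes and controls around an automated platform.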

Strategizing data management

The vast amounts of data that can now be generated via automated lab platforms have made it more vital to create a data management strategy. According to Voelkner, having deep knowledge of how an organization can make the best use of its data, who needs access to those data, and with whom the data need to be shared outside R&D are important considerations. “This will drive which technology and approach is the best solution for a specific organization. Eliminating accessibility issues with siloed data and moving toward a harmonized solution is solved with an analytics solution that can bring value to an organization’s data,” he states.

Not having a well-thought-out data management strategy can become a barrier to lab data workflow, impeding the success of analytical studies. “Knowing where your data is, how your data will be used, and by whom is key to success,” says Voelkner.

Voelkner points out that leading-edge companies have recognized that they have silos of valuable data that are difficult to access and put to meaningful use. He recommends establishing a centralized laboratory informatics solution that can support an organization’s enterprise R&D workflow and data-capture needs as a first step in tackling data management issues. Next, an organization should utilize a powerful analytics solution that can ingest data from multiple sources. This will allow the organization to consolidate its data into a single data lake, to which business intelligence tools can be applied to provide context and visualization. “Additionally, the analytics tool should apply machine learning and artificial intelligence algorithms to predict possible outcomes and make informed decisions,” Voelkner states.
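To make the idea of ingesting data from multiple sources more concrete, the Python sketch below maps two differently shaped exports onto one shared record layout and merges them into a single collection. It is an illustrative sketch only: the export formats, column names, and field names are assumptions made for the example and do not represent any vendor's actual interface.

```python
import csv
import io
import json

# Hypothetical exports; real systems will use different layouts.
LIMS_CSV = "SampleID,Test,Result\nS-001,titer,1.82\nS-002,titer,2.05\n"
ELN_JSON = '[{"sample": "S-001", "assay": "purity", "value": 98.4}]'


def from_lims(text):
    """Map rows of the assumed LIMS CSV export onto the shared layout."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "sample_id": row["SampleID"],
            "assay": row["Test"],
            "value": float(row["Result"]),
            "source": "lims",
        }


def from_eln(text):
    """Map entries of the assumed ELN JSON export onto the shared layout."""
    for entry in json.loads(text):
        yield {
            "sample_id": entry["sample"],
            "assay": entry["assay"],
            "value": float(entry["value"]),
            "source": "eln",
        }


def consolidate(*streams):
    """Merge all sources into one list, ordered by sample and assay."""
    combined = [record for stream in streams for record in stream]
    return sorted(combined, key=lambda r: (r["sample_id"], r["assay"]))


data_lake = consolidate(from_lims(LIMS_CSV), from_eln(ELN_JSON))
for record in data_lake:
    print(record)
```

Keeping a `source` field on every record is one way to preserve traceability once the silos are merged, so that any value can be followed back to the system that produced it.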


References

1. I. Holland and J.A. Davies, Front Bioeng Biotechnol. 8, 571777 (2020).
2. K.A. Hobbie, et al., J Lab Autom. 17 (4) 275–283 (2012).
3. R. Brown, “The Value Of Lab Data Automation To Facilitate Data-Centric Research,” Bio-IT World, June 30, 2021.

About the author

Feliza Mirasol is the science editor for Pharmaceutical Technology.