Dealing with Data: How to Keep It Manageable

Published in Pharmaceutical Technology's In the Lab eNewsletter, September 2023, Volume 18, Issue 9

Advanced analytical tools generate more data in today’s labs than ever before.

In today’s biologics/contract research organization (CRO) labs, more data than ever are being generated on protein characterization, thanks to advanced and more thorough analytical technologies. Managing these volumes of data, especially in the clinical phases of drug development, has become a real challenge for researchers.

Clinical trial data are at the heart of how biopharma companies develop new therapeutics for commercial launch, but developing a new drug and bringing it to market can take a decade or more and cost upwards of a billion dollars (1). Outsourcing clinical trial services has become common practice because it saves biopharma companies time and money, but it introduces risks in both data transfer and data management.

Laboratory issues can delay data receipt and lower data quality, which can lead to rejection or rework. Data entry issues, slow responses to data queries, errors during data transfer and merging, and discrepancies caused by poor database setup can also cause delays (1).

A discussion with David Hardy, PhD, senior manager, Data Analytics and AI Enablement, Digital Science Solutions, Thermo Fisher Scientific, explores best practices for approaching data management in today’s biopharmaceutical lab and highlights some effective strategies for drawing meaningful interpretations from these data.

Handling the data flow

BioPharm: Although solutions such as laboratory information management systems (LIMS) and electronic lab notebooks (ELNs) have helped to address the vast amounts of data being generated in biologics/CRO labs, what are the biggest challenges facing LIMS and ELNs?

Hardy (Thermo Fisher Scientific): The biggest challenge when it comes to LIMS and ELNs is ensuring that these systems fit seamlessly into laboratory processes so that they are used throughout an organization; only then can they be used to unlock deeper data-driven insights. Usability is also a challenge for many LIMS and ELN solutions, yet it is critical for successful, company-wide adoption.

Another important challenge is that, while LIMS and ELNs can help ensure data quality after data capture, they can do little to correct errors introduced by poor-quality input data. To help improve data quality at the point of data generation, labs should seek to use automation wherever possible.

BioPharm: Do automation and digitalization truly offer solutions that allow scientists to analyze these data in a meaningful way, or are there as many cons as there are pros to using such advanced technologies for the data scientist?

Hardy (Thermo Fisher Scientific): The pros and cons of using automation and digitalization tools to analyze large amounts of data are the same across all applications, including biologics. For example, biologics labs still need to conduct initial validation of their tools and constantly check them, requiring close collaboration between the laboratory and data scientists. However, over time, the pros of advanced digitalization and automation tools—greater accuracy, reproducibility, efficiency, and deeper insight—vastly outweigh any initial cons.

Importantly, even with advanced technologies, scientists face the hurdle of understanding their data and knowing which data fields to use. Then, they need to be clear on which model best suits the problem they’re trying to solve. This isn’t always easy, but analytics solutions are striving to make it simpler.

Employing good habits


BioPharm: How has the user experience (i.e., that of the scientists) been affected by the amount and quality of data being generated and by the ability to manage those data?

Hardy (Thermo Fisher Scientific): By their nature, scientists are data savvy. However, with the sheer amount of data that labs generate, as well as the need to manage these data with advanced technologies, scientists have had to become more tech savvy. For example, most scientists have the knowledge and skills to check and analyze their data using programming languages such as R and Python and tools such as KNIME [Konstanz Information Miner]. Given the growing importance of these skills, they are now widely taught to undergraduate students.

Scientists have also embraced ‘big data,’ adopting new technologies to such an extent that big data isn’t the obstacle it once was. Given the increasing importance of technology in handling and processing such data, scientists now commonly work alongside their data science, IT [information technology], and TechOps [technical operations] colleagues to find the optimal, most cost-effective solution, a level of collaboration that was less common in the past.

Beyond data capture, storage, and analysis, labs increasingly need to be able to use their data to their full potential. Accordingly, we’ve seen scientists increasingly focus on the FAIR (findable, accessible, interoperable, and reusable) data principles. Data [are] only truly useful if [they meet] these requirements.
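To make the FAIR idea slightly more concrete, the sketch below checks whether a dataset's metadata record carries a few fields that loosely support each FAIR letter. The field names and the mapping of each field to a principle are hypothetical choices made for illustration only; they are not part of the FAIR principles themselves or of any LIMS or ELN product.

```python
# Hypothetical metadata fields, loosely grouped by the FAIR letter they support.
# These names are illustrative assumptions, not a standard schema.
FAIR_FIELDS = {
    "identifier": "Findable",        # a unique, persistent ID for the dataset
    "title": "Findable",             # descriptive metadata for search
    "access_url": "Accessible",      # where and how the data can be retrieved
    "format": "Interoperable",       # a shared, documented data format
    "license": "Reusable",           # clear terms of reuse
    "provenance": "Reusable",        # where the data came from and how
}


def missing_fair_fields(record: dict) -> list:
    """Return the hypothetical FAIR fields that are absent or empty in `record`."""
    return [name for name in FAIR_FIELDS if not record.get(name)]


def missing_by_principle(record: dict) -> dict:
    """Group the missing fields by the FAIR letter they (loosely) map to."""
    grouped = {}
    for name in missing_fair_fields(record):
        grouped.setdefault(FAIR_FIELDS[name], []).append(name)
    return grouped
```

A record such as `{"identifier": "urn:example:dataset-001", "title": "Peptide map, lot A"}` would be reported as missing `access_url`, `format`, `license`, and `provenance`. A real implementation would follow whatever metadata schema the organization has adopted.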

BioPharm: What would you say are some key best-practice habits or techniques that scientists can employ to manage these data?

Hardy (Thermo Fisher Scientific): Scientists can—and should—do several things to maximize their ability to manage these data. First, they should embrace automation, as it’s a scientist’s best friend. More automation frees up scientists’ time for more important, value-adding activities.

Scientists should also keep in mind that managing all of these data is a team sport. Data Science, IT, and TechOps teams are there to help, so scientists should leverage their expertise and support to maximize their chances of overcoming data challenges.

Finally, scientists should keep up to date on the latest technologies for managing and analyzing data—the field is advancing very rapidly, and taking your eye off the ball, even briefly, can lead to missed opportunities.

Finding an effective strategy

BioPharm: What effective strategies, if any, can scientists use to ensure that they are processing quality lab data and not just noise?

Hardy (Thermo Fisher Scientific): As the saying goes, ‘garbage in, garbage out.’ In other words, good-quality data start with good input—good experimental design followed by good execution. Scientists can achieve these in several ways.

For example, they can adopt electronic tools such as laboratory execution systems (LES), which help by ensuring scientists follow methods and procedures correctly. LES and LIMS can also flag anomalous result values that fall outside a given specification range, helping to ensure data quality.
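The specification-range check described above is simple enough to sketch. The Python below is a minimal illustration, assuming inclusive lower and upper limits per test. The test names and limits in the usage note are hypothetical, and the code does not represent the logic of any particular LES or LIMS product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecRange:
    """Hypothetical specification limits for one test (inclusive bounds)."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


def flag_out_of_spec(results: dict, specs: dict) -> list:
    """Return the names of results that fall outside their specification range.

    Results with no defined specification are reported separately by
    `unspecified_results`, so they are not silently treated as passing.
    """
    return [
        test for test, value in results.items()
        if test in specs and not specs[test].contains(value)
    ]


def unspecified_results(results: dict, specs: dict) -> list:
    """Return the names of results that have no specification to check against."""
    return [test for test in results if test not in specs]
```

For example, with the hypothetical limits `{"purity_pct": SpecRange(95.0, 100.0), "ph": SpecRange(6.8, 7.4)}`, a result set of `{"purity_pct": 93.2, "ph": 7.1}` would flag only `purity_pct`. Keeping unspecified results separate is a design choice: a missing specification is itself a data-quality question that a reviewer may want to see.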

Scientists should also look to adopt artificial intelligence (AI) technologies, as they can greatly help in improving data quality. But this doesn’t have to be an overwhelming task. It can help to start with a small project and build from there. For example, a lab might start by using an AI-based approach to identify systematic errors, before gradually expanding its remit and functionality. It’s important to engage the data science team early and draw on their experience as you embark on this new and exciting journey.
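As an example of the kind of small starting project Hardy describes, the sketch below flags a possible systematic shift in a series of control measurements. It is deliberately not AI: it is a simple rule-based baseline, similar in spirit to the Westgard 10x rule (a run of consecutive control values on the same side of the target). A lab might use a check like this as a reference point before trying learned models. The run length of 10 and the example values are assumptions for illustration.

```python
from typing import Optional, Sequence


def find_systematic_shift(
    values: Sequence[float], target: float, run_length: int = 10
) -> Optional[int]:
    """Return the index at which `run_length` consecutive values on the same
    side of `target` are first observed, or None if no such run occurs.

    A value exactly equal to the target breaks any current run. This is a
    simple rule-based check, not a machine-learning model.
    """
    run = 0
    side = 0  # +1 = above target, -1 = below target, 0 = no active run
    for i, value in enumerate(values):
        current = (value > target) - (value < target)
        if current != 0 and current == side:
            run += 1  # run continues on the same side of the target
        else:
            side = current
            run = 1 if current != 0 else 0  # start a new run, or reset on a tie
        if run >= run_length:
            return i
    return None
```

In a control series where every value from the third onward sits slightly above the target, the function returns the index of the tenth such value. Random scatter around the target, by contrast, returns `None`. A systematic error model built later with the data science team could be compared against this baseline.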


1. Johnson, J.; Kanagali, V.; Prabu, D. Third Party Laboratory Data Management: Perspective with Respect to Clinical Data Management. Perspect Clin Res. 2014, 5 (1), 41–44. DOI: 10.4103/2229-3485.124573