Shoring Up Data Integrity Practices

Published in: Pharmaceutical Technology, May 2024, Volume 48, Issue 5
Pages: 33–34

Digital transformation is allowing for better handling, analysis, and protection of vast data collection.

Data are the biopharma industry’s most valuable asset today, prompting the industry to step up its measures not only to gather these data but also to protect them. In a conversation with Nathalie Batoux, product manager, Research Innovation, IDBS, Pharmaceutical Technology Europe™ delved into the approaches the biopharma industry has taken to shore up data integrity protection.

A most valuable asset

PTE: Explain why data are the biopharma industry’s most valuable asset. The analogy is that there’s a gold rush for data (1); what does that mean?

Batoux (IDBS): With the current digital transformation happening in the biopharma industry, there has also been an explosion of [tools such as] artificial intelligence (AI) and machine learning (ML). There is also a significant amount of data that exists now based on the products already on the market. With these tools, people want to reuse the data that they have. So, instead of running new experiments, already existing data are being used; these historical data are very important. There are new initiatives underway to review and repurpose existing drugs for other applications, and AI and ML are being used to review these existing data. Not only is the biopharma industry producing a vast amount of new data, it also has its hands on a significant amount of historical data.

PTE: Describe the impact of this perception—this ‘gold rush’ for data—on data integrity. Has it led to a flurry of activity?

Batoux (IDBS): Yes, there is a flurry of activity. There are many startups that are proposing AI and ML to the biopharma industry, amongst other industries. There are initiatives, as I mentioned, around the repurposing of existing drugs using historical data at hand.

Data integrity practices to date

PTE: Can you discuss where data integrity practices in the biopharma industry are the strongest and where they are the weakest?

Batoux (IDBS): My perspective is to look at the evolution within the industry; where we are now with digitalization, data integrity is getting much stronger. The practice of manually recording, or manually transcribing, data from one system to another is very error-prone. Whether intentional or accidental, errors are highly likely. If there is an error in the data, then any analysis conducted later on will be incorrect, the decision-making based on those data will be flawed, and the outcome will be counterproductive. This will cost money at the end of the day.


Good data integrity involves the use of more automated systems, using digital transformation so that the system records the context around the results. For example, in an experiment, scientists—even though they are very well trained—are still human; there is only a finite number of parameters that they can deal with, while a computer, or a series of computers, can record all sorts of contextual details. Something as simple as the date and time of an experiment might change the outcome, [such as] whether [the experiment] is done in the summer or winter; that is important because temperature and humidity conditions may affect the result.

The other advantage is that, with data digitally recorded, there is traceability, and most modern systems now have audit capabilities. [This makes it easier] to understand what has happened to generate the data, but also what has happened after the data have been generated—such as, have the data been transferred from one system to another? Having all that context, having this history, ensures that, in the first place, the data—that is, the data that will be used to train the models for ML or analysis, or for use by AI—will be of quality. [And that assurance comes from] all that context, all that history behind the data.

Data that are digitally recorded [are] easier to find. [In the laboratory], finding an experiment on paper or in a notebook is not easy. [Digitally recorded data] offer interoperability and [better] accessibility. If [the data are] on paper, for example, [the lab tech] needs to have that piece of paper in front of [them] to be able to see it, which hinders accessibility. And if the handwriting on that piece of paper is, as they say, like a doctor’s handwriting, not many people will be able to understand it, which hinders interoperability. Being ‘reusable’ as a whole is very important for data, so having FAIR [findable, accessible, interoperable, and reusable] data is one of the principles of data integrity. This will enable the digital transformation and all those initiatives that are happening with ML and AI.

Evaluating data quality

PTE: Let’s talk about the data itself. Good data versus bad data; how is that determined?

Batoux (IDBS): As mentioned earlier, having all the context around the data recorded is one of the attributes of good data. The other attributes, as I mentioned, are the FAIR principles; however, one needs to make sure that everything is recorded. It’s human nature to record the things that happened and [what is] working, but from a learning perspective—and for ML in particular—one also needs to record what didn’t work, because [there is] much more to learn from things that didn’t work. So, recognizing good data versus bad data requires rigour, well-documented SOPs [standard operating procedures], and controls in place, all of which are quite important because they help with the reproducibility of the data.

In addition, if using a digital system with controls in place, then [the technician] will receive an alert, and [this allows] them to record an exception. It is very important to record all those things, not only for good data in general, but also for the data that regulatory bodies require. [Keep in mind], all the compliance requirements are very important. Another point [to highlight] is that for accessibility and interoperability, more than one system will need to be used; likely, it will require several systems. And data need to be connected easily; having silos that are not talking to each other does not make for good data. Having the ability to connect the data—to connect to the structure of those data in order to understand them—that is a sign of good data.

PTE: How is the biopharma industry handling this flow of data; how is it preparing for this gold rush that was mentioned earlier?

Batoux (IDBS): [With the] digitalization that is happening, fewer companies are using paper, and [it seems that] more and more companies have been investing in data scientists to be able to mine the data. However, there is a difference between having all that data mined by specialists and having the data used by the people who are actually at the bench, so democratization of data is extremely important. Empowering [bench] scientists with decision-making based on large data sets, including past data, at the click of a button is very valuable. Rather than having to wait a few weeks for data scientists’ analyses—although the data scientists’ skills are still important for more complex analyses—a more diverse set of scientists or end users can access the data, with [the goal] of helping to put algorithms in place for the analyses, and so on.


Reference

1. Batoux, N.; Rayner, D. In a Digital Gold Rush, Data Integrity is Priceless. Pharmaceutical Technology’s In The Lab eNewsletter 2023, 18 (10).

About the author

Feliza Mirasol is the science editor for Pharmaceutical Technology Europe®.

Article details

Pharmaceutical Technology Europe®
Vol. 36, No. 5
May 2024
Pages: 33–34


When referring to this article, please cite it as Mirasol, F. Shoring Up Data Integrity Practices. Pharmaceutical Technology Europe 2024, 36 (5), 33–34.