Beyond GenAI: The Training-Free Discovery Potential of LLMs in a Drug Safety and Regulatory Context

Life sciences data sets can be vast and complex to process, but until now, bringing intelligent automation systems up to speed and validating them has been prohibitively onerous. Large language models tackle these barriers head on. Ramesh Ramani and RaviKanth Valigari, technology innovators at ArisGlobal, explain.

Wherever there are large volumes of complex data to be processed and understood, especially where information exists in unstructured form (in a Word document, in some other note form, or perhaps mentioned in an email), there is considerable administrative overhead involved in rendering the pertinent content usable and meaningful. It is here that the latest advances in artificial intelligence (AI) and machine learning (ML) offer substantial process transformation potential: not only process efficiency, but also significantly improved accuracy, once the software knows what it is looking for.

Generative AI (GenAI) technology, built on large language models (LLMs), has shown a new path forward: it quickly understands what to look out for and ably summarizes key findings for the user, crucially without the need for painstaking “training” by overstretched teams, or for validation of each configuration (otherwise required to cope with each different content format).

Contextual discernment

In a drug development context, safety and regulatory requirements present an enormous data burden that consumes vast resources and usually carries a time-based penalty (e.g., linked to prompt adverse event notification or safety reporting, or affecting speed to market). While process automation solutions have existed for some time to lighten the manual load and enhance efficiency, there have been two main sticking points up to now: how to swiftly train modern AI algorithms so that they pick up on only what’s significant, and how to satisfy regulators’ need for accuracy and transparency (so that any findings can be fully trusted).

LLMs (the foundation models that power GenAI tools) and advanced natural language processing (NLP) techniques such as retrieval-augmented generation (RAG) are now being applied to fill these gaps and make advanced automation a safe and reliable reality in key life sciences R&D processes, crucially without the need for continuous, painstaking oversight. In simple terms, RAG reduces the need to fine-tune AI models: at the moment of use, it retrieves relevant proprietary content and supplies it to the LLM alongside the publicly available knowledge the model was trained on, giving it a bigger pool of knowledge, and context, to draw from.
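As a rough illustration of the RAG pattern, the sketch below retrieves the most relevant internal document for a query and folds it into the prompt that would be sent to an LLM. The keyword-overlap retriever, the sample SOP corpus, and the prompt template are simplified, hypothetical stand-ins; real deployments use vector embeddings and a live model API.

```python
# Minimal, illustrative RAG sketch: find the most relevant proprietary
# document for a query, then combine it with the query in one prompt.

def score(query: str, doc: str) -> int:
    """Count how many query terms appear in the document (toy retriever)."""
    doc_terms = set(doc.lower().split())
    return sum(term in doc_terms for term in query.lower().split())

def retrieve(query: str, corpus: list[str]) -> str:
    """Return the best-matching document from the proprietary corpus."""
    return max(corpus, key=lambda doc: score(query, doc))

def build_prompt(query: str, context: str) -> str:
    """Ground the LLM's answer in the retrieved proprietary context."""
    return (f"Use only the context below to answer.\n"
            f"Context: {context}\n"
            f"Question: {query}")

# Hypothetical internal documents standing in for proprietary data.
corpus = [
    "SOP-12: adverse event reports must be triaged within 24 hours",
    "SOP-07: signal detection reviews run quarterly",
]
prompt = build_prompt("How quickly must an adverse event be triaged?",
                      retrieve("adverse event triage deadline", corpus))
```

The retrieval step is what lets the model answer from company-specific material it was never trained on.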

Learning on the fly, without risk to data

The biggest breakthrough is that specialized applications can now be developed that apply GenAI-type techniques, contextually, to data they have never seen before, learning from and processing the contents on the fly.

For drug developers, this has the potential to transform numerous labor-intensive processes ranging from dynamic data extraction associated with adverse event (AE) intake, to safety case narrative generation, to narrative theme analysis in safety signal detection, to the drafting of safety reports. And solutions for all of these use cases are coming down the line.

Importantly, carefully combined LLM and RAG capabilities are sufficiently transparent and explainable for regulators to accept the technology as safe and reliable. Responsible AI and AI compliance are particularly critical in life sciences use cases, so it is essential that companies deploy solutions that are proven and transparent. The LLM/RAG approach addresses potential concerns about data security and privacy, too, as it does not require potentially sensitive patient data for algorithm training and ML. It also stands up to validation by way of periodic sampling by human team members, which can be scaled back as confidence in the technology’s performance grows, so that monitoring its integrity does not undermine the significant efficiency gains on offer.
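One way to picture that calibrated sampling: a human-review rate that starts at 100% and tapers toward a small floor as observed accuracy improves. The `sampling_rate` function and its thresholds below are purely illustrative assumptions, not a validated quality-control scheme.

```python
def sampling_rate(observed_accuracy: float,
                  floor: float = 0.02, ceiling: float = 1.0) -> float:
    """Fraction of AI-processed cases routed to human review.

    Illustrative policy only: review everything until accuracy evidence
    accumulates, then taper toward a small floor as observed accuracy
    approaches 100%.
    """
    if observed_accuracy < 0.8:          # low confidence: review all cases
        return ceiling
    shortfall = 1.0 - observed_accuracy  # distance from perfect accuracy
    rate = 10 * shortfall                # e.g. 95% accurate -> review 50%
    return max(floor, min(ceiling, rate))
```

Under this toy policy, 95% observed accuracy would route half of cases to review, while near-perfect accuracy settles at the 2% floor.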


One capability, multiple applications

The trouble with ML solutions up to now has been the training burden. For instance, in the case of AE recording, systems would need to be shown what to look for in the information provided via a range of different channels and formats, before extracting and processing it. For each different source type, a new configuration of the software would be needed too, pushing up the training overhead and overall expense, including the maintenance burden each time the technology was updated.

LLMs make it possible to bypass the need to train AI models or algorithms on what to look out for or what something means, so that a single technology solution can handle all variations of incoming data. RAG patterns can play an important role here, by explaining a standard operating procedure to an LLM in natural language, so that the system knows what to do with each of many thousands of forms, without the need for a special configuration for each respective format.
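To make the "one solution, many formats" idea concrete, the sketch below applies a single natural-language instruction (standing in for an SOP passage) to two differently formatted adverse event reports. The instruction text and the example documents are hypothetical; in practice, an LLM would execute the instruction against each prompt.

```python
# Format-agnostic intake sketch: one natural-language instruction drives
# extraction from any incoming layout, instead of one hand-built parser
# per source type. Instruction and documents are illustrative only.

SOP_INSTRUCTION = (
    "Identify the patient, the suspect drug, and the adverse event "
    "in the text, whatever its layout, and return them as fields."
)

def build_intake_prompt(document: str) -> str:
    """Same prompt template for every source format: no per-format config."""
    return f"{SOP_INSTRUCTION}\n\nDocument:\n{document}"

# Two very different source formats for the same case.
email_report = "From: Dr. Lee\nPatient 0042 developed a rash after Drug X."
pdf_form = "PATIENT ID: 0042 | SUSPECT PRODUCT: Drug X | EVENT: rash"

# Both variants flow through the identical prompt builder.
prompts = [build_intake_prompt(d) for d in (email_report, pdf_form)]
```

The point is that the per-format engineering collapses into a single instruction the model can interpret.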

The potential impact is impressive. Application of LLM/RAG technology to transform AE case intake has been shown to deliver upwards of 65% efficiency gains, with 90% or better data extraction accuracy and quality in early pilots of our Advanced Intake solution this year (1). In the case of safety case narrative generation, the same technology is already demonstrating 80–85% consistency in the summaries it creates. And that’s from a standing start, without prior exposure.

Only the beginning

The ability to retrieve data in context, rather than via a “Ctrl-F” (find) command (e.g., everything in a content set that mentions headaches), could transform a range of processes linked to safety and AE discovery and reporting.
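The difference can be sketched in a few lines: a literal search finds only the exact string, while context-aware retrieval also catches related wording. The small concept map below is a toy stand-in for the semantic similarity that an embedding-backed LLM pipeline would provide; the narratives are invented examples.

```python
# Contrast literal "Ctrl-F" matching with context-aware retrieval.

narratives = [
    "Patient reported severe headache after the second dose.",
    "Subject experienced migraine lasting two days.",
    "No adverse events noted during follow-up.",
]

def ctrl_f(term: str, docs: list[str]) -> list[str]:
    """Literal substring search: finds only exact mentions."""
    return [d for d in docs if term in d.lower()]

# Toy concept map; a real system derives this from embeddings.
CONCEPTS = {"headache": {"headache", "migraine", "cephalalgia"}}

def contextual_search(term: str, docs: list[str]) -> list[str]:
    """Match any wording in the concept group, not just the literal string."""
    related = CONCEPTS.get(term, {term})
    return [d for d in docs if any(w in d.lower() for w in related)]
```

A literal search for "headache" here returns one narrative; the contextual search also surfaces the migraine report.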

Certainly, it lays the foundation for drug developers to substantially streamline some of their most demanding data-based processes. In due course, these will also include the drafting of hefty regulatory safety reports, with advanced automation generating the preliminary narrative, and narrative theme analysis in safety signal detection. Here, there is vast scope for the technology to help distill trends that have not been captured in the structured data, such as a history of drug abuse, or of living with obesity, surfacing across 500 patient narratives of potential interest. The potential is extremely exciting.

It is this kind of development that is now being avidly discussed at meetings of the industry’s new global GenAI Council, a powerful peer group that is pushing the boundaries of what’s possible with Generative AI across the life sciences R&D lifecycle. Any hesitation about adopting smarter automation out of reliability or compliance fears has now been superseded by a hunger to embrace new iterations of the technology which directly address those concerns and deliver demonstrable step changes in productivity and efficiency. That 2024–2025 will be a defining period for GenAI and surrounding innovation in this industry seems certain.


1. ArisGlobal. Early customer pilots of ArisGlobal Advanced Intake, 2024.

About the authors

Ramesh Ramani is vice president of technology at ArisGlobal, with a special interest in the application of AI and ML to transform life sciences R&D regulatory and safety use cases. He is based in Bengaluru, Karnataka, India.

RaviKanth Valigari is vice president of product development at ArisGlobal, and a specialist in digital transformation in life sciences R&D. He is based in Charlotte, North Carolina, USA.

Both Ramesh and Ravi have deep expertise in applying AI, NLP, and other smart automation technologies to life sciences R&D regulatory and safety use cases via powerful, targeted and quick-to-deploy cloud/multi-tenant software as a service (SaaS) applications.