
Is Compliant AI in Pharma Even Possible? The Search for an Honest Answer
Key Takeaways
- Context matters: the cited firm exhibited pervasive CGMP failures, and the AI issue was embedded within 21 CFR 211.22(c) quality-unit oversight deficiencies.
- FDA’s durable expectation is that AI outputs can aid drafting, but must be reviewed and cleared by an authorized QU representative before regulated decisions.
A warning letter, a hard look at why “a human reviews it” doesn’t settle anything, and one company’s argument that accuracy was never the real problem.
Disclaimer: The views expressed in the article are those of the author and do not necessarily reflect those of MJH Life Sciences or PharmTech.
A line that stuck with me came out of an FDA inspection in Livonia, Mich. Investigators told a small drug manufacturer it had shipped product without ever running process validation. The company’s explanation, recorded in Warning Letter 320-26-58, was that it hadn’t known validation was required, because the AI agent it had been leaning on to write its procedures never mentioned it.1
I’ve spent enough years inside GMP training and quality systems to register a few things at once when I read that. It’s darkly funny and also terrifying. It’s the kind of statement that ends up in a deposition. And it sent me down a longer road than I expected, because the question underneath it is one the industry keeps answering with marketing instead of evidence: can you actually put generative AI inside a regulated quality system and have it survive contact with an inspector, or is “compliant AI” a phrase vendors say because buyers want to hear it?
I wanted a real answer, not the brochure version of one. Here’s where a few weeks of poking at it left me.
What the Purolea Letter Actually Was
Maybe you’ve seen this one go around as FDA’s first standalone citation for AI, the letter that supposedly rewrote pharma’s buying criteria overnight. Slow down there. That framing falls apart the second you actually read the thing.
The firm, Purolea Cosmetics Lab, was a homeopathic and cosmetics operation. I’m using past tense because its operations ceased after the laundry list of citations it received. Among the flagged products were “Dermveda Extra Strength Shingles Relief” and a genital herpes “relief” product carrying disease claims that make them unapproved new drugs.1 Investigators found insects, filth, leaves, and clutter in the facility, plus a docking bay door that opened straight onto the manufacturing area.1 No microbial testing on finished product. No component identity testing. A quality unit that wasn’t reviewing batch records before release.1 This was a firm operating with almost no quality system to speak of.
The AI finding isn’t a separate citation either. It sits inside the quality-unit (QU) failure under 21 CFR 211.22, specifically 211.22(c): if you use AI to help create documents, you have to review those documents to confirm they’re accurate and compliant, and not doing so is the violation.1 The process-validation gap under 211.100 would read like a punchline if the firm hadn’t been shipping real product: the owner later told investigators she hadn’t known validation was required because the AI never brought it up.
So a sloppy homeopathic shop isn’t a stand-in for a real manufacturer, and anyone selling it that way is overreaching. What makes the letter worth your time isn’t the incident itself. It’s the sentence FDA wrote in response, the one that will long be remembered. Any output from an AI agent, the agency said, “must be reviewed and cleared by an authorized human representative of your firm’s QU.”1 FDA wasn’t banning AI. It said plainly that AI can be used as an aid. The deficiency was that nobody with authority stood between the machine’s output and the regulated decision.
That’s a low bar. Have a qualified human review it. When I started asking whether that bar is actually enough, the answer got very uncomfortable.
The Trouble with “A Human Reviews It”
The reflex fix, once you accept AI into the workflow, is to put a competent reviewer at the end of it. Problem solved, supposedly. The model drafts, the expert checks, the expert signs.
The research on how people behave around automated output doesn’t support that confidence. A systematic review of automation bias in clinical decision support found that good automated advice lifts performance, but incomplete or incorrect advice drags it down, because reviewers stop scrutinizing and start deferring to the system.2 A 2024 Nature Medicine study of radiologists working with AI assistance found the effect isn’t even reliably positive. Whether the AI helped or hurt depended on the case and the individual, and in some scenarios inaccurate predictions pulled skilled readers toward worse calls than they’d have made alone.3
Translate that into a quality organization. A reviewer facing 40 AI-drafted deviation rationales in a week, each one reading clean and authoritative, is not the neutral error-catcher the control assumes. Volume erodes the scrutiny. So does any process that treats the signature as a formality rather than a decision. “A human reviewed it” can be technically true and functionally empty, and an inspector who understands that won’t be impressed that a name appears in the approval field.
This is the point where I stopped believing the problem was about better models or better prompts. If the human checkpoint is weaker than everyone assumes, then making the AI more fluent only makes its mistakes more convincing.
The Governance Problem Hiding Underneath AI
The sharpest version of this I heard came from two people I’ll get to in a moment, the founders of Counterpoint AI, and it reframed the whole thing for me. The industry keeps framing regulated AI around content accuracy and hallucination: the fear that a generative system will produce something false that slips into a regulated record. That risk is real, but they argue it may not be the defining governance problem as AI adoption expands in pharma. I went in unsure they were right about that and came out mostly persuaded.
Their harder problem, in their words, is keeping a regulated operation ready while change comes faster than it used to. Quality has never just produced accurate documents. It keeps track of what applies to whom, what got authorized, who is qualified for what, and whether all of that still holds as procedures and protocols move underneath it. That work has always been a grind. It used to be a manageable one.
What makes that manageability run out is volume and velocity. The share of trial protocols carrying at least one substantial amendment has climbed from 57% to 76% since 2015, and the mean number of amendments per protocol has risen from 2.1 to 3.3, an increase of roughly 60%.4,5 The parallel rise of generative AI makes it tempting to believe these capabilities will let an organization keep up by generating and propagating the content around that change. The catch is that AI scales content generation far faster than it scales governance, which still runs on slower, human-scale coordination. Without mechanisms that control downstream operational activity, ungoverned document generation outruns the humans who are supposed to keep it in check.
Their framing gave me the language for something I had felt but could not name, and building on it, I think the failure mode has a more concrete shape than “readiness” suggests. Call it operational drift. The exposure I keep coming back to isn’t the wrong document, it’s the right one landing in the wrong place. A change that propagates incompletely. A qualification state that quietly diverges from its governing source. A procedure that goes operationally live before anyone has governed its downstream consequence. Run that forward and you get a second, worse problem: the gradual loss of authoritative visibility itself, where quality leaders can no longer say with confidence which people, sites, procedures, or qualifications are operating in a governed state. That is a different and harder class of failure than a hallucinated sentence, and it’s the one most quality systems were never built to catch, because they assume change moves at human speed and AI has changed the speed.
Change control is the clean example. A machine can determine which sites and roles a change touches and hold it until training and validation are done, and that is real work taken off a human plate. But assessing the quality impact and deciding to approve or reject is judgment that stays human, and it gets sharper when the routine reconciliation underneath it is handled. Counterpoint maps the whole quality operation this way, and once you look closely most functions do split along those lines. Some of the work is pure administration a system can absorb, some shrinks to a smaller human footprint, and some is judgment that does not go anywhere. The trouble starts when a tool quietly takes on the judgment-shaped parts under cover of doing the administrative ones.
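To make the split in the paragraph above concrete, here is a minimal sketch in Python of the administrative half: working out who a procedure change touches and holding it until training has caught up. Everything in it is an assumption made for illustration. The record shapes, the function names, the people, and the procedure number SOP-014 are hypothetical, and none of it describes Counterpoint’s product or any real quality system. Note what the code does not do: it never assesses quality impact or approves anything. It only checks whether a human QU approval has been recorded.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Assignment:
    """One person's role at one site (hypothetical record shape)."""
    person: str
    site: str
    role: str


@dataclass
class Change:
    """A proposed procedure revision and the roles it touches (hypothetical)."""
    procedure: str
    new_version: int
    affected_roles: Set[str]
    # Filled in only when a human QU representative approves the change.
    qu_approved_by: Optional[str] = None


def impacted(change: Change, roster: List[Assignment]) -> List[Assignment]:
    """Administrative step: which assignments does this change touch?"""
    return [a for a in roster if a.role in change.affected_roles]


def release_blockers(
    change: Change,
    roster: List[Assignment],
    training: Dict[Tuple[str, str], int],
) -> List[str]:
    """List every reason the change cannot go live yet.

    `training` maps (person, procedure) to the highest version that
    person has been trained on; a missing entry counts as untrained.
    """
    blockers = []
    for a in impacted(change, roster):
        trained = training.get((a.person, change.procedure), 0)
        if trained < change.new_version:
            blockers.append(
                f"{a.person} at {a.site} not trained on v{change.new_version}"
            )
    # The approval itself is a human judgment; the code only checks it exists.
    if change.qu_approved_by is None:
        blockers.append("no QU approval recorded")
    return blockers


# Illustrative data only.
roster = [
    Assignment("A. Rivera", "Site 1", "operator"),
    Assignment("B. Chen", "Site 2", "operator"),
    Assignment("C. Okafor", "Site 2", "analyst"),
]
training = {("A. Rivera", "SOP-014"): 3, ("B. Chen", "SOP-014"): 2}
change = Change("SOP-014", 3, {"operator"})

for reason in release_blockers(change, roster, training):
    print(reason)
```

With the sample data, the hold reports two blockers: one operator at Site 2 is still on the prior version, and no QU approval is on record. The analyst never appears because the change does not touch that role. The design choice worth noticing is that the approval field is an input to the gate rather than something the gate can set, which keeps the judgment-shaped part outside the tool.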
The founders believe this challenge lands hardest on contract research organizations (CROs), and the logic held up for me. A CRO juggling several sponsors runs each program as a discrete governance context, with separate procedures, separate qualification matrices, and overlapping staff, then absorbs the coordination cost as overhead that scales faster than its revenue. A single qualification gap at one site doesn’t stay at that site, because sponsors are obligated to look at what it means across every program they run with that CRO. That’s where the manual model strains first and where the operational and commercial pressure become impossible to ignore, because what’s at stake is a book of business.
The Question Nobody Was Answering
One thing I owe you up front. I’m not the first to read Purolea this way. In the weeks after the letter, several quality and medical-device writers reached the same conclusion: a company with no functioning quality unit had reached for AI to plug the hole, and everything in the letter about AI was downstream of that. Greenlight Guru, Clarkston, and others made versions of the point, and they were right, so the diagnosis isn’t mine to claim.
The question I couldn’t find anyone answering was the one I cared most about. If the real problem is governance rather than accuracy, does anyone have a fix that would survive an inspector? That’s what sent me looking for someone building for the hard problem instead of the easy one.
One company, at least, believes the answer is yes.
In Part II, I’ll examine one company’s attempt to answer that question: can governance-by-design solve a problem that better content generation alone cannot?
References:
1. U.S. Food and Drug Administration. Warning Letter 320-26-58, Purolea Cosmetics Lab (FEI 3011669383). Center for Drug Evaluation and Research, Office of Compliance. April 2, 2026.
2. Goddard K, Roudsari A, Wyatt JC. Automation bias: a systematic review of frequency, effect mediators, and mitigators. J Am Med Inform Assoc. 2012;19(1):121-127.
3. Yu F, Moehring A, Banerjee O, et al. Heterogeneity and predictors of the effects of AI assistance on radiologists. Nat Med. 2024;30:837-849. doi:10.1038/s41591-024-02850-w
4. Getz KA, Smith Z, Botto E, et al. New benchmarks on protocol amendment practices, trends and their impact on clinical trial performance. Ther Innov Regul Sci. 2024;58. doi:10.1007/s43441-024-00622-9
5. Getz KA, Stergiopoulos S, Short M, et al. The impact of protocol amendments on clinical trial performance and cost. Ther Innov Regul Sci. 2016;50(4):436-441. doi:10.1177/2168479016632271




