Safety by Design: Our Approach to AI in Clinical Document Processing

22 Jul 2025

Recent headlines have drawn attention to the serious risks of introducing generative AI into clinical and administrative workflows without the right safeguards. These incidents underscore a growing concern in modern primary care: AI's tendency to fabricate details - so-called "hallucinations" - with potentially dangerous consequences. In healthcare, where precision and accuracy are vital, these failures are more than technical glitches - they are potential threats to patient safety and wellbeing.

At BetterLetter, we don’t believe in slowing down innovation, but we do believe in building responsibly. For us, AI is a tool to support clinical coders and admin staff, not replace them. This principle helps shape everything that we build, and this article is a short round-up of the AI safety features that are built into BetterLetter.

AI suggestions linked to document annotations

According to the MHRA's guidance on software as a medical device, clinical decision support software that provides guidance or suggestions should enable healthcare professionals to view those suggestions alongside the raw clinical data. We take this a step further: we annotate the raw data itself with the AI suggestions, so a reviewer can see at a glance where every recommended code or task has come from. Because each AI suggestion is linked to specific words in the letter, the reviewer has the full context for every suggestion, and by interacting directly with the letter they can work efficiently - there is no hunting for information before deciding whether each code is appropriate.
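To make the idea concrete, here is a minimal sketch of what span-linked suggestions could look like. This is an illustrative data model only, not BetterLetter's actual implementation: each suggestion carries the character offsets of the letter text it was derived from, so the supporting context is always one lookup away.

```python
from dataclasses import dataclass

# Hypothetical record linking an AI suggestion to the exact span of
# letter text it was derived from, so a reviewer can see the
# supporting context at a glance.
@dataclass(frozen=True)
class AnnotatedSuggestion:
    snomed_code: str  # suggested SNOMED CT code
    start: int        # start offset of supporting text in the letter
    end: int          # end offset (exclusive)

def supporting_text(letter: str, s: AnnotatedSuggestion) -> str:
    """Return the letter excerpt that triggered the suggestion."""
    return letter[s.start:s.end]

letter = "Patient reports worsening asthma symptoms over two weeks."
suggestion = AnnotatedSuggestion(snomed_code="195967001", start=26, end=32)
print(supporting_text(letter, suggestion))  # -> asthma
```

In a real system the spans would come from the extraction pipeline itself; the point is simply that every suggestion is traceable back to the words that justify it.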


Every suggestion must be individually approved

Unlike some applications that allow AI-generated summaries to be filed directly into patient records without human review, BetterLetter insists on explicit and individual approval for every AI suggestion. We've designed the system to act as an intelligent assistant, picking the most appropriate SNOMED code, linking to problems, and surfacing other relevant information; however, each recommendation must ultimately be confirmed or rejected by the human in the loop. This ensures that final outputs are always subject to human judgement and reflect clinical intent, with a full audit trail maintained for each decision made.
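The shape of such a workflow can be sketched as follows. This is a simplified, hypothetical model (the field names and statuses are assumptions, not BetterLetter's API): every suggestion starts as pending, a named reviewer must decide on it, each decision is appended to an audit trail, and only explicitly approved codes are ever filed.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical sketch of per-suggestion approval with an audit trail.
# Nothing is filed until a named reviewer explicitly approves it.
@dataclass
class Suggestion:
    code: str
    status: str = "pending"  # pending | approved | rejected
    audit: list = field(default_factory=list)

    def decide(self, reviewer: str, approve: bool) -> None:
        self.status = "approved" if approve else "rejected"
        self.audit.append({
            "reviewer": reviewer,
            "decision": self.status,
            "at": datetime.now(timezone.utc).isoformat(),
        })

def codes_to_file(suggestions):
    # Only explicitly approved suggestions reach the patient record.
    return [s.code for s in suggestions if s.status == "approved"]

batch = [Suggestion("195967001"), Suggestion("271737000")]
batch[0].decide("reviewer.one", approve=True)
batch[1].decide("reviewer.one", approve=False)
print(codes_to_file(batch))  # -> ['195967001']
```

Note that there is no code path that files a pending suggestion: human review is a structural requirement, not an optional step.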


An AI that resists hallucination

Recent incidents with AI have shown that off-the-shelf large language models (LLMs) will at times confidently make things up: diagnoses, symptoms, medications, and even the name of the hospital that sent the letter. This is a well-known limitation of general-purpose LLMs: they are powerful tools, but the current generation of generative AI technology intrinsically suffers from hallucinations and, most importantly, this cannot be completely fixed by any existing technique or tool.

To combat hallucinations, we have built an AI engine called GENIE (Graph ENabled Information Extraction) specifically for primary care coding. GENIE uses NLP, knowledge graphs and guardrail technology to reduce the chance of hallucinations. No AI vendor can credibly claim to have eliminated LLM hallucinations, but we have put a lot of energy and innovative engineering into minimising them as much as possible. We will share more details about GENIE in a future article. Ultimately, to prioritise clinical safety, we insist on human approval as the final safety check.

An extensive system of checks and warnings

In addition to AI coding suggestions, we have built a large number of checks and balances into BetterLetter. These draw a coder's attention to key information in both the letter and the patient’s care history. This includes reference ranges for pathology results, safeguarding information on the patient record, and unusual diagnoses or variations from coding protocol (e.g. “Are you sure you trust that DVT diagnosis from A&E with no associated ultrasound?”). This goes a step further than coding a letter in isolation, and encourages the reviewer to consider the patient’s wider medical history and personal context. Ultimately it is about providing the reviewer with the best data available for them to make a decision about which codes and actions are most appropriate.
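A reference-range check, the simplest of the warnings mentioned above, can be sketched like this. The ranges and test names below are illustrative assumptions only, not clinical guidance or BetterLetter's configuration.

```python
# Hypothetical sketch of a pathology reference-range warning: flag
# results outside a configured range so the reviewer's attention is
# drawn to them. Ranges are illustrative, not clinical guidance.
REFERENCE_RANGES = {
    "potassium": (3.5, 5.3),    # mmol/L (illustrative)
    "haemoglobin": (130, 180),  # g/L (illustrative)
}

def range_warnings(results: dict) -> list:
    warnings = []
    for test, value in results.items():
        low, high = REFERENCE_RANGES[test]
        if not (low <= value <= high):
            warnings.append(f"{test} {value} outside range {low}-{high}")
    return warnings

print(range_warnings({"potassium": 6.1, "haemoglobin": 150}))
# -> ['potassium 6.1 outside range 3.5-5.3']
```

The same pattern generalises to the other checks described above: each one surfaces a warning for the reviewer rather than making a decision on their behalf.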

No LLM-based document summarisation without MHRA Class 2A certification

One of the riskiest practices we have witnessed in the market is applications using LLMs to generate summaries of consultations or clinical documents. Not only is it technically difficult to ensure all important information and context is accurately captured, it is also unsafe because the model may misinterpret or fabricate critical clinical details. While LLMs may superficially seem to do an acceptable job, current research indicates that hallucinations are pervasive where LLMs are applied to clinical text, and it is therefore only a matter of time before critical mistakes are made.

NHSE recently circulated guidance on medical device classification for ambient voice technology (AVT), stating that all AVT solutions that undertake summarisation require at least MHRA Class 1 status and that AI software must not produce generative diagnoses, management plans or referrals without MHRA Class 2A approval. 

Another concern is applications that offer “auto-filing” of certain clinical documents. Whether powered by LLMs or more conventional machine learning models, this is high-risk functionality unless clinical safety has been demonstrated via Class 2A certification. BetterLetter is currently undertaking Class 1 medical device certification for our current feature set, and we are committed to not generating document summaries or offering human-out-of-the-loop filing without having Class 2A certification in place.

Comprehensive data privacy, cyber security and clinical safety documentation

Technology is moving quickly, and we appreciate it can be difficult for individual practices to make assessments about which technologies are safe to use. For this reason it is of the utmost importance to ask the right questions before signing a contract. In particular, when evaluating solutions you should review:

  • Data privacy documentation (DPA, DPIA, DSPT).
  • Cyber security certifications (Cyber Essentials Plus, DTAC, ISO27001 / SOC2).
  • An up-to-date penetration test with no major or significant vulnerabilities.
  • Clinical safety case (DCB0129 / support for implementing DCB0160).
  • Compliance with assurance frameworks (at the ICB level or via NHSE programs such as IM1).

We encourage you to ask critical questions - if an application is offering features that aren't explicitly covered in the above documentation, this should raise concerns about whether the clinical safety implications of those functionalities have been properly addressed.

Summary

We believe safety is the foundation of meaningful innovation in healthcare. The best clinical tools aren’t just fast - they’re trustworthy. We support the expansion of AI to help overwhelmed care systems like the NHS, but we also believe the best way to do this is with rigorous testing, safety-by-design, and a “do things the right way, the first time” attitude.

Cases in the recent news highlight the real-world consequences of deploying AI without careful consideration of the risks. At BetterLetter, we have taken a different path - one that puts clinical safety first and builds trust through thoughtful design:

  • AI suggestions are linked to raw clinical data, enabling safe and effective decision-making.
  • Every AI suggestion is explicitly approved by a human-in-the-loop reviewer.
  • Our approach to AI is hallucination-resistant, combining several technologies rather than relying on LLMs to do all the heavy lifting.
  • We won't offer LLM-generated document summaries or auto-file letters without Class 2A medical device certification.
  • We embrace oversight and governance not as a burden to innovation, but as a core design principle.

The NHS deserves the best of AI, but also the safest. That ideal is reflected in everything that we build at BetterLetter.
