NHS Primary Care · AI Decision Support
Predicting Patient Disengagement
Before It Happens
An AI early-warning system for GP practices — identifying the patients most at risk of stopping care before their health deteriorates. Built on UK clinical standards and openly audited for fairness.
Model Performance
What the model tells us
Model Accuracy
AUC 0.94
XGBoost · out-of-fold validation · 10,000 synthetic CPRD Gold patients
High-Risk Patients
20 Flagged
≥96% disengagement risk · ranked by SHAP feature importance
Fairness Audit
EOD 0.21
IMD quintile equalized-odds difference · NICE ESF Tier B monitoring
About This Project
Built for the UK healthcare system, from the ground up
Patient disengagement — when someone stops attending their GP — often precedes serious health decline. This model identifies who is most at risk before that happens, giving practice teams the information they need to intervene early. Every technical decision in this project reflects NHS data standards, UK regulation, and clinical workflow.
CPRD Gold → CPRD Aurum
This project is modelled on the Clinical Practice Research Datalink (CPRD), the UK's long-established research database of anonymised GP records, run by the MHRA with support from the NIHR and widely used in UK health research. Our training cohort is a synthetic set of 10,000 patients generated to follow the structure of CPRD Gold, which is collected from practices using Vision software and was historically coded in Read Codes. CPRD Aurum is the sister database, collected from practices using EMIS Web. Because practices may contribute to either database, our pipeline is designed to ingest both formats without re-engineering.
Legacy Read Codes → SNOMED CT
The NHS mandated SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms) as the single clinical terminology for all GP systems from April 2018 — replacing the older Read Code system (CTV3 / Read v2) that underpinned decades of UK GP records. Our OMOP CDM transformation layer bridges this gap: historical Read Code diagnoses are mapped to SNOMED CT concept IDs, ensuring this model works correctly whether a practice is still mid-migration or fully on SNOMED CT. This future-proofs the pipeline for NHS England's Long Term Plan mandate that all providers be SNOMED CT–compliant by 2025.
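The Read Code to SNOMED CT bridge described above can be sketched as a simple lookup. This is a minimal illustration, not the project's transformation layer: the names `READ_V2_TO_SNOMED`, `MappedDiagnosis`, and `to_snomed` are assumptions, and the two code pairs are illustrative examples that should be checked against the official NHS mapping tables before any real use.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative pairs only. A real pipeline would load the official
# Read-to-SNOMED CT mapping tables instead of hard-coding entries.
READ_V2_TO_SNOMED = {
    "C10F.": 44054006,   # Type 2 diabetes mellitus
    "H33..": 195967001,  # Asthma
}


@dataclass(frozen=True)
class MappedDiagnosis:
    source_code: str                  # code as recorded in the GP system
    source_vocabulary: str            # "Read v2" or "SNOMED CT"
    snomed_concept_id: Optional[int]  # None when no mapping is known


def to_snomed(code: str, vocabulary: str) -> MappedDiagnosis:
    """Return a diagnosis with a SNOMED CT concept ID where one is available."""
    if vocabulary == "SNOMED CT":
        # Already recorded in SNOMED CT: pass the identifier through.
        return MappedDiagnosis(code, vocabulary, int(code))
    # Legacy Read Code: look it up, keeping the source code for audit.
    return MappedDiagnosis(code, vocabulary, READ_V2_TO_SNOMED.get(code))
```

Keeping the original code alongside the mapped concept means unmapped or mis-mapped records can be traced and corrected later, which matters for a practice that is still mid-migration.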
OMOP Common Data Model
Raw GP data passes through a Delta Lake medallion pipeline (Bronze → Silver → Gold) before being standardised into the OMOP CDM, the open international standard for observational health data maintained by the OHDSI community and used by regulators and research networks, including the EMA's DARWIN EU. This means every feature in our risk model (consultations, diagnoses, prescriptions) is expressed in a vendor-neutral, internationally portable format. Any NHS trust or ICS that already holds its data in OMOP can replicate this model without writing a bespoke ETL for it.
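The Bronze → Silver → Gold flow can be sketched in plain Python, without Delta Lake or Spark, to show what each stage is responsible for. This is a hedged illustration under several assumptions: the input column names (`patid`, `eventdate`, `code`), the function names, and the idea that the Gold stage emits rows shaped like the OMOP `condition_occurrence` table are all illustrative, and the concept lookup is supplied by the caller.

```python
from datetime import date


def to_silver(bronze_rows):
    """Bronze -> Silver: drop incomplete rows, de-duplicate, and apply types."""
    seen = set()
    silver = []
    for row in bronze_rows:
        patid = (row.get("patid") or "").strip()
        eventdate = (row.get("eventdate") or "").strip()
        code = (row.get("code") or "").strip()
        if not (patid and eventdate and code):
            continue  # incomplete raw record
        key = (patid, eventdate, code)
        if key in seen:
            continue  # exact duplicate of an earlier record
        seen.add(key)
        silver.append({
            "patient_id": int(patid),
            "event_date": date.fromisoformat(eventdate),
            "code": code,
        })
    return silver


def to_gold(silver_rows, concept_lookup):
    """Silver -> Gold: emit rows using OMOP condition_occurrence column names."""
    gold = []
    for i, row in enumerate(silver_rows, start=1):
        gold.append({
            "condition_occurrence_id": i,
            "person_id": row["patient_id"],
            # OMOP uses concept_id 0 when no standard concept matches.
            "condition_concept_id": concept_lookup.get(row["code"], 0),
            "condition_start_date": row["event_date"],
            "condition_source_value": row["code"],
        })
    return gold
```

Bronze keeps the data exactly as received, so the cleaning rules in `to_silver` can be changed and re-run without going back to the source system.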
QOF & ICD-10 Conditions
Long-term conditions are flagged using Quality and Outcomes Framework (QOF) register definitions, the same disease registers GP practices maintain for their annual contract reporting to NHS England. Conditions covered include Type 2 diabetes (qof_dm2), hypertension (qof_ht), asthma, COPD, depression, obesity, CKD, and cancer. ICD-10 mappings are maintained in the Silver layer for hospital linkage readiness, and SNOMED CT equivalents are stored in the OMOP Gold layer.
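As a rough sketch of how register flags such as `qof_dm2` and `qof_ht` could be derived, the example below tests a patient's recorded SNOMED CT concepts against a code set per register. The code sets shown are single-concept placeholders chosen for illustration; real QOF registers are defined by much larger published code clusters, and `QOF_CODE_SETS` and `qof_flags` are hypothetical names.

```python
# Placeholder code sets: one illustrative SNOMED CT concept per register.
# Real QOF registers are defined by far larger published code clusters.
QOF_CODE_SETS = {
    "qof_dm2": {44054006},  # Type 2 diabetes mellitus
    "qof_ht": {38341003},   # Hypertensive disorder
}


def qof_flags(patient_concepts):
    """Return a 0/1 flag per register for one patient's set of concept IDs."""
    return {
        flag: int(bool(codes & patient_concepts))
        for flag, codes in QOF_CODE_SETS.items()
    }
```

Each flag is 1 if the patient has at least one concept in that register's code set, which gives the model one binary feature per long-term condition.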
IMD — Index of Multiple Deprivation
Every patient is assigned an IMD quintile using the English Indices of Multiple Deprivation (MHCLG, 2019), the official measure of relative deprivation for small areas (LSOAs) in England. IMD is the second-strongest predictor of disengagement in our model, after age. The model's fairness audit measures whether the algorithm treats deprived communities equitably, a NICE ESF Tier B requirement for AI tools used in NHS settings. Equalized odds difference by IMD quintile is reported transparently with every model run.
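For readers unfamiliar with the metric, here is a minimal sketch of one common definition of equalized odds difference: the larger of the gap in true positive rate and the gap in false positive rate across groups. The source does not state which exact definition or library the audit uses, so this pure-Python version and its function names are assumptions for illustration.

```python
from collections import defaultdict


def rates_by_group(y_true, y_pred, groups):
    """Return {group: (true_positive_rate, false_positive_rate)}."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth:
            counts[group]["tp" if pred else "fn"] += 1
        else:
            counts[group]["fp" if pred else "tn"] += 1
    rates = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        if positives == 0 or negatives == 0:
            raise ValueError(f"group {group!r} needs both outcome classes")
        rates[group] = (c["tp"] / positives, c["fp"] / negatives)
    return rates


def equalized_odds_difference(y_true, y_pred, groups):
    """Largest between-group gap in TPR or FPR (0 means equal error rates)."""
    rates = rates_by_group(y_true, y_pred, groups)
    tprs = [tpr for tpr, _ in rates.values()]
    fprs = [fpr for _, fpr in rates.values()]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))
```

Here `groups` would hold each patient's IMD quintile and `y_pred` the thresholded risk flag. A value of 0 means every quintile sees the same true and false positive rates.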
NICE ESF · UK GDPR · NHS DSP Toolkit
This tool is designed to meet the NICE Evidence Standards Framework (ESF) for Digital Health Technologies, specifically Tier B (AI supporting clinical decisions). All risk scores are model outputs, not clinical findings, and require human review before any action, consistent with the safeguards on automated decision-making in UK GDPR Article 22. Data governance follows the NHS Data Security and Protection (DSP) Toolkit. The underlying dataset is synthetic (no real patient data leaves the NHS boundary), making this safe for demonstration and external validation.
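The human-review requirement can be expressed as a simple gate in code. The sketch below is illustrative only and is not a statement of what UK GDPR requires in practice: `RiskFlag`, `flag_for_review`, and `release_for_action` are hypothetical names, and the 0.96 threshold is taken from the dashboard's "≥96% disengagement risk" card.

```python
from dataclasses import dataclass
from typing import Optional

RISK_THRESHOLD = 0.96  # from the dashboard's "≥96% disengagement risk" card


@dataclass
class RiskFlag:
    patient_id: int
    risk_score: float                  # model output, not a clinical finding
    reviewed_by: Optional[str] = None  # set once a clinician has reviewed it


def flag_for_review(scores, threshold=RISK_THRESHOLD):
    """Return flags for patients at or above the threshold, highest risk first."""
    flagged = [RiskFlag(pid, s) for pid, s in scores.items() if s >= threshold]
    return sorted(flagged, key=lambda f: f.risk_score, reverse=True)


def release_for_action(flag):
    """Refuse to pass a flag downstream until a named reviewer has signed it off."""
    if not flag.reviewed_by:
        raise PermissionError("risk flag has not been reviewed by a clinician")
    return {
        "patient_id": flag.patient_id,
        "risk_score": flag.risk_score,
        "reviewed_by": flag.reviewed_by,
    }
```

Raising an error rather than silently skipping unreviewed flags makes it harder for a downstream job to act on a raw model score by accident.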
NHS Datasets & Linkage Awareness
The pipeline is architecturally aware of the broader NHS data ecosystem. Regional performance is benchmarked against Fingertips (the Public Health Profiles published by the Office for Health Improvement and Disparities) at ICB level. Mental health outcomes reference the waiting time standards for NHS Talking Therapies, formerly IAPT (Improving Access to Psychological Therapies). Practice quality ratings from the CQC (Care Quality Commission) are included as model features. NHS England Integrated Care Board (ICB) boundaries, which replaced CCGs in July 2022, are used for geographic aggregation. Future linkage to HES (Hospital Episode Statistics), ONS mortality, and NHS 111/999 data is supported by the OMOP schema already in place.
Synthetic data only. All patients in this pilot are generated from CPRD statistical distributions — no real NHS patient records are processed outside secure NHS environments. This makes the model safe to demonstrate, evaluate, and validate externally before NHS IG approval.