AI in Clinical Medicine, ISSN 2819-7437 online, Open Access
Article copyright, the authors; Journal compilation copyright, AI Clin Med and Elmer Press Inc
Journal website https://aicm.elmerpub.com

Review

Volume 2, April 2026, e21


Integrating Artificial Intelligence Into Sepsis Care: A Narrative Review of Predictive Models and Implementation Pathways

Figures

Figure 1. Workflow of artificial intelligence (AI)-based sepsis prediction in the intensive care unit (ICU). This figure illustrates the end-to-end pipeline of AI-driven sepsis prediction. Patient data, including vital signs, laboratory values, and electronic health record (EHR) inputs, are continuously collected and pre-processed (data cleaning, normalization, and handling of missing values). These data are then fed into machine learning or deep learning models (e.g., random forest, recurrent neural networks, or temporal convolutional networks), which generate dynamic risk scores in real time. The output is integrated into clinical workflows via EHR-based alert systems, prompting early clinical evaluation and intervention (e.g., antibiotic initiation, fluid resuscitation). The figure also highlights feedback loops for model refinement and the role of clinician oversight to ensure safe and effective implementation.
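The scoring-and-alerting stage of the pipeline in Figure 1 can be sketched in miniature. The feature names, normalization ranges, weights, and alert threshold below are purely illustrative stand-ins for a trained model, not values drawn from any system discussed in this review:

```python
# Minimal sketch of the risk-scoring and alerting stage of an AI sepsis
# pipeline. All numeric constants here are hypothetical.
import math

# Hypothetical per-feature normalization ranges (min, max) and weights.
FEATURES = {
    "heart_rate":  {"range": (40.0, 180.0), "weight": 1.2},
    "resp_rate":   {"range": (8.0, 40.0),   "weight": 1.5},
    "temperature": {"range": (34.0, 41.0),  "weight": 0.8},
    "lactate":     {"range": (0.5, 10.0),   "weight": 2.0},
}
ALERT_THRESHOLD = 0.7  # hypothetical operating point

def risk_score(vitals):
    """Normalize inputs to [0, 1], impute missing values with 0.5,
    and map a weighted sum through a sigmoid to a risk in (0, 1)."""
    z = 0.0
    for name, spec in FEATURES.items():
        lo, hi = spec["range"]
        x = vitals.get(name)                 # missing-value handling
        x = 0.5 if x is None else (x - lo) / (hi - lo)
        z += spec["weight"] * (x - 0.5)      # center around mid-range
    return 1.0 / (1.0 + math.exp(-z))        # sigmoid -> dynamic risk

def should_alert(vitals):
    """EHR-style alert fires when the risk score crosses the threshold."""
    return risk_score(vitals) >= ALERT_THRESHOLD
```

In a deployed system, the weighted sum would be replaced by the trained model's output (e.g., a recurrent network over time-series inputs), and alerts would be routed through the EHR under clinician oversight rather than returned to the caller.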
Figure 2. Validation hierarchy of artificial intelligence (AI)-based sepsis prediction models. This figure depicts the sequential stages of validation required for AI models prior to clinical implementation. Internal validation evaluates model performance within the training dataset using techniques such as cross-validation. External validation assesses generalizability across independent datasets or institutions with different patient populations and clinical settings. Prospective validation represents real-world testing in clinical environments, measuring impact on patient outcomes, workflow integration, and decision-making. The figure emphasizes that while many models demonstrate strong performance in retrospective analyses, relatively few have undergone prospective, multicenter validation. Regulatory oversight and reporting standards (e.g., CONSORT-AI) are also highlighted as essential components for safe clinical adoption.
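The internal-validation stage described above (cross-validation within the training dataset) can be sketched as follows. Here `fit` and `evaluate` are placeholder callables standing in for any of the models and metrics in this review:

```python
# Sketch of internal validation via k-fold cross-validation: the single
# training dataset is partitioned into k folds, and each fold serves
# once as the held-out test set.

def kfold_indices(n, k):
    """Partition indices 0..n-1 into k contiguous, near-equal folds."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(data, k, fit, evaluate):
    """Return one score per fold; each fold is tested exactly once."""
    scores = []
    for test in kfold_indices(len(data), k):
        held_out = set(test)
        train = [i for i in range(len(data)) if i not in held_out]
        model = fit([data[i] for i in train])
        scores.append(evaluate(model, [data[i] for i in test]))
    return scores
```

External and prospective validation differ precisely in that the test data come from outside this loop: an independent institution's dataset, or live clinical use, rather than folds of the training set.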

Tables

Table 1. Comparative Summary of AI Models for Sepsis Prediction

| Model | Year | Institution | Data inputs | AUROC (best reported) | Validation type |
|---|---|---|---|---|---|
| Epic Sepsis Model (ESM) | 2018 | Epic Systems (USA) | Proprietary EHR variables | 0.63 | External |
| InSight | 2016 | UCSF & University of Chicago | 6 vital signs | > 0.90 | Internal + external |
| Sepsis Watch | 2019 | Duke University + HBI Solutions | Real-time EHR data | 0.85–0.90 | Prospective (ED) |
| DeepAISE | 2021 | UCSD & Emory Univ. | 65 clinical variables | 0.90 (internal), 0.87 (external) | Internal + external |
| TREWS | 2022 | Johns Hopkins University | Hybrid rules + ML | Not reported (improved outcomes) | Prospective (multi-site) |
| MGP-TCN/GRU-D-MGP-TCN | 2019–2024 | ETH Zurich + collaborators | ICU time-series data | Up to 0.99 | Internal |

AI: artificial intelligence; AUROC: area under the receiver operating characteristic curve; EHR: electronic health record; ICU: intensive care unit; ML: machine learning.


Table 2. Comparison of AI Models With Conventional Sepsis Prediction Tools

| Tool/model | Typical AUROC | Key strengths | Main limitations |
|---|---|---|---|
| SIRS | 0.50–0.65 | Simple, highly sensitive | Poor specificity, many false positives |
| qSOFA/SOFA | 0.60–0.90 (varies by cohort) | Better prognostic accuracy than SIRS | Requires labs, performance varies by setting |
| APACHE II | 0.80–0.83 | Widely used for ICU mortality prediction | Complex, not designed for early sepsis detection |
| Epic Sepsis Model (ESM) | 0.63 (external validation) | Widely deployed; EHR integrated | Poor external performance, non-transparent |
| InSight | 0.88–0.92 | Minimal data (six vitals), robust across sites | Limited prospective validation |
| DeepAISE | 0.87–0.90 | High accuracy, interpretable factors | High data requirements |
| MGP-TCN/GRU-D hybrid | 0.90–0.99 | Handles time-series ICU data; very high accuracy | Computationally intensive, early stage |
| Random forest | 0.85–0.95 | Robust, interpretable variable importance | Performance varies by dataset |

AI: artificial intelligence; APACHE II: Acute Physiology and Chronic Health Evaluation II; AUROC: area under the receiver operating characteristic curve; EHR: electronic health record; ICU: intensive care unit; qSOFA: quick Sequential Organ Failure Assessment; SIRS: systemic inflammatory response syndrome.
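Since both tables summarize models by AUROC, a brief sketch of how that statistic is computed may clarify the comparisons: AUROC equals the probability that a randomly chosen septic case receives a higher risk score than a randomly chosen non-septic control (the Mann-Whitney U formulation, with ties counted as half):

```python
# Sketch of AUROC as the pairwise-comparison (Mann-Whitney U) statistic.

def auroc(labels, scores):
    """labels: 1 = sepsis, 0 = no sepsis; scores: model risk outputs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative cases")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # case ranked above control
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))
```

On this scale, 0.5 is chance-level discrimination, which is why scores such as SIRS (0.50–0.65) offer little beyond chance while values approaching 0.99 (as reported for the MGP-TCN family on internal validation) indicate near-perfect retrospective separation.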