AI in Clinical Medicine, ISSN 2819-7437 online, Open Access
Article copyright, the authors; Journal compilation copyright, AI Clin Med and Elmer Press Inc
Journal website https://aicm.elmerpub.com

Review

Volume 2, April 2026, e21


Integrating Artificial Intelligence Into Sepsis Care: A Narrative Review of Predictive Models and Implementation Pathways

Figures

Figure 1. Workflow of artificial intelligence (AI)-based sepsis prediction in the intensive care unit (ICU). This figure illustrates the end-to-end pipeline of AI-driven sepsis prediction. Patient data, including vital signs, laboratory values, and electronic health record (EHR) inputs, are continuously collected and pre-processed (data cleaning, normalization, and handling of missing values). These data are then fed into machine learning or deep learning models (e.g., random forest, recurrent neural networks, or temporal convolutional networks), which generate dynamic risk scores in real time. The output is integrated into clinical workflows via EHR-based alert systems, prompting early clinical evaluation and intervention (e.g., antibiotic initiation, fluid resuscitation). The figure also highlights feedback loops for model refinement and the role of clinician oversight to ensure safe and effective implementation.
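The scoring-and-alerting stage of the pipeline in Figure 1 can be sketched in miniature. The feature names, normalization ranges, weights, and alert threshold below are purely illustrative stand-ins for a trained model, not values drawn from any system discussed in this review:

```python
# Minimal sketch of the risk-scoring and alerting stage of an AI sepsis
# pipeline. All numeric constants here are hypothetical.
import math

# Hypothetical per-feature normalization ranges (min, max) and weights.
FEATURES = {
    "heart_rate":  {"range": (40.0, 180.0), "weight": 1.2},
    "resp_rate":   {"range": (8.0, 40.0),   "weight": 1.5},
    "temperature": {"range": (34.0, 41.0),  "weight": 0.8},
    "lactate":     {"range": (0.5, 10.0),   "weight": 2.0},
}
ALERT_THRESHOLD = 0.7  # hypothetical operating point

def risk_score(vitals):
    """Normalize inputs to [0, 1], impute missing values with 0.5,
    and map a weighted sum through a sigmoid to a risk in (0, 1)."""
    z = 0.0
    for name, spec in FEATURES.items():
        lo, hi = spec["range"]
        x = vitals.get(name)                 # missing-value handling
        x = 0.5 if x is None else (x - lo) / (hi - lo)
        z += spec["weight"] * (x - 0.5)      # center around mid-range
    return 1.0 / (1.0 + math.exp(-z))        # sigmoid -> dynamic risk

def should_alert(vitals):
    """EHR-style alert fires when the risk score crosses the threshold."""
    return risk_score(vitals) >= ALERT_THRESHOLD
```

In a deployed system, the weighted sum would be replaced by the trained model's output (e.g., a recurrent network over time-series inputs), and alerts would be routed through the EHR under clinician oversight rather than returned to the caller.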
Figure 2. Validation hierarchy of artificial intelligence (AI)-based sepsis prediction models. This figure depicts the sequential stages of validation required for AI models prior to clinical implementation. Internal validation evaluates model performance within the training dataset using techniques such as cross-validation. External validation assesses generalizability across independent datasets or institutions with different patient populations and clinical settings. Prospective validation represents real-world testing in clinical environments, measuring impact on patient outcomes, workflow integration, and decision-making. The figure emphasizes that while many models demonstrate strong performance in retrospective analyses, relatively few have undergone prospective, multicenter validation. Regulatory oversight and reporting standards (e.g., CONSORT-AI) are also highlighted as essential components for safe clinical adoption.
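The internal-validation stage described above (cross-validation within the training dataset) can be sketched as follows. Here `fit` and `evaluate` are placeholder callables standing in for any of the models and metrics in this review:

```python
# Sketch of internal validation via k-fold cross-validation: the single
# training dataset is partitioned into k folds, and each fold serves
# once as the held-out test set.

def kfold_indices(n, k):
    """Partition indices 0..n-1 into k contiguous, near-equal folds."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(data, k, fit, evaluate):
    """Return one score per fold; each fold is tested exactly once."""
    scores = []
    for test in kfold_indices(len(data), k):
        held_out = set(test)
        train = [i for i in range(len(data)) if i not in held_out]
        model = fit([data[i] for i in train])
        scores.append(evaluate(model, [data[i] for i in test]))
    return scores
```

External and prospective validation differ precisely in that the test data come from outside this loop: an independent institution's dataset, or live clinical use, rather than folds of the training set.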

Tables

Table 1. Comparative Summary of AI Models for Sepsis Prediction

| Model | Year | Institution | Data inputs | AUROC (best reported) | Validation type |
|---|---|---|---|---|---|
| Epic Sepsis Model (ESM) | 2018 | Epic Systems (USA) | Proprietary EHR variables | 0.63 | External |
| InSight | 2016 | UCSF & University of Chicago | 6 vital signs | > 0.90 | Internal + external |
| Sepsis Watch | 2019 | Duke University + HBI Solutions | Real-time EHR data | 0.85–0.90 | Prospective (ED) |
| DeepAISE | 2021 | UCSD & Emory Univ. | 65 clinical variables | 0.90 (internal), 0.87 (external) | Internal + external |
| TREWS | 2022 | Johns Hopkins University | Hybrid rules + ML | Not reported (improved outcomes) | Prospective (multi-site) |
| MGP-TCN/GRU-D-MGP-TCN | 2019–2024 | ETH Zurich + collaborators | ICU time-series data | Up to 0.99 | Internal |

AI: artificial intelligence; AUROC: area under the receiver operating characteristic curve; EHR: electronic health record; ICU: intensive care unit; ML: machine learning.


Table 2. Comparison of AI Models With Conventional Sepsis Prediction Tools

| Tool/model | Typical AUROC | Key strengths | Main limitations |
|---|---|---|---|
| SIRS | 0.50–0.65 | Simple, highly sensitive | Poor specificity, many false positives |
| qSOFA/SOFA | 0.60–0.90 (varies by cohort) | Better prognostic accuracy than SIRS | Requires labs, performance varies by setting |
| APACHE II | 0.80–0.83 | Widely used for ICU mortality prediction | Complex, not designed for early sepsis detection |
| Epic Sepsis Model (ESM) | 0.63 (external validation) | Widely deployed; EHR integrated | Poor external performance, non-transparent |
| InSight | 0.88–0.92 | Minimal data (six vitals), robust across sites | Limited prospective validation |
| DeepAISE | 0.87–0.90 | High accuracy, interpretable factors | High data requirements |
| MGP-TCN/GRU-D hybrid | 0.90–0.99 | Handles time-series ICU data; very high accuracy | Computationally intensive, early stage |
| Random forest | 0.85–0.95 | Robust, interpretable variable importance | Performance varies by dataset |

AI: artificial intelligence; APACHE II: Acute Physiology and Chronic Health Evaluation II; AUROC: area under the receiver operating characteristic curve; EHR: electronic health record; ICU: intensive care unit; qSOFA: quick Sequential Organ Failure Assessment; SIRS: systemic inflammatory response syndrome.
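Since both tables summarize models by AUROC, a brief sketch of how that statistic is computed may clarify the comparisons: AUROC equals the probability that a randomly chosen septic case receives a higher risk score than a randomly chosen non-septic control (the Mann-Whitney U formulation, with ties counted as half):

```python
# Sketch of AUROC as the pairwise-comparison (Mann-Whitney U) statistic.

def auroc(labels, scores):
    """labels: 1 = sepsis, 0 = no sepsis; scores: model risk outputs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative cases")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # case ranked above control
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))
```

On this scale, 0.5 is chance-level discrimination, which is why scores such as SIRS (0.50–0.65) offer little beyond chance while values approaching 0.99 (as reported for the MGP-TCN family on internal validation) indicate near-perfect retrospective separation.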