Evaluating ChatGPT-5o as a Clinical Decision-Support Tool in Inflammatory Bowel Disease: A Pilot Study of Guideline Adherence and Clinical Agreement

DOI:

https://doi.org/10.14740/

Keywords:

Inflammatory bowel disease, Crohn’s disease, Ulcerative colitis, Artificial intelligence, Clinical decision support, Machine learning, Large language models

Abstract

Background: Inflammatory bowel disease (IBD) presents complex management challenges. While care is guided by clinical expertise and published guidelines, artificial intelligence (AI) is being explored as an adjunct. This study evaluates ChatGPT-5o’s ability to provide IBD management recommendations by comparing its outputs with real-world clinical decisions and European Crohn’s and Colitis Organisation (ECCO) guidelines.

Methods: We retrospectively analyzed 19 anonymized IBD cases spanning initial and complicated disease. ChatGPT-5o generated management recommendations, which were compared with the clinicians’ treatments and ECCO guidelines across seven therapeutic domains (5-aminosalicylic acid [5-ASA], steroids, antibiotics, thiopurines, anti-tumor necrosis factor [TNF], anti-integrins, and anti-interleukin-23 [IL-23]) plus diagnostic workup, symptom management, surgical consultation, and monitoring. Agreement was quantified using Cohen’s kappa (κ).
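For readers unfamiliar with the statistic, the following is a minimal sketch of how Cohen’s kappa can be computed for one domain. It is not the authors’ analysis code: the function name `cohen_kappa`, the 0/1 coding (1 = therapy recommended, 0 = not recommended), and the example ratings are all hypothetical and chosen only to illustrate the formula κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is agreement expected by chance.

```python
import math
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)

    # Number of cases on which the two raters gave the same label.
    agree = sum(a == b for a, b in zip(rater_a, rater_b))

    # Chance agreement from each rater's marginal label counts (kept as integers).
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    chance = sum(counts_a[c] * counts_b[c] for c in counts_a.keys() | counts_b.keys())

    # If both raters used one and the same label for every case, p_e = 1 and
    # kappa is mathematically undefined (0/0); report NaN rather than guess.
    if chance == n * n:
        return math.nan

    p_o = agree / n
    p_e = chance / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings for 19 cases in a single domain (illustrative only,
# not data from the study): 1 = recommended, 0 = not recommended.
clinician = [1] * 10 + [0] * 9
model = [1] * 8 + [0] * 2 + [0] * 9

kappa = cohen_kappa(clinician, model)
print(round(kappa, 3))  # 0.791
```

In this invented example the raters agree on 17 of 19 cases (p_o ≈ 0.895), but because about half of that agreement would be expected by chance (p_e ≈ 0.496), kappa is lower than the raw agreement rate.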

Results: ChatGPT-5o showed perfect agreement (κ = 1.000) with providers and/or guidelines for antibiotics, diagnostic workup, symptom management, surgical consultation, monitoring, and anti-IL-23. Substantial agreement (κ ≈ 0.6 to 0.8) was observed for 5-ASA and steroids. Fair to moderate agreement (κ ≈ 0.3 to 0.5) occurred for anti-TNF and anti-integrins, reflecting variability in complex scenarios. Thiopurines demonstrated the lowest concordance, with none-to-slight agreement in human-AI comparisons but closer alignment between ChatGPT-5o and ECCO guidelines, which may reflect evolving practice patterns and safety considerations.
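The verbal labels above (none to slight, fair, moderate, substantial) match a commonly used interpretation scale for kappa. The abstract does not state which cut-offs were applied, so the thresholds in this sketch are an assumption, and `describe_kappa` is a hypothetical helper rather than part of the study.

```python
def describe_kappa(kappa):
    """Map a kappa value to a verbal label using assumed, commonly cited cut-offs."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa <= 0:
        return "no agreement"

    # (upper bound, label) pairs; these bands are an assumption, not taken
    # from the article itself.
    bands = [
        (0.20, "none to slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label


# Illustrative values only.
for value in (0.10, 0.45, 0.70, 1.00):
    print(value, describe_kappa(value))
```

Under these assumed bands, a value of exactly 1.000 falls in the top category, which the abstract describes as perfect agreement.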

Conclusions: ChatGPT-5o closely aligns with clinicians and ECCO guidelines in multiple standardized domains, supporting its potential as a decision-support tool to enhance guideline adherence and broaden access to IBD expertise. Variability in biologic selection and thiopurine use underscores the need for expert oversight and patient-specific judgment. Prospective studies should assess longitudinal outcomes and integration strategies to ensure safe, patient-centered deployment.

Published

2025-12-15

Section

Original Article

How to Cite

Aillaud-De-Uriarte, D., Cendejas-Higuera, A., Acosta-Marquez, E., Hernandez-Flores, L. A., Reyes-Bastidas, M., & Manzano-Cortes, H. (2025). Evaluating ChatGPT-5o as a Clinical Decision-Support Tool in Inflammatory Bowel Disease: A Pilot Study of Guideline Adherence and Clinical Agreement. AI in Clinical Medicine, 1, e12. https://doi.org/10.14740/