Evaluating ChatGPT-5o as a Clinical Decision-Support Tool in Inflammatory Bowel Disease: A Pilot Study of Guideline Adherence and Clinical Agreement
DOI:
https://doi.org/10.14740/
Keywords:
Inflammatory bowel disease, Crohn’s disease, Ulcerative colitis, Artificial intelligence, Clinical decision support, Machine learning, Large language models
Abstract
Background: Inflammatory bowel disease (IBD) presents complex management challenges. While care is guided by expertise and guidelines, artificial intelligence (AI) is being explored as an adjunct. This study evaluates ChatGPT-5o’s ability to provide IBD recommendations by comparing its outputs with real-world decisions and European Crohn’s and Colitis Organisation (ECCO) guidelines.
Methods: We performed a retrospective analysis of 19 anonymized IBD cases spanning initial and complicated disease. ChatGPT-5o generated management recommendations, which were compared with clinician treatments and ECCO guidelines across seven therapeutic domains (5-aminosalicylic acid (5-ASA), steroids, antibiotics, thiopurines, anti-tumor necrosis factor (TNF), anti-integrins, anti-interleukin-23 (IL-23)) plus diagnostic workup, symptom management, surgical consultation, and monitoring. Agreement was quantified using Cohen’s Kappa.
Results: ChatGPT-5o showed perfect agreement (κ = 1.000) with providers and/or guidelines for antibiotics, diagnostic workup, symptom management, surgical consultation, monitoring, and anti-IL-23. Substantial agreement (κ ≈ 0.6–0.8) was observed for 5-ASA and steroids. Fair-to-moderate agreement (κ ≈ 0.3–0.5) occurred for anti-TNF and anti-integrins, reflecting variability in complex scenarios. Thiopurines demonstrated the lowest concordance, with none-to-slight agreement in human-AI comparisons but higher alignment of ChatGPT-5o with ECCO, suggesting evolving practice patterns and safety considerations.
Conclusions: ChatGPT-5o closely aligns with clinicians and ECCO guidelines in multiple standardized domains, supporting its potential as a decision-support tool to enhance guideline adherence and broaden access to IBD expertise. Variability in biologic selection and thiopurine use underscores the need for expert oversight and patient-specific judgment. Prospective studies should assess longitudinal outcomes and integration strategies to ensure safe, patient-centered deployment.
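For readers unfamiliar with the agreement statistic named in the Methods, Cohen's kappa compares the observed agreement between two raters, p_o, with the agreement expected by chance from each rater's marginal rates, p_e, as κ = (p_o − p_e) / (1 − p_e). The following is a minimal sketch of that calculation, not the authors' analysis code. The function name `cohens_kappa` and the binary "recommended / not recommended" ratings are hypothetical and chosen only for illustration; they are not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)

    # Observed agreement: fraction of cases where both raters gave the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, multiply the two raters' marginal
    # frequencies, then sum over labels.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if p_e == 1.0:
        # Both raters used one identical label for every case, so kappa is 0/0.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical ratings for one therapeutic domain across 10 cases
# (1 = therapy recommended, 0 = not recommended). Not study data.
clinician = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
model = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]

kappa = cohens_kappa(clinician, model)
print(f"kappa = {kappa:.3f}")
```

In this made-up example the raters agree on 8 of 10 cases (p_o = 0.80) and chance agreement is p_e = 0.52, giving κ ≈ 0.58. Note that kappa is undefined when both raters give the same single label to every case, which is one reason small samples with little variation in a domain need careful interpretation.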
License
Copyright (c) 2025 The authors

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.