AI in Clinical Medicine, ISSN 2819-7437 online, Open Access
Article copyright, the authors; Journal compilation copyright, AI Clin Med and Elmer Press Inc
Journal website https://aicm.elmerpub.com

Original Article

Volume 1, 2025, e12


Evaluating ChatGPT-5o as a Clinical Decision-Support Tool in Inflammatory Bowel Disease: A Pilot Study of Guideline Adherence and Clinical Agreement

Figures

Figure 1. Bar chart of Cohen’s Kappa values by category and comparison.
Figure 2. Cohen’s Kappa agreement levels across categories. The color scale represents the level of agreement: dark blue (1.0), perfect agreement (e.g., anti-IL-23, antibiotics, diagnostic workup, symptom management, surgical consultation, continuous monitoring); medium blue (≈ 0.6–0.8), substantial agreement; light blue (≈ 0.3–0.5), moderate to fair agreement, indicating some variability in alignment.

Tables

Table 1. Interpretation of Agreement Levels

Agreement level | Interpretation
Perfect (Kappa = 1.000) | Complete alignment; no variation between recommendations.
Substantial (Kappa ≈ 0.6–0.8) | High agreement with minor differences, reflecting consistent recommendations.
Moderate to fair (Kappa ≈ 0.3–0.5) | Notable variability, indicating different interpretations or thresholds in decision-making.
Non-significant | Agreement is not statistically significant; recommendations may vary substantially.
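
Table 1 interprets values of Cohen’s Kappa. For reference, Kappa rescales the observed proportion of agreement between two raters, p_o, by the agreement p_e expected by chance from each rater’s label frequencies, where p_{A,k} and p_{B,k} are the proportions of cases that raters A and B assign to label k. The worked numbers below are hypothetical (8 cases, a binary recommend / do-not-recommend label, 7 agreements, rater A recommending in 4 cases and rater B in 5) and are not taken from this study:

$$\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad p_e = \sum_{k} p_{A,k}\, p_{B,k}$$

$$p_o = \tfrac{7}{8} = 0.875, \qquad p_e = (0.5)(0.625) + (0.5)(0.375) = 0.5, \qquad \kappa = \frac{0.875 - 0.5}{1 - 0.5} = 0.75$$

A value of 0.75 would fall in the "Substantial" band of Table 1, and Kappa reaches 1.000 only when the raters agree on every case (p_o = 1, with p_e < 1).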

 

Table 2. Summary of Cohen’s Kappa Agreement for Each Category

Category | Human vs. guidelines Kappa | Human vs. ChatGPT Kappa | ChatGPT vs. guidelines Kappa | Agreement level | Significance
5-ASA | 0.689 | 0.689 | 0.604 | Substantial | Yes
Steroids | 0.406 | 0.578 | 0.771 | Moderate to substantial | Yes
Anti-TNF | 0.289 | 0.345 | 0.685 | Fair to substantial | Yes
Anti-integrins | 0.486 | 0.336 | 0.771 | Fair to substantial | Yes
Anti-IL-23 | 1.000 | 1.000 | 1.000 | Perfect | Yes
Thiopurines | 0.441 | 0.313 | 0.771 | Fair to substantial | No
Antibiotics | 0.642 | 1.000 | 1.000 | Substantial to perfect | Yes
Diagnostic workup | 1.000 | 1.000 | 1.000 | Perfect | Yes
Symptom management | - | 1.000 | 1.000 | Perfect | Yes
Surgical consult | - | 1.000 | 1.000 | Perfect | Yes
Continuous monitoring | - | 1.000 | 1.000 | Perfect | Yes

5-ASA: 5-aminosalicylic acid; IL: interleukin; TNF: tumor necrosis factor.
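
Each Kappa in Table 2 compares two sets of paired per-case recommendations. The Python sketch below shows one way such a value can be computed and mapped onto the bands of Table 1. It assumes that each source (clinician, guideline, or ChatGPT) gives one categorical label per case within a category; the function names `cohens_kappa` and `agreement_level` and the example labels are hypothetical, and the band boundaries are taken literally from Table 1’s approximate ranges, which is itself an assumption. It is not the authors’ analysis code.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: share of cases where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for every case: kappa is undefined.
        raise ZeroDivisionError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


def agreement_level(kappa: float) -> str:
    """Map kappa onto Table 1's bands (hypothetical: boundaries read literally)."""
    if kappa == 1.0:
        return "Perfect"
    if 0.6 <= kappa <= 0.8:
        return "Substantial"
    if 0.3 <= kappa <= 0.5:
        return "Moderate to fair"
    return "Outside the bands listed in Table 1"


if __name__ == "__main__":
    # Hypothetical recommend ("yes") / do-not-recommend ("no") labels for 8 cases.
    human = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
    chatgpt = ["yes", "yes", "no", "yes", "yes", "no", "yes", "no"]
    kappa = cohens_kappa(human, chatgpt)
    print(kappa, agreement_level(kappa))
```

Applied to values from Table 2, `agreement_level(0.604)` returns "Substantial", while 0.289 (anti-TNF, human vs. guidelines) falls outside the printed bands, because Table 1 leaves values below 0.3, between 0.5 and 0.6, and between 0.8 and 1.0 unassigned.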

 

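Table 3. Explanation of Cohen’s Kappa Agreement for Each Category

Category | Human vs. guidelines Kappa | Human vs. guidelines P-value | Human vs. ChatGPT Kappa | Human vs. ChatGPT P-value | ChatGPT vs. guidelines Kappa | ChatGPT vs. guidelines P-value
5-ASA | 0.689 | 0.002 | 0.689 | 0.002 | 0.604 | 0.008
Steroids | 0.406 | 0.028 | 0.578 | 0.005 | 0.771 | 0.001
Anti-TNF | 0.289 | 0.073 | 0.345 | 0.047 | 0.685 | 0.003
Anti-integrins | 0.486 | 0.013 | 0.336 | 0.05 | 0.771 | 0.001
Anti-IL-23 | 1.000 | 0.000 | 1.000 | 0.000 | 1.000 | 0.000
Thiopurines | 0.441 | 0.054 | 0.313 | 0.161 | 0.771 | 0.001
Antibiotics | 0.642 | 0.003 | 1.000 | 0.000 | 1.000 | 0.000
Diagnostic workup | 1.000 | 0.000 | 1.000 | 0.000 | 1.000 | 0.000
Symptom management | - | - | 1.000 | 0.000 | 1.000 | 0.000
Surgical consult | - | - | 1.000 | 0.000 | 1.000 | 0.000
Continuous monitoring | - | - | 1.000 | 0.000 | 1.000 | 0.000

5-ASA: 5-aminosalicylic acid; IL: interleukin; TNF: tumor necrosis factor.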
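
Table 3 reports P-values without naming the test used to obtain them. One assumption-light way to test whether an observed Kappa exceeds chance is a Monte Carlo permutation test, sketched below in Python. It is a swapped-in illustration, not necessarily the authors’ method (an asymptotic z-test on Kappa is another common choice), and the function name `kappa_permutation_p_value`, the default of 2,000 permutations, and the seed are hypothetical.

```python
import random
from typing import Hashable, Sequence


def kappa_permutation_p_value(
    rater_a: Sequence[Hashable],
    rater_b: Sequence[Hashable],
    n_permutations: int = 2000,
    seed: int = 0,
) -> float:
    """One-sided Monte Carlo permutation p-value for H0: agreement is due to chance.

    Shuffling rater_b keeps both raters' label frequencies, so the chance
    agreement p_e in kappa = (p_o - p_e) / (1 - p_e) stays fixed and kappa
    increases with the number of matching cases. Counting permutations with at
    least as many matches as observed is therefore equivalent to comparing kappa.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of cases")
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed_matches = sum(a == b for a, b in zip(rater_a, rater_b))
    shuffled = list(rater_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the case pairing, keep the marginals
        matches = sum(a == b for a, b in zip(rater_a, shuffled))
        if matches >= observed_matches:
            at_least_as_extreme += 1
    # Add-one correction: the observed labelling counts as one permutation.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical labels: 20 cases, two raters in complete agreement.
    labels = ["yes"] * 10 + ["no"] * 10
    print(kappa_permutation_p_value(labels, labels))
```

For two raters who agree on all 20 hypothetical cases the estimate is far below 0.05, and for raters who disagree on every case it is 1.0. Because of the add-one correction, a permutation estimate is never exactly zero; its smallest possible value is 1/(n_permutations + 1).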