Generative artificial intelligence as a source of breast cancer information for patients: proceed with caution

Key messages

A study was conducted to analyze the use of ChatGPT 3.5 as a source of information for patients with breast cancer (BC). The chatbot was asked 20 questions considered of potential interest, at three different time points. Responses were assessed for accuracy (measured on a 4-point Likert scale), clinical concordance (similarity of the information provided by the chatbot to clinicians' answers, measured on a 5-point Likert scale), and readability (assessed with the Flesch-Kincaid scale). Agreement among the responses given at the three time points was estimated with the intraclass correlation coefficient (ICC).
Overall mean accuracy and clinical concordance were 1.88 (range, 1.0-3.0; 95% CI, 1.42-1.94) and 2.79 (range, 1.0-5.0; 95% CI, 1.94-3.64), respectively. Mean readability was poor, with a score of 37.9 (range, 18.0-60.5), and agreement across repetitions was high (ICC, 0.73; 95% CI, 0.57-0.90; p <0.001). A weak correlation was observed between ease of reading and better clinical concordance (-0.15; p = 0.025), whereas no correlation emerged between accuracy and readability (0.05; p = 0.079). The mean number of references provided was 1.97 (range, 1-4; total, 119), and a peer-reviewed article was cited on only one occasion.
ChatGPT 3.5 gave incorrect answers 24% of the time and directed the user to nonexistent websites 41% of the time. Overall, the findings underline the importance of cautioning patients about using this artificial intelligence tool to obtain medical information.

Abstract

Background

This study evaluated the accuracy, clinical concordance, and readability of the chatbot interface Chat Generative Pre-trained Transformer (ChatGPT) 3.5 as a source of breast cancer information for patients.

Methods

Twenty questions that patients are likely to ask ChatGPT were identified by breast cancer advocates.
These were posed to ChatGPT 3.5 in July 2023 and were repeated three times.
Responses were graded in two domains: accuracy (4-point Likert scale, 4 = worst) and clinical concordance (information is clinically similar to physician response; 5-point Likert scale, 5 = not similar at all).
The concordance of responses with repetition was estimated using intraclass correlation coefficient (ICC) of word counts.
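The abstract does not state which ICC form was used. As a minimal sketch, assuming a two-way random-effects, absolute-agreement, single-measure ICC (Shrout and Fleiss ICC(2,1), one common choice), agreement of word counts could be computed from a table with one row per question and one column per repetition. The function name `icc_2_1` and the word counts in the example are hypothetical, and the sketch returns only the point estimate, not the 95% CI or p-value reported in the study.

```python
from statistics import fmean


def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings[i][j] is the value for subject i (here: a question) under
    rater j (here: one repetition of the same prompt).
    """
    n = len(ratings)       # number of subjects (rows)
    k = len(ratings[0])    # number of raters/repetitions (columns)
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with at least 2 rows and 2 columns")

    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    # Mean squares
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


if __name__ == "__main__":
    # Hypothetical word counts: 4 questions (rows) x 3 repetitions (columns).
    word_counts = [
        [310, 295, 322],
        [180, 201, 176],
        [410, 398, 441],
        [250, 262, 240],
    ]
    print(f"ICC(2,1) = {icc_2_1(word_counts):.2f}")
```

Values near 1 mean that response length changes little between repetitions relative to how much it varies between questions.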
Response readability was calculated using the Flesch Kincaid readability scale.
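The reported scores (mean 37.9, range 18.0-60.5) sit on a 0-100 scale where higher means easier to read, which matches the Flesch Reading Ease formula rather than the Flesch-Kincaid Grade Level; that reading is an assumption. A minimal sketch of Reading Ease, 206.835 - 1.015 × (words/sentences) - 84.6 × (syllables/words), follows. The syllable counter is a crude vowel-group heuristic, not the tool the authors used, so its scores will differ somewhat from dedicated readability software. On Flesch's interpretation bands, scores of 30-50 are classed as "difficult".

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic: count runs of vowels (treating 'y' as a vowel),
    discount a trailing silent 'e', and return at least 1."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # "care" -> 1, but keep "table" -> 2 and one-vowel words like "the" -> 1
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher = easier (roughly 0-100 for ordinary prose)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    # Hypothetical patient-facing text, for illustration only.
    sample = "Talk to your care team about your options. They can explain each step."
    print(round(flesch_reading_ease(sample), 1))
```

Long sentences and polysyllabic clinical vocabulary both pull the score down, which is why technical medical answers tend to land in the "difficult" range.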
References were requested and verified.

Results

The overall average accuracy was 1.88 (range 1.0-3.0; 95% confidence interval [CI], 1.42-1.94), and clinical concordance was 2.79 (range 1.0-5.0; 95% CI, 1.94-3.64).
The average word count was 310 words per response (range, 146-441 words per response) with high concordance (ICC, 0.75; 95% CI, 0.59-0.91; p <0.001).
The average readability was poor at 37.9 (range, 18.0-60.5) with high concordance (ICC, 0.73; 95% CI, 0.57-0.90; p <0.001).
There was a weak correlation between ease of readability and better clinical concordance (-0.15; p = 0.025).
Accuracy did not correlate with readability (0.05; p = 0.079).
The average number of references was 1.97 (range, 1-4; total, 119).
ChatGPT cited peer-reviewed articles only once and often referenced nonexistent websites (41%).

Conclusions

Because ChatGPT 3.5 responses were incorrect 24% of the time and did not provide real references 41% of the time, patients should be cautioned about using ChatGPT for medical information.

Breast Cancer Journal Scan

Issue: October 2024