August 15, 2023
By Dwight Akerman, OD, MBA, FAAO, FBCLA, FIACLE
Artificial intelligence-based chatbots are increasingly used for customer service and to obtain information for education, research, and health care practice. With the widespread availability of smartphones and the internet, chatbots are inexpensive and readily accessible. ChatGPT, launched on November 30, 2022, is one of the latest AI-based large language models. It uses supervised and reinforcement learning strategies along with natural language processing to respond automatically to questions and simulate human conversation, and it has recently gained widespread attention in the medical community. Although the system was not built for health care, and despite its popularity and widespread use among the general public, its potential for answering ophthalmic patient queries remains underexplored. This study aimed to evaluate the accuracy and quality of information provided by ChatGPT on myopia.
Because parents, patients, and other stakeholders increasingly use AI tools to obtain health care information, researchers at Aston University assessed ChatGPT’s capacity for generating accurate information on myopia. To evaluate ChatGPT (free version GPT-3.5, OpenAI), the researchers generated 11 questions about myopia to ask the AI model. Each question was entered five times, each time into a fresh session. The questions covered a general summary of myopia and its cause, symptoms, onset, prevention, complications, natural history, treatment, and prognosis. Five expert optometrists rated the quality of the responses.
Across 275 responses in total, ChatGPT garnered a median score of 4 (on the 1-5 Likert scale) for 10 questions and an acceptable median of 3 for the remaining question. Overall, 24% of responses were rated very good, 49% good, 22% acceptable, 3.6% poor, and 1.8% very poor.
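The percentages above follow directly from the rating counts reported in the study's abstract (66, 134, 60, 10, and 5 ratings out of 275). As a minimal sketch (not code from the paper), the breakdown and the overall median can be reproduced like this:

```python
# Sketch: reproduce the rating breakdown from the counts reported
# in the study's abstract (not code from the paper itself).
from statistics import median

# Likert rating -> number of responses (5 = very good ... 1 = very poor)
counts = {5: 66, 4: 134, 3: 60, 2: 10, 1: 5}

total = sum(counts.values())  # 275 responses in total
for score, n in sorted(counts.items(), reverse=True):
    print(f"score {score}: {n} ({100 * n / total:.1f}%)")

# Expand to a flat list of ratings to check the overall median.
ratings = [score for score, n in counts.items() for _ in range(n)]
print("total:", total, "median:", median(ratings))  # total: 275 median: 4
```

Note that 134/275 rounds to 48.7%, which the article reports as 49%; the other percentages match exactly.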
ChatGPT is limited by (1) its inability to critically analyze results from the literature, (2) a knowledge base that is fixed at a cutoff date and not updated, (3) misinterpretation of medical terms, (4) an inability to differentiate between predatory and reputable journal articles, and (5) a lack of scientific accuracy and reliability, with the potential to present biased information or misinformation to readers.
The researchers pointed out in their paper that many patients today seek medical advice from the internet and social media and, as a result, receive inaccurate responses. Despite the limitations of this study and of ChatGPT, AI promises to lower health care costs by reducing consultation time and providing valuable patient information. Future versions of AI technology may reduce the patient load on the already burdened health care and eye care systems worldwide by enhancing patient information, refining workflow, and improving patient outcomes.
The researchers concluded with the following key points:
- Artificial intelligence is increasingly used to obtain information for education, research, and practice. Since its launch in November 2022, ChatGPT-3.5 has gained widespread attention for its use of natural language processing to mimic human responses.
- In total, 24% of the responses on myopia by ChatGPT were rated very good and 49% good, whereas 22% were rated acceptable, 3.6% poor, and 1.8% very poor.
- ChatGPT has the potential to provide accurate and quality information on myopia over the internet, but further evaluation and awareness concerning its limitations are crucial to avoid potential misinterpretation.
Abstract
Assessing the Utility of ChatGPT as an Artificial Intelligence-Based Large Language Model for Information to Answer Questions on Myopia
Sayantan Biswas, Nicola S. Logan, Leon N. Davies, Amy L. Sheppard, James S. Wolffsohn
Purpose: ChatGPT is an artificial intelligence language model that uses natural language processing to simulate human conversation. It has seen a wide range of applications, including healthcare education, research, and clinical practice. This study evaluated the accuracy of ChatGPT (version GPT-3.5, OpenAI) in providing accurate and quality information to answer questions on myopia.
Methods: A series of 11 questions (nine categories of general summary, cause, symptom, onset, prevention, complication, natural history, treatment, and prognosis) were generated for this cross-sectional study. Each question was entered five times into fresh ChatGPT sessions (free from the influence of prior questions). The responses were evaluated by a five-member team of optometry teaching and research staff. The evaluators individually rated the accuracy and quality of responses on a Likert scale, where a higher score indicated greater quality of information (1: very poor; 2: poor; 3: acceptable; 4: good; 5: very good). Median scores for each question were estimated and compared between evaluators. Agreement between the five evaluators and the reliability statistics of the questions were estimated.
Results: Of the 11 questions on myopia, ChatGPT provided good quality information (median scores: 4.0) for 10 questions and acceptable responses (median scores: 3.0) for one question. Out of 275 responses in total, 66 (24%) were rated very good, 134 (49%) were rated good, whereas 60 (22%) were rated acceptable, 10 (3.6%) were rated poor, and 5 (1.8%) were rated very poor. Cronbach’s α of 0.807 indicated a good level of agreement between test items. Evaluators’ ratings demonstrated ‘slight agreement’ (Fleiss’s κ, 0.005) with a significant difference in scoring among the evaluators (Kruskal–Wallis test, p < 0.001).
Conclusion: Overall, ChatGPT generated good quality information to answer questions on myopia. Although ChatGPT shows great potential in rapidly providing information on myopia, the presence of inaccurate responses demonstrates that further evaluation and awareness concerning its limitations are crucial to avoid potential misinterpretation.
Biswas, S., Logan, N. S., Davies, L. N., Sheppard, A. L., & Wolffsohn, J. S. (2023). Assessing the utility of ChatGPT as an artificial intelligence‐based large language model for information to answer questions on myopia. Ophthalmic and Physiological Optics.
DOI: https://doi.org/10.1111/opo.13207