Abstract
Chat Generative Pretrained Transformer (ChatGPT; OpenAI) is a state-of-the-art large language model that can simulate human-like conversation based on user input. We evaluated the performance of GPT-4V on the Japanese National Clinical Engineer Licensing Examination using 2,155 questions from 2012 to 2023. The average correct answer rate across all questions was 86.0%. In particular, clinical medicine, basic medicine, medical materials, biological properties, and mechanical engineering achieved correct answer rates of ≥ 90%. Conversely, medical device safety management, electrical and electronic engineering, and extracorporeal circulation yielded lower correct answer rates, ranging from 64.8% to 76.5%. The correct answer rates for questions that included figures/tables, required numerical calculation, both included figures/tables and required calculation, or demanded knowledge of Japanese Industrial Standards were 55.2%, 85.8%, 64.2%, and 31.0%, respectively. These low rates reflect ChatGPT's inability to recognize images and its limited knowledge of standards and laws. This study concludes that careful attention is required when using ChatGPT, because several of its explanations contain inaccurate statements.
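The breakdown reported above is, in essence, a per-category tally of correct responses. Purely as an illustration, a minimal Python sketch of that computation might look as follows; the data structures (QuestionResult and its fields) are hypothetical, since the study's grading code is not published here.

```python
# Illustrative sketch only: the QuestionResult type and field names are
# assumptions, not the authors' actual grading pipeline.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class QuestionResult:
    category: str      # e.g. "clinical medicine", "extracorporeal circulation"
    has_figure: bool   # question includes a figure or table
    needs_calc: bool   # question requires numerical calculation
    correct: bool      # whether the model chose the keyed answer

def accuracy(results: list[QuestionResult]) -> float:
    """Correct answer rate as a percentage (assumes a non-empty list)."""
    return 100.0 * sum(r.correct for r in results) / len(results)

def accuracy_by_category(results: list[QuestionResult]) -> dict[str, float]:
    """Correct answer rate per exam subject."""
    buckets: dict[str, list[QuestionResult]] = defaultdict(list)
    for r in results:
        buckets[r.category].append(r)
    return {cat: accuracy(rs) for cat, rs in buckets.items()}

def figure_and_calc_accuracy(results: list[QuestionResult]) -> float | None:
    """Rate for the subset that both includes a figure/table and needs calculation."""
    subset = [r for r in results if r.has_figure and r.needs_calc]
    return accuracy(subset) if subset else None
```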
Data Availability
No datasets were generated or analysed during the current study.
Author information
Contributions
KI conceived and designed the study. KI and NA encoded and input the data into ChatGPT. KI and KF classified the correctness of generated answers. KI and KF developed the study protocol and performed statistical analyses. KI wrote the manuscript.
Ethics declarations
Ethical Approval
This study did not involve human subjects. Therefore, neither approval from the institutional review board nor informed consent was required.
Competing Interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ishida, K., Arisaka, N. & Fujii, K. Analysis of Responses of GPT-4 V to the Japanese National Clinical Engineer Licensing Examination. J Med Syst 48, 83 (2024). https://doi.org/10.1007/s10916-024-02103-w