
Analysis of Responses of GPT-4 V to the Japanese National Clinical Engineer Licensing Examination

Journal of Medical Systems

Abstract

Chat Generative Pre-trained Transformer (ChatGPT; OpenAI) is a state-of-the-art large language model that simulates human-like conversation in response to user input. We evaluated the performance of GPT-4 V on the Japanese National Clinical Engineer Licensing Examination using 2,155 questions from 2012 to 2023. The average correct answer rate across all questions was 86.0%. In particular, clinical medicine, basic medicine, medical materials, biological properties, and mechanical engineering achieved correct answer rates of ≥ 90%. Conversely, medical device safety management, electrical and electronic engineering, and extracorporeal circulation showed lower correct answer rates, ranging from 64.8% to 76.5%. The correct answer rates for questions that included figures/tables, required numerical calculation, combined figures/tables with calculation, and required knowledge of Japanese Industrial Standards were 55.2%, 85.8%, 64.2%, and 31.0%, respectively. These low rates reflect ChatGPT's limited recognition of images and its insufficient knowledge of the relevant standards and laws. We conclude that careful attention is required when using ChatGPT, because several of its explanations contain incorrect statements.
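The evaluation summarized above amounts to submitting each multiple-choice question to the model, grading the response against the answer key, and aggregating accuracy per subject category. The following is a minimal Python sketch of that tallying step only; the question records and the ask_model function are hypothetical placeholders (the article provides no code), with the actual GPT-4 V query stubbed out.

    from collections import defaultdict

    # Hypothetical question records: prompt text, answer key, and exam category.
    questions = [
        {"text": "Q1 ...", "answer": "3", "category": "clinical medicine"},
        {"text": "Q2 ...", "answer": "1", "category": "medical device safety management"},
    ]

    def ask_model(question_text: str) -> str:
        """Placeholder for a GPT-4 V query; in the study, each exam question
        was entered into ChatGPT and the chosen option was recorded."""
        return "3"  # stubbed response for illustration only

    # Tally correct answers per category, mirroring the per-subject rates reported.
    correct, total = defaultdict(int), defaultdict(int)
    for q in questions:
        total[q["category"]] += 1
        if ask_model(q["text"]).strip() == q["answer"]:
            correct[q["category"]] += 1

    for category, n in total.items():
        rate = 100.0 * correct[category] / n
        print(f"{category}: {rate:.1f}% ({correct[category]}/{n})")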



Data Availability

No datasets were generated or analysed during the current study.


Author information


Contributions

KI conceived and designed the study. KI and NA encoded the data and entered it into ChatGPT. KI and KF assessed the correctness of the generated answers. KI and KF developed the study protocol and performed the statistical analyses. KI wrote the manuscript.

Corresponding author

Correspondence to Kai Ishida.

Ethics declarations

Ethical Approval

This study did not involve human subjects. Therefore, neither approval from the institutional review board nor informed consent was required.

Competing Interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ishida, K., Arisaka, N. & Fujii, K. Analysis of Responses of GPT-4 V to the Japanese National Clinical Engineer Licensing Examination. J Med Syst 48, 83 (2024). https://doi.org/10.1007/s10916-024-02103-w

