
Rams, hounds and white boxes: Investigating human–AI collaboration protocols in medical diagnosis

Published: 01 April 2023

Abstract

In this paper, we study human–AI collaboration protocols, a design-oriented construct aimed at establishing and evaluating how humans and AI can collaborate in cognitive tasks. We applied this construct in two user studies involving 12 specialist radiologists (the knee MRI study) and 44 ECG readers of varying expertise (the ECG study), who evaluated 240 and 20 cases, respectively, in different collaboration configurations. We confirm the utility of AI support but find that XAI can be associated with a “white-box paradox”, producing a null or detrimental effect. We also find that the order of presentation matters: AI-first protocols are associated with higher diagnostic accuracy than human-first protocols, and with higher accuracy than both humans and AI alone. Our findings identify the best conditions for AI to augment human diagnostic skills, rather than trigger dysfunctional responses and cognitive biases that can undermine decision effectiveness.

Highlights

We study how humans and AI can collaborate in cognitive tasks in the medical setting.
Two studies involved 12 radiologists and 44 ECG readers (240 and 20 cases, respectively).
AI support was found useful but XAI was associated with a null or detrimental effect.
AI-first protocols had higher accuracy than human-first ones and humans or AI alone.
Our findings identify the best conditions for AI to augment human diagnostic skills.
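The two presentation orders compared in the studies can be reduced to minimal decision protocols. The sketch below is illustrative only: the function names, the toy "trusting" reader, and the follow-the-advice rule are assumptions for exposition, not the paper's actual experimental design.

```python
# Illustrative sketch: the two collaboration orderings, reduced to toy
# decision functions. `ai`, `reader`, and the naive "follow the advice"
# rule are hypothetical stand-ins, not the study protocol itself.

def ai_first(case, ai, reader):
    """AI-first protocol: the reader decides with the AI's label in view."""
    advice = ai(case)
    return reader(case, advice=advice)

def human_first(case, ai, reader):
    """Human-first protocol: the reader commits to an unaided label first,
    then may revise after the AI's label is revealed."""
    initial = reader(case, advice=None)
    return reader(case, advice=ai(case), initial=initial)

# Toy reader: follows the advice when shown, otherwise labels the case 0.
def trusting_reader(case, advice=None, initial=None):
    return advice if advice is not None else 0

print(ai_first(3, lambda c: c % 2, trusting_reader))     # -> 1
print(human_first(3, lambda c: c % 2, trusting_reader))  # -> 1
```

The paper's finding is that the first ordering (advice shown before the reader commits) was associated with higher accuracy; the sketch only makes the structural difference between the two protocols explicit.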


Cited By

  • (2024) Towards Understanding Human-AI Reliance Patterns Through Explanation Styles. Companion of the 2024 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pp. 861–865. doi:10.1145/3675094.3678996
  • (2024) A Human–AI interaction paradigm and its application to rhinocytology. Artificial Intelligence in Medicine 155:C. doi:10.1016/j.artmed.2024.102933
  • (2024) Never tell me the odds. Artificial Intelligence in Medicine 150:C. doi:10.1016/j.artmed.2024.102819
  • (2024) PIPNet3D: Interpretable Detection of Alzheimer in MRI Scans. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 Workshops, pp. 69–78. doi:10.1007/978-3-031-77610-6_7
  • (2023) Let Me Think! Investigating the Effect of Explanations Feeding Doubts About the AI Advice. Machine Learning and Knowledge Extraction, pp. 155–169. doi:10.1007/978-3-031-40837-3_10


Published In

Artificial Intelligence in Medicine, Volume 138, Issue C (April 2023), 168 pages

Publisher

Elsevier Science Publishers Ltd., United Kingdom


Author Tags

  1. Human–AI collaboration protocols
  2. Artificial intelligence
  3. Explainable AI
  4. Cognitive biases
  5. Automation bias

Qualifiers

  • Research-article


