DOI: 10.1145/3514094.3534128

A Meta-Analysis of the Utility of Explainable Artificial Intelligence in Human-AI Decision-Making

Published: 27 July 2022

Abstract

Research in artificial intelligence (AI)-assisted decision-making is experiencing tremendous growth with a constantly rising number of studies evaluating the effect of AI with and without techniques from the field of explainable AI (XAI) on human decision-making performance. However, as tasks and experimental setups vary due to different objectives, some studies report improved user decision-making performance through XAI, while others report only negligible effects. Therefore, in this article, we present an initial synthesis of existing research on XAI studies using a statistical meta-analysis to derive implications across existing research. We observe a statistically positive impact of XAI on users' performance. Additionally, the first results indicate that human-AI decision-making tends to yield better task performance on text data. However, we find no effect of explanations on users' performance compared to sole AI predictions. Our initial synthesis gives rise to future research investigating the underlying causes and contributes to further developing algorithms that effectively benefit human decision-makers by providing meaningful explanations.
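As a rough illustration of the kind of pooling such a statistical meta-analysis performs, the sketch below implements a DerSimonian-Laird random-effects model over study-level effect sizes. A random-effects model fits the abstract's premise that tasks and experimental setups vary across studies, since it lets the true effect differ between studies. This is not the paper's code or data: the helper name random_effects_pool and all effect sizes and variances are hypothetical.

```python
import numpy as np

def random_effects_pool(effects, variances):
    """Pool study-level effect sizes (e.g., Hedges' g) with a
    DerSimonian-Laird random-effects model (illustrative sketch)."""
    effects = np.asarray(effects, dtype=float)
    variances = np.asarray(variances, dtype=float)

    # Fixed-effect weights and the Q statistic measuring
    # between-study heterogeneity.
    w = 1.0 / variances
    fixed = np.sum(w * effects) / np.sum(w)
    q = np.sum(w * (effects - fixed) ** 2)
    df = len(effects) - 1

    # DerSimonian-Laird estimate of the between-study variance tau^2.
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to each within-study variance.
    w_star = 1.0 / (variances + tau2)
    pooled = np.sum(w_star * effects) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)
    return pooled, se, ci, tau2

# Hypothetical per-study effects (XAI vs. AI-only decision support).
g = [0.21, 0.05, 0.32, -0.04, 0.18]
v = [0.02, 0.05, 0.03, 0.04, 0.02]
pooled, se, (lo, hi), tau2 = random_effects_pool(g, v)
print(f"pooled g = {pooled:.3f}, 95% CI [{lo:.3f}, {hi:.3f}], tau^2 = {tau2:.3f}")
```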

Supplementary Material

MP4 File (aies008.mp4)



Published In

AIES '22: Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society
July 2022, 939 pages
ISBN: 9781450392471
DOI: 10.1145/3514094

Publisher

Association for Computing Machinery, New York, NY, United States



      Author Tags

      1. decision-making
      2. empirical studies
      3. explainable artificial intelligence
      4. meta-analysis

      Qualifiers

      • Research-article

      Conference

AIES '22: AAAI/ACM Conference on AI, Ethics, and Society
August 1 - 3, 2022
      Oxford, United Kingdom

      Acceptance Rates

      Overall Acceptance Rate 61 of 162 submissions, 38%


      Cited By

• (2025) Nullius in Explanans: an ethical risk assessment for explainable AI. Ethics and Information Technology, 27(1). DOI: 10.1007/s10676-024-09800-7. Online publication date: 1-Mar-2025.
• (2025) Trustworthy Hybrid Decision-Making. Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 239-244. DOI: 10.1007/978-3-031-74627-7_17. Online publication date: 1-Jan-2025.
• (2024) Using AI uncertainty quantification to improve human decision-making. Proceedings of the 41st International Conference on Machine Learning, 34949-34960. DOI: 10.5555/3692070.3693491. Online publication date: 21-Jul-2024.
• (2024) The Emerging Cybersecurity Challenges With Artificial Intelligence. Multisector Insights in Healthcare, Social Sciences, Society, and Technology, 163-185. DOI: 10.4018/979-8-3693-3226-9.ch010. Online publication date: 5-Jan-2024.
• (2024) Humans in XAI: increased reliance in decision-making under uncertainty by using explanation strategies. Frontiers in Behavioral Economics, 3. DOI: 10.3389/frbhe.2024.1377075. Online publication date: 8-Mar-2024.
• (2024) Developing Interpretable Models for Complex Decision-Making. 2024 36th Conference of Open Innovations Association (FRUCT), 66-75. DOI: 10.23919/FRUCT64283.2024.10749922. Online publication date: 30-Oct-2024.
• (2024) AI-assistance to decision-makers: evaluating usability, induced cognitive load, and trust's impact. Proceedings of the European Conference on Cognitive Ergonomics 2024, 1-4. DOI: 10.1145/3673805.3673845. Online publication date: 8-Oct-2024.
• (2024) You Can Only Verify When You Know the Answer: Feature-Based Explanations Reduce Overreliance on AI for Easy Decisions, but Not for Hard Ones. Proceedings of Mensch und Computer 2024, 156-170. DOI: 10.1145/3670653.3670660. Online publication date: 1-Sep-2024.
• (2024) Reassuring, Misleading, Debunking: Comparing Effects of XAI Methods on Human Decisions. ACM Transactions on Interactive Intelligent Systems, 14(3), 1-36. DOI: 10.1145/3665647. Online publication date: 22-May-2024.
• (2024) Influence on Judgements of Learning Given Perceived AI Annotations. Proceedings of the Eleventh ACM Conference on Learning @ Scale, 221-231. DOI: 10.1145/3657604.3662044. Online publication date: 9-Jul-2024.
