DOI: 10.1145/3377325.3377519

Evaluating saliency map explanations for convolutional neural networks: a user study

Published: 17 March 2020

Abstract

Convolutional neural networks (CNNs) offer great machine learning performance over a range of applications, but their operation is hard to interpret, even for experts. Various explanation algorithms have been proposed to address this issue, yet limited research effort has been reported concerning their user evaluation. In this paper, we report on an online between-group user study designed to evaluate the performance of "saliency maps" - a popular explanation algorithm for image classification applications of CNNs. Our results indicate that saliency maps produced by the LRP algorithm helped participants to learn about some specific image features the system is sensitive to. However, the maps seem to provide very limited help for participants to anticipate the network's output for new images. Drawing on our findings, we highlight implications for design and further research on explainable AI. In particular, we argue the HCI and AI communities should look beyond instance-level explanations.
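
The explanations evaluated in the study are saliency maps produced with Layer-wise Relevance Propagation (LRP). For readers unfamiliar with this family of techniques, the sketch below illustrates the general idea of attributing a classifier's prediction back to input pixels, using plain gradient saliency in PyTorch. It is an illustrative assumption, not the authors' pipeline: the model, preprocessing, and image path are placeholders, and LRP itself propagates relevance layer by layer rather than using raw gradients (it is typically run through a dedicated toolkit).

    # Hypothetical sketch: gradient-based saliency for an image classifier.
    # NOT the LRP implementation used in the paper; the model, preprocessing,
    # and image path are placeholder assumptions.
    import torch
    from torchvision import models, transforms
    from PIL import Image

    model = models.vgg16(pretrained=True).eval()   # any pretrained CNN would do

    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    img = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
    img.requires_grad_(True)

    logits = model(img)                      # (1, 1000) class scores for ImageNet
    top_class = logits.argmax(dim=1).item()  # the network's predicted class
    logits[0, top_class].backward()          # d(class score) / d(input pixels)

    # Per-pixel importance: maximum absolute gradient across colour channels.
    saliency = img.grad.abs().max(dim=1).values.squeeze()   # shape (224, 224)

Overlaid on the input image, such a map plays the same role as the heatmaps shown to participants, although LRP maps are computed differently and typically look smoother than raw gradient maps.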

    Published In

    IUI '20: Proceedings of the 25th International Conference on Intelligent User Interfaces
    March 2020, 607 pages
    ISBN: 9781450371186
    DOI: 10.1145/3377325

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Author Tags

    1. explainable AI
    2. heatmap
    3. human-AI interaction
    4. saliency-maps
    5. user studies

    Qualifiers

    • Research-article

    Conference

    IUI '20

    Acceptance Rates

    Overall Acceptance Rate: 746 of 2,811 submissions, 27%
