
SNIPPET: A Framework for Subjective Evaluation of Visual Explanations Applied to DeepFake Detection

Published: 13 June 2024

Abstract

Explainable Artificial Intelligence (XAI) aims to help humans better understand machine learning decisions and has been identified as a critical component for increasing the trustworthiness of complex black-box systems, such as deep neural networks. In this article, we propose a generic and comprehensive framework named SNIPPET, along with a user interface, for the subjective evaluation of visual explanations, with a focus on finding human-friendly explanations. SNIPPET considers human-centered evaluation tasks and incorporates the collection of human annotations, which can serve as valuable feedback for validating the qualitative results obtained from the subjective assessment tasks. Moreover, we consider different user background categories during the evaluation process to ensure diverse perspectives and a comprehensive evaluation. We demonstrate SNIPPET on a DeepFake face dataset: distinguishing real from fake faces is non-trivial even for humans, as it depends on rather subtle features, which makes it a challenging use case. Using SNIPPET, we evaluate four popular XAI methods that provide visual explanations: Gradient-weighted Class Activation Mapping (Grad-CAM), Layer-wise Relevance Propagation (LRP), attention rollout, and Transformer Attribution. Our experimental results reveal preference variations among user categories: most participants favor the explanations produced by attention rollout, and, when it comes to XAI-assisted understanding, participants who lack relevant background knowledge often consider the visual explanations insufficient to help them understand the model's decisions. We open-source our framework for continued data collection and annotation at https://github.com/XAI-SubjEvaluation/SNIPPET.
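The abstract highlights attention rollout as the method most participants preferred. For readers unfamiliar with it, below is a minimal, illustrative NumPy sketch of attention rollout as described by Abnar and Zuidema (2020); it is not the authors' evaluation code, and the function name, the 0.5 residual mixing weight, and averaging over heads are assumptions taken from the original paper's common formulation.

```python
import numpy as np

def attention_rollout(attentions):
    """Attention rollout (Abnar & Zuidema, 2020).

    `attentions` is a list of per-layer attention tensors, each of shape
    (num_heads, num_tokens, num_tokens), e.g. collected from a Vision
    Transformer forward pass. Returns a (num_tokens, num_tokens) matrix
    whose [CLS] row can be reshaped into a patch-level saliency map.
    """
    num_tokens = attentions[0].shape[-1]
    rollout = np.eye(num_tokens)
    for layer_attn in attentions:
        attn = layer_attn.mean(axis=0)                # fuse heads by averaging
        attn = 0.5 * attn + 0.5 * np.eye(num_tokens)  # model the residual connection
        attn /= attn.sum(axis=-1, keepdims=True)      # keep rows as distributions
        rollout = attn @ rollout                      # propagate attention toward the input
    return rollout

# Hypothetical usage: for a 224x224 image with 16x16 patches plus a [CLS]
# token, the saliency map is the [CLS] row without the [CLS] column:
# saliency = attention_rollout(attentions)[0, 1:].reshape(14, 14)
```

The other three evaluated methods (Grad-CAM, LRP, and Transformer Attribution) derive their heatmaps from gradients or layer-wise relevance propagation rather than raw attention weights, so their maps can differ substantially from the rollout output.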


Cited By

  • (2024) Pseudo Label Association and Prototype-Based Invariant Learning for Semi-Supervised NIR-VIS Face Recognition. IEEE Transactions on Image Processing 33, 1448–1463. https://doi.org/10.1109/TIP.2024.3364530
  • (2024) Unsupervised NIR-VIS Face Recognition via Homogeneous-to-Heterogeneous Learning and Residual-Invariant Enhancement. IEEE Transactions on Information Forensics and Security 19, 2112–2126. https://doi.org/10.1109/TIFS.2023.3346176


Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 8 (August 2024), 726 pages
EISSN: 1551-6865
DOI: 10.1145/3618074
Editor: Abdulmotaleb El Saddik

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 June 2024
Online AM: 22 May 2024
Accepted: 12 May 2024
Revised: 03 May 2024
Received: 15 September 2023
Published in TOMM Volume 20, Issue 8


Author Tags

  1. Explainable AI
  2. visual explanations
  3. subjective evaluation framework
  4. DeepFake detection
  5. user background categories

Qualifiers

  • Research-article

Funding Sources

  • FWO
  • Flemish Government
  • Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen
  • Trustworthy AI Methods (TAIM)


Article Metrics

  • Downloads (last 12 months): 366
  • Downloads (last 6 weeks): 88

Reflects downloads up to 13 December 2024.
