
SNIPPET: A Framework for Subjective Evaluation of Visual Explanations Applied to DeepFake Detection

Published: 13 June 2024

Abstract

Explainable Artificial Intelligence (XAI) aims to help humans better understand machine learning decisions and has been identified as a critical component for increasing the trustworthiness of complex black-box systems, such as deep neural networks. In this article, we propose a generic and comprehensive framework named SNIPPET, along with a user interface, for the subjective evaluation of visual explanations, with a focus on finding human-friendly explanations. SNIPPET considers human-centered evaluation tasks and incorporates the collection of human annotations, which can serve as valuable feedback for validating the qualitative results obtained from the subjective assessment tasks. Moreover, we consider different user background categories during the evaluation process to ensure diverse perspectives and a comprehensive evaluation. We demonstrate SNIPPET on a DeepFake face dataset: distinguishing real from fake faces is non-trivial even for humans, as it depends on rather subtle features, which makes it a challenging use case. Using SNIPPET, we evaluate four popular XAI methods that provide visual explanations: Gradient-weighted Class Activation Mapping (Grad-CAM), Layer-wise Relevance Propagation (LRP), attention rollout, and Transformer Attribution. Our experimental results reveal preference variations among user categories: most participants favor the explanations produced by attention rollout, and, when it comes to XAI-assisted understanding, participants who lack relevant background knowledge often consider the visual explanations insufficient to help them understand the model's decisions. We open-source our framework for continued data collection and annotation at https://github.com/XAI-SubjEvaluation/SNIPPET.
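The abstract highlights attention rollout as the method most participants preferred. For readers unfamiliar with it, below is a minimal, illustrative NumPy sketch of attention rollout as described by Abnar and Zuidema (2020); it is not the authors' evaluation code, and the function name, the 0.5 residual mixing weight, and averaging over heads are assumptions taken from the original paper's common formulation.

```python
import numpy as np

def attention_rollout(attentions):
    """Attention rollout (Abnar & Zuidema, 2020).

    `attentions` is a list of per-layer attention tensors, each of shape
    (num_heads, num_tokens, num_tokens), e.g. collected from a Vision
    Transformer forward pass. Returns a (num_tokens, num_tokens) matrix
    whose [CLS] row can be reshaped into a patch-level saliency map.
    """
    num_tokens = attentions[0].shape[-1]
    rollout = np.eye(num_tokens)
    for layer_attn in attentions:
        attn = layer_attn.mean(axis=0)                # fuse heads by averaging
        attn = 0.5 * attn + 0.5 * np.eye(num_tokens)  # model the residual connection
        attn /= attn.sum(axis=-1, keepdims=True)      # keep rows as distributions
        rollout = attn @ rollout                      # propagate attention toward the input
    return rollout

# Hypothetical usage: for a 224x224 image with 16x16 patches plus a [CLS]
# token, the saliency map is the [CLS] row without the [CLS] column:
# saliency = attention_rollout(attentions)[0, 1:].reshape(14, 14)
```

The other three evaluated methods (Grad-CAM, LRP, and Transformer Attribution) derive their heatmaps from gradients or layer-wise relevance propagation rather than raw attention weights, so their maps can differ substantially from the rollout output.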


Cited By

  • (2024) Pseudo Label Association and Prototype-Based Invariant Learning for Semi-Supervised NIR-VIS Face Recognition. IEEE Transactions on Image Processing 33, 1448–1463. https://doi.org/10.1109/TIP.2024.3364530
  • (2024) Unsupervised NIR-VIS Face Recognition via Homogeneous-to-Heterogeneous Learning and Residual-Invariant Enhancement. IEEE Transactions on Information Forensics and Security 19, 2112–2126. https://doi.org/10.1109/TIFS.2023.3346176


Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 8 (August 2024), 726 pages
EISSN: 1551-6865
DOI: 10.1145/3618074
Editor: Abdulmotaleb El Saddik

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 June 2024
Online AM: 22 May 2024
Accepted: 12 May 2024
Revised: 03 May 2024
Received: 15 September 2023
Published in TOMM Volume 20, Issue 8


Author Tags

  1. Explainable AI
  2. visual explanations
  3. subjective evaluation framework
  4. DeepFake detection
  5. user background categories

Qualifiers

  • Research-article

Funding Sources

  • FWO
  • Flemish Government
  • Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen
  • Trustworthy AI Methods (TAIM)


Article Metrics

  • Downloads (last 12 months): 366
  • Downloads (last 6 weeks): 88

Reflects downloads up to 13 December 2024.
