Abstract
The literature shows outstanding capabilities for Convolutional Neural Networks (CNNs) in event recognition in images. However, few attempts have been made to analyze the potential causes behind the models’ decisions and to explore whether the predictions are based on event-salient objects or regions. To explore this important aspect of event recognition, we propose an explainable event recognition framework relying on Grad-CAM and an Xception-based CNN model. Experiments are conducted on four large-scale datasets covering a diverse set of natural disaster, social, and sports events. The model showed outstanding generalization capabilities, obtaining overall F1 scores of 0.91, 0.94, and 0.97 on natural disaster, social, and sports events, respectively. Moreover, for a subjective analysis of the activation maps generated through Grad-CAM for the model’s predicted samples, a crowd-sourcing study is conducted to analyze whether the model’s predictions are based on event-related objects or regions. The results of the study indicate that 78%, 84%, and 78% of the model’s decisions on the natural disaster, sports, and social events datasets, respectively, are based on event-related objects or regions.
Ethics declarations
Conflict of Interests
The authors declare no conflict of interest.
About this article
Cite this article
Khan, I., Ahmad, K., Gul, N. et al. Explainable event recognition. Multimed Tools Appl 82, 40531–40557 (2023). https://doi.org/10.1007/s11042-023-14832-0
DOI: https://doi.org/10.1007/s11042-023-14832-0