Abstract
This paper addresses the recognition of naturally occurring human facial movements (action units) as an intermediate step toward their aggregation for the recognition and understanding of facial expressions. We introduce a domain adaptation solution applied to deep convolutional networks, exploiting the networks' capability to provide simultaneous predictions and discriminative embeddings. In this way, information gathered from training on facial expression recognition is adapted to facial action unit detection. The proposed strategy is evaluated on action units in the wild from the EmotioNet dataset and on action units acquired in laboratory conditions from the DISFA and CK+ datasets. Our method achieves results comparable to the state of the art and shows superior recognition of rarely occurring action units. Additionally, the structure of the embedding space is significantly improved compared with that obtained using classical losses.
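To illustrate the kind of embedding-structuring objective the abstract refers to, the sketch below implements a generic normalized margin loss in NumPy. This is an assumption-laden toy formulation, not the paper's exact loss: it L2-normalizes embeddings and class centers, pulls each embedding toward its own class center, and penalizes foreign centers that fall within a margin. The function name, the squared-distance form, and the `margin` parameter are all illustrative choices.

```python
import numpy as np

def normalized_margin_loss(embeddings, labels, centers, margin=0.5):
    """Illustrative sketch (not the paper's exact formulation).

    Pulls each L2-normalized embedding toward its class center and
    pushes it at least `margin` away from every other class center.
    """
    # Normalize so distances are scale-free (the "normalized" part).
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    c = centers / np.linalg.norm(centers, axis=1, keepdims=True)

    loss = 0.0
    for ei, yi in zip(e, labels):
        d = np.linalg.norm(c - ei, axis=1)            # distance to every center
        pull = d[yi] ** 2                             # attract to own center
        mask = np.arange(len(c)) != yi
        push = np.maximum(0.0, margin - d[mask]) ** 2  # repel only nearby foreign centers
        loss += pull + push.sum()
    return loss / len(e)

# Usage: embeddings sitting exactly on well-separated centers yield zero loss.
centers = np.eye(3)
perfect = centers.copy()
print(normalized_margin_loss(perfect, np.array([0, 1, 2]), centers))  # → 0.0
```

In a training setup, such a term would typically be added to the classification loss so that the network's embedding head is shaped jointly with its predictions, which matches the dual prediction/embedding use described in the abstract.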
Funding
This work was funded by the Ministry of Investments and European Projects through the Human Capital Sectoral Operational Program 2014-2020, Contract no. 62461/03.06.2022, SMIS code 153735.
Author information
Contributions
Andrei Racoviteanu, Corneliu Florea, Laura Florea, and Constantin Vertan were involved in the conceptualization; Andrei Racoviteanu and Corneliu Florea contributed to the methodology; Andrei Racoviteanu contributed to the software; Corneliu Florea and Laura Florea assisted in the validation; Andrei Racoviteanu and Corneliu Florea contributed to writing—original draft preparation; all authors have read and agreed to the published version of the manuscript.
Ethics declarations
Conflicts of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Racoviteanu, A., Florea, C., Florea, L. et al. Normalized margin loss for action unit detection. Machine Vision and Applications 35, 9 (2024). https://doi.org/10.1007/s00138-023-01490-3