Abstract
Using lightweight networks for facial expression recognition (FER) has become an important research topic in recent years. The key to successful FER with lightweight networks is to exploit the potential of expression features at different levels of abstraction and in different facial regions, and to design robust features that characterize facial appearance. This paper proposes a lightweight network, the Multi-feature Fusion Based Convolutional Neural Network (MFF-CNN), for image-based FER. The proposed model uses an Image Branch to extract both mid-level and high-level global features from the whole input image and a Patch Branch to extract local features from sixteen patches of the original image. In MFF-CNN, feature selection based on the L2 norm is performed to obtain more discriminative local features, and joint tuning is employed to integrate the two branches and fuse their features. Experimental results on three widely used datasets, CK+, JAFFE and Oulu-CASIA, show that the proposed MFF-CNN outperforms state-of-the-art methods in terms of average recognition accuracy. Compared with other competitive models with a similar or larger number of parameters, our MFF-CNN improves the average recognition accuracy by 9.80% to 15.05%.
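The abstract names three mechanisms: splitting the image into sixteen patches, selecting local features by L2 norm, and fusing local with global features. The Python sketch below illustrates only that data flow, under stated assumptions: the sixteen patches are taken as a non-overlapping 4 × 4 grid, L2-norm selection is modeled as keeping the k patch feature vectors with the largest norm, and fusion is modeled as plain concatenation. The function names (`split_into_patches`, `select_by_l2_norm`, `fuse`), the value of k, and the use of raw pixels as stand-in patch features are hypothetical. In the paper both branches are CNNs trained with joint tuning, which this sketch does not attempt to reproduce.

```python
import math
from typing import List, Sequence

Image = List[List[float]]  # grayscale image as rows of pixel values
Vector = List[float]


def split_into_patches(image: Image, grid: int = 4) -> List[Image]:
    """Split an image into grid x grid non-overlapping patches (16 for grid=4).

    Assumption: height and width are divisible by `grid`.
    Patches are returned in row-major order (index = row * grid + col).
    """
    h, w = len(image), len(image[0])
    if h % grid or w % grid:
        raise ValueError("image size must be divisible by grid")
    ph, pw = h // grid, w // grid
    patches = []
    for r in range(grid):
        for c in range(grid):
            rows = image[r * ph:(r + 1) * ph]
            patches.append([row[c * pw:(c + 1) * pw] for row in rows])
    return patches


def l2_norm(v: Sequence[float]) -> float:
    """Euclidean (L2) norm of a feature vector."""
    return math.sqrt(sum(x * x for x in v))


def select_by_l2_norm(features: List[Vector], k: int) -> List[int]:
    """Hypothetical selection rule: keep the k vectors with the largest L2 norm.

    Returned indices are re-sorted into their original (spatial) order.
    """
    ranked = sorted(range(len(features)),
                    key=lambda i: l2_norm(features[i]), reverse=True)
    return sorted(ranked[:k])


def fuse(global_features: Vector, local_features: List[Vector],
         keep: List[int]) -> Vector:
    """Fuse by concatenating global features with the selected local features."""
    fused = list(global_features)
    for i in keep:
        fused.extend(local_features[i])
    return fused


# Usage: an 8x8 toy image, with flattened 2x2 patches as stand-in "features".
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
patches = split_into_patches(image)                # 16 patches of 2x2
local = [[p for row in patch for p in row] for patch in patches]
keep = select_by_l2_norm(local, k=4)               # keep == [12, 13, 14, 15]
global_feat = [0.5, 0.25]                          # placeholder Image Branch output
fused = fuse(global_feat, local, keep)             # 2 + 4 * 4 = 18 values
```

Re-sorting the kept indices preserves the spatial order of the surviving patches, so the fused vector has a fixed layout for a given selection; whether the original model does this is not stated in the abstract.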
Acknowledgements
This work was supported by Guangzhou Municipal People’s Livelihood Science and Technology Plan (201903010040), and Science and Technology Program of Guangzhou, China (202007030011).
About this article
Cite this article
Zou, W., Zhang, D. & Lee, DJ. A new multi-feature fusion based convolutional neural network for facial expression recognition. Appl Intell 52, 2918–2929 (2022). https://doi.org/10.1007/s10489-021-02575-0
DOI: https://doi.org/10.1007/s10489-021-02575-0