Abstract
The multiple sources of the available in-the-wild datasets and the noisy information in images make it very challenging to discriminate subtle distinctions between combinations of regional expressions in facial expression recognition (FER). Although deep learning-based approaches have made substantial progress in FER in recent years, small-scale datasets result in over-fitting during training. To this end, we propose a novel LSGB method that focuses accurately on discriminative attention regions, and we pretrain the model on ImageNet to alleviate over-fitting. Specifically, the local relation (LR) module combines a key map, multiple partial maps and a position map to construct higher-level entities more efficiently from the compositional relationships of local pixel pairs. Region features are then aggregated into a compact, weighted global representation, where the weights are obtained by feeding the original and regional images to the sequential layer of the self-attention module. Finally, extensive experiments are conducted to verify the effectiveness of our proposal. The experimental results on three popular benchmarks demonstrate the superiority of our network, with 88.8% on FERplus, 58.68% on AffectNet and 94.9% on JAFFE.
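To make the aggregation step concrete, the following is a minimal sketch of how per-region features might be combined into one weighted global representation. It is not the authors' implementation: the function names `softmax` and `aggregate_regions`, the use of softmax normalization, and the toy scores are all assumptions for illustration. In the paper the weights come from the self-attention module, whose exact form is not given in the abstract.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scalar scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_regions(features: Sequence[Sequence[float]],
                      scores: Sequence[float]) -> List[float]:
    """Weighted sum of per-region feature vectors.

    features[i] is the feature vector of crop i (the original image or a
    regional crop); scores[i] is a hypothetical attention score for it.
    Softmax normalization is an assumption, not taken from the paper.
    """
    if len(features) != len(scores):
        raise ValueError("need exactly one score per region")
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features))
            for d in range(dim)]


# Toy input: three crops with 2-dimensional features and equal scores,
# so the result is the plain average of the three vectors.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
global_feature = aggregate_regions(regions, [0.0, 0.0, 0.0])
```

With unequal scores, crops that the attention module rates higher contribute more to the global vector, which is the behavior the abstract describes for discriminative regions.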
Availability of data and material
Some or all data, models, or code generated or used during the study are available from the corresponding author by request.
Ethics declarations
Conflict of interest
All authors declare that there is no conflict of interest.
About this article
Cite this article
Su, C., Wei, J., Lin, D. et al. Using attention LSGB network for facial expression recognition. Pattern Anal Applic 26, 543–553 (2023). https://doi.org/10.1007/s10044-022-01124-w
DOI: https://doi.org/10.1007/s10044-022-01124-w