Using attention LSGB network for facial expression recognition

Chan Su¹,
Jianguo Wei¹,
Deyu Lin¹,² (ORCID: orcid.org/0000-0003-1400-4769) &
…
Linghe Kong²

Abstract

In facial expression recognition (FER), the multiple sources of available in-the-wild datasets and the noisy information in images make it very challenging to discriminate subtle distinctions between combinations of regional expressions. Although deep learning-based approaches have made substantial progress in FER in recent years, small-scale datasets lead to over-fitting during training. To this end, we propose a novel LSGB method that focuses accurately on discriminative attention regions, and we pretrain the model on ImageNet to alleviate over-fitting. Specifically, the local relation (LR) module combines a key map, multiple partial maps and a position map in a more efficient manner to construct higher-level entities through the compositional relationships of local pixel pairs. Region features are aggregated into a compact global weighted representation, whose weights are obtained by feeding the original and regional images into the sequential layer of a self-attention module. Finally, extensive experiments are conducted to verify the effectiveness of our proposal. The experimental results on three popular benchmarks demonstrate the superiority of our network, with 88.8% on FERplus, 58.68% on AffectNet and 94.9% on JAFFE.
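
The weighted aggregation of region features mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the softmax normalisation, the linear scoring vector `scoring_vector`, and the function names are assumptions made for illustration, since the abstract does not specify how the self-attention layer computes the weights.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_regions(region_features, scoring_vector):
    """Attention-weighted sum of region feature vectors.

    region_features: equal-length feature vectors, one per image
        (the original image plus each cropped region).
    scoring_vector: hypothetical learned vector q; a region's score is
        the dot product q . f. This stands in for the attention layer
        and is an assumption, not the paper's formulation.
    """
    # One scalar score per region.
    scores = [
        sum(q * f for q, f in zip(scoring_vector, feat))
        for feat in region_features
    ]
    # Normalise the scores so the weights sum to 1 (assumed softmax).
    weights = softmax(scores)
    # Weighted sum over regions, computed per feature dimension.
    dim = len(region_features[0])
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, region_features))
        for d in range(dim)
    ]
    return pooled, weights


# Toy usage: three 2-dimensional region features and a made-up scoring vector.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = aggregate_regions(features, [10.0, 0.0])
```

With this toy scoring vector, regions whose first feature component is large receive most of the weight, so the pooled vector is dominated by them. In a trained network the scores would come from learned layers rather than a fixed vector.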

Availability of data and material

Some or all data, models, or code generated or used during the study are available from the corresponding author by request.

Author information

Authors and Affiliations

School of Software, Nanchang University, Nanchang, China
Chan Su, Jianguo Wei & Deyu Lin
School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
Deyu Lin & Linghe Kong

Corresponding author

Correspondence to Deyu Lin.

Ethics declarations

Conflict of interest

All authors declare that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Su, C., Wei, J., Lin, D. et al. Using attention LSGB network for facial expression recognition. Pattern Anal Applic 26, 543–553 (2023). https://doi.org/10.1007/s10044-022-01124-w

Received: 19 May 2022
Accepted: 08 November 2022
Published: 28 November 2022
Issue Date: May 2023
DOI: https://doi.org/10.1007/s10044-022-01124-w
