
Region Attention Networks for Pose and Occlusion Robust Facial Expression Recognition

Published: 01 January 2020

Abstract

Occlusion and pose variations, which can change facial appearance significantly, are two major obstacles for automatic Facial Expression Recognition (FER). Though automatic FER has made substantial progress in the past few decades, its occlusion-robust and pose-invariant aspects have received relatively little attention, especially in real-world scenarios. This paper addresses real-world pose- and occlusion-robust FER in the following aspects. First, to stimulate research on FER under real-world occlusions and pose variations, we annotate several in-the-wild FER datasets with pose and occlusion attributes for the community. Second, we propose a novel Region Attention Network (RAN) to adaptively capture the importance of facial regions for occlusion- and pose-variant FER. The RAN aggregates and embeds a varied number of region features produced by a backbone convolutional neural network into a compact fixed-length representation. Last, inspired by the fact that facial expressions are mainly defined by facial action units, we propose a region-biased loss to encourage high attention weights for the most important regions. We validate our RAN and region-biased loss on both our newly built test datasets and four popular datasets: FERPlus, AffectNet, RAF-DB, and SFEW. Extensive experiments show that our RAN and region-biased loss largely improve FER performance under occlusion and pose variation. Our method also achieves state-of-the-art results on FERPlus, AffectNet, RAF-DB, and SFEW. Code and the collected test data will be publicly available.
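The pipeline the abstract describes, region features from a shared backbone, attention-weighted aggregation into a fixed-length vector, and a loss that biases attention toward the most informative crops, can be summarized in a short sketch. The snippet below is a minimal PyTorch-style illustration rather than the authors' released implementation: the names (`RegionAttention`, `region_biased_loss`), the sigmoid scoring head, the hinge margin, and the convention that index 0 holds the whole-face crop are all assumptions made for clarity.

```python
# Minimal sketch of region-attention aggregation and a region-biased loss.
# Assumptions (not the authors' API): crop 0 is the whole face, remaining
# crops are local regions; each crop has already been encoded by a shared
# backbone CNN into a feat_dim-dimensional vector.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionAttention(nn.Module):
    """Fuse a variable number of region features into one fixed-length vector."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 1)  # scores each region feature

    def forward(self, region_feats: torch.Tensor):
        # region_feats: (batch, K, feat_dim); K may vary between calls.
        scores = torch.sigmoid(self.fc(region_feats))       # (batch, K, 1)
        weights = scores / scores.sum(dim=1, keepdim=True)  # normalize over regions
        fused = (weights * region_feats).sum(dim=1)         # (batch, feat_dim)
        return fused, scores.squeeze(-1)                    # fused feature + raw attention

def region_biased_loss(scores: torch.Tensor, margin: float = 0.02):
    # scores: (batch, K) raw attention weights; index 0 is the whole-face crop.
    # One plausible hinge formulation: push the best local region to score at
    # least `margin` above the whole face, so salient regions dominate.
    mu_full = scores[:, 0]
    mu_max = scores[:, 1:].max(dim=1).values
    return F.relu(margin + mu_full - mu_max).mean()

# Usage with dummy features: 4 faces, 6 crops each, 512-d backbone features.
feats = torch.randn(4, 6, 512)
ran = RegionAttention(512)
fused, attn = ran(feats)            # fused: (4, 512), attn: (4, 6)
loss_rb = region_biased_loss(attn)  # add to the classification loss
```

Because the weights are normalized over however many regions are present, the fused feature keeps the same length regardless of K, which is what lets a varied number of region crops be embedded into one compact representation.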




Published In

IEEE Transactions on Image Processing, Volume 29, 2020, 3918 pages

Publisher

IEEE Press
