
Timid semi-supervised learning for face expression analysis

Published: 01 June 2023

Highlights

Face expression analysis suffers from insufficient annotated data.
We introduce a timid semi-supervised strategy based on diversity for face expression analysis.
The solution produces superior results in both expression recognition and action unit estimation.
Simple diversity-based solutions perform better when only biased unlabeled data is available.


Abstract

In recent years, semi-supervised learning has been proposed as a strategy with high potential for improving machine learning capabilities. Face expression recognition may benefit greatly from such a technique, as accurate labeling is both difficult and costly, whereas millions of unlabeled images with human faces are available on the Internet. In this paper we evaluate the benefits of semi-supervised learning in practical scenarios of face expression analysis. Our conclusion is that better performance is indeed achievable, but by methods that put a distinct emphasis on the diversity of the patterns explored in the unlabeled data domain. The evaluation is carried out on multiple tasks: detecting Action Units on EmotioNet, assessing Action Unit intensity on the spontaneous DISFA database, and recognizing expressions in static images acquired in the wild, from the RAF-DB and FER+ databases. We show that, in these scenarios, a so-called timid semi-supervised learner is more robust and achieves higher performance than standard, confident semi-supervised learners.
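For context, the "confident" semi-supervised learners the abstract contrasts against typically rely on pseudo-labeling: a model trained on the labeled set predicts labels for unlabeled samples, and only predictions above a confidence threshold are admitted into training. The sketch below is a minimal, generic illustration of that baseline (function and variable names are ours, not the authors' timid method):

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.95):
    """Confident pseudo-labeling: keep only unlabeled samples whose
    top predicted class probability reaches `threshold`, and assign
    them the argmax class as a pseudo-label."""
    conf = probs.max(axis=1)                 # top-class confidence per sample
    keep = conf >= threshold                 # confident-sample mask
    return np.flatnonzero(keep), probs.argmax(axis=1)[keep]

# Toy class-probability predictions for four unlabeled samples, three classes.
probs = np.array([
    [0.98, 0.01, 0.01],   # confident -> admitted with pseudo-label 0
    [0.40, 0.35, 0.25],   # uncertain -> discarded
    [0.10, 0.85, 0.05],   # below threshold -> discarded
    [0.02, 0.02, 0.96],   # confident -> admitted with pseudo-label 2
])
idx, labels = select_pseudo_labels(probs, threshold=0.95)
print(idx.tolist(), labels.tolist())  # [0, 3] [0, 2]
```

The paper's point is that such confidence-driven selection can reinforce the biases of the initial model; a diversity-oriented, "timid" selection of unlabeled patterns is reported to be more robust.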



Published In

Pattern Recognition, Volume 138, Issue C, June 2023, 967 pages

Publisher

Elsevier Science Inc.

United States


Author Tags

  1. Face expression
  2. Action units
  3. Semi-supervised learning
  4. Diversity

Qualifiers

  • Research-article
