AUPro: Multi-label Facial Action Unit Proposal Generation for Sequence-Level Analysis

Yingjie Chen¹³,
Jiarui Zhang¹³,
Diqi Chen¹⁴,
Tao Wang¹³,
Yizhou Wang¹³ &
…
Yun Liang¹³

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13110)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Facial action units (AUs) play an essential role in human facial behavior analysis. Despite progress in frame-level AU analysis, the discrete classification results produced by previous work are not explicit enough for the analysis that many real-world applications require. Moreover, although an AU is a dynamic process, sequence-level analysis that maintains a global view has so far been largely ignored in the literature. To fill this gap, we propose a multi-label AU proposal generation task for sequence-level facial action analysis. To tackle the task, we design AUPro, which takes a video clip as input and directly generates proposals for each AU category. Extensive experiments on two commonly used AU benchmark datasets, BP4D and DISFA, show the superiority of our proposed method.

Y. Chen and J. Zhang—Equal Contribution.
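
To make the task in the abstract concrete, the sketch below shows one plausible reading of a sequence-level, multi-label output: for each AU category, a list of temporal segments within a clip. It is not the AUPro model. It only converts per-frame binary AU labels into per-AU segments, and adds a temporal IoU helper of the kind commonly used to compare a proposal with a ground-truth segment. The function names, the inclusive `(start, end)` frame convention, and the example labels for AU6 and AU12 are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # (start_frame, end_frame), both inclusive (assumed convention)


def labels_to_segments(frame_labels: List[List[int]],
                       au_names: List[str]) -> Dict[str, List[Segment]]:
    """Group per-frame binary AU labels into per-AU temporal segments.

    frame_labels[t][k] is 1 if AU au_names[k] is active in frame t, else 0.
    """
    segments: Dict[str, List[Segment]] = {name: [] for name in au_names}
    for k, name in enumerate(au_names):
        start = None  # start frame of the segment currently being tracked
        for t, row in enumerate(frame_labels):
            active = row[k] == 1
            if active and start is None:
                start = t  # a new segment begins
            elif not active and start is not None:
                segments[name].append((start, t - 1))  # segment ended at t - 1
                start = None
        if start is not None:
            # the AU was still active in the last frame of the clip
            segments[name].append((start, len(frame_labels) - 1))
    return segments


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two inclusive frame segments."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union


# Hypothetical 5-frame clip with two AU categories (columns: AU6, AU12).
clip = [[0, 1], [1, 1], [1, 0], [0, 0], [1, 0]]
result = labels_to_segments(clip, ["AU6", "AU12"])
print(result)  # {'AU6': [(1, 2), (4, 4)], 'AU12': [(0, 1)]}
```

Because each AU category keeps its own segment list, overlapping activations of different AUs are represented independently, which is what makes the output multi-label at the sequence level.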


Acknowledgements

This work is supported in part by the PKU-NTU Joint Research Institute (JRI), sponsored by a donation from the Ng Teng Fong Charitable Foundation.

Author information

Authors and Affiliations

Department of Computer Science and Technology, Peking University, Beijing, China
Yingjie Chen, Jiarui Zhang, Tao Wang, Yizhou Wang & Yun Liang
Advanced Institute of Information Technology, Peking University, Hangzhou, China
Diqi Chen

Corresponding author

Correspondence to Tao Wang.

Editor information

Editors and Affiliations

Sampoerna University, Jakarta, Indonesia
Teddy Mantoro
Kyungpook National University, Daegu, Korea (Republic of)
Minho Lee
Sampoerna University, Jakarta, Indonesia
Media Anugerah Ayu
Murdoch University, Murdoch, WA, Australia
Kok Wai Wong
Universitas Indonesia, Depok, Indonesia
Achmad Nizar Hidayanto

Cite this paper

Chen, Y., Zhang, J., Chen, D., Wang, T., Wang, Y., Liang, Y. (2021). AUPro: Multi-label Facial Action Unit Proposal Generation for Sequence-Level Analysis. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13110. Springer, Cham. https://doi.org/10.1007/978-3-030-92238-2_8

DOI: https://doi.org/10.1007/978-3-030-92238-2_8
Published: 05 December 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92237-5
Online ISBN: 978-3-030-92238-2
eBook Packages: Computer Science, Computer Science (R0)