Cascaded Structure-Learning Network with Using Adversarial Training for Robust Facial Landmark Detection

Authors:

Xingzhong Nong,

Haifeng Hu

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 18, Issue 2

Article No.: 49, Pages 1 - 20

https://doi.org/10.1145/3474595

Published: 16 February 2022

Abstract

Convolutional neural networks have brought great progress in facial landmark detection, but the task remains challenging under partial occlusion and extreme head pose. In this paper, we propose a Cascaded Structure-Learning Network (CSLN) with adversarial training that improves 2D facial landmark detection by taking the structure of facial landmarks into account. In the first stage, we improve the original stacked hourglass network by adding a multi-branch module to capture features at different scales, a progressive convolution structure to compensate for the structural features missing in hourglass networks, and a pyramid inception structure to expand the receptive field. Specifically, by introducing a discriminator, we use an adversarial training strategy to push the improved hourglass network to generate more accurate heatmaps. The second stage, which is based on an attention mechanism, refines the spatial correlations between different facial landmarks by reusing the structural features. Moreover, we propose a novel region loss that adaptively assigns weights to different regions, so that the network focuses more on occluded landmarks. Experimental results on several datasets, namely 300W, COFW, and AFLW, show that the proposed method achieves superior performance compared with state-of-the-art methods.
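To make the idea of a region-weighted heatmap loss concrete, the following is a minimal sketch in plain Python. The abstract says only that the region loss adaptively assigns weights to different regions; the weighting rule used here (each region weighted by its share of the mean heatmap error), the Gaussian target heatmaps, and every function name are illustrative assumptions, not the formulation from the paper.

```python
import math

def gaussian_heatmap(cx, cy, size=8, sigma=1.0):
    """Target heatmap: a 2D Gaussian centred on the landmark (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(size)] for y in range(size)]

def heatmap_mse(pred, target):
    """Mean squared error between two heatmaps of equal shape."""
    n = len(pred) * len(pred[0])
    return sum((p - t) ** 2
               for pred_row, target_row in zip(pred, target)
               for p, t in zip(pred_row, target_row)) / n

def adaptive_region_weights(errors, region_of):
    """Assumed rule (hypothetical): weight each region by its share of the
    mean per-landmark error, scaled so the weights average to 1."""
    regions = sorted(set(region_of))
    mean_err = {r: sum(e for e, g in zip(errors, region_of) if g == r)
                   / region_of.count(r)
                for r in regions}
    total = sum(mean_err.values())
    if total == 0:
        # All regions are predicted perfectly: fall back to uniform weights.
        return {r: 1.0 for r in regions}
    return {r: len(regions) * mean_err[r] / total for r in regions}

def region_loss(preds, targets, region_of):
    """Region-weighted mean of the per-landmark heatmap errors."""
    errors = [heatmap_mse(p, t) for p, t in zip(preds, targets)]
    weights = adaptive_region_weights(errors, region_of)
    return sum(weights[g] * e for e, g in zip(errors, region_of)) / len(errors)

# Two landmarks: the "eye" is predicted exactly, the "mouth" is displaced,
# as might happen when the mouth is occluded.
targets = [gaussian_heatmap(2, 2), gaussian_heatmap(5, 5)]
preds = [gaussian_heatmap(2, 2), gaussian_heatmap(3, 5)]
regions = ["eye", "mouth"]

errors = [heatmap_mse(p, t) for p, t in zip(preds, targets)]
weights = adaptive_region_weights(errors, regions)
loss = region_loss(preds, targets, regions)
```

In this toy case the poorly localized region receives the larger weight, which is the behavior the abstract attributes to the region loss. In an actual training loop the weights would presumably be treated as constants when computing gradients; that detail is also an assumption.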

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision representations
        Image representations
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Machine learning theory

Information & Contributors

Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 18, Issue 2

May 2022

494 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3505207

Editor:
Alberto Del Bimbo
University of Firenze, Italy

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 February 2022

Accepted: 01 July 2021

Revised: 01 July 2021

Received: 01 July 2020

Published in TOMM Volume 18, Issue 2

Qualifiers

Research-article
Refereed

Funding Sources

National Natural Science Foundation of China
Natural Science Foundation of Guangdong Province
Science and Technology Program of Huizhou of China
