Cascaded Structure-Learning Network with Using Adversarial Training for Robust Facial Landmark Detection

Published: 16 February 2022

Abstract

Recently, great progress has been made in facial landmark detection based on convolutional neural networks (CNNs), but the task remains challenging under partial occlusion and extreme head poses. In this paper, we propose a Cascaded Structure-Learning Network (CSLN) with adversarial training that improves 2D facial landmark detection by taking the structure of facial landmarks into account. In the first stage, we improve the original stacked hourglass network with three components: a multi-branch module that captures features at different scales, a progressive convolution structure that compensates for the structural features missing from hourglass networks, and a pyramid inception structure that expands the receptive field. In particular, we introduce a discriminator and use an adversarial training strategy to encourage the improved hourglass network to generate more accurate heatmaps. The second stage, which is based on an attention mechanism, optimizes the spatial correlations between different facial landmarks by reusing the structural features. Moreover, we propose a novel region loss that adaptively allocates appropriate weights to different facial regions, so that the network focuses more on occluded landmarks. Experimental results on several datasets, namely 300W, COFW, and AFLW, show that the proposed method achieves superior performance compared with state-of-the-art methods.
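The abstract says the region loss adaptively allocates weights to facial regions but does not give the weighting rule. The Python sketch below is an illustration under explicit assumptions, not the authors' formulation: each region's error is taken to be the mean heatmap MSE of its landmarks, and the region weights are assumed to be a softmax over those errors, rescaled to average 1, so that regions currently predicted worse (as occluded regions often are) contribute more to the total. The function names, the `temperature` parameter, and the landmark-to-region grouping are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

# One 2D heatmap per landmark, stored as a list of rows.
Heatmap = List[List[float]]


def heatmap_mse(pred: Heatmap, target: Heatmap) -> float:
    """Mean squared error between two heatmaps of the same size."""
    total, count = 0.0, 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            total += (p - t) ** 2
            count += 1
    return total / count


def region_weights(region_errors: Dict[str, float],
                   temperature: float = 1.0) -> Dict[str, float]:
    """Assumed weighting rule: softmax over per-region errors, rescaled to average 1.

    Regions with a larger current error (for example, occluded ones) get a
    larger weight; equal errors give every region a weight of 1.
    """
    peak = max(region_errors.values())  # subtract the max for numerical stability
    exps = {name: math.exp((err - peak) / temperature)
            for name, err in region_errors.items()}
    norm = sum(exps.values())
    n = len(region_errors)
    return {name: n * e / norm for name, e in exps.items()}


def region_loss(pred: Sequence[Heatmap], target: Sequence[Heatmap],
                regions: Dict[str, List[int]], temperature: float = 1.0) -> float:
    """Weighted mean of per-region heatmap errors (illustrative, not the CSLN formula)."""
    region_errors = {
        name: sum(heatmap_mse(pred[i], target[i]) for i in idx) / len(idx)
        for name, idx in regions.items()
    }
    weights = region_weights(region_errors, temperature)
    return sum(weights[name] * err for name, err in region_errors.items()) / len(regions)


if __name__ == "__main__":
    # Toy example: three landmarks on 2x2 heatmaps; the mouth landmark is missed.
    target = [[[1.0, 0.0], [0.0, 0.0]],
              [[0.0, 1.0], [0.0, 0.0]],
              [[0.0, 0.0], [1.0, 0.0]]]
    pred = [[[1.0, 0.0], [0.0, 0.0]],
            [[0.0, 1.0], [0.0, 0.0]],
            [[0.0, 0.0], [0.0, 0.0]]]
    regions = {"eyes": [0, 1], "mouth": [2]}  # hypothetical landmark grouping
    print(region_loss(pred, target, regions))
```

On the toy input, the region containing the missed landmark receives a weight above 1, and the loss comes out at roughly 0.14 instead of the 0.125 an unweighted mean over the two regions would give. In an autodiff framework one would likely compute the weights without gradient so that they act purely as per-region scaling factors; that choice is also an assumption of this sketch.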

Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 18, Issue 2
May 2022
494 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3505207
Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 February 2022
Accepted: 01 July 2021
Revised: 01 July 2021
Received: 01 July 2020
Published in TOMM Volume 18, Issue 2

Author Tags

  1. 2D Facial landmark detection
  2. adversarial training
  3. convolutional neural networks
  4. keypoints structure learning

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • National Natural Science Foundation of China
  • Natural Science Foundation of Guangdong Province
  • Science and Technology Program of Huizhou of China

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 15
  • Downloads (last 6 weeks): 2
Reflects downloads up to 12 December 2024.

Cited By

  • (2024) Face Identification Based on Active Facial Patches Using Multi-Task Cascaded Convolutional Networks. Journal of Advances in Information Technology 15, 1, 118–126. DOI: 10.12720/jait.15.1.118-126. Online publication date: 2024.
  • (2024) Unsupervised Adversarial Example Detection of Vision Transformers for Trustworthy Edge Computing. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3674981. Online publication date: 2 July 2024.
  • (2024) Local eye-net: An attention based deep learning architecture for localization of eyes. Expert Systems with Applications 239, 122416. DOI: 10.1016/j.eswa.2023.122416. Online publication date: April 2024.
  • (2023) COutfitGAN: Learning to Synthesize Compatible Outfits Supervised by Silhouette Masks and Fashion Styles. IEEE Transactions on Multimedia 25, 4986–5001. DOI: 10.1109/TMM.2022.3185894. Online publication date: 1 January 2023.
