
Mask-Guided Deformation Adaptive Network for Human Parsing

Published: 14 March 2022

Abstract

Because of densely compacted body parts, nonrigid clothing items, and severe overlap in crowd scenes, human parsing needs to focus more on multilevel feature representations than general scene parsing tasks do. Based on this observation, we propose to introduce the auxiliary tasks of human mask and edge detection to facilitate human parsing. Unlike human parsing, which exploits the discriminative features of each category, human mask and edge detection emphasize the boundaries of semantic parsing regions and the difference between foreground humans and background clutter, which benefits parsing predictions for crowd scenes and small human parts. Specifically, we extract human mask and edge labels from the human parsing annotations and train a shared encoder with three independent decoders for the three mutually beneficial tasks. Furthermore, the decoder feature maps of the human mask prediction branch are exploited as attention maps that indicate human regions, facilitating the decoding process of human parsing and human edge detection. In addition to these auxiliary tasks, we alleviate the problem of deformed clothing items under various human poses by tracking the deformation patterns with deformable convolution. Extensive experiments show that the proposed method achieves superior performance against state-of-the-art methods on both single and multiple human parsing datasets. Code and trained models are available at https://github.com/ViktorLiang/MGDAN.
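The abstract states that the human mask and edge labels are extracted from the parsing annotations but does not say how. The following is a minimal sketch of one plausible derivation, not the authors' implementation. It assumes that class id 0 denotes background and that an edge pixel is any pixel whose right or lower neighbour carries a different label. The names `human_mask`, `edge_map`, and `BACKGROUND` are hypothetical, and plain Python lists are used only to keep the example self-contained.

```python
from typing import List

LabelMap = List[List[int]]

BACKGROUND = 0  # assumption: class id 0 marks background pixels


def human_mask(parsing: LabelMap) -> LabelMap:
    """Binary foreground mask: 1 wherever the parsing label is not background."""
    return [[1 if label != BACKGROUND else 0 for label in row] for row in parsing]


def edge_map(parsing: LabelMap) -> LabelMap:
    """Mark both pixels of every horizontally or vertically adjacent pair
    whose parsing labels differ (assumed edge rule, for illustration only)."""
    height, width = len(parsing), len(parsing[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            label = parsing[y][x]
            # Compare with the right neighbour.
            if x + 1 < width and parsing[y][x + 1] != label:
                edges[y][x] = 1
                edges[y][x + 1] = 1
            # Compare with the lower neighbour.
            if y + 1 < height and parsing[y + 1][x] != label:
                edges[y][x] = 1
                edges[y + 1][x] = 1
    return edges


# Toy 4x4 annotation: 0 = background, 1 and 2 = two hypothetical part classes.
toy_parsing = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 2, 2, 0],
    [0, 0, 0, 0],
]
toy_mask = human_mask(toy_parsing)
toy_edges = edge_map(toy_parsing)
```

Under these assumptions, a single parsing annotation yields the supervision targets for all three decoders, so the auxiliary tasks require no extra labeling effort. In practice the edge rule (thickness, neighbourhood) may differ from this sketch.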


Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 18, Issue 1
January 2022
517 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3505205

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 March 2022
Accepted: 01 May 2021
Revised: 01 March 2021
Received: 01 August 2020
Published in TOMM Volume 18, Issue 1

Author Tags

  1. Human parsing
  2. multi-task learning
  3. deformable convolution

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • National Natural Science Foundation of China
  • Guangdong International Science and Technology Cooperation Project
  • Guangdong Natural Science Foundation
  • Guangzhou Basic and Applied Research Project
  • Fundamental Research Funds for the Central Universities
  • Social Science Research Base of Guangdong Province-Research Center of Network Civilization in New Era of SCUT
  • CCF-Tencent Open Research Fund
