research-article

Multimodal Asymmetric Dual Learning for Unsupervised Eyeglasses Removal

Authors:

Weimin Tan

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

Pages 5092 - 5100

https://doi.org/10.1145/3474085.3475559

Published: 17 October 2021

Abstract

Glasses removal is a challenging task because glasses vary widely in type and paired datasets are difficult to obtain. Most existing methods either build a separate model for each type of glasses or rely on expensive paired datasets for supervised training, which limits their generality. In this paper, we propose a multimodal asymmetric dual learning method for unsupervised glasses removal. The method uses large-scale face images with and without glasses for dual feature learning and does not require intensive manual annotation of the glasses. Given a face image with glasses, we aim to generate a glasses-free image that preserves the person's identity. To compensate for the lack of semantic features in the glasses region, we introduce a text description of the target image into the task and propose a text-guided multimodal feature fusion method. We adaptively select the glasses-free image closest to the target one for better dual feature learning. We also propose an exchange residual loss to generate a more precise mask of the glasses. Extensive experiments show that our method generates realistic glasses-free images and better retains the person's identity, which can be useful for face recognition.

Cited By

Sun Y, Lin Q, Tan W, Yan B, Cai J, Kankanhalli M, Prabhakaran B, Boll S, Subramanian R, Zheng L, Singh V, Cesar P, Xie L, Xu D (2024). Audio-Driven Identity Manipulation for Face Inpainting. Proceedings of the 32nd ACM International Conference on Multimedia, 6123-6132. DOI: 10.1145/3664647.3680975. Online publication date: 28-Oct-2024.
https://dl.acm.org/doi/10.1145/3664647.3680975
Ye Z, Zhang H, Li X, Zhang Q (2024). DeMaskGAN: a de-masking generative adversarial network guided by semantic segmentation. The Visual Computer: International Journal of Computer Graphics, 40:8, 5605-5618. DOI: 10.1007/s00371-023-03125-0. Online publication date: 1-Aug-2024.
https://dl.acm.org/doi/10.1007/s00371-023-03125-0

Index Terms

Multimodal Asymmetric Dual Learning for Unsupervised Eyeglasses Removal
1. Computing methodologies

Information & Contributors

Information

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

October 2021

5796 pages

ISBN:9781450386517

DOI:10.1145/3474085

General Chairs:
Heng Tao Shen, University of Electronic Science and Technology of China, China
Yueting Zhuang, Zhejiang University, China
John R. Smith, IBM, USA

Program Chairs:
Yang Yang, University of Electronic Science and Technology of China, China
Pablo Cesar, CWI & TU Delft, The Netherlands
Florian Metze, Facebook, Inc., USA
Balakrishnan Prabhakaran, University of Texas at Dallas, USA

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MM '21

Sponsor:

SIGMM

MM '21: ACM Multimedia Conference

October 20 - 24, 2021

Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
