DOI: 10.1145/3474085.3475559
research-article

Multimodal Asymmetric Dual Learning for Unsupervised Eyeglasses Removal

Published: 17 October 2021

Abstract

Eyeglasses removal is challenging because glasses vary widely in type and paired datasets are difficult to obtain. Most existing methods either build a separate model for each type of glasses or rely on expensive paired datasets for supervised training, which limits their generality. In this paper, we propose a multimodal asymmetric dual learning method for unsupervised glasses removal. The method uses large-scale collections of face images with and without glasses for dual feature learning and does not require intensive manual annotation of the glasses. Given a face image with glasses, we aim to generate a glasses-free image that preserves the person's identity. To compensate for the lack of semantic features in the glasses region, we introduce a text description of the target image into the task and propose a text-guided multimodal feature fusion method. We adaptively select the glasses-free image closest to the target for better dual feature learning. We also propose an exchange residual loss to generate a more precise mask of the glasses. Extensive experiments show that our method generates realistic glasses-free images and better retains the person's identity, which can be useful for face recognition.
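The abstract does not say how the "closest" glasses-free image is measured. As a rough illustration only, the sketch below assumes each image has already been mapped to a feature vector by some face encoder (not shown, and not specified here) and picks the candidate with the highest cosine similarity to the target. The function names `cosine_similarity` and `select_closest` are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def select_closest(target: Sequence[float],
                   candidates: Sequence[Sequence[float]]) -> int:
    """Return the index of the candidate most similar to the target.

    Assumption: similarity is cosine similarity between feature vectors.
    The paper may use a different criterion.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    scores = [cosine_similarity(target, c) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy feature vectors standing in for encoder outputs.
target_features = [1.0, 0.0]
glasses_free_features = [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
best_index = select_closest(target_features, glasses_free_features)
```

In this toy example the second candidate points in almost the same direction as the target, so it is the one selected. A real system would compare learned embeddings rather than hand-written vectors.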

Cited By

  • (2024) Audio-Driven Identity Manipulation for Face Inpainting. Proceedings of the 32nd ACM International Conference on Multimedia, 6123-6132. DOI: 10.1145/3664647.3680975. Online publication date: 28-Oct-2024
  • (2024) DeMaskGAN: a de-masking generative adversarial network guided by semantic segmentation. The Visual Computer: International Journal of Computer Graphics 40:8, 5605-5618. DOI: 10.1007/s00371-023-03125-0. Online publication date: 1-Aug-2024

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN: 9781450386517
DOI: 10.1145/3474085

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dual learning
  2. multimodal
  3. unsupervised eyeglasses removal

Qualifiers

  • Research-article

Conference

MM '21: ACM Multimedia Conference
October 20 - 24, 2021
Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
