DOI: 10.1145/3581783.3612219
Research Article

Semantic-Aware Generator and Low-level Feature Augmentation for Few-shot Image Generation

Published: 27 October 2023

Abstract

Few-shot image generation aims to generate novel images for an unseen category given only a few samples. Prior studies struggle to produce novel images with both satisfactory diversity and fidelity. To improve generation quality, in this paper we propose a Semantic-Aware Generator (SAG) that provides explicit semantic guidance to the discriminator, and a Low-level Feature Augmentation (LFA) technique that supplies fine-grained information to facilitate diversity. Specifically, we observe that the generator's feature layers contain different levels of semantic information. This observation motivates us to employ intermediate feature maps of the generator as semantic labels to guide the discriminator, improving the semantic awareness of the generator. Moreover, the spatially informative and diverse features obtained via LFA further contribute to generation quality. With these modules, we conduct extensive experiments on three representative benchmarks, and the results demonstrate the effectiveness and superiority of our method.
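As a rough illustration of the guidance mechanism described above, the following minimal PyTorch sketch shows how intermediate generator feature maps could be reduced to coarse per-pixel pseudo labels and used as an auxiliary segmentation target for the discriminator. This is not the authors' implementation: the random-projection labeling, the class count, and the discriminator segmentation head (disc_seg_logits) are assumptions introduced only for illustration.

    import torch
    import torch.nn.functional as F

    def pseudo_semantic_labels(gen_feat, num_classes=8, seed=0):
        # gen_feat: (B, C, H, W) intermediate feature map taken from the generator.
        # A fixed random projection over channels stands in for the paper's
        # label construction (an assumption made purely for this sketch).
        B, C, H, W = gen_feat.shape
        g = torch.Generator().manual_seed(seed)
        proj = torch.randn(C, num_classes, generator=g).to(gen_feat.device)
        logits = torch.einsum('bchw,ck->bkhw', gen_feat, proj)
        return logits.argmax(dim=1)  # (B, H, W) integer pseudo labels

    def semantic_guidance_loss(disc_seg_logits, pseudo_labels):
        # disc_seg_logits: (B, num_classes, H', W') from a hypothetical per-pixel
        # prediction head on the discriminator; pseudo_labels: (B, H, W) labels
        # derived from the generator, resized to the head's resolution.
        labels = F.interpolate(pseudo_labels.unsqueeze(1).float(),
                               size=disc_seg_logits.shape[-2:],
                               mode='nearest').squeeze(1).long()
        return F.cross_entropy(disc_seg_logits, labels)

In this reading, the generator's own intermediate features provide per-pixel supervision at no extra annotation cost, encouraging the discriminator to judge semantic layout rather than global realism alone.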


Cited By

  • (2025) Semantic Mask Reconstruction and Category Semantic Learning for few-shot image generation. Neural Networks, Vol. 183, 106946. DOI: 10.1016/j.neunet.2024.106946. Online publication date: March 2025.



Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023
9913 pages
ISBN:9798400701085
DOI:10.1145/3581783

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. few-shot image generation
  2. generative adversarial network
  3. generator features

Qualifiers

  • Research-article

Funding Sources

  • Shanghai Science and Technology Program
  • Natural Science Foundation of China

Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa, ON, Canada

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

