DOI: 10.1145/3581783.3612219
Research Article

Semantic-Aware Generator and Low-level Feature Augmentation for Few-shot Image Generation

Published: 27 October 2023

Abstract

Few-shot image generation aims to generate novel images for an unseen category given only a few samples. Prior studies struggle to produce novel images with both satisfactory diversity and fidelity. To improve generation quality, in this paper we propose a Semantic-Aware Generator (SAG) that provides explicit semantic guidance to the discriminator, and a Low-level Feature Augmentation (LFA) technique that supplies fine-grained information to facilitate diversity. Specifically, we observe that the generator's feature layers contain different levels of semantic information. This observation motivates us to employ intermediate feature maps of the generator as semantic labels to guide the discriminator, improving the semantic awareness of the generator. Moreover, the spatially informative and diverse features obtained via LFA further contribute to generation quality. With these modules, we conduct extensive experiments on three representative benchmarks, and the results demonstrate the effectiveness and superiority of our method.
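As a rough illustration of the guidance mechanism described above, the following minimal PyTorch sketch shows how intermediate generator feature maps could be reduced to coarse per-pixel pseudo labels and used as an auxiliary segmentation target for the discriminator. This is not the authors' implementation: the random-projection labeling, the class count, and the discriminator segmentation head (disc_seg_logits) are assumptions introduced only for illustration.

    import torch
    import torch.nn.functional as F

    def pseudo_semantic_labels(gen_feat, num_classes=8, seed=0):
        # gen_feat: (B, C, H, W) intermediate feature map taken from the generator.
        # A fixed random projection over channels stands in for the paper's
        # label construction (an assumption made purely for this sketch).
        B, C, H, W = gen_feat.shape
        g = torch.Generator().manual_seed(seed)
        proj = torch.randn(C, num_classes, generator=g).to(gen_feat.device)
        logits = torch.einsum('bchw,ck->bkhw', gen_feat, proj)
        return logits.argmax(dim=1)  # (B, H, W) integer pseudo labels

    def semantic_guidance_loss(disc_seg_logits, pseudo_labels):
        # disc_seg_logits: (B, num_classes, H', W') from a hypothetical per-pixel
        # prediction head on the discriminator; pseudo_labels: (B, H, W) labels
        # derived from the generator, resized to the head's resolution.
        labels = F.interpolate(pseudo_labels.unsqueeze(1).float(),
                               size=disc_seg_logits.shape[-2:],
                               mode='nearest').squeeze(1).long()
        return F.cross_entropy(disc_seg_logits, labels)

In this reading, the generator's own intermediate features provide per-pixel supervision at no extra annotation cost, encouraging the discriminator to judge semantic layout rather than global realism alone.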


Cited By

  • (2025) Semantic Mask Reconstruction and Category Semantic Learning for few-shot image generation. Neural Networks, Vol. 183, 106946. DOI: 10.1016/j.neunet.2024.106946. Online publication date: March 2025.



Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023
9913 pages
ISBN:9798400701085
DOI:10.1145/3581783

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. few-shot image generation
  2. generative adversarial network
  3. generator features

Qualifiers

  • Research-article

Funding Sources

  • Shanghai Science and Technology Program
  • Natural Science Foundation of China

Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa, ON, Canada

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

