DOI:10.1145/3123266.3123450
research-article

Stylized Adversarial AutoEncoder for Image Generation

Published: 19 October 2017

Abstract

In this paper, we propose an autoencoder-based generative adversarial network (GAN) for automatic image generation, which we call the "stylized adversarial autoencoder". Unlike existing generative autoencoders, which typically impose a prior distribution over the latent vector, the proposed approach splits the latent variable into two components, a style feature and a content feature, both encoded from real images. Splitting the latent vector lets us adjust the content and the style of the generated image arbitrarily by choosing different exemplar images. In addition, a multiclass classifier is adopted as the discriminator in the GAN, which makes the generated images more realistic. We performed experiments on handwritten digit, scene text, and face datasets, in which the stylized adversarial autoencoder achieves superior image generation results and markedly improves performance on the corresponding supervised recognition tasks.
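The latent split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `ToyStylizedAutoencoder`, the use of two separate linear encoders, the untrained random weights, and all dimensions are assumptions chosen only to show how a content feature from one exemplar image and a style feature from another might be concatenated and decoded. The adversarial training and the multiclass discriminator are left out.

```python
import random


def linear(weights, x):
    """Apply a bias-free dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of untrained weights in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class ToyStylizedAutoencoder:
    """Hypothetical stand-in: two linear encoders feeding one linear decoder."""

    def __init__(self, image_dim, content_dim, style_dim, seed=0):
        rng = random.Random(seed)
        self.content_encoder = random_matrix(content_dim, image_dim, rng)
        self.style_encoder = random_matrix(style_dim, image_dim, rng)
        self.decoder = random_matrix(image_dim, content_dim + style_dim, rng)

    def encode_content(self, image):
        return linear(self.content_encoder, image)

    def encode_style(self, image):
        return linear(self.style_encoder, image)

    def generate(self, content_image, style_image):
        # The latent vector is the concatenation of the two features,
        # each taken from a (possibly different) exemplar image.
        latent = self.encode_content(content_image) + self.encode_style(style_image)
        return linear(self.decoder, latent)


# Usage: flattened 16-pixel "images" filled with random values (assumed data).
model = ToyStylizedAutoencoder(image_dim=16, content_dim=4, style_dim=3)
rng = random.Random(1)
content_image = [rng.random() for _ in range(16)]
style_image = [rng.random() for _ in range(16)]
generated = model.generate(content_image, style_image)
```

Swapping `style_image` for a different exemplar changes only the style half of the latent vector, which is the kind of control the abstract refers to. In a trained model the encoders and decoder would be learned jointly with the discriminator.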

Published In

MM '17: Proceedings of the 25th ACM international conference on Multimedia
October 2017
2028 pages
ISBN:9781450349062
DOI:10.1145/3123266
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. autoencoder
  2. generative adversarial network
  3. image generation

Qualifiers

  • Research-article

Conference

MM '17
MM '17: ACM Multimedia Conference
October 23 - 27, 2017
Mountain View, California, USA

Acceptance Rates

MM '17 Paper Acceptance Rate 189 of 684 submissions, 28%;
Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

  • Downloads (Last 12 months): 31
  • Downloads (Last 6 weeks): 3
Reflects downloads up to 12 Dec 2024

Cited By

  • (2024) Evaluation Metrics for Intelligent Generation of Graphical Game Assets: A Systematic Survey-Based Framework. IEEE Transactions on Pattern Analysis and Machine Intelligence 46:12 (7998-8017). DOI:10.1109/TPAMI.2024.3398998. Online publication date: Dec-2024
  • (2024) On the Feasibility of Predicting Volumes of Fake News—The Spanish Case. IEEE Transactions on Computational Social Systems 11:4 (5230-5240). DOI:10.1109/TCSS.2023.3297093. Online publication date: Aug-2024
  • (2023) Stroke-GAN Painter: Learning to paint artworks using stroke-style generative adversarial networks. Computational Visual Media 9:4 (787-806). DOI:10.1007/s41095-022-0287-3. Online publication date: 11-Mar-2023
  • (2022) Feature Space of XRD Patterns Constructed by an Autoencoder. Advanced Theory and Simulations 6:2. DOI:10.1002/adts.202200613. Online publication date: 18-Dec-2022
  • (2021) Spatially Constrained GAN for Face and Fashion Synthesis. 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021) (01-08). DOI:10.1109/FG52635.2021.9666991. Online publication date: 15-Dec-2021
  • (2020) End-to-End Text-to-Image Synthesis with Spatial Constrains. ACM Transactions on Intelligent Systems and Technology 11:4 (1-19). DOI:10.1145/3391709. Online publication date: 25-May-2020
  • (2019) Laser Engraver Control System based on Reinforcement Adversarial Learning. 2019 International Russian Automation Conference (RusAutoCon) (1-5). DOI:10.1109/RUSAUTOCON.2019.8867762. Online publication date: Sep-2019
  • (2019) Joint Sketch-Attribute Learning for Fine-Grained Face Synthesis. MultiMedia Modeling (790-801). DOI:10.1007/978-3-030-37731-1_64. Online publication date: 24-Dec-2019
  • (2018) Sparsely Grouped Multi-Task Generative Adversarial Networks for Facial Attribute Manipulation. Proceedings of the 26th ACM international conference on Multimedia (392-401). DOI:10.1145/3240508.3240594. Online publication date: 15-Oct-2018
  • (2018) Facial Expression Recognition in the Wild. Proceedings of the 26th ACM international conference on Multimedia (126-135). DOI:10.1145/3240508.3240574. Online publication date: 15-Oct-2018
