research-article

Stylized Adversarial AutoEncoder for Image Generation

Authors:

Jianqiang Huang,

Xian-Sheng Hua

MM '17: Proceedings of the 25th ACM international conference on Multimedia

Pages 244 - 251

https://doi.org/10.1145/3123266.3123450

Published: 19 October 2017

Abstract

In this paper, we propose an autoencoder-based generative adversarial network (GAN) for automatic image generation, which we call the "stylized adversarial autoencoder". Unlike existing generative autoencoders, which typically impose a prior distribution over the latent vector, the proposed approach splits the latent variable into two components, a style feature and a content feature, both encoded from real images. This split enables us to adjust the content and the style of the generated image arbitrarily by choosing different exemplar images. In addition, a multiclass classifier is adopted as the discriminator in the GAN, which makes the generated images more realistic. We performed experiments on handwritten digit, scene text, and face datasets, in which the stylized adversarial autoencoder achieves superior image generation results and remarkably improves the corresponding supervised recognition tasks.
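The latent split and the multiclass discriminator described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no architectural details, so the untrained linear layers, the dimensions, and the choice of a (K+1)-way classifier whose extra class means "generated" are all assumptions made for illustration. All function and variable names are hypothetical.

```python
import math
import random

random.seed(0)

# Assumed toy sizes; the abstract does not specify any dimensions.
IMG_DIM, CONTENT_DIM, STYLE_DIM, NUM_CLASSES = 16, 4, 3, 10

def make_matrix(rows, cols):
    """Random, untrained weight matrix stored as a list of rows."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    """Multiply matrix W (rows x cols) by vector x (length cols)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

W_content = make_matrix(CONTENT_DIM, IMG_DIM)           # content encoder
W_style = make_matrix(STYLE_DIM, IMG_DIM)               # style encoder
W_dec = make_matrix(IMG_DIM, CONTENT_DIM + STYLE_DIM)   # decoder
# Assumption: K real classes plus one extra "generated" class (last index).
W_disc = make_matrix(NUM_CLASSES + 1, IMG_DIM)

def generate(content_img, style_img):
    """Decode the concatenation of a content code and a style code."""
    z = matvec(W_content, content_img) + matvec(W_style, style_img)
    return matvec(W_dec, z)

def classifier_loss(img, label):
    """Cross-entropy of the (K+1)-way classifier for an integer label."""
    probs = softmax(matvec(W_disc, img))
    return -math.log(probs[label])

content_img = [random.random() for _ in range(IMG_DIM)]
style_a = [random.random() for _ in range(IMG_DIM)]
style_b = [random.random() for _ in range(IMG_DIM)]

# Same content exemplar, two different style exemplars.
out_a = generate(content_img, style_a)
out_b = generate(content_img, style_b)

FAKE = NUM_CLASSES
d_loss = classifier_loss(out_a, FAKE)  # discriminator target: "generated"
g_loss = classifier_loss(out_a, 3)     # generator target: real class 3 (arbitrary)
```

Swapping `style_a` for `style_b` changes the decoded output while the content code stays fixed, which is the kind of control over style and content that the abstract describes. In a trained model the two losses would be minimized alternately; here they are only evaluated once.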

Index Terms

Stylized Adversarial AutoEncoder for Image Generation
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
  2. Machine learning
    1. Machine learning approaches

Information & Contributors

Information

Published In

MM '17: Proceedings of the 25th ACM international conference on Multimedia

October 2017

2028 pages

ISBN:9781450349062

DOI:10.1145/3123266

General Chairs:
Qiong Liu (FXPAL, USA),
Rainer Lienhart (Universität Augsburg, Germany),
Haohong Wang (TCL America, USA)

Program Chairs:
Sheng-Wei "Kuan-Ta" Chen (Academia Sinica, Taiwan),
Susanne Boll (University of Oldenburg, Germany),
Phoebe Chen (La Trobe University, Australia),
Gerald Friedland (Lawrence Livermore National Lab, USA),
Jia Li (Google, USA),
Shuicheng Yan (Qihoo 360, China)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
the 863 National High Technology Research and Development Program of China
the Basic Research Project of Innovation Action Plan
the Major Basic Research Program

Conference

MM '17

Sponsor:

SIGMM

MM '17: ACM Multimedia Conference

October 23 - 27, 2017

Mountain View, California, USA

Acceptance Rates

MM '17 Paper Acceptance Rate 189 of 684 submissions, 28%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 10
Total Downloads: 850
Downloads (Last 12 months): 31
Downloads (Last 6 weeks): 3

Reflects downloads up to 12 Dec 2024

Citations

Cited By

Fukaya K, Daylamani-Zad D, Agius H. (2024). Evaluation Metrics for Intelligent Generation of Graphical Game Assets: A Systematic Survey-Based Framework. IEEE Transactions on Pattern Analysis and Machine Intelligence 46:12 (7998-8017). Online publication date: Dec-2024.
https://doi.org/10.1109/TPAMI.2024.3398998
Ibañez-Lissen L, González-Manzano L, de Fuentes J, Goyanes M. (2024). On the Feasibility of Predicting Volumes of Fake News—The Spanish Case. IEEE Transactions on Computational Social Systems 11:4 (5230-5240). Online publication date: Aug-2024.
https://doi.org/10.1109/TCSS.2023.3297093
Wang Q, Guo C, Dai H, Li P. (2023). Stroke-GAN Painter: Learning to paint artworks using stroke-style generative adversarial networks. Computational Visual Media 9:4 (787-806). Online publication date: 11-Mar-2023.
https://doi.org/10.1007/s41095-022-0287-3
Utimula K, Yano M, Kimoto H, Hongo K, Nakano K, Maezono R. (2022). Feature Space of XRD Patterns Constructed by an Autoencoder. Advanced Theory and Simulations 6:2. Online publication date: 18-Dec-2022.
https://doi.org/10.1002/adts.202200613
Jiang S, Liu H, Wu Y, Fu Y. (2021). Spatially Constrained GAN for Face and Fashion Synthesis. 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021) (01-08). Online publication date: 15-Dec-2021.
https://doi.org/10.1109/FG52635.2021.9666991
Wang M, Lang C, Liang L, Feng S, Wang T, Gao Y. (2020). End-to-End Text-to-Image Synthesis with Spatial Constrains. ACM Transactions on Intelligent Systems and Technology 11:4 (1-19). Online publication date: 25-May-2020.
https://dl.acm.org/doi/10.1145/3391709
Nikolaev E. (2019). Laser Engraver Control System based on Reinforcement Adversarial Learning. 2019 International Russian Automation Conference (RusAutoCon) (1-5). Online publication date: Sep-2019.
https://doi.org/10.1109/RUSAUTOCON.2019.8867762
Yang B, Chen X, Hong R, Chen Z, Li Y, Zha Z. (2019). Joint Sketch-Attribute Learning for Fine-Grained Face Synthesis. MultiMedia Modeling (790-801). Online publication date: 24-Dec-2019.
https://doi.org/10.1007/978-3-030-37731-1_64
Zhang J, Shu Y, Xu S, Cao G, Zhong F, Liu M, Qin X, Boll S, Mu Lee K, Luo J, Zhu W, Byun H, Wen Chen C, Lienhart R, Mei T. (2018). Sparsely Grouped Multi-Task Generative Adversarial Networks for Facial Attribute Manipulation. Proceedings of the 26th ACM international conference on Multimedia (392-401). Online publication date: 15-Oct-2018.
https://dl.acm.org/doi/10.1145/3240508.3240594
Zhang F, Zhang T, Mao Q, Duan L, Xu C, Boll S, Mu Lee K, Luo J, Zhu W, Byun H, Wen Chen C, Lienhart R, Mei T. (2018). Facial Expression Recognition in the Wild. Proceedings of the 26th ACM international conference on Multimedia (126-135). Online publication date: 15-Oct-2018.
https://dl.acm.org/doi/10.1145/3240508.3240574
