DOI: 10.1145/3240508.3240587
research-article

Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training

Published: 15 October 2018

Abstract

Automatic generation of natural language from images has attracted extensive attention. In this paper, we take one step further and investigate the generation of poetic language (with multiple lines) for an image, toward automatic poetry creation. This task involves multiple challenges, including discovering poetic clues from the image (e.g., hope from green) and generating poems that satisfy both relevance to the image and poeticness at the language level. To solve these challenges, we formulate the task of poem generation as two correlated sub-tasks trained by multi-adversarial training via policy gradient, through which cross-modal relevance and poetic language style can be ensured. To extract poetic clues from images, we propose to learn a deep coupled visual-poetic embedding, in which the poetic representations of objects, sentiments (we consider both adjectives and verbs that can express emotions and feelings as sentiment words in this research), and scenes in an image can be jointly learned. Two discriminative networks are further introduced to guide the poem generation: a multi-modal discriminator and a poem-style discriminator. To facilitate the research, we have released two poem datasets by human annotators with two distinct properties: 1) the first human-annotated image-to-poem pair dataset (with 8,292 pairs in total), and 2) to date the largest public English poem corpus (with 92,265 different poems in total). Extensive experiments are conducted with 8K images, among which 1.5K images are randomly picked for evaluation. Both objective and subjective evaluations show superior performance against state-of-the-art methods for poem generation from images. A Turing test carried out with over 500 human subjects, among whom 30 evaluators are poetry experts, demonstrates the effectiveness of our approach.
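
The abstract states only that two discriminators guide the generator through policy gradient; it does not give the update rule. As a rough illustration of that general idea, and not the authors' implementation, the sketch below applies a REINFORCE-style update to a toy softmax policy over a handful of tokens, using a reward that blends two scores. Everything in it is an assumption made for illustration: the function names, the equal weighting of the two scores, the running-average baseline, and the hand-picked score lists standing in for a multi-modal discriminator and a poem-style discriminator.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combined_reward(relevance, style, weight=0.5):
    """Blend two discriminator scores (assumed to lie in [0, 1]).

    The 50/50 weighting is a hypothetical choice, not taken from the paper.
    """
    return weight * relevance + (1.0 - weight) * style


def reinforce_step(logits, action, reward, baseline, lr=0.5):
    """One REINFORCE update for a softmax policy.

    Uses d/d logit_k of log pi(action) = 1[k == action] - pi(k).
    """
    probs = softmax(logits)
    advantage = reward - baseline
    return [
        x + lr * advantage * ((1.0 if k == action else 0.0) - p)
        for k, (x, p) in enumerate(zip(logits, probs))
    ]


def train(relevance_scores, style_scores, steps=2000, seed=0):
    """Sample tokens, score them with both stand-in critics, update the policy."""
    rng = random.Random(seed)
    logits = [0.0] * len(relevance_scores)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(logits)), weights=probs)[0]
        reward = combined_reward(relevance_scores[action], style_scores[action])
        logits = reinforce_step(logits, action, reward, baseline)
        # Running-average baseline to reduce gradient variance (an assumption).
        baseline = 0.9 * baseline + 0.1 * reward
    return softmax(logits)


if __name__ == "__main__":
    # Hypothetical per-token scores for the toy vocabulary below.
    vocab = ["green", "hope", "car", "the"]
    relevance = [0.9, 0.6, 0.2, 0.1]  # stand-in for a multi-modal discriminator
    style = [0.3, 0.9, 0.1, 0.2]      # stand-in for a poem-style discriminator
    for token, p in zip(vocab, train(relevance, style)):
        print(f"{token}: {p:.3f}")
```

In a real sequence generator the policy would be a recurrent decoder conditioned on image features and the rewards would come from trained networks; the toy version only shows how a single scalar reward built from two critics enters the gradient.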

Published In

MM '18: Proceedings of the 26th ACM international conference on Multimedia
October 2018
2167 pages
ISBN: 9781450356657
DOI: 10.1145/3240508

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

  • Best Paper

Author Tags

  1. adversarial training
  2. image
  3. poetry generation

Qualifiers

  • Research-article

Conference

MM '18: ACM Multimedia Conference
October 22 - 26, 2018
Seoul, Republic of Korea

Acceptance Rates

MM '18 Paper Acceptance Rate 209 of 757 submissions, 28%;
Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
