Abstract
Neural networks have become one of the essential areas of Artificial Intelligence due to their extraordinary capacity to address problems across many domains. This capacity has driven the proposal of novel architectures and models for challenging tasks such as neural style transfer. We propose a novel methodology for bimodal style transfer that takes text as input. We first map an image and a short descriptive text into a common multimodal latent space, and then retrieve a new image using an image retrieval engine. Finally, a generative model combines content and style to create artistic images. The proposed system retrieves images semantically similar to a descriptive text (prompt), achieving high precision in image retrieval on the SemArt dataset, while the neural style transfer model preserves image quality when combining style and content.
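The retrieval step described above can be sketched as a nearest-neighbour search in a shared multimodal embedding space. The following minimal Python sketch assumes a CLIP-style encoder has already produced unit-length embeddings for the text prompt and a gallery of images; the random vectors below are stand-ins for real encoder outputs, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Stand-ins for encoder outputs: one prompt vector, N gallery image vectors
# mapped into the same latent space by a multimodal encoder.
prompt_embedding = l2_normalize(rng.standard_normal(512))
gallery_embeddings = l2_normalize(rng.standard_normal((100, 512)))

# On unit vectors, cosine similarity reduces to a dot product.
scores = gallery_embeddings @ prompt_embedding

# Indices of the top-k most semantically similar gallery images;
# the best match would then be passed to the style-transfer stage.
top_k = 5
ranked = np.argsort(scores)[::-1][:top_k]
print(ranked, scores[ranked])
```

In the paper's pipeline the top-ranked image supplies the content (or style) input to the generative model; the sketch only covers the retrieval engine's ranking logic.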
Acknowledgements
The authors acknowledge funding support from the Millennium Institute for Foundational Research on Data (IMFD ANID - Millennium Science Initiative Program - Code ICN17_002) and the National Center of Artificial Intelligence (CENIA FB210017, Basal ANID). Marcelo Mendoza was funded by the National Agency of Research and Development (ANID) grant FONDECYT 1200211. The funders played no role in the design of this study.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Gutiérrez, D., Mendoza, M. (2023). Bimodal Neural Style Transfer for Image Generation Based on Text Prompts. In: Rauterberg, M. (eds) Culture and Computing. HCII 2023. Lecture Notes in Computer Science, vol 14035. Springer, Cham. https://doi.org/10.1007/978-3-031-34732-0_29
DOI: https://doi.org/10.1007/978-3-031-34732-0_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34731-3
Online ISBN: 978-3-031-34732-0
eBook Packages: Computer Science, Computer Science (R0)