Abstract
Neural networks have become one of the essential areas of Artificial Intelligence due to their extraordinary capacity to address problems across many domains. This capacity has driven the proposal of novel architectures and models for challenging tasks such as neural style transfer. We propose a novel methodology for bimodal style transfer that takes text as input. We first map an image and a short descriptive text into a common multimodal latent space, and then retrieve a new image using an image retrieval engine. Finally, a generative model combines content and style to create artistic images. The proposed system retrieves images semantically similar to a descriptive text (prompt), achieving high precision in image retrieval on the SemArt dataset, while the neural style transfer model preserves image quality when combining style and content.
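The retrieval step described above can be sketched as a nearest-neighbour search in a shared multimodal embedding space. The following minimal Python sketch assumes a CLIP-style encoder has already produced unit-length embeddings for the text prompt and a gallery of images; the random vectors below are stand-ins for real encoder outputs, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Stand-ins for encoder outputs: one prompt vector, N gallery image vectors
# mapped into the same latent space by a multimodal encoder.
prompt_embedding = l2_normalize(rng.standard_normal(512))
gallery_embeddings = l2_normalize(rng.standard_normal((100, 512)))

# On unit vectors, cosine similarity reduces to a dot product.
scores = gallery_embeddings @ prompt_embedding

# Indices of the top-k most semantically similar gallery images;
# the best match would then be passed to the style-transfer stage.
top_k = 5
ranked = np.argsort(scores)[::-1][:top_k]
print(ranked, scores[ranked])
```

In the paper's pipeline the top-ranked image supplies the content (or style) input to the generative model; the sketch only covers the retrieval engine's ranking logic.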
Acknowledgements
The authors acknowledge funding support from the Millennium Institute for Foundational Research on Data (IMFD ANID - Millennium Science Initiative Program - Code ICN17_002) and the National Center of Artificial Intelligence (CENIA FB210017, Basal ANID). Marcelo Mendoza was funded by the National Agency of Research and Development (ANID) grant FONDECYT 1200211. The funders played no role in the design of this study.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Gutiérrez, D., Mendoza, M. (2023). Bimodal Neural Style Transfer for Image Generation Based on Text Prompts. In: Rauterberg, M. (eds) Culture and Computing. HCII 2023. Lecture Notes in Computer Science, vol 14035. Springer, Cham. https://doi.org/10.1007/978-3-031-34732-0_29
DOI: https://doi.org/10.1007/978-3-031-34732-0_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34731-3
Online ISBN: 978-3-031-34732-0
eBook Packages: Computer Science, Computer Science (R0)