DOI: 10.1145/3586183.3606725
research-article

Promptify: Text-to-Image Generation through Interactive Prompt Exploration with Large Language Models

Published: 29 October 2023

Abstract

Text-to-image generative models have demonstrated remarkable capabilities in generating high-quality images based on textual prompts. However, crafting prompts that accurately capture the user’s creative intent remains challenging. It often involves laborious trial-and-error procedures to ensure that the model interprets the prompts in alignment with the user’s intention. To address these challenges, we present Promptify, an interactive system that supports prompt exploration and refinement for text-to-image generative models. Promptify utilizes a suggestion engine powered by large language models to help users quickly explore and craft diverse prompts. Our interface allows users to organize the generated images flexibly, and based on their preferences, Promptify suggests potential changes to the original prompt. This feedback loop enables users to iteratively refine their prompts, enhancing desired features while avoiding unwanted ones. Our user study shows that Promptify effectively facilitates the text-to-image workflow, allowing users to create visually appealing images on their first attempt while requiring significantly less cognitive load than a widely used baseline tool.
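The abstract describes a feedback loop in which users sort generated images by preference and the system proposes changes to the prompt. Promptify's own suggestion engine is LLM-based and its details are not given on this page, so the following is only a minimal, non-LLM sketch of the general idea. It assumes that prompts are comma-separated keyword lists and that every image the user keeps or rejects can be traced back to the prompt that produced it. The names `split_keywords`, `occurrence_rates`, and `suggest_edits`, and the rate-difference score, are hypothetical illustrations, not the authors' method.

```python
from collections import Counter


def split_keywords(prompt: str) -> list[str]:
    """Split a comma-separated prompt into lowercase keyword phrases."""
    return [k.strip().lower() for k in prompt.split(",") if k.strip()]


def occurrence_rates(prompts: list[str]) -> dict[str, float]:
    """Fraction of prompts in which each keyword appears (counted once per prompt)."""
    counts = Counter(k for p in prompts for k in set(split_keywords(p)))
    n = max(len(prompts), 1)  # avoid division by zero for an empty group
    return {k: c / n for k, c in counts.items()}


def suggest_edits(current_prompt: str, liked: list[str], disliked: list[str],
                  top_k: int = 3) -> tuple[list[str], list[str]]:
    """Hypothetical preference-driven prompt refinement (not Promptify's algorithm).

    liked / disliked: prompts behind the images the user grouped as wanted / unwanted.
    Score of a keyword = its rate among liked prompts minus its rate among disliked ones.
    Returns (keywords to add, keywords to remove), best candidates first.
    """
    liked_rate = occurrence_rates(liked)
    disliked_rate = occurrence_rates(disliked)
    vocab = set(liked_rate) | set(disliked_rate)
    score = {k: liked_rate.get(k, 0.0) - disliked_rate.get(k, 0.0) for k in vocab}

    # Deduplicate the current prompt's keywords while preserving their order.
    current = list(dict.fromkeys(split_keywords(current_prompt)))
    # Add: keywords tied to liked images that the prompt does not contain yet.
    add = sorted((k for k in vocab if score[k] > 0 and k not in current),
                 key=lambda k: (-score[k], k))[:top_k]
    # Remove: keywords already in the prompt that are tied to disliked images.
    remove = sorted((k for k in current if score.get(k, 0.0) < 0),
                    key=lambda k: (score[k], k))[:top_k]
    return add, remove


add, remove = suggest_edits(
    "a lighthouse at dusk, oil painting",
    liked=["a lighthouse at dusk, oil painting, dramatic lighting",
           "a lighthouse at dusk, dramatic lighting, wide angle"],
    disliked=["a lighthouse at dusk, oil painting, cartoon style"],
)
print(add, remove)  # ['dramatic lighting', 'wide angle'] ['oil painting']
```

Scoring by occurrence rate rather than raw counts keeps the suggestions from being dominated by whichever group the user happened to fill with more images. An LLM-based engine, as the abstract describes, could instead receive the liked and disliked prompts as context and propose rewordings rather than only keyword additions and removals.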

Supplemental Material

Supplemental file (ZIP)

Information

Published In

UIST '23: Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology
October 2023
1825 pages
ISBN: 9798400701320
DOI: 10.1145/3586183
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Large Language Models
  2. Prompt Engineering
  3. Text-to-Image

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Natural Sciences and Engineering Research Council of Canada (NSERC)

Conference

UIST '23

Acceptance Rates

Overall Acceptance Rate 561 of 2,567 submissions, 22%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 2,052
  • Downloads (last 6 weeks): 185
Reflects downloads up to 24 Dec 2024.

Citations

Cited By

  • (2025) KNowNEt: Guided Health Information Seeking from LLMs via Knowledge Graph Integration. IEEE Transactions on Visualization and Computer Graphics 31:1 (547-557). DOI: 10.1109/TVCG.2024.3456364. Online publication date: Jan-2025.
  • (2025) A survey of emerging applications of large language models for problems in mechanics, product design, and manufacturing. Advanced Engineering Informatics 64 (103066). DOI: 10.1016/j.aei.2024.103066. Online publication date: Mar-2025.
  • (2024) ID.8: Co-Creating Visual Stories with Generative AI. ACM Transactions on Interactive Intelligent Systems 14:3 (1-29). DOI: 10.1145/3672277. Online publication date: 2-Aug-2024.
  • (2024) "I look at it as the king of knowledge": How Blind People Use and Understand Generative AI Tools. Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility (1-14). DOI: 10.1145/3663548.3675631. Online publication date: 27-Oct-2024.
  • (2024) Helpful or Harmful? Exploring the Efficacy of Large Language Models for Online Grooming Prevention. Proceedings of the 2024 European Interdisciplinary Cybersecurity Conference (1-10). DOI: 10.1145/3655693.3655694. Online publication date: 5-Jun-2024.
  • (2024) FathomGPT: A natural language interface for interactively exploring ocean science data. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology (1-15). DOI: 10.1145/3654777.3676462. Online publication date: 13-Oct-2024.
  • (2024) StyleFactory: Towards Better Style Alignment in Image Creation through Style-Strength-Based Control and Evaluation. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology (1-15). DOI: 10.1145/3654777.3676370. Online publication date: 13-Oct-2024.
  • (2024) AutoSpark: Supporting Automobile Appearance Design Ideation with Kansei Engineering and Generative AI. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology (1-19). DOI: 10.1145/3654777.3676337. Online publication date: 13-Oct-2024.
  • (2024) BlendScape: Enabling End-User Customization of Video-Conferencing Environments through Generative AI. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology (1-19). DOI: 10.1145/3654777.3676326. Online publication date: 13-Oct-2024.
  • (2024) DesignPrompt: Using Multimodal Interaction for Design Exploration with Generative AI. Proceedings of the 2024 ACM Designing Interactive Systems Conference (804-818). DOI: 10.1145/3643834.3661588. Online publication date: 1-Jul-2024.