Promptify: Text-to-Image Generation through Interactive Prompt Exploration with Large Language Models

Authors:

Mauricio Sousa,

Tovi Grossman

UIST '23: Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology

Article No.: 96, Pages 1 - 14

https://doi.org/10.1145/3586183.3606725

Published: 29 October 2023

Abstract

Text-to-image generative models have demonstrated remarkable capabilities in generating high-quality images based on textual prompts. However, crafting prompts that accurately capture the user’s creative intent remains challenging. It often involves laborious trial-and-error procedures to ensure that the model interprets the prompts in alignment with the user’s intention. To address these challenges, we present Promptify, an interactive system that supports prompt exploration and refinement for text-to-image generative models. Promptify utilizes a suggestion engine powered by large language models to help users quickly explore and craft diverse prompts. Our interface allows users to organize the generated images flexibly, and based on their preferences, Promptify suggests potential changes to the original prompt. This feedback loop enables users to iteratively refine their prompts and enhance desired features while avoiding unwanted ones. Our user study shows that Promptify effectively facilitates the text-to-image workflow, allowing users to create visually appealing images on their first attempt while requiring significantly less cognitive load than a widely-used baseline tool.
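
The feedback loop described in the abstract (the user sorts generated images by preference, and the system proposes changes to the prompt) can be pictured with a small sketch. This page does not describe the internals of the suggestion engine, so the following is not Promptify's implementation: it replaces the large language model with a plain set comparison over comma-separated prompt modifiers, and the names `split_modifiers` and `suggest_refinement` are hypothetical.

```python
from collections import Counter


def split_modifiers(prompt: str) -> list[str]:
    """Split a comma-separated prompt into normalized modifier phrases."""
    return [part.strip().lower() for part in prompt.split(",") if part.strip()]


def suggest_refinement(base_prompt: str, liked: list[str], disliked: list[str]) -> dict:
    """Hypothetical stand-in for an LLM-backed suggestion step.

    Modifiers that appear only in prompts of liked images are proposed as
    additions; modifiers that appear only in prompts of disliked images are
    proposed as things to avoid and are dropped from the base prompt.
    """
    liked_counts = Counter(m for p in liked for m in split_modifiers(p))
    disliked_counts = Counter(m for p in disliked for m in split_modifiers(p))
    base = split_modifiers(base_prompt)

    # Most frequent liked modifiers first; skip anything already in the base prompt.
    enhance = [m for m, _ in liked_counts.most_common()
               if m not in disliked_counts and m not in base]
    avoid = [m for m, _ in disliked_counts.most_common()
             if m not in liked_counts]

    kept = [m for m in base if m not in avoid]
    return {"prompt": ", ".join(kept + enhance), "avoid": avoid}


if __name__ == "__main__":
    suggestion = suggest_refinement(
        "a lighthouse at dusk, blurry",
        liked=["a lighthouse at dusk, oil painting",
               "a lighthouse at dusk, oil painting, warm light"],
        disliked=["a lighthouse at dusk, blurry, 3d render"],
    )
    print(suggestion["prompt"])
    print(suggestion["avoid"])
```

In a system like the one the abstract describes, the set comparison would presumably give way to a language-model call that receives the liked and disliked prompts as context; the sketch only shows the shape of the loop, in which preferences go in and a revised prompt plus a list of features to avoid come out.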

Information & Contributors

Information

Published In

UIST '23: Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology

October 2023

1825 pages

ISBN:9798400701320

DOI:10.1145/3586183

Editors:
Sean Follmer (Stanford University, USA),
Jeff Han,
Jürgen Steimle (Saarland University, Germany),
Nathalie Henry Riche (Microsoft Research, USA)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 October 2023

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Natural Sciences and Engineering Research Council of Canada (NSERC)

Conference

UIST '23

UIST '23: The 36th Annual ACM Symposium on User Interface Software and Technology

October 29 - November 1, 2023

San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 561 of 2,567 submissions, 22%

Bibliometrics & Citations

Article Metrics

Total Citations: 34
Total Downloads: 2,389
Downloads (Last 12 months): 2,052
Downloads (Last 6 weeks): 185

Reflects downloads up to 24 Dec 2024

Cited By

Yan Y, Hou Y, Xiao Y, Zhang R, Wang Q. (2025). KNowNEt: Guided Health Information Seeking from LLMs via Knowledge Graph Integration. IEEE Transactions on Visualization and Computer Graphics 31:1 (547-557). Online publication date: Jan-2025. https://doi.org/10.1109/TVCG.2024.3456364

Mustapha K. (2025). A survey of emerging applications of large language models for problems in mechanics, product design, and manufacturing. Advanced Engineering Informatics 64 (103066). Online publication date: Mar-2025. https://doi.org/10.1016/j.aei.2024.103066

Antony V, Huang C. (2024). ID.8: Co-Creating Visual Stories with Generative AI. ACM Transactions on Interactive Intelligent Systems 14:3 (1-29). Online publication date: 2-Aug-2024. https://dl.acm.org/doi/10.1145/3672277

Adnin R, Das M. (2024). "I look at it as the king of knowledge": How Blind People Use and Understand Generative AI Tools. Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility (1-14). Online publication date: 27-Oct-2024. https://dl.acm.org/doi/10.1145/3663548.3675631

Prosser E, Edwards M. (2024). Helpful or Harmful? Exploring the Efficacy of Large Language Models for Online Grooming Prevention. Proceedings of the 2024 European Interdisciplinary Cybersecurity Conference (1-10). Online publication date: 5-Jun-2024. https://dl.acm.org/doi/10.1145/3655693.3655694

Khanal N, Yu C, Chiu J, Chaudhary A, Zhang Z, Katija K, Forbes A. (2024). FathomGPT: A natural language interface for interactively exploring ocean science data. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology (1-15). Online publication date: 13-Oct-2024. https://dl.acm.org/doi/10.1145/3654777.3676462

Zhou M, Zhang D, You W, Yu Z, Wu Y, Pan C, Liu H, Lao T, Chen P. (2024). StyleFactory: Towards Better Style Alignment in Image Creation through Style-Strength-Based Control and Evaluation. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology (1-15). Online publication date: 13-Oct-2024. https://dl.acm.org/doi/10.1145/3654777.3676370

Chen L, Jing Q, Tsang Y, Wang Q, Liu R, Xia D, Zhou Y, Sun L. (2024). AutoSpark: Supporting Automobile Appearance Design Ideation with Kansei Engineering and Generative AI. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology (1-19). Online publication date: 13-Oct-2024. https://dl.acm.org/doi/10.1145/3654777.3676337

Rajaram S, Numan N, Kumaravel B, Marquardt N, Wilson A. (2024). BlendScape: Enabling End-User Customization of Video-Conferencing Environments through Generative AI. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology (1-19). Online publication date: 13-Oct-2024. https://dl.acm.org/doi/10.1145/3654777.3676326

Peng X, Koch J, Mackay W. (2024). DesignPrompt: Using Multimodal Interaction for Design Exploration with Generative AI. Proceedings of the 2024 ACM Designing Interactive Systems Conference (804-818). Online publication date: 1-Jul-2024. https://dl.acm.org/doi/10.1145/3643834.3661588
