1 Introduction
Text-to-Image (TTI) models like DALL•E [43] and Stable Diffusion [47] generate images from a combination of text prompts and numerical parameters (e.g., seeds). Interfaces for these models are seeing wide end-user adoption in a variety of settings, from marketing collateral to independent art making.
A critical skill for TTI users is navigating between prompt text inputs and image outputs. Building such an understanding is neither trivial nor straightforward [58, 59]: the spaces of possible inputs and outputs are massive, and the mapping of one to the other is highly opaque.
Current patterns in commercial interfaces present limited support for exploration: a text box for prompt input and an area for displaying and saving a few output images. Some offer a number of additional features to support prompt engineering: canned prompt ideas, including options to influence “style,” and sliders for manipulating hyperparameters.
This lack of explicit interface support has led to the creation of resources such as community-curated prompt books [38], spreadsheets [56], and tutorials [33, 49] that document exploration processes. Researchers have also begun to study prompting practices [4, 19, 25, 34, 35, 36] and to propose alternative interfaces for interacting with TTI models (e.g., Promptify [2]) and other types of generative models (e.g., GanZilla [10], Spacesheets [28], PromptAid [32]). These systems tend to support particular prompt-image or prompt-text workflows, helping users refine prompts towards a goal.
In this paper we argue that supporting TTI users goes beyond ensuring that they can achieve a particular end result; gaining an understanding of the mapping between input and output is core to successful creation with generative AI systems. Exploring the relationship between TTI inputs and outputs is a sensemaking process [41, 48] in which users aim to build mental representations that allow them to reliably navigate to desired outputs. While the input and output spaces are massive and opaque, they are not arbitrary: with experience and productive reflection, prompters can develop reliable exploration “targeting strategies.” Users can apply the knowledge gained from many iterative input-output tests to “prompt-craft,” steering generations towards desirable results.
Thus, our guiding research question is: How might new interfaces support users in sense-making for successful art making with such models?
To investigate this question, we built DreamSheets, a tool that enables TTI model users to compose targeted exploration systems within a spreadsheet interface. In DreamSheets, spreadsheet cells can contain prompts, or images generated from those prompts. A set of novel prompt manipulation functions enable users to explore prompt space computationally, through the construction and strategic combination of categorical lists, alternative wordings, embellishments, synonyms, and more. These functions are implemented through prompts to a large language model (LLM).
Spreadsheets may not readily provide the ideal affordances for organizing image collections; however, they are a highly flexible computational substrate for what-if exploration. By presenting image and text generation tools within a customizable sandbox, DreamSheets enables users to compose novel creative workflows.
We investigated users’ exploration strategies in two studies with DreamSheets: a 1-hour lab study with 12 primarily amateur participants, and a two-week extended study with five expert TTI artists. In these studies, we examined how both groups (1) develop intuition for prompt designs that yield specific outputs, and (2) use DreamSheets’ affordances for computational prompt manipulation, workflow creation, and output evaluation.
Our primary insights lie in observing and analyzing how participants used DreamSheets to develop custom TTI sheet-systems, and in identifying sense-making and exploration patterns across these generative prototypes, including the construction and iterative reuse of composable exploration “structures” that map to generalizable UI concepts. We then use these insights to create UI mockups that inform potential future interfaces, and report on the feedback and speculation they elicited from our participants.
This paper makes three contributions:
First, it describes DreamSheets, a spreadsheet-based TTI design space exploration platform that enables user-defined computational and LLM-supported interactions over the joint design space of prompts, seeds, and other TTI model hyperparameters.
Second, it offers a rich description of a first-of-its-kind extended (2-week) study exploring the ways in which artists use computational structures for sense-making and to explore the design space of TTI models.
Finally, it presents a set of generalized UI design suggestions, co-designed in a visual UI mock-up with our expert artist participants, and informed by the sheet-systems they developed while working in DreamSheets.
2 Related Work
Our work builds on research showing that considering many alternatives in parallel effectively aids design space exploration [15, 44], such as through gallery interfaces [31], exploration history tracking [16, 21], suggestions for possible input shifts [31], and effective organization [29, 46, 55]. Spreadsheets enable many of these abilities that common TTI interfaces often lack.
In this section, we draw explicit connections to Creativity Support Tools, prompting and other TTI model workflows, design space exploration of images in non-TTI contexts, and sensemaking.
2.1 Creativity Support
In 2007, Shneiderman identified four underlying design principles for creativity support tools (CSTs): support exploratory search, enable collaboration, provide rich history-keeping, and design with low thresholds, high ceilings, and wide walls [51]. A more recent body of research explores how CST design can aid users’ creative processes and productivity [6, 11]. Spreadsheets are themselves an example of a tool that supports creative exploration, enabling users to separate fixed values from values they want to vary, affording effective exploration and evaluation of “what-if” scenarios [45, 51].
2.2 Prompting & Text-to-Image Model Workflows
At the surface, prompting can appear straightforward, but crafting effective prompts is a challenge, even for experts [26, 58, 59]. How a prompt directly impacts model outputs is an active area of research [24, 50]. Choosing the right language to achieve desirable visual results in these prompt-based interactions can be difficult, motivating online user communities [9, 42] and researchers to develop and investigate new prompting techniques [26, 27] and tools supporting prompt discovery and exploration [2, 13]. These tools tend to be goal-driven, helping artists target and improve their generations towards a particular image “goal,” while perhaps considering alternatives. In contrast, DreamSheets seeks to explicitly support the rapid construction of flexible structures towards various user-defined goals. A growing body of literature has also begun to explore how communities of practice are approaching prompting [4, 35, 36, 37]. These studies offer taxonomies of prompt structures, showing how artists use different prompt modifiers [36], how they engage in the process of prompt engineering [37], and how they consider the material properties of text prompts [4]; this literature will rightfully continue to grow as these practices evolve alongside the models they rely on.
DreamSheets builds upon this prior work by enabling prompt-artists to rapidly prototype computational structures and pursue user-defined, targeted explorations across prompt-image space—allowing us, in turn, to learn how artists use such new capabilities to build structures for sense-making and art-making.
2.3 Design Space Exploration of Images
Prompt discovery tools follow a long line of research into design space exploration of images. In spaces like computer graphics and animation, where visual judgment of a human-designer-produced artifact is the primary evaluation mode, prior work has often focused on browsing interfaces, such as Marks et al.’s seminal Design Galleries [31]. Interaction techniques for browsing include multi-step galleries [31], map metaphors [53], and faceted browsing [14]. Narrowing down from the explored designs, users may wish to pursue multiple alternative options for deeper exploration, though typically many orders of magnitude fewer than the number of algorithmically explored designs, as in GEM-NI [57].
Spreadsheets’ usefulness for visual design space exploration stems in part from their intrinsic 2D matrix layout enabling “small multiples,” a term Edward Tufte popularized [54] as an answer to the question “compared to what?” In a “contact sheet”-like 2D matrix, one can readily compare many images at once and quickly identify the best candidates. Spreadsheets have a rich history of serving as vehicles for exploratory work, in accounting and far beyond, used as early as 1994 for information visualization of data and images themselves [5, 22], including images generated from a numerical input space [28].
2.4 Sense-making
A number of our observations relate to the broader sense-making literature, including Pirolli and Card’s seminal work on information foraging [40] (DreamSheets offers users an information scent on prompts) and sense-making more broadly [41, 48]. This line of work models how users navigate and make decisions in information-rich environments (like DreamSheets), balancing the perceived cost of seeking information against the potential reward of finding what they’re seeking. DreamSheets’s design draws upon the free energy principle from cognitive science [12], which describes how the brain reduces uncertainty by making predictions and updating an internal mental model accordingly, generatively optimizing its internal model with sensory input to enhance prediction accuracy. This principle formed a basis for Davis et al.’s Creative Sense-Making (CSM) framework [8], which they applied to human-AI co-creation in the collaborative drawing domain.
DreamSheets’s design also draws inspiration and lessons from existing sense-making interfaces, including classics like Scatter/Gather [7] and more modern implementations like Sensecape [52]. SemanticCollage [18] and ImageSense [17] provided reusable, system-generated text and visuals to support creative image search and sense-making with “reflection in action.”
3 Prompt & Image Exploration with DreamSheets
DreamSheets leverages the inherently flexible spreadsheet model to support iteration and exploration of the TTI generation prompt input space. The features of DreamSheets are embedded within a spreadsheet (built on Google Sheets) that recomputes and re-renders images in response to prompt additions and changes, allowing (for instance) drag-based “autofill” of columns, rows, or 2D regions with formula-generated prompts and images based on those prompts, alongside other common spreadsheet functionality. Specifically, DreamSheets offers access to diffusion model image generation as a spreadsheet function that can take the content of other cells in the sheet as input, including combinations or transformations of multiple cells. These features let users explore efficiently, observing how generated outputs are influenced by modifications to the input.
Our prototype also includes a set of LLM-based spreadsheet functions for manipulating prompts directly, such as gpt_list and list_completion for generating or extending a list of items of a certain description, embellish to create a detailed variation of the input text, and alternatives to generate multiple variations of a seed prompt (see Table 1 for a full list). These functions aid users in constructing axes along which to explore the prompt/image design space, supporting template prompts with automatic value insertion.
For example, a template prompt like “A <facial expression> woman” can be expanded into a column of different prompts by substituting values generated by gpt_list("facial expressions").
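For illustration, a minimal sheet layout for this template might look as follows (the cell addresses and the example seed and cfg values are ours, not prescriptive; the formulas follow the functions described above):

A1: =gpt_list("facial expressions")   → populates a column with, e.g., “smiling”, “pensive”, ...
B1: ="A " & A1 & " woman"             → concatenation, autofilled down column B
C1: =image(tti(B1, 1234, 7))          → one image per prompt, with a fixed example seed (1234) and cfg (7)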
3.1 DreamSheets Design
Our primary goal here was to enable the use of a computational substrate for TTI model design space exploration via spreadsheet formula construction.
Drawing on prior work in prompt design [25, 34, 36, 38], we identified the testing of alternative phrasings and the addition of detail as core activities in TTI prompt exploration. These activities help users explore neighboring points in design space and recognize fruitful directions for further prompt explorations. We learned from prior evaluations of prompt engineering in the TTI context [37] that users were likely to express a diversity of design patterns and that supporting flexibility, providing immediate visual feedback, and offering an extended period of familiarization would be critical.
We operationalized support for these exploratory activities as the alternatives, divergents, and embellish functions. Similarly, synonym and antonym generation are core NLP building blocks, useful for creating variation that targets specific words in a longer prompt; we integrated these capabilities through the synonyms and antonyms functions (see Table 1 for a full list). These operations formed a foundation from which users could build their own custom workflows and strategic approaches to discovering model capabilities and exploring the underlying prompt design space.
To use these concepts in a spreadsheet paradigm and support the generation of sets of images, we designed these functions to output lists of values that populate across a column (or row) of cells. The terms in these cells can be referenced in traditional spreadsheet style and concatenated with other values to form combined prompts. We also provided functions to extend lists of prompts or prompt parts, allowing users to build on a conceptual list by providing a few initial examples.
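As a hedged illustration of list extension (the exact argument form here is our assumption), a user might seed a conceptual list manually and ask the system to continue it:

A1: dog
A2: cat
A3: frog
A4: =list_completion(A1:A3)   → populates further cells downward with similar items (e.g., “horse”, “deer”)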
3.2 DreamSheets Implementation
We explored various service options for DreamSheets’s underlying spreadsheet functionality, including building our own spreadsheet interface from scratch, the open-source spreadsheets HandsOnTable and LuckySheet, and both Microsoft Excel and Google Sheets.
One major challenge in integrating with an existing spreadsheet is the relatively long latency of image generation itself: up to 15 seconds or more, even when using cloud APIs. Spreadsheet users are accustomed to rapid updates and recomputations in response to changes in cell values; a multi-minute delay resulting from a backlog of image prompt updates and, consequently, new image generations would be unacceptably slow. This need drove our use of the Stability.ai API, which supports parallel image generation requests with the Stable Diffusion 2 model and offers sub-15-second response times. This critically enabled the full-scale “small multiples” visualizations of results that we wanted users to have for viewing and evaluating results across multiple input axes simultaneously.
Ultimately, we selected Google Sheets as the spreadsheet interface, as it is easily extensible and accessible to most people. Google Sheets’ Apps Script environment lets developers create add-ons in a JavaScript-like environment, has a sufficiently long timeout (30 seconds) for custom functions, and allows users to continue to edit the sheet even while our custom formulas, which required back-end calls to TTIs and LLMs, awaited responses.
As a side benefit, because Google Sheets is already an online-native platform, rapid collaboration and version history are built-in.
We implemented DreamSheets as a Google Sheets Apps Script add-on and a proxy web server written in JavaScript using ExpressJS. The add-on provides the custom functions described in Table 1, making the corresponding requests to the proxy web server, which handles caching and calls the appropriate API (Stability.ai or OpenAI). Figure 2 illustrates how the proxy server facilitates communication between the Google Sheets add-on and Stability.ai or OpenAI. For the tti function, the proxy server computes a hash from the combination of prompt, seed, and guidance values and checks whether the image has been generated before. Otherwise, an API call is made to Stability.ai to generate a 512 × 512-pixel image, which is then cached in the file system for easy retrieval in the future.
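A minimal sketch of this caching logic in JavaScript follows; stabilityGenerate stands in for the actual Stability.ai request, and the cache directory is illustrative:

const crypto = require("crypto");
const fs = require("fs");

// Hash the (prompt, seed, guidance) tuple to form a stable cache key.
function cacheKey(prompt, seed, guidance) {
  return crypto.createHash("sha256")
    .update(`${prompt}|${seed}|${guidance}`)
    .digest("hex");
}

async function getImage(prompt, seed, guidance) {
  const file = `cache/${cacheKey(prompt, seed, guidance)}.png`;
  if (fs.existsSync(file)) return file;                          // generated before: reuse cached image
  const png = await stabilityGenerate(prompt, seed, guidance);   // 512 × 512 request (stand-in for the API call)
  fs.writeFileSync(file, png);                                   // cache in the file system for future retrieval
  return file;
}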
The LLM-based functions that return a list utilize OpenAI’s ChatGPT with gpt-3.5-turbo. To ensure that ChatGPT returns a properly formatted list with the appropriate length, it is initialized with the following messages:
system: Respond with a Javascript array literal with the given length in parentheses
user: types of animals (length: 5)
assistant: ["dog", "cat", "frog", "horse", "deer"]
user: [PROMPT] (length: [LENGTH])
The implementation for each LLM-based function differs only in the prompt sent to the LLM proxy server: each function prepends different additional instructions to the user’s inputted prompt. The complete list of full prompts sent to ChatGPT is:
list_completion: Similar items to this list without repeating "[LIST]"
synonyms: Synonyms of "[USER INPUT]"
antonyms: Antonyms of "[USER INPUT]"
divergents: Divergent words to "[USER INPUT]"
alternatives: Alternative ways to say "[USER INPUT]"
embellish: Embellish this sentence: [USER INPUT]
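Putting these pieces together, one request might be assembled roughly as sketched below (the wrapper function is ours; the message framing follows the initialization above, and the returned array literal can be parsed directly):

// Build the chat messages for one LLM-based function call.
function buildMessages(fullPrompt, length) {
  return [
    { role: "system", content: "Respond with a Javascript array literal with the given length in parentheses" },
    { role: "user", content: "types of animals (length: 5)" },
    { role: "assistant", content: '["dog", "cat", "frog", "horse", "deer"]' },
    { role: "user", content: `${fullPrompt} (length: ${length})` },
  ];
}

// e.g., for synonyms("red") filling six cells:
// buildMessages('Synonyms of "red"', 6)
// The response text is then parsed (e.g., with JSON.parse) into the list of cell values.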
3.3 Prototyping Interactions with Spreadsheets
In addition to serving as a platform for TTI exploration and sense-making, DreamSheets is a demonstration of a kind of creativity support tool for composable prototyping that can be built on top of existing spreadsheet software as a computational substrate. Spreadsheets are an excellent foundation for tools that might benefit from 2D structure, formula construction, and an already-familiar user base, and they can be straightforward to extend if there is alignment between the spreadsheet and backend data models. Here we describe how lessons from DreamSheets can be used in the construction of other TTI-based CSTs.
First, in designing a system with a spreadsheet substrate, a number of considerations arise, grounded in how much and what kind of customization is needed. Is it sufficient, for example, to simply add new spreadsheet formula functions—or are more substantial changes needed to the interaction layers? Will custom data types need to be supported, and how will they be rendered into cells? How will users interact with these different data types—are new input mechanisms (i.e., beyond text-in-cell input) needed too, such as file uploads or image editing? What response latency is acceptable given the tasks users are expected to engage in? And, finally, how critical are history-keeping and real-time multiplayer functionality?
These considerations constrain which underlying spreadsheets can be used. For DreamSheets, we found that Google Sheets’ API support was sufficient: custom functions were easy to add, and meeting the required maximum latency (30s) was possible by parallelizing requests to our backend. One major drawback was that custom functions could not cause cells to render images—only the built-in =image(image-url) formula can do that—forcing the rather clunky =image(tti(prompt, seed, cfg)) construction.
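As an illustrative sketch (PROXY_URL and the response shape are our assumptions), the Apps Script side of the tti custom function reduces to a fetch against the proxy that returns an image URL for the built-in =image() to render:

// Custom function: returns a URL, which the built-in =image() then renders in the cell.
function tti(prompt, seed, cfg) {
  const response = UrlFetchApp.fetch(PROXY_URL + "/tti", {   // PROXY_URL: hypothetical proxy endpoint
    method: "post",
    contentType: "application/json",
    payload: JSON.stringify({ prompt: prompt, seed: seed, cfg: cfg }),
  });
  return JSON.parse(response.getContentText()).imageUrl;      // assumed response shape
}

Used in a cell as =image(tti("a red fox", 1234, 7)).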
In contexts where interaction layer changes are needed, open-source spreadsheets such as HandsOnTable or LuckySheet offer more expansive opportunities for customization. Both options support extending spreadsheets with additional content types for cells as well as supporting additional functionality for user input and manipulation of images, poses, embeddings, and other data types relevant to TTI models. For DreamSheets, this flexibility would have allowed us to build a =tti function that displays an image directly, without needing an intermediate =image function call—as well as integrate image and other types of inputs.
Custom data types can pose challenges too: for opaque (non-user-interpretable) data types, like embeddings, researchers should consider how to represent those values in a sheet itself. In some cases, simply storing the data on a server and sending a unique ID (mapped to each value on said server) would be preferable to passing blob data around a spreadsheet’s cells. Similarly, for compute-intensive operations like image synthesis, caching is critical: spreadsheets re-render frequently and unpredictably, and generating an entire sheet’s worth of images can be both expensive and time-consuming.
In DreamSheets, we cached images on the server and passed around a unique ID in the browser that represents a specific (prompt, seed, cfg) tuple. This approach could be generalized to support many other types of inputs and outputs, including vectors, embeddings, tokens, and other kinds of state, as well as intermediating models (LoRAs, ControlNet, etc.). Storing data on the server and referencing it by ID reduces client-side overhead in the spreadsheet (with potential cost benefits from caching across users) and makes caching of those values straightforward, but requires an additional translation layer between sheet and backend.
4 Method
Having built DreamSheets, we first ran a preliminary 1-hour lab study with novice users to understand how users approach using DreamSheets for TTI exploration; this revealed that DreamSheets enables a variety of custom workflows, but that sensemaking was nearly always among the first activities participants engaged in.
Following our lab study, we ran a second, 2-week extended study with expert users. This second study was intended explicitly to better understand the kinds of custom workflows experts would build for sensemaking, given enough time, and what kinds of individual activities those sensemaking workflows consisted of.
4.1 Preliminary Lab Study
In this initial study, we observed how participants used DreamSheets to define and use a text-to-image generation workflow; we gave participants a concrete task and training in the tool, but did not direct them beyond that.
4.1.1 Participants.
We recruited 12 participants via email lists and social media. All 12 reported some spreadsheet experience, with most (10 out of 12) reporting frequent use (many times or daily). 10 out of 12 participants also had some experience with TTI models, and only 1 participant (P1) reported no prior experience with LLMs.
4.1.2 Task and Protocol.
Each study took place through a Zoom call that lasted approximately 60 minutes, during which participants and the facilitating researcher collaborated in a shared Google Sheets document. We designed a concept art creation task to give users a direction achievable in the short amount of time provided, while leaving room for subjectivity and creativity: users were given a single inspirational image, then asked to generate three new images that could fit the style and unspecified narrative suggested by the inspiration image. We included a prompt explaining the task in the activity sheet, as well as an image of a post-apocalyptic, ruined Seattle, complete with Space Needle.
Our protocol began with a brief tutorial on DreamSheets and its functionalities, followed by an observation of participants as they engaged in the concept art task. We used an example sheet to walk participants through a tutorial, first reminding them of general spreadsheet operations (i.e., using formulas with cell references and expanding them with autofill) and then introducing DreamSheets’s image and text generation functions. Once users were comfortable using the TTI and GPT functions, we introduced the concept art activity. Participants were encouraged to think aloud as they generated the three images required to complete the task.
4.1.3 Data Collection and Analysis.
We observed, recorded, and transcribed video and audio of each interview in its entirety, including approximately 40 minutes of system use, throughout which participants were encouraged to think aloud and provide clarification when prompted. We then engaged in an exploratory qualitative data analysis using the recordings, transcripts, and resulting spreadsheet artifacts. We recorded responses to surveys completed before and after the interview, and reviewed usage data containing logs of each text or image generation function call used in DreamSheets.
4.1.4 Preliminary Study Results: DreamSheets in use.
Here, we provide an overview of some results specific to the preliminary lab study that informed our second, extended study. We discuss usage patterns and themes informed by both studies in Section 5.
More than half of the participants in this study (7 out of 12) reported limited to no experience with TTI systems, but all participants were able to successfully utilize a prompt-crafting workflow in the DreamSheets system and produce generations that they were satisfied with for the concept art task. Though authored directly by participants, the workflows adopted by more novice participants were likely inspired by the example structures showcased during the initial tutorial phase, with P3 and P4 copying from the tutorial examples directly.
The seven participants with limited prompt-engineering experience (P1, P2, P3, P4, P6, P8, P10) wrote their prompts in “English,” ranging from brief sentence fragments to detailed scene descriptions. Meanwhile, 4 of the 5 participants who reported substantial or extensive TTI experience (P5, P7, P9, P11) wrote in a structure specific to “prompt language”: comma-separated lists of terms, including modifiers to influence visual style.
Both novice and experienced participants found the LLM-based functions useful for creating or improving their prompts. Participants with more spreadsheet experience more readily adopted string concatenation strategies to construct prompts. By using LLM-based functions to generate a series of words in a particular category, like gpt_list("camera angles") or synonyms("red"), participants introduced semantic modification “slots” into their prompts. We describe this LLM-assisted dynamic prompt construction as a prompt-space exploration strategy in Section 5.3.
This formative study confirmed DreamSheets’s usefulness as an exploratory TTI system and illuminated promising usage patterns, as we observed even novice participants begin developing various strategies and structures to support their task completion goals in the 40 minutes provided. However, the brevity and constraints (i.e., the prescribed creative task) of this first-use study format meant that users did not fully leverage DreamSheets to develop strategies for user-defined, creatively motivated goals. This motivated a longer-term expert study to observe how generative artists might utilize DreamSheets to build systems for “real-world” creative workflows.
4.2 Extended Expert Study Design
We turned to an extended study to observe the custom sheet-systems that experts would create when given the time and flexibility to pursue authentic creative explorations.
4.2.1 Participants.
To recruit experts for our second user study, we sent recruitment messages to individuals publicly participating in generative art communities on social media, and recruited 5 individuals (designated E1-E5 to differentiate them from Study 1 participants).
4.2.2 Protocol.
We conducted three 45-minute interviews spanning 2 weeks with each participant. Participants were instructed to use the tool for about 7-10 hours over the course of the 2-week study. We suggested 30-45 minutes of tool use per day, but participants were given the freedom to decide the length and structure of their work sessions. As with the first study, the initial interview began with a short tutorial reviewing spreadsheet functionality and demonstrating DreamSheets functions. The collaborative spreadsheet shared with each participant included documentation and examples of DreamSheets function use. Participants could contact the research team via email with questions throughout the study.
The second interview took place 1 week into the study. We asked participants to explain their exploration goals and strategies, and to use relevant parts of their spreadsheets to illustrate. We described back to participants our observations, allowing them to clarify any potential misinterpretations of their actions. Based on the feedback we received, we designed a UI mockup that incorporated elements inspired by the structures built and functions used by participants during their first week of using DreamSheets.
In the third and final 45-minute interview, we again asked participants to describe the creative explorations evident in their spreadsheets, and to explain how they integrated DreamSheets’s functionalities into their creative process. We then showed participants the UI mockup to gain their perspective and elicit further feedback and suggestions for designing more supportive TTI interfaces.
4.2.3 Data Collection and Analysis.
We analyzed our participants’ usage of DreamSheets as observed or described during interviews, as well as the resulting artifacts: the sheets and usage logs, containing the full chronology of function calls made to the DreamSheets system. We periodically viewed their spreadsheets throughout the 2-week study, including their Version History, a detailed record of changes made to the sheet, which we used to recover and save copies of previous versions. We engaged in a rigorous data coding and analysis process by stepping through the version history of each expert participant’s Google Sheets document and leveraging usage data logs to build data visualizations of each “exploration session.” This included labeling high-level screenshots of each sheet to note structures (e.g., exploratory axes like “different seed in each column”), clustering semantically similar prompts, and modeling the development of prompts over time, denoting both LLM-generated and manually authored iterative prompt modifications similar to those taxonomically defined in [36].
5 Findings: Structures for Exploring Prompt-input-to-image-output Space
Across both studies, participants’ use of DreamSheets and the artifacts they produced allowed us to observe and identify key elements of their TTI creative workflow: their goals, the strategies they chose to pursue, and the interface structures they constructed to support these strategies.
Our expert participants constructed a number of sophisticated spreadsheet systems and improved them throughout the study. The design of these systems was shaped by their particular exploration strategies for navigating TTI space; we highlight some of the patterns that emerged in the findings below.
In this section, we abstract the hyperdimensional prompt-input and visual-output spaces as two dimensions that the generative model maps between. The hyperparameter input space (2-dimensional in DreamSheets, using seed and classifier-free guidance, or cfg) is shown, when relevant, as a smaller, transformative dimension that lies between the two hyper-dimensions and influences the mappings between them. We use these abstractions to illustrate each exploration strategy, draw attention to the particular space(s) TTI users are making sense of during their exploration, and show structures prototyped by DreamSheets participants that exemplify that strategy.
The overall trend was for participants to start by recreating the prompt templating activities available in other tools, but then build on top of these by selecting and combining axes of exploration, which can operate in both linear and non-linear ways. Participants then constructed 2D “small multiples”-style grids, iteratively layering axes for increasingly sophisticated explorations of prompt-image space: towards targeted areas of image space and, simultaneously, towards sense-making exploratory goals, observing and understanding capabilities and interactions. For example: what artists and styles can this model reproduce? What subjects (e.g., animals) and attributes (e.g., colors, facial expressions) might interact to yield interesting results?
5.1 Iterative Prompt Exploration
Iterative prompt refinement, where participants gradually refine a prompt while testing the effects of each addition, is a fundamental TTI exploration strategy possible in any interface. However, many interfaces provide only a few (<10) results and offer limited support for comparing them, displaying only the results of one experiment at a time, or only allowing users to visually compare the results of chronologically adjacent experiments, as in the sequential “chat” history interface model used by Midjourney or Dreamstudio.
Limited support for user-structured comparisons makes it difficult to evaluate the impact of prompt modifiers and confounds effective sense-making; what the user may perceive as a semantically close edit in prompt space can translate to a confoundingly large visual transformation, and vice versa (see Figure 6).
Our expert participants described previously using external tools to save and evaluate TTI results history: saving favorite results and prompt modifiers into a spreadsheet (E1, E5) or word document (E2, E4) with notes on the expected impact of each token, for example.
Spreadsheets inherently afford rich, reconfigurable, and structured history-keeping and results evaluation. DreamSheets users leveraged the “infinite canvas” qualities of digital spreadsheets to keep and evaluate in situ records of their exploration history. They also leveraged the inherent reconfigurability of results within DreamSheets; all participants used duplication (copying and pasting groups of cells, or entire sheets) to repurpose and iterate on prior explorations (including novices, who duplicated and adapted liberally from the tutorial structures).
To effectively steer prompts towards desirable outputs, users benefit from reconfigurable history structures. Concurrently, DreamSheets users found it beneficial to generate and view larger samples of results simultaneously. All participants tried organizing generated outputs in a “small multiples” or “contact sheet” layout (as described by E1); 3 of the 5 experts (E1, E4, E5) explicitly remarked on its usefulness for large-scale results evaluation. Calls to the tti function comprised the bulk of participants’ use of DreamSheets, as shown in Figure 5; participants generated an average of 7,925 unique images each over the course of the study.
To generate these larger samples of results, participants used variation strategies to efficiently and systematically generate many variations of a single prompt idea. These variation strategies took the form of Parametric Manipulations and Semantic Manipulations as described in the following subsections.
5.2 Parametric Manipulation
While prompt-crafting is central to the effective use of TTI models, the ability to quickly manipulate hyperparameters alone provides a useful dimension for exploration; this motivated participants to develop structures around hyperparameter control.
All expert participants used dynamic references to a column or row containing a series of hyperparameter values to prototype a “slider”-like evaluation structure; see Figure 8 for examples using cfg.
Three of the five expert participants (E2, E3, E5) used “Power Cells” to prototype the functionality of a global “Settings” panel with the option to regenerate on update. By structuring sheets such that all generations reference a seed or cfg value from a particular cell, updating this “Power Cell” would regenerate the entire sheet of results. This afforded iteratively testing values on a large sample of results to find a desirable “setting.” See E5’s “Same Seed Prompt Explore” in Figure 7 B for an example.
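A minimal sketch of the pattern (the cell addresses are our own, hypothetical choices): pin the seed in one absolutely referenced cell so that every generation formula reads from it, for example:

B1: 7935                          → the “Power Cell” holding the current seed
D4: =image(tti(C4, $B$1, 7))      → every image formula references $B$1; editing B1 regenerates the whole sheet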
DreamSheets provides two parameters that influence image generation, a stochastic hyperparameter (seed) and a non-stochastic hyperparameter (classifier-free guidance, or cfg), which participants used to transform their explorations in different ways.
5.2.1 Stochastic Transformations.
Seeds define the specific random noise that the diffusion model will use as a starting point; the model then repeatedly “de-noises” successive versions to generate an image, with the text prompt as a guide.
Generating many images using the same prompt but different seeds was a common strategy across participants in both studies.
All 5 expert participants utilized seed variations to quickly evaluate many “versions” of the same prompt. As shown in Figure 7 A, a series of seed values allows the user to reveal a larger area of image-output space with each prompt. This improves efficiency towards creative goals (increasing the likelihood of finding a desirable output) and in sense-making (revealing larger samples of output space with each input test).
There is no perceptual correlation between adjacent seeds, but images generated with the same seed may share visual similarities with the original noise pattern. A seed can then be useful for biasing generations towards a particular composition or color pattern, motivating targeted explorations of hyperparameter space. E5 prototyped several versions of a seed-exploration structure, including “Same Seed Prompt Explore,” shown in Figure 7 B. With a “vector graphics” design goal, they used this structure to identify seeds that could bias the image generation to feature “a central object” on a flat background. E5 found seed 7935 and used this value in future “vector graphics” style explorations (see Figure 11 B).
5.2.2 Non-Stochastic Parametric Transformations.
E2 and E5 were particularly interested in exploring different cfg values. This hyperparameter has a perceptually linear influence on image generation: a higher cfg value generates images more strongly influenced by the prompt. Low cfg values allowed E5 to gain a sense of “the model’s priors,” borrowing a phrase commonly used in machine learning to refer to bias or preexisting (“prior”) beliefs. E5, E4, and E1 alluded to the “default style” latent in a specific image generation model as being important for prompt-artists to learn.
Non-stochastic hyperparameters that offer more “controlled” transformation are useful for exploration “depth” (i.e., repeated image refinement), a capability that DreamSheets lacks support for and that systems like ControlNet [60] cater to.
5.3 Semantic Explorations
“Does Stable Diffusion know the same artists I do?” (P11)
Participants manipulated language to make movements in prompt-space that would, ideally, translate into movements towards interesting areas of image-space. The spreadsheet interface provided a familiar structure for 2D evaluations; evaluating the combinatorial effect of two exploratory “axes” at a time (e.g. “subject” columns by “art-style” rows) was a common strategy. P6, a novice user in the preliminary study with limited prompting experience, said:
“I think it’s good to know how things change depending on different variables... the spreadsheet helps with navigating what exactly is changing within the image.” (P6)
5.3.1 Manual Semantic Exploration.
To streamline iterative prompt explorations, participants constructed dynamic “prompt templates” that combine “base prompts” with swappable “slots.” Our expert participants echoed the findings of [4], treating these carefully crafted “prompt templates” as art pieces in themselves.
In Figure 9, E1 combined simple “base prompts” with several “Internet aesthetic” words to evaluate their efficacy as prompt modifiers, checking whether the model would interpret each “aesthetic” in alignment with their expectations. Participants used manually constructed series (as opposed to the list-generating LLM functions) to conduct specifically targeted explorations. E4 prioritized total creative-writing control, using TTI to craft comedic prompt-image pairs and narrative concept art; they chose to manually craft most prompts without LLM assistance.
5.3.2 Generative Semantic Exploration.
Participants employed DreamSheets’s LLM-based functions and spreadsheet concatenation to build sheet-systems that streamline the discovery of useful points in prompt-space. By crafting dynamic prompt templates that reference LLM-generated lists, participants selected semantic “axes” to define a “prompt space” for LLM-assisted exploration. Four of the five experts (all except E4) utilized cell concatenation to craft dynamic prompt templates with LLM-generated prompt parts, though all expert participants used the LLM functions to some extent. See Figures 4 and 5 for counts of how frequently participants utilized each of the LLM functions across both studies.
E2 and E3 independently developed two-part interfaces that separate the design process into prompt-authoring and image-evaluation steps. The “lists” section of the interface houses the semantic-axes selection and sampling process: here, the user selects categories for text generation, like “lighting techniques” or “mythical creatures,” and decides how to combine them (often with row-wise, comma-separated concatenation). In a separate section, the concatenated prompts are used to generate a column of image results. After constructing these sheet-prototypes, E2 and E3 reused them extensively for many explorations.
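For illustration, the “lists” section of such a sheet-system might combine two LLM-generated axes row-wise (the categories echo the examples above; the cell addresses are hypothetical):

A1: =gpt_list("lighting techniques")   → fills column A
B1: =gpt_list("mythical creatures")    → fills column B
C1: =A1 & ", " & B1                    → comma-separated concatenation, autofilled down
The image-evaluation section then references column C, e.g., =image(tti(C1, 1234, 7)).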
For 3 of 5 expert participants (E1, E2, E5), the list generation functions gpt_list and list_completion comprised more than half of their LLM function use. E3 also used these list generation functions many times (in fact, more than any other participant), but their use of DreamSheets stood out overall: E3 made 9,268 LLM function calls, 7.6 times more than the next most frequent user of the LLM functions.
E1 used text-generation to discover interesting new prompt modifiers, including “monochromatic,” “watercolor”, and “pixel art.”
“I didn’t set out to make pixel art or watercolors, but through the course of the study I discovered these aesthetic spaces that I really loved!” (E1)
LLM functions can accelerate prompt-space traversal while supporting creative recognition over recall: users can choose camera angle as a “slider” to explore, then recognize an appealing “setting” in the generated results, without having to know or recall the words to describe it [33].
5.4 Flexible Scaffolding for User-Structured Multidimensional Explorations
Users approach TTI generation systems with a wide variety of creative goals, as showcased by the diverse images generated by our participants. Participants iteratively combined multiple, multidimensional exploration strategies to prototype bespoke sheet-systems for targeted explorations of prompt-image space.
Figure 11 A shows some of E1’s Expressions Exploration, combining an LLM-generated list of facial expressions, stochastic transformations, and a manually authored list of subjects (man, girl, dog).
Figure 11 B shows a segment of E5’s targeted, multidimensional “vector graphics” exploration, combining manual and LLM-generated semantic axes with a global “Power Cell” for iteratively regenerating the sheet with different “flat-color-biased” seed values, identified via targeted explorations as in Figure 7 B.
Over the course of the study, E5 generated 4,693 such “vector graphics” style images across two large-scale exploration sheets; E3 made 11,562 unique “animal-plant photography” generations with the system shown in Figure 10. Other participants pursued a variety of diverse creative focuses throughout the study, but all of these prompt artists exemplify the visual art style development observed by Chang et al. [4]: with delicately crafted prompt templates and deliberately selected hyperparameters, they developed distinct art styles and shared them with their online communities as images, prompts, and prompt templates.
DreamSheets’s flexibility allowed users to develop custom systems for various goals. The patterns that nevertheless emerged suggest generalizable structures that future systems can offer as dimensional exploration “units”: composable support structures with the flexibility to decide when and how to combine them. This echoes a takeaway from Li et al. in “Beyond the Artifact: Power as a Lens for Creativity Support Tools”: creative practitioners are empowered when they can laterally compose tools in an efficient workflow, or refuse tools and replace them with others [23].
Flexible options for AI assistance can play a role in fluidly supporting different exploration strategies and styles. When users hand off creative labor and control to the power of AI-generated serendipity, they should maintain the power to reclaim control at any time. This echoes tensions observed by Lawton et al. in “When is a Tool a Tool? User Perceptions of System Agency in Human-AI Co-Creative Drawing” [20]. Future studies should investigate how co-creative systems might allow users to flexibly control where and how AI “assistance” influences their workflow.
6 Co-designing Exploration Supportive UI Features with the DreamSheets 2.0 Mockup
To more concretely probe into how participants conceptualized their own sense-making systems, and to inform a more generalized understanding of how we can support these processes, we developed “DreamSheets 2.0” UI mockups with participants to elicit concrete feedback and speculative design ideas.
To facilitate the application of our findings towards future interface designs, we map the co-designed UI concepts and components to the exploration strategies observed in participants’ use of DreamSheets 1.0; we present these in Figure 14.
6.1 Rich, Reusable Exploration History
Our participants’ sheet-systems in DreamSheets exemplified how TTI interfaces can offer improved support for iterative prompt exploration by affording rich history-keeping and structured, large-scale evaluation across results.
To that end, we designed the DreamSheets 2.0 mockup to include two visual layout settings that users can freely toggle between: a “small multiples” grid view (Figure 12, left) and a focused list view (Figure 12, right). “Scalable” output display supports TTI users as they move between broad and focused evaluations of results.
Participants used sheet and cell duplication to iteratively repurpose exploration history in DreamSheets 1.0; this informed our decision to suggest features for revisiting prompt-history and exploration “sessions” in DreamSheets 2.0.
We proposed a prompt “token bank” system (Figure 12; featured in Figure 13, right) that would allow users to convert highlighted prompt text into a Saved Token for reuse in future prompts. Tokens can be converted into dynamic tokens for semantic explorations.
6.2 Supporting Exploration Breadth and Depth
Participants utilized the structured “infinite canvas” qualities of digital spreadsheets to iteratively expand explorations. Providing the user direct control over the “scale” of their generations allows them to flexibly expand and evaluate explorations. To this end, DreamSheets 2.0 would continuously load more results on demand, for potentially infinite scrolling. Cost may limit the feasibility of large-scale generations; offering cheaper “low-fidelity” generations with options to increase quality on demand may be a potential solution. We also replicated the “Power Cell” strategy crafted by participants, providing options that would allow DreamSheets 2.0 users to regenerate their exploration session by updating global hyperparameter settings.
We used the “Explore this image...” option to elicit participant ideas for additional “image refinement controls” beyond the classifier-free-guidance manipulation available in DreamSheets 1.0. Suggestions included adding spatial conditioning controls (e.g., manipulating poses or edges; participants mentioned ControlNet [60]), support for generative inpainting (as enabled by Muse [3]), or image-to-image interpolation (as in SpaceSheets [28]). Participants also suggested adding image-to-text transformations similar to the CLIP Interrogator [39] or Midjourney’s /describe function [1], which could “close the loop” by allowing users to translate ideas back and forth between image and prompt space.
6.3 Supporting Semantic Exploration with Prompt Templates and Dynamic Tokens
DreamSheets 1.0 participants used cell concatenation to craft prompt templates with “slots” to select axes for structured semantic exploration, motivating the design of more supportive prompt-template features in our 2.0 Mock-Up.
Dynamic tokens take the role of the “slots” used by participants to specify where to introduce text variation. Prompt variations are automatically generated with LLM assistance, systematically combined, and populated across several columns of generated images. The LLM supporting this interaction is surfaced, providing users control over the text generation, or the option to refuse its support altogether. Users can manually append, edit, and remove prompt words for each column of exploration, at will. This component design meets Chang et al.’s call for future interfaces to support prompt templates as standalone, interactive computational artifacts [4].
6.4 Flexible Structures for User-Defined Multidimensional TTI Explorations
DreamSheets 1.0 offered the flexibility of an infinite “blank canvas” with the trade-off of minimal structural support “out of the box.” Our expert participants were required to iteratively prototype systems custom to their creative “styles” and workflows. Rather than designing for a particular workflow, DreamSheets 2.0 suggests supportive structures while maintaining flexibility by presenting composable exploration features. Users can effectively pursue simple prompt-input-image-output tests, or they can construct increasingly sophisticated multidimensional explorations. With configurable, reusable, and refusable components, users can compose targeted, iterative explorations of prompt-image space.
6.5 Participant Feedback
Participants reflected positively on the features presented in the DreamSheets 2.0 Mock-Up, validating our adaptive interpretations of the structures they prototyped in DreamSheets 1.0. The Mock-Up was developed and presented during the study; their feedback directly contributed to improving the design and identifying its most promising components. The ability to save and recover exploration history for reuse in future explorations was a highlight: E4 considered the “prompt token bank” a direct upgrade to their current history keeping practices (saving prompts and useful stylistic modifiers into a text document).
“If I can drag and drop a presaved dynamic chunk... I can fully focus on sculpting the prompt and being creative.” (E4)
The “Save session” feature prompted E1 and E5 to describe similar expected use cases: saving the current state of the system and pausing their workflow to return at a later date, ideally with support for history reuse. E5 expressed a preference to segment their process into a “generation” stage (crafting prompts to generate thousands of images) and an “evaluation” stage (curating the selection of images), and described being able to pause and temporally separate these activities as a potential “game-changer.”
7 Limitations
We acknowledge several limitations of our study. First, the particular hyperparameters exposed may differ between TTI models; future work should investigate whether our approach scales to higher-dimensional hyperparameter spaces. Second, more powerful techniques to guide image generation are emerging in research, such as ControlNet [60] and Readout-Guidance [30], which require different forms of input to the TTI model. As participants requested, future work should investigate how DreamSheets could be extended to support these approaches.
8 Conclusion
Text-to-Image models challenge users to navigate vast, opaque design spaces on both sides: prompt input and image output. DreamSheets provides a flexible, spreadsheet-based interface that lets users author strategies to achieve creative goals, and facilitates sensemaking: developing through experience the language and working understanding needed to reliably steer image generations towards interesting outputs. Through two user studies, including an extended expert study, we observed challenges, tensions, and opportunities in the TTI prompt-exploration process. We used these insights to develop a UI mockup, improved it with participant feedback, and suggested features for future supportive TTI exploration interfaces. Finally, we considered the implications of supporting users’ sensemaking in prompt-image space, and beyond.
9 Disclosure
The authors used ChatGPT for minor copy editing tasks.
10 Expert Artist Credits
In consenting to participate in the study, participants gave explicit and informed consent for the data collected during the study, including their creative contributions, to be anonymously published in research findings. That said, the authors believe that the AI and research community should strive to credit artists when their work is used, according to their wishes. After acceptance, the authors reached out to the expert generative artist participants to collect their preferences with regard to remaining anonymous or receiving credit in the final publication.
(1) Expert Participant 1 (E1) is Stephen Young, a mixed media generative artist with 2 years of AI experience. He’s worked with multiple GAN and diffusion models to create art available on his website https://www.kyrick.art, @kyrick.art on Threads, and @kyrickyoung on X.com.
(2) Expert Participant 2 (E2) is Jeremy Torman, an interdisciplinary artist and musician who has been painting for over 20 years and using generative AI tools since 2016. He has used GANs, DeepDream, style transfer, VQGAN+CLIP, JAX diffusion, Deforum, and more. He shares his work online @tormanjeremy on X/Twitter and @jeremy_torman on Instagram.
(3) Expert Participant 3 (E3) is Seth Niimi, a multi-passionate creator who explores unique ways to combine tools, processes, and ideas in pursuit of fascinating experiences. Seth can be found on TikTok and Instagram as @synaestheory.
(4) Expert Participant 4 (E4) chose to remain anonymous.
(5) Expert Participant 5 (E5) is a Canadian generative artist known as surea.i, with 2 years of experience in the TTI space. She has previously used a variety of tools, including Stable Diffusion and Midjourney, along with manual digital editing. Her work can be seen online at @sureailabs on X.com and @surea.i on Instagram.
Acknowledgments
We would like to thank our reviewers for their time and effort; their suggestions were immensely helpful in improving the quality and clarity of this work. We would also like to thank our participants for their contributions to the studies.