research-article

Open access

“They only care to show us the wheelchair”: disability representation in text-to-image AI models

Authors:

Cynthia L. BennettAuthors Info & Claims

CHI '24: Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems

Article No.: 288, Pages 1 - 23

https://doi.org/10.1145/3613904.3642166

Published: 11 May 2024 Publication History

All formats PDF

Abstract

This paper reports on disability representation in images output from text-to-image (T2I) generative AI systems. Through eight focus groups with 25 people with disabilities, we found that models repeatedly presented reductive archetypes for different disabilities. Often these representations reflected broader societal stereotypes and biases, which our participants were concerned to see reproduced through T2I. Our participants discussed further challenges with using these models including the current reliance on prompt engineering to reach satisfactorily diverse results. Finally, they offered suggestions for how to improve disability representation with solutions like showing multiple, heterogeneous images for a single prompt and including the prompt with images generated. Our discussion reflects on tensions and tradeoffs we found among the diverse perspectives shared to inform future research on representation-oriented generative AI system evaluation metrics and development processes.

1 Introduction

Generative AI is growing in capability and popularity, promising wide-ranging improved utility. However, literature points out how generative AI replicates existing biases in the world in its outputs [7, 12, 15, 26, 33, 34, 49, 64, 76, 84]. Consequently, there is a growing call to focus on the ethics of these technologies, and especially their impacts on minoritized groups [12, 33, 64], by centering the expertise of impacted communities [63, 64, 80]. While some work has investigated how to quantify these issues in large-language models (LLMs), less has focused on disability and text-to-image (T2I) models. Meanwhile, users have already demonstrated that these models can replicate existing disability stereotypes [22]. People with disabilities are particularly well-situated to identify ableism and provide insights into more respectful AI system development and outputs [33]. As such, this work sought expertise from disabled people with a variety of experiences, and presents recommendations regarding disability representation in AI-generated images.

Thus, we invited a variety of people who self-reported having a disability to evaluate T2I images generated from disability-related prompts, and to share their broader perspectives on disability representation in T2I systems. We conducted eight 90-minute semi-structured focus groups with 25 participants, and each group was bookended by a pre- and post- survey. The range of their disabilities included: people with sensory disabilities, people with mobility impairments, people who had mental or chronic illnesses, and people who were neurodivergent. Our interviews focused on understanding 1) what people with disabilities think of current disability representations in T2I systems, especially if they notice any biases, tropes, or stereotypes, 2) the impacts of misrepresentation amplified through AI, and 3) what changes to T2I systems might promote respectful disability representation.

Our findings highlight several tropes and stereotypes, many negative, that are apparent to people with disabilities in T2I images, including perpetuating broader narratives in society around disabled people as primarily using wheelchairs, being sad and lonely, incapable, and inactive. Some of the images were strikingly dehumanizing as assistive technologies (AT) were more visible than the person [33]. Moreover, the representations lacked diversity in age, gender, race, and disability types. While our participants saw useful ways to engage with T2I systems, they were also concerned about the harms caused by these consistently problematic representations of people with disabilities in broader society (e.g., reifying stereotypes, thereby encouraging more bias in day-to-day interactions) [71]. They offered T2I improvement suggestions, such as displaying multiple highly different images in a single output to prevent homogenous representations of disabled people, and including metadata with images to add a non-visual method to indicate the presence of disability in a photo. At the same time, they pointed out instances where there was no clear solution to “better” representation, like how to represent invisible disabilities respectfully without resorting to reductive or stereotypical representations.

We note that our findings are not a comprehensive review of T2I outputs, but they are meant to demonstrate the harms that people with disabilities identified as currently being perpetuated in these images, and the nuanced expertise that is needed to identify and fix them. Our participants’ responses point to a complex path forward where tradeoffs are inevitable. Indeed, we saw conflicting opinions and priorities among participants, especially between people with different experiences with disability. Continued engagement with a diverse sample of people with disabilities is critical to better understand and mitigate the impacts of these tradeoffs. In summary, this work contributes:

(1)

A collection of disability portrayals and tropes found in T2I images identified through focus groups with 25 people with diverse disabilities, which provide a richer characterization of the biases that systems should avoid.

(2)

Recommendations for how T2I generative systems can better support disability representation through changes in their interfaces and in evaluation and development procedures.

(3)

Future research directions, informed by tensions around respectful representation of diverse disabilities in generative AI and visual media.

2 Related Work

2.1 Disability Representation in Popular Culture

Disability is an identity characteristic that is represented throughout popular culture and media. Representation in popular media requires nuance; omission or misrepresentations of narratives in media can cause harm by perpetuating biases or stereotypes about a group, or hindering their abilities to envision better futures for themselves [38, 52, 53]. Historically, representations of people with disabilities have often been fantastical: meaning disabled characters in film and literature were portrayed as “exotic” or “freaks” [35, 65], as heroes, inspirational, or savant-like [31, 35, 88], or as people in need of charity or help [35]. Disability studies scholar Rosemarie Garland-Thomson classified four rhetorics of disabled representation in photos as: the wondrous, the sentimental, the exotic, and the realistic. While several of these rhetorics propagate harmful stereotypes of disabled people as being pitiful or voyeuristically sensational, Garland-Thomson stated the importance of “the realistic” representation. Representations of people even doing unremarkable tasks in daily life, Garland-Thomson argued, are critical to normalizing respectful disability narratives.

Recently, activists, scholars, and survey efforts have identified limited representations in media of disabled people, and particularly when considering the breadth of disability experiences across intersections of identity (e.g., race or LGBTQIA+ status) [1, 21, 23]. For example, a disablity disability community-led project called #CriticalAxis categorizes popular media representation of disability according to different dimensions, like if they promote stigma versus empower disabled people [2]. Disability justice activists have similarly advocated for intersectional representation and leadership more generally by people who are multiply minoritized (e.g., BIPOC or queer disabled people) [45, 62, 86] and produced performances that center empowering representations of disabled people [46]. As AI systems learn and perpetuate biases from their training data, a lack of positive disability portrayals in media can also impact how AI systems represent people with disabilities [13, 48].

Turning to disability representation research in HCI, while few papers address the topic explicitly, we find synergy with related work on visual representations of disability in depictions of fictional people and caricatures. For example, work has explored how disabled people want representative personas used in professional design processes to show the nuance and complexity of disabilities and other identity characteristics intersecting rather than for them to simplistically map singular disabilities onto accessibility features [30]. In other research, people with disabilities wanted graphical anthropomorphic representations of fictional people to also reflect real-world disability and intersectional diversity [47]. Finally, research investigated how people with disabilities wanted their disabilities reflected in digital avatars. Many participants wanted to disclose their disability identity and used visual cues in their avatars to do so. While some participants could display their disability by choosing how their avatar’s body appeared (e.g., showing a limb difference), other participants relied on showing assistive technologies, behaviors, or disability cultural references to signal their affinity [57, 90]. Along with focusing on visual representations, our study required us to consider how participants articulated disability in text prompts to the T2I models, and their evaluations of how the models then expressed these specifications visually.

2.2 Generative Image Models and Bias

Text-to-image (T2I) models enable users to generate images in response to free-form text prompts. As their capabilities improve, T2I models including [68, 72, 73, 89] are being widely adopted by end users to complete a variety of tasks. For example, T2I models are being integrated into artists’ media-making workflows [19], and are being used to create stock photos [55] and visualizations within slide decks [14]. T2I tools can lower the bar for entry into graphic design work, by speeding up prototyping [51] and improving accessibility for creators who themselves have disabilities [39].

T2I systems are trained on large multimodal datasets to learn associations between words and images [66, 69, 74]. As developing and curating such datasets is never a neutral process, these models can also render biased outputs [13, 28, 48, 61]. Indeed, a large body of scholarship has uncovered pervasive biases within generative language and generative image technologies. For example, generative language technologies have documented issues around perpetuating negative stereotypes for people across different demographic characteristics including age [27], nationality or ethnicity [4, 25, 54], race [18, 54], gender [18, 25, 54], religion [54], and ability [33, 58, 82]. Similarly, generative image models, including T2I systems, have demonstrated biases pertaining to gender [12, 56], nationality or ethnicity [12, 64], race [12, 20], and ability [12].

While disability was once an understudied aspect of AI fairness, there is now a growing body of literature on the topic [9, 41, 60, 81, 85]. For example, studies found that natural language processing models classified text about people with disabilities more toxically or negatively [42, 82]. Generative language technologies similarly produce text with more negative sentiments when discussing people with disabilities [33, 58]. In one study, disabled participants prompted a LLM for narratives with subjects with disabilities. The model produced narratives that perpetuated biases including over-representing wheelchair use, people with disabilities being sad or lonely, disabled people as needing help, or as inspirational to nondisabled people [33], reifying Garland-Thomson’s “the wondrous“ and “the sentimental“ rhetorics [35]. Disability has been far less studied in the generative image domain, with the exception of Bianchi et al. which highlighted an example of T2I system generating disempowering representations of people with disabilities. Our study closes this gap by presenting the first empirical in-depth study on disability representation in T2I models.

2.3 T2I System Evaluations

Researchers have developed a range of methods to uncover and measure harmful biases in T2I systems, often by examining how different people, communities, and cultures are represented in response to different input prompts. A common approach focuses on examining the sociodemographic characteristics of people depicted in images. For example, Cho et al. quantified skin tone and gender biases by computing the distribution of skin tone and gender presentations of images generated for prompts referencing different professions (e.g., “A photo of a nurse”) using automated classifiers and human-raters [20]. Bianchi et al. compared CLIP [67] embeddings of generated images with CLIP embeddings of photographs with self-identified gender and race to assess the respective characteristics of images generated from prompts referencing harmful stereotypes (e.g. “a thug”) [12].

While the aforementioned automated approaches provide valuable insight into the prevalence of T2I system biases at scale, qualitative approaches can uncover novel biases and nuanced representational concerns. Bianchi et al. coupled quantitative evaluations with visual inspection of a smaller set of images, to reveal a range of problematic biases pertaining to culture and disability [12, 33, 64]. Recent scholarship has also gathered focus groups to qualitatively study South Asian cultural biases in T2I [64] and US disability biases in LLMs [33], respectively. Our work extends the nascent scholarship on qualitative and community-centered T2I evaluations, and presents the first of its kind study of disability representation in T2I.

3 Methods

To run our study we conducted eight focus groups, and each was bookended by a pre- and post-survey. Our method evolved from a small but growing body of qualitative evaluations of bias in generative AI models [33, 64]. Twenty-five people participated across the span of a month, whose sociodemographic information is shown in Table 1. We recruited from a distribution list of access technology users who have consented to receive information about studies at our institution. This study followed the guidelines on conducting research with human subjects specified by our organization, and our participants completed an informed consent process. To qualify, participants had to be US or Canada-based and at least 18. They were asked to self-report disabilities, chronic or mental health conditions, or neurodivergence. Prospective participants completed a screener linked from the recruitment message, where we determined this eligibility. We first selected participants to include a diverse range of disabilities, chronic conditions, and mental health conditions, then we prioritized choosing people to get a diverse set of assistive technology use and for intersecting identities (e.g., multiple disabilities). Participants self-reported all identity characteristics. We included a list of suggested options (e.g., woman, man, non-binary), as closed ended questions can can be more accessible for some participants. We also allowed them to instead describe their identity in their own words in a text box, understanding that multiple choices cannot encompass identity. Participants received a gift card as thanks for their participation.

Table 1:



Race		Gender
American Indian or Alaskan Native	1	Men	12
Asian	5	Women	13
Black or African American	3
Hispanic, Latino, or Spanish origin	3	Disability
White	13	Blind or low vision	6
Did not disclose	4	Chronic condition	7
		Cognitive related disability	4
Age		d/Deaf or hard of hearing	5
18-24	1	Deaf-blind	1
25-34	7	Mental health condition	2
35-44	7	Mobility or motor disability	14
45-54	3	Neurodivergent	4
55-64	4	Speech-related disability	3
65-74	2

Table 1: Participant demographics in terms of race, age, gender, and disability.

3.1 Pre-Study Work

Participants completed 30 minutes of pre-study work. These tasks included signing the consent form, watching a 4 minute video, and completing a survey. The video introduced basic concepts of text-to-image AI models, including how they are trained, how and why they are used, and example prompts (section B includes the full transcript). The survey established community guidelines for respectful interaction during the focus group, asked participants to share any accessibility needs to participate, and asked participants for prompts or communities (e.g., Deaf people, people who are blind) that they would be interested in seeing in T2I images shown during their focus group.

3.2 Study Images

During the focus group and post-study survey, participants evaluated a variety of T2I images, which we showed in a series of five image sets. We scoped our research to study three popular and publicly-available models Dall-E 2, Stable Diffusion 1.5, and Midjourney to generate the images. All images were generated between June and August 2023. We pre-generated most image sets, so the bulk of focus group time could be spent gathering feedback, as at the time of data collection, image sets could take around a minute to generate. However, we created one image set during focus groups with participants via live-prompting, understanding that we could not anticipate all prompts that participants were interested in trying.

Image sets 1-3 feature one selected output images for a given prompt from each of the three models. Each model produced multiple images for a given prompt. Based on pilot testing, we found that showing 4 output images from a single model over screen share on a video call made each image quite small, which could compromise accessibility. Therefore, we chose to show a single image from the 2-4 generated by each model. This decision made it possible for participants to more easily see and discuss details in the images. However, this choice meant that we could not make precise statements about how common certain types of images might be in typical use.

At the start of the project, the research team discussed which images would be the highest priority to get feedback about. The team decided to choose images that included the most details that were common to the image set (e.g., if 3 of four images showed wheelchairs we chose an image of a person with a wheelchair). If the photos had few common elements between them, we chose an image that was visually different from other images shown in the study in order to capture feedback about different types of images (e.g., choosing an image with a walker if no other study images showed one). Images for sets 1-3 were chosen by the first author and reviewed by the research team.

Image sets 4 and 5 emulated a user’s experience prompting a model, by showing all output images together without any researcher adjustments to the outputs, for participants to compare. Our study images were not a representative sample, but rather a set of prevalent representations demonstrating what the models were capable of producing. Below we share the makeup and rationale of each set.

(1)

Generic Disability Image Set: We showed all focus groups the same three images (one from each model) from the prompt: “a person with a disability, photo”. This created cohesion in our data in that every participant gave feedback about this image set, and it represented a use case when someone relatively unfamiliar with disability might prompt for that term.

(2)

Specified-Disability Image Set: We showed a group of six images which was comprised of three images (one from each model) in response to each of two disability-specific prompts (e.g., “a person who has cerebral palsy”); we selected one disability to prompt for based on the lived experiences of the participants in that particular focus group as gathered from the screener survey, and the other based on participants’ pre-study survey requests that did not overlap with the first disability type. We sometimes used their exact suggested prompts, and other times edited them or combined text from multiple participants’ prompt suggestions.

(3)

Context and Specified-Disability Image Set: We showed one image from each of three different prompts for people with specific disabilities who had intersectional identities (e.g., a group of Black disabled people hanging out) or disabled people doing different activities (e.g., working, parenting, dating). We engineered prompts similar to the process used in image set 2, basing them on participant requests in their prework. Since we varied the context in each prompt, we chose to keep the model consistent to minimize variables. We chose Midjourney because it often generated images with the fewest visual artifacts, model errors which we were concerned would distract participants. For example, sometimes faces were rendered in ways that were non-normative but did not accurately represent a disability.

(4)

Live-Generated Image Set: We then prompted Midjourney live during the study to create image set 4. We took prompt suggestions in real-time from participants, to capture the evolving conversation. Depending on time, we evaluated image outputs (4 per prompt) from between 1 and 3 prompts.

(5)

Post-Study Image Set: In the post-study survey, we shared 4 images for each of 3 prompts (12 images total, all from Midjourney) following the context and disability-specific formula used for image set 3, and we developed the prompts to cover a diversity of topics (e.g., general and specific disabilities, a visible and invisible disability, professional and hobby context). We included this final set to 1) allow participants more opportunities to respond to multiple images from a given prompt and 2) to get feedback on images in a different medium from the focus group. The prompts were:

Prompt 1: a person with a disability with their family, photo

Prompt 2: a person with depression at their job, photo

Prompt 3: a blind person doing hobbies, photo

This research involved discussing the identity characteristics of image subjects that were created by AI, and therefore do not have formal identities. In this paper, we used terminology from our participants in their suggested prompts and feedback during focus group discussions.

3.3 Focus Group

Each 90-minute, virtual focus group included 2-4 participants. We grouped participants such that they almost always shared a similar disability identity (e.g., all were neurodivergent)¹. We found that this shared identity often gave participants a common background that helped encourage discussion. However, several participants had multiple disabilities, and had experienced their disabilities differently. Thus focus group discussions often expanded beyond one disability type.

We asked participants to react to 3 image sets (described above) that we had prepared for the study, and then the images resulting from live-prompting (image set 4). The images were tailored to each focus group, though we reused photos when participants in a successive focus group requested similar prompts in their prework. As prior qualitative research work about disability representation in generative AI highlights the heavy bias in generated outputs [33], we chose to explicitly ask whether participants found evidence of stereotypes in the images, and their impacts on participants. Specifically, after showing each image set and reading their respective descriptions, we asked participants to share: 1) their initial reactions to the images, 2) if there were good or bad stereotypes or assumptions being made in the image, and 3) what output they would have wanted to see from the model. We then asked participants follow ups based on their interests and prior answers.

The focus groups closed with a reflective exercise. During this final conversation, we asked participants to respond to a scenario: Imagine that T2I systems like the ones we had used during the focus group were available for wider use, such as for professionals or children interested in seeing depictions of disability, and incorporating them into presentations. Participants shared their thoughts and reactions to this scenario. The transcripts we collected consisted of verbal communications and messages from the online text chat, which is how participants sometimes preferred to communicate.

3.3.1 Study Accommodations.

We provided a variety of accommodations including breaks, presending study materials, ASL interpretation, taking feedback verbally or from the chat, we compensated trusted communication aides when participants requested to bring one to the study, and we provided image descriptions. We referred to research to develop a format for image descriptions for all pre-generated images [8, 30, 77], and shared them with participants before the study. The facilitator described live-generated images on-the-fly using a similar format. We prioritized describing the image subjects, as disability representation was the study’s focus. Subjects’ actions, if any, were also prioritized near the beginning of descriptions. We then described the overall scene and aesthetic, and finally anything else that stood out given the topic (e.g., AT that seemed oddly designed, or meaningful objects like a puzzle piece or rainbow). Two researchers who have extensive experience with US disability community and media co-wrote the descriptions.

We did our best to ensure equitable access to information (e.g., making sure image descriptions were read out before anyone shared their reactions) and to create a space where everyone could share at their own speed. For example, we had four participants whose disabilities affected their speech rate, and we established a turn-taking order that prioritized our participant using an AAC device if he had an answer prepared, and allowed him to go last if he needed time to compose his message. Another participant had a cognitive disability that affected her memory, so the participants and facilitators decided together that this participant could share her thoughts first in response to a question while it was fresh in her memory.

3.4 Post-Study Survey

After the focus groups, participants completed a final 30-minute survey with two parts. Participants first shared any additional thoughts about the images from the focus group. Second, we asked them to answer the same follow-up questions for two images of their choice from set 5 (described above), and to share any broader trends that they noticed while examining all 12 images.

3.5 Analysis

To analyze our transcripts, we adopted a practice based on thematic analysis [16, 17]. First, at least two of the five authors closely read each transcript and developed an initial set of codes; one of these authors read all transcripts to establish cohesion in the analysis. We then refined codes during several meetings with the full team. The codebook included high level groupings (e.g., expectations of the prompter, model errors) and specific codes (e.g., the prompter expects to revise the prompt, participants felt the person did not look disabled). One author then applied the codebook to all of the transcripts and the post-survey data.

3.6 Positionality

The actions of collecting, analyzing, and writing about this data were informed by our own experiences and identities as researchers. For example, we operationalized disability in prompts based on English and western disability terminology and conceptions of disability rights and culture. Our team has done the majority of our research in US contexts, and we have a variety of visible and invisible disabilities and cisgender and transgender identities. Additionally, a majority of us are white, and we lacked representation of older adults. Combined, we have extensive experience in accessibility, disability studies, critical data studies, and machine learning research.

4 Results

Our participants shared a range of perspectives on AI-generated images and using T2I models more broadly. Here, we present their insights around 1) how disabled people were represented in images, which often drew on tropes and reduced disabled image subjects to a specific archetype, 2) tensions around the outputs of these models, expectations of their behavior, and iterating on prompts, and 3) recommendations for how to improve T2I systems and their outputs.

4.1 Participant Reactions to Text-to-Image Model Outputs

Participants shared what visual properties of the images they saw that they felt represented disability. Often, they thought an image subject was disabled if the subject used an assistive technology or was portrayed performing an action linked to disability (e.g., using American Sign Language). If a subject did not have one of these visual features in a prompt about disability, they speculated about if the person had an invisible disability or if the model made an error and omitted the prompted disability. In assessing for the presence of disability, participants also shared their evaluations of the quality of representation, including whether the images depicted existing disability tropes. We now share participants’ reactions to the outputs of the models, especially highlighting the common tropes that people saw reflected in the images.

4.1.1 Representational Tropes.

Participants commented on several tropes in how disability was represented, which we summarize in the following tables and further discuss below. We provide additional example images from each subtheme in section A.

Table 2:

Table 3:

Table 4:

Table 5:

Table 6:

4.1.2 Positive reactions to images.

When the model rendered the specificities of a prompt accurately, participants were happy or impressed. Oftentimes highly specific prompts were engineered to reflect a participant’s lived experience. For example, one participant asked for: “An African American woman with dreadlocks with a tracheostomy riding a power wheelchair, photo” (Figure 1) and was happy to see the results pick up on some of the details in her prompt: “It brought a smile to my face to see the first picture … Because I can see myself [in image] one … there’s an obvious difference between a manual wheelchair and a power wheelchair, so that caught my eye, and that she’s African-American.” In these cases, representation was specified directly in the prompt, and there were visual markers to indicate they had been rendered. When these visual markers appeared in the output, participants considered the output as a good example of disability representation. Particularly for participants who are multiply marginalized and excluded even from ‘diverse’ representation, it can be particularly important and moving to see someone who looks like yourself in these images.

Figure 1:

4.1.3 Exclusionary Filtering Mechanisms.

T2I tools often integrate content moderation strategies, such as filters on the text input or image output, to ensure generated images abide by content policies [36]. For example, input filters may be designed to check prompts for racial slurs, or output filters may be designed to check for violent or offensive imagery. However, such filtering and blocking mechanisms are not perfect, and can inadvertently end up censoring content that should be allowed [7, 29]. When preparing images for our participants for the prompt “a person with bipolar disorder, photo,” one of the models did not generate images, and instead showed the message “It looks like this request may not follow our content policy.” This was an exceptional occurrence in our image generation, as it occurred less than five times for hundreds of prompts². However, when participants saw the error message, they were confused: “The model, it’s essentially trying to say, well … I don’t want to work on that for some reason. It’s like, why?” (P11). They wanted more explanation as to why it was not run (i.e., a detailed error message instead of a vague reference to content policies).

4.2 Reductive, Archetypal Representations and Their Impacts

The consistently-generated tropes and biases in the tables below often reduce the representations of people with disabilities to single archetypes: for example, prompts about chronic illness resulting in images of sad, young white-, feminine-presenting people in dark rooms, and prompts about blind people resulting in people wearing big glasses with dark lenses. Participants called out the inaccuracies and harms of these reductive stereotypes. First, they shared that disability is a rich and dynamic experience, which was lost in the AI-generated images. One participant with bipolar disorder explained that her experience can range from manic to depressive but also so much in between (P13). Participants were further frustrated that, when the models reduced them to an archetype, the images focused on the negative rather than the positive aspects of their lives: “what struck me is that we always concentrate on the bad. So if a person happens to have cerebral palsy or memory loss,well, perhaps that affects, you know, [a] percentage of their life,” and went on to describe that there is much more to his life than the negative aspects of his disability (P42). To summarize, participants described that disability is diverse in its expression and experience; AI-generated images that reduce a disability to a single archetype or emotional valence are not presenting an inclusive or accurate portrayal of disability.

Our participants pointed out cases where AI-generated images of people with disabilities could be valuable. First, several participants recognized that AI tools already are and will be used by people to better understand what disability looks like. Some participants were okay with this use case if the image produced quality, diverse images: “Somebody is going to be like what does a person with a disability look like? Maybe it’s a kid in school... I think it’s ok for a kid to put [that] in an AI.” (P13). Participants overall felt that T2I systems could perform useful work around education, or that, at a minimum, they will inevitably be used for these purposes. Those interested in educational use cases were hopeful that models and user interfaces could be developed to purposefully counteract negative stereotypes and start putting out more forms of representation that are in line with what disabled people actually desire. Next, by design, T2I models make it easy for users to create a lot of images quickly. T2I could scale respectful imagery, and thereby make it easier for people to include disability in their imagery with low effort or knowledge of disability experiences. This use case is especially important given that, outside of specific collections [3, 43] stock photos often do not contain many photos of disabled people.

However, participants enumerated their concerns about the impacts of the current, negative representations. When considering nondisabled or otherwise unfamiliar T2I users, participants were concerned that such users will assume that repeated misrepresentations will be considered as fact. One participant expressed her concerns with images in response to the prompt “a person with cerebral palsy, photo”: “if it’s a person who has no idea what [cerebral palsy] means, they will just come to the conclusion a person who is sad, and who uses a wheelchair.” With respect to users with disabilities, participants were concerned about impacts on mental health: “the constant bombardment of these negative and inaccurate portrayals can … [lead] to feelings of inferiority, alienation, and isolation,” (P41). While negative representations would mislead consumers, the impact would be amplified for users who resonated with the prompted experiences. Finally, some participants were concerned about T2I’s impact on employment of people with disabilities to serve as subjects for stock photos; image generation that did not require real people with disabilities could reduce the already few opportunities for them to increase their own representation in media.

4.3 Tensions within Prompts, Outputs, and Expectations

While some of the aforementioned tropes were near-unanimously deemed problematic by our participants, we found that they had mixed opinions and expectations about 1) interacting with prompts and model defaults, 2) how to represent dynamic aspects of disability with static images, and 3) how models should render what is unspecified or underspecified in prompts.

4.3.1 Prompt engineering for more diversity in images.

Participants agreed that models do not by default display adequate diversity, and further, that they should produce diverse output by default. Several participants commented that the models are still usable, users just need to use the first image output to a prompt as a “starting point,” and add further specifications to the prompt to get the diverse representations that they desire. For example, one participant’s group chose the prompt “the spectrum of ability,” expecting an image with a variety of disabilities, and instead the images showed people in wheelchairs in front of a rainbow background. The participant explained, “I think we mostly have to blame ourselves for using the word spectrum,” (P01) since the model seemed to associate spectrum with color. In this example, the participant placed the onus on himself if something turned out unexpectedly, and he was confident he could reach a desired result through prompt engineering, an attitude shared by a few other participants.

However, many participants were doubtful that all users of AI have the motivation needed to reprompt, or even the information needed. For example, if someone has little experience with disability, they might see the results of white-presenting people in wheelchairs and assume that those photos are sufficient. Finally, one participant raised the issue that prompt engineering is not universally accessible. P02 gave the example of someone who is brain fogged and having trouble finding the right words to use, and the lack of alt text associated with output images also creates barriers to prompting nonvisually.

Finally, achieving more diverse representation through prompt engineering further perpetuates the existing labor inequities that exist for minoritized people. Based on the images shown in the sessions, some participants who are not white or who do not use wheelchairs realized they would need to put extra effort into their prompts to see images that reflect themselves. In one focus group, a Black, disabled participant commented: “The base way that this technology works, the assumptions are not what I [think it] should be thinking from day one. I don’t want to have to put all those details in [the prompt] because most people aren’t even going to know how to do that,” (P42). Here, the participant pointed out that it is unacceptable to ignore that these models are biased from the start, and that he shouldn’t have to perform extra work to get images that portray people of color. He also raised the point that understanding how to prompt engineer requires a certain amount of technological comfort and experience that not all people will have. In summary, participants had different relationships to prompt engineering and varied in how much they felt misrepresentation could or should be reliably solved by it.

4.3.2 Tensions with static, visual representation.

There are certain inherent limits to how people can be represented by T2I systems. Most models currently output multiple (oftentimes four), static images in response to one prompt execution. If people use these tools similarly to other image searching platforms, it is expected that they will then select one photo to use, and include that photo, perhaps with an attribution, but without listing the prompt (or search query).

This modality of a single, static image limits the types of ways of representing people, and especially their identities. Participants highlighted that certain disabilities, like Autism or a chronic illness, have more non-imageable features³[83, 87]. Stimming or nausea, for example, are difficult to show in a snapshot. Oftentimes, for these types of disabilities, our participants could come up with a way to represent the disability, but it often required niche background knowledge. For example, one participant suggested depicting fidgets to represent stimming; to depict chronic illness, another suggested showing that a person cooking was chronically ill through a subtle piece of community knowledge: showing the person with a stool to rest on if they got tired.

Participants complicated representation by expressing that they wanted subjects to look disabled, but also “like everyone”, as P30 explained: “What does a disability look like? … It looks like you and you and me. And that person over there and that person over there. Sometimes it looks like what we think it looks like and sometimes it doesn’t.” But, as another participant commented “… but also the point of coding identity visually is to code identity visually,” (P00). There was a tension between ensuring that people who engage with the image know that the person is disabled, but also presenting disabled people as “just like you and me” and avoiding common stereotypes and reductive forms of representation.

4.3.3 Tensions around sticking to the prompt.

Producing images from text descriptions remains an inherently underspecified task [40]. Particularly with more general prompts (e.g., ‘person with a disability’), participant feedback routinely highlighted this challenge and were unhappy if the model strayed too far from what the prompt specified. For example, in response to the prompt “Two blind parents with kids, photo”, the model produced an image of four people with dark glasses and one of the parents using a walker (Figure 2). Some participants were upset that the model added the walker, unprompted. Specifically, one participant was concerned that people already assume that blind people are very limited in what they can do, and that adding a walker makes it look like they are even less capable. However, others suggested that the model might be trying to represent that people can have multiple disabilities. Another concern with this image was that the system depicted all four family members as blind, unprompted. One participant was concerned that it would incorrectly make people think that blindness is always genetic, and another commented: “I think it should have shown the average American family except that the parents are blind. So it’s white picket fence, two kids, and a dog,” (P31). Interestingly, here the participant implied that he’d like the model to replicate “the norm” unless otherwise specified, even though his definition of the “norm” promotes an American-centric view and still leaves out many American families.

However, some participants were also frustrated when the model took a more literal interpretation of the prompt than they were expecting. For instance, participants were upset that disabled people were always posed alone, especially in public places like a park, even though the prompts given were singular in their wording “a person using a walker in a park, photo” and “a person with a disability, photo.” These preferences suggest that thoughtful defaults may not just impact underspecified subjects, but the overall vibe or scenery depicted. In the case of the park, people could have been added to the background while still sticking to the prompt by maintaining one person as the focal point.

Potential solutions to these issues are complicated in that they may invoke empowering representation for some people while erasing others. A solution that by default depicts disability only when prompted, was contrary to many participants’ ideals of normalizing disability. Portraying them in a manner that forefronts capability according to assumptions about disabilities occurring in isolation (e.g., blind people using canes and not other mobility aids like walkers), may in turn subvert chance depictions of people with multiple disabilities, for example.

Figure 2:

4.4 Participant Recommendations

After pointing out disability representation failures in AI systems, participants shared improvement suggestions, which we organized into image, interface, and model development recommendations.

4.4.1 Recommended Image Characteristics.

First and foremost, participants wanted to see more diversity in images and deviations from the tropes listed above. Specifically, they wanted to combat the few reductive archetypal representations of disability and to see people with a diverse range of abilities, genders, races, and ages looking happy. In addition to fixing the inaccuracies in how AT is rendered, they wanted a broader variety of ATs shown for a specific disability. Some participants recognized that these models output multiple images at a time (e.g., Dall-E 2 and Midjourney generate multiple images by default) and suggested using this feature to increase diversity in representation. Four photos provide four opportunities to highlight people with different disabilities, races, genders, and ages: “if i’m putting in a prompt and it’s going to generate 3 to 4 images, they are going to need [to be] varied,” (P33). One participant acknowledged that it is likely difficult to make all of the negative or one-dimensional representations go away, but suggested: “can we use subtle things to maybe sneak more healthy portrayals in here so that there might be the option to do … a healthier portrayal?” Here P40 called out that, it’s not just that there’s problematic portrayals coming out of these models, it’s that it’s almost exclusively negative portrayals; there are no good options by default.

4.4.2 Recommended User Interface Changes.

Our participants suggested how interfaces could help users reach better representations of disability through explainability, image alteration, and including metadata.

Figure 3:

Participants wanted to be able to ask a system why a specific element of an image was included because they often had to guess if an element was meant to indicate disability or not: “I feel like a good feature for a t2i model to have … would be having the tool be able to explain its reasoning … and [say] ‘this person has X disability’,” (P01). In one focus group, our participants shared this exchange in response to Figure 3 generated by the prompt “people with a lot of different disabilities:”

P01: The person who is just sitting (but not in a wheelchair) is really interesting! Are they sitting cause a chronic illness? Fatigue?
P00: what is she holding?
P02: i can’t figure it out
P02: a wire?
P00: she seems to be an amputee but the leg is not moving in an expected direction
P00: is the Black man in the bottom left supposed to be signing?
P00: everyone has a lot of bags (probably not meaningful, but just notable)
…
P01: Bags could contain assistive technology or important medication

The participants worked together to try and understand how each of the people they saw in the images was disabled via the visual cues at hand, because they were ambiguous. This ambiguity of why a detail was included led one participant to question if a body is non-normative because of a model error or disability representation. She explained: “[I’m not sure] how to comment on likely unrealistic artifacts of generative AI without engaging in ableism … the hand [in that photo] is likely a render error, but it feels difficult to comment upon that while not suggesting that someone with a nonstandard number of fingers is akin to a rendering error,” (P00). Overall, participants expressed a clear desire for object-level model explainability to better understand how and where disability was being reflected in the images.

Besides understanding why each element was included, participants also desired the ability to change a specific element while keeping the overall image composition. One participant explained that, in her experience, generative AI makes tasks more efficient but is also error prone: “[Compared to LLMs], it’s harder to edit an image. You can take [out] a sentence from text. I don’t have the skills to edit the image.” Because of the difficulty of editing images that P71 described, an alternative way to make alterations could be asking models to regenerate specific parts of an image according to user feedback.

Our participants suggested the metadata associated with the image could include some of the terms that were used to generate the image. Thus, the metadata could indicate that “disability” or “memory loss” are elements of the photo. This solution is beneficial for both avoiding representations that over-emphasize disability and providing better representation for invisible disabilities, which were two concerns of our participants.

4.4.3 Recommended System Development Changes.

Participants wanted to see models built with more data about disability and with more engagement from disabled communities. While some participants were not sure why a model was producing unsatisfactory results, others were quick to identify a lack of quality training data: “Whoever’s model this was didn’t have a good data set on that,” (P11). While it was clear that there was a lack of high quality training data for certain disabilities (e.g., people with limb differences, who often were rendered with no limb difference), participants especially wanted better training data for assistive technologies. For example, P21 suggested going to an audiologist office to capture quality images of cochlear implants and hearing aids. However, it is important for these datasets to include authentic AT use, or risk perpetuating some of the above stereotypes of medicalization and improper use of AT.

Finally, participants highlighted the importance of including people with disabilities in model evaluation processes. Furthermore, they specified that model builders need to talk to people with a variety of disabilities, as a person with one disability is not an expert on another disability. For example, one of our participants who was not neurodivergent was concerned about a representation of “a neurodiverse person, photo” which included swirls and ribbons of color around the head: “I wonder if people may make associations like, oh, yeah, there’s something wrong with... the brain of people who are neurodiverse,” (P03). However, two participants that identified as neurodivergent in the group, contrarily, appreciated the representation, with one participant commenting that “I’m not worried about these showing neurodiversity in a negative light whatsoever, because these look amazing– colorful, creative … these are strikingly beautiful,” (P01). In this case, people within the identity group had a different opinion to those outside the group. This example demonstrates how engaging with a few people with disabilities is likely not sufficient to create inclusive models.

5 Discussion

Our findings indicate that models are outputting negative, reductive representations of people with disabilities. Oftentimes, these representations reduce disability to a very narrow archetype that pulls on existing negative stereotypes, resulting in and replicating real-world harms [76, 83]. These current representational tropes act as several subcategories in Garland-Thomson’s overall category of “the sentimental,” where disabled people are viewed as incapable, warranting pity [35]. Very few portrayals represented “the realistic,” though our participants affirmed that this is the type of representation that they would most like to see. Broadly, media is pushing towards creating more positive representations of people with disabilities, but often still recreates harmful tropes along the way [5]. Still, generative AI relies on the past to produce the future. Whereas disability representation advocacy focuses on moving forward by involving people with disabilities (e.g., casting calls and sensitivity consultants), we must contend with legacies of erasure as they quite literally make the data that make the images. We speculate a tension between respectful representation and available training data, since people with disabilities have long been absent from digitized data sources given disability-based segregation’s legality in the US until the last few decades. Thus, to take steps towards reaching AI systems that produce images that are within “the realistic” without significant prompt engineering, we need to either focus on creating datasets with better representation or strive to use ML techniques that rely on less data.

Against the backdrop of long-established concerns around disability representation in the media, using generative AI comes with various ethical concerns and tradeoffs. Our participants discussed the concerns and benefits of using T2I systems for different purposes and some of the challenges with making the model safe and effective to use. For example, some of our participants indicated that users can iterate on these often negative, first outputs from the model by prompt engineering to reach more inclusive, positive representations. However, they also pointed out that some people may be unaware that the outputs are offensive, prompt engineering perpetuates existing inequities, and it is not possible for some individuals. While representation is complex and may not be fully addressed with a checklist, there are actionable ways we can improve these systems.

5.1 Towards Acceptable Disability Representation in T2I

While our work did provide some clear indicators on types of representation to encourage or avoid, it also raised tensions that need to be explored further moving forward.

5.1.1 Best Practices.

Our participants largely agreed that certain representational tropes output by the model were undesirable, ranging from annoying, to offensive, to harmful. Specifically, participants wanted a broader range of diversity in the images, including in disability, race, gender, age, and ATs shown. They wanted disabled people to be shown not just looking sad and alone, but doing everyday activities. More specifically, participants were not opposed to some of the options including negative emotions, as that is a valid experience of disabled people in certain situations. Rather, they did not appreciate all of the images in one prompt iteration looking so similar, and oftentimes negative by default.

More generally, participants had the clear objective of wanting people to perceive image subjects as disabled in prompts about disability, which is complicated by different disabilities having different levels of public perceivability [32]. One of the easiest ways to accomplish making disability identity highly perceivable is to rely on stereotypes and/or reductive representations (e.g., showing a depressed person crying, or only showing people who have epilepsy when they’re mid-seizure). However, participants quickly noted the harms of reifying these stereotypical representations. Thus, these kinds of images might propagate harm if they are not shown without broader context that aims at conveying the limits of this single representation. Instead, participants wanted to provide more nuanced and less superficial representations, while still signaling that the subject is disabled. Thus, an alternative solution is to focus on authentic representations targeted for disabled viewers. Our participants gave examples of how images could demonstrate highly specific community knowledge as a way of signaling representation to others in the identity group (e.g., a stool in the kitchen signaling chronic illness to viewers who are also chronically ill). While these nuances might be missed by people who are not in-group, it provides an authentic form of representation for a minoritized audience.

Participants considered potential solutions that both resist tendencies for models to produce reductive representation archetypes and still make disability perceivable by people without knowledge of disability. The main suggestions that participants made to work towards this goal were 1) to include the prompt with the image, so consumers can understand that people with disabilities can “look just like you and me,” and 2) to show multiple images that highlight a variety of experiences, rather than just one reductive perspective.

5.1.2 Open Challenges.

While some participant suggestions were near-unanimously agreed upon, other times, participants’ opinions conflicted with broader AI priorities. These representational conflicts complicate what it means to have “inclusive disability representation” in AI-generated images. For example, our participants provided well-reasoned justifications for and against adding disabilities to an image that were not explicitly specified in the prompt (e.g., inserting a disabled person in “a photo of a family”), especially adding multiple disabilities to a single person (e.g., showing a blind person who also uses a walker without explicit prompting). However, our participants also expressed a strong desire to normalize disability as a common part of society, which might include diversifying AI-generated image subjects to sometimes render disabled people, even when disability was not specified in the prompt.

At the same time, variations in cultural attitudes towards disability further complicate what it means to generate a “positive portrayal of disability.” Prior research demonstrates that T2I models outputs often bias towards an American or Western portrayal [12], and our participants were all North-America based. Further, different global cultures hold different beliefs and attitudes about disability and disability language. These differences may impact the language that people in different places choose to use in prompts and what kinds of representation they desire in outputs [10, 44, 59, 70, 79]. Future research is needed to better understand how model-produced disability representations are perceived in non-North-American disability contexts.

We argue that few decisions are “neutral” when it comes to representation– even omitting a group implicitly is a statement about who gets to be represented. For example, by refusing to provide representations of people with bipolar disorder, the model sends the message that the user is incorrect or acting in bad faith for trying to prompt about a stigmatized disability and further alienates the blocked disability [76]. This behavior mirrors broader issues in AI filtering such as models disallowing reclaimed slurs or topics or words commonly used within minoritized communities [7, 29, 50, 75, 78]. While models attempt to keep users safe by barring them from seeing offensive content, they can inadvertently reinforce certain topics as taboo when impacted communities are trying to normalize them.

Given that there are few neutral ways to resolve representational conflicts, we contend that system designers should make deliberate choices around which values are embedded in AI models and processes to use as guiding principles while choosing between tradeoffs. For example, inspired by the principles of disability justice, creators of AI systems could consider prioritizing including representation of those most impacted, perhaps portraying people with multiple disabilities, or multiply minoritized disabled people [11]. In this case, including disabilities not mentioned in prompts means risking the concerns participants raised around propagating inaccurate stereotypes. However, if motivated by aligning representation with disability justice, this might be an acceptable tradeoff to increase representation of those who are most minoritized.

In summary, providing a concrete list of best ways to represent people with disabilities is not the end-all goal. Rather, we encourage AI model builders to consider the values that they want to use to guide their decisions in the models, understand the tradeoffs they make with decisions around representation, and publicize the values they use to inform their development to increase transparency for model users.

5.2 Scaffold T2I User Engagement

Our participants commented that people are likely to use models either to learn about disability or as stock photography, and they were not opposed to these use cases, in theory. However, they also commented that these models are currently outputting negative stereotypes that people unfamiliar with disability could assume are reality. Currently, the only safeguard making sure that these harmful images are not used is the judgment of the user, and the aforementioned misguided safety filters. However, people expressed concerns about users having the background knowledge about disability and motivation needed to reject problematic portrayals and generate another set of images.

There are multiple ways to alleviate this issue, including at a model and interface level. At a prompt level, we could encourage more chaos in the images produced, with the goal of producing images with a broader diversity across the dimensions that mattered to our participants (race, gender, age, emotional affect, assistive technology used, etc.). At an interface level, models could play a bit more of an educational role and inform users about best use practices. For example, if the model detects that disability exists in the prompt or image, it could open a dialogue that informs the user about negative tropes that they should check are not present in their image before using it. In summary, there are opportunities for models to better safeguard against users using problematic images rather than relying only on their judgment, including providing better alternatives by default and educating users who might not have pre-existing disability knowledge.

5.3 Qualitative Evaluations of Emerging Technologies

Our qualitative approach to identifying T2I system failures related to disability representation builds on a growing interest in centering communities in model evaluations [33, 64]. In contrast to this nascent scholarship, most published T2I evaluations favor specified, automated, and quantitative metrics [12, 20, 56]. Our results show that models today behave in manners that are harmful, including by producing representations that are not in alignment with community values or refusing to produce images for specific disability communities. While it is known that adding a step of engaging with community members to an AI development process is not alone enough to fix problematic systems and policies [24, 37], we did find a host of benefits from engaging with a variety of disabled individuals in depth on this topic. We learned: 1) common tropes and representations that were near-unanimously not preferred, 2) priorities of what issues mattered most to our participants, 3) actionable feedback that can be implemented in a short timeframe, 4) different and sometimes conflicting perspectives on novel issues from emerging technologies, and 5) open challenges that need more time and research to be addressed. These insights are critical to building inclusive AI systems; there is value in understanding what errors a model makes and the harms that ensue in models today. Future models are unlikely to avoid these errors without intentional design. Further, in order to evaluate models with metrics, we have to understand what issues are important to communities to guide new metric creation and testing. Finally, models will continue to evolve, creating new, emergent issues. Therefore, we suggest that a qualitative evaluation, like what we performed, become a step of the process of evaluating generative AI models, not just a single safeguard, never to be revisited.

Our study added a crucial dimension to qualitative T2I, and broader AI evaluation. We were intentional to make our study procedures accessible, and in some cases, this involved varying procedures by sharing in different formats (e.g., some participants read image descriptions and others provided feedback in ASL). Centering access provides an entry into reimagining what we mean by evaluation. For example, a literal take on image evaluation might exclude nonvisual forms of experiencing or critiquing imagery. Yet blind participants’ perspectives on visual representation are still essential for broader disability inclusion, and the process of access can itself become a form of questioning and reimagining, as audio descriptions scaffolded digital anthropologist Balasubramanian‘s nonvisual analyses [6]. In fact, we argue that our multimodal communication (e.g., verbal facilitation, chat, showing images, providing image descriptions, and participants communicating in the medium most comfortable for them) enriched the feedback, providing a foundation on which participants could build and expand the common ground together, and then from which they branched with their personalized feedback. Accessible formats might open up possibilities for broader evaluation strategies to go beyond people responding in a singular format to a singular representation. For example, what if representation, or how people themselves conceive of representation, was evaluated in part on how people write image descriptions, or draw what comes to mind when they encounter AI-generated representations of any form? We are excited for future research to explore how multimodal evaluations rooted in accessible communication will inform techniques more broadly, and we posit that these approaches will become more relevant as mainstream products shift to multimodal generative AI offerings.

However, we want to emphasize that even when focusing on underrepresented communities such as people with disabilities, sample diversity is important since participants who had one disability were not experts in another. In fact, our participants who were in a specific group sometimes contradicted the opinions of those outside that group. Additionally, preferred representation sometimes perpetuated other forms of erasure, such as the participant who expressed that a walker rendering in an image that prompted for blind people signified they were incapable rather than signaling multiple disabilities. Conflicting perspectives brought out the importance of reflexivity throughout the process. It is not only relevant for researchers to be aware of their own position, but it is also important to recognize that the rich, nuanced lived experiences brought to model evaluations may also promote stereotypes or absences that must be accounted for.

6 Limitations

Our study had several limitations. First, our prompts were based on what our participants found interesting, but they did not provide even coverage on several important disability contexts such as showing people with multiple disabilities. Additionally, we scoped our recruiting efforts to maximize the diversity of types of disability and AT use in our sample. However, our participants were North-American-based, and our analysis was informed by a US-centric definition of disability identity and culture, which is not representative of disability globally [10, 44, 59, 70, 79]. Preferences of outputs for people in different countries and cultures will likely differ in how disability is represented, and how often it is inserted when not prompted for. Finally, we recognize that our results are not wholly reproducible because these models generate different outputs with each prompt iteration and the systems continue to evolve and change. However, that does not diminish the fact that 1) the issues discussed in this paper are actively being experienced today, and 2) these are tropes that will continue to be problematic, and future model developers should be aware of them.

7 Conclusion

T2I tools are proliferating in use right now, and prior research documents the biases they propagate with respect to different identity characteristics. This work presents a focus specifically on disability representation in T2I outputs. Through our focus groups with 25 disabled participants, we identified pervasive, problematic representations of people with disabilities that propagate broader societal misconceptions and biases about disability. Challenging tradeoffs within decisions about who to represent and how highlight the lack of neutrality in this process; we encourage generative AI practitioners to consider the values embedded in these tradeoffs and to move toward more diverse, positive representations of people with disabilities. Our work enumerates several of these tradeoffs to consider and concrete suggestions for improving current outputs to better meet community preferences. We also recommend that other researchers prioritize qualitative, community-based evaluations of emerging technologies to mitigate harm to minoritized communities.

Acknowledgments

We thank Georgina Kleege, Mara Mills, and Bess Williamson for their expertise on disability studies literature on disability representation, and Jaemarie Solyst and Aida Davani for giving us feedback on our study design.

A Representational Tropes Additional Images

In the figures below, we provide additional images for each representational trope explained in the tropes in the tables in Section 4.1.1. It is important to note that some representational tropes were problematic because of their prevalence in the dataset (e.g., overwhelming number of wheelchairs), while others were deeply problematic to even be shown once (e.g., disability and afterlife). Some tropes in this latter category only have 1-2 examples, but are still important to identify and avoid.

Figure 4:

Figure 5:

Figure 6:

Figure 7:

Figure 8:

Figure 9:

Figure 10:

Figure 11:

Figure 12:

Figure 13:

Figure 14:

Figure 15:

Figure 16:

Figure 17:

B Onboarding Video Transcript

Hello and thank you for taking the time to watch this video to help you prepare for our study about AI generated images. Let’s start with a bit about what AI generated images are. An AI generated image is an image that’s made by a machine learning model which takes in a text prompt and produces an image associated with that prompt. This isn’t like doing a web search for an image. The AI model is creating new images based on associations between images and words that it has seen before. People might use AI to generate images for several reasons, including to be creative and make art or to create stock images for presentations or websites. Here are a few examples on the slide of images generated with different AI image models. The first is “a cyborg dragon breathing fire, fantasy, landscape,” which shows a mechanical looking dragon in front of a grey city. The second is “a person giving a presentation, office, photo,” which shows a white man in a suit giving a presentation of a nature photo. The third is “an abstract painting of Artificial Intelligence,” which shows what looks like an oil painting of a robotic creature with one eye in its head.

Let’s dive into an example. Here I have an example of four images created by an AI image model the creator with the prompt "a cute cat playing with a ball of yarn, cartoon, colorful," and the AI system produced these four images of somewhat creepy looking cartoon cats with balls of yarn. Now part of the process of creating AI images is learning how different ways of phrasing prompts produces different images. In this example, the creator added these terms cartoon and colorful at the end of the prompts to indicate their preferred style. You don’t have to do that if you don’t want to, but you’re welcome to try it out. If the images don’t turn out exactly as you’d like, you can always try the same prompt again and you’ll get different images, or you can try tweaking the prompt a little and trying again. Here I have two sets of images for running the same prompts twice, and these images are different.

Let’s talk for a bit about what AI models are. millions of images with text captions or text descriptions are taken from the web and inputted into an AI model that uses these description image pairings to learn associations between words and images. So when a user inputs a text prompt into the model, the AI model analyzes the words you use and creates new images based on the associations it has learned from the millions of image-description pairs.

Last, I want to talk a bit about bias and AI. There’s a lot of discussion around the biases that can be built into these AI models. For example, consider the images below for the prompt "a person giving a presentation, office, photo". All of the people in these photos who are presenting appear to be white men, there may be some women in the audience, and I’m not sure if I see any people of color. This might indicate that the model has bias in thinking that women and people of color are not frequently office workers or at least not the people who give presentations and offices. In this study, we’re going to ask you to think about the bias that AI image models might have around disability or people with access needs. Thank you.

Footnotes

Occasionally a participant had a different disability due to availability and rescheduling.

The research team encountered other examples of such filtering mechanisms, which were triggered particularly when we prompted for people with mental health conditions and facial or skin differences (e.g. burn survivors).

A concept used in image generating and captioning communities referring to elements that are hard to show in images.

Supplemental Material

MP4 File - Video Presentation

Video Presentation

Transcript for: Video Presentation

References

[1]

2020. Crip Camp: A Disability Revolution.

DarkGray Prompt and T2I generated image	Participant quote	Instead, participants wanted
Prompt: "a person with a disability, photo"	“there is a wide spectrum of disabilities… just because most of the images that people seem to find are wheelchair users, that does not mean that there are not other disabilities out there that should be represented in these AI models [and] in these data sets.” P11	Increasing the diversity in the disabilities shown via ATs used, differences in appearance, or disability-specific behaviors or cultural elements (e.g., using ASL).

DarkGray Subtheme	Prompt and T2I generated image	Participant quote	Instead, participants wanted
Lack of diversity in AT: Depicting the same AT for a specific disability, like only signaling blindness with dark glasses	Prompt: "two people who are blind going on a date, photo"	“It’s like, can we get away from sunglasses, please?” P13	Showing a variety of factors that signal disability (e.g., for this image, P13 suggested: having them facing each other but not making eye contact, showing a guide dog or a white cane).

Errors in rendering AT: Depicting AT in ways that have errors or are not realistic.	Prompt: "a person who uses a hearing aid, photo"	“I don’t know if the computer generated them; they don’t look like real hearing aids. I haven’t seen them.” P70	Checking that ATs rendered by models have key features that matter to the users of those technologies.

Misuse of AT: Depicting an AT being used incorrectly.	Prompt: "a blind person, photo"	“That’s the weirdest description I’ve ever heard. Why can’t they just [show] simply "a blind person holding a cane in their hand" or something, like more normal?” P12 commenting on the image alt text	Showing accurate portrayals of how people who have the ATs use them in their lives.

Outdated AT: Depicting an old-fashioned version of a technology, or a design that looks outdated.	Prompt: "a person with a disability, photo"	“In terms of [AT], I don’t know that [the representation] has to be … precise. The most important thing is it shows a modernity to it, [that it’s] not finding images that are outdated, unattractive. It should just look positive … [something that] break down those stigmas,” P71	Rendering versions of ATs that are up to date with the technologies in the real world (e.g., showing updated versions of hearing aids).

Bionic AT: Depicting AT in hyper-mechanical, bionic ways.	Prompt: "a blind person with a cochlear implant at work, photo"	“That’s not really a good portrayal of a cochlear implant at all … It’s not a complex metal arrangement … Nowadays, they’re much smaller and more discreet.” P11	Rendering AT accurately and realistically, avoiding unnecessarily complex portrayals

Abstract

1 Introduction

2 Related Work

2.1 Disability Representation in Popular Culture

2.2 Generative Image Models and Bias

2.3 T2I System Evaluations

3 Methods

3.1 Pre-Study Work

3.2 Study Images

3.3 Focus Group

3.3.1 Study Accommodations.

3.4 Post-Study Survey

3.5 Analysis

3.6 Positionality

4 Results

4.1 Participant Reactions to Text-to-Image Model Outputs

4.1.1 Representational Tropes.

4.1.2 Positive reactions to images.

4.1.3 Exclusionary Filtering Mechanisms.

4.2 Reductive, Archetypal Representations and Their Impacts

4.3 Tensions within Prompts, Outputs, and Expectations

4.3.1 Prompt engineering for more diversity in images.

4.3.2 Tensions with static, visual representation.

4.3.3 Tensions around sticking to the prompt.

4.4 Participant Recommendations

4.4.1 Recommended Image Characteristics.

4.4.2 Recommended User Interface Changes.

4.4.3 Recommended System Development Changes.

5 Discussion

5.1 Towards Acceptable Disability Representation in T2I

5.1.1 Best Practices.

5.1.2 Open Challenges.

5.2 Scaffold T2I User Engagement

5.3 Qualitative Evaluations of Emerging Technologies

6 Limitations

7 Conclusion

Acknowledgments

A Representational Tropes Additional Images

B Onboarding Video Transcript

Footnotes

Supplemental Material

References

Cited By

Index Terms

Recommendations

"I wouldn’t say offensive but...": Disability-Centered Perspectives on Large Language Models

AI’s Regimes of Representation: A Community-centered Study of Text-to-Image Models in South Asia

Typology of Risks of Generative Text-to-Image Models

Comments

Information

Published In

Sponsors

Publisher

Publication History

Check for updates

Author Tags

Qualifiers

Funding Sources

Conference

Acceptance Rates

Upcoming Conference

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

View options

PDF

eReader

HTML Format

Login options

Full Access

Figures

Other

Share

Share this Publication link

Share on social media

Affiliations