research-article

Open access

EditScribe: Non-Visual Image Editing with Natural Language Verification Loops

Authors:

Ruei-Che Chang,

Anhong Guo

ASSETS '24: Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility

Article No.: 65, Pages 1 - 19

https://doi.org/10.1145/3663548.3675599

Published: 27 October 2024

Abstract

Image editing is an iterative process that requires precise visual evaluation and manipulation for the output to match the editing intent. However, current image editing tools provide neither accessible interaction nor sufficient feedback for blind and low-vision individuals to achieve this level of control. To address this, we developed EditScribe, a prototype system that makes object-level image editing actions accessible using natural language verification loops powered by large multimodal models. Using EditScribe, the user first comprehends the image content through initial general and object descriptions, then specifies edit actions using open-ended natural language prompts. EditScribe performs the image edit and provides four types of verification feedback for the user to verify the performed edit: a summary of visual changes, an AI judgement, and updated general and object descriptions. The user can ask follow-up questions to clarify and probe into the edits or verification feedback before performing another edit. In a study with ten blind or low-vision users, we found that EditScribe supported participants in performing and verifying image edit actions non-visually. We observed different prompting strategies among participants, as well as their perceptions of the various types of verification feedback. Finally, we discuss the implications of leveraging natural language verification loops to make visual authoring non-visually accessible.
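The verification loop described in the abstract can be illustrated with a small toy sketch. This is not EditScribe's implementation: the image is modelled as a plain dictionary of object names and appearances, and every name here (`describe`, `apply_edit`, `verify`, `run_edit`, `VerificationFeedback`, `EditSession`) is hypothetical. The rule-based functions stand in for the large multimodal models the paper relies on. The sketch only shows the shape of the loop: describe, edit from a prompt, then return the four kinds of feedback.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Image = Dict[str, str]  # toy stand-in: object name -> appearance


@dataclass
class VerificationFeedback:
    """The four feedback types named in the abstract (field names are assumptions)."""
    summary_of_changes: str
    ai_judgement: str
    general_description: str
    object_descriptions: Dict[str, str]


@dataclass
class EditSession:
    image: Image
    history: List[VerificationFeedback] = field(default_factory=list)


def describe(image: Image) -> Tuple[str, Dict[str, str]]:
    """Return a general description and per-object descriptions."""
    if image:
        general = "An image containing: " + ", ".join(sorted(image)) + "."
    else:
        general = "An empty image."
    return general, dict(image)


def apply_edit(image: Image, prompt: str) -> Image:
    """Toy prompt handling: 'remove X' or 'recolor X to Y'. A real system would call a model."""
    words = prompt.lower().split()
    edited = dict(image)
    if len(words) == 2 and words[0] == "remove":
        edited.pop(words[1], None)
    elif len(words) == 4 and words[0] == "recolor" and words[2] == "to":
        if words[1] in edited:
            edited[words[1]] = words[3]
    return edited


def verify(before: Image, after: Image) -> VerificationFeedback:
    """Compare the image before and after the edit and build the feedback."""
    removed = sorted(set(before) - set(after))
    changed = sorted(k for k in before if k in after and before[k] != after[k])
    parts = [f"{name} was removed" for name in removed]
    parts += [f"{name} changed from {before[name]} to {after[name]}" for name in changed]
    summary = "; ".join(parts) if parts else "no visible change"
    if parts:
        judgement = "The edit appears to have been applied."
    else:
        judgement = "The edit does not appear to have been applied."
    general, objects = describe(after)
    return VerificationFeedback(summary, judgement, general, objects)


def run_edit(session: EditSession, prompt: str) -> VerificationFeedback:
    """One pass of the loop: perform the edit, then report feedback to the user."""
    before = session.image
    after = apply_edit(before, prompt)
    feedback = verify(before, after)
    session.image = after
    session.history.append(feedback)
    return feedback


if __name__ == "__main__":
    session = EditSession(image={"dog": "brown", "ball": "red"})
    print(describe(session.image)[0])
    for prompt in ["remove ball", "recolor dog to black", "remove cat"]:
        fb = run_edit(session, prompt)
        print(prompt, "->", fb.summary_of_changes, "|", fb.ai_judgement)
```

In this sketch the user can inspect each `VerificationFeedback` before issuing the next prompt, which mirrors the iterative edit-then-verify cycle; follow-up questions would be an additional query against the stored descriptions.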

Cited By

Chang R, Liu Y, Guo A. (2024). WorldScribe: Towards Context-Aware Live Visual Descriptions. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, 1-18. DOI: 10.1145/3654777.3676375. Online publication date: 13-Oct-2024.
https://dl.acm.org/doi/10.1145/3654777.3676375

Index Terms

EditScribe: Non-Visual Image Editing with Natural Language Verification Loops
1. Human-centered computing
  1. Accessibility
    1. Accessibility technologies
  2. Human computer interaction (HCI)

Published In

ASSETS '24: Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility

October 2024

1475 pages

ISBN: 9798400706776

DOI: 10.1145/3663548

Editors: David Flatla (University of Guelph, Canada), Faustina Hwang (University of Reading, United Kingdom), Tiago Guerreiro (University of Lisbon, Portugal), Robin Brewer (University of Michigan, United States)

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGACCESS: ACM Special Interest Group on Accessible Computing

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ASSETS '24: The 26th International ACM SIGACCESS Conference on Computers and Accessibility

Sponsor: SIGACCESS

October 27 - 30, 2024

St. John's, NL, Canada

Acceptance Rates

Overall Acceptance Rate 436 of 1,556 submissions, 28%
