research-article

VizWiz: nearly real-time answers to visual questions

Authors:

Jeffrey P. Bigham,

Chandrika Jayant,

Robert C. Miller,

Aubrey Tatarowicz,

Tom Yeh

UIST '10: Proceedings of the 23rd annual ACM symposium on User interface software and technology

Pages 333–342

https://doi.org/10.1145/1866029.1866080

Published: 03 October 2010

Abstract

The lack of access to visual information like text labels, icons, and colors can cause frustration and decrease independence for blind people. Current access technology uses automatic approaches to address some problems in this space, but the technology is error-prone, limited in scope, and quite expensive. In this paper, we introduce VizWiz, a talking application for mobile phones that offers a new alternative for answering visual questions in nearly real time: asking multiple people on the web. To support answering questions quickly, we introduce quikTurkit, a general approach for intelligently recruiting human workers in advance so that workers are available when new questions arrive. A field deployment with 11 blind participants illustrates that blind people can effectively use VizWiz to cheaply answer questions in their everyday lives, highlighting issues that automatic approaches will need to address to be useful. Finally, we illustrate the potential of using VizWiz as part of the participatory design of advanced tools by using it to build and evaluate VizWiz::LocateIt, an interactive mobile tool that helps blind people solve general visual search problems.
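Why recruiting workers in advance helps can be illustrated with a toy latency model. The Python sketch below is hypothetical and is not quikTurkit's implementation. The names `WorkerModel`, `recruit_delay`, `answer_time`, and `lead_time` are ours, and so is the simplifying assumption that recruitment and answering each take a fixed number of seconds. Under that assumption, recruiting on demand pays the full recruitment delay on every question, while starting recruitment `lead_time` seconds early hides up to that much of the delay.

```python
from dataclasses import dataclass


# Hypothetical toy model; the parameters are illustrative, not from the paper.
@dataclass
class WorkerModel:
    recruit_delay: float  # seconds from posting a task until a worker shows up
    answer_time: float    # seconds a worker needs to answer one question


def on_demand_latency(model: WorkerModel) -> float:
    """Recruit only after the question arrives: pay recruitment plus answering."""
    return model.recruit_delay + model.answer_time


def pre_recruited_latency(model: WorkerModel, lead_time: float) -> float:
    """Start recruiting `lead_time` seconds before the question arrives.

    If the lead time covers the recruitment delay, a worker is already
    waiting, so only the answer time remains; otherwise the asker waits
    for the uncovered part of the recruitment delay as well.
    """
    remaining_wait = max(0.0, model.recruit_delay - lead_time)
    return remaining_wait + model.answer_time


if __name__ == "__main__":
    model = WorkerModel(recruit_delay=120.0, answer_time=30.0)
    print(on_demand_latency(model))             # 150.0
    print(pre_recruited_latency(model, 60.0))   # 90.0
    print(pre_recruited_latency(model, 300.0))  # 30.0
```

With these made-up values, recruiting a minute early cuts the modeled latency from 150 s to 90 s, and recruiting far enough ahead leaves only the answering time. quikTurkit's actual recruitment policy is described in the full paper and is not reproduced here.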

Index Terms

1. Human-centered computing
  1. Human computer interaction (HCI)

Information

Published In

ACM Conferences

UIST '10: Proceedings of the 23rd annual ACM symposium on User interface software and technology

October 2010

476 pages

ISBN:9781450302715

DOI:10.1145/1866029

General Chair: Ken Perlin, New York University

Program Chairs: Mary Czerwinski, Microsoft Research; Rob Miller, MIT CSAIL

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 October 2010

Qualifiers

Research-article

Conference

UIST '10

Sponsor:

UIST '10: The 23rd Annual ACM Symposium on User Interface Software and Technology

October 3 - 6, 2010

New York, New York, USA

Acceptance Rates

Overall Acceptance Rate 561 of 2,567 submissions, 22%

Bibliometrics

Total citations: 498
Total downloads: 5,429
Downloads (last 12 months): 602
Downloads (last 6 weeks): 159

Download figures reflect downloads up to 10 Dec 2024.

Cited By

Güzelci O, Karadag I (2024). Revisiting the Key Components of Creativity Through Generative AI. Making Art With Generative AI Tools, 1–16. Online publication date: 19-Apr-2024. https://doi.org/10.4018/979-8-3693-1950-5.ch001
Hao Y, Yang F, Huang H, Yuan S, Rangan S, Rizzo J, Wang Y, Fang Y (2024). A Multi-Modal Foundation Model to Assist People with Blindness and Low Vision in Environmental Interaction. Journal of Imaging 10(5):103. Online publication date: 26-Apr-2024. https://doi.org/10.3390/jimaging10050103
Messaoudi M, Menelas B, Mcheick H (2024). Integration of Smart Cane with Social Media: Design of a New Step Counter Algorithm for Cane. IoT 5(1):168–186. Online publication date: 14-Mar-2024. https://doi.org/10.3390/iot5010009
Emara I (2024). Knocking on doors: The use of blogging sites by visually impaired people in the USA preliminary study. Convergence: The International Journal of Research into New Media Technologies. Online publication date: 14-Jun-2024. https://doi.org/10.1177/13548565241261963
Li Q, Wu S (2024). "I Want to Publicize My Stutter": Community-led Collection and Curation of Chinese Stuttered Speech Data. Proceedings of the ACM on Human-Computer Interaction 8(CSCW2):1–27. Online publication date: 8-Nov-2024. https://dl.acm.org/doi/10.1145/3687014
Zhao K, Lai R, Guo B, Liu L, He L, Zhao Y (2024). AI-Vision: A Three-Layer Accessible Image Exploration System for People with Visual Impairments in China. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8(3):1–27. Online publication date: 9-Sep-2024. https://dl.acm.org/doi/10.1145/3678537
Xu A, Cai M, Hou D, Chang R, Guo A (2024). ImageExplorer Deployment: Understanding Text-Based and Touch-Based Image Exploration in the Wild. Proceedings of the 21st International Web for All Conference, 59–69. Online publication date: 13-May-2024. https://dl.acm.org/doi/10.1145/3677846.3677861
Kaniwa Y, Kuribayashi M, Kayukawa S, Sato D, Takagi H, Asakawa C, Morishima S (2024). ChitChatGuide: Conversational Interaction Using Large Language Models for Assisting People with Visual Impairments to Explore a Shopping Mall. Proceedings of the ACM on Human-Computer Interaction 8(MHCI):1–25. Online publication date: 24-Sep-2024. https://dl.acm.org/doi/10.1145/3676492
Marsh A, Milne L (2024). I Don’t Want to Sound Rude, but It’s None of Their Business: Exploring Security and Privacy Concerns around Assistive Technology Use in Educational Settings. ACM Transactions on Accessible Computing 17(2):1–30. Online publication date: 5-Jun-2024. https://dl.acm.org/doi/10.1145/3670690
Regimbal J, Blum J, Kuo C, Cooperstock J (2024). IMAGE: An Open-Source, Extensible Framework for Deploying Accessible Audio and Haptic Renderings of Web Graphics. ACM Transactions on Accessible Computing 17(2):1–17. Online publication date: 23-May-2024. https://dl.acm.org/doi/10.1145/3665223
