DOI: 10.1145/1866029.1866080
research-article

VizWiz: nearly real-time answers to visual questions

Published: 03 October 2010

Abstract

The lack of access to visual information like text labels, icons, and colors can cause frustration and decrease independence for blind people. Current access technology uses automatic approaches to address some problems in this space, but the technology is error-prone, limited in scope, and quite expensive. In this paper, we introduce VizWiz, a talking application for mobile phones that offers a new alternative to answering visual questions in nearly real time: asking multiple people on the web. To support answering questions quickly, we introduce quikTurkit, a general approach for intelligently recruiting human workers in advance so that workers are available when new questions arrive. A field deployment with 11 blind participants illustrates that blind people can effectively use VizWiz to cheaply answer questions in their everyday lives, highlighting issues that automatic approaches will need to address to be useful. Finally, we illustrate the potential of using VizWiz as part of the participatory design of advanced tools by using it to build and evaluate VizWiz::LocateIt, an interactive mobile tool that helps blind people solve general visual search problems.

Cited By

  • (2024) Revisiting the Key Components of Creativity Through Generative AI. Making Art With Generative AI Tools, 1-16. DOI: 10.4018/979-8-3693-1950-5.ch001. Online publication date: 19-Apr-2024
  • (2024) A Multi-Modal Foundation Model to Assist People with Blindness and Low Vision in Environmental Interaction. Journal of Imaging 10:5 (103). DOI: 10.3390/jimaging10050103. Online publication date: 26-Apr-2024
  • (2024) Integration of Smart Cane with Social Media: Design of a New Step Counter Algorithm for Cane. IoT 5:1 (168-186). DOI: 10.3390/iot5010009. Online publication date: 14-Mar-2024

    Information & Contributors

    Information

    Published In

    UIST '10: Proceedings of the 23rd annual ACM symposium on User interface software and technology
    October 2010
    476 pages
    ISBN:9781450302715
    DOI:10.1145/1866029

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. blind users
    2. non-visual interfaces
    3. real-time human computation

    Qualifiers

    • Research-article

    Conference

    UIST '10

    Acceptance Rates

    Overall Acceptance Rate 561 of 2,567 submissions, 22%

    Upcoming Conference

    UIST '25
    The 38th Annual ACM Symposium on User Interface Software and Technology
    September 28 - October 1, 2025
    Busan, Republic of Korea

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 602
    • Downloads (Last 6 weeks): 159
    Reflects downloads up to 10 Dec 2024

    Citations

    Cited By

    • (2024) Knocking on doors: The use of blogging sites by visually impaired people in the USA preliminary study. Convergence: The International Journal of Research into New Media Technologies. DOI: 10.1177/13548565241261963. Online publication date: 14-Jun-2024
    • (2024) "I Want to Publicize My Stutter": Community-led Collection and Curation of Chinese Stuttered Speech Data. Proceedings of the ACM on Human-Computer Interaction 8:CSCW2 (1-27). DOI: 10.1145/3687014. Online publication date: 8-Nov-2024
    • (2024) AI-Vision: A Three-Layer Accessible Image Exploration System for People with Visual Impairments in China. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8:3 (1-27). DOI: 10.1145/3678537. Online publication date: 9-Sep-2024
    • (2024) ImageExplorer Deployment: Understanding Text-Based and Touch-Based Image Exploration in the Wild. Proceedings of the 21st International Web for All Conference, 59-69. DOI: 10.1145/3677846.3677861. Online publication date: 13-May-2024
    • (2024) ChitChatGuide: Conversational Interaction Using Large Language Models for Assisting People with Visual Impairments to Explore a Shopping Mall. Proceedings of the ACM on Human-Computer Interaction 8:MHCI (1-25). DOI: 10.1145/3676492. Online publication date: 24-Sep-2024
    • (2024) I Don’t Want to Sound Rude, but It’s None of Their Business: Exploring Security and Privacy Concerns around Assistive Technology Use in Educational Settings. ACM Transactions on Accessible Computing 17:2 (1-30). DOI: 10.1145/3670690. Online publication date: 5-Jun-2024
    • (2024) IMAGE: An Open-Source, Extensible Framework for Deploying Accessible Audio and Haptic Renderings of Web Graphics. ACM Transactions on Accessible Computing 17:2 (1-17). DOI: 10.1145/3665223. Online publication date: 23-May-2024
