
GAVIN: Gaze-Assisted Voice-Based Implicit Note-taking

Published: 11 August 2021

Abstract

Annotation is an effective reading strategy that people often employ while interacting with digital text. It involves highlighting passages and making notes about them. Annotating while reading on a desktop is straightforward, but in mobile settings, where people read on hand-held devices, highlighting and typing notes on a small display is challenging. In this article, we introduce GAVIN, a gaze-assisted voice note-taking application that enables readers to seamlessly take voice notes on digital documents by implicitly anchoring them to text passages. We first conducted a contextual enquiry focusing on participants’ note-taking practices on digital documents. Building on these findings, we propose a method that leverages eye tracking and machine learning to annotate voice notes with their reference text passages. To evaluate our approach, we recruited 32 participants to perform voice note-taking. We then trained a classifier on the collected data to predict the text passage to which each voice note refers. Lastly, we employed the classifier to build GAVIN and conducted a user study to demonstrate the feasibility of the system. This research shows that gaze can serve as a resource for implicitly anchoring voice notes, enabling the design of systems that allow users to record voice notes with minimal effort and high accuracy.
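
To make the anchoring step concrete, below is a minimal Python sketch of how a gaze-to-passage classifier of this kind could be assembled. The per-paragraph dwell features, the synthetic training data, and the choice of a Random Forest are illustrative assumptions for exposition only, not the pipeline reported in the article.

# Hypothetical sketch of gaze-based anchoring: summarise the fixations recorded
# just before a voice note as per-paragraph dwell statistics, then train a
# classifier to predict which paragraph the note refers to. Feature design,
# window length, and model choice are assumptions, not the authors' method.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
N_PARAGRAPHS = 8  # assumed document layout for this toy example

def gaze_features(fixations):
    """fixations: array of (paragraph_index, duration_ms) rows."""
    feats = np.zeros(2 * N_PARAGRAPHS)
    for para, dur in fixations:
        feats[int(para)] += dur               # total dwell time per paragraph
        feats[N_PARAGRAPHS + int(para)] += 1  # fixation count per paragraph
    return feats

# Synthetic stand-in data: 200 voice notes, each labelled with the paragraph
# it refers to (in the study, such labels came from participants).
X, y = [], []
for _ in range(200):
    target = int(rng.integers(0, N_PARAGRAPHS))
    # Readers tend to fixate the passage they are about to comment on.
    paras = np.clip(rng.normal(target, 1.0, size=15).round(), 0, N_PARAGRAPHS - 1)
    durations = rng.integers(80, 400, size=15)
    X.append(gaze_features(np.column_stack([paras, durations])))
    y.append(target)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print("Cross-validated accuracy:", cross_val_score(clf, np.array(X), y, cv=5).mean())

In practice, the features would be computed from real fixation logs time-aligned with each recorded voice note rather than from synthetic data.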




Published In

ACM Transactions on Computer-Human Interaction, Volume 28, Issue 4
August 2021
297 pages
ISSN: 1073-0516
EISSN: 1557-7325
DOI: 10.1145/3477419
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 August 2021
Accepted: 01 March 2021
Revised: 01 December 2020
Received: 01 March 2020
Published in TOCHI Volume 28, Issue 4


Author Tags

  1. Implicit annotation
  2. eye tracking
  3. machine learning
  4. voice notes

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Australian Research Council Discovery Early Career Researcher

Bibliometrics & Citations

Article Metrics

  • Downloads (Last 12 months): 96
  • Downloads (Last 6 weeks): 9
Reflects downloads up to 09 Jan 2025

Cited By
  • (2024) The Analysis of Tourism Attitudes using Natural Language Processing Techniques: A Case of Malaysian Tourists. Asian Health, Science and Technology Reports 2:3 (57-78). DOI: 10.69650/ahstr.2024.11523. Online publication date: 18-Sep-2024.
  • (2023) An End-to-End Review of Gaze Estimation and its Interactive Applications on Handheld Mobile Devices. ACM Computing Surveys 56:2 (1-38). DOI: 10.1145/3606947. Online publication date: 15-Sep-2023.
  • (2023) DynamicRead: Exploring Robust Gaze Interaction Methods for Reading on Handheld Mobile Devices under Dynamic Conditions. Proceedings of the ACM on Human-Computer Interaction 7:ETRA (1-17). DOI: 10.1145/3591127. Online publication date: 18-May-2023.
  • (2023) Evaluating User Interactions in Wearable Extended Reality: Modeling, Online Remote Survey, and In-Lab Experimental Methods. IEEE Access 11 (77856-77872). DOI: 10.1109/ACCESS.2023.3298598. Online publication date: 2023.
  • (2022) Designing Mobile MR Workspaces. Proceedings of the ACM on Human-Computer Interaction 6:MHCI (1-17). DOI: 10.1145/3546716. Online publication date: 20-Sep-2022.
  • (2022) Integrating Gaze and Speech for Enabling Implicit Interactions. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (1-14). DOI: 10.1145/3491102.3502134. Online publication date: 29-Apr-2022.
