
GAVIN: Gaze-Assisted Voice-Based Implicit Note-taking

Published: 11 August 2021

Abstract

Annotation is an effective reading strategy that people often employ while interacting with digital text. It involves highlighting passages and making notes about them. Annotating while reading on a desktop is straightforward, but in mobile settings, where people read on hand-held devices, highlighting and typing notes on a small display is challenging. In this article, we introduce GAVIN, a gaze-assisted voice note-taking application that enables readers to seamlessly take voice notes on digital documents by implicitly anchoring them to text passages. We first conducted a contextual enquiry focusing on participants’ note-taking practices on digital documents. Building on these findings, we propose a method that leverages eye tracking and machine learning to annotate voice notes with their reference text passages. To evaluate our approach, we recruited 32 participants to perform voice note-taking. We then trained a classifier on the collected data to predict the text passage to which each voice note refers. Lastly, we employed the classifier to build GAVIN and conducted a user study to demonstrate the feasibility of the system. This research shows that gaze can serve as a resource for implicitly anchoring voice notes, enabling the design of systems that allow users to record voice notes with minimal effort and high accuracy.
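
To make the anchoring step concrete, below is a minimal Python sketch of how a gaze-to-passage classifier of this kind could be assembled. The per-paragraph dwell features, the synthetic training data, and the choice of a Random Forest are illustrative assumptions for exposition only, not the pipeline reported in the article.

# Hypothetical sketch of gaze-based anchoring: summarise the fixations recorded
# just before a voice note as per-paragraph dwell statistics, then train a
# classifier to predict which paragraph the note refers to. Feature design,
# window length, and model choice are assumptions, not the authors' method.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
N_PARAGRAPHS = 8  # assumed document layout for this toy example

def gaze_features(fixations):
    """fixations: array of (paragraph_index, duration_ms) rows."""
    feats = np.zeros(2 * N_PARAGRAPHS)
    for para, dur in fixations:
        feats[int(para)] += dur               # total dwell time per paragraph
        feats[N_PARAGRAPHS + int(para)] += 1  # fixation count per paragraph
    return feats

# Synthetic stand-in data: 200 voice notes, each labelled with the paragraph
# it refers to (in the study, such labels came from participants).
X, y = [], []
for _ in range(200):
    target = int(rng.integers(0, N_PARAGRAPHS))
    # Readers tend to fixate the passage they are about to comment on.
    paras = np.clip(rng.normal(target, 1.0, size=15).round(), 0, N_PARAGRAPHS - 1)
    durations = rng.integers(80, 400, size=15)
    X.append(gaze_features(np.column_stack([paras, durations])))
    y.append(target)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print("Cross-validated accuracy:", cross_val_score(clf, np.array(X), y, cv=5).mean())

In practice, the features would be computed from real fixation logs time-aligned with each recorded voice note rather than from synthetic data.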




Published In

ACM Transactions on Computer-Human Interaction, Volume 28, Issue 4
August 2021
297 pages
ISSN: 1073-0516
EISSN: 1557-7325
DOI: 10.1145/3477419
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 August 2021
Accepted: 01 March 2021
Revised: 01 December 2020
Received: 01 March 2020
Published in TOCHI Volume 28, Issue 4


Author Tags

  1. Implicit annotation
  2. eye tracking
  3. machine learning
  4. voice notes

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Australian Research Council Discovery Early Career Researcher

Bibliometrics & Citations

Article Metrics

  • Downloads (Last 12 months): 96
  • Downloads (Last 6 weeks): 9
Reflects downloads up to 09 Jan 2025

Cited By
  • (2024) The Analysis of Tourism Attitudes using Natural Language Processing Techniques: A Case of Malaysian Tourists. Asian Health, Science and Technology Reports 2:3 (57-78). DOI: 10.69650/ahstr.2024.11523. Online publication date: 18-Sep-2024.
  • (2023) An End-to-End Review of Gaze Estimation and its Interactive Applications on Handheld Mobile Devices. ACM Computing Surveys 56:2 (1-38). DOI: 10.1145/3606947. Online publication date: 15-Sep-2023.
  • (2023) DynamicRead: Exploring Robust Gaze Interaction Methods for Reading on Handheld Mobile Devices under Dynamic Conditions. Proceedings of the ACM on Human-Computer Interaction 7:ETRA (1-17). DOI: 10.1145/3591127. Online publication date: 18-May-2023.
  • (2023) Evaluating User Interactions in Wearable Extended Reality: Modeling, Online Remote Survey, and In-Lab Experimental Methods. IEEE Access 11 (77856-77872). DOI: 10.1109/ACCESS.2023.3298598. Online publication date: 2023.
  • (2022) Designing Mobile MR Workspaces. Proceedings of the ACM on Human-Computer Interaction 6:MHCI (1-17). DOI: 10.1145/3546716. Online publication date: 20-Sep-2022.
  • (2022) Integrating Gaze and Speech for Enabling Implicit Interactions. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (1-14). DOI: 10.1145/3491102.3502134. Online publication date: 29-Apr-2022.
