Research Article
DOI: 10.1145/3279972.3279976

Multimodal Reference Resolution In Collaborative Assembly Tasks

Published: 16 October 2018

Abstract

Humans use verbal and non-verbal cues to communicate their intent in collaborative tasks. In situated dialogue, speakers typically direct their interlocutor's attention to referent objects using multimodal cues, and references to such entities are resolved collaboratively. In this study we designed a multiparty task in which humans teach each other how to assemble furniture, and captured eye gaze, speech and pointing gestures. We analysed which multimodal cues carry the most information for resolving referring expressions, and report on an object saliency classifier that, using multisensory input from speaker and addressee, detects the referent objects during the collaborative task.
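The abstract does not describe how the object saliency classifier is implemented. The sketch below is only a hypothetical illustration of the kind of pipeline it summarises: per-object cue features derived from speaker and addressee gaze, pointing and speech are fused by an off-the-shelf classifier into a saliency score for each candidate object, and the highest-scoring object is taken as the referent. The feature names, the synthetic values and the choice of a random forest are assumptions made for illustration, not the authors' method.

```python
# Hypothetical sketch only: the paper's actual features and model are not
# specified in the abstract. Shown: per-object multimodal cue features
# (speaker/addressee gaze, pointing, verbal match) fused by a standard
# classifier into a saliency score for each candidate referent object.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# One row per (utterance, candidate object); columns are assumed cue features:
# [speaker_gaze_dwell, addressee_gaze_dwell, pointing_alignment, verbal_match]
X_train = np.array([
    [0.8, 0.6, 0.9, 1.0],   # object that was the true referent
    [0.1, 0.2, 0.0, 0.0],   # distractor object
    [0.7, 0.5, 0.8, 0.0],
    [0.2, 0.1, 0.1, 1.0],
])
y_train = np.array([1, 0, 1, 0])  # 1 = referent, 0 = not the referent

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# At resolution time, score every candidate object and pick the most salient.
candidates = np.array([
    [0.9, 0.7, 0.8, 1.0],
    [0.2, 0.3, 0.1, 0.0],
])
saliency = clf.predict_proba(candidates)[:, 1]
referent_index = int(np.argmax(saliency))
print(f"predicted referent: object {referent_index} "
      f"(saliency={saliency[referent_index]:.2f})")
```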




Published In

MA3HMI'18: Proceedings of the 4th International Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction
October 2018
50 pages
ISBN:9781450360760
DOI:10.1145/3279972

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. grounding
  2. human-robot interaction
  3. referential eye gaze

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICMI '18

Cited By

  • (2024) Uncovering and Addressing Blink-Related Challenges in Using Eye Tracking for Interactive Systems. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3613904.3642086, pp. 1-23. Online publication date: 11-May-2024.
  • (2023) The Role of Multimodal Data for Modeling Communication in Artificial Social Agents. Handbook of Human-Machine Systems. https://doi.org/10.1002/9781119863663.ch8, pp. 83-93. Online publication date: 7-Jul-2023.
  • (2022) Integrating Gaze and Speech for Enabling Implicit Interactions. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3491102.3502134, pp. 1-14. Online publication date: 29-Apr-2022.
  • (2019) Estimating Uncertainty in Task-Oriented Dialogue. 2019 International Conference on Multimodal Interaction. https://doi.org/10.1145/3340555.3353722, pp. 414-418. Online publication date: 14-Oct-2019.
  • (2019) The Effects of Anthropomorphism and Non-verbal Social Behaviour in Virtual Assistants. Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents. https://doi.org/10.1145/3308532.3329466, pp. 133-140. Online publication date: 1-Jul-2019.
  • (2019) Exploring Temporal Dependencies in Multimodal Referring Expressions with Mixed Reality. Virtual, Augmented and Mixed Reality. Applications and Case Studies. https://doi.org/10.1007/978-3-030-21565-1_8, pp. 108-123. Online publication date: 8-Jun-2019.
