Research Article
DOI: 10.1145/3279972.3279976

Multimodal Reference Resolution In Collaborative Assembly Tasks

Published: 16 October 2018

Abstract

Humans use verbal and non-verbal cues to communicate their intent in collaborative tasks. In situated dialogue, speakers typically direct their interlocutor's attention to referent objects using multimodal cues, and references to such entities are resolved collaboratively. In this study we designed a multiparty task in which humans teach each other how to assemble furniture, and captured eye gaze, speech and pointing gestures. We analysed which multimodal cues carry the most information for resolving referring expressions, and report on an object saliency classifier that, using multisensory input from speaker and addressee, detects the referent objects during the collaborative task.
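The abstract does not describe how the object saliency classifier is implemented. The sketch below is only a hypothetical illustration of the kind of pipeline it summarises: per-object cue features derived from speaker and addressee gaze, pointing and speech are fused by an off-the-shelf classifier into a saliency score for each candidate object, and the highest-scoring object is taken as the referent. The feature names, the synthetic values and the choice of a random forest are assumptions made for illustration, not the authors' method.

```python
# Hypothetical sketch only: the paper's actual features and model are not
# specified in the abstract. Shown: per-object multimodal cue features
# (speaker/addressee gaze, pointing, verbal match) fused by a standard
# classifier into a saliency score for each candidate referent object.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# One row per (utterance, candidate object); columns are assumed cue features:
# [speaker_gaze_dwell, addressee_gaze_dwell, pointing_alignment, verbal_match]
X_train = np.array([
    [0.8, 0.6, 0.9, 1.0],   # object that was the true referent
    [0.1, 0.2, 0.0, 0.0],   # distractor object
    [0.7, 0.5, 0.8, 0.0],
    [0.2, 0.1, 0.1, 1.0],
])
y_train = np.array([1, 0, 1, 0])  # 1 = referent, 0 = not the referent

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# At resolution time, score every candidate object and pick the most salient.
candidates = np.array([
    [0.9, 0.7, 0.8, 1.0],
    [0.2, 0.3, 0.1, 0.0],
])
saliency = clf.predict_proba(candidates)[:, 1]
referent_index = int(np.argmax(saliency))
print(f"predicted referent: object {referent_index} "
      f"(saliency={saliency[referent_index]:.2f})")
```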




Published In

MA3HMI'18: Proceedings of the 4th International Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction
October 2018
50 pages
ISBN:9781450360760
DOI:10.1145/3279972

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. grounding
  2. human-robot interaction
  3. referential eye gaze

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICMI '18

Cited By

  • (2024) Uncovering and Addressing Blink-Related Challenges in Using Eye Tracking for Interactive Systems. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3613904.3642086, pp. 1-23. Online publication date: 11-May-2024.
  • (2023) The Role of Multimodal Data for Modeling Communication in Artificial Social Agents. Handbook of Human-Machine Systems. https://doi.org/10.1002/9781119863663.ch8, pp. 83-93. Online publication date: 7-Jul-2023.
  • (2022) Integrating Gaze and Speech for Enabling Implicit Interactions. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3491102.3502134, pp. 1-14. Online publication date: 29-Apr-2022.
  • (2019) Estimating Uncertainty in Task-Oriented Dialogue. 2019 International Conference on Multimodal Interaction. https://doi.org/10.1145/3340555.3353722, pp. 414-418. Online publication date: 14-Oct-2019.
  • (2019) The Effects of Anthropomorphism and Non-verbal Social Behaviour in Virtual Assistants. Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents. https://doi.org/10.1145/3308532.3329466, pp. 133-140. Online publication date: 1-Jul-2019.
  • (2019) Exploring Temporal Dependencies in Multimodal Referring Expressions with Mixed Reality. Virtual, Augmented and Mixed Reality. Applications and Case Studies. https://doi.org/10.1007/978-3-030-21565-1_8, pp. 108-123. Online publication date: 8-Jun-2019.
