DOI: 10.1145/1452392.1452426

Context-based recognition during human interactions: automatic feature selection and encoding dictionary

Published: 20 October 2008

Abstract

During face-to-face conversation, people use visual feedback such as head nods to communicate relevant information and to synchronize rhythm between participants. In this paper we describe how contextual information from other participants can be used to predict visual feedback and improve recognition of head gestures in human-human interactions. For example, in a dyadic interaction, contextual cues from the speaker, such as gaze shifts or changes in prosody, influence the listener's backchannel feedback (e.g., head nods). To learn automatically how to integrate this contextual information into the listener gesture recognition framework, this paper addresses two main challenges: representing features with an encoding dictionary and automatically selecting the best feature-encoding pairs. Multimodal integration between context and visual observations is performed using a discriminative sequential model (Latent-Dynamic Conditional Random Fields) trained on previous interactions. In our experiments involving 38 storytelling dyads, our context-based recognizer significantly improved head gesture recognition performance over a vision-only recognizer.
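To make the encoding-dictionary idea concrete, here is a minimal Python sketch. It assumes three simple temporal encodings (binary, step, and ramp) and a correlation-based score for ranking feature-encoding pairs; the cue names, window widths, and selection criterion are illustrative assumptions, not the paper's exact dictionary or learning procedure. In the full approach, the selected features would then feed the sequential (LDCRF) recognizer.

```python
# Hypothetical sketch of an encoding dictionary for contextual cues.
# Each cue (e.g., a speaker pause) can be encoded several ways; the
# learner keeps the (cue, encoding) pairs that best align with the
# listener-feedback labels. All names and encodings here are assumptions.
import numpy as np

def binary_encoding(events, length):
    """1 while the cue is active, 0 otherwise."""
    x = np.zeros(length)
    for start, end in events:
        x[start:end] = 1.0
    return x

def step_encoding(events, length, width=10):
    """1 for a fixed window after the cue ends (delayed influence)."""
    x = np.zeros(length)
    for _, end in events:
        x[end:end + width] = 1.0
    return x

def ramp_encoding(events, length, width=10):
    """Linearly decaying influence after the cue ends."""
    x = np.zeros(length)
    for _, end in events:
        for t in range(end, min(end + width, length)):
            x[t] = max(x[t], 1.0 - (t - end) / width)
    return x

ENCODING_DICTIONARY = {
    "binary": binary_encoding,
    "step": step_encoding,
    "ramp": ramp_encoding,
}

def rank_feature_encodings(cues, labels, length):
    """Score every (cue, encoding) pair by |correlation| with the labels.

    `cues` maps a cue name (e.g., "speaker_pause") to its (start, end)
    frame intervals; `labels` is the per-frame listener-feedback signal.
    """
    scores = {}
    for cue_name, events in cues.items():
        for enc_name, encode in ENCODING_DICTIONARY.items():
            x = encode(events, length)
            if x.std() == 0 or labels.std() == 0:
                corr = 0.0
            else:
                corr = float(np.corrcoef(x, labels)[0, 1])
            scores[(cue_name, enc_name)] = abs(corr)
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Toy usage: a speaker pause at frames 20-30; the listener nods shortly
# after, so the delayed (step/ramp) encodings should rank highest.
length = 100
cues = {"speaker_pause": [(20, 30)]}
labels = np.zeros(length)
labels[30:40] = 1.0  # listener head nod following the pause
for (cue, enc), score in rank_feature_encodings(cues, labels, length):
    print(f"{cue} + {enc}: {score:.2f}")
```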



Published In

ICMI '08: Proceedings of the 10th International Conference on Multimodal Interfaces
October 2008, 322 pages
ISBN: 9781605581989
DOI: 10.1145/1452392

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. contextual information
  2. head nod recognition
  3. human-human interaction
  4. visual gesture recognition

Qualifiers

  • Research-article

Conference

ICMI '08: International Conference on Multimodal Interfaces
October 20-22, 2008
Chania, Crete, Greece

Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions, 42%


Cited By

  • (2023) An Integrated Model for Automated Identification and Learning of Conversational Gestures in Human–Robot Interaction. In Cutting Edge Applications of Computational Intelligence Tools and Techniques, pages 33-61. DOI: 10.1007/978-3-031-44127-1_3
  • (2019) Understanding the Dynamics of Social Interactions. ACM Transactions on Multimedia Computing, Communications, and Applications, 15(1s):1-16. DOI: 10.1145/3300937
  • (2019) Situated Interaction. In The Handbook of Multimodal-Multisensor Interfaces, pages 105-143. DOI: 10.1145/3233795.3233800
  • (2019) Virtual Human Standardized Patients for Clinical Training. In Virtual Reality for Psychological and Neurocognitive Interventions, pages 387-405. DOI: 10.1007/978-1-4939-9482-3_17
  • (2018) Using Parallel Episodes of Speech to Represent and Identify Interaction Dynamics for Group Meetings. In Proceedings of the Group Interaction Frontiers in Technology, pages 1-7. DOI: 10.1145/3279981.3279983
  • (2018) Early Turn-Taking Prediction with Spiking Neural Networks for Human Robot Collaboration. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 3250-3256. DOI: 10.1109/ICRA.2018.8461208
  • (2018) A Meeting Log Structuring System Using Wearable Sensors. In Advances in Network-Based Information Systems, pages 841-852. DOI: 10.1007/978-3-319-98530-5_75
  • (2016) Predicting Performance of Collaborative Storytelling Using Multimodal Analysis. IEICE Transactions on Information and Systems, E99.D(6):1462-1473. DOI: 10.1587/transinf.2015CBP0003
  • (2016) Autonomous Virtual Human Agents for Healthcare Information Support and Clinical Interviewing. In Artificial Intelligence in Behavioral and Mental Health Care, pages 53-79. DOI: 10.1016/B978-0-12-420248-1.00003-9
  • (2016) Virtual Reality Standardized Patients for Clinical Training. In The Digital Patient, pages 255-272. DOI: 10.1002/9781118952788.ch18
