
Prediction of Who Will Be the Next Speaker and When Using Gaze Behavior in Multiparty Meetings

Published: 05 May 2016

Abstract

In multiparty meetings, participants need to predict the end of the speaker’s utterance and who will start speaking next, as well as consider a strategy for good timing to speak next. Gaze behavior plays an important role in smooth turn-changing. This article proposes a prediction model that features three processing steps to predict (I) whether turn-changing or turn-keeping will occur, (II) who will be the next speaker in turn-changing, and (III) the timing of the start of the next speaker’s utterance. For the feature values of the model, we focused on gaze transition patterns and the timing structure of eye contact between a speaker and a listener near the end of the speaker’s utterance. Gaze transition patterns provide information about the order in which gaze behavior changes. The timing structure of eye contact is defined as who looks at whom and who looks away first, the speaker or listener, when eye contact between the speaker and a listener occurs. We collected corpus data of multiparty meetings, using the data to demonstrate relationships between gaze transition patterns and timing structure and situations (I), (II), and (III). The results of our analyses indicate that the gaze transition pattern of the speaker and listener and the timing structure of eye contact have a strong association with turn-changing, the next speaker in turn-changing, and the start time of the next utterance. On the basis of the results, we constructed prediction models using the gaze transition patterns and timing structure. The gaze transition patterns were found to be useful in predicting turn-changing, the next speaker in turn-changing, and the start time of the next utterance. Contrary to expectations, we did not find that the timing structure is useful for predicting the next speaker and the start time. This study opens up new possibilities for predicting the next speaker and the timing of the next utterance using gaze transition patterns in multiparty meetings.
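
To make the idea of a gaze transition pattern concrete, the following is a minimal sketch of how the order of gaze changes near the end of an utterance could be encoded as a feature. It is not the authors' implementation: the label scheme (`"L1"`, `"L2"` for listeners, `"X"` for a non-person target), the sampled-sequence input, and both function names are assumptions made for illustration only.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def gaze_transition_pattern(samples: Sequence[str]) -> Tuple[str, ...]:
    """Collapse a sampled gaze-target sequence into the ordered sequence of
    distinct targets, dropping consecutive repeats.

    `samples` holds one hypothetical gaze-target label per time step.
    """
    return tuple(label for label, _ in groupby(samples))


def transition_ngrams(pattern: Sequence[str], n: int = 2) -> List[Tuple[str, ...]]:
    """Return all length-n windows of a transition pattern, in order."""
    return [tuple(pattern[i:i + n]) for i in range(len(pattern) - n + 1)]


# Hypothetical speaker gaze near the end of an utterance:
# looks at listener 1, then away, then at listener 2.
speaker_samples = ["L1", "L1", "X", "X", "L2", "L2"]
pattern = gaze_transition_pattern(speaker_samples)
bigrams = transition_ngrams(pattern, n=2)
print(pattern)  # ('L1', 'X', 'L2')
print(bigrams)  # [('L1', 'X'), ('X', 'L2')]
```

Patterns or n-grams of this kind could then serve as categorical inputs to the three prediction steps (I), (II), and (III); how the article itself encodes and windows the gaze data is described in the full text, not in this abstract.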

      Published In

      ACM Transactions on Interactive Intelligent Systems  Volume 6, Issue 1
      Special Issue on New Directions in Eye Gaze for Interactive Intelligent Systems (Part 2 of 2), Regular Articles and Special Issue on Highlights of IUI 2015 (Part 1 of 2)
      May 2016
      219 pages
      ISSN:2160-6455
      EISSN:2160-6463
      DOI:10.1145/2896319
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 05 May 2016
      Accepted: 01 February 2016
      Revised: 01 January 2016
      Received: 01 December 2014
      Published in TIIS Volume 6, Issue 1

      Author Tags

      1. Turn-changing
      2. gaze behavior
      3. multiparty meetings
      4. next speaker prediction
      5. speech timing prediction

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Cited By

      • (2024) A Computational Study on Sentence-based Next Speaker Prediction in Multiparty Conversations. Proceedings of the 24th ACM International Conference on Intelligent Virtual Agents, 1-4. DOI: 10.1145/3652988.3673915. Online publication date: 16-Sep-2024
      • (2024) Is It Possible to Recognize a Speaker Without Listening? Unraveling Conversation Dynamics in Multi-Party Interactions Using Continuous Eye Gaze. IEEE Robotics and Automation Letters 9:11, 9923-9929. DOI: 10.1109/LRA.2024.3440844. Online publication date: Nov-2024
      • (2024) 3M-Transformer: A Multi-Stage Multi-Stream Multimodal Transformer for Embodied Turn-Taking Prediction. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 8050-8054. DOI: 10.1109/ICASSP48485.2024.10448136. Online publication date: 14-Apr-2024
      • (2024) Good Looking: How Gaze Patterns affect Users’ Perceptions of an Interactive Social Robot. 2024 IEEE International Conference on Advanced Robotics and Its Social Impacts (ARSO), 128-133. DOI: 10.1109/ARSO60199.2024.10557795. Online publication date: 20-May-2024
      • (2024) Quantitative Observation to Explore the Turn-Changing Mechanisms of Conversations in Remote Meetings Accompanying Supplemental Materials. Collaboration Technologies and Social Computing, 161-176. DOI: 10.1007/978-3-031-67998-8_11. Online publication date: 20-Aug-2024
      • (2023) Video-based Respiratory Waveform Estimation in Dialogue: A Novel Task and Dataset for Human-Machine Interaction. Proceedings of the 25th International Conference on Multimodal Interaction, 649-660. DOI: 10.1145/3577190.3614154. Online publication date: 9-Oct-2023
      • (2023) Who's next? Proceedings of the 23rd ACM International Conference on Intelligent Virtual Agents, 1-8. DOI: 10.1145/3570945.3607312. Online publication date: 19-Sep-2023
      • (2023) CUDA-GHR: Controllable Unsupervised Domain Adaptation for Gaze and Head Redirection. 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 467-477. DOI: 10.1109/WACV56688.2023.00054. Online publication date: Jan-2023
      • (2023) To Whom are You Talking? A Deep Learning Model to Endow Social Robots with Addressee Estimation Skills. 2023 International Joint Conference on Neural Networks (IJCNN), 1-10. DOI: 10.1109/IJCNN54540.2023.10191452. Online publication date: 18-Jun-2023
      • (2022) Trimodal prediction of speaking and listening willingness to help improve turn-changing modeling. Frontiers in Psychology 13. DOI: 10.3389/fpsyg.2022.774547. Online publication date: 18-Oct-2022
