DOI: 10.1145/3570945.3607298

Prediction of Various Backchannel Utterances Based on Multimodal Information

Published: 22 December 2023

Abstract

Listener backchannels are an important part of dialogue: with appropriate backchannels, participants can keep a conversation moving smoothly. Backchannels are therefore considered important not only in dialogue between humans but also in dialogue between humans and agents. Progress has been made on dialogue agents that conduct natural, affable dialogue. However, it has not yet been clarified whether the various types of listener backchannel can be predicted from the speaker's multimodal information. In this paper, we attempt to predict the type of a listener's backchannel from the speaker's multimodal information in dialogue. First, we construct a dialogue corpus consisting of multimodal information on speakers' utterances and the corresponding listeners' backchannels. Second, we construct machine learning models that predict the listener's backchannel type from the speaker's multimodal information. Our results suggest that our model can predict various listener backchannel types from the speaker's multimodal information.
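
As a rough illustration of the task setup described in the abstract (speaker-side multimodal features in, listener backchannel type out), the following minimal sketch may help. It is not the authors' model: the abstract does not specify the architecture, features, or label set, so the modality names, the toy feature values, the backchannel labels, and the nearest-centroid classifier are all assumptions chosen only to keep the example small and runnable.

```python
from collections import defaultdict
from math import dist

# Hypothetical modality names; the abstract only says "multimodal information".
MODALITIES = ("audio", "visual", "text")


def fuse(features):
    """Early fusion: concatenate per-modality feature vectors in a fixed order."""
    fused = []
    for name in MODALITIES:
        fused.extend(features[name])
    return fused


def fit_centroids(samples):
    """Average the fused vectors of each backchannel type (nearest-centroid training)."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(fuse(features))
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, features):
    """Return the backchannel type whose centroid is closest to the fused input."""
    fused = fuse(features)
    return min(centroids, key=lambda label: dist(centroids[label], fused))


# Toy, made-up data: (speaker features, listener backchannel type).
# The two labels are illustrative placeholders, not the paper's categories.
train = [
    ({"audio": [0.9, 0.1], "visual": [0.2], "text": [1.0]}, "acknowledgement"),
    ({"audio": [0.8, 0.2], "visual": [0.1], "text": [0.9]}, "acknowledgement"),
    ({"audio": [0.1, 0.9], "visual": [0.8], "text": [0.1]}, "surprise"),
    ({"audio": [0.2, 0.8], "visual": [0.9], "text": [0.0]}, "surprise"),
]
centroids = fit_centroids(train)
```

Calling `predict(centroids, {...})` with a new speaker feature dictionary returns one of the training labels. A real system would presumably replace both the hand-written feature values and the centroid classifier with learned representations and a trained model.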

Published In

IVA '23: Proceedings of the 23rd ACM International Conference on Intelligent Virtual Agents
September 2023
376 pages
ISBN: 9781450399944
DOI: 10.1145/3570945
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. backchannel
  2. communication
  3. multimodal interaction

Qualifiers

  • Extended-abstract
  • Research
  • Refereed limited

Conference

IVA '23
Acceptance Rates

Overall Acceptance Rate 53 of 196 submissions, 27%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 69
  • Downloads (last 12 months): 69
  • Downloads (last 6 weeks): 1

Reflects downloads up to 26 Dec 2024.
