Abstract
During an interaction, interactants exchange speaking turns. These exchanges can occur smoothly or through interruptions. Listeners can display backchannels, send signals to grab the speaking turn, wait for the speaker to yield the turn, or even interrupt and grab the speaking turn. Interruptions are very frequent in natural interactions. To create believable and engaging interactions between human interactants and an embodied conversational agent (ECA), it is important to endow the virtual agent with the capability to manage interruptions, that is, the ability both to interrupt and to react to an interruption. As a first step, we focus on the latter, where the agent is able to perceive and interpret the user’s multimodal behaviors as either an attempt to take the turn or not. To this aim, we annotate, analyse and characterize interruptions in human-human conversations. In this paper, we describe our annotation schema, which embeds different types of interruptions. We then provide an analysis of multimodal features, focusing on prosodic features (F0 and loudness) and body (head and hand) activity, to characterize interruptions.
Acknowledgements
This work was performed as part of the ANR-JST-CREST TAPAS and ANR-JST-DFG PANORAMA projects.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Yang, L., Achard, C., Pelachaud, C. (2022). Multimodal Analysis of Interruptions. In: Duffy, V.G. (eds) Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management. Anthropometry, Human Behavior, and Communication. HCII 2022. Lecture Notes in Computer Science, vol 13319. Springer, Cham. https://doi.org/10.1007/978-3-031-05890-5_24
DOI: https://doi.org/10.1007/978-3-031-05890-5_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-05889-9
Online ISBN: 978-3-031-05890-5
eBook Packages: Computer Science, Computer Science (R0)