Abstract
The Switching Wizard of Oz (SWOZ) is a setup for evaluating human behavior synthesis algorithms in online face-to-face interactions. Conversational partners are represented to each other as virtual agents, whose animated behavior is either generated by a synthesis algorithm or driven by the actual behavior of the conversational partner. The human and the algorithm have the same expressive capabilities. The source is switched at random intervals, which means that the algorithm's behavior can only be identified when it deviates from what is regarded as appropriate. The SWOZ approach is especially suitable for the controlled evaluation of synthesis algorithms that consider a limited set of behaviors. We evaluate a backchannel synthesis algorithm for speaker–listener dialogs using an asymmetric version of the framework. Human speakers talk to virtual listeners, which are controlled either by human listeners or by an algorithm. Speakers indicate when they feel they are no longer talking to a human listener. Analysis of these responses reveals patterns of inappropriate behavior in terms of the quantity and timing of backchannels. These insights can be used to improve synthesis algorithms.
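The switching idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the uniform distribution of interval lengths, and the strict alternation between sources are all assumptions made for illustration. The sketch draws a random switching schedule and then attributes each speaker response ("this is not a human") to the source that was active at that moment.

```python
import random

def switching_schedule(total_s, min_s, max_s, rng):
    """Return contiguous (start, end, source) segments covering [0, total_s].

    Hypothetical helper: interval lengths are drawn uniformly from
    [min_s, max_s]; the distribution is an assumption, not from the paper.
    """
    if min_s <= 0 or max_s < min_s:
        raise ValueError("need 0 < min_s <= max_s")
    segments = []
    t = 0.0
    source = rng.choice(["human", "algorithm"])
    while t < total_s:
        end = min(t + rng.uniform(min_s, max_s), total_s)
        segments.append((t, end, source))
        # Alternate the driving source at every switch (assumed behavior).
        source = "algorithm" if source == "human" else "human"
        t = end
    return segments

def source_at(segments, t):
    """Return the source driving the virtual listener at time t, or None."""
    for start, end, source in segments:
        if start <= t < end:
            return source
    return None

def count_responses(segments, response_times):
    """Count speaker responses per active source; ignore out-of-range times."""
    counts = {"human": 0, "algorithm": 0}
    for t in response_times:
        src = source_at(segments, t)
        if src is not None:
            counts[src] += 1
    return counts

if __name__ == "__main__":
    rng = random.Random(0)
    schedule = switching_schedule(120.0, 10.0, 30.0, rng)
    print(count_responses(schedule, [5.0, 50.0, 119.0]))
```

Responses that fall in human-driven segments correspond to false alarms, while those in algorithm-driven segments point to behavior that the speaker judged inappropriate.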
Cite this article
Poppe, R., ter Maat, M. & Heylen, D. Switching Wizard of Oz for the online evaluation of backchannel behavior. J Multimodal User Interfaces 8, 109–117 (2014). https://doi.org/10.1007/s12193-013-0131-2
DOI: https://doi.org/10.1007/s12193-013-0131-2