DOI: 10.1145/3652988.3673966
Extended abstract · Free access

A Holistic Evaluation Methodology for Multi-Party Spoken Conversational Agents

Published: 26 December 2024

Abstract

While research in multi-party spoken conversation with intelligent embodied agents has made significant progress on sub-tasks such as speaker identification and non-verbal cues, there is still a gap in fully autonomous applications that users can interact with directly. This gap translates into the absence of a standard methodology for evaluating multi-party conversational speech agents that considers both task-based system performance and user experience.
Our research has addressed the former by developing a multi-modal robot receptionist for a hospital waiting room whose multi-party conversational ability, non-verbal behaviour, and dialogue management are implemented using Large Language Models (LLMs). In this paper, we go on to address the issue of evaluation, describing an experimental methodology and a task-based user-experiment design that capture both objective measures of multi-party dialogue performance (such as accurate tracking of user goals) and users' subjective experience of multi-party embodied conversations. This paper therefore presents a holistic methodology for the future evaluation of multi-party spoken conversational agents.
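
To make the combined evaluation concrete, the sketch below shows one way the two kinds of measure could be aggregated over a set of sessions: an objective goal-tracking accuracy computed from annotated dialogue logs, and a mean subjective questionnaire score. This is a minimal illustration under assumed data structures; the class, metric names, questionnaire items, and example numbers are hypothetical and are not taken from the paper or its system.

    from dataclasses import dataclass
    from statistics import mean

    @dataclass
    class DialogueLog:
        """One multi-party session: how many stated user goals the system tracked correctly (hypothetical format)."""
        goals_total: int    # goals stated by any participant (e.g. patient or companion)
        goals_tracked: int  # goals the system attributed to the right speaker and acted on

    def goal_tracking_accuracy(logs: list[DialogueLog]) -> float:
        """Objective measure: fraction of user goals tracked correctly across all sessions."""
        total = sum(log.goals_total for log in logs)
        return sum(log.goals_tracked for log in logs) / total if total else 0.0

    def mean_questionnaire_score(responses: list[dict[str, int]]) -> float:
        """Subjective measure: mean of each participant's Likert ratings (e.g. a 1-5 scale)."""
        return mean(mean(r.values()) for r in responses)

    # Hypothetical example data for two two-party sessions with a robot receptionist.
    logs = [DialogueLog(goals_total=4, goals_tracked=3),
            DialogueLog(goals_total=5, goals_tracked=5)]
    ratings = [{"ease_of_use": 4, "naturalness": 3},
               {"ease_of_use": 5, "naturalness": 4}]

    print(f"Goal-tracking accuracy: {goal_tracking_accuracy(logs):.2f}")          # 0.89
    print(f"Mean questionnaire score: {mean_questionnaire_score(ratings):.2f}")   # 4.00

Reporting the two numbers side by side is the point of the holistic design: a system could score well on goal tracking yet poorly on the subjective scale, or vice versa.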



Information

Published In

IVA '24: Proceedings of the 24th ACM International Conference on Intelligent Virtual Agents
September 2024
337 pages


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 December 2024


Author Tags

  1. HRI
  2. conversational agents
  3. evaluation methodology
  4. large language models
  5. multi-party interaction
  6. social robots
  7. user study

Qualifiers

  • Extended-abstract
  • Research
  • Refereed limited


Conference

IVA '24: ACM International Conference on Intelligent Virtual Agents
September 16-19, 2024
Glasgow, United Kingdom

Acceptance Rates

Overall Acceptance Rate 53 of 196 submissions, 27%

