research-article

Evaluating Conversational Recommender Systems via User Simulation

Authors:

Krisztian Balog

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Pages 1512 - 1520

https://doi.org/10.1145/3394486.3403202

Published: 20 August 2020

Abstract

Conversational information access is an emerging research area. Currently, human evaluation is used for end-to-end system evaluation, which is both time- and resource-intensive at scale and thus becomes a bottleneck to progress. As an alternative, we propose automated evaluation by means of simulating users. Our user simulator aims to generate the responses that a real human would give by considering both individual preferences and the general flow of interaction with the system. We evaluate our simulation approach on an item recommendation task by comparing three existing conversational recommender systems. We show that preference modeling and task-specific interaction models both contribute to more realistic simulations, and can help achieve high correlation between automatic evaluation measures and manual human assessments.

Index Terms

Evaluating Conversational Recommender Systems via User Simulation
1. Human-centered computing
  1. Human computer interaction (HCI)
    1. HCI design and evaluation methods
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Recommender systems
    2. Users and interactive retrieval

Information & Contributors

Information

Published In

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

August 2020

3664 pages

ISBN:9781450379984

DOI:10.1145/3394486

General Chairs: Rajesh Gupta (UC San Diego, USA), Yan Liu (USC, USA)

Program Chairs: Mohak Shah (LG Electronics, USA), Suju Rajan (LinkedIn, USA)

Publications Chairs: Jiliang Tang (Michigan State, USA), B. Aditya Prakash (Georgia Tech, USA)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

KDD '20

KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

July 6 - 10, 2020

CA, Virtual Event, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
