DOI: 10.1145/3394486.3403202
research-article

Evaluating Conversational Recommender Systems via User Simulation

Published: 20 August 2020

Abstract

Conversational information access is an emerging research area. Currently, end-to-end system evaluation relies on human evaluation, which is both time- and resource-intensive at scale and thus becomes a bottleneck to progress. As an alternative, we propose automated evaluation by means of simulating users. Our user simulator aims to generate responses that a real human would give by considering both individual preferences and the general flow of interaction with the system. We evaluate our simulation approach on an item recommendation task by comparing three existing conversational recommender systems. We show that preference modeling and task-specific interaction models both contribute to more realistic simulations, and can help achieve high correlation between automatic evaluation measures and manual human assessments.
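
The abstract's final claim, that automatic measures correlate with human assessments, can be illustrated with a rank correlation over per-system scores. The following is a minimal sketch only: the abstract does not say which correlation measure the authors use, so Kendall's tau is an assumption here, and the function name `kendall_tau` and all score values are hypothetical placeholders, not results from the paper.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a between two equal-length score lists (no tie correction)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    concordant = discordant = 0
    # Compare every pair of systems: do both score lists order them the same way?
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical placeholder scores for three systems (NOT values from the paper).
human_scores = [0.62, 0.48, 0.71]      # e.g. mean human rating per system
simulated_scores = [0.58, 0.41, 0.66]  # e.g. automatic measure from simulated users

tau = kendall_tau(human_scores, simulated_scores)
print(f"Kendall's tau between human and simulated rankings: {tau:.2f}")
```

A value near 1 would mean the simulation ranks the systems in the same order as human judges. With only three systems, as in the comparison the abstract describes, such a coefficient is a coarse signal rather than strong statistical evidence.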


Published In

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
August 2020
3664 pages
ISBN: 9781450379984
DOI: 10.1145/3394486
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. conversational information access
  2. conversational recommendation
  3. user simulation

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (last 12 months): 189
  • Downloads (last 6 weeks): 25
Reflects downloads up to 10 Dec 2024

Cited By

  • (2024) Towards a Formal Characterization of User Simulation Objectives in Conversational Information Access. Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval, 185-193. DOI: 10.1145/3664190.3672529. Online publication date: 2-Aug-2024.
  • (2024) Analysing Utterances in LLM-Based User Simulation for Conversational Search. ACM Transactions on Intelligent Systems and Technology 15:3, 1-22. DOI: 10.1145/3650041. Online publication date: 5-Mar-2024.
  • (2024) Towards Simulation-Based Evaluation of Recommender Systems with Carousel Interfaces. ACM Transactions on Recommender Systems 2:1, 1-25. DOI: 10.1145/3643709. Online publication date: 30-Jan-2024.
  • (2024) Identifying Breakdowns in Conversational Recommender Systems using User Simulation. Proceedings of the 6th ACM Conference on Conversational User Interfaces, 1-10. DOI: 10.1145/3640794.3665539. Online publication date: 8-Jul-2024.
  • (2024) Sixth Knowledge-aware and Conversational Recommender Systems Workshop (KaRS). Proceedings of the 18th ACM Conference on Recommender Systems, 1245-1249. DOI: 10.1145/3640457.3687114. Online publication date: 8-Oct-2024.
  • (2024) Reformulating Conversational Recommender Systems as Tri-Phase Offline Policy Learning. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 3135-3144. DOI: 10.1145/3627673.3679792. Online publication date: 21-Oct-2024.
  • (2024) Large Language Model Powered Agents for Information Retrieval. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2989-2992. DOI: 10.1145/3626772.3661375. Online publication date: 10-Jul-2024.
  • (2024) Behavior Alignment: A New Perspective of Evaluating LLM-based Conversational Recommendation Systems. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2286-2290. DOI: 10.1145/3626772.3657924. Online publication date: 10-Jul-2024.
  • (2024) Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1952-1962. DOI: 10.1145/3626772.3657712. Online publication date: 10-Jul-2024.
  • (2024) Let the LLMs Talk: Simulating Human-to-Human Conversational QA via Zero-Shot LLM-to-LLM Interactions. Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 8-17. DOI: 10.1145/3616855.3635856. Online publication date: 4-Mar-2024.
