DOI: 10.1145/3331184.3331308
short-paper

Evaluating Variable-Length Multiple-Option Lists in Chatbots and Mobile Search

Published: 18 July 2019

Abstract

In recent years, the proliferation of smart mobile devices has led to the gradual integration of search functionality within mobile platforms. This has created an incentive to move away from the "ten blue links" metaphor, as mobile users are less likely to click on the links, expecting to get the answer directly from the snippets. In turn, this has revived interest in Question Answering. Then, along came chatbots, conversational systems, and messaging platforms, where the user's needs could be better served by the system asking follow-up questions in order to better understand the user's intent. While a user would typically expect a single response to any utterance, a system could also return multiple options for the user to select from, based on different system interpretations of the user's intent. However, this possibility should not be overused, as the practice could confuse and/or annoy the user. How to produce good variable-length lists, given the conflicting objectives of staying short while maximizing the likelihood of including a correct answer in the list, is an underexplored problem. It is also unclear how to evaluate a system that tries to do this. Here we aim to bridge this gap. In particular, we define some necessary and some optional properties that an evaluation measure fit for this purpose should have. We further show that existing evaluation measures from the IR tradition are not entirely suitable for this setup, and we propose novel evaluation measures that address it satisfactorily.
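To make the trade-off described above concrete, the following toy sketch (in Python) scores a variable-length list of system interpretations by rewarding coverage of a correct intent while discounting longer lists. This is purely illustrative: the function name, the inputs, and the length-based discount are assumptions made here, not the measures proposed in the paper.

    def toy_option_list_score(options, correct_intents, discount=0.8):
        """Toy measure (NOT the paper's proposal): reward covering a correct
        intent, discounted by how many options the user is shown."""
        if not options:
            return 0.0
        hit = any(o in correct_intents for o in options)
        # A list with a correct intent scores higher the shorter it is.
        return (1.0 if hit else 0.0) * discount ** (len(options) - 1)

    # A 2-option list that covers the correct intent beats a 4-option one:
    print(toy_option_list_score(["book_flight", "book_hotel"], {"book_flight"}))   # 0.8
    print(toy_option_list_score(["a", "b", "c", "book_flight"], {"book_flight"}))  # ~0.512

Under this kind of scoring, an empty list or a list that misses the user's intent gets no credit, while every extra option shown carries a cost; the paper's contribution is to formalize which such properties a proper measure must satisfy.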




Published In

SIGIR'19: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2019
1512 pages
ISBN: 9781450361729
DOI: 10.1145/3331184
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. chatbots
  2. evaluation measures
  3. mobile search

Qualifiers

  • Short-paper

Funding Sources

  • SIGIR Student Travel Grant

Conference

SIGIR '19

Acceptance Rates

SIGIR '19 paper acceptance rate: 84 of 426 submissions (20%)
Overall acceptance rate: 792 of 3,983 submissions (20%)

