PARADISE: a framework for evaluating spoken dialogue agents

Authors:

Marilyn A. Walker,

Diane J. Litman,

Candace A. Kamm,

Alicia Abella

ACL '98/EACL '98: Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics

Pages 271 - 280

https://doi.org/10.3115/976909.979652

Published: 07 July 1997

Abstract

This paper presents PARADISE (PARAdigm for DIalogue System Evaluation), a general framework for evaluating spoken dialogue agents. The framework decouples task requirements from an agent's dialogue behaviors, supports comparisons among dialogue strategies, enables the calculation of performance over subdialogues and whole dialogues, specifies the relative contribution of various factors to performance, and makes it possible to compare agents performing different tasks by normalizing for task complexity.
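
The abstract does not spell out how performance is computed, so the following is only a minimal sketch of one way the ideas could fit together, not the authors' implementation. It assumes that task success is summarized by a kappa coefficient computed from a confusion matrix (which corrects for chance agreement and so helps normalize for task complexity), that dialogue costs are numeric per-dialogue measures, and that performance is a weighted difference between z-score-normalized success and z-score-normalized costs. The function names (`kappa`, `z_normalize`, `performance`), the weights, and the sample values are all hypothetical.

```python
from statistics import mean, pstdev


def kappa(confusion):
    """Kappa = (P(A) - P(E)) / (1 - P(E)) for a square matrix of counts."""
    total = sum(sum(row) for row in confusion)
    n = len(confusion)
    # Observed agreement: proportion of counts on the diagonal.
    p_agree = sum(confusion[i][i] for i in range(n)) / total
    # Chance agreement, estimated here from column totals (an assumption).
    col_totals = [sum(confusion[r][c] for r in range(n)) for c in range(n)]
    p_chance = sum((t / total) ** 2 for t in col_totals)
    return (p_agree - p_chance) / (1 - p_chance)


def z_normalize(values):
    """Z-score normalization so measures on different scales can be combined."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def performance(kappas, costs, alpha, weights):
    """Per-dialogue score: alpha * N(kappa) - sum_i w_i * N(cost_i).

    kappas  : list of task-success values, one per dialogue
    costs   : dict mapping a cost name to a list of per-dialogue values
    alpha   : weight on normalized task success (hypothetical value)
    weights : dict mapping each cost name to its weight (hypothetical values)
    """
    norm_kappa = z_normalize(kappas)
    norm_costs = {name: z_normalize(vals) for name, vals in costs.items()}
    scores = []
    for d in range(len(kappas)):
        penalty = sum(weights[name] * norm_costs[name][d] for name in costs)
        scores.append(alpha * norm_kappa[d] - penalty)
    return scores


if __name__ == "__main__":
    # Two hypothetical dialogues: the second succeeds more and takes fewer turns.
    scores = performance(
        kappas=[0.2, 0.8],
        costs={"turns": [20, 10]},
        alpha=0.5,
        weights={"turns": 0.5},
    )
    print(scores)
```

In practice the weights would be estimated from data rather than set by hand; the fixed values above are placeholders chosen only to keep the sketch self-contained.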

Published In

ACL '98/EACL '98: Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics

July 1997

543 pages

Program Chairs:
Philip R. Cohen (Oregon Graduate Institute),
Wolfgang Wahlster (DFKI Saarbrücken, Germany)

Sponsors

Directorate General XIII (European Commission)
Universidad Complutense de Madrid
Universidad Autónoma de Madrid
Universidad Nacional de Educación a Distancia
Universidad Politécnica de Madrid

Publisher

Association for Computational Linguistics

United States

Qualifiers

Article

Acceptance Rates

Overall Acceptance Rate 85 of 443 submissions, 19%

Bibliometrics

Total Citations: 120
Total Downloads: 2,051
Downloads (Last 12 months): 446
Downloads (Last 6 weeks): 42

Reflects downloads up to 20 Jan 2025

Cited By

Paetzel-Prüsmann M, Lehman J, Gomez C, Kennedy J (2024). An Automatic Evaluation Framework for Social Conversations with Robots. Proceedings of the 2024 International Symposium on Technological Advances in Human-Robot Interaction, pages 56-64. Online publication date: 9-Mar-2024.
https://dl.acm.org/doi/10.1145/3648536.3648543
Jin Y, Chen L, Cai W, Zhao X (2023). CRS-Que: A User-Centric Evaluation Framework for Conversational Recommender Systems. ACM Transactions on Recommender Systems. Online publication date: 2-Nov-2023.
https://dl.acm.org/doi/10.1145/3631534
Siro C, Aliannejadi M, De Rijke M (2023). Understanding and Predicting User Satisfaction with Conversational Recommender Systems. ACM Transactions on Information Systems, 42:2, pages 1-37. Online publication date: 8-Nov-2023.
https://dl.acm.org/doi/10.1145/3624989
Reimann M, Oertel C, Kunneman F, Hindriks K, Lugrin B, Latoschik M, von Mammen S, Kopp S, Pécune F, Pelachaud C (2023). Predicting Interaction Quality Aspects Using Level-Based Scores for Conversational Agents. Proceedings of the 23rd ACM International Conference on Intelligent Virtual Agents, pages 1-8. Online publication date: 19-Sep-2023.
https://dl.acm.org/doi/10.1145/3570945.3607332
Eagle T, Blau C, Bales S, Desai N, Li V, Whittaker S (2022). “I don’t know what you mean by `I am anxious'”: A New Method for Evaluating Conversational Agent Responses to Standardized Mental Health Inputs for Anxiety and Depression. ACM Transactions on Interactive Intelligent Systems, 12:2, pages 1-23. Online publication date: 20-Jul-2022.
https://dl.acm.org/doi/10.1145/3488057
Siro C, Aliannejadi M, de Rijke M, Amigo E, Castells P, Gonzalo J, Carterette B, Culpepper J, Kazai G (2022). Understanding User Satisfaction with Task-oriented Dialogue Systems. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2018-2023. Online publication date: 6-Jul-2022.
https://dl.acm.org/doi/10.1145/3477495.3531798
Spina D, Trippas J, Thomas P, Joho H, Byström K, Clark L, Craswell N, Czerwinski M, Elsweiler D, Frummet A, Ghosh S, Kiesel J, Lopatovska I, McDuff D, Meyer S, Mourad A, Owoicho P, Cherumanal S, Russell D, Sitbon L (2021). Report on the future conversations workshop at CHIIR 2021. ACM SIGIR Forum, 55:1, pages 1-22. Online publication date: 16-Jul-2021.
https://dl.acm.org/doi/10.1145/3476415.3476421
Gao M, Liu X, Xu A, Akkiraju R (2021). Chatbot or Chat-Blocker: Predicting Chatbot Popularity before Deployment. Proceedings of the 2021 ACM Designing Interactive Systems Conference, pages 1458-1469. Online publication date: 28-Jun-2021.
https://dl.acm.org/doi/10.1145/3461778.3462147
Shen L, Zhan H, Shen X, Chen H, Zhao X, Zhu X, Demartini G, Zuccon G, Culpepper J, Huang Z, Tong H (2021). Identifying Untrustworthy Samples. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pages 1598-1608. Online publication date: 26-Oct-2021.
https://dl.acm.org/doi/10.1145/3459637.3482352
Thomas P, Czerwinksi M, Mcduff D, Craswell N (2021). Theories of Conversation for Conversational IR. ACM Transactions on Information Systems, 39:4, pages 1-23. Online publication date: 16-Aug-2021.
https://dl.acm.org/doi/10.1145/3439869
