DOI: 10.5555/1641462.1641463

Evaluating interactive dialogue systems: extending component evaluation to integrated system evaluation

Published: 11 July 1997

Abstract

This paper surveys the ways in which spoken dialogue system components have been evaluated and discusses approaches that attempt to integrate component evaluation into an overall view of system performance. We argue that the PARADISE (PARAdigm for DIalogue System Evaluation) framework has several advantages over other proposals.

Cited By

  • (2001) Usability evaluation in spoken language dialogue systems. Proceedings of the workshop on Evaluation for Language and Dialogue Systems - Volume 9, pages 1-10. DOI: 10.3115/1118053.1118055. Online publication date: 6 July 2001.

Published In

ISDS '97: Interactive Spoken Dialog Systems on Bringing Speech and NLP Together in Real Applications
July 1997, 133 pages

Publisher

Association for Computational Linguistics, United States
