
Predicting the quality and usability of spoken dialogue services

Published: 01 August 2008

Abstract

In this paper, we compare different approaches to predicting the quality and usability of spoken dialogue systems. The respective models provide estimates of user judgments on perceived quality, based on parameters which can be extracted from interaction logs. Different types of input parameters and different modeling algorithms have been compared using three spoken dialogue databases obtained with two different systems. The results show that both linear regression models and classification trees are able to cover around 50% of the variance in the training data, and neural networks even more. When applied to independent test data, in particular to data obtained with different systems and/or user groups, the prediction accuracy decreases significantly. The underlying reasons for this limited predictive power are discussed. It is shown that, although an accurate prediction of individual ratings is not yet possible with such models, they may still be used for making decisions on component optimization, and are thus helpful tools for the system developer.
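The modeling approach the abstract describes (in the spirit of PARADISE-style evaluation) can be sketched as: fit a linear regression from interaction parameters to user quality judgments, then compare the variance covered on training data versus independent test data. The parameter names and all data below are illustrative assumptions for the sketch, not the paper's actual corpora or coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical interaction parameters per dialogue (assumed, for illustration):
# task success (0/1), dialogue length in turns, ASR word error rate.
n = 200
X = np.column_stack([
    rng.integers(0, 2, n).astype(float),  # task success
    rng.integers(4, 30, n).astype(float), # number of turns
    rng.uniform(0.0, 0.5, n),             # word error rate
])

# Synthetic "user judgment" on a 1-5 scale, driven by the parameters
# plus noise, so the model cannot explain all of the variance.
y = 2.0 + 1.5 * X[:, 0] - 0.04 * X[:, 1] - 2.0 * X[:, 2] + rng.normal(0, 0.5, n)

# Split into training data and independent test data.
X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]

# Ordinary least squares with an intercept term.
A_tr = np.column_stack([np.ones(len(X_tr)), X_tr])
coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)

def r_squared(X_part, y_part):
    """Fraction of the variance in the judgments covered by the model."""
    pred = np.column_stack([np.ones(len(X_part)), X_part]) @ coef
    ss_res = np.sum((y_part - pred) ** 2)
    ss_tot = np.sum((y_part - y_part.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

print(f"R^2 (training): {r_squared(X_tr, y_tr):.2f}")
print(f"R^2 (test):     {r_squared(X_te, y_te):.2f}")
```

The gap between training and test R^2 illustrates the generalization problem the paper examines; the paper additionally compares classification trees and neural networks as alternatives to the linear model.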



Published In

Speech Communication, Volume 50, Issue 8-9, August 2008, 142 pages

Publisher

Elsevier Science Publishers B.V., Netherlands


Author Tags

  1. Optimization
  2. Prediction model
  3. Quality
  4. Spoken dialogue system
  5. Usability

Qualifiers

  • Article


Cited By

  • (2019) The Advent of Speech Based NLP QA Systems: A Refined Usability Testing Model. In: Design, User Experience, and Usability. Practice and Case Studies, pp. 152-163. DOI: 10.1007/978-3-030-23535-2_11. Online publication date: 26-Jul-2019.
  • (2016) Score-based Inverse Reinforcement Learning. In: Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, pp. 457-465. DOI: 10.5555/2936924.2936991. Online publication date: 9-May-2016.
  • (2016) Challenges in Building Highly Interactive Dialogue Systems. AI Magazine 37(4), pp. 7-18. DOI: 10.1609/aimag.v37i4.2687. Online publication date: 1-Dec-2016.
  • (2015) Interaction Quality. Speech Communication 74, pp. 12-36. DOI: 10.1016/j.specom.2015.06.003. Online publication date: 1-Nov-2015.
  • (2012) One year of contender. In: NAACL-HLT Workshop on Future Directions and Needs in the Spoken Dialog Community: Tools and Data, pp. 45-48. DOI: 10.5555/2390444.2390468. Online publication date: 7-Jun-2012.
  • (2012) Towards standardized metrics and tools for spoken and multimodal dialog system evaluation. In: NAACL-HLT Workshop on Future Directions and Needs in the Spoken Dialog Community: Tools and Data, pp. 5-6. DOI: 10.5555/2390444.2390449. Online publication date: 7-Jun-2012.
  • (2012) Evaluating language understanding accuracy with respect to objective outcomes in a dialogue system. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pp. 471-481. DOI: 10.5555/2380816.2380874. Online publication date: 23-Apr-2012.
  • (2011) Which system differences matter? In: Proceedings of the SIGDIAL 2011 Conference, pp. 8-17. DOI: 10.5555/2132890.2132893. Online publication date: 17-Jun-2011.
  • (2011) Classifying dialogue in high-dimensional space. ACM Transactions on Speech and Language Processing 7(3), pp. 1-15. DOI: 10.1145/1966407.1966413. Online publication date: 6-Jun-2011.
  • (2010) Modeling user satisfaction transitions in dialogues from overall ratings. In: Proceedings of the 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 18-27. DOI: 10.5555/1944506.1944510. Online publication date: 24-Sep-2010.
