
Predicting the quality and usability of spoken dialogue services

Published: 01 August 2008

Abstract

In this paper, we compare different approaches to predicting the quality and usability of spoken dialogue systems. The respective models provide estimates of user judgments on perceived quality, based on parameters which can be extracted from interaction logs. Different types of input parameters and different modeling algorithms have been compared using three spoken dialogue databases obtained with two different systems. The results show that both linear regression models and classification trees are able to cover around 50% of the variance in the training data, and neural networks even more. When applied to independent test data, in particular to data obtained with different systems and/or user groups, the prediction accuracy decreases significantly. The underlying reasons for this limited predictive power are discussed. It is shown that, although an accurate prediction of individual ratings is not yet possible with such models, they may still be used for making decisions on component optimization, and are thus helpful tools for the system developer.
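The modeling approach the abstract describes (in the spirit of PARADISE-style evaluation) can be sketched as: fit a linear regression from interaction parameters to user quality judgments, then compare the variance covered on training data versus independent test data. The parameter names and all data below are illustrative assumptions for the sketch, not the paper's actual corpora or coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical interaction parameters per dialogue (assumed, for illustration):
# task success (0/1), dialogue length in turns, ASR word error rate.
n = 200
X = np.column_stack([
    rng.integers(0, 2, n).astype(float),  # task success
    rng.integers(4, 30, n).astype(float), # number of turns
    rng.uniform(0.0, 0.5, n),             # word error rate
])

# Synthetic "user judgment" on a 1-5 scale, driven by the parameters
# plus noise, so the model cannot explain all of the variance.
y = 2.0 + 1.5 * X[:, 0] - 0.04 * X[:, 1] - 2.0 * X[:, 2] + rng.normal(0, 0.5, n)

# Split into training data and independent test data.
X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]

# Ordinary least squares with an intercept term.
A_tr = np.column_stack([np.ones(len(X_tr)), X_tr])
coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)

def r_squared(X_part, y_part):
    """Fraction of the variance in the judgments covered by the model."""
    pred = np.column_stack([np.ones(len(X_part)), X_part]) @ coef
    ss_res = np.sum((y_part - pred) ** 2)
    ss_tot = np.sum((y_part - y_part.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

print(f"R^2 (training): {r_squared(X_tr, y_tr):.2f}")
print(f"R^2 (test):     {r_squared(X_te, y_te):.2f}")
```

The gap between training and test R^2 illustrates the generalization problem the paper examines; the paper additionally compares classification trees and neural networks as alternatives to the linear model.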



Published In

Speech Communication, Volume 50, Issue 8-9, August 2008, 142 pages

Publisher

Elsevier Science Publishers B.V., Netherlands


Author Tags

  1. Optimization
  2. Prediction model
  3. Quality
  4. Spoken dialogue system
  5. Usability

Qualifiers

  • Article


Cited By

  • (2019) The Advent of Speech Based NLP QA Systems: A Refined Usability Testing Model. In: Design, User Experience, and Usability. Practice and Case Studies, pp. 152-163. DOI: 10.1007/978-3-030-23535-2_11. Online publication date: 26-Jul-2019.
  • (2016) Score-based Inverse Reinforcement Learning. In: Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, pp. 457-465. DOI: 10.5555/2936924.2936991. Online publication date: 9-May-2016.
  • (2016) Challenges in Building Highly Interactive Dialogue Systems. AI Magazine 37(4), pp. 7-18. DOI: 10.1609/aimag.v37i4.2687. Online publication date: 1-Dec-2016.
  • (2015) Interaction Quality. Speech Communication 74, pp. 12-36. DOI: 10.1016/j.specom.2015.06.003. Online publication date: 1-Nov-2015.
  • (2012) One year of contender. In: NAACL-HLT Workshop on Future Directions and Needs in the Spoken Dialog Community: Tools and Data, pp. 45-48. DOI: 10.5555/2390444.2390468. Online publication date: 7-Jun-2012.
  • (2012) Towards standardized metrics and tools for spoken and multimodal dialog system evaluation. In: NAACL-HLT Workshop on Future Directions and Needs in the Spoken Dialog Community: Tools and Data, pp. 5-6. DOI: 10.5555/2390444.2390449. Online publication date: 7-Jun-2012.
  • (2012) Evaluating language understanding accuracy with respect to objective outcomes in a dialogue system. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pp. 471-481. DOI: 10.5555/2380816.2380874. Online publication date: 23-Apr-2012.
  • (2011) Which system differences matter? In: Proceedings of the SIGDIAL 2011 Conference, pp. 8-17. DOI: 10.5555/2132890.2132893. Online publication date: 17-Jun-2011.
  • (2011) Classifying dialogue in high-dimensional space. ACM Transactions on Speech and Language Processing 7(3), pp. 1-15. DOI: 10.1145/1966407.1966413. Online publication date: 6-Jun-2011.
  • (2010) Modeling user satisfaction transitions in dialogues from overall ratings. In: Proceedings of the 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 18-27. DOI: 10.5555/1944506.1944510. Online publication date: 24-Sep-2010.
