DOI: 10.1145/3583780.3615493
research-article
Open access

Predicting Interaction Quality of Conversational Assistants With Spoken Language Understanding Model Confidences

Published: 21 October 2023

Abstract

In conversational AI assistants, spoken language understanding (SLU) models are part of a complex pipeline composed of several modules working in concert. An update to the SLU model must therefore improve not only the model-specific metrics but also the overall conversational assistant performance. In particular, the impact on user interaction quality metrics must be factored in, accounting for interactions with the modules upstream and downstream of the SLU component. We develop an ML model that gauges the change in interaction quality metrics caused by an SLU model update before a production launch. The proposed model is a multi-modal transformer with a gated mechanism that conditions on text embeddings, produced by a BERT model pre-trained on conversational data, and on the hypotheses of the SLU classifiers with their corresponding confidence scores. We show that the proposed model predicts defects with more than 76% correlation with live interaction quality defects, compared to a 46% baseline.
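The gated conditioning described in the abstract can be sketched roughly as follows. This is a minimal illustration of one plausible fusion step, not the paper's implementation: the function name, the weight arguments, and the dimensions are hypothetical stand-ins for what would be learned parameters inside a transformer layer.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(text_emb, conf_feats, w_gate, w_proj):
    """Fuse a text embedding with SLU confidence features via a scalar gate.

    text_emb   -- list[float], e.g. a pooled BERT embedding (assumed input)
    conf_feats -- list[float], SLU hypothesis confidence scores
    w_gate     -- hypothetical learned gate weights, one per concatenated input
    w_proj     -- hypothetical learned projection, rows match len(text_emb)
    """
    # Project the confidence features into the text-embedding dimension.
    proj = [sum(w * c for w, c in zip(row, conf_feats)) for row in w_proj]
    # A scalar gate computed from the concatenated inputs decides how much
    # the confidence signal is allowed to shift the text representation.
    concat = text_emb + conf_feats
    g = sigmoid(sum(w * x for w, x in zip(w_gate, concat)))
    # Gated residual update: text embedding plus a scaled confidence shift.
    return [h + g * p for h, p in zip(text_emb, proj)]
```

In this sketch the gate keeps the text embedding dominant while letting high-confidence SLU hypotheses nudge the fused representation, which is one common way such gated multi-modal conditioning is realized.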

      Published In

      CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
      October 2023
      5508 pages
ISBN: 9798400701245
DOI: 10.1145/3583780
      This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. defect prediction
      2. dialog response quality
      3. spoken language understanding
      4. transformer-based model


Conference

CIKM '23
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%

      Article Metrics

• Total Citations: 0
• Total Downloads: 215
• Downloads (last 12 months): 187
• Downloads (last 6 weeks): 30

Reflects downloads up to 10 Dec 2024.
