DOI: 10.1145/3583780.3615493
research-article
Open access

Predicting Interaction Quality of Conversational Assistants With Spoken Language Understanding Model Confidences

Published: 21 October 2023

Abstract

In conversational AI assistants, spoken language understanding (SLU) models are part of a complex pipeline composed of several modules working in concert. An update to the SLU model must therefore improve not only the model-specific metrics but also the overall conversational assistant performance. In particular, the impact on user interaction quality metrics must be factored in, accounting for interactions with the modules upstream and downstream of the SLU component. We develop an ML model that gauges the change in interaction quality metrics caused by an SLU model update before a production launch. The proposed model is a multi-modal transformer with a gated mechanism that conditions on text embeddings, produced by a BERT model pre-trained on conversational data, and on the hypotheses of the SLU classifiers with their corresponding confidence scores. We show that the proposed model predicts defects with more than 76% correlation with live interaction quality defects, compared to a 46% baseline.
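The gated conditioning described in the abstract can be sketched roughly as follows. This is a minimal illustration of one plausible fusion step, not the paper's implementation: the function name, the weight arguments, and the dimensions are hypothetical stand-ins for what would be learned parameters inside a transformer layer.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(text_emb, conf_feats, w_gate, w_proj):
    """Fuse a text embedding with SLU confidence features via a scalar gate.

    text_emb   -- list[float], e.g. a pooled BERT embedding (assumed input)
    conf_feats -- list[float], SLU hypothesis confidence scores
    w_gate     -- hypothetical learned gate weights, one per concatenated input
    w_proj     -- hypothetical learned projection, rows match len(text_emb)
    """
    # Project the confidence features into the text-embedding dimension.
    proj = [sum(w * c for w, c in zip(row, conf_feats)) for row in w_proj]
    # A scalar gate computed from the concatenated inputs decides how much
    # the confidence signal is allowed to shift the text representation.
    concat = text_emb + conf_feats
    g = sigmoid(sum(w * x for w, x in zip(w_gate, concat)))
    # Gated residual update: text embedding plus a scaled confidence shift.
    return [h + g * p for h, p in zip(text_emb, proj)]
```

In this sketch the gate keeps the text embedding dominant while letting high-confidence SLU hypotheses nudge the fused representation, which is one common way such gated multi-modal conditioning is realized.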

      Published In

      CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
      October 2023
      5508 pages
ISBN: 9798400701245
DOI: 10.1145/3583780
      This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. defect prediction
      2. dialog response quality
      3. spoken language understanding
      4. transformer-based model


Conference

CIKM '23
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%

      Article Metrics

• Total Citations: 0
• Total Downloads: 215
• Downloads (last 12 months): 187
• Downloads (last 6 weeks): 30

Reflects downloads up to 10 Dec 2024.
