DOI: 10.1145/3394171.3414392

GoldenRetriever: A Speech Recognition System Powered by Modern Information Retrieval

Published: 12 October 2020

Abstract

Existing Automatic Speech Recognition (ASR) systems usually generate an N-best list of hypotheses first, and then rescore the hypotheses with the language model score and the acoustic model score to find the best one. This procedure is essentially analogous to the working mechanism of modern Information Retrieval (IR) systems, which first retrieve a relatively large number of relevant candidates, then re-rank them and output the top-N list. Exploiting this commonality, this demonstration proposes a novel system named GoldenRetriever that marries IR with ASR. GoldenRetriever recasts N-best hypothesis rescoring as a Learning-to-Rescore (L2RS) problem and uses a wide range of features beyond the language model score and the acoustic model score. In this demonstration, the audience can experience, for the first time, the potential of marrying IR with ASR. GoldenRetriever should inspire more research on transferring state-of-the-art IR techniques to ASR.
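The rescoring idea described in the abstract can be illustrated with a minimal Python sketch. It is not the GoldenRetriever implementation: the abstract does not specify the feature set or the ranking model, so the feature names (`am`, `lm`, `sem`), the scores, and the hand-set weights below are all hypothetical. In an actual learning-to-rescore setup the weights would be learned from training data rather than fixed by hand.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Hypothesis:
    text: str
    # Hypothetical features: "am" and "lm" are acoustic and language model
    # log-scores; "sem" stands in for any additional feature.
    features: Dict[str, float]


def baseline_score(h: Hypothesis, lm_weight: float = 0.5) -> float:
    """Conventional rescoring: acoustic score plus weighted LM score."""
    return h.features["am"] + lm_weight * h.features["lm"]


def feature_score(h: Hypothesis, weights: Dict[str, float]) -> float:
    """Linear ranking function over all features (weights assumed learned)."""
    return sum(weights.get(name, 0.0) * value
               for name, value in h.features.items())


def rerank(n_best: List[Hypothesis],
           score_fn: Callable[[Hypothesis], float]) -> List[Hypothesis]:
    """Sort hypotheses from best to worst under the given scoring function."""
    return sorted(n_best, key=score_fn, reverse=True)


# Illustrative N-best list with made-up scores.
n_best = [
    Hypothesis("wreck a nice beach", {"am": -10.0, "lm": -6.0, "sem": 0.2}),
    Hypothesis("recognize speech", {"am": -12.0, "lm": -4.0, "sem": 0.9}),
    Hypothesis("recognise peach", {"am": -12.5, "lm": -7.0, "sem": 0.4}),
]

# Hand-set weights, used here only for illustration.
weights = {"am": 1.0, "lm": 0.5, "sem": 3.0}

baseline_best = rerank(n_best, baseline_score)[0]
feature_best = rerank(n_best, lambda h: feature_score(h, weights))[0]

print("baseline:", baseline_best.text)
print("with extra feature:", feature_best.text)
```

With these made-up numbers, the two-score baseline prefers "wreck a nice beach", while adding the third feature moves "recognize speech" to the top, which is the kind of effect a richer feature set is meant to enable.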

Supplementary Material

MP4 File (3394171.3414392.mp4)

Published In

MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020
4889 pages
ISBN: 9781450379885
DOI: 10.1145/3394171
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. N-best rescoring
  2. learning-to-rescore
  3. speech recognition

Qualifiers

  • Abstract

Funding Sources

  • HKRGC

Conference

MM '20
Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

  • Downloads (Last 12 months): 11
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 11 Dec 2024

Cited By

  • (2024) Speech-to-SQL: toward speech-driven SQL query generation from natural language question. The VLDB Journal 33:4, 1179-1201. DOI: 10.1007/s00778-024-00837-0. Online publication date: 16-Feb-2024
  • (2022) Telugu Dialect Speech Dataset Creation and Recognition using Deep Learning Techniques. 2022 IEEE 19th India Council International Conference (INDICON), 1-6. DOI: 10.1109/INDICON56171.2022.10040194. Online publication date: 24-Nov-2022
  • (2021) SmartMeeting. Proceedings of the 29th ACM International Conference on Multimedia, 2777-2779. DOI: 10.1145/3474085.3478556. Online publication date: 17-Oct-2021
  • (2021) SmartSales. Proceedings of the 29th ACM International Conference on Multimedia, 2774-2776. DOI: 10.1145/3474085.3478555. Online publication date: 17-Oct-2021
  • (2021) Multimodal N-best List Rescoring with Weakly Supervised Pre-training in Hybrid Speech Recognition. 2021 IEEE International Conference on Data Mining (ICDM), 1336-1341. DOI: 10.1109/ICDM51629.2021.00167. Online publication date: Dec-2021
