Abstract
Numerous automatic speech recognition (ASR) models have been developed in recent years, but most of them are large, take a long time to train, and are difficult to deploy on devices. Knowledge distillation has been used to reduce the size of learning models while preserving their accuracy across a range of applications. This paper therefore proposes knowledge distillation for an ASR model to make training simpler and faster than with the existing model: the knowledge gained from training a teacher acoustic model is transferred to a student acoustic model to improve the student's performance. With this approach, ASR models can be trained effectively with far less effort. Graphical results show that the framework trains efficiently on the audio input, and the experimental results show that the proposed knowledge-distillation model is effective for speech recognition, achieving a Word Error Rate of 1.21% on the LibriSpeech dev-clean set and 2.23% on the LibriSpeech test-clean set.
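To make the teacher-to-student transfer described above concrete, the following is a minimal sketch of a standard softened-logit distillation objective (in the style of Hinton et al.), not the paper's exact CRDNN-BiLSTM training recipe; the temperature, the weighting factor alpha, and the tensor shapes are illustrative assumptions.

```python
# Minimal knowledge-distillation sketch: a student is trained against a blend of
# the (frozen) teacher's softened output distribution and the hard labels.
# This is an illustrative example, NOT the paper's exact training objective.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=2.0, alpha=0.5):
    """Combine a soft KD term (KL between temperature-softened distributions)
    with the usual hard-label cross-entropy; `alpha` balances the two terms."""
    # Soft targets from the teacher, softened by the temperature.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence scaled by T^2 so its gradient magnitude stays comparable.
    kd_term = F.kl_div(log_soft_student, soft_teacher,
                       reduction="batchmean") * (temperature ** 2)
    # Hard-label term computed on the student's unsoftened logits.
    ce_term = F.cross_entropy(student_logits, targets)
    return alpha * kd_term + (1.0 - alpha) * ce_term

# Illustrative usage with random tensors standing in for per-frame logits.
if __name__ == "__main__":
    batch, num_classes = 8, 30                 # e.g. character-level output units
    student_logits = torch.randn(batch, num_classes, requires_grad=True)
    teacher_logits = torch.randn(batch, num_classes)   # in practice: teacher run under torch.no_grad()
    targets = torch.randint(0, num_classes, (batch,))
    loss = distillation_loss(student_logits, teacher_logits, targets)
    loss.backward()
    print(f"KD loss: {loss.item():.4f}")
```

In practice the teacher acoustic model is run in inference mode and only the student's parameters are updated by this loss.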
Data availability
The dataset used for implementation is a benchmark dataset and is freely available.
Acknowledgements
Our sincere thanks to the Department of Science and Technology (DST), Government of India, for funding this project under the DST Interdisciplinary Cyber-Physical Systems (DST-ICPS) scheme (Grant no. T88).
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ashok Kumar, L., Karthika Renuka, D., Naveena, K.S. et al. CRDNN-BiLSTM Knowledge Distillation Model Towards Enhancing the Automatic Speech Recognition. SN COMPUT. SCI. 5, 304 (2024). https://doi.org/10.1007/s42979-024-02608-8