Speech recognition in a dialog system: from conventional to deep processing

Published: 01 June 2018

Abstract

This paper presents an overview of the automatic speech recognition (ASR) module of a spoken dialog system and of its evolution from the conventional GMM-HMM (Gaussian mixture model - hidden Markov model) architecture toward the recent nonlinear DNN-HMM (deep neural network) scheme. GMMs long dominated acoustic modeling for speech recognition, but in recent years, with the resurgence of artificial neural networks (ANNs), they have been surpassed in most recognition tasks. A key property of ANN-based acoustic models is that their weights are adjusted in two training steps: i) initialization of the weights (with or without pre-training) and ii) fine-tuning. To illustrate both frameworks, a case study is carried out with the Kaldi toolkit on a mid-vocabulary, speaker-independent voice corpus for a connected-words phone-dialing task covering digit strings and personal name lists in Mexican Spanish. The results show reasonable recognition accuracy with DNN acoustic modeling: a word error rate (WER) of 1.49% is achieved with the context-dependent DNN-HMM, a 30% relative improvement over the best GMM-HMM result in these experiments (2.12% WER).
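
The abstract quotes two headline numbers: 1.49% WER for the context-dependent DNN-HMM versus 2.12% for the best GMM-HMM, i.e. roughly a 30% relative error reduction. As a minimal illustrative sketch (not code from the paper), the Python snippet below computes WER with a standard word-level Levenshtein alignment and reproduces the relative-improvement figure from the two reported error rates; the toy Spanish digit strings are hypothetical.

    # Minimal sketch: word error rate (WER) and relative WER reduction.
    def wer(reference, hypothesis):
        """WER = (substitutions + deletions + insertions) / number of reference
        words, obtained from a word-level Levenshtein (edit-distance) alignment."""
        ref, hyp = reference.split(), hypothesis.split()
        # dp[i][j] = edit distance between the first i reference words
        # and the first j hypothesis words.
        dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            dp[i][0] = i
        for j in range(len(hyp) + 1):
            dp[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                               dp[i][j - 1] + 1,          # insertion
                               dp[i - 1][j - 1] + cost)   # substitution / match
        return dp[len(ref)][len(hyp)] / len(ref)

    # Hypothetical connected-digit utterance in Spanish (one substitution).
    print(wer("uno dos tres cuatro", "uno dos seis cuatro"))   # 0.25 -> 25% WER

    # Relative improvement quoted in the abstract.
    gmm_wer, dnn_wer = 2.12, 1.49
    print(round(100 * (gmm_wer - dnn_wer) / gmm_wer, 1))       # 29.7 -> "30% relative"

In Kaldi-based experiments such as these, WER is normally reported by the toolkit's own scoring tools; the sketch above only makes the metric and the relative-improvement arithmetic explicit.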

Cited By

  • (2024) Portable student attendance management module for university environment by using biometric mechanisms. Multimedia Tools and Applications 83(1): 1215-1239. DOI: 10.1007/s11042-023-15482-y. Online publication date: 1-Jan-2024
  • (2022) Speech waveform reconstruction from speech parameters for an effective text to speech synthesis system using minimum phase harmonic sinusoidal model for Punjabi. Multimedia Tools and Applications 81(18): 26101-26120. DOI: 10.1007/s11042-022-12850-y. Online publication date: 1-Jul-2022
  • (2021) A Pitch and Noise Robust Keyword Spotting System Using SMAC Features with Prosody Modification. Circuits, Systems, and Signal Processing 40(4): 1892-1904. DOI: 10.1007/s00034-020-01565-w. Online publication date: 1-Apr-2021
  • (2018) Training deep neural networks with non-uniform frame-level cost function for automatic speech recognition. Multimedia Tools and Applications 77(20): 27231-27267. DOI: 10.1007/s11042-018-5917-5. Online publication date: 1-Oct-2018

Published In

Multimedia Tools and Applications, Volume 77, Issue 12
June 2018
1482 pages

Publisher

Kluwer Academic Publishers, United States

Author Tags

  1. Deep learning
  2. Gaussian mixture models
  3. Hidden Markov models
  4. Neural networks
  5. Speech recognition
  6. Spoken dialog system
