

Discriminative training of HMMs for automatic speech recognition: A survey

Author:

Hui Jiang

Computer Speech and Language, Volume 24, Issue 4

Pages 589 - 608

https://doi.org/10.1016/j.csl.2009.08.002

Published: 01 October 2010

Abstract

Discriminative training (DT) methods have recently made tremendous progress in automatic speech recognition (ASR). This survey reviews all mainstream DT methods in speech recognition from both theoretical and practical perspectives. On the theoretical side, it first introduces many effective discriminative learning criteria used in ASR and then presents a unifying view that elucidates the relationships among these popular DT criteria, which were originally proposed from different viewpoints. Next, it summarizes some key methods used to optimize these criteria and discusses their convergence properties. As a recent advance, it introduces a novel discriminative learning framework as a general scheme for formulating discriminative training of HMMs for ASR, from which a variety of new DT methods can be developed. On the practical side, it discusses important implementation issues in conducting DT for large vocabulary ASR, such as implementing discriminative training efficiently on word graphs and effectively optimizing complex DT objective functions in high-dimensional spaces. Finally, the paper concludes with a summary and some possible directions for future research in this area. As a technical survey, it reviews and discusses all DT techniques and ideas at a high level, without going into detailed technical derivations or experimental results.
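To make the idea of a discriminative learning criterion more concrete, here is a minimal Python sketch of two classic criteria, maximum mutual information (MMI) and minimum classification error (MCE), evaluated on a hypothetical N-best list with made-up log scores. It assumes each hypothesis W already carries a combined log score log p(X|W) + log P(W), and it ignores acoustic scaling, word graphs, and parameter updates entirely; it is an illustration of the general criteria, not the survey's own formulation. The helper names `mmi_objective` and `mce_loss` and the smoothing parameters `eta` and `alpha` are illustrative assumptions.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mmi_objective(scores, ref):
    """MMI criterion for one utterance: log posterior of the reference.

    scores maps each hypothesis W to a combined log score
    log p(X|W) + log P(W) (assumed precomputed); ref is the correct
    transcription. The result is <= 0 and approaches 0 as the
    reference dominates all competitors.
    """
    return scores[ref] - log_sum_exp(list(scores.values()))


def mce_loss(scores, ref, eta=1.0, alpha=1.0):
    """MCE-style loss: sigmoid of a smoothed misclassification measure.

    d = -g_ref + (1/eta) * log(mean(exp(eta * g_j))) over competitors j.
    A large eta approaches the single best competitor's score;
    alpha controls the steepness of the sigmoid.
    """
    competitors = [s for w, s in scores.items() if w != ref]
    anti = (log_sum_exp([eta * s for s in competitors])
            - math.log(len(competitors))) / eta
    z = alpha * (-scores[ref] + anti)
    # Evaluate the sigmoid in an overflow-safe way for either sign of z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical N-best list for one utterance (made-up log scores).
nbest = {"a b c": -100.0, "a b d": -102.0, "a e c": -105.0}
print(mmi_objective(nbest, "a b c"))  # about -0.133
print(mce_loss(nbest, "a b c"))       # about 0.066
```

Both functions measure how well the reference is separated from its competitors: MMI takes the log posterior of the reference over all hypotheses, while MCE passes a soft misclassification measure through a sigmoid, so the loss drops below 0.5 when the reference outscores the log-mean-exp of its competitors' scores. Discriminative training would then adjust the HMM parameters to increase the former or decrease the latter.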



Information

Published In


Computer Speech and Language Volume 24, Issue 4

October, 2010

229 pages

ISSN: 0885-2308


Copyright © 2009 Elsevier Ltd.

Publisher

Academic Press Ltd.

United Kingdom

Publication History

Published: 01 October 2010

Qualifiers

Article


Bibliometrics

Total Citations: 22
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 05 Mar 2025

Cited By

Kadyan V, Dua M, Dhiman P (2021). Enhancing accuracy of long contextual dependencies for Punjabi speech recognition system using deep LSTM. International Journal of Speech Technology 24:2, 517-527. DOI: 10.1007/s10772-021-09814-2. Online publication date: 1-Jun-2021.
https://dl.acm.org/doi/10.1007/s10772-021-09814-2
Ceritli T, Williams C, Geddes J (2020). ptype: probabilistic type inference. Data Mining and Knowledge Discovery 34:3, 870-904. DOI: 10.1007/s10618-020-00680-1. Online publication date: 1-May-2020.
https://dl.acm.org/doi/10.1007/s10618-020-00680-1
Rizk Y, Awad M, Tunstel E (2019). Cooperative Heterogeneous Multi-Robot Systems. ACM Computing Surveys 52:2, 1-31. DOI: 10.1145/3303848. Online publication date: 9-Apr-2019.
https://dl.acm.org/doi/10.1145/3303848
Haridas A, Marimuthu R, Sivakumar V (2018). A critical review and analysis on techniques of speech recognition. International Journal of Knowledge-based and Intelligent Engineering Systems 22:1, 39-57. DOI: 10.3233/KES-180374. Online publication date: 1-Jan-2018.
https://dl.acm.org/doi/10.3233/KES-180374
Becerra A, De La Rosa J, González E (2018). Speech recognition in a dialog system. Multimedia Tools and Applications 77:12, 15875-15911. DOI: 10.1007/s11042-017-5160-5. Online publication date: 1-Jun-2018.
https://dl.acm.org/doi/10.1007/s11042-017-5160-5
Chung H, Plourde E, Champagne B (2017). Single-channel enhancement of convolutive noisy speech based on a discriminative NMF algorithm. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2302-2306. DOI: 10.1109/ICASSP.2017.7952567. Online publication date: 5-Mar-2017.
https://dl.acm.org/doi/10.1109/ICASSP.2017.7952567
Vegesna V, Gurugubelli K, Vydana H, Pulugandla B, Shrivastava M, Vuppala A (2017). DNN-HMM Acoustic Modeling for Large Vocabulary Telugu Speech Recognition. Mining Intelligence and Knowledge Exploration, 189-197. DOI: 10.1007/978-3-319-71928-3_19. Online publication date: 13-Dec-2017.
https://dl.acm.org/doi/10.1007/978-3-319-71928-3_19
Ghalwash M, Ramljak D, Obradović Z (2015). Patient-specific early classification of multivariate observations. International Journal of Data Mining and Bioinformatics 11:4, 392-411. DOI: 10.1504/IJDMB.2015.067955. Online publication date: 1-Mar-2015.
https://dl.acm.org/doi/10.1504/IJDMB.2015.067955
Hughes D, Farrow N, Profita H, Correll N, Zhang Z, Cohen P, Bohus D, Horaud R, Meng H (2015). Detecting and Identifying Tactile Gestures using Deep Autoencoders, Geometric Moments and Gesture Level Features. Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, 415-422. DOI: 10.1145/2818346.2830601. Online publication date: 9-Nov-2015.
https://dl.acm.org/doi/10.1145/2818346.2830601
Zhou P, Jiang H, Dai L, Hu Y, Liu Q (2015). State-clustering based multiple deep neural networks modeling approach for speech recognition. IEEE/ACM Transactions on Audio, Speech and Language Processing 23:4, 631-642. DOI: 10.1109/TASLP.2015.2392944. Online publication date: 1-Apr-2015.
https://dl.acm.org/doi/10.1109/TASLP.2015.2392944
