Discriminative training of HMMs for automatic speech recognition: A survey

Published: 01 October 2010

Abstract

Discriminative training (DT) methods have recently achieved tremendous progress in automatic speech recognition (ASR). This survey reviews all mainstream DT methods in speech recognition from both theoretical and practical perspectives. On the theoretical side, it first introduces many effective discriminative learning criteria used in ASR and then presents a unifying view that elucidates the relationships among these popular DT criteria, which were originally proposed from different viewpoints. Next, it summarizes the key methods used to optimize these criteria and discusses their convergence properties. As a recent advance, a novel discriminative learning framework is introduced as a general scheme for formulating discriminative training of HMMs for ASR, from which a variety of new DT methods can be developed. On the practical side, the survey discusses important implementation issues in conducting DT for large-vocabulary ASR, such as efficient implementation of discriminative training on word graphs and effective optimization of complex DT objective functions in high-dimensional spaces. Finally, the paper concludes with a summary and some possible directions for future research. As a technical survey, it reviews and discusses all DT techniques and ideas at a high level, without going into much technical detail or many experimental results.

Published In

Computer Speech and Language, Volume 24, Issue 4
October 2010
229 pages

Publisher

Academic Press Ltd.

United Kingdom
