Maximum A Posteriori Linear Regression for language recognition

Published: 01 March 2012

Abstract

This paper proposes the use of Maximum A Posteriori Linear Regression (MAPLR) transforms as features for language recognition. Rather than estimating the transforms with maximum likelihood linear regression (MLLR), MAPLR incorporates prior information about the transforms into the estimation process, using maximum a posteriori (MAP) as the estimation criterion to derive the transforms. Through multiple MAPLR adaptations, each spoken utterance is converted into one discriminative transform supervector consisting of one target-language transform vector and the other non-target transform vectors. SVM classifiers are employed to model the discriminative MAPLR transform supervectors. This system achieves performance comparable to that of state-of-the-art approaches and better than that of MLLR. Experimental results on the 2007 NIST Language Recognition Evaluation (LRE) databases show relative reductions of 4% in EER and 9% in minimum cost on the 30-s task when the language recognition system uses MAPLR instead of MLLR, and further improvement is gained by combining it with state-of-the-art systems. The combination yields gains of 6% in EER and 11% in minDCF compared with combining only the MMI system and the GMM-SVM system.
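The supervector construction described in the abstract can be illustrated with a minimal sketch. It assumes each language-dependent adaptation yields one d x (d+1) transform [A | b], that a transform is vectorised row by row, and that the non-target transforms follow the target in sorted language order. The function names, the ordering, and the toy values are assumptions made for illustration and are not taken from the paper; estimating the transforms themselves is not shown.

```python
from typing import Dict, List

Matrix = List[List[float]]  # one d x (d+1) transform [A | b], stored row by row


def flatten(transform: Matrix) -> List[float]:
    """Vectorise one transform by concatenating its rows."""
    return [value for row in transform for value in row]


def build_supervector(transforms: Dict[str, Matrix], target: str) -> List[float]:
    """Stack the target-language transform first, then the non-target
    transforms in sorted language order (the ordering is an assumption)."""
    if target not in transforms:
        raise KeyError(f"no transform for target language {target!r}")
    supervector = flatten(transforms[target])
    for language in sorted(transforms):
        if language != target:
            supervector.extend(flatten(transforms[language]))
    return supervector


# Toy example: 2-dimensional features and three hypothetical languages.
toy_transforms = {
    "english": [[1.0, 0.0, 0.1], [0.0, 1.0, -0.2]],
    "mandarin": [[0.9, 0.1, 0.0], [0.2, 0.8, 0.3]],
    "spanish": [[1.1, -0.1, 0.5], [0.0, 1.0, 0.0]],
}
toy_supervector = build_supervector(toy_transforms, target="mandarin")
```

With three languages and 2 x 3 transforms, the resulting supervector has 18 entries. In a full system, a vector of this kind would be the input to the SVM classifier.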

Published In

Expert Systems with Applications: An International Journal  Volume 39, Issue 4
March, 2012
736 pages

Publisher

Pergamon Press, Inc.

United States

Author Tags

  1. Language recognition
  2. MAPLR
  3. MLLR
  4. SVM

Qualifiers

  • Article
