Abstract
Batch Normalization has shown success in image classification and other image-processing tasks by reducing internal covariate shift during the training of deep network models. In this paper, we propose applying batch normalization to speech recognition within the hybrid NN-HMM model. We evaluate the performance of this method in the acoustic model of the hybrid system on a speaker-independent speech recognition task using several Chinese datasets. Compared to our previous best model on these datasets, batch normalization achieves a relative word error rate (WER) reduction of 8%–13%, while requiring only 60% of the original model's training iterations.
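As background for the abstract, the batch normalization transform of Ioffe and Szegedy normalizes each hidden unit's pre-activations over a mini-batch to zero mean and unit variance, then applies a learned scale and shift. The sketch below is a minimal NumPy illustration of that forward pass as it might be applied to one hidden layer of an acoustic model; the function name, dimensions, and inputs are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Batch normalization over a mini-batch of hidden activations.

    x:     (batch_size, num_units) pre-activations of one hidden layer
    gamma: (num_units,) learned per-unit scale
    beta:  (num_units,) learned per-unit shift
    """
    mu = x.mean(axis=0)                    # per-unit mini-batch mean
    var = x.var(axis=0)                    # per-unit mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize: zero mean, unit variance
    return gamma * x_hat + beta            # scale and shift restore expressiveness

# Illustrative mini-batch: 64 frames of 40-dimensional acoustic features
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(64, 40))
out = batch_norm_forward(x, gamma=np.ones(40), beta=np.zeros(40))
```

With `gamma = 1` and `beta = 0`, the output of each unit has mean approximately 0 and variance approximately 1 across the mini-batch regardless of the input distribution, which is the property that stabilizes the distribution of layer inputs during training.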
Acknowledgment
This work is jointly supported by the Science and Technology Commission of Shanghai Municipality under research grants 14511105500 and 14DZ2260800.
Copyright information
© 2016 Springer Nature Singapore Pte Ltd.
Cite this paper
Zhan, H., Chen, G., Lu, Y. (2016). Applying Batch Normalization to Hybrid NN-HMM Model For Speech Recognition. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds) Pattern Recognition. CCPR 2016. Communications in Computer and Information Science, vol 663. Springer, Singapore. https://doi.org/10.1007/978-981-10-3005-5_35
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3004-8
Online ISBN: 978-981-10-3005-5
eBook Packages: Computer Science (R0)