Abstract
Batch Normalization has shown success in image classification and other image-processing tasks by reducing internal covariate shift during the training of deep network models. In this paper, we propose applying batch normalization to speech recognition within the hybrid NN-HMM model. We evaluate the performance of this method in the acoustic model of the hybrid system on a speaker-independent speech recognition task using several Chinese datasets. Compared to our previous best model on these datasets, batch normalization achieves a relative word error rate (WER) reduction of 8%–13%, while requiring only 60% of the original model's training iterations.
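As background for the abstract, the batch normalization transform of Ioffe and Szegedy normalizes each hidden unit's pre-activations over a mini-batch to zero mean and unit variance, then applies a learned scale and shift. The sketch below is a minimal NumPy illustration of that forward pass as it might be applied to one hidden layer of an acoustic model; the function name, dimensions, and inputs are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Batch normalization over a mini-batch of hidden activations.

    x:     (batch_size, num_units) pre-activations of one hidden layer
    gamma: (num_units,) learned per-unit scale
    beta:  (num_units,) learned per-unit shift
    """
    mu = x.mean(axis=0)                    # per-unit mini-batch mean
    var = x.var(axis=0)                    # per-unit mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize: zero mean, unit variance
    return gamma * x_hat + beta            # scale and shift restore expressiveness

# Illustrative mini-batch: 64 frames of 40-dimensional acoustic features
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(64, 40))
out = batch_norm_forward(x, gamma=np.ones(40), beta=np.zeros(40))
```

With `gamma = 1` and `beta = 0`, the output of each unit has mean approximately 0 and variance approximately 1 across the mini-batch regardless of the input distribution, which is the property that stabilizes the distribution of layer inputs during training.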
Acknowledgment
This work is jointly supported by the Science and Technology Commission of Shanghai Municipality under research grants 14511105500 and 14DZ2260800.
Copyright information
© 2016 Springer Nature Singapore Pte Ltd.
Cite this paper
Zhan, H., Chen, G., Lu, Y. (2016). Applying Batch Normalization to Hybrid NN-HMM Model For Speech Recognition. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds) Pattern Recognition. CCPR 2016. Communications in Computer and Information Science, vol 663. Springer, Singapore. https://doi.org/10.1007/978-981-10-3005-5_35
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3004-8
Online ISBN: 978-981-10-3005-5
eBook Packages: Computer Science (R0)