Abstract
Multi-channel blind source separation (BSS) methods require more than one microphone, so there remains a need for speech separation algorithms that operate with a single microphone. In this paper we propose a method for single-channel speech separation (SCSS) that combines empirical mode decomposition (EMD) with speech-specific information. The speech-specific information is derived in the form of source-filter features: source features are obtained from multi-pitch information, and filter information is estimated through formant analysis. To track multi-pitch information in the mixed signal, we apply simplified inverse filter tracking (SIFT) and histogram-based pitch estimation to the excitation source information. Formants are estimated using linear predictive (LP) analysis. Pitch and formant estimation are performed both with and without EMD, for better extraction of the individual speakers in the mixture. Combining EMD with speech-specific information provides encouraging results for single-channel speech separation.
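The abstract's filter (vocal-tract) information comes from formant analysis of LP coefficients. As a rough illustration of that step only (not the authors' implementation; the LP order, thresholds, and synthetic test signal below are assumptions made for the sketch), formants can be read off the angles of the LP polynomial roots:

```python
import numpy as np

def lpc(x, order):
    """LP coefficients a[0..order] (a[0] = 1) via the autocorrelation
    method and the Levinson-Durbin recursion."""
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    a = np.array([1.0])
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:], r[i - 1 : 0 : -1])) / err
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]
        err *= 1.0 - k * k
    return a

def formants(a, fs):
    """Formant candidates (Hz): angles of LP roots in the upper
    half-plane, keeping only plausible (narrow-bandwidth) poles."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 1e-3]        # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bws = -np.log(np.abs(roots)) * fs / np.pi   # pole radius -> bandwidth
    keep = (freqs > 90) & (bws < 400)           # illustrative thresholds
    return np.sort(freqs[keep])

# Synthetic check: a two-formant all-pole "vocal tract".
fs = 8000

def pole_pair(f, bw):
    """Denominator of a resonator with centre f Hz, bandwidth bw Hz."""
    r = np.exp(-np.pi * bw / fs)
    w = 2 * np.pi * f / fs
    return np.poly([r * np.exp(1j * w), r * np.exp(-1j * w)]).real

a_true = np.convolve(pole_pair(700, 80), pole_pair(1900, 120))

# Impulse response of 1/A(z) via its difference equation.
y = np.zeros(512)
for n in range(len(y)):
    acc = 1.0 if n == 0 else 0.0
    for k in range(1, len(a_true)):
        if n - k >= 0:
            acc -= a_true[k] * y[n - k]
    y[n] = acc

f_est = formants(lpc(y, 4), fs)
print(f_est)  # close to [700, 1900]
```

Picking roots by bandwidth, as above, is standard practice in LPC-based formant estimation; real speech additionally requires pre-emphasis, windowing, and a higher LP order than this toy example uses.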
Cite this article
Prasanna Kumar, M.K., Kumaraswamy, R. Single-channel speech separation using combined EMD and speech-specific information. Int J Speech Technol 20, 1037–1047 (2017). https://doi.org/10.1007/s10772-017-9468-3