Abstract
Multi-channel blind source separation (BSS) methods require more than one microphone, so there remains a need for speech separation algorithms that operate with a single microphone. In this paper we propose a method for single-channel speech separation (SCSS) that combines empirical mode decomposition (EMD) with speech-specific information. The speech-specific information is derived in the form of source-filter features: source features are obtained from multi-pitch information, and filter information is estimated through formant analysis. To track multi-pitch information in the mixed signal, we apply simplified inverse filter tracking (SIFT) and histogram-based pitch estimation to the excitation source information. Formants are estimated using linear predictive (LP) analysis. Pitch and formant estimation are performed both with and without EMD, for better extraction of the individual speakers in the mixture. Combining EMD with speech-specific information provides encouraging results for single-channel speech separation.
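The abstract's filter (vocal-tract) information comes from formant analysis of LP coefficients. As a rough illustration of that step only (not the authors' implementation; the LP order, thresholds, and synthetic test signal below are assumptions made for the sketch), formants can be read off the angles of the LP polynomial roots:

```python
import numpy as np

def lpc(x, order):
    """LP coefficients a[0..order] (a[0] = 1) via the autocorrelation
    method and the Levinson-Durbin recursion."""
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    a = np.array([1.0])
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:], r[i - 1 : 0 : -1])) / err
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]
        err *= 1.0 - k * k
    return a

def formants(a, fs):
    """Formant candidates (Hz): angles of LP roots in the upper
    half-plane, keeping only plausible (narrow-bandwidth) poles."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 1e-3]        # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bws = -np.log(np.abs(roots)) * fs / np.pi   # pole radius -> bandwidth
    keep = (freqs > 90) & (bws < 400)           # illustrative thresholds
    return np.sort(freqs[keep])

# Synthetic check: a two-formant all-pole "vocal tract".
fs = 8000

def pole_pair(f, bw):
    """Denominator of a resonator with centre f Hz, bandwidth bw Hz."""
    r = np.exp(-np.pi * bw / fs)
    w = 2 * np.pi * f / fs
    return np.poly([r * np.exp(1j * w), r * np.exp(-1j * w)]).real

a_true = np.convolve(pole_pair(700, 80), pole_pair(1900, 120))

# Impulse response of 1/A(z) via its difference equation.
y = np.zeros(512)
for n in range(len(y)):
    acc = 1.0 if n == 0 else 0.0
    for k in range(1, len(a_true)):
        if n - k >= 0:
            acc -= a_true[k] * y[n - k]
    y[n] = acc

f_est = formants(lpc(y, 4), fs)
print(f_est)  # close to [700, 1900]
```

Picking roots by bandwidth, as above, is standard practice in LPC-based formant estimation; real speech additionally requires pre-emphasis, windowing, and a higher LP order than this toy example uses.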
Cite this article
Prasanna Kumar, M.K., Kumaraswamy, R. Single-channel speech separation using combined EMD and speech-specific information. Int J Speech Technol 20, 1037–1047 (2017). https://doi.org/10.1007/s10772-017-9468-3