
Single-channel speech separation using combined EMD and speech-specific information

  • Published in: International Journal of Speech Technology

Abstract

Multi-channel blind source separation (BSS) methods require more than one microphone, so there is a need for speech separation algorithms that work in a single-microphone scenario. In this paper we propose a method for single-channel speech separation (SCSS) that combines empirical mode decomposition (EMD) with speech-specific information. The speech-specific information is derived in the form of source-filter features: source features are obtained from multi-pitch information, and filter information is estimated through formant analysis. To track multi-pitch information in the mixed signal, we apply simplified inverse filter tracking (SIFT) and histogram-based pitch estimation to the excitation source information. Formant estimation is performed using linear predictive (LP) analysis. Pitch and formant estimation are carried out both with and without EMD decomposition for better extraction of the individual speakers in the mixture. Combining EMD with speech-specific information yields encouraging results for single-channel speech separation.
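The source-filter stage of the abstract can be illustrated with a short sketch. Below is a minimal, hypothetical Python implementation of the two speech-specific features the paper names: SIFT-style pitch estimation (LP inverse filtering followed by autocorrelation of the residual) and formant estimation from the angles of the LP poles. Function names, the LP order, and the pole-magnitude threshold are illustrative choices, not the paper's; the EMD decomposition and histogram-based multi-pitch tracking of the full method are omitted here.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lp_coefficients(frame, order=12):
    """Autocorrelation-method LP analysis: returns A(z) = [1, -a1, ..., -ap]."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]  # r[0..p]
    a = solve_toeplitz(r[:order], r[1:])  # normal equations: T a = [r1..rp]
    return np.concatenate(([1.0], -a))

def sift_pitch(frame, fs, order=12, fmin=60.0, fmax=400.0):
    """SIFT-style pitch: inverse-filter, then autocorrelate the LP residual."""
    A = lp_coefficients(frame, order)
    residual = lfilter(A, [1.0], frame)  # excitation (source) estimate
    ac = np.correlate(residual, residual, mode="full")[len(frame) - 1 :]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag

def lp_formants(frame, fs, order=12):
    """Formant candidates from angles of the high-magnitude LP poles."""
    roots = np.roots(lp_coefficients(frame, order))
    roots = roots[(np.imag(roots) > 0) & (np.abs(roots) > 0.85)]
    return np.sort(np.angle(roots) * fs / (2.0 * np.pi))

# Demo on a synthetic "vowel": 100 Hz impulse train through one 800 Hz resonator.
fs = 8000
excitation = np.zeros(800)
excitation[::80] = 1.0  # period 80 samples -> f0 = 100 Hz
rad, theta = np.exp(-np.pi * 100.0 / fs), 2 * np.pi * 800.0 / fs
vowel = lfilter([1.0], [1.0, -2 * rad * np.cos(theta), rad * rad], excitation)

f0 = sift_pitch(vowel, fs)       # pitch (source feature)
formants = lp_formants(vowel, fs)  # formant candidates (filter feature)
```

On the synthetic vowel above, the pitch estimate lands near 100 Hz and the formant candidates include a value near the 800 Hz resonance; in the paper's method these two feature streams are extracted both from the raw mixture and from its EMD modes.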




Corresponding author

Correspondence to M. K. Prasanna Kumar.


Cite this article

Prasanna Kumar, M.K., Kumaraswamy, R. Single-channel speech separation using combined EMD and speech-specific information. Int J Speech Technol 20, 1037–1047 (2017). https://doi.org/10.1007/s10772-017-9468-3
