Abstract
We address the problem of blind source separation from a single-channel audio signal using a statistical model of the sources. We modify the Bark-scale-aligned wavelet packet decomposition to acquire an approximate shiftability property, and we allow oversampling in some decomposition nodes to equalize the sampling rate in all terminal nodes. Statistical models are trained from samples of each source separately, and the separation is performed using these models. The proposed psychoacoustically motivated non-uniform filterbank structure reduces the signal space dimension and simplifies the training procedure of the statistical model. In our experiments, the proposed algorithm performs better than a competing algorithm. We also study the effect that different wavelet families have on the performance of the proposed signal analysis in the single-channel source separation task.
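To illustrate why a Bark-aligned filterbank is non-uniform, the following minimal sketch computes band edges that are equally wide on the Bark scale and shows that their widths in Hz grow with frequency. It is an illustration under stated assumptions, not the decomposition used in the article: the Traunmüller approximation of the Bark scale, the 16 kHz sampling rate, the number of bands, and the function names are all chosen here for the example. An actual wavelet packet tree can only place band edges at dyadic fractions of the Nyquist frequency, so it would approximate such edges rather than match them.

```python
def hz_to_bark(f_hz):
    """Traunmueller approximation of the Bark scale (assumed here;
    the article may use a different Bark definition)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z):
    """Algebraic inverse of hz_to_bark, valid for z < 26.28."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


def bark_band_edges(fs_hz, n_bands):
    """Edges in Hz of n_bands bands of equal Bark width covering 0..fs/2.

    fs_hz and n_bands are hypothetical parameters for illustration only.
    """
    z_lo = hz_to_bark(0.0)
    z_hi = hz_to_bark(fs_hz / 2.0)
    step = (z_hi - z_lo) / n_bands
    return [bark_to_hz(z_lo + k * step) for k in range(n_bands + 1)]


# Assumed example settings: 16 kHz sampling rate, 8 bands.
edges = bark_band_edges(fs_hz=16000.0, n_bands=8)
widths = [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]

for lo, hi in zip(edges[:-1], edges[1:]):
    print(f"{lo:8.1f} Hz .. {hi:8.1f} Hz  (width {hi - lo:7.1f} Hz)")
```

Because the bands widen toward high frequencies, far fewer of them are needed to cover the spectrum than with a uniform split at the resolution of the lowest band, which is consistent with the reduction in signal space dimension described above.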
Additional information
This work was supported by the Israel Science Foundation under Grant 1085/05 and by the European Commission under project Memories FP6-IST-035300.
About this article
Cite this article
Litvin, Y., Cohen, I. Single-Channel Source Separation of Audio Signals Using Bark Scale Wavelet Packet Decomposition. J Sign Process Syst 65, 339–350 (2011). https://doi.org/10.1007/s11265-010-0510-9
DOI: https://doi.org/10.1007/s11265-010-0510-9