Multichannel Audio Source Separation Exploiting NMF-Based Generic Source Spectral Model in Gaussian Modeling Framework

Thanh Thi Hien Duong, Ngoc Q. K. Duong, Cong-Phuong Nguyen & Quoc-Cuong Nguyen

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10891)

Included in the following conference series:

International Conference on Latent Variable Analysis and Signal Separation


Abstract

Nonnegative matrix factorization (NMF) is well known as a powerful spectral model for audio signals. Existing work, including ours, has investigated generic source spectral models (GSSM) based on NMF for single-channel audio source separation and shown their efficiency in different settings. This paper extends that work to the multichannel case, where the GSSM is combined with the source spatial covariance model within a unified Gaussian modeling framework. In particular, unlike the conventional combination, in which the estimated variances of each source are further constrained by NMF separately, we propose to constrain the total variances of all sources jointly, and we find that this yields better separation performance. We present an expectation-maximization (EM) algorithm for parameter estimation and demonstrate the effectiveness of the proposed approach on a benchmark dataset provided within the 2016 Signal Separation Evaluation Campaign.
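To make the modeling idea in the abstract concrete, the following is a minimal sketch of the separation step only, not the chapter's EM algorithm. It assumes the usual local Gaussian model, in which the image of source j at time-frequency bin (f, n) is zero-mean complex Gaussian with covariance v_j(f, n) R_j(f), and it assumes the variances are already given by an NMF product W_j H_j. The function name `wiener_separate`, the array layouts, and the regularizer `eps` are hypothetical choices made for illustration.

```python
import numpy as np

def wiener_separate(X, W_list, H_list, R_list, eps=1e-9):
    """Multichannel Wiener filtering under a local Gaussian model (illustrative).

    X         : (F, N, I) complex mixture STFT (F bins, N frames, I channels)
    W_list[j] : (F, K_j) nonnegative spectral basis of source j (assumed given)
    H_list[j] : (K_j, N) nonnegative activations of source j (assumed given)
    R_list[j] : (F, I, I) Hermitian positive-definite spatial covariance of source j
    Returns a list of (F, N, I) estimated source images, one per source.
    """
    # NMF-parameterized source variances v_j(f, n) = [W_j H_j](f, n)
    V = [W @ H + eps for W, H in zip(W_list, H_list)]

    # Per-source covariance v_j(f, n) * R_j(f), broadcast to shape (F, N, I, I)
    Sigma_c = [v[:, :, None, None] * R[:, None, :, :] for v, R in zip(V, R_list)]

    # Mixture covariance is the sum of the source covariances
    Sigma_x = sum(Sigma_c)
    Sigma_x_inv = np.linalg.inv(Sigma_x)  # batched inverse over (F, N)

    estimates = []
    for S in Sigma_c:
        G = S @ Sigma_x_inv  # Wiener gain for this source at every (f, n)
        estimates.append(np.einsum("fnij,fnj->fni", G, X))
    return estimates
```

Because the Wiener gains of all sources sum to the identity matrix at every bin, the estimated source images add back up to the mixture. In a full system the quantities W_j, H_j and R_j would be estimated from the mixture, for example by an EM procedure; here they are simply taken as inputs.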

Author information

Authors and Affiliations

International Research Institute MICA, Hanoi University of Science and Technology, Hanoi, Vietnam
Thanh Thi Hien Duong & Cong-Phuong Nguyen
Information Technology Faculty, Hanoi University of Mining and Geology, Hanoi, Vietnam
Thanh Thi Hien Duong
Imaging Science Lab, Technicolor, Cesson-Sévigné, France
Ngoc Q. K. Duong
Department of Instrumentation and Industrial Informatic, Hanoi University of Science and Technology, Hanoi, Vietnam
Cong-Phuong Nguyen & Quoc-Cuong Nguyen

Corresponding author

Correspondence to Thanh Thi Hien Duong.

Editor information

Editors and Affiliations

Paul Sabatier University, Toulouse, France
Yannick Deville
Bar-Ilan University, Ramat Gan, Israel
Sharon Gannot
University of Surrey, Guildford, United Kingdom
Russell Mason
University of Surrey, Guildford, United Kingdom
Mark D. Plumbley
University of Surrey, Guildford, United Kingdom
Dominic Ward

About this paper

Cite this paper

Duong, T.T.H., Duong, N.Q.K., Nguyen, C.-P., Nguyen, Q.-C. (2018). Multichannel Audio Source Separation Exploiting NMF-Based Generic Source Spectral Model in Gaussian Modeling Framework. In: Deville, Y., Gannot, S., Mason, R., Plumbley, M., Ward, D. (eds) Latent Variable Analysis and Signal Separation. LVA/ICA 2018. Lecture Notes in Computer Science, vol 10891. Springer, Cham. https://doi.org/10.1007/978-3-319-93764-9_50

DOI: https://doi.org/10.1007/978-3-319-93764-9_50
Published: 06 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93763-2
Online ISBN: 978-3-319-93764-9
eBook Packages: Computer Science, Computer Science (R0)
