Abstract
In this paper we propose a monophonically constrained signal decomposition model applied to polyphonic signals composed of several monophonic sources from different musical instruments. The model also imposes a harmonic constraint, which is particularly effective for tonal instruments because each note is associated with a unique basis. The monophonic constraint is implemented by enforcing a single non-zero gain per instrument in the factorization process. The proposed method uses instrument models previously trained with a supervised procedure. Both constraints (harmonic and monophonic) are implemented in a deterministic manner. The proposed method was tested on two audio signal applications: sound source separation and automatic music transcription. In a comparison with other state-of-the-art methods on a dataset of polyphonic mixtures composed of monophonic sources, it produced competitive results.
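The idea of a single non-zero gain per instrument can be illustrated with a minimal sketch. This is not the algorithm of the paper. It assumes a Euclidean cost, fixed pre-trained bases with one column per note, the standard multiplicative gain update for non-negative factorization, and a simple "keep the largest gain per instrument and frame" rule applied frame by frame. The function name `monophonic_gains`, its parameters, and the toy bases are all hypothetical.

```python
import numpy as np

def monophonic_gains(X, bases, n_iter=100, eps=1e-12):
    """Illustrative sketch only (not the paper's method).

    X     : (F, T) non-negative magnitude spectrogram.
    bases : list of (F, P_j) non-negative arrays, one per instrument,
            one column per note; assumed pre-trained and kept fixed.
    Returns a list of (P_j, T) gain matrices, each with at most one
    non-zero entry per frame (column).
    """
    W = np.hstack(bases)                       # all note bases side by side
    H = np.ones((W.shape[1], X.shape[1]))      # non-negative initial gains

    def update(H):
        # Multiplicative update for the Euclidean cost with W fixed.
        return H * (W.T @ X) / (W.T @ W @ H + eps)

    # Step 1: unconstrained gain estimation.
    for _ in range(n_iter):
        H = update(H)

    # Step 2 (assumed rule): keep only the largest gain of each
    # instrument in each frame.
    mask = np.zeros_like(H)
    frames = np.arange(X.shape[1])
    start = 0
    for B in bases:
        stop = start + B.shape[1]
        winners = np.argmax(H[start:stop], axis=0)
        mask[start + winners, frames] = 1.0
        start = stop
    H = H * mask

    # Step 3: re-estimate the retained gains; multiplicative updates
    # leave the zeroed entries at zero.
    for _ in range(n_iter):
        H = update(H)

    split_points = np.cumsum([B.shape[1] for B in bases])[:-1]
    return np.split(H, split_points, axis=0)

# Toy data: two instruments with two "notes" each (made-up spectra).
A = np.array([[1, 0.5, 0, 0, 0, 0], [0, 1, 0.5, 0, 0, 0]], dtype=float).T
B = np.array([[0, 0, 0, 1, 0.5, 0], [0, 0, 0, 0, 1, 0.5]], dtype=float).T
X = np.column_stack([2 * A[:, 0] + 3 * B[:, 1],   # frame 0: both play
                     1 * A[:, 1]])                # frame 1: only A plays
G_a, G_b = monophonic_gains(X, [A, B])
```

Because the bases are fixed, each retained gain can be read directly as the activation of one note of one instrument, which is the property that makes such a decomposition usable for both separation and transcription.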
Acknowledgements
This work was supported by the Andalusian Business, Science and Innovation Council under Project P10-TIC-6762 (FEDER), the Spanish Ministry of Science and Innovation under Project TEC2009-14414-C03-02, and the University of Jaen under Project R1/12/2010/64.
The authors would like to thank Z. Duan for kindly sharing his annotated real world music database with them.
Cite this article
Rodríguez-Serrano, F.J., Carabias-Orti, J.J., Vera-Candeas, P. et al. Monophonic constrained non-negative sparse coding using instrument models for audio separation and transcription of monophonic source-based polyphonic mixtures. Multimed Tools Appl 72, 925–949 (2014). https://doi.org/10.1007/s11042-013-1398-8