
Masked Conditional Neural Networks for Audio Classification

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2017 (ICANN 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10614)


Abstract

We present the ConditionaL Neural Network (CLNN) and the Masked ConditionaL Neural Network (MCLNN), designed for temporal signal recognition. The CLNN accounts for the temporal nature of the sound signal, and the MCLNN extends the CLNN with a binary mask that preserves the spatial locality of the features and allows automated exploration of feature combinations, analogous to hand-crafting the most relevant features for the recognition task. The MCLNN achieves competitive recognition accuracies on the GTZAN and ISMIR2004 music datasets, surpassing several state-of-the-art neural-network-based architectures and hand-crafted methods applied to both datasets.
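The chapter defines the model formally; the minimal NumPy sketch below only illustrates the two ideas named in the abstract: conditioning each frame's activation on a window of neighbouring frames (CLNN) and element-wise masking of the weights with a fixed binary pattern (MCLNN). The window order, the banded mask layout, and all names used here (banded_mask, mclnn_step, bandwidth, overlap) are illustrative assumptions, not the authors' reference implementation.

import numpy as np

def banded_mask(n_features, n_hidden, bandwidth=5, overlap=3):
    """Binary mask with a banded pattern (assumption): each hidden unit sees
    only a contiguous band of input features, with neighbouring units' bands
    shifted so that they partially overlap."""
    mask = np.zeros((n_features, n_hidden))
    step = max(bandwidth - overlap, 1)          # shift between successive bands
    for j in range(n_hidden):
        start = (j * step) % n_features
        idx = (start + np.arange(bandwidth)) % n_features
        mask[idx, j] = 1.0
    return mask

def mclnn_step(frames, n, weights, mask, bias):
    """Activation for frame n, conditioned on the window of frames n-d .. n+d.

    frames  : (n_frames, n_features) spectrogram-like input
    weights : list of 2d+1 matrices, one per window offset, each (n_features, n_hidden)
    mask    : (n_features, n_hidden) binary mask shared across the window
    """
    d = (len(weights) - 1) // 2                 # temporal order of the window
    z = bias.copy()
    for u, W in zip(range(-d, d + 1), weights):
        z += frames[n + u] @ (W * mask)         # masked weights enforce feature locality
    return np.maximum(z, 0.0)                   # ReLU-style nonlinearity (assumption)

# Toy usage with random data
rng = np.random.default_rng(0)
n_frames, n_features, n_hidden, d = 20, 40, 16, 2
frames = rng.standard_normal((n_frames, n_features))
weights = [rng.standard_normal((n_features, n_hidden)) * 0.1 for _ in range(2 * d + 1)]
mask = banded_mask(n_features, n_hidden)
bias = np.zeros(n_hidden)
h = mclnn_step(frames, n=5, weights=weights, mask=mask, bias=bias)
print(h.shape)  # (16,)

Setting the mask to all ones recovers the unmasked CLNN behaviour, which is one way to see the mask as the only difference between the two models described in the abstract.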




Acknowledgments

This work is funded by the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 608014 (CAPACITIE).

Author information


Corresponding author

Correspondence to Fady Medhat.



Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Medhat, F., Chesmore, D., Robinson, J. (2017). Masked Conditional Neural Networks for Audio Classification. In: Lintas, A., Rovetta, S., Verschure, P., Villa, A. (eds) Artificial Neural Networks and Machine Learning – ICANN 2017. ICANN 2017. Lecture Notes in Computer Science, vol 10614. Springer, Cham. https://doi.org/10.1007/978-3-319-68612-7_40


  • DOI: https://doi.org/10.1007/978-3-319-68612-7_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68611-0

  • Online ISBN: 978-3-319-68612-7

  • eBook Packages: Computer Science, Computer Science (R0)
