
Combining Acoustic and Multilevel Visual Features for Music Genre Classification

Published: 24 August 2015

Abstract

Most music genre classification approaches extract acoustic features from frames to capture timbre information, leading to the common framework of bag-of-frames analysis. However, time-frequency analysis is also vital for modeling music genres. This article proposes multilevel visual features for extracting spectrogram textures and their temporal variations. A confidence-based late fusion is proposed for combining the acoustic and visual features. The experimental results indicated that the proposed method achieved an accuracy improvement of approximately 14% and 2% in the world's largest benchmark dataset (MASD) and Unique dataset, respectively. In particular, the proposed approach won the Music Information Retrieval Evaluation eXchange (MIREX) music genre classification contests from 2011 to 2013, demonstrating the feasibility and necessity of combining acoustic and visual features for classifying music genres.
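
To make the idea of confidence-based late fusion concrete, the following is a minimal sketch, not the article's actual fusion rule, which this page does not specify. It assumes each classifier (acoustic and visual) outputs a per-genre probability vector, and it assumes a hypothetical confidence measure: the gap between the two highest probabilities. The function names `confidence`, `late_fusion`, and `predict` are illustrative only.

```python
from typing import List, Sequence


def confidence(probs: Sequence[float]) -> float:
    """Hypothetical confidence measure (an assumption, not from the article):
    the gap between the two largest class probabilities."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1] if len(top) > 1 else top[0]


def late_fusion(acoustic: Sequence[float], visual: Sequence[float]) -> List[float]:
    """Fuse two per-genre probability vectors, weighting each by its confidence."""
    if len(acoustic) != len(visual):
        raise ValueError("both classifiers must score the same set of genres")
    w_a, w_v = confidence(acoustic), confidence(visual)
    total = w_a + w_v
    if total == 0:
        # Both classifiers are undecided: fall back to a plain average.
        w_a = w_v = 0.5
    else:
        w_a, w_v = w_a / total, w_v / total
    return [w_a * a + w_v * v for a, v in zip(acoustic, visual)]


def predict(acoustic: Sequence[float], visual: Sequence[float],
            genres: Sequence[str]) -> str:
    """Return the genre with the highest fused score."""
    fused = late_fusion(acoustic, visual)
    best = max(range(len(fused)), key=fused.__getitem__)
    return genres[best]


if __name__ == "__main__":
    genres = ["rock", "jazz", "classical"]
    acoustic_probs = [0.40, 0.35, 0.25]  # made-up scores: acoustic model is unsure
    visual_probs = [0.10, 0.80, 0.10]    # made-up scores: visual model is confident
    print(predict(acoustic_probs, visual_probs, genres))
```

In this sketch the more confident classifier dominates the fused score, so an uncertain acoustic vote for one genre can be overridden by a confident visual vote for another. Other confidence measures (for example, the maximum probability or the entropy of the distribution) could be substituted without changing the structure.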

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 12, Issue 1
August 2015
220 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/2816987
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 August 2015
Accepted: 01 April 2015
Revised: 01 January 2015
Received: 01 September 2014
Published in TOMM Volume 12, Issue 1

Author Tag

  1. Music genre classification

Qualifiers

  • Research-article
  • Research
  • Refereed

Article Metrics

  • Downloads (Last 12 months): 9
  • Downloads (Last 6 weeks): 2

Reflects downloads up to 18 Jan 2025

Cited By

  • (2024) Classification and study of music genres with multimodal Spectro-Lyrical Embeddings for Music (SLEM). Multimedia Tools and Applications. DOI: 10.1007/s11042-024-19160-5. Online publication date: 24-Apr-2024
  • (2023) PMG-Net: Persian music genre classification using deep neural networks. Entertainment Computing 44 (100518). DOI: 10.1016/j.entcom.2022.100518. Online publication date: Jan-2023
  • (2023) Time-frequency visual representation and texture features for audio applications: a comprehensive review, recent trends, and challenges. Multimedia Tools and Applications 82:23 (36143-36177). DOI: 10.1007/s11042-023-14734-1. Online publication date: 16-Mar-2023
  • (2022) The Classification of Music and Art Genres under the Visual Threshold of Deep Learning. Computational Intelligence and Neuroscience 2022. DOI: 10.1155/2022/4439738. Online publication date: 1-Jan-2022
  • (2022) Stacked auto-encoders based visual features for speech/music classification. Expert Systems with Applications: An International Journal 208:C. DOI: 10.1016/j.eswa.2022.118041. Online publication date: 1-Dec-2022
  • (2022) Classification of Music Genres Based on Mel-Frequency Cepstrum Coefficients Using Deep Learning Models. Disruptive Technologies for Big Data and Cloud Applications (891-907). DOI: 10.1007/978-981-19-2177-3_83. Online publication date: 2-Aug-2022
  • (2021) A Middle-Level Learning Feature Interaction Method with Deep Learning for Multi-Feature Music Genre Classification. Electronics 10:18 (2206). DOI: 10.3390/electronics10182206. Online publication date: 9-Sep-2021
  • (2021) Client-driven animated GIF generation framework using an acoustic feature. Multimedia Tools and Applications 80:28-29 (35923-35940). DOI: 10.1007/s11042-020-10236-6. Online publication date: 1-Nov-2021
  • (2020) Multi-Level Local Feature Coding Fusion for Music Genre Recognition. IEEE Access 8 (152713-152727). DOI: 10.1109/ACCESS.2020.3017661. Online publication date: 2020
  • (2019) Double Coated VGG16 Architecture: An Enhanced Approach for Genre Classification of Spectrographic Representation of Musical Pieces. 2019 22nd International Conference on Computer and Information Technology (ICCIT) (1-5). DOI: 10.1109/ICCIT48885.2019.9038339. Online publication date: Dec-2019
