Abstract
This paper presents a novel approach to music emotion classification that addresses several limitations of existing methods, including limited emotion categories, low accuracy, and results that run counter to the Valence-Arousal (V-A) model. The proposed method first separates music sources with the Multi-scale Multi-band DenseNet (MMDenseNet) model, decomposing the complex musical structure into four audio tracks: vocals, drums, bass, and other accompaniment. Time-series features and fixed attribute features are extracted from each track. The time-series features are fed into the Conformer model, which uses convolutional neural networks to extract local features and Transformer encoders to capture long-distance dependencies. The Conformer output is concatenated with the fixed attribute features and passed through fully connected layers for emotion classification. Experiments on a NetEase Cloud Music dataset with 12 emotion categories show that the proposed MSB-Conformer model outperforms comparative models (CNN+LSTM, WaveNet, Transformer), achieving an average accuracy of 94.24% and surpassing state-of-the-art methods in the number of emotion categories covered. The effectiveness and generality of the model were further validated on the Emotify dataset. By simplifying musical complexity through source separation and using the Conformer to jointly model local and global audio features, the proposed method offers a novel approach to music emotion classification tasks.
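The pipeline described above (source separation into stems, per-track feature extraction, a Conformer encoder, and a fully connected classifier head) can be summarized in a minimal sketch. The PyTorch code below is illustrative only and is not the authors' implementation: the feature dimensions, number of Conformer blocks, block internals, and the names ConformerBlock and MSBConformerClassifier are assumptions made for this example.

```python
# Minimal sketch of an MSB-Conformer-style pipeline, assuming illustrative
# dimensions; this is NOT the authors' code.
import torch
import torch.nn as nn


class ConformerBlock(nn.Module):
    """One Conformer-style block: self-attention for global context,
    a depthwise convolution module for local patterns."""
    def __init__(self, dim: int = 256, heads: int = 4, kernel_size: int = 15):
        super().__init__()
        self.attn_norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.conv_norm = nn.LayerNorm(dim)
        self.conv = nn.Sequential(
            nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim),
            nn.BatchNorm1d(dim),
            nn.SiLU(),
        )
        self.ffn = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim * 4),
                                 nn.SiLU(), nn.Linear(dim * 4, dim))

    def forward(self, x):  # x: (batch, time, dim)
        h = self.attn_norm(x)
        a, _ = self.attn(h, h, h)
        x = x + a                                    # long-distance dependencies
        c = self.conv(self.conv_norm(x).transpose(1, 2)).transpose(1, 2)
        x = x + c                                    # local features
        return x + self.ffn(x)


class MSBConformerClassifier(nn.Module):
    """Per-stem time-series features -> Conformer blocks -> pooled embedding,
    concatenated with fixed attribute features -> fully connected classifier."""
    def __init__(self, n_stems: int = 4, feat_dim: int = 80, dim: int = 256,
                 n_fixed: int = 16, n_classes: int = 12, n_blocks: int = 2):
        super().__init__()
        self.proj = nn.Linear(n_stems * feat_dim, dim)
        self.blocks = nn.Sequential(*[ConformerBlock(dim) for _ in range(n_blocks)])
        self.head = nn.Sequential(nn.Linear(dim + n_fixed, 128), nn.ReLU(),
                                  nn.Linear(128, n_classes))

    def forward(self, stem_feats, fixed_feats):
        # stem_feats: (batch, time, n_stems * feat_dim), e.g. time-series features
        # from the vocals / drums / bass / other tracks produced by a separator.
        # fixed_feats: (batch, n_fixed) fixed attribute features.
        x = self.blocks(self.proj(stem_feats))
        pooled = x.mean(dim=1)                       # average over time
        return self.head(torch.cat([pooled, fixed_feats], dim=-1))


# Toy usage: 8 clips, 200 frames, 4 stems x 80-dim features, 16 fixed attributes.
logits = MSBConformerClassifier()(torch.randn(8, 200, 4 * 80), torch.randn(8, 16))
print(logits.shape)  # torch.Size([8, 12])
```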
M. Fang, X. Li, S. Zhang and Q. Deng—These authors contributed equally to this work.
References
Takahashi, N., Mitsufuji, Y.: Multi-scale multi-band DenseNets for audio source separation. In: 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, pp. 21–25. IEEE (2017)
Peng, Z., Guo, Z., Huang, W., et al.: Conformer: local features coupling global representations for recognition and detection. IEEE Trans. Pattern Anal. Mach. Intell. 45(8), 9454–9468 (2023)
Fischer, M.T., Arya, D., Streeb, D., et al.: Visual analytics for temporal hypergraph model exploration. IEEE Trans. Vis. Comput. Graph. 27(2), 550–560 (2021)
Nilsonne, G., Harrell, F.E.: EEG-based model and antidepressant response. Nat. Biotechnol. 39(1), 27 (2020)
Li, B., Liu, X., Dinesh, K., et al.: Creating a multitrack classical music performance dataset for multimodal music analysis: challenges, insights, and applications. IEEE Trans. Multimed. 21(2), 522–535 (2019)
Tong, G.: Music emotion classification method using improved deep belief network. Mob. Inf. Syst. 2022, 1–7 (2022)
Xia, Y., Xu, F.: Study on music emotion recognition based on the machine learning model clustering algorithm. Math. Probl. Eng. 2022, 1–11 (2022)
Tzanetakis, G., Essl, G., Cook, P.: Audio analysis using the discrete wavelet transform. In: Proceedings of the WSES International Conference on Acoustics and Music: Theory and Applications (2001)
Pao, T.-L., Liao, W.-Y., Chen, Y.-T.: A weighted discrete KNN method for Mandarin speech and emotion recognition. InTech (2008)
Jin, A.W.Q.: Application of LDA to speaker recognition (2010)
Wang, Y., et al.: UniSpeech: unified speech representation learning with labeled and unlabeled data. In: Proceedings of the International Conference on Machine Learning, pp. 10937–10947. PMLR (2021)
Chen, S., et al.: WavLM: large-scale self-supervised pre-training for full stack speech processing. arXiv preprint arXiv:2110.13900 (2021)
Chen, Z., et al.: Large-scale self-supervised speech representation learning for automatic speaker verification. arXiv preprint arXiv:2110.05777 (2021)
Chen, S., et al.: Continuous speech separation with conformer. In: ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5749–5753. IEEE (2021)
Wu, Y.-C., Hayashi, T., Tobing, P.L., et al.: Quasi-periodic WaveNet: an autoregressive raw waveform generative model with pitch-dependent dilated convolution neural network. IEEE/ACM Trans. Audio Speech Lang. Process. (TBD)
Yu, L., Wang, L., Yang, C., et al.: Analysis and implementation of a single stage transformer-less converter with high step-down voltage gain for voltage regulator modules. IEEE Trans. Ind. Electron. 68(12), 12239–12249 (2021)
Music.163. https://music.163.com/
Yu, Y., Tong, X., Wang, H., et al.: Semantic-aware spatio-temporal app usage representation via graph convolutional network. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 4(3), 1–24 (2020)
Aljanaki, A., Wiering, F., Veltkamp, R.C.: Studying emotion induced by music through a crowdsourcing game. Inf. Process. Manag. (2015)
Zentner, M., Grandjean, D., Scherer, K.R.: Emotions evoked by the sound of music: characterization, classification, and measurement. Emotion 8(4), 494–521 (2008)
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Fang, M. et al. (2024). Music Emotion Classification with Source Separation Based MSB-Conformer. In: Huang, DS., Zhang, C., Chen, W. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2024. Lecture Notes in Computer Science, vol 14865. Springer, Singapore. https://doi.org/10.1007/978-981-97-5591-2_23
DOI: https://doi.org/10.1007/978-981-97-5591-2_23
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-5590-5
Online ISBN: 978-981-97-5591-2
eBook Packages: Computer Science, Computer Science (R0)