
Music Emotion Classification with Source Separation Based MSB-Conformer

  • Conference paper
  • In: Advanced Intelligent Computing Technology and Applications (ICIC 2024)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14865)

Abstract

This paper presents a novel approach to music emotion classification that addresses several limitations of prior work: a small number of emotion categories, low accuracy, and results that are counterintuitive under the Valence-Arousal (V-A) model. The method first separates each piece of music with the Multi-Scale Multi-Band DenseNets model into four audio tracks (vocals, drums, bass, and other), simplifying the complex musical structure. Time-series features and fixed attribute features are extracted from each track. The time-series features are fed into the Conformer model, which uses convolutional neural networks for local feature extraction and Transformer encoders to capture long-range dependencies. The Conformer output is combined with the fixed attribute features and passed through fully connected layers for emotion classification. Experiments on a NetEase Cloud Music dataset with 12 emotion categories show that the proposed MSB-Conformer model outperforms comparative models (CNN+LSTM, WaveNet, Transformer), achieving an average accuracy of 94.24% while covering more emotion categories than prior state-of-the-art methods. The effectiveness and generality of the model are further validated on the Emotify dataset. By simplifying musical complexity through source separation and using the Conformer to model local and global audio features simultaneously, the method offers a novel approach to music emotion classification tasks.
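To make the pipeline concrete, below is a minimal PyTorch sketch of the classification stage the abstract describes: per-stem time-series features pass through Conformer-style blocks (self-attention for global context, depthwise convolution for local patterns), are pooled over time, concatenated with the fixed attribute features, and classified by fully connected layers. The ConformerBlock here is a simplified stand-in for the full Conformer architecture, and every name and dimension (feat_dim, attr_dim, the number of stems and blocks) is an assumption for illustration, not the authors' implementation.

```python
# Minimal sketch of the MSB-Conformer fusion-and-classification stage.
# The stem separation (Multi-Scale Multi-Band DenseNets) and feature
# extraction are treated as black boxes upstream; all dimensions and
# module names below are illustrative assumptions.
import torch
import torch.nn as nn

class ConformerBlock(nn.Module):
    """Simplified Conformer-style block: self-attention for long-range
    (global) dependencies, then a depthwise conv module for local features."""
    def __init__(self, dim: int, heads: int = 4, kernel_size: int = 15):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.conv = nn.Sequential(
            # groups=dim -> depthwise conv over the time axis
            nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2,
                      groups=dim),
            nn.BatchNorm1d(dim),
            nn.SiLU(),
        )
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):                       # x: (batch, time, dim)
        a, _ = self.attn(x, x, x)                # global mixing
        x = self.norm1(x + a)
        c = self.conv(x.transpose(1, 2)).transpose(1, 2)  # local mixing
        return self.norm2(x + c)

class MSBConformerHead(nn.Module):
    """Encodes each separated stem with Conformer blocks, mean-pools over
    time, fuses with fixed attribute features, and classifies with an MLP."""
    def __init__(self, n_stems=4, feat_dim=64, attr_dim=16,
                 n_classes=12, n_blocks=2):
        super().__init__()
        self.blocks = nn.Sequential(
            *[ConformerBlock(feat_dim) for _ in range(n_blocks)])
        self.mlp = nn.Sequential(
            nn.Linear(n_stems * feat_dim + attr_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, stems, attrs):
        # stems: (batch, n_stems, time, feat_dim); attrs: (batch, attr_dim)
        pooled = [self.blocks(stems[:, s]).mean(dim=1)   # (batch, feat_dim)
                  for s in range(stems.size(1))]
        fused = torch.cat(pooled + [attrs], dim=-1)      # fuse all features
        return self.mlp(fused)                           # emotion logits

model = MSBConformerHead()
logits = model(torch.randn(2, 4, 100, 64), torch.randn(2, 16))
print(logits.shape)  # torch.Size([2, 12]) -> 12 emotion categories
```

Mean pooling over time is one plausible way to reduce the Conformer's sequence output to a fixed-size vector before fusion; the paper itself may use a different aggregation.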

M. Fang, X. Li, S. Zhang, and Q. Deng contributed equally to this work.


References

  1. Takahashi, N., Mitsufuji, Y.: Multi-scale multi-band DenseNets for audio source separation. In: 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, pp. 21–25. IEEE (2017)

  2. Peng, Z., Guo, Z., Huang, W., et al.: Conformer: local features coupling global representations for recognition and detection. IEEE Trans. Pattern Anal. Mach. Intell. 45(8), 9454–9468 (2023)

  3. Fischer, M.T., Arya, D., Streeb, D., et al.: Visual analytics for temporal hypergraph model exploration. IEEE Trans. Vis. Comput. Graph. 27(2), 550–560 (2021)

  4. Nilsonne, G., Harrell, F.E.: EEG-based model and antidepressant response. Nat. Biotechnol. 39(1), 27 (2020)

  5. Li, B., Liu, X., Dinesh, K., et al.: Creating a multitrack classical music performance dataset for multimodal music analysis: challenges, insights, and applications. IEEE Trans. Multimed. 21(2), 522–535 (2019)

  6. Tong, G.: Music emotion classification method using improved deep belief network. Mob. Inf. Syst. 2022, 1–7 (2022)

  7. Xia, Y., Xu, F.: Study on music emotion recognition based on the machine learning model clustering algorithm. Math. Probl. Eng. 2022, 1–11 (2022)

  8. Tzanetakis, G., Essl, G., Cook, P.: Audio analysis using the discrete wavelet transform. In: Proceedings of the WSES International Conference on Acoustics and Music: Theory and Applications (2001)

  9. Pao, T.-L., Liao, W.-Y., Chen, Y.-T.: A weighted discrete KNN method for Mandarin speech and emotion recognition. InTech (2008)

  10. Jin, A.W.Q.: Application of LDA to speaker recognition (2010)

  11. Wang, Y., et al.: UniSpeech: unified speech representation learning with labeled and unlabeled data. In: Proceedings of the International Conference on Machine Learning, pp. 10937–10947. PMLR (2021)

  12. Chen, S., et al.: WavLM: large-scale self-supervised pre-training for full stack speech processing. arXiv preprint arXiv:2110.13900 (2021)

  13. Chen, Z., et al.: Large-scale self-supervised speech representation learning for automatic speaker verification. arXiv preprint arXiv:2110.05777 (2021)

  14. Chen, S., et al.: Continuous speech separation with Conformer. In: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5749–5753. IEEE (2021)

  15. Wu, Y.-C., Hayashi, T., Tobing, P.L., et al.: Quasi-periodic WaveNet: an autoregressive raw waveform generative model with pitch-dependent dilated convolution neural network. IEEE/ACM Trans. Audio Speech Lang. Process. (TBD)

  16. Yu, L., Wang, L., Yang, C., et al.: Analysis and implementation of a single stage transformer-less converter with high step-down voltage gain for voltage regulator modules. IEEE Trans. Ind. Electron. 68(12), 12239–12249 (2021)

  17. NetEase Cloud Music. https://music.163.com/

  18. Yu, Y., Tong, X., Wang, H., et al.: Semantic-aware spatio-temporal app usage representation via graph convolutional network. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 4(3), 1–24 (2020)

  19. Aljanaki, A., Wiering, F., Veltkamp, R.C.: Studying emotion induced by music through a crowdsourcing game. Inf. Process. Manag. (2015)

  20. Zentner, M., Grandjean, D., Scherer, K.R.: Emotions evoked by the sound of music: characterization, classification, and measurement. Emotion 8(4), 494–521 (2008)


Author information

Correspondence to Shuo Zhang or Qifan Deng.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Fang, M. et al. (2024). Music Emotion Classification with Source Separation Based MSB-Conformer. In: Huang, DS., Zhang, C., Chen, W. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2024. Lecture Notes in Computer Science, vol 14865. Springer, Singapore. https://doi.org/10.1007/978-981-97-5591-2_23


  • DOI: https://doi.org/10.1007/978-981-97-5591-2_23

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-5590-5

  • Online ISBN: 978-981-97-5591-2

  • eBook Packages: Computer Science, Computer Science (R0)
