MM '20 Conference Proceedings · Research article · DOI: 10.1145/3394171.3413542

SonoSpace: Visual Feedback of Timbre with Unsupervised Learning

Published: 12 October 2020

Abstract

Improving timbre is one of the most difficult parts of practicing a musical instrument. Unlike pitch and rhythm, timbre is a high-dimensional, perceptual concept, and learners cannot evaluate their own timbre by themselves. To improve their timbre control efficiently, learners generally need a teacher to provide feedback on timbre. However, hiring a teacher is often expensive and sometimes difficult. Our goal is to develop a low-cost learning system that substitutes for the teacher. We found that a variational autoencoder (VAE), an unsupervised neural network model, provides a user-friendly 2-dimensional mapping of timbre. Our system, SonoSpace, maps the learner's timbre into a 2D latent space extracted from an advanced player's performance. By viewing this 2D latent space, the learner can visually grasp the relative distance between their timbre and that of the advanced player. Although our system was evaluated mainly with an alto saxophone, SonoSpace could also be applied to other instruments, such as trumpets, flutes, and drums.
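The core idea of the abstract — embed spectral frames into a 2D space fitted on an advanced player's recordings, then read the learner's timbre as a distance in that space — can be sketched in a few lines. This is not the paper's implementation: as a minimal stand-in for the VAE encoder we use a linear 2D projection (PCA via SVD), and the synthetic "spectral frames" and the function names `fit_2d_map`, `embed`, and `timbre_distance` are illustrative assumptions, not from the paper.

```python
import numpy as np

def fit_2d_map(expert_frames: np.ndarray):
    """Fit a 2D linear map (PCA) on an advanced player's spectral frames.

    expert_frames: (n_frames, n_features) array, e.g. mel-band energies.
    Returns (mean, components), components shaped (n_features, 2).
    """
    mean = expert_frames.mean(axis=0)
    centered = expert_frames - mean
    # Top-2 principal directions via SVD of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:2].T

def embed(frames: np.ndarray, mean: np.ndarray, components: np.ndarray):
    """Project frames into the expert-fitted 2D timbre space."""
    return (frames - mean) @ components

def timbre_distance(learner_frames, mean, components):
    """Euclidean distance between the learner's average embedding and the
    expert centroid (the origin of the fitted space)."""
    z = embed(learner_frames, mean, components)
    return float(np.linalg.norm(z.mean(axis=0)))

# Toy demo with synthetic "spectral" frames (40 features per frame).
rng = np.random.default_rng(0)
expert = rng.normal(0.0, 1.0, size=(200, 40))   # advanced player's frames
learner = rng.normal(0.5, 1.0, size=(50, 40))   # learner frames, offset timbre
mean, comps = fit_2d_map(expert)
print(timbre_distance(learner, mean, comps))    # larger = further from expert
```

In the actual system a nonlinear VAE encoder replaces the linear projection, which lets the 2D map capture perceptually relevant structure that PCA flattens out; the distance-in-latent-space reading, however, is the same.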

Supplementary Material

ZIP File (mmfp2257aux.zip)
This is a video that explains SonoSpace.
MP4 File (3394171.3413542.mp4)
This is a video presentation of our paper "SonoSpace: Visual Feedback of Timbre with Unsupervised Learning". It mainly covers our method, system design, and user study. Please refer to this video, since our research focuses on instruments and their sounds.




Published In

MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020
4889 pages
ISBN:9781450379885
DOI:10.1145/3394171
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. deep neural network
  2. machine learning
  3. music performance analysis
  4. music practice
  5. timbre analysis
  6. variational autoencoder


Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


Article Metrics

  • Downloads (last 12 months): 24
  • Downloads (last 6 weeks): 1
Reflects downloads up to 30 Dec 2024

Cited By
  • (2023) TimToShape: Supporting Practice of Musical Instruments by Visualizing Timbre with 2D Shapes based on Crossmodal Correspondences. Proceedings of the 28th International Conference on Intelligent User Interfaces, 850-865. DOI: 10.1145/3581641.3584053
  • (2023) Automatic Detection of Poor Tone Quality in Classical Guitar Playing Using Deep Anomaly Detection Method. 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 1-5. DOI: 10.1109/WASPAA58266.2023.10248058
  • (2023) Deep Learning-based Visualization of Music Mood. 2023 International Conference on Cyberworlds (CW), 32-39. DOI: 10.1109/CW58918.2023.00015
  • (2022) Sensing Control Parameters of Flute from Microphone Sound Based on Machine Learning from Robotic Performer. Sensors, 22(5), 2074. DOI: 10.3390/s22052074
  • (2022) Interactive Piano Learning Systems: Implementing the Suzuki Method in Web-based Classrooms. Education and Information Technologies, 28(3), 3401-3416. DOI: 10.1007/s10639-022-11290-3
  • (2021) Effect of Visual Feedback on Understanding Timbre with Shapes Based on Crossmodal Correspondences. Proceedings of the 27th ACM Symposium on Virtual Reality Software and Technology, 1-3. DOI: 10.1145/3489849.3489912
  • (2021) The Role of Innovative Approaches in Teaching the Flute: A Path to Creative Realization of Students. Thinking Skills and Creativity, 42, 100959. DOI: 10.1016/j.tsc.2021.100959
