MM '20 Conference Proceedings · Research article · DOI: 10.1145/3394171.3413542

SonoSpace: Visual Feedback of Timbre with Unsupervised Learning

Published: 12 October 2020

Abstract

Improving timbre is one of the most difficult parts of practicing a musical instrument. Unlike pitch and rhythm, timbre is a high-dimensional, perceptual concept, and learners cannot evaluate their own timbre by themselves. To improve their timbre control efficiently, learners generally need a teacher to provide feedback on timbre. However, hiring a teacher is often expensive and sometimes difficult. Our goal is to develop a low-cost learning system that substitutes for the teacher. We found that a variational autoencoder (VAE), an unsupervised neural network model, provides a user-friendly 2-dimensional mapping of timbre. Our system, SonoSpace, maps the learner's timbre into a 2D latent space extracted from an advanced player's performance. By viewing this 2D latent space, the learner can visually grasp the relative distance between their timbre and that of the advanced player. Although our system was evaluated mainly with an alto saxophone, SonoSpace could also be applied to other instruments, such as trumpets, flutes, and drums.
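The core idea of the abstract — embed spectral frames into a 2D space fitted on an advanced player's recordings, then read the learner's timbre as a distance in that space — can be sketched in a few lines. This is not the paper's implementation: as a minimal stand-in for the VAE encoder we use a linear 2D projection (PCA via SVD), and the synthetic "spectral frames" and the function names `fit_2d_map`, `embed`, and `timbre_distance` are illustrative assumptions, not from the paper.

```python
import numpy as np

def fit_2d_map(expert_frames: np.ndarray):
    """Fit a 2D linear map (PCA) on an advanced player's spectral frames.

    expert_frames: (n_frames, n_features) array, e.g. mel-band energies.
    Returns (mean, components), components shaped (n_features, 2).
    """
    mean = expert_frames.mean(axis=0)
    centered = expert_frames - mean
    # Top-2 principal directions via SVD of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:2].T

def embed(frames: np.ndarray, mean: np.ndarray, components: np.ndarray):
    """Project frames into the expert-fitted 2D timbre space."""
    return (frames - mean) @ components

def timbre_distance(learner_frames, mean, components):
    """Euclidean distance between the learner's average embedding and the
    expert centroid (the origin of the fitted space)."""
    z = embed(learner_frames, mean, components)
    return float(np.linalg.norm(z.mean(axis=0)))

# Toy demo with synthetic "spectral" frames (40 features per frame).
rng = np.random.default_rng(0)
expert = rng.normal(0.0, 1.0, size=(200, 40))   # advanced player's frames
learner = rng.normal(0.5, 1.0, size=(50, 40))   # learner frames, offset timbre
mean, comps = fit_2d_map(expert)
print(timbre_distance(learner, mean, comps))    # larger = further from expert
```

In the actual system a nonlinear VAE encoder replaces the linear projection, which lets the 2D map capture perceptually relevant structure that PCA flattens out; the distance-in-latent-space reading, however, is the same.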

Supplementary Material

ZIP File (mmfp2257aux.zip)
This is a video that explains SonoSpace.
MP4 File (3394171.3413542.mp4)
This is a video presentation of our paper "SonoSpace: Visual Feedback of Timbre with Unsupervised Learning". It mainly covers our method, system design, and user study. Please refer to this video, since our research focuses on instruments and their sounds.




Published In

MM '20: Proceedings of the 28th ACM International Conference on Multimedia
October 2020
4889 pages
ISBN:9781450379885
DOI:10.1145/3394171
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. deep neural network
  2. machine learning
  3. music performance analysis
  4. music practice
  5. timbre analysis
  6. variational autoencoder


Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


Article Metrics

  • Downloads (last 12 months): 24
  • Downloads (last 6 weeks): 1
Reflects downloads up to 30 Dec 2024

Cited By
  • (2023) TimToShape: Supporting Practice of Musical Instruments by Visualizing Timbre with 2D Shapes based on Crossmodal Correspondences. Proceedings of the 28th International Conference on Intelligent User Interfaces, 850-865. DOI: 10.1145/3581641.3584053
  • (2023) Automatic Detection of Poor Tone Quality in Classical Guitar Playing Using Deep Anomaly Detection Method. 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 1-5. DOI: 10.1109/WASPAA58266.2023.10248058
  • (2023) Deep Learning-based Visualization of Music Mood. 2023 International Conference on Cyberworlds (CW), 32-39. DOI: 10.1109/CW58918.2023.00015
  • (2022) Sensing Control Parameters of Flute from Microphone Sound Based on Machine Learning from Robotic Performer. Sensors, 22(5), 2074. DOI: 10.3390/s22052074
  • (2022) Interactive Piano Learning Systems: Implementing the Suzuki Method in Web-based Classrooms. Education and Information Technologies, 28(3), 3401-3416. DOI: 10.1007/s10639-022-11290-3
  • (2021) Effect of Visual Feedback on Understanding Timbre with Shapes Based on Crossmodal Correspondences. Proceedings of the 27th ACM Symposium on Virtual Reality Software and Technology, 1-3. DOI: 10.1145/3489849.3489912
  • (2021) The Role of Innovative Approaches in Teaching the Flute: A Path to Creative Realization of Students. Thinking Skills and Creativity, 42, 100959. DOI: 10.1016/j.tsc.2021.100959
