DOI: 10.1145/3468784.3471604

Real-time Sound Visualization via Multidimensional Clustering and Projections

Published: 20 July 2021

Abstract

Sound plays a vital role in every aspect of human life, since it is one of the primary forms of sensory information that our auditory system collects and that allows us to perceive the world. Sound clustering and visualization is the process of collecting and analyzing audio samples; that process is a prerequisite of sound classification, which is the core of automatic speech recognition, virtual assistants, and text-to-speech applications. Nevertheless, recognizing and properly interpreting complex, high-dimensional audio data is the most significant challenge in sound clustering and visualization. This paper proposes a web-based platform to visualize and cluster similar sound samples of musical notes and human speech in real time. To visualize high-dimensional data such as audio, Mel-Frequency Cepstral Coefficients (MFCCs), which were originally developed to represent the sounds made by the human vocal tract, are first extracted. Then, t-distributed Stochastic Neighbor Embedding (t-SNE), a dimensionality reduction technique designed for high-dimensional datasets, is applied. This paper focuses on both data clustering and high-dimensional visualization methods in order to present the clustering results meaningfully and to uncover potentially interesting behavioral patterns of musical notes played by different instruments.
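The pipeline the abstract describes, summarizing each clip with MFCCs and then projecting the feature vectors to two dimensions with t-SNE for plotting, can be sketched as a minimal Python example. This is an illustration rather than the authors' implementation: the file names, the choice of 13 coefficients, the time-averaging of MFCC frames, and the t-SNE perplexity are all assumptions, and librosa and scikit-learn stand in for whatever the web platform uses internally.

import numpy as np
import librosa                     # audio loading and MFCC extraction
from sklearn.manifold import TSNE  # t-distributed Stochastic Neighbor Embedding

def mfcc_vector(path, sr=22050, n_mfcc=13):
    """Load one clip and summarize it as a fixed-length MFCC feature vector."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape: (n_mfcc, n_frames)
    # Average over time so every clip, regardless of length, maps to n_mfcc values.
    return mfcc.mean(axis=1)

# Hypothetical audio samples (musical notes and single spoken words).
clips = ["piano_C4.wav", "violin_C4.wav", "speech_yes.wav", "speech_no.wav"]
features = np.stack([mfcc_vector(p) for p in clips])  # shape: (n_clips, n_mfcc)

# Project the high-dimensional MFCC space to 2-D; perplexity must stay below the
# number of samples, so it is set very low for this tiny illustrative dataset.
embedding = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(features)

for path, (x, y) in zip(clips, embedding):
    print(f"{path}: ({x:.2f}, {y:.2f})")  # coordinates for a 2-D scatter plot

Points that land near each other in the resulting embedding correspond to clips with similar spectral envelopes, which is the property such a visualization exploits to group notes from the same instrument or repetitions of the same word.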




          Published In

          IAIT '21: Proceedings of the 12th International Conference on Advances in Information Technology
          June 2021
          281 pages
ISBN: 9781450390125
DOI: 10.1145/3468784

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Author Tags

          1. Human Speech Recognition
          2. Mel-Frequency Cepstral Coefficients
          3. Multivariate Clustering
4. Principal Component Analysis
          5. Sound visualization
          6. t-distributed Stochastic Neighbor Embedding

          Qualifiers

          • Research-article
          • Research
          • Refereed limited

          Conference

          IAIT2021

          Acceptance Rates

          Overall Acceptance Rate 20 of 47 submissions, 43%

