research-article

Search Result Clustering in Collaborative Sound Collections

Authors:

Xavier Serra

ICMR '20: Proceedings of the 2020 International Conference on Multimedia Retrieval

Pages 207 - 214

https://doi.org/10.1145/3372278.3390691

Published: 08 June 2020

Abstract

The large size of today's online multimedia databases makes retrieving their content a difficult and time-consuming task. Users of online sound collections typically submit search queries that express a broad intent, so the system often returns large, unmanageable result sets. Search Result Clustering is a technique that organises search results into coherent groups, allowing users to identify useful subsets of their results. Obtaining coherent and distinctive clusters that can be explored through a suitable interface is crucial for making this technique a useful complement to traditional search engines. In this work, we propose a graph-based approach that uses audio features to cluster the diverse sound collections obtained when querying large online databases. We propose a method for assessing the performance of different features at scale by taking advantage of the metadata associated with each sound. This analysis is complemented by an evaluation using ground-truth labels from manually annotated datasets. We show that using a confidence measure to discard inconsistent clusters improves the quality of the partitions. After identifying the most appropriate features for clustering, we conduct an experiment in which users perform a sound design task, in order to evaluate our approach and its user interface. A qualitative analysis is carried out, including usability questionnaires and semi-structured interviews. This provides new insights into the interface features that promote efficient interaction with the clusters.
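As a rough illustration of what graph-based clustering over audio features can look like, the following minimal sketch builds a mutual k-nearest-neighbour graph and treats its connected components as clusters. The abstract does not specify the graph construction or the partitioning algorithm, so the mutual k-NN rule, the Euclidean distance, the connected-components step, the function names `knn` and `cluster`, and the toy 2-D feature vectors are all assumptions made for illustration, not the authors' method.

```python
from math import dist

def knn(features, k):
    """Return, for each item, the set of indices of its k nearest neighbours."""
    neighbours = []
    for i, fi in enumerate(features):
        # Rank every other item by Euclidean distance to item i.
        order = sorted((j for j in range(len(features)) if j != i),
                       key=lambda j: dist(fi, features[j]))
        neighbours.append(set(order[:k]))
    return neighbours

def cluster(features, k=2):
    """Label items by connected component of the mutual k-NN graph."""
    nn = knn(features, k)
    # Keep an edge only when both endpoints list each other as neighbours.
    adjacency = [{j for j in nn[i] if i in nn[j]} for i in range(len(features))]
    labels = [-1] * len(features)
    current = 0
    for start in range(len(features)):
        if labels[start] != -1:
            continue
        # Depth-first traversal assigns one label per component.
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for other in adjacency[node]:
                if labels[other] == -1:
                    labels[other] = current
                    stack.append(other)
        current += 1
    return labels

# Hypothetical 2-D feature vectors standing in for real audio descriptors.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels = cluster(features, k=2)
print(labels)
```

On this toy input the two well-separated groups of points receive two different labels. A real system would likely replace the brute-force neighbour search with an approximate method and the connected-components step with a community detection algorithm.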


Cited By

Tekli J. (2022). An overview of cluster-based image search result organization: background, techniques, and ongoing challenges. Knowledge and Information Systems 64:3 (589-642). DOI: 10.1007/s10115-021-01650-9. Online publication date: 11-Feb-2022.
https://dl.acm.org/doi/10.1007/s10115-021-01650-9

Fonseca E., Favory X., Pons J., Font F., Serra X. (2021). FSD50K: An Open Dataset of Human-Labeled Sound Events. IEEE/ACM Transactions on Audio, Speech and Language Processing 30 (829-852). DOI: 10.1109/TASLP.2021.3133208. Online publication date: 10-Dec-2021.
https://dl.acm.org/doi/10.1109/TASLP.2021.3133208


Published In

ICMR '20: Proceedings of the 2020 International Conference on Multimedia Retrieval

June 2020

605 pages

ISBN:9781450370875

DOI:10.1145/3372278

General Chairs:
Cathal Gurrin, Dublin City University, Ireland
Björn Þór Jónsson, IT University of Copenhagen, Denmark
Noriko Kando, National Institute of Informatics, Tokyo

Program Chairs:
Klaus Schoeffmann, Klagenfurt University, Austria
Phoebe Chen, La Trobe University, Australia
Noel E. O'Connor, Dublin City University, Ireland

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

ICMR '20

Sponsor:

SIGMM

ICMR '20: International Conference on Multimedia Retrieval

June 8 - 11, 2020

Dublin, Ireland

Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%
