DOI: 10.1145/3372278.3390691
research-article

Search Result Clustering in Collaborative Sound Collections

Published: 08 June 2020 Publication History

Abstract

The large size of today's online multimedia databases makes retrieving their content difficult and time-consuming. Users of online sound collections typically submit search queries that express a broad intent, which often causes the system to return large, unmanageable result sets. Search Result Clustering is a technique that organises search results into coherent groups, allowing users to identify useful subsets of their results. Obtaining coherent and distinctive clusters that can be explored with a suitable interface is crucial for making this technique a useful complement to traditional search engines. In this work, we propose a graph-based approach that uses audio features to cluster the diverse sound collections obtained when querying large online databases. We also propose a way to assess the performance of different features at scale by taking advantage of the metadata associated with each sound. This analysis is complemented with an evaluation using ground-truth labels from manually annotated datasets. We show that using a confidence measure to discard inconsistent clusters improves the quality of the partitions. After identifying the most appropriate features for clustering, we conduct an experiment in which users perform a sound design task, in order to evaluate our approach and its user interface. A qualitative analysis is carried out using usability questionnaires and semi-structured interviews, which provides new insights into the features that promote efficient interaction with the clusters.
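
To make the idea of graph-based clustering over audio features concrete, the following is a minimal sketch, not the authors' implementation. It assumes each sound is already represented by a small numeric feature vector, builds a k-nearest-neighbour graph with Euclidean distance, and uses connected components as a deliberately simple stand-in for whatever graph partitioning method the paper applies. The function names, the toy vectors, and the value of `k` are all hypothetical.

```python
import math
from collections import defaultdict


def knn_graph(features, k):
    """Build an undirected k-nearest-neighbour graph as an adjacency dict.

    Assumption: Euclidean distance between feature vectors is a usable
    proxy for how similar two sounds are.
    """
    n = len(features)
    adj = defaultdict(set)
    for i in range(n):
        # Rank every other sound by its distance to sound i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(features[i], features[j]),
        )
        # Link sound i to its k closest neighbours (edges are symmetric).
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def connected_components(adj, n):
    """Group node indices into clusters by graph connectivity.

    This is a simple stand-in for a real community detection step.
    """
    seen = set()
    clusters = []
    for start in range(n):
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return clusters


# Hypothetical 2-D feature vectors for six sounds forming two tight groups.
features = [
    (0.0, 0.1), (0.1, 0.0), (0.05, 0.05),
    (5.0, 5.1), (5.1, 5.0), (4.9, 5.0),
]
clusters = connected_components(knn_graph(features, k=2), len(features))
print(clusters)
```

On real search results, the nearest-neighbour graph is usually connected, so a modularity-based community detection method would be a more realistic choice than connected components; the sketch only illustrates how a similarity graph turns feature vectors into groups.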


Cited By

  • (2022) An overview of cluster-based image search result organization: background, techniques, and ongoing challenges. Knowledge and Information Systems 64:3 (589-642). DOI: 10.1007/s10115-021-01650-9. Online publication date: 11-Feb-2022
  • (2021) FSD50K: An Open Dataset of Human-Labeled Sound Events. IEEE/ACM Transactions on Audio, Speech and Language Processing 30 (829-852). DOI: 10.1109/TASLP.2021.3133208. Online publication date: 10-Dec-2021

Published In

ICMR '20: Proceedings of the 2020 International Conference on Multimedia Retrieval
June 2020
605 pages
ISBN: 9781450370875
DOI: 10.1145/3372278
Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. audio features
  2. neural-network embeddings
  3. search interfaces
  4. sound clustering
  5. sound retrieval
  6. unsupervised classification

Conference

ICMR '20

Acceptance Rates

Overall Acceptance Rate: 254 of 830 submissions, 31%
