Leveraging User Input and Feedback for Interactive Sound Event Detection and Annotation

Published: 05 March 2018
DOI: 10.1145/3172944.3173149

Abstract

Tagging of environmental audio events is essential in many areas. However, finding sound events and labeling them within a long audio file is tedious and time-consuming. Building an automatic recognition system with modern machine learning is often not feasible, because it requires a large number of human-labeled training examples and is not reliable enough for all uses. I propose interactive sound event detection to address this issue by combining machine search with human tagging, focusing specifically on the effectiveness of various types of user input for interactive sound search. The types of user input I will explore include binary relevance feedback, segmentation, and vocal imitation. I expect that leveraging one or a combination of these inputs will help users find audio content of interest quickly and accurately, even when there are not enough training examples for a typical automated system.
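To make the proposed loop concrete, here is a minimal sketch of one plausible interactive detection cycle built around binary relevance feedback: the system ranks unlabeled audio windows by similarity to the user's tagged examples, asks about the top candidate, and re-ranks after each answer. Everything here (rank_by_relevance, the random stand-in features) is a hypothetical illustration under simplifying assumptions, not the system described in the abstract.

```python
import numpy as np

def rank_by_relevance(features, labels):
    """Rank all windows by cosine similarity to the mean of the user's
    positive examples, penalized by similarity to negative examples.
    (A deliberately simple relevance-feedback heuristic, not the
    author's actual model.)"""
    pos = features[[i for i, y in labels.items() if y == 1]]
    neg = features[[i for i, y in labels.items() if y == 0]]

    def cos(a, b):
        return a @ b / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b) + 1e-9)

    scores = cos(features, pos.mean(axis=0))
    if len(neg) > 0:
        scores -= cos(features, neg.mean(axis=0))
    return np.argsort(-scores)  # most promising windows first

# Hypothetical setup: 500 one-second windows, each a 20-dim feature vector
# (stand-ins for real audio features such as MFCCs).
features = np.random.rand(500, 20)
labels = {42: 1}  # the user seeds the loop by tagging one occurrence

for _ in range(10):  # a few feedback rounds
    ranking = rank_by_relevance(features, labels)
    query = next(i for i in ranking if i not in labels)
    # In a real interface the user would audition the window; here we read 0/1.
    labels[query] = int(input(f"Is window {query} the target sound? (1/0) "))
```

In a fielded system, the binary answers would come from the user auditioning ranked candidates, segmentation input would refine window boundaries, and a vocal imitation could serve as the initial query in place of a tagged seed example.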

Cited By

  • SyncLabeling: A Synchronized Audio Segmentation Interface for Mobile Devices. Proceedings of the ACM on Human-Computer Interaction 7(MHCI), 1-19 (2023). DOI: 10.1145/3604273
  • Extracting Urban Sound Information for Residential Areas in Smart Cities Using an End-to-End IoT System. IEEE Internet of Things Journal 8(18), 14308-14321 (2021). DOI: 10.1109/JIOT.2021.3068755
  • Look, Listen, and Learn More: Design Choices for Deep Audio Embeddings. In Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 3852-3856 (2019). DOI: 10.1109/ICASSP.2019.8682475


Published In

IUI '18: Proceedings of the 23rd International Conference on Intelligent User Interfaces
March 2018, 698 pages
ISBN: 9781450349451
DOI: 10.1145/3172944

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. human-in-the-loop system
  2. interactive machine learning
  3. sound event detection

Qualifiers

  • Abstract

Conference

IUI'18

Acceptance Rates

IUI '18 Paper Acceptance Rate: 43 of 299 submissions, 14%
Overall Acceptance Rate: 746 of 2,811 submissions, 27%

