Research article, DOI: 10.1145/3015166.3015208

Overlapping Speech Detection with Cluster-based HMM Framework

Published: 21 November 2016

Abstract

Overlapping speech is known to be a major source of error in many speech processing algorithms. Many previous studies on overlapping speech detection focus on exploring feature sets for representing the characteristics of speech and overlapping speech, while keeping the HMM framework. In this study, however, we hypothesize that the capacity of a single HMM is not enough to cover the whole distribution of speech and overlapping speech. We therefore propose a simple cluster-based HMM framework that constructs multiple speech and overlapping speech models. Experimental results on the GRID corpus show significant improvements compared to a conventional overlap detection system.
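
The idea of training several models per class, one per cluster, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the features, the clustering details, or the HMM topology, so everything below is an assumption. For brevity, plain k-means stands in for the clustering step and a diagonal Gaussian stands in for each HMM; the names `kmeans`, `train`, `classify`, and `TOY` are hypothetical.

```python
import math

def kmeans(points, k, iters=20):
    """Plain k-means with a deterministic init (evenly spaced points)."""
    step = max(1, len(points) // k)
    centers = [list(points[i * step]) for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center (squared Euclidean distance).
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[j].append(p)
        for c, g in enumerate(groups):
            if g:
                centers[c] = [sum(col) / len(g) for col in zip(*g)]
    return groups

def fit_gaussian(group, floor=1e-3):
    """Diagonal Gaussian (stand-in for one HMM): per-dimension mean and variance."""
    n = len(group)
    mean = [sum(col) / n for col in zip(*group)]
    var = [max(sum((x - m) ** 2 for x in col) / n, floor)
           for col, m in zip(zip(*group), mean)]
    return mean, var

def log_likelihood(x, model):
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (a - m) ** 2 / v)
               for a, m, v in zip(x, mean, var))

def train(data_by_class, k):
    """Cluster each class separately and fit one model per non-empty cluster."""
    return {label: [fit_gaussian(g) for g in kmeans(pts, k) if g]
            for label, pts in data_by_class.items()}

def classify(x, models):
    """Pick the class whose best-matching cluster model scores x highest."""
    return max(models,
               key=lambda label: max(log_likelihood(x, m) for m in models[label]))

# Toy 2-D "features" (assumed, not from the paper): each class is bimodal.
TOY = {
    "speech": [(0.0, 0.0), (0.5, -0.5), (-0.5, 0.5),
               (10.0, 10.0), (10.5, 9.5), (9.5, 10.5)],
    "overlap": [(5.0, 5.0), (5.5, 4.5), (4.5, 5.5),
                (15.0, 15.0), (15.5, 14.5), (14.5, 15.5)],
}

single = train(TOY, k=1)     # one model per class
clustered = train(TOY, k=2)  # two cluster models per class
print(classify((5.2, 4.9), single), classify((5.2, 4.9), clustered))
```

On this toy data, a single model per class has to stretch over both modes and labels the frame near (5, 5) as speech, while two cluster models per class label it as overlap. That is the capacity argument of the abstract in miniature; it says nothing about the gains reported on the GRID corpus.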

Cited By

  • (2017) Improving separation of overlapped speech for meeting conversations using uncalibrated microphone array. 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 55-62. DOI: 10.1109/ASRU.2017.8268916. Online publication date: Dec-2017

Published In

ICSPS 2016: Proceedings of the 8th International Conference on Signal Processing Systems
November 2016
235 pages
ISBN:9781450347907
DOI:10.1145/3015166
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Expectation-maximization clustering
  2. Hidden Markov model
  3. Overlapping speech
  4. Overlapping speech detection

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICSPS 2016

Acceptance Rates

ICSPS 2016 Paper Acceptance Rate 46 of 83 submissions, 55%;
Overall Acceptance Rate 46 of 83 submissions, 55%
