
Video-Aided Model-Based Source Separation in Real Reverberant Rooms

Published: 01 September 2013

Abstract

Source separation algorithms that use only audio data can perform poorly when multiple sources or reverberation are present. In this paper we therefore propose a video-aided model-based source separation algorithm for two-channel reverberant recordings in which the sources are assumed to be static. Exploiting cues from video, we first localize the individual speech sources in the enclosure and then estimate their directions. The interaural spatial cues, namely the interaural phase difference and the interaural level difference, as well as the mixing vectors, are modeled probabilistically. The models incorporate the source direction information and are evaluated at discrete time-frequency points, and their parameters are refined with the well-known expectation-maximization (EM) algorithm. The algorithm outputs time-frequency masks that are used to reconstruct the individual sources. Simulation results show that exploiting the visual modality enables the proposed algorithm to produce better time-frequency masks and hence improved source estimates. We test the proposed algorithm in different scenarios, compare it with other audio-only and audio-visual algorithms, and achieve improved performance on both synthetic and real data. We also include dereverberation-based pre-processing to suppress the late reverberant components of the observed stereo mixture and further enhance the overall output of the algorithm. These advantages make our algorithm a suitable candidate for under-determined, highly reverberant settings where the performance of other audio-only and audio-visual methods is limited.
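The core mechanism the abstract describes — probabilistically modeling interaural cues at discrete time-frequency points, refining the model with EM, and using the resulting posteriors as soft time-frequency masks — can be illustrated with a minimal sketch. This is not the authors' actual model (which additionally uses the interaural level difference, mixing vectors, and video-derived source directions); it is a simplified two-source EM mixture over the interaural phase difference (IPD) alone, with all function and parameter names chosen here for illustration:

```python
import numpy as np
from scipy.signal import stft, istft

def separate_two_sources(left, right, fs=16000, nperseg=1024, n_iter=20):
    """Sketch of EM-refined time-frequency masking from interaural cues.

    Fits a two-component mixture (one component per assumed static source)
    to the IPD at each time-frequency point and uses the posterior
    responsibilities as soft masks for source reconstruction.
    """
    _, _, L = stft(left, fs, nperseg=nperseg)
    _, _, R = stft(right, fs, nperseg=nperseg)
    ipd = np.angle(L * np.conj(R))          # IPD per time-frequency bin
    x = ipd.ravel()

    # Initial mixture parameters (2 components): means, variances, weights
    mu = np.array([-0.5, 0.5])
    var = np.array([1.0, 1.0])
    pi = np.array([0.5, 0.5])

    for _ in range(n_iter):
        # E-step: responsibilities under a wrapped-Gaussian-like likelihood
        d = np.angle(np.exp(1j * (x[:, None] - mu[None, :])))   # wrapped diff
        logp = -0.5 * d**2 / var - 0.5 * np.log(var) + np.log(pi)
        logp -= logp.max(axis=1, keepdims=True)                 # stabilize
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: weights, circular means, then variances about the new means
        pi = r.mean(axis=0)
        mu = np.angle((r * np.exp(1j * x)[:, None]).sum(axis=0))
        d2 = np.angle(np.exp(1j * (x[:, None] - mu[None, :])))
        var = (r * d2**2).sum(axis=0) / r.sum(axis=0) + 1e-6

    # Posterior responsibilities become soft time-frequency masks
    masks = r.T.reshape(2, *ipd.shape)
    sources = []
    for m in masks:
        _, s = istft(m * L, fs, nperseg=nperseg)
        sources.append(s)
    return sources
```

In the paper's setting, the video-derived source directions would supply informed initial values for the component parameters instead of the arbitrary initialization above, which is one reason the visual modality improves the final masks.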



Published In

IEEE Transactions on Audio, Speech, and Language Processing  Volume 21, Issue 9
September 2013
210 pages

Publisher

IEEE Press


Qualifiers

  • Research-article


Article Metrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0

Reflects downloads up to 19 Dec 2024

Cited By

  • (2023) Audio–text retrieval based on contrastive learning and collaborative attention mechanism. Multimedia Systems 29(6), 3625–3638. DOI: 10.1007/s00530-023-01144-4. Online publication date: 1-Dec-2023.
  • (2022) Dual transform based joint learning single channel speech separation using generative joint dictionary learning. Multimedia Tools and Applications 81(20), 29321–29346. DOI: 10.1007/s11042-022-12816-0. Online publication date: 1-Aug-2022.
  • (2019) Two-Stage Monaural Source Separation in Reverberant Room Environments Using Deep Neural Networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing 27(1), 125–139. DOI: 10.1109/TASLP.2018.2874708. Online publication date: 1-Jan-2019.
  • (2018) Geometric Information Based Monaural Speech Separation Using Deep Neural Network. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4454–4458. DOI: 10.1109/ICASSP.2018.8461753. Online publication date: 15-Apr-2018.
  • (2017) A Consolidated Perspective on Multimicrophone Speech Enhancement and Source Separation. IEEE/ACM Transactions on Audio, Speech, and Language Processing 25(4), 692–730. DOI: 10.1109/TASLP.2016.2647702. Online publication date: 1-Apr-2017.
  • (2017) Underdetermined source separation using time-frequency masks and an adaptive combined Gaussian-Student's t probabilistic model. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4187–4191. DOI: 10.1109/ICASSP.2017.7952945. Online publication date: 5-Mar-2017.
  • (2015) An unsupervised acoustic fall detection system using source separation for sound interference suppression. Signal Processing 110(C), 199–210. DOI: 10.1016/j.sigpro.2014.08.021. Online publication date: 1-May-2015.
