
Video-Aided Model-Based Source Separation in Real Reverberant Rooms

Published: 01 September 2013

Abstract

Source separation algorithms that use only audio data can perform poorly when multiple sources or reverberation are present. In this paper we therefore propose a video-aided model-based source separation algorithm for two-channel reverberant recordings in which the sources are assumed to be static. Exploiting cues from video, we first localize the individual speech sources in the enclosure and then estimate their directions. The interaural spatial cues, namely the interaural phase difference and the interaural level difference, as well as the mixing vectors, are modeled probabilistically. The models incorporate the source direction information and are evaluated at discrete time-frequency points, and their parameters are refined with the well-known expectation-maximization (EM) algorithm. The algorithm outputs time-frequency masks that are used to reconstruct the individual sources. Simulation results show that exploiting the visual modality enables the proposed algorithm to produce better time-frequency masks and hence improved source estimates. We test the proposed algorithm in different scenarios, compare it with other audio-only and audio-visual algorithms, and achieve improved performance on both synthetic and real data. We also include dereverberation-based pre-processing to suppress the late reverberant components of the observed stereo mixture and further enhance the overall output of the algorithm. These advantages make our algorithm a suitable candidate for under-determined, highly reverberant settings where the performance of other audio-only and audio-visual methods is limited.
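The core mechanism the abstract describes — probabilistically modeling interaural cues at discrete time-frequency points, refining the model with EM, and using the resulting posteriors as soft time-frequency masks — can be illustrated with a minimal sketch. This is not the authors' actual model (which additionally uses the interaural level difference, mixing vectors, and video-derived source directions); it is a simplified two-source EM mixture over the interaural phase difference (IPD) alone, with all function and parameter names chosen here for illustration:

```python
import numpy as np
from scipy.signal import stft, istft

def separate_two_sources(left, right, fs=16000, nperseg=1024, n_iter=20):
    """Sketch of EM-refined time-frequency masking from interaural cues.

    Fits a two-component mixture (one component per assumed static source)
    to the IPD at each time-frequency point and uses the posterior
    responsibilities as soft masks for source reconstruction.
    """
    _, _, L = stft(left, fs, nperseg=nperseg)
    _, _, R = stft(right, fs, nperseg=nperseg)
    ipd = np.angle(L * np.conj(R))          # IPD per time-frequency bin
    x = ipd.ravel()

    # Initial mixture parameters (2 components): means, variances, weights
    mu = np.array([-0.5, 0.5])
    var = np.array([1.0, 1.0])
    pi = np.array([0.5, 0.5])

    for _ in range(n_iter):
        # E-step: responsibilities under a wrapped-Gaussian-like likelihood
        d = np.angle(np.exp(1j * (x[:, None] - mu[None, :])))   # wrapped diff
        logp = -0.5 * d**2 / var - 0.5 * np.log(var) + np.log(pi)
        logp -= logp.max(axis=1, keepdims=True)                 # stabilize
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: weights, circular means, then variances about the new means
        pi = r.mean(axis=0)
        mu = np.angle((r * np.exp(1j * x)[:, None]).sum(axis=0))
        d2 = np.angle(np.exp(1j * (x[:, None] - mu[None, :])))
        var = (r * d2**2).sum(axis=0) / r.sum(axis=0) + 1e-6

    # Posterior responsibilities become soft time-frequency masks
    masks = r.T.reshape(2, *ipd.shape)
    sources = []
    for m in masks:
        _, s = istft(m * L, fs, nperseg=nperseg)
        sources.append(s)
    return sources
```

In the paper's setting, the video-derived source directions would supply informed initial values for the component parameters instead of the arbitrary initialization above, which is one reason the visual modality improves the final masks.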



Published In

IEEE Transactions on Audio, Speech, and Language Processing  Volume 21, Issue 9
September 2013
210 pages

Publisher

IEEE Press


Qualifiers

  • Research-article


Article Metrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0

Reflects downloads up to 19 Dec 2024

Cited By

  • (2023) Audio–text retrieval based on contrastive learning and collaborative attention mechanism. Multimedia Systems 29(6), 3625–3638. DOI: 10.1007/s00530-023-01144-4. Online publication date: 1-Dec-2023.
  • (2022) Dual transform based joint learning single channel speech separation using generative joint dictionary learning. Multimedia Tools and Applications 81(20), 29321–29346. DOI: 10.1007/s11042-022-12816-0. Online publication date: 1-Aug-2022.
  • (2019) Two-Stage Monaural Source Separation in Reverberant Room Environments Using Deep Neural Networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing 27(1), 125–139. DOI: 10.1109/TASLP.2018.2874708. Online publication date: 1-Jan-2019.
  • (2018) Geometric Information Based Monaural Speech Separation Using Deep Neural Network. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4454–4458. DOI: 10.1109/ICASSP.2018.8461753. Online publication date: 15-Apr-2018.
  • (2017) A Consolidated Perspective on Multimicrophone Speech Enhancement and Source Separation. IEEE/ACM Transactions on Audio, Speech, and Language Processing 25(4), 692–730. DOI: 10.1109/TASLP.2016.2647702. Online publication date: 1-Apr-2017.
  • (2017) Underdetermined source separation using time-frequency masks and an adaptive combined Gaussian-Student's t probabilistic model. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4187–4191. DOI: 10.1109/ICASSP.2017.7952945. Online publication date: 5-Mar-2017.
  • (2015) An unsupervised acoustic fall detection system using source separation for sound interference suppression. Signal Processing 110(C), 199–210. DOI: 10.1016/j.sigpro.2014.08.021. Online publication date: 1-May-2015.
