DOI: 10.1145/971478.971500
Article

Audio-video array source separation for perceptual user interfaces

Published: 15 November 2001

Abstract

Steerable microphone arrays provide a flexible infrastructure for audio source separation. In order for them to be used effectively in perceptual user interfaces, there must be a mechanism in place for steering the focus of the array to the sound source. Audio-only steering techniques often perform poorly in the presence of multiple sound sources or strong reverberation. Video-only techniques can achieve high spatial precision but require that the audio and video subsystems be accurately calibrated to preserve this precision. We present an audio-video localization technique that combines the benefits of the two modalities. We implement our technique in a test environment containing multiple stereo cameras and a room-sized microphone array. Our technique achieves an 8.9 dB improvement over a single far-field microphone and a 6.7 dB improvement over source separation based on video-only localization.
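To make the idea of steering an array toward a known source location concrete, here is a minimal sketch of a generic delay-and-sum beamformer. It is not the method of the paper, whose beamformer and localization details are not given in the abstract. The microphone layout, the talker position (standing in for an estimate such as a vision tracker might supply), the test signal, and the noise model (independent white noise per microphone, free field, no reverberation) are all assumptions chosen for illustration, and the resulting gain is unrelated to the 8.9 dB figure reported above.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_delays(mic_positions, source_position, c=SPEED_OF_SOUND):
    """Delay in seconds from the source to each mic, relative to the closest mic."""
    distances = np.linalg.norm(mic_positions - source_position, axis=1)
    return (distances - distances.min()) / c


def shift_channels(signals, shifts, sample_rate):
    """Circularly delay each row of `signals` by `shifts` seconds (negative = advance)."""
    n = signals.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    spectra = np.fft.rfft(signals, axis=1)
    phase = np.exp(-2j * np.pi * freqs[None, :] * shifts[:, None])
    return np.fft.irfft(spectra * phase, n=n, axis=1)


def delay_and_sum(signals, delays, sample_rate):
    """Undo each channel's propagation delay, then average across microphones."""
    return shift_channels(signals, -delays, sample_rate).mean(axis=0)


def snr_db(clean, observed):
    """Signal-to-noise ratio in dB, treating (observed - clean) as noise."""
    noise = observed - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))


# Hypothetical setup: 8 microphones on a 2 m line, one talker about 2 m away.
sample_rate = 16000
n_samples = 4096
mics = np.array([[x, 0.0, 0.0] for x in np.linspace(-1.0, 1.0, 8)])
talker = np.array([0.3, 2.0, 0.0])  # assumed known, e.g. from a video tracker

# Synthetic source: two tones standing in for speech.
t = np.arange(n_samples) / sample_rate
source = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

# Each mic hears a delayed copy of the source plus its own independent noise.
delays = steering_delays(mics, talker)
delayed = shift_channels(np.tile(source, (len(mics), 1)), delays, sample_rate)
rng = np.random.default_rng(0)
noisy = delayed + rng.normal(0.0, 1.0, size=delayed.shape)

output = delay_and_sum(noisy, delays, sample_rate)
single_mic_snr = snr_db(delayed[0], noisy[0])
array_snr = snr_db(source, output)
gain_db = array_snr - single_mic_snr
print(f"single mic: {single_mic_snr:.1f} dB, array: {array_snr:.1f} dB, "
      f"gain: {gain_db:.1f} dB")
```

Under these idealized assumptions the gain should sit near 10 log10(M) dB for M microphones, because the aligned source adds coherently while the independent noise averages down. Interfering talkers, reverberation, and errors in the assumed source position all reduce that gain in practice, which is why the accuracy of the localization step matters.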



    Published In

    PUI '01: Proceedings of the 2001 workshop on Perceptive user interfaces
    November 2001
    241 pages
    ISBN:9781450374736
    DOI:10.1145/971478
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Article

    Conference

    PUI01: Workshop on Perceptive User Interfaces
    November 15 - 16, 2001
    Orlando, Florida, USA

    Cited By

    • (2014) What's Making that Sound? Proceedings of the 22nd ACM international conference on Multimedia, pp. 147-156. DOI: 10.1145/2647868.2654936. Online publication date: 3-Nov-2014
    • (2013) Video-Aided Model-Based Source Separation in Real Reverberant Rooms. IEEE Transactions on Audio, Speech, and Language Processing 21:9, pp. 1900-1912. DOI: 10.1109/TASL.2013.2261814. Online publication date: 1-Sep-2013
    • (2013) Two-stage audio-visual speech dereverberation and separation based on models of the interaural spatial cues and spatial covariance. 2013 18th International Conference on Digital Signal Processing (DSP), pp. 1-6. DOI: 10.1109/ICDSP.2013.6622780. Online publication date: Jul-2013
    • (2013) Characterization of SURF interest point distribution for visual processing in sensor networks. 2013 18th International Conference on Digital Signal Processing (DSP), pp. 1-7. DOI: 10.1109/ICDSP.2013.6622701. Online publication date: Jul-2013
    • (2008) Target Detection and Tracking With Heterogeneous Sensors. IEEE Journal of Selected Topics in Signal Processing 2:4, pp. 503-513. DOI: 10.1109/JSTSP.2008.2001429. Online publication date: Aug-2008
    • (2008) Omnidirectional Audio-Visual Talker Localization Based on Dynamic Fusion of Audio-Visual Features Using Validity and Reliability Criteria. IEICE - Transactions on Information and Systems E91-D:3, pp. 598-606. DOI: 10.1093/ietisy/e91-d.3.598. Online publication date: 1-Mar-2008
    • (2007) Audio-Visual Event Recognition in Surveillance Video Sequences. IEEE Transactions on Multimedia 9:2, pp. 257-267. DOI: 10.1109/TMM.2006.886263. Online publication date: 1-Feb-2007
    • (2004) Audio-Video Integration for Background Modelling. Computer Vision - ECCV 2004, pp. 202-213. DOI: 10.1007/978-3-540-24671-8_16. Online publication date: 2004
    • (2003) A Graphical Model for Audiovisual Object Tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence 25:7, pp. 828-836. DOI: 10.1109/TPAMI.2003.1206512. Online publication date: 1-Jul-2003
    • (2003) Audiovisual localization of multiple speakers in a video teleconferencing setting. International Journal of Imaging Systems and Technology 13:1, pp. 95-105. DOI: 10.1002/ima.10045. Online publication date: 2-Jun-2003
