
A Consolidated Perspective on Multimicrophone Speech Enhancement and Source Separation

Published: 01 April 2017

Abstract

Speech enhancement and separation are core problems in audio signal processing, with commercial applications in devices as diverse as mobile phones, conference call systems, hands-free systems, and hearing aids. In addition, they are crucial preprocessing steps for noise-robust automatic speech and speaker recognition. Many devices now have two to eight microphones. The enhancement and separation capabilities offered by these multichannel interfaces are usually greater than those of single-channel interfaces. Research in speech enhancement and separation has followed two convergent paths, starting with microphone array processing and blind source separation, respectively. These communities are now strongly interrelated and routinely borrow ideas from each other. Yet, a comprehensive overview of the common foundations and the differences between these approaches is lacking at present. In this paper, we propose to fill this gap by analyzing a large number of established and recent techniques according to four transverse axes: (1) the acoustic impulse response model, (2) the spatial filter design criterion, (3) the parameter estimation algorithm, and (4) optional postfiltering. We conclude this overview paper by providing a list of software and data resources and by discussing perspectives and future trends in the field.
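
As a rough illustration of how these four axes combine in a single processing chain, the sketch below is a minimal NumPy/SciPy example written for this summary, not code from the paper; all names (mvdr_enhance, noise_frames, steering) are illustrative. Axis 1 is collapsed into a per-frequency steering vector (the simplest anechoic acoustic model), axis 2 is an MVDR spatial filter design, axis 3 estimates the noise spatial covariance from assumed noise-only frames, and axis 4 applies a crude single-channel Wiener gain as an optional postfilter.

import numpy as np
from scipy.signal import stft, istft

def mvdr_enhance(x, noise_frames, steering, fs=16000, nperseg=512):
    """x: (mics, samples) multichannel mixture.
    noise_frames: indices of STFT frames assumed to contain noise only.
    steering: (freqs, mics) steering vectors, freqs = nperseg // 2 + 1.
    Returns a single-channel enhanced time-domain signal."""
    # STFT analysis: X has shape (mics, freqs, frames).
    _, _, X = stft(x, fs=fs, nperseg=nperseg)
    mics, n_freq, n_frames = X.shape
    Y = np.zeros((n_freq, n_frames), dtype=complex)
    for f in range(n_freq):
        # Axis 3: estimate the noise spatial covariance from noise-only
        # frames, with diagonal loading for numerical robustness.
        N = X[:, f, noise_frames]                       # (mics, K)
        R = N @ N.conj().T / N.shape[1] + 1e-6 * np.eye(mics)
        # Axis 2: MVDR weights w = R^{-1} d / (d^H R^{-1} d).
        d = steering[f]
        w = np.linalg.solve(R, d)
        w = w / (d.conj() @ w)
        # Spatial filtering of all frames in this frequency bin.
        Y[f] = w.conj() @ X[:, f, :]
    # Axis 4 (optional postfilter): crude Wiener gain from the residual
    # noise power at the beamformer output, floored at 0.1.
    noise_psd = np.mean(np.abs(Y[:, noise_frames]) ** 2, axis=1, keepdims=True)
    gain = np.clip(1.0 - noise_psd / (np.abs(Y) ** 2 + 1e-12), 0.1, 1.0)
    # STFT synthesis back to the time domain.
    _, y = istft(gain * Y, fs=fs, nperseg=nperseg)
    return y

Axis 1 enters through the steering vector: under a far-field anechoic assumption, steering[f, m] = exp(-2j * pi * f_hz * tau_m), where tau_m is the propagation delay to microphone m for the assumed source direction. The paper itself surveys much richer acoustic impulse response models (e.g., relative transfer functions and full-rank spatial covariances) and the corresponding estimation algorithms.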

References

[1]
A. Boothroyd et al., "Hearing aids and wireless technology," Hearing Rev., vol. 14, no. 6, p. 44, 2007.
[2]
J. Flanagan, J. Johnston, R. Zahn, and G. Elko, "Computer-steered microphone arrays for sound transduction in large rooms," J. Acoust. Soc. Amer., vol. 78, no. 5, pp. 1508-1518, 1985.
[3]
H. F. Silverman, W. R. Patterson, J. L. Flanagan, and D. Rabinkin, "A digital processing system for source location and sound capture by large microphone arrays," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Apr. 1997, vol. 1, pp. 251-254.
[4]
D. Mostefa et al., "The CHIL audiovisual corpus for lecture and meeting analysis inside smart rooms," Lang. Resources Eval., vol. 41, no. 3-4, pp. 389-407, 2007.
[5]
C. Fox, Y. Liu, E. Zwyssig, and T. Hain, "The Sheffield wargames corpus," in Proc. Interspeech, 2013, pp. 1116-1120.
[6]
L. Cristoforetti, et al. "The DIRHA simulated corpus," in Proc. 9th Int. Conf. Lang. Res. Eval., 2014, pp. 2629-2634.
[7]
E. Vincent, R. Gribonval, and M. D. Plumbley, "Oracle estimators for the benchmarking of source separation algorithms," Signal Process., vol. 87, no. 8, pp. 1933-1950, 2007.
[8]
S. L. Gay and J. Benesty, Eds., Acoustic Signal Processing for Telecommunication. Norwell, MA, USA: Kluwer, 2000.
[9]
M. S. Brandstein and D. B. Ward, Eds., Microphone Arrays: Signal Processing Techniques and Applications. New York, NY, USA: Springer, 2001.
[10]
J. Benesty, S. Makino, and J. Chen, Eds., Speech Enhancement. New York, NY, USA: Springer, 2005.
[11]
P. C. Loizou, Speech Enhancement: Theory and Practice. Boca Raton, FL, USA: CRC Press, 2007.
[12]
I. Cohen, J. Benesty, and S. Gannot, Eds., Speech Processing in Modern Communication: Challenges and Perspectives. New York, NY, USA: Springer, 2010.
[13]
P. O'Grady, B. Pearlmutter, and S. T. Rickard, "Survey of sparse and non-sparse methods in source separation," Int. J. Imag. Syst. Technol., vol. 15, pp. 18-33, 2005.
[14]
S. Makino, T.-W. Lee, and H. Sawada, Eds. New York, NY, USA: Springer, 2007.
[15]
M. S. Pedersen, J. Larsen, U. Kjems, and L. C. Parra, "Convolutive blind source separationmethods," in SpringerHandbook of Speech Processing, pp. 1065-1094. New York, NY, USA: Springer, 2008.
[16]
P. Comon and C. Jutten, Eds., Handbook of Blind Source Separation, Independent Component Analysis and Applications. New York, NY, USA: Academic, 2010.
[17]
E. Vincent, M. Jafari, S. A. Abdallah,M. D. Plumbley, and M. E. Davies, "Probabilistic modeling paradigms for audio source separation, "in, Machine Audition: Principles, Algorithms and Systems. Hershey, PA, USA: Idea Group, Inc., pp. 162-185, 2010.
[18]
E. Vincent, N. Bertin, R. Gribonval, and F. Bimbot, "From blind to guided audio source separation: How models and side information can improve the separation of sound," IEEE Signal Proc.Mag., vol. 31, no. 3, pp. 107-115, May 2014.
[19]
U. Zölzer, Ed., DAFX: Digital Audio Effects. New York, NY, USA: Wiley, 2011.
[20]
A. Ozerov, C. Févotte, R. Blouet, and J.-L. Durrieu, "Multichannel nonnegative tensor factorization with structured constraints for user-guided audio source separation," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Prague, Czech Republic, May 2011, pp. 257-260.
[21]
N. Sturmel et al., "Linear mixing models for active listening of music productions in realistic studio conditions," in 132th Proc. Audio Eng. Soc. Conv., 2012.
[22]
E. Hänsler and G. Schmidt, Acoustic Echo and NoiseControl: A Practical Approach. Hoboken, NJ, USA: Wiley, 2004.
[23]
P. A. Naylor and N. D. Gaubitch, Eds., Speech Dereverberation. New York, NY, USA: Springer, 2010.
[24]
P. Divenyi, Ed.,Speech Separation by Humans and Machines. New York, NY, USA: Springer-Verlag, 2004.
[25]
D. Wang and G. J. Brown, Eds., Computational Auditory Scene Analysis: Principles, Algorithms, and Applications. New York, NY, USA: Wiley, 2006.
[26]
T. Virtanen, J. F. Gemmeke, B. Raj, and P. Smaragdis, "Compositional models for audio processing," IEEE Signal Process. Mag., vol. 32, no. 2, pp. 125-144, Mar. 2015.
[27]
I. Cohen and S. Gannot, "Spectral enhancement methods," in, Springer Handbook of Speech Processing and Speech Communication. New York, NY, USA: Springer, 2007.
[28]
M. Wölfel, and J. McDonough, Distant Speech Recognition. New York, NY, USA: Wiley, 2009.
[29]
T. Virtanen, R. Singh, and B. Raj, Eds., Techniques for Noise Robustness in Automatic Speech Recognition. New York, NY, USA: Wiley, 2012.
[30]
J. Li, L. Deng, Y. Gong, and R. Haeb-Umbach, "An overview of noiserobust automatic speech recognition," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 4, pp. 745-777, Apr. 2014.
[31]
N. Ono et al., "Harmonic and percussive sound separation and its application to MIR-related tasks," in, Advances in Music Information Retrieval. New York, NY, USA: Springer, 2010.
[32]
C. Knapp and G. Carter, "The generalized correlation method for estimation of time delay," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-24, no. 4, pp. 320-327, Aug. 1976.
[33]
J. Chen, J. Benesty, and Y. Huang, "Time delay estimation in room acoustic environments: An overview," EURASIP J. Appl. Signal Process, vol. 2006, pp. 1-19, 2006.
[34]
A. Brutti, M. Omologo, and P. Svaizer, "Comparison between different sound source localization techniques based on a real data collection," in Proc. Joint Workshop Hands-Free Speech Commun. Microphone Arrays, 2008, pp. 69-72.
[35]
H. Kuttruff, Room Acoustics. New York, NY, USA: Taylor & Francis, 2000.
[36]
J. Blauert, Spatial Hearing: The Psychophysics of Human Sound Localization. Cambridge, MA, USA: MIT Press, 1997.
[37]
D. Markovic, K. Kowalczyk, F. Antonacci, C. Hofmann, A. Sarti, and W. Kellermann, "Estimation of acoustic reflection coefficients through pseudospectrum matching," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 1, pp. 125-137, Jan. 2014.
[38]
E. Vincent, R. Gribonval, and C. Févotte, "Performance measurement in blind audio source separation," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 4, pp. 1462-1469, Jul. 2006.
[39]
J.-F. Cardoso, "Multidimensional independent component analysis," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 1998, vol. 4, pp. 1941-1944.
[40]
S.-K. Lee, "Measurement of reverberation times using a wavelet filter bank and application to a passenger car," J. Audio Eng. Soc., vol. 52, no. 5, pp. 506-515, 2004.
[41]
M. Jeub, M. Schäfer, and P. Vary, "A binaural room impulse response database for the evaluation of dereverberation algorithms," in Proc. IEEE Int. Conf. Dig. Signal Process., 2009, pp. 1-4.
[42]
M. R. Schroeder, "Statistical parameters of the frequency response curves of large rooms," J. Audio Eng. Soc., vol. 35, no. 5, pp. 299-306, 1987.
[43]
J.-D. Polack, "Playing billiards in the concert hall: The mathematical foundations of geometrical room acoustics," Appl. Acoust., vol. 38, no. 2, pp. 235-244, 1993.
[44]
M. R. Schroeder, "Frequency correlation functions of frequency responses in rooms," J. Acoust. Soc. Amer., vol. 34, no. 12, pp. 1819-1823, 1963.
[45]
T. Gustafsson, B. D. Rao, and M. Trivedi, "Source localization in reverberant environments: Modeling and statistical analysis," IEEE Trans. Speech Audio Process., vol. 11, pp. 791-803, 2003.
[46]
R. Scharrer and M. Vorländer, "Sound field classification in small microphone arrays using spatial coherences," IEEE Trans. Audio, Speech, Lang. Process, vol. 21, no. 9, pp. 1891-1899, Sep. 2013.
[47]
O. Schwartz, E. Habets, and S. Gannot, "Nested generalized sidelobe canceller for joint dereverberation and noise reduction," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Brisbane, Australia, Apr. 2015, pp. 106-110.
[48]
H.-L. Nguyen Thi and C. Jutten, "Blind source separation for convolutive mixtures," Signal Process., vol. 45, no. 2, pp. 209-229, 1995.
[49]
S. Choi and A. Cichocki, "Adaptive blind separation of speech signals: Cocktail party problem," in Proc. IEEE Int. Conf. Signal Process., 1997, pp. 617-622.
[50]
F. Ehlers and H. G. Schuster, "Blind separation of convolutive mixtures and an application in automatic speech recognition in a noisy environment," IEEE Trans. Signal Process., vol. 45, no. 10, pp. 2608-2612, Oct. 1997.
[51]
R. H. Lambert and A. J. Bell, "Blind separation of multiple speakers in a multipath environment," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 1997, pp. I-423-I-426.
[52]
H.-C. Wu and J. C. Príncipe, "Generalized anti-Hebbian learning for source separation," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 1999, pp. II-1073-II-1076.
[53]
M. Ito et al., "Moving-source separation using directional microphones," in Proc. Int. Symp. Signal Process. Inf. Technol., 2002, pp. 523-526.
[54]
A. Aissa-El-Bey, K. Abed-Meraim, and Y. Grenier, "Blind separation of underdetermined convolutive mixtures using their time-frequency representation," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 5, pp. 1540-1550, Jul. 2007.
[55]
M. Gupta and S. C. Douglas, "Beamforming initialization and data prewhitening in natural gradient convolutive blind source separation of speech mixtures," in Proc. Int. Conf. Independent Component Anal. Blind Signal Separation, 2007, pp. 462-470.
[56]
Y. Lin, J. Chen, Y. Kim, and D. D. Lee, "Blind channel identification for speech dereverberation using l1-norm sparse learning," in Proc. Neural Inf. Process. Conf., 2007, pp. 921-928.
[57]
P. Sudhakar, S. Arberet, and R. Gribonval, "Double sparsity: Towards blind estimation ofmultiple channels," in Proc. Int. Conf. Latent Variable Anal. Signal Separation, 2010, pp. 571-578.
[58]
M. Yu, W. Ma, J. Xin, and S. Osher, "Multi-channel l1 regularized convex speech enhancement model and fast computation by the split bregman method," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 2, pp. 661-675, Feb. 2012.
[59]
I. J. Kelly and F. M. Boland, "Detecting arrivals in room impulse responses with dynamic time warping," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 7, pp. 1139-1147, Jul. 2014.
[60]
Z. Koldovky, J. Malek, and S. Gannot, "Spatial source subtraction based on incomplete measurements of relative transfer function," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 23, no. 8, pp. 1335-1347, Aug. 2015.
[61]
A. Benichoux, L. S. R. Simon, E. Vincent, and R. Gribonval, "Convex regularizations for the simultaneous recording of room impulse responses," IEEE Trans. Signal Process., vol. 62, no. 8, pp. 1976-1986, Apr. 2014.
[62]
B. Laufer-Goldshtein, R. Talmon, and S. Gannot, "A study on manifolds of acoustic responses," in Proc. 12th Int. Conf. Latent Variable Analysis Independent Component Analysis, Liberec, Czech Republic, Aug. 2015, pp. 203-210.
[63]
M. Kowalski, E. Vincent, and R. Gribonval, "Beyond the narrowband approximation: Wideband convex methods for under-determined reverberant audio source separation," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 7, pp. 1818-1829, Sep. 2010.
[64]
J. Chen, J. Benesty, and Y. Huang, "A minimum distortion noise reduction algorithm with multiple microphones," IEEE Trans. Audio, Speech, Lang. Process., vol. 16, no. 3, pp. 481-493, Mar. 2008.
[65]
R. Crochiere and L. Rabiner, Multi-Rate Signal Processing. Englewood Cliffs, NJ, USA: Prentice-Hall, 1983.
[66]
S. Affes and Y. Grenier, "A signal subspace tracking algorithm for microphone array processing of speech," IEEE Trans. Speech Audio Process., vol. 5, no. 5, pp. 425-437, Sep. 1997.
[67]
P. Smaragdis, "Blind separation of convolved mixtures in the frequency domain," Neurocomputing, vol. 22, pp. 21-34, 1998.
[68]
S. Ikeda and N. Murata, "An approach to blind source separation of speech signals," in Proc. Int. Conf. Artif. Neural Netw., 1998, pp. 761- 766.
[69]
S. Gannot, D. Burshtein, and E. Weinstein, "Signal enhancement using beamforming and nonstationarity with applications to speech," IEEE Trans. Signal Process., vol. 49, no. 8, pp. 1614-1626, Aug. 2001.
[70]
L. C. Parra and C.V. Alvino, "Geometric source separation: Merging convolutive source separation with geometric beamforming," IEEE Trans. Speech Audio Process., vol. 10, no. 6, pp. 352-362, Sep. 2002.
[71]
B. Albouy and Y. Deville, "Alternative structures and power spectrum criteria for blind segmentation and separation of convolutive speech mixtures," in Proc. Int. Conf. Independent Component Anal. Blind Signal Separation, 2003, pp. 361-366.
[72]
S. Winter, H. Sawada, S. Araki, and S. Makino, "Overcomplete BSS for convolutivemixtures based on hierarchical clustering," in Proc. Int. Conf. Independent Component Anal. Blind Signal Separation, 2004, pp. 652- 660.
[73]
S. Araki, R. Mukai, S. Makino, T. Nishikawa, and H. Saruwatari, "The fundamental limitation of frequency domain blind source separation for convolutive mixtures of speech," IEEE Trans. Speech Audio Process., vol. 11, no. 2, pp. 109-116, Mar. 2003.
[74]
D. Kounades-Bastian, L. Girin, X. Alameda-Pineda, S. Gannot, and R. P. Horaud, "A variational EM algorithm for the separation of moving sound sources," in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., New Paltz, USA, Oct. 2015, pp. 1-5.
[75]
H. Sawada, R. Mukai, S. F. G. M. de la Kethulle de Ryhove, S. Araki, and S. Makino, "Spectral smoothing for frequency-domain blind source separation," in Proc. Int. Workshop Acoust. Echo Noise Control, 2003, pp. 311-314.
[76]
F. N. ans P. Svaizer and M. Omologo, "Convolutive BSS of short mixtures by ICA recursively regularized across frequencies," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 3, pp. 624-639, Mar. 2011.
[77]
M. Knaak, S. Araki, and S. Makino, "Geometrically constrained independent component analysis," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 2, pp. 715-726, Feb. 2007.
[78]
K. Reindl, Y. Zheng, A. Schwarz, S. Meier, R. Maas, A. Sehr, and W. Kellermann, "A stereophonic acoustic signal extraction scheme for noisy and reverberant environments," Comput. Speech Lang., vol. 27, no. 3, pp. 726-745, 2013.
[79]
S. Leglaive, R. Badeau, and G. Richard, "Multichannel audio source separation with probabilistic reverberation modeling," in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., New Paltz, NY, USA, Oct. 2015, pp. 1-5.
[80]
S. Araki, H. Sawada, R. Mukai, and S. Makino, "Underdetermined blind sparse source separation for arbitrarily arrangedmultiple sensors," Signal Process., vol. 87, no. 8, pp. 1833-1847, Aug. 2007.
[81]
H. Sawada, S. Araki, R. Mukai, and S. Makino, "Grouping separated frequency components with estimating propagation model parameters in frequency-domain blind source separation," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 5, pp. 1592-1604, Jul. 2007.
[82]
S. Stenzel, J. Freudenberger, and G. Schmidt, "A minimum variance beamformer for spatially distributed microphones using a soft reference selection," in Proc. Joint Workshop Hands-Free Speech Commun. Microphone Arrays, May 2014, pp. 127-131.
[83]
A. Deleforge, S. Gannot, and W. Kellermann, "Towards a generalization of relative transfer functions to more than one source," in Proc. Eur. Signal Conf., 2015, pp. 419-423.
[84]
X. Li, R. Horaud, L. Girin, and S. Gannot, "Local relative transfer function for sound source localization," in Proc. Eur. Signal Process. Conf., 2015, pp. 399-403.
[85]
O. Yilmaz and S. T. Rickard, "Blind separation of speech mixtures via time-frequency masking," IEEE Trans. Signal Process., vol. 52, no. 7, pp. 1830-1847, Jul. 2004.
[86]
M. Puigt and Y. Deville, "Time-frequency ratio-based blind separation methods for attenuated and time-delayed sources," Mech. Syst. Signal Process, vol. 19, pp. 1348-1379, 2005.
[87]
T. Melia and S. T. Rickard, "Underdetermined blind source separation in echoic environments usingDESPRIT," EURASIP J. Adv. Signal Process., vol. 2007, pp. 1-19, 2007.
[88]
C. Liu, B. C. Wheeler, W. D. O'Brien, Jr, C. R. Lansing, R. C. Bilger, D. L. Jones, and A. S. Feng, "A two-microphone dual delay-line approach for extraction of a speech sound in the presence of multiple interferers," J. Acoust. Soc. Amer., vol. 110, no. 6, pp. 3218-3231, Dec. 2001.
[89]
J. Anemüller and B. Kollmeier, "Adaptive separation of acoustic sources for anechoic conditions: A constrained frequency domain approach," Speech Commun., vol. 39, no. 1-2, pp. 79-95, 2003.
[90]
K. Reindl, S. Markovich-Golan, H. Barfuss, S. Gannot, and W. Kellermann, "Geometrically constrained TRINICON-based relative transfer function estimation in underdetermined scenarios," in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., New Paltz, USA, Oct. 2013, pp. 1-4.
[91]
A. Bronkhorst and R. Plomp, "The effect of head-induced interaural time and level differences on speech intelligibility in noise," J. Acoust. Soc. Amer., vol. 83, no. 4, pp. 1508-1516, 1988.
[92]
S. Doclo, S. Gannot, M. Moonen, and A. Spriet, "Acoustic beamforming for hearing aid applications," in Handbook on Array Processing and Sensor Networks, S. Haykin and K. Liu, Eds. Hoboken, NJ, USA: Wiley, 2010.
[93]
B. Cornelis, S. Doclo, T. Van dan Bogaert, M. Moonen, and J. Wouters, "Theoretical analysis of binaural multimicrophone noise reduction techniques," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 2, pp. 342-355, Feb. 2010.
[94]
D. Marquardt, V. Hohmann, and S. Doclo, "Coherence preservation in multi-channel wiener filtering based noise reduction for binaural hearing aids," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2013, pp. 8648-8652.
[95]
E. Hadad, S. Gannot, and S. Doclo, "Binaural linearly constrained minimum variance beamformer for hearing aid applications," in Proc. Int. Workshop Acoust. Signal Enhanc., Aachen, Germany, Sep. 2012, pp. 1-4.
[96]
D. Marquardt, E. Hadad, S. Gannot, and S. Doclo, "Theoretical analysis of linearly constrained multi-channel Wiener filtering algorithms for combined noise reduction and binaural cue preservation in binaural hearing aids," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 23, no. 12, pp. 2384-2397, Dec. 2015.
[97]
D. Marquardt, V. Hohmann, and S. Doclo, "Interaural coherence preservation in multi-channel Wiener filtering-based noise reduction for binaural hearing aids," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 23, no. 12, pp. 2162-2176, Dec. 2015.
[98]
E. Hadad, D. Marquardt, S. Doclo, and S. Gannot, "Theoretical analysis of binaural transfer function MVDR beamformers with interference cue preservation constraints," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 23, no. 12, pp. 2449-2464, Dec. 2015.
[99]
E. Hadad, S. Doclo, and S. Gannot, "The binaural LCMV beamformer and its performance analysis," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 24, no. 3, pp. 543-558, Mar. 2016.
[100]
J. Shynk, "Frequency-domain andmultirate and adaptive filtering," IEEE Signal Process. Mag., vol. 9, no. 1, pp. 14-37, Jan. 1992.
[101]
M. R. Portnoff, "Time-frequency representation of digital signals and systems based on short-time Fourier analysis," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-28, no. 1, pp. 55-69, Feb. 1980.
[102]
J. Le Roux and E. Vincent, "ConsistentWiener filtering for audio source separation," IEEE Signal Process. Lett., vol. 20, no. 3, pp. 217-220, Mar. 2013.
[103]
A. Gilloire and M. Vetterli, "Adaptive filtering in subbands with critical sampling: Analysis, experiments, and application to acoustic echo cancellation," IEEE Trans. Signal Process., vol. 40, no. 8, pp. 1862-1875, Aug. 1992.
[104]
Y. Avargel and I. Cohen, "Adaptive system identification in the short-time Fourier transform domain using cross-multiplicative transfer function approximation," IEEE Trans. Audio, Speech, Lang. Process., vol. 16, no. 1, pp. 162-173, Jan. 2008.
[105]
H. Attias, "New EM algorithms for source separation and deconvolution with a microphone array," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2003, vol. V, pp. 297-300.
[106]
R. Talmon, I. Cohen, and S. Gannot, "Convolutive transfer function generalized sidelobe canceler," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 7, pp. 1420-1434, Sep. 2009.
[107]
W. Kellermann and H. Buchner, "Wideband algorithms versus narrowband algorithms for adaptive filtering in the DFT domain," in Proc. Asilomar Conf. Signal, Syst. Comput., 2003, vol. 2, pp. 1278-1282.
[108]
C. Servière, "Separation of speech signals with segmentation of the impulse responses under reverberant conditions," in Proc. Int. Conf. Independent Component Anal. Blind Signal Separation, 2003, pp. 511- 516.
[109]
S. Mirsamadi, S. Ghaffarzadegan, H. Sheikhzadeh, S. M. Ahadi, and A. H. Rezaie, "Efficient frequency domain implementation of noncausal multichannel blind deconvolution for convolutive mixtures of speech," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 8, pp. 2365- 2377, Oct. 2012.
[110]
C. Févotte and J.-F. Cardoso, "Maximum likelihood approach for blind audio source separation using time-frequencyGaussian models," in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., 2005, pp. 78-81.
[111]
E. Vincent, S. Arberet, and R. Gribonval, "Underdetermined instantaneous audio source separation via local Gaussian modeling," in Proc. Int. Conf. Independent Component Anal. Blind Signal Separation, 2009, pp. 775-782.
[112]
A. Ozerov and C. Févotte, "Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 3, pp. 550-563, Mar. 2010.
[113]
N. Q. K. Duong, E. Vincent, and R. Gribonval, "Under-determined reverberant audio source separation using a full-rank spatial covariance model," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 7, pp. 1830-1840, Sep. 2010.
[114]
N. Q. K. Duong, H. Tachibana, E. Vincent, N. Ono, R. Gribonval, and S. Sagayama, "Multichannel harmonic and percussive component separation by joint modeling of spatial and spectral continuity," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2011, pp. 205-208.
[115]
J. Nikunen and T. Virtanen, "Direction of arrival based spatial covariance model for blind sound source separation," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 3, pp. 727-739, Mar. 2014.
[116]
N. Q. K. Duong, E.Vincent, and R.Gribonval, "Spatial location priors for Gaussian model based reverberant audio source separation," EURASIP J. Adv. Signal Process., vol. 2013, p. 149, Sep. 2013.
[117]
R. Martin, Freisprecheinrichtungen mit Mehrkanaliger Echokompensation und Störgeräuschreduktion, ABDN, Band 3. Verlag der Augustinus Buchhandlung, Aachen, Germany, 1995, (in German).
[118]
H. L. Van Trees, Detection, Estimation, and Modulation Theory, vol. IV, Optimum Array Processing. New York, NY, USA: Wiley, Apr. 2002.
[119]
D. Levin, E. Habets, and S. Gannot, "A generalized theorem on the average array directivity factor," IEEE Signal Process. Lett., vol. 20, no. 9, pp. 877-880, Jul. 2013.
[120]
A. T. Parsons, "Maximum directivity proof for three-dimensional arrays," J. Acoust. Soc. Amer., vol. 82, no. 1, pp. 179-182, 1987.
[121]
H. Cox, R. Zeskind, and M. Owen, "Robust adaptive beamforming," IEEE Trans. Acoust., Speech, Signal. Process., vol. ASSP-35, no. 10, pp. 1365-1376, Oct. 1987.
[122]
D. Levin, E. Habets, and S. Gannot, "Robust beamforming using sensors with nonidentical directivity patterns," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Vancouver, BC, Canada, May 2013, pp. 91-95.
[123]
G. W. Elko, "Microphone array systems for hands-free telecommunication," Speech Commun., vol. 20, no. 3, pp. 229-240, 1996.
[124]
J. Chen, J. Benesty, and C. Pan, "On the design and implementation of linear differential microphone arrays," J. Acoust. Soc. Amer., vol. 136, no. 6, pp. 3097-3113, 2014.
[125]
J. Benesty and J. Chen, Study and Design of Differential Microphone Arrays. New York, NY, USA: Springer, 2013.
[126]
J. Benesty, J. Chen, and I. Cohen, Design of Third-Order Circular Differential Arrays. New York, NY, USA: Springer, 2015.
[127]
J. Benesty, J. Chen, and C. Pan, Fundamentals of Differential Beamforming. New York, NY, USA: Springer, 2016.
[128]
H.-E. de Bree, P. Leussink, T. Korthorst, H. Jansen, T. S. Lammerink, and M. Elwenspoek, "The μ-flown: A novel device for measuring acoustic flows," Sensors Actuators A, Phys., vol. 54, no. 1, pp. 552-557, 1996.
[129]
J. Meyer and G. Elko, "A highly scalable spherical microphone array based on an orthonormal decomposition of the soundfield," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2002, vol. 2, pp. 1781-1784.
[130]
G.W. Elko and J.M. Meyer, "Using a higher-order spherical microphone array to assess spatial and temporal distribution of sound in rooms," J. Acoust. Soc. Amer., vol. 132, no. 3, pp. 1912-1912, Mar. 2012.
[131]
T. D. Abhayapala and D. B. Ward, "Theory and design of high order sound field microphones using spherical microphone array," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2002, vol. 2, pp. 1949- 1952.
[132]
B. Rafaely, "Analysis and design of spherical microphone arrays," IEEE Trans. Speech Audio Process., vol. 13, no. 1, pp. 135-143, Jan. 2005.
[133]
Z. Li and R. Duraiswami, "Flexible and optimal design of spherical microphone arrays for beamforming," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 2, pp. 702-714, Feb. 2007.
[134]
C. T. Jin, N. Epain, and A. Parthy, "Design, optimization and evaluation of a dual-radius spherical microphone array," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 1, pp. 193-204, Jan. 2014.
[135]
B. Rafaely, Fundamentals of Spherical Array Processing, vol. 8, New York, NY, USA: Springer, 2015.
[136]
N. Ito, H. Shimizu, N. Ono, and S. Sagayama, "Diffuse noise suppression using crystal-shaped microphone arrays," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 7, pp. 2101-2110, Sep. 2011.
[137]
S. Markovich, S. Gannot, and I. Cohen, "Multichannel eigenspace beamforming in a reverberant noisy environment with multiple interfering speech signals," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 6, pp. 1071-1086, Aug. 2009.
[138]
T. Dvorkind and S. Gannot, "Time difference of arrival estimation of speech source in a noisy and reverberant environment," Signal Process., vol. 85, no. 1, pp. 177-204, 2005.
[139]
E. Jan and J. Flanagan, "Sound capture from spatial volumes: Matched-filter processing of microphone arrays having randomly-distributed sensors," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 1996, vol. 2, pp. 917-920.
[140]
B. D. Van Veen and K.M. Buckley, "Beamforming: A versatile approach to spatial filtering," IEEE Acoust., Speech, Signal Process. Mag., vol. 5, no. 2, pp. 4-24, Apr. 1988.
[141]
C. L. Dolph, "A current distribution for broadside arrays which optimizes the relationship between beam width and side-lobe level," Proc. IRE, vol. 34, no. 6, pp. 335-348, Jun. 1946.
[142]
I. McCowan and H. Bourlard, "Microphone array post-filter based on noise field coherence," IEEE Trans. Speech Audio Process., vol. 11, no. 6, pp. 709-716, Nov. 2003.
[143]
K. U. Simmer, J. Bitzer, and C. Marro, "Post-filtering techniques," in Microphone Arrays: Signal Processing Techniques and Applications, M. S. Brandstein and D. B. Ward, Eds., Berlin, Germany: Springer-Verlag 2001, ch. 3, pp. 39-60.
[144]
W.Kellermann, "A self-steering digital microphone array," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 1991, pp. 3581-3584.
[145]
D. B. Ward, R. A. Kennedy, and R. C. Williamson, "Theory and design of broadband sensor arrays with frequency invariant far-field beam patterns," J. Acoust. Soc. Amer., vol. 97, no. 2, pp. 1023-1034, 1995.
[146]
S. Doclo and M. Moonen, "Design of far-field and near-field broadband beamformers using eigenfilters," Signal Process., vol. 83, no. 12, pp. 2641-2673, 2003.
[147]
S. Markovich-Golan, S. Gannot, and I. Cohen, "A weighted multichannel Wiener filter for multiple sources scenarios," in Proc. IEEE 27th Convention Elect. Electron. Eng. Israel, Eilat, Israel, Nov. 2012, pp. 1-5.
[148]
S. Doclo, A. Spriet, J. Wouters, and M. Moonen, "Speech distortion weighted multichannel Wiener filtering techniques for noise reduction," in Speech Enhancement (Signals and Communication Technology). Berlin, Germany: Springer, pp. 199-228 2005.
[149]
H. Cox, "Resolving power and sensitivity to mismatch of optimum array processors," J. Acoust. Soc. Amer., vol. 54, no. 3, pp. 771-785, Sep. 1973.
[150]
S. Araki, H. Sawada, and S. Makino, "Blind speech separation in a meeting situation with maximum SNR beamformers," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2007, vol. 1, pp. 41-44.
[151]
E. Warsitz and R. Haeb-Umbach, "Blind acoustic beamforming based on generalized eigenvalue decomposition," IEEE Trans. Audio, Speech, Lang., vol. 15, no. 5, pp. 1529-1539, Jul. 2007.
[152]
G. H. Golub and C. F. V. Loan, Matrix Computations, 3rd ed. Baltimore, MD, USA: The Johns Hopkins Univ. Press, Nov. 1996.
[153]
E. Habets, J. Benesty, I. Cohen, and S. Gannot, "On a tradeoff between dereverberation and noise reduction using the MVDR beamformer," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2009, pp. 3741- 3744.
[154]
S. Doclo and M. Moonen, "Multimicrophone noise reduction using recursive GSVD-based optimal filtering with ANC postprocessing stage," IEEE Trans. Speech Audio Process., vol. 13, no. 1, pp. 53-69, Jan. 2005.
[155]
S. Doclo and M. Moonen, "GSVD-based optimal filtering for single and multimicrophone speech enhancement," IEEE Trans. Signal Process., vol. 50, no. 9, pp. 2230-2244, Sep. 2002.
[156]
J. Benesty, J. Chen, and Y. Huang, Microphone Array Signal Processing, vol. 1. New York, NY, USA: Springer, 2008.
[157]
M. Souden, J. Benesty, and S. Affes, "On optimal frequency-domain multichannel linear filtering for noise reduction," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 2, pp. 260-276, Feb. 2010.
[158]
N. Roman, D. Wang, and G. Brown, "Speech segregation based on sound localization," J. Acoust. Soc. Amer., vol. 114, no. 4, pp. 2236-2252, 2003.
[159]
Y. Izumi, N. Ono, and S. Sagayama, "Sparseness-based 2ch BSS using the EM algorithm in reverberant environment," in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., 2007, pp. 147-150.
[160]
R. Gribonval, "Piecewise linear source separation," SPIE Wavelets: Appl. Signal Image Process., vol. 5207, pp. 297-310, 2003.
[161]
J. P. Rosca, C. Borss, and R. V. Balan, "Generalized sparse signal mixing model and application to noisy blind source separation," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2004, pp. III-877-III-880.
[162]
M. Togami, T. Sumiyoshi, and A. Amano, "Sound source separation of overcomplete convolutive mixtures using generalized sparseness," in Proc. Int. Workshop Acoust. Echo Noise Control, 2006.
[163]
O. Thiergart, M. Taseska, and E. A. Habets, "An informed parametric spatial filter based on instantaneous direction-of-arrival estimates," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 12, pp. 2182-2196, Dec. 2014.
[164]
E. Vincent, "Complex nonconvex lp norm minimization for underdeter-mined source separation," in Proc. Int. Conf. Independent Component Anal. Blind Signal Separation, 2007, pp. 430-437.
[165]
M. Maazaoui, Y. Grenier, and K. Abed-Meraim, "Frequency domain blind source separation for robot audition using a parameterized sparsity criterion," in Proc. Eur. Signal Process. Conf., 2011, pp. 1869-1873.
[166]
S. Choi, A. Cichocki, and S. Amari, "Flexible independent component analysis," in Proc. IEEE Signal Process. Soc. Workshop Neural Netw. Signal Process. VII, 1998, pp. 83-92.
[167]
R. Everson and S. Roberts, "Independent component analysis: A flexible nonlinearity and decorrelating manifold approach," Neural Comput., vol. 11, pp. 1957-1983, 1999.
[168]
N. Ono and S. Miyabe, "Auxiliary-function-based independent component analysis for super-Gaussian sources," in Proc. Int. Conf. Latent Variable Anal. Signal Separation, 2010, pp. 165-172.
[169]
G. Bao, Z. Ye, X. Xu, and Y. Zhou, "A compressed sensing approach to blind separation of speech mixture based on a two-layer sparsity model," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 5, pp. 899-906, May 2013.
[170]
R. C. Hendriks, J. S. Erkelens, J. Jensen, and R. Heusdens, "Minimum mean-square error amplitude estimators for speech enhancement under the generalized gamma distribution," in Proc. Int. Workshop Acoust. Signal Enhanc., 2006, pp. 1-6.
[171]
H. Sawada, R. Mukai, S. Araki, and S. Makino, "Polar coordinate based nonlinear function for frequency domain blind source separation," IEICE Tech. Rep., vol. E86-A, no. 3, pp. 590-596, Mar. 2003.
[172]
R. Mukai, H. Sawada, S. Araki, and S. Makino, "Blind source separation for moving speech signals using blockwise ICA and residual crosstalk subtraction," IEICE Tech. Rep., vol. E87-A, no. 8, pp. 1941-1948, 2004.
[173]
T. Lotter and P. Vary, "Speech enhancement by MAP spectral amplitude estimation using a super-Gaussian speech model," EURASIP J. Appl. Signal Process., vol. 2005, pp. 1110-1126, 2005.
[174]
J. I. Marin-Hurtado, D. N. Parikh, and D. V. Anderson, "Perceptually inspired noise-reduction method for binaural hearing aids," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 4, pp. 1372-1382, May 2012.
[175]
J.-F. Cardoso, "The three easy routes to independent component analysis; contrasts and geometry," in Proc. Int. Conf. Independent Component Anal. Blind Signal Separation, 2001, pp. 1-6.
[176]
P. Comon, "Independent component analysis, a new concept?," Signal Process., vol. 36, no. 3, pp. 287-314, 1994.
[177]
T. Lee, Independent Component Analysis--Theory and Applications. Boston, MA, USA: Kluwer, 1998.
[178]
A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, (Adaptive and Learning Systems), 1st ed. Hoboken, NJ, USA:Wiley, 2001.
[179]
K. Kumatani, J. McDonough, B. Rauch, D. Klakow, P. N. Garner, and W. Li, "Beamforming with a maximum negentropy criterion," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 5, pp. 994-1008, Jul. 2009.
[180]
H. Sawada, R. Mukai, S. Araki, and S. Makino, "A robust and precise method for solving the permutation problem of frequency-domain blind source separation," IEEE Trans. Speech Audio Process., vol. 12, no. 5, pp. 530-538, Sep. 2004.
[181]
H. Sawada, S. Araki, and S. Makino, "Underdetermined convolutive blind source separation via frequency bin-wise clustering and permutation alignment," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 3, pp. 516-527, Mar. 2011.
[182]
S. Winter, W. Kellermann, H. Sawada, and S. Makino, "MAP-based underdetermined blind source separation of convolutive mixtures by hierarchical clustering and l1-norm minimization," EURASIP J. Appl. Signal Process., vol. 2007, no. 1, pp. 81-81, 2007.
[183]
S. Arberet, P. Vandergheynst, R. E. Carrillo, J.-P. Thiran, and Y. Wiaux, "Sparse reverberant audio source separation via reweighted analysis," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 7, pp. 1391- 1402, Jul. 2013.
[184]
H. Buchner, R. Aichner, and W. Kellermann, "TRINICON-based blind system identification with application tomultiple-source localization and separation," in Blind Speech Separation, S. Makino, T.-W. Lee, and H. Sawada, Eds. Berlin, Germany: Springer, 2007, pp. 101-147.
[185]
N. Mitianoudis and M. E. Davies, "Audio source separation of convolutive mixtures," IEEE Trans. Speech Audio Process., vol. 11, no. 5, pp. 489-497, Sep. 2003.
[186]
A. Hiroe, "Solution of permutation problem in frequency domain ICA usingmultivariate probability density functions," in Proc. Int. Conf. Independent Component Anal. Blind Signal Separation, 2006, pp. 601-608.
[187]
T. Kim, H. T. Attias, S.-Y. Lee, and T.-W. Lee, "Blind source separation exploiting higher-order frequency dependencies," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 1, pp. 70-79, Jan. 2007.
[188]
N. Ono, "Stable and fast update rules for independent vector analysis based on auxiliary function technique," in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., 2011, pp. 189-192.
[189]
M. Er and A. Cantoni, "Derivative constraints for broad-band element space antenna array processors," IEEE Trans. Acoust., Speech, Signal Process., vol. 31, no. 6, pp. 1378-1393, Dec. 1983.
[190]
K. Buckley, "Spatial/spectral filteringwith linearly constrained minimum variance beamformers," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-35, no. 3, pp. 249-266, Mar. 1987.
[191]
Y. Zheng, R. Goubran, and M. El-Tanany, "Robust near-field adaptive beamforming with distance discrimination," IEEE Trans. Speech Audio Process., vol. 12, no. 5, pp. 478-488, Sep. 2004.
[192]
O. L. Frost III, "An algorithm for linearly constrained adaptive array processing," Proc. IEEE, vol. 60, no. 8, pp. 926-935, Aug. 1972.
[193]
L. J. Griffiths and C. W. Jim, "An alternative approach to linearly constrained adaptive beamforming," IEEE Trans. Antennas Propag., vol.AP- 30, no. 1, pp. 27-34, Jan. 1982.
[194]
B. R. Breed and J. Strauss, "A short proof of the equivalence of LCMV and GSC beamforming," IEEE Signal Process. Lett., vol. 9, no. 6, pp. 168-169, Jun. 2002.
[195]
G. Strang, Linear Algebra and its Application. 2nd ed. New York, NY, USA: Academic, 1980.
[196]
B. Widrow et al., "Adaptive noise cancelling: Principals and applications," Proc. IEEE, vol. 63, no. 12, pp. 1692-1716, Dec. 1975.
[197]
S. Nordholm, I. Claesson, and B. Bengtsson, "Adaptive array noise suppression of handsfree speaker input in cars," IEEE Trans. Vehicular Technol., vol. 42, no. 4, pp. 514-518, Nov. 1993.
[198]
S. Markovich-Golan, S. Gannot, and I. Cohen, "Subspace tracking of multiple sources and its application to speakers extraction," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Dallas, TX, USA, Mar. 2010, pp. 201-204.
[199]
O. Hoshuyama, A. Sugiyama, and A. Hirano, "A robust adaptive beamformer for microphone arrays with a blocking matrix using constrained adaptive filters," IEEE Trans. Signal Process., vol. 47, no. 10, pp. 2677- 2684, Oct. 1999.
[200]
W. Herbordt and W. Kellermann, "Computationally efficient frequency-domain robust generalized sidelobe canceller," in Proc. Int. Workshop Acoust. Signal Enhanc., Darmstadt, Germany, Sep. 2001, pp. 51-54.
[201]
G. Reuven, S. Gannot, and I. Cohen, "Dual-source transfer-function generalized sidelobe canceller," IEEE Trans. Audio, Speech, Lang. Process., vol. 16, no. 4, pp. 711-727, May 2008.
[202]
S. Markovich-Golan, S. Gannot, and I. Cohen, "A sparse blocking matrix for multiple constraints GSC beamformer," in Proc. IEEE Intl. Conf. Acoust., Speech, Signal Process., Kyoto, Japan, Apr. 2012, pp. 197-200.
[203]
N. Madhu and R. Martin, "A versatile framework for speaker separation using a model-based speaker localization approach," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 7, pp. 1900-1912, Sep. 2011.
[204]
A. Spriet, M. Moonen, and J. Wouters, "Spatially pre-processed speech distortion weighted multi-channel Wiener filtering for noise reduction," Signal Process., vol. 84, no. 12, pp. 2367-2387, Dec. 2004.
[205]
R. Martin and T. Lotter, "Optimal recursive smoothing of non-stationary periodograms," in Proc. Int. Workshop Acoust. Echo NoiseControl, 2001, pp. 167-170.
[206]
R. Martin, "Noise power spectral density estimation based on optimal smoothing and minimum statistics," IEEE Trans. Speech Audio Process., vol. 9, no. 5, pp. 504-512, Jul. 2001.
[207]
I. Cohen, "Noise spectrum estimation in adverse environments: Improved minima controlled recursive averaging," IEEE Trans. Speech Audio Process., vol. 11, no. 5, pp. 466-475, Sep. 2003.
[208]
T. Gerkmann and R. Hendriks, "Noise power estimation based on the probability of speech presence," in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., 2011, pp. 145-148.
[209]
T. Gerkmann, C. Breithaupt, and R. Martin, "Improved a posteriori speech presence probability estimation based on a likelihood ratio with fixed priors," IEEE Trans. Audio, Speech, Lang. Process., vol. 16, no. 5, pp. 910-919, Jul. 2008.
[210]
Y. Ephraim and D. Malah, "Speech enhancement using a minimummean square error short-time spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 6, pp. 1109-1121, Dec. 1984.
[211]
I. Cohen and B. Berdugo, "Speech enhancement for non-stationary noise environments," Signal Process., vol. 81, no. 11, pp. 2403-2418, Nov. 2001.
[212]
J. Sohn, N. Kim, and W. Sung, "A statistical model-based voice activity detection," IEEE Signal Process. Lett., vol. 6, no. 1, pp. 1-3, Jan. 1999.
[213]
M. Souden, J. Chen, J. Benesty, and S. Affes, "Gaussian model-based multichannel speech presence probability," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 5, pp. 1072-1077, Jul. 2010.
[214]
M. Taseska and E. A. Habets, "MMSE-based blind source extraction in diffuse noise fields using a complex coherence-based a priori SAP estimator," in Proc. Int. Workshop Acoust. Signal Enhanc., 2012, pp. 1-4.
[215]
M. Taseska and E. Habets, "Spotforming using distributed microphone arrays," in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., Oct. 2013, pp. 1-4.
[216]
M. Taseska and E. Habets, "Informed spatial filtering for sound extraction using distributed microphone arrays," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 7, pp. 1195-1207, Jul. 2014.
[217]
M. Taseska and E. A. Habets, "Spotforming: Spatial filtering with distributed arrays for position-selective sound acquisition," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 24, no. 7, pp. 1291-1304, Jul. 2016.
[218]
M. Taseska, S. Markovich-Golan, E. Habets, and S. Gannot, "Nearfield source extraction using speech presence probabilities for ad hoc microphone arrays," in Proc. 14th Int. Workshop Acoust. Signal Enhanc., 2014, pp. 169-173.
[219]
I. Cohen, "Relative transfer function identification using speech signals," IEEE Trans. Speech Audio Process., vol. 12, no. 5, pp. 451-459, Sep. 2004.
[220]
A. Bertrand and M. Moonen, "Distributed node-specific LCMV beamforming in wireless sensor networks," IEEE Trans. Signal Process., vol. 60, no. 1, pp. 233-246, Jan. 2012.
[221]
S. Markovich-Golan and S. Gannot, "Performance analysis of the covariance subtraction method for relative transfer function estimation and comparison to the covariance whitening method," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Brisbane, Australia, Apr. 2015, pp. 544-548.
[222]
M. Ito, S. Araki, and T. Nakatani, "Permutation-free clustering of relative transfer function features for blind source separation," in Proc. Eur. Signal Process. Conf., 2015, pp. 409-413.
[223]
M. Taseska and E. A. P. Habets, "Relative transfer function estimation exploiting instantaneous signals and the signal subspace," in Proc. Eur. Signal Process. Conf., 2015, pp. 404-408.
[224]
S. Meier and W. Kellermann, "Analysis of the performance and limitations of ICA-based relative impulse response identification," in Proc. Eur. Signal Process. Conf., 2015, pp. 414-418.
[225]
L. Molgedey and H.G. Schuster, "Separation of a mixture of independent signals using time delayed correlations," Phys. Rev. Lett., vol. 72, no. 23, pp. 3634-3637, 1994.
[226]
M. Z. Ikram and D. R. Morgan, "A beamformer approach to permutation alignment for multichannel frequency-domain blind source separation," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2002, pp. 881- 884.
[227]
D.-T. Pham, C. Servière, and H. Boumaraf, "Blind separation of speech mixtures based on nonstationarity," in Proc. Int. Symp. Signal Process. Appl., 2003, pp. II-73-II-76.
[228]
A. Ozerov, E. Vincent, and F. Bimbot, "A general flexible framework for the handling of prior information in audio source separation," IEEE Trans. Audio, Speech, Lang. Process, vol. 20, no. 4, pp. 1118-1133, May 2012.
[229]
H. Sawada, H. Kameoka, S. Araki, and N. Ueda, "Efficient algorithms for multichannel extensions of Itakura-Saito nonnegative matrix factorization," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2012, pp. 261-264.
[230]
R. Sakanashi, S. Miyabe, T. Yamada, and S. Makino, "Comparison of superimposition and sparse models in blind source separation by multichannel Wiener filter," in Proc. Asia-Pacific Signal Inf. Process. Assoc., 2012, pp. 1-6.
[231]
K. Adilo¿lu and E. Vincent, "A general variational Bayesian framework for robust feature extraction in multisource recordings," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2012, pp. 273-276.
[232]
H. Sawada, H. Kameoka, S. Araki, and N. Ueda, "Multichannel extensions of non-negative matrix factorization with complex-valued data," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 5, pp. 971-982, May 2013.
[233]
M. Souden, S. Araki, K. Kinoshita, T. Nakatani, and H. Sawada, "A multichannel MMSE-based framework for speech source separation and noise reduction," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 9, pp. 1913-1928, Sep. 2013.
[234]
J. Thiemann and E. Vincent, "A fast EM algorithm for Gaussian model-based source separation," in Proc. 21st Eur. Signal Process. Conf., 2013, pp. 1-5.
[235]
N. Ito, E. Vincent, T. Nakatani, N. Ono, S. Araki, and S. Sagayama, "Blind suppression of nonstationary diffuse noise based on spatial covariance matrix decomposition," J. Signal Process. Syst., vol. 79, no. 2, pp. 145- 157, 2015.
[236]
D. Schmid, G. Enzner, S. Malik, D. Kolossa, and R. Martin, "Variational Bayesian inference for multichannel dereverberation and noise reduction," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 8, pp. 1320-1335, Aug. 2014.
[237]
K. Adilo¿lu and E. Vincent, "Variational Bayesian inference for source separation and robust feature extraction," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 24, no. 10, pp. 1746-1758, Oct. 2016.
[238]
A. P. Dempster, N. M. Laird, and D. B. Rubin.,"Maximum likelihood from incomplete data via the EM algorithm," J. Roy. Statist. Soc. Ser. B (Methodological), vol. 39, pp. 1-38, 1977.
[239]
C. Blandin, A. Ozerov, and E. Vincent, "Multi-source TDOA estimation in reverberant audio using angular spectra and clustering," Signal Process., vol. 92, no. 8, pp. 1950-1960, 2012.
[240]
A. Ozerov, P. Philippe, F. Bimbot, and R. Gribonval, "Adaptation of Bayesian models for single-channel source separation and its application to voice/music separation in popular songs," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 5, pp. 1564-1578, Jul. 2007.
[241]
M. Togami, "Online speech source separation based on maximum likelihood of local Gaussian modeling," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2011, pp. 213-216.
[242]
L. S. R. Simon and E. Vincent, "A general framework for online audio source separation," in Proc. Int. Conf. Latent Variable Anal. Signal Separation, Tel-Aviv, Israel, Mar. 2012, pp. 397-404.
[243]
D. M. Titterington, "Recursive parameter estimation using incomplete data," J. Roy. Statist. Soc. B, vol. 46, no. 2, pp. 257-267, 1984.
[244]
O. Cappé and E. Moulines, "On-line expectation-maximization algorithm for latent data models," J. Roy. Statist. Soc. B, vol. 71, no. 3, pp. 593-613, 2009.
[245]
R. M. Neal and G. E. Hinton, "A view of the EM algorithm that justifies incremental, sparse, and other variants," in Learning in Graphical Models, M. I. Jordan, Ed., Cambridge, MA, USA: MIT Press, 2009, pp. 355-368.
[246]
O. Schwartz and S. Gannot, "Speaker tracking using recursive EM algorithms," IEEE Trans. Audio, Speech, Lang., vol. 22, no. 2, pp. 392-402, Feb. 2014.
[247]
B. Schwartz, S. Gannot, and E. A. P. Habets, "Online speech derever-beration using Kalman filter and EM algorithm," IEEE Trans. Audio, Speech, Lang. Process., vol. 23, no. 2, pp. 394-406, Feb. 2015.
[248]
C. M. Bishop, Pattern Recognition and Machine Learning. New York, NY, USA: Springer, 2006.
[249]
D. R. Hunter and K. Lange, "A tutorial on MM algorithms," Amer. Statistician, vol. 58, no. 1, pp. 30-37, Feb. 2004.
[250]
R. Zelinski, "A microphone array with adaptive post-filtering for noise reduction in reverberant rooms," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Apr. 1988, pp. 2578-2581.
[251]
J. Meyer and K. U. Simmer, "Multi-channel speech enhancement in a car environment using Wiener filtering and spectral subtraction," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Munich, Germany, Apr. 1997, pp. 21-24.
[252]
C. Marro, Y. Mahieux, and K. Simmer, "Analysis of noise reduction and dereverberation techniques based on microphone arrays with postfiltering," IEEE Trans. Speech Audio Process., vol. 6, no. 3, pp. 240-259, May 1998.
[253]
S. Leukimmiatis, D. Dimitriadis, and P. Maragos, "An optimum microphone array post-filter for speech applications," in Proc. Int. Conf. Spoken Lang. Process., 2006, pp. 2142-2145.
[254]
R. Balan and J. Rosca, "Microphone array speech enhancement by bayesian estimation of spectral amplitude and phase," in Proc. IEEE Workshop Sensor Array Multichannel Signal Process., 2002, pp. 209- 213.
[255]
Y. Ephraim and D. Mala, "Speech enhancement using a minimum mean square error log-spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-33, no. 2, pp. 443-445, Apr. 1985.
[256]
I. Cohen, S. Gannot, and B. Berdugo, "An integrated real-time beamforming and postfiltering system for nonstationary noise environments," EURASIP J. Adv. Signal Process., vol. 2003, pp. 1064-1073, Oct. 2003.
[257]
S. Gannot and I. Cohen, "Speech enhancement based on the general transfer function GSC and postfiltering," IEEE Trans. Speech Audio Process., vol. 12, no. 6, pp. 561-571, Nov. 2004.
[258]
C. Zheng, H. Liu, R. Peng, and X. Li, "A statistical analysis of two-channel post-filter estimators in isotropic noise fields," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 2, pp. 336-342, Feb. 2013.
[259]
D. Kolossa and R. Orglmeister, "Nonlinear postprocessing for blind speech separation," in Proc. Int. Conf. Independent Component Anal. Blind Signal Separation, 2004, pp. 832-839.
[260]
E. Hoffmann, D. Kolossa, and R. Orglmeister, "Time frequency masking strategy for blind source separation of acoustic signals based on optimally-modified log-spectral amplitude estimator," in Proc. Int. Conf. Independent Component Anal. Blind Signal Separation, 2009, pp. 581-588.
[261]
Y. Hioka, K. Furuya, K. Kobayashi, K. Niwa, and Y. Haneda, "Under-determined sound source separation using power spectrum density estimated by combination of directivity gain," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 6, pp. 1240-1250, June 2013.
[262]
A. Jourjine, S. Rickard, and O. Yilmaz, "Blind separation of disjoint orthogonal signals: Demixing n sources from 2 mixtures," in Proc. IEEE Int. Conf. Acoust., Speech. Signal Process., 2000, vol. 5, pp. 2985-2988.
[263]
D. L.Wang, "On ideal binary mask as the computational goal of auditory scene analysis," in Speech Separation by Humans and Machines. New York, NY, USA: Springer, 2005, pp. 181-197.
[264]
J. Mouba and S. Marchand, "A source localization/separation/ respatialization system based on unsupervised classification of interaural cues," in Proc. Conf. Dig. Audio Effects, 2006, pp. 233-238.
[265]
H.-M. Park and R. M. Stern, "Spatial separation of speech signals using amplitude estimation based on interaural comparisons of zero-crossings," Speech Commun., vol. 51, no. 1, pp. 15-25, Jan. 2009.
[266]
A. Deleforge, F. Forbes, and R. Horaud, "Variational EM for binaural sound-source separation and localization," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2013, pp. 76-80.
[267]
H. Viste and G. Evangelista, "On the use of spatial cues to improve binaural source separation," in Proc. Conf. Digit. Audio Effects, 2003, pp. 209-213.
[268]
M. I. Mandel, R. J. Weiss, and D. P. W. Ellis, "Model-based expectation maximization source separation and localization," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 2, pp. 382-394, Feb. 2010.
[269]
P. Aarabi and G. Shi, "Phase-based dual-microphone robust speech enhancement," IEEE Trans. Syst., Man, Cybern., vol. 34, no. 4, pp. 1763- 1773, Aug. 2004.
[270]
A. Shamsoddini and P. Denbigh, "A sound segregation algorithm for reverberant conditions," Speech Commun., vol. 33, no. 3, pp. 179-196, 2001.
[271]
N. Roman, S. Srinivasan, and D. Wang, "Binaural segregation in multisource reverberant environments," J. Acoust. Soc. Amer., vol. 120, no. 6, pp. 4040-4051, 2006.
[272]
E.Vincent and X. Rodet, "Underdetermined source separationwith structured source priors," in Proc. Int. Conf. Independent Component Anal. Blind Signal Separation, 2004, pp. 327-332.
[273]
M. G. Christensen, "Multi-channel maximum likelihood pitch estimation," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Mar. 2012, pp. 409-412.
[274]
S. Karimian-Azari, J. R. Jensen, and M. G. Christensen, "Fast joint DOA and pitch estimation using a broadband MVDR beamformer," in Proc. Eur. Signal Process. Conf., Marrakech, Morocco, Sep. 2013, pp. 1-5.
[275]
S. Arberet et al., "Nonnegative matrix factorization and spatial covariance model for under-determined reverberant audio source separation," in Proc. Int. Symp. Signal Process. Appl., 2010, pp. 1-4.
[276]
B. Weinstein, A. V. Oppenheim, M. Feder, and J. R. Buck, "Iterative and sequential algorithms for multisensor signal enhancement," IEEE Trans. Signal Process., vol. 42, no. 4, pp. 846-859, Apr. 1994.
[277]
M. Feder, A. V. Oppenheim, and E. Weinstein, "Maximum likelihood noise cancellation using the emalgorithm," IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 2, pp. 204-216, Feb. 1989.
[278]
X. Sun and S. Douglas, "A natural gradient convolutive blind source separation algorithm for speech mixtures," in Proc. Int. Conf. Independent Component Anal. Blind Signal Separation, 2001, pp. 1-4.
[279]
M. Reyes-Gomez, B. Raj, and D. Ellis, "Multi-channel source separation by factorial HMMs," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2003, pp. I-664-I-667.
[280]
A. Ozerov, C. Févotte, and M. Charbit, "Factorial scaled hidden Markov model for polyphonic audio representation and source separation," in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., 2009, pp. 121-124.
[281]
C. Févotte, N. Bertin, and J.-L. Durrieu, "Nonnegative matrix factorization with the Itakura-Saito divergence with application to music analysis," Neural Comput., vol. 21, no. 3, pp. 793-830, Mar. 2009.
[282]
D. Kounades-Bastian, L. Girin, X. Alameda-Pineda, S. Gannot, and R. Horaud, "A variational EM algorithm for the separation of timevarying convolutive audio mixtures," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 24, no. 8, pp. 1408-1423, Aug. 2016.
[283]
S. Nakamura, K. Hiyane, F. Asano, T. Nishiura, and T. Yamada, "Acoustical sound database in real environments for sound scene understanding and hands-free speech recognition," in Proc. 2nd Int. Conf. Lang. Res. Eval., 2000, pp. 965-968.
[284]
E. Vincent et al., "The signal separation evaluation campaign (2007- 2010): Achievements and remaining challenges," Signal Process., vol. 92, pp. 1928-1936, 2012.
[285]
H. Kayser, S. D. Ewert, J. Anemüller, T. Rohdenburg, V. Hohmann, and B. Kollmeier, "Database ofmultichannel in-ear and behind-the-ear headrelated and binaural room impulse responses," EURASIP J. Adv. Signal Process., vol. 2009, 2009, Art. no. 6.
[286]
E. Vincent, J. Barker, S. Watanabe, J. Le Roux, F. Nesta, and M. Matassoni, "The second CHiME speech separation and recognition challenge: Datasets, tasks and baselines," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2013, pp. 126-130.
[287]
E. Hadad, F. Heese, P. Vary, and S. Gannot, "Multichannel audio database in various acoustic environments," in Proc. Int. Workshop Acoust. Signal Enh., Antibes - Juan les Pins, France, Sep. 2014, pp. 313-317.
[288]
V. Välimäki, J. D. Parker, L. Savioja, J. O. Smith, and J. S. Abel, "Fifty years of artificial reverberation," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 5, pp. 1421-1447, Jul. 2012.
[289]
J. Allen and D. Berkley, "Image method for efficiently simulating small-room acoustics," J. Acoust. Soc. Amer., vol. 65, no. 4, pp. 943-950, Apr. 1979.
[290]
D. R. Campbell, K. J. Palomäki, and G. J. Brown, "Roomsim, a MATLAB simulation of 'shoebox' room acoustics for use in teaching and research," J. Comput. Inf. Syst., vol. 9, no. 3, pp. 48-51, 2005.
[291]
D. Jarrett, E. Habets, M. Thomas, and P. Naylor, "Rigid sphere room impulse response simulation: Algorithm and applications," J. Acoust. Soc. Amer., vol. 132, no. 3, pp. 1462-1472, 2012.
[292]
J. K. Nielsen, J. R. Jensen, S. H. Jensen, and M. G. Christensen, "The single- and multichannel audio recordings database (SMARD)," in Proc. Int. Workshop Acoust. Signal Enh., 2014, pp. 40-44.
[293]
J. Le Roux and E. Vincent, "A categorization of robust speech processing datasets," Mitsubishi Electric Research Laboratories, Cambridge, MA, USA, Tech. Rep. TR2014-116, Aug. 2014.
[294]
S. Renals, T. Hain, and H. Bourlard, "Interpretation of multiparty meetings: The AMI and AMIDA projects," in Proc. Joint Workshop Hands-Free Speech Commun. Microphone Arrays, 2008, pp. 115-118.
[295]
A. Brutti, L. Cristoforetti, W. Kellermann, L. Marquardt, and M. Omologo, "WOZ acoustic data collection for interactive TV," in Proc. Int. Conf. Lang. Res. Eval., vol. 44, no. 3, 2008, pp. 205-219.
[296]
A. Stupakov, E. Hanusa, D. Vijaywargi, D. Fox, and J. Bilmes, "The design and collection of COSINE, a multi-microphone in situ speech corpus recorded in noisy environments," Comput. Speech Lang., vol. 26, no. 1, pp. 52-66, 2011.
[297]
J. Barker, R. Marxer, E. Vincent, and S. Watanabe, "The third 'CHiME' speech separation and recognition challenge: Dataset, task and baselines," in Proc. IEEE Workshop Automatic Speech Recognit. Understanding, 2015, pp. 504-511.
[298]
J. Le Roux, E. Vincent, J. R. Hershey, and D. P. W. Ellis, "MICbots: collecting large realistic datasets for speech and audio research using mobile robots," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2015, pp. 504-511.
[299]
A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, "Perceptual evaluation of speech quality (PESQ)--A new method for speech quality assessment of telephone networks and codecs," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2001, vol. 2, pp. 749-752.
[300]
R. Huber and B. Kollmeier, "PEMO-Q--A new method for objective audio quality assessment using a model of auditory perception," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 6, pp. 1902-1911, Nov. 2006.
[301]
C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, "An algorithm for intelligibility prediction of time-frequency weighted noisy speech," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 7, pp. 2125-2136, Sep. 2011.
[302]
V. Emiya, E. Vincent, N. Harlander, and V. Hohmann, "Subjective and objective quality assessment of audio source separation," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 7, pp. 2046-2057, Sep. 2011.
[303]
ITU, "ITU-T Recommendation P.835: Subjective test methodology for evaluating speech communication systems that include noise suppression algorithm," 2003.
[304]
E. Vincent, M. G. Jafari, and M. D. Plumbley, "Preliminary guidelines for subjective evaluation of audio source separation algorithms," in Proc. UK ICA Res. Netw. Workshop, 2006.
[305]
J. Barker, E. Vincent, N. Ma, H. Christensen, and P. Green, "The PASCAL CHiME speech separation and recognition challenge," Comput. Speech Lang., vol. 27, no. 3, pp. 621-633, 2013.
[306]
K. Kumatani et al., "Microphone array processing for distant speech recognition: Towards real-world deployment," in Proc. Asia-Pacific Signal Inf. Process. Assoc., 2012, pp. 1-10.
[307]
J. Thiemann and E. Vincent, "An experimental comparison of source separation and beamforming techniques for microphone array signal enhancement," in Proc. IEEE Int. Workshop Mach. Learning Signal Process., 2013, pp. 1-5.
[308]
E. Vincent, H. Sawada, P. Bofill, S. Makino, and J. P. Rosca, "First stereo audio source separation evaluation campaign: data, algorithms and results," in Proc. Int. Conf. Independent Component Anal. Signal Separation, 2007, pp. 552-559.
[309]
E. Vincent, S. Araki, and P. Bofill, "The 2008 signal separation evaluation campaign: A community-based approach to large-scale evaluation," in Proc. 8th Int. Conf. Independent Component Anal. Signal Separation, 2009, pp. 734-741.
[310]
S. Araki et al., "The 2010 signal separation evaluation campaign (SiSEC 2010): Audio source separation," in Proc. 9th Int. Conf. Latent Variable Anal. Signal Separation, 2010, pp. 114-122.
[311]
S. Araki et al., "The 2011 signal separation evaluation campaign (SiSEC2011): Audio source separation," in Proc. Int. Conf. Latent Variable Anal. Signal Separation, 2012, pp. 414-422.
[312]
N. Ono, Z. Koldovsky, S. Miyabe, and N. Ito, "The 2013 signal separation evaluation campaign," in Proc. IEEE Int. Workshop Mach. Learning Signal Process., Southampton, U.K., Sep. 2013, pp. 1-6.
[313]
N. Ono, D. Kitamura, Z. Rafii, N. Ito, and A. Liutkus, "The 2015 signal separation evaluation campaign," in Proc. Int. Conf. Latent Variable Anal. Signal Separation, Liberec, Czech Republic, Aug. 2015, pp. 387-395.
[314]
M. I. Mandel and D. P. W. Ellis, "EM localization and separation using interaural level and phase cues," in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., Oct. 2007, pp. 275-278.
[315]
M. I. Mandel, D. P. W. Ellis, and T. Jebara, "An EM algorithm for localizing multiple sound sources in reverberant environments," in Proc. Neural Inf. Process. Conf., 2007, pp. 953-960.
[316]
Z. El Chami, A. D.-T. Pham, C. Servière, and A. Guerin, "A new model based underdetermined source separation," in Proc. Int. Workshop Acoust. Signal Enhanc., 2008, pp. 279-282.
[317]
H. Sawada, S. Araki, and S. Makino, "A two-stage frequency-domain blind source separation method for underdetermined convolutive mixtures," in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., Oct. 2007, pp. 139-142.
[318]
J. Cho, J. Choi, and C. D. Yoo, "Underdetermined convolutive blind source separation using a novel mixing matrix estimation and MMSE-based source estimation," in Proc. IEEE Int. Workshop Mach. Learning Signal Process., 2011, pp. 1-6.
[319]
F. Nesta and M. Omologo, "Convolutive underdetermined source separation through weighted interleaved ICA and spatio-temporal correlation," in Proc. Latent Variable Anal. Signal Separation, 2012, pp. 222-230.
[320]
J. Cho and C. Yoo, "Underdetermined convolutive BSS: Bayes risk minimization based on a mixture of super-Gaussian posterior approximation," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 23, no. 5, pp. 828-839, May 2015.
[321]
E. Weinstein, M. Feder, and A. V. Oppenheim, "Multi-channel signal separation by decorrelation," IEEE Trans. Speech Audio Process., vol. 1, no. 4, pp. 405-413, Oct. 1993.
[322]
S. Araki, S. Makino, Y. Hinamoto, R. Mukai, T. Nishikawa, and H. Saruwatari, "Equivalence between frequency-domain blind source separation and frequency-domain adaptive beamforming for convolutive mixtures," EURASIP J. Appl. Signal Process., vol. 11, pp. 1157-1166, 2003.
[323]
J.-F. Cardoso and A. Souloumiac, "Blind beamforming for non-Gaussian signals," IEE Proc. F (Radar and Signal Proc.), vol. 140, no. 6, pp. 362-370, 1993.
[324]
S. Y. Low and S. Nordholm, "A hybrid speech enhancement system employing blind source separation and adaptive noise cancellation," in Proc. 6th Nordic Signal Process. Symp., 2004, pp. 204-207.
[325]
K. Reindl, S. Meier, H. Barfuss, and W. Kellermann, "Minimum mutual information-based linearly constrained broadband signal extraction," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 6, pp. 1096-1108, Jun. 2014.
[326]
H. Buchner, "A systematic approach to incorporate deterministic prior knowledge in broadband adaptive MIMO systems," in Proc. 2010 Conf. Rec. 44th Asilomar Conf. Signals, Systems Comput., 2010, pp. 461-468.
[327]
S. Markovich-Golan, S. Gannot, and W. Kellermann, "Combined LCMV-TRINICON beamforming for separating multiple speech sources in noisy and reverberant environments," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 25, no. 2, pp. 320-332, Feb. 2017.
[328]
S. Araki, M. Okada, T. Higuchi, A. Ogawa, and T. Nakatani, "Spatial correlation model based observation vector clustering and MVDR beamforming for meeting recognition," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Mar. 2016, pp. 385-389.
[329]
S. Araki and T. Nakatani, "Hybrid approach for multichannel source separation combining time-frequency mask with multi-channel Wiener filter," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., May 2011, pp. 225-228.
[330]
A. Asaei, M. E. Davies, H. Bourlard, and V. Cevher, "Computational methods for structured sparse component analysis of convolutive speech mixtures," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2012, pp. 2425-2428.
[331]
Z. Koldovsky, J. Málek, P. Tichavsky, and F. Nesta, "Semi-blind noise extraction using partially known position of the target source," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 10, pp. 2029-2041, Oct. 2013.
[332]
R. Mignot, L. Daudet, and F. Ollivier, "Room reverberation reconstruction: Interpolation of the early part using compressed sensing," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 11, pp. 2301-2312, Nov. 2013.
[333]
A. Asaei, M. Golbabaee, H. Bourlard, and V. Cevher, "Structured sparsity models for reverberant speech separation," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 3, pp. 620-633, Mar. 2014.
[334]
R. Mignot, G. Chardon, and L. Daudet, "Low frequency interpolation of room impulse responses using compressed sensing," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 1, pp. 205-216, Jan. 2014.
[335]
A. Deleforge, F. Forbes, and R. Horaud, "Acoustic space learning for sound-source separation and localization on binaural manifolds," Int. J. Neural Syst., vol. 25, no. 1, 2015, Art. no. 1440003.
[336]
B. Laufer-Goldshtein, R. Talmon, and S. Gannot, "Semi-supervised sound source localization based on manifold regularization," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 24, no. 8, pp. 1393-1407, Aug. 2016.
[337]
B. Laufer-Goldshtein, R. Talmon, and S. Gannot, "Manifold-based Bayesian interference for semi-supervised source localization," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Shanghai, China, Mar. 2016, pp. 6335-6339.
[338]
B. Laufer-Goldshtein, R. Talmon, and S. Gannot, "Semi-supervised source localization onmultiple-manifoldswith distributed microphones," arXiv:1610.04770, 2016.
[339]
R. Talmon and S. Gannot, "Relative transfer function identification on manifolds for supervised GSC beamformers," in Proc. Eur. Signal Process. Conf., Marrakech, Morocco, Sep. 2013, pp. 1-5.
[340]
Y. Jiang, D. L. Wang, R. S. Liu, and Z. M. Feng, "Binaural classification for reverberant speech segregation using deep neural networks," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 12, pp. 2112-2121, Dec. 2014.
[341]
J. Woodruff and D. Wang, "Binaural detection, localization, and segregation in reverberant environments based on joint pitch and azimuth cues," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 4, pp. 806-815, Apr. 2013.
[342]
J. Heymann, L. Drude, A. Chinaev, and R. Haeb-Umbach, "BLSTM supported GEV beamformer front-end for the 3rd CHiME challenge," in Proc. IEEE Workshop Automatic Speech Recognit. Understanding, 2015, pp. 444-451.
[343]
A. A. Nugraha, A. Liutkus, and E. Vincent, "Multichannel audio source separation with deep neural networks," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 24, no. 9, pp. 1652-1664, Sep. 2016.
[344]
T. N. Sainath, R. J. Weiss, K. W. Wilson, A. Narayanan, M. Bacchiani, and A. Senior, "Speaker location and microphone spacing invariant acoustic modeling from raw multichannel waveforms," in Proc. IEEE Workshop Automatic Speech Recognit. Understanding, 2015, pp. 30-36.
[345]
S. Markovich-Golan, A. Bertrand, M. Moonen, and S. Gannot, "Optimal distributed minimum-variance beamforming approaches for speech enhancement in wireless acoustic sensor networks," Signal Process., vol. 107, pp. 4-20, 2015.
[346]
S. Doclo, M. Moonen, T. Van den Bogaert, and J. Wouters, "Reduced-bandwidth and distributed MWF-based noise reduction algorithms for binaural hearing aids," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 1, pp. 38-51, Jan. 2009.
[347]
A. Bertrand and M. Moonen, "Distributed adaptive node-specific signal estimation in fully connected sensor networks--Part I: Sequential node updating," IEEE Trans. Signal Process., vol. 58, pp. 5277-5291, Oct. 2010.
[348]
S. Markovich-Golan, S. Gannot, and I. Cohen, "Distributed multiple constraints generalized sidelobe canceler for fully connected wireless acoustic sensor networks," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 2, pp. 343-356, Feb. 2013.
[349]
S. Markovich-Golan, S. Gannot, and I. Cohen, "Performance of the SDW-MWF with randomly located microphones in a reverberant enclosure," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 7, pp. 1513- 1523, Jul. 2013.
[350]
S. Markovich-Golan, S. Gannot, and I. Cohen, "Low-complexity addition or removal of sensors/constraints in LCMV beamformers," IEEE Trans. Signal Process., vol. 60, no. 3, pp. 1205-1214, Mar. 2012.
[351]
Y. Zeng and R. Hendriks, "Distributed delay and sum beamformer for speech enhancement via randomized gossip," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 1, pp. 260-273, Jan. 2014.
[352]
R. Heusdens, G. Zhang, R. C. Hendriks, Y. Zeng, and W. B. Kleijn, "Distributed MVDR beamforming for (wireless) microphone networks using message passing," in Proc. Int. Workshop Acoust. Signal Enhanc., 2012, pp. 1-4.
[353]
M. O'Connor and W. B. Kleijn, "Diffusion-based distributed MVDR beamformer," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2014, pp. 810-814.
[354]
N. D. Gaubitch, J. Martinez, W. B. Kleijn, and R. Heusdens, "On near-field beamforming with smartphone-based ad-hoc microphone arrays," in Proc. Int. Workshop Acoust. Signal Enhanc., Sep. 2014, pp. 94-98.
[355]
M. Souden, K. Kinoshita, M. Delcroix, and T. Nakatani, "Location feature integration for clustering-based speech separation in distributed microphone arrays," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 2, pp. 354-367, Feb. 2014.
[356]
P. Pertilä, M. S. Hämäläinen, and M. Mieskolainen, "Passive temporal offset estimation of multichannel recordings of an ad-hoc microphone array," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 11, pp. 2393-2402, Nov. 2013.
[357]
S. Wehr, I. Kozintsev, R. Lienhart, and W. Kellermann, "Synchronization of acoustic sensors for distributed ad-hoc audio networks and its use for blind source separation," in Proc. IEEE Int. Symp. Multimedia Softw. Eng., 2004, pp. 18-25.
[358]
S. Markovich-Golan, S. Gannot, and I. Cohen, "Blind sampling rate offset estimation and compensation in wireless acoustic sensor networks with application to beamforming," in Proc. Int. Workshop Acoust. Signal Enhanc., Aachen, Germany, Sep. 2012, pp. 1-4.
[359]
J. Schmalenstroeer, P. Jebramcik, and R. Haeb-Umbach, "A gossiping approach to sampling clock synchronization in wireless acoustic sensor networks," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., May 2014, pp. 7575-7579.
[360]
D. Cherkassky and S. Gannot, "Blind synchronization in wireless sensor networks with application to speech enchantment," in Proc. Intl. Workshop Acoust. Signal Enhanc., Antibes - Juan les Pins, France, Sep. 2014, pp. 184-188.
[361]
Y. Zeng, R. Hendriks, and N. Gaubitch, "On clock synchronization for multi-microphone speech processing in wireless acoustic sensor networks," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Apr. 2015, pp. 231-235.
[362]
D. Cherkassky, S. Markovich-Golan, and S. Gannot, "Performance analysis of MVDR beamformer in WASN with sampling rate offsets and blind synchronization," in Proc. Eur. Signal Process. Conf., Nice, France, Aug. 2015, pp. 245-249.
[363]
J. Schmalenstroeer, P. Jebramcik, and R. Haeb-Umbach, "A combined hardware-software approach for acoustic sensor network synchronization," Signal Process., vol. 107, pp. 171-184, 2015.
[364]
L. Wang and S. Doclo, "Correlation maximization based sampling rate offset estimation for distributed microphone arrays," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 24, no. 3, pp. 571-582, Mar. 2016.
[365]
D. Cherkassky and S. Gannot, "Blind synchronization in wireless acoustic sensor networks," IEEE/ACM Trans. Audio, Speech, Lang. Process., Aug. 2016.
[366]
C. Anderson, P. Teal, and M. Poletti, "Spatially robust far-field beamforming using the von Mises(-Fisher) distribution," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 23, no. 12, pp. 2189-2197, Dec. 2015.
[367]
S. Doclo and M. Moonen, "Design of broadband beamformers robust against gain and phase errors in the microphone array characteristics," IEEE Trans. Signal Process., vol. 51, no. 10, pp. 2511-2526, Oct. 2003.
[368]
S. Doclo and M. Moonen, "Superdirective beamforming robust against microphone mismatch," IEEE Trans. Acoust., Speech, Signal Process., vol. 15, no. 2, pp. 617-631, Feb. 2007.
[369]
J. Li, P. Stoica, and Z. Wang, "On robust Capon beamforming and diagonal loading," IEEE Trans. Signal Process., vol. 51, no. 7, pp. 1702-1715, Jul. 2003.
[370]
S. Vorobyov, A. Gershman, and Z.-Q. Luo, "Robust adaptive beamforming using worst-case performance optimization: A solution to the signal mismatch problem," IEEE Trans. Signal Process., vol. 51, no. 2, pp. 313-324, Feb. 2003.
[371]
R. Lorenz and S. Boyd, "Robust minimum variance beamforming," IEEE Trans. Signal Process., vol. 53, no. 5, pp. 1684-1696, May 2005.
[372]
S. Nordebo, I. Claesson, and S. Nordholm, "Adaptive beamforming: spatial filter designed blocking matrix," IEEE J. Ocean. Eng., vol. 19, no. 4, pp. 583-590, Apr. 1994.
[373]
C. A. Anderson, S. Meier, W. Kellermann, P. D. Teal, and M. A. Poletti, "TRINICON-BSS system incorporating robust dual beamformers for noise reduction," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2015, pp. 529-533.
[374]
O. Thiergart, M. Taseska, and E. A. Habets, "An informed LCMV filter based on multiple instantaneous direction-of-arrival estimates," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2013, pp. 659-663.
[375]
B. Yang, "Projection approximation subspace tracking," IEEE Trans. Signal Process., vol. 43, no. 1, pp. 95-107, Jan. 1995.
[376]
B. Kollmeier, J. Peissig, and V. Hohmann, "Binaural noise-reduction hearing aid scheme with real-time processing in the frequency domain," Scandinavian Audiol. Suppl., vol. 38, pp. 28-38, 1993.
[377]
T. Wittkop and V. Hohmann, "Strategy-selective noise reduction for binaural digital hearing aids," Speech Commun., vol. 39, no. 1, pp. 111-138, 2003.
[378]
J. Li, M. Akagi, and Y. Suzuki, "Extension of the two-microphone noise reduction method for binaural hearing aids," in Proc. Int. Conf. Audio, Lang. Image Process., 2008, pp. 97-101.
[379]
A. S. Bregman, Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge, MA, USA: MIT Press, 1994.
[380]
S. Wehr, M. Zourub, R. Aichner, and W. Kellermann, "Post-processing for BSS algorithms to recover spatial cues," in Proc. Int. Workshop Acoust. Signal Enhanc., Paris, France, Sep. 2006.
[381]
R. Aichner, H. Buchner, M. Zourub, and W. Kellermann, "Multi-channel source separation preserving spatial information," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Honolulu, HI, USA, Apr. 2007, pp. 5-8.
[382]
K. Reindl, Y. Zheng, and W. Kellermann, "Speech enhancement for binaural hearing aids based on blind source separation," in Proc. Int. Symp. Control, Commun. Signal Process., Mar. 2010, pp. 1-6.
[383]
S. Doclo, R. Dong, T. Klasen, J. Wouters, S. Haykin, and M. Moonen, "Extension of the multi-channel Wiener filter with ITD cues for noise reduction in binaural hearing aids," in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., 2005, pp. 70-73.
[384]
T. Lotter and P. Vary, "Dual-channel speech enhancement by superdirective beamforming," EURASIP J. Adv. Signal Process., vol. 2006, p. 175, Jan. 2006.
[385]
S. Markovich-Golan, S. Gannot, and I. Cohen, "A reduced bandwidth binaural MVDR beamformer," in Proc. Int. Workshop Acoust. Signal Enh., Tel-Aviv, Israel, Sep. 2010.
[386]
E. Hadad, S. Doclo, and S. Gannot, "The binaural LCMV beamformer and its performance analysis," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 24, no. 3, pp. 543-558, Mar. 2016.
[387]
I. Almajai and B. Milner, "Visually derived Wiener filters for speech enhancement," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 6, pp. 1642-1651, Aug. 2011.
[388]
V. Khalidov, F. Forbes, and R. Horaud, "Conjugate mixture models for clustering multimodal data," Neural Comput., vol. 23, no. 2, pp. 517-557, 2011.
[389]
M. S. Khan, S. M. Naqvi, A. ur Rehman, W. Wang, and J. Chambers, "Video-aided model-based source separation in real reverberant rooms," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 9, pp. 1900-1912, Sep. 2013.
[390]
I.-D. Gebru, X. Alameda-Pineda, R. Horaud, and F. Forbes, "Audio-visual speaker localization via weighted clustering," in Proc. IEEE Int. Workshop Mach. Learning Signal Process., Reims, France, Sep. 2014, pp. 1-6.
[391]
A. Deleforge, R. Horaud, Y. Schechner, and L. Girin, "Co-localization of audio sources in images using binaural features and locally-linear regression," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 23, no. 4, pp. 718-731, Apr. 2015.
[392]
S. Zeiler, H. Meutzner, A. H. Abdelaziz, and D. Kolossa, "Introducing the Turbo-Twin-HMM for audio-visual speech enhancement," in Proc. Interspeech, 2016, pp. 1750-1754.
[393]
D. Dov, R. Talmon, and I. Cohen, "Audio-visual voice activity detection using diffusion maps," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 23, no. 4, pp. 732-745, Apr. 2015.


Published In

IEEE/ACM Transactions on Audio, Speech and Language Processing, Volume 25, Issue 4, April 2017, 228 pages
ISSN: 2329-9290
EISSN: 2329-9304

Publisher

IEEE Press

Publication History

Published: 01 April 2017
Published in TASLP Volume 25, Issue 4

Qualifiers

  • Research-article


Cited By

  • (2024) Physics-informed neural network for volumetric sound field reconstruction of speech signals, EURASIP Journal on Audio, Speech, and Music Processing, 2024:1, doi: 10.1186/s13636-024-00366-2. Online publication date: 9-Sep-2024.
  • (2024) Multi-microphone simultaneous speakers detection and localization of multi-sources for separation and noise reduction, EURASIP Journal on Audio, Speech, and Music Processing, 2024:1, doi: 10.1186/s13636-024-00365-3. Online publication date: 4-Oct-2024.
  • (2024) Compression of room impulse responses for compact storage and fast low-latency convolution, EURASIP Journal on Audio, Speech, and Music Processing, 2024:1, doi: 10.1186/s13636-024-00363-5. Online publication date: 13-Sep-2024.
  • (2024) MIRACLE—a microphone array impulse response dataset for acoustic learning, EURASIP Journal on Audio, Speech, and Music Processing, 2024:1, doi: 10.1186/s13636-024-00352-8. Online publication date: 18-Jun-2024.
  • (2024) Mask-Based Beamforming Applied to the End-Fire Microphone Array, Circuits, Systems, and Signal Processing, 43:3 (1661-1696), doi: 10.1007/s00034-023-02530-z. Online publication date: 1-Mar-2024.
  • (2023) UNSSOR, Proceedings of the 37th International Conference on Neural Information Processing Systems, (34021-34042), doi: 10.5555/3666122.3667596. Online publication date: 10-Dec-2023.
  • (2023) Direction-of-arrival and power spectral density estimation using a single directional microphone and group-sparse optimization, EURASIP Journal on Audio, Speech, and Music Processing, 2023:1, doi: 10.1186/s13636-023-00304-8. Online publication date: 4-Oct-2023.
  • (2023) MYRiAD: a multi-array room acoustic database, EURASIP Journal on Audio, Speech, and Music Processing, 2023:1, doi: 10.1186/s13636-023-00284-9. Online publication date: 26-Apr-2023.
  • (2023) Variance Analysis of Covariance and Spectral Estimates for Mixed-Spectrum Continuous-Time Signals, IEEE Transactions on Signal Processing, 71 (1395-1407), doi: 10.1109/TSP.2023.3266474. Online publication date: 1-Jan-2023.
  • (2023) Acoustic SLAM With Moving Sound Event Based on Auxiliary Microphone Arrays, IEEE Transactions on Intelligent Transportation Systems, 24:11 (11823-11834), doi: 10.1109/TITS.2023.3289324. Online publication date: 1-Nov-2023.
