Joint time-frequency segmentation algorithm for transient speech decomposition and speech enhancement

Authors:

Charturong Tantibundhit,

Franz Pernkopf,

Gernot Kubin

IEEE Transactions on Audio, Speech, and Language Processing, Volume 18, Issue 6

Pages 1417 - 1428

Published: 01 August 2010

Abstract

We develop the joint time-frequency segmentation algorithm, in which the wavelet packet coefficients of the analyzed speech signal are represented as tiles of a time-frequency representation adapted to the characteristics of the signal itself. The algorithm also decomposes the speech signal into transient and non-transient components: any block of wavelet packet coefficients whose tiling height is larger than or equal to its tiling width belongs to the transient component, and every other block belongs to the non-transient component. The transient component is selectively amplified and recombined with the original speech to generate modified speech whose energy is adjusted to equal that of the original speech. The intelligibility of the original and modified speech is evaluated by 16 human listeners. Word recognition rates show that the modified speech significantly improves speech intelligibility in background noise, by 10% absolute at 0 dB to 27% absolute at -30 dB.
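
The classification rule and the recombination step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tile representation as `(height, width)` pairs, the function names, and the `gain` value are all hypothetical, and the wavelet packet analysis and synthesis that would produce the tiles and the transient waveform are left out.

```python
import math


def is_transient(tile_height, tile_width):
    """Rule from the abstract: height >= width marks a transient tile."""
    return tile_height >= tile_width


def split_tiles(tiles):
    """Split (height, width) pairs into transient and non-transient lists.

    The tuple layout is an assumption made for this sketch; the units of
    height (frequency extent) and width (time extent) are whatever the
    tiling uses.
    """
    transient = [t for t in tiles if is_transient(t[0], t[1])]
    non_transient = [t for t in tiles if not is_transient(t[0], t[1])]
    return transient, non_transient


def recombine(original, transient, gain=2.0):
    """Add the amplified transient to the original, then rescale so the
    result has the same energy as the original.

    `gain` is a hypothetical amplification factor, not a value taken
    from the paper.
    """
    modified = [x + gain * t for x, t in zip(original, transient)]
    energy_original = sum(x * x for x in original)
    energy_modified = sum(m * m for m in modified)
    if energy_modified == 0.0:
        return modified  # nothing to rescale
    scale = math.sqrt(energy_original / energy_modified)
    return [scale * m for m in modified]
```

In this sketch `recombine` expects the two signals as equal-length sequences of samples, with the transient component already reconstructed in the time domain. Rescaling by the square root of the energy ratio is one straightforward way to meet the equal-energy condition the abstract states.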

Cited By

Saulig N, Milanović Ž, Ioana C (2017). A local entropy-based algorithm for information content extraction from time-frequency distributions of noisy signals. Digital Signal Processing, 70:C, 155-165. doi:10.1016/j.dsp.2017.08.005. Online publication date: 1-Nov-2017.
https://dl.acm.org/doi/10.1016/j.dsp.2017.08.005

Published In

IEEE Transactions on Audio, Speech, and Language Processing Volume 18, Issue 6

August 2010

34 pages

ISSN: 1063-6676

Publisher

IEEE Press

Publication History

Published: 01 August 2010

Revised: 23 September 2009

Received: 03 May 2009
