Joint time-frequency segmentation algorithm for transient speech decomposition and speech enhancement

Published: 01 August 2010

Abstract

We develop the joint time-frequency segmentation algorithm, which represents the wavelet packet coefficients of the analyzed speech signal as tiles of a time-frequency representation adapted to the characteristics of the signal itself. The algorithm also decomposes the speech signal into transient and non-transient components: any block of wavelet packet coefficients whose tile height is greater than or equal to its tile width is assigned to the transient component, and every other block is assigned to the non-transient component. The transient component is selectively amplified and recombined with the original speech to generate the modified speech, whose energy is adjusted to equal that of the original speech. The intelligibility of the original and modified speech was evaluated by 16 human listeners. Word recognition rates show that the modified speech is significantly more intelligible than the original in background noise, with absolute improvements ranging from 10% at 0 dB to 27% at -30 dB.
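The tile classification rule and the amplify-and-recombine step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The `Tile` type, its `height` (frequency extent) and `width` (time extent) fields, the function names, and the gain value are all assumptions made for the example. The abstract does not say how the tiling is computed or what amplification factor is used, so the sketch takes the tiles and the transient signal as given.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Tile:
    """One block of wavelet packet coefficients in the time-frequency plane.

    Hypothetical structure: `height` is the tile's extent along frequency and
    `width` its extent along time, both in the same tiling units.
    """
    height: float
    width: float


def is_transient(tile: Tile) -> bool:
    # Rule from the abstract: height >= width means transient component;
    # everything else belongs to the non-transient component.
    return tile.height >= tile.width


def energy(x: Sequence[float]) -> float:
    # Signal energy as the sum of squared samples.
    return sum(v * v for v in x)


def enhance(original: Sequence[float],
            transient: Sequence[float],
            gain: float) -> List[float]:
    """Amplify the transient, add it to the original speech, and rescale
    the result so its energy equals that of the original speech.

    `gain` is an assumed parameter; the abstract does not give its value.
    """
    if len(original) != len(transient):
        raise ValueError("original and transient must have the same length")
    mixed = [o + gain * t for o, t in zip(original, transient)]
    e_mixed = energy(mixed)
    if e_mixed == 0.0:
        return mixed  # nothing to normalize
    scale = math.sqrt(energy(original) / e_mixed)
    return [scale * m for m in mixed]


if __name__ == "__main__":
    tiles = [Tile(height=4, width=1), Tile(height=1, width=4), Tile(height=2, width=2)]
    print([is_transient(t) for t in tiles])

    original = [1.0, 0.0, -1.0, 0.5]
    transient = [0.0, 0.0, -1.0, 0.5]  # toy transient component
    modified = enhance(original, transient, gain=3.0)
    print(energy(original), energy(modified))
```

Rescaling by the square root of the energy ratio keeps the modified speech at the same overall energy as the original, so any difference in intelligibility cannot be attributed to a louder signal.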

Published In

IEEE Transactions on Audio, Speech, and Language Processing, Volume 18, Issue 6
August 2010
34 pages

Publisher

IEEE Press

Publication History

Published: 01 August 2010
Revised: 23 September 2009
Received: 03 May 2009

Author Tags

  1. joint time-frequency (TF) segmentation
  2. speech enhancement
  3. speech intelligibility
  4. transient component
  5. wavelet packet transform
