A New Biologically Inspired Fuzzy Expert System-Based Voiced/Unvoiced Decision Algorithm for Speech Enhancement

M. A. Ben Messaoud¹,
A. Bouzid¹ &
N. Ellouze¹

We’re sorry, something doesn't seem to be working properly.

Please try refreshing the page. If that doesn't work, please contact support so we can address the problem.

Abstract

In this paper, we propose a speech enhancement approach for a single-microphone system. The main idea is to apply a specific transformation on the speech signal depending on the voicing state of the signal. We apply a voiced/unvoiced algorithm based on the multi-scale product analysis with the use of fuzzy logic to make more cognitively inspired use of speech information. A comb filtering is applied on the voiced frames of the noisy speech signal, and a spectral subtraction is operated on the unvoiced frames of the same signal. Further, the harmonics are enhanced by performing a designed comb filtering using an adjustable bandwidth. The comb filter is tuned by an accurate fundamental frequency estimation method. The fundamental frequency estimation method is based on computing the multi-scale product analysis of the noisy speech. Experimental results show that the proposed approach is capable of reducing noise in adverse noise environments with little speech degradation and outperforms several competitive methods.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Institutional subscriptions

Adaptive Prosody Modelling for Improved Synthetic Speech Quality

Evolving Fuzzy-Neural Method for Multimodal Speech Recognition

Filter algorithm based on cochlear mechanics and neuron filter mechanism and application on enhancement of audio signals

Article 16 April 2021

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

References

Hussain A, Chetouani M, Squartini S, Bastari A, Piazza F. Nonlinear speech enhancement: an overview. In: Stylianou Y, Faundez-Zanuy M, Esposito A, editors. LNCS 4391. Berlin: Springer; 2007. p. 217–48.
Google Scholar
Boll SF. Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans Acoust Speech Signal Process. 1979;27:113–8.
Article Google Scholar
Hu HT, Kuo FJ, Wang HJ. Supplementary schemes to spectral subtraction for speech enhancement. Speech Commun. 2002;36:205–14.
Article Google Scholar
Lu Y, Loizou PC. A geometric approach to spectral subtraction. Speech Commun. 2008;50:453–514.
Article PubMed PubMed Central Google Scholar
Cadore J, Valverde-Albacete FJ, Gallardo-Antolín A, Peláez-Moreno C. Auditory-inspired morphological processing of speech spectrograms: applications in automatic speech recognition and speech enhancement. Cognit Comput. 2013;5:426–516.
Article Google Scholar
Hu Y, Loizou PC. Speech enhancement based on wavelet thresholding the multitaper spectrum. IEEE Trans Speech Audio Process. 2004;12:59–69.
Article Google Scholar
Ding GH, Huang T, Xu B. Suppression of additive noise using a power spectral density MMSE estimator. IEEE Trans Signal Process Lett. 2004;11:585–604.
Article Google Scholar
Cohen I. Speech enhancement using a noncausal a priori SNR estimator. IEEE Trans Signal Process Lett. 2004;11:725–34.
Article Google Scholar
Lee KY, Jung S. Time-domain approach using multiple Kalman filters and EM algorithm to speech enhancement with nonstationary noise. IEEE Trans Speech Audio Process. 2000;8:282–310.
Article Google Scholar
Zavarehei E, Vaseghi S. Speech enhancement in temporal DFT trajectories using Kalman filters. In: Interspeech, Lisbon; 2005.
Huag F, Lee T, Kleijn WB. Transform-domain wiener filter for speech periodicity. In: IEEE International Conference Acoustic Speech Signal Processing (ICASSP); 2012. p. 4577–84.
Hu Y, Loizou PC. A subspace approach for enhancing speech corrupted by colored noise. IEEE Signal Process Lett. 2002;9:204–13.
Article Google Scholar
Hardwick J, Yoo CD, Lim JS. Speech enhancement using the dual excitation model. In: Proceedings of IEEE International Conferrence Acoustic Speech Signal Processing (ICASSP); 1993; 367–74.
Dubost S, Cappe O. Enhancement of speech based on non-parametric estimation of a time varying harmonic representation. In: Proceedings of IEEE International Conferrence Acoustic Speech Signal Processing (ICASSP); 2000. p. 1859–64.
Deisher ME, Spanias AS. HMM-based speech enhancement using harmonic modeling. In: Proceedings of IEEE International Conferrence Acoustic Speech Signal Processing (ICASSP); 1997; 1175–84.
Jensen J, Hansen JHL. Speech enhancement using a constrained iterative sinusoidal model. IEEE Trans Speech Audio Process. 2001;9:731–810.
Article Google Scholar
Squartini S, Schuller B, Hussain A. Cognitive and emotional information processing for human–machine interaction. Cognit Comput. 2012;4(4):383–93.
Article Google Scholar
Espinosa-Duro V, Faundez-Zanuy M, Mekyska J. Beyond cognitive signals. Cognit Comput. 2011;3(2):374–8.
Article Google Scholar
Esposito A. The perceptual and cognitive role of visual and auditory channels in conveying emotional information. Cognit Comput. 2009;1(3):268–311.
Article Google Scholar
Abel A, Hussain A. Novel two-stage audiovisual speech filtering in noisy environments. Cognit Comput. 2014;6:200–18.
Article Google Scholar
Abel A, Hussain A. Cognitively inspired audiovisual speech filtering: towards an intelligent, fuzzy based, multimodal, two-stage speech enhancement system. Springer Briefs in Cognitive Computation, Springer International Publishing; 2015.
Rotili R, Principi E, Squartini S, Schuller B. A Real-time speech enhancement framework in noisy and reverberated acoustic scenarios. Cognit Comput. 2013;5:504–13.
Article Google Scholar
Narayanan A, Wang DL. Ideal ratio mask estimation using deep neural networks for robust speech recognition. In: Proceedings of ICASSP; 2013. pp. 1520–6149.
Xu Y, Du J, Dai L, Lee C. An experimental study on speech enhancement based on deep neural networks. IEEE Signal Process Lett. 2014;21:65–74.
Article Google Scholar
Cho E, Smith JO, Widrow B. Exploiting the harmonic structure for speech enhancement. In: Proceedings of IEEE International Conferrence Acoustic Speech Signal Processing (ICASSP); 2012.
George E, Smith M. Speech analysis/synthesis and modification using an analysis-by-synthesis/overlap-add sinusoidal model. IEEE Trans Speech Audio Process. 1997;5:389–418.
Article Google Scholar
Nehorai A, Porat B. Adaptive comb filtering for harmonic signal enhancement. IEEE Trans Acoust Speech Signal Process. 1986;34:1124–215.
Article Google Scholar
Chen JH, Gersho A. Adaptive postfiltering for quality enhancement of coded speech. IEEE Trans Speech Audio Process. 1995;3:59–113.
Article Google Scholar
Grancharov V, Plasberg JH, Samuelsson J, Kleijn WB. Generalized postfilter for speech quality enhancement. IEEE Trans Audio Speech Lang Process. 2008;16:57–8.
Article Google Scholar
Jin W, Liu X, Scordilis MS. Speech enhancement using harmonic emphasis and comb filtering. IEEE Trans Audio Speech Lang Process. 2010;18:356–413.
Article Google Scholar
Ahmadi S, Spanias A. Cepstrum-based pitch detection using a new statistical V/UV classification algorithm. IEEE Trans Speech Audio Process. 1999;7:333–6.
Article Google Scholar
Fisher E, Tabrikian J, Dubnov S. Generalized likelihood ratio test for voiced–unvoiced decision in noisy speech using the harmonic model. IEEE Trans Audio Speech Lang Process. 2006;14:502–9.
Article Google Scholar
Nakatani T, Amano S, Irino T, Ishizuka K, Kondo T. A method for fundamental frequency estimation and voicing decision: application to infant utterances recorded in real acoustical environments. Speech Commun. 2008;50:203–12.
Article Google Scholar
Talkin D. A robust algorithm for pitch tracking (RAPT). In: Talkin D, editor. Speech coding and synthesis. Amsterdam: Elsevier; 1995. p. 495–518.
Google Scholar
de Cheveigné A, Kawahara H. YIN, a fundamental frequency estimator for speech and music. J Acoust Soc Am. 2002;111:1917–2014.
Article PubMed Google Scholar
Beritelli F, Casale S, Russo S, Serrano S. Adaptive V/UV speech detection based on characterization of background noise. EURASIP J Audio Speech Music Process. 2009;. doi:10.1155/2009/965436.
Google Scholar
Ben Messaoud MA, Bouzid A, Ellouze, N. Estimation du pitch et décision de voisement par compression spectrale de l’autocorrélation du produit multi-échelle. In: Proceedings of Journée d’Etude de la parole (JEP-TALN-RECITAL 2012); 2012; pp. 201–8.
Bouzid A, Ellouze N. Electroglottographic measures based on GCI and GOI detection using multiscale product. Int J Comput Commun Control. 2008;3:21–32.
Article Google Scholar
Ben Messaoud MA, Bouzid A, Ellouze N. Using multi-scale product spectrum for single and multi-pitch estimation. IET Signal Process J. 2011;5:344–412.
Article Google Scholar
Xu Y, Weaver JB, Healy DM, Lu J. Wavelet transform domain filters: a spatially selective noise filtration technique. IEEE Trans Image Process. 1994;3:747–812.
Article CAS PubMed Google Scholar
Sadler BM, Swami A. Analysis of multi-scale products for step detection and estimation. IEEE Trans Inf Theory. 1999;45:1043–9.
Article Google Scholar
Mallat S. A wavelet tour of signal processing. 3rd ed. San Diego: Academic Press; 2008.
Google Scholar
Touzi A, Ben Messaoud MA. New approach for conception and implementation of object oriented expert system using UML. Int Arab J Inf Technol. 2009;6:99–108.
Google Scholar
Ben Messaoud MA, Bouzid A, Ellouze N. An efficient method for fundamental frequency determination of noisy speech. In: Drugman T, Dutoit T, editors. LNCS 7911. Springer: Berlin; 2013. p. 33–41.
Google Scholar
Hu Y, Loizou PC. Subjective comparison and evaluation of speech enhancement algorithms. Speech Commun. 2007;49:588–614.
Article PubMed PubMed Central Google Scholar
ITU-T P.862. Perceptual evaluation of speech quality (PESQ), and objective method for end-to-end speech quality assessment of narrowband telephone networks and speech codecs. In: ITU-T Recommendation; 2000; p. 862.
Camacho A, Harris JG. A sawtooth waveform inspired pitch estimator for speech and music. J Acoust Soc Am. 2008;124:1638–715.
Article PubMed Google Scholar
Ben Messaoud MA, Bouzid A, Ellouze N. Autocorrelation of the speech multi-scale product for voicing decision and pitch estimation. Cognit Comput. 2010;2:151–9.
Article Google Scholar
Loizou PC. Speech enhancement: theory and practice. Dallas: CRC Press; 2007.
Google Scholar

Download references

Acknowledgments

The authors would like to thank Dr. A. Abel for his help throughout the revision of paper by her thesis.

Author information

Authors and Affiliations

National School of Engineers of Tunis, University of Tunis El Manar, Tunis, Tunisia
M. A. Ben Messaoud, A. Bouzid & N. Ellouze

Authors

M. A. Ben Messaoud
View author publications
You can also search for this author in PubMed Google Scholar
A. Bouzid
View author publications
You can also search for this author in PubMed Google Scholar
N. Ellouze
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to M. A. Ben Messaoud.

Ethics declarations

Conflict of Interest

M. A. Ben Messaoud, A. Bouzid and N. Ellouze declare that they have no conflict of interest.

Informed Consent

All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2008 (5). Additional informed consent was obtained from all patients for which identifying information is included in this article.

Human and Animal Rights

This article does not contain any studies with human or animal subjects performed by the any of the authors.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Ben Messaoud, M.A., Bouzid, A. & Ellouze, N. A New Biologically Inspired Fuzzy Expert System-Based Voiced/Unvoiced Decision Algorithm for Speech Enhancement. Cogn Comput 8, 478–493 (2016). https://doi.org/10.1007/s12559-015-9376-2

Download citation

Received: 07 March 2015
Accepted: 28 December 2015
Published: 11 January 2016
Issue Date: June 2016
DOI: https://doi.org/10.1007/s12559-015-9376-2

A New Biologically Inspired Fuzzy Expert System-Based Voiced/Unvoiced Decision Algorithm for Speech Enhancement

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Adaptive Prosody Modelling for Improved Synthetic Speech Quality

Evolving Fuzzy-Neural Method for Multimodal Speech Recognition

Filter algorithm based on cochlear mechanics and neuron filter mechanism and application on enhancement of audio signals

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of Interest

Informed Consent

Human and Animal Rights

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

A New Biologically Inspired Fuzzy Expert System-Based Voiced/Unvoiced Decision Algorithm for Speech Enhancement

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Adaptive Prosody Modelling for Improved Synthetic Speech Quality

Evolving Fuzzy-Neural Method for Multimodal Speech Recognition

Filter algorithm based on cochlear mechanics and neuron filter mechanism and application on enhancement of audio signals

Explore related subjects

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of Interest

Informed Consent

Human and Animal Rights

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation