Abstract
In this paper, we propose a speech enhancement approach for a single-microphone system. The main idea is to apply a specific transformation on the speech signal depending on the voicing state of the signal. We apply a voiced/unvoiced algorithm based on the multi-scale product analysis with the use of fuzzy logic to make more cognitively inspired use of speech information. A comb filtering is applied on the voiced frames of the noisy speech signal, and a spectral subtraction is operated on the unvoiced frames of the same signal. Further, the harmonics are enhanced by performing a designed comb filtering using an adjustable bandwidth. The comb filter is tuned by an accurate fundamental frequency estimation method. The fundamental frequency estimation method is based on computing the multi-scale product analysis of the noisy speech. Experimental results show that the proposed approach is capable of reducing noise in adverse noise environments with little speech degradation and outperforms several competitive methods.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Hussain A, Chetouani M, Squartini S, Bastari A, Piazza F. Nonlinear speech enhancement: an overview. In: Stylianou Y, Faundez-Zanuy M, Esposito A, editors. LNCS 4391. Berlin: Springer; 2007. p. 217–48.
Boll SF. Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans Acoust Speech Signal Process. 1979;27:113–8.
Hu HT, Kuo FJ, Wang HJ. Supplementary schemes to spectral subtraction for speech enhancement. Speech Commun. 2002;36:205–14.
Lu Y, Loizou PC. A geometric approach to spectral subtraction. Speech Commun. 2008;50:453–514.
Cadore J, Valverde-Albacete FJ, Gallardo-Antolín A, Peláez-Moreno C. Auditory-inspired morphological processing of speech spectrograms: applications in automatic speech recognition and speech enhancement. Cognit Comput. 2013;5:426–516.
Hu Y, Loizou PC. Speech enhancement based on wavelet thresholding the multitaper spectrum. IEEE Trans Speech Audio Process. 2004;12:59–69.
Ding GH, Huang T, Xu B. Suppression of additive noise using a power spectral density MMSE estimator. IEEE Trans Signal Process Lett. 2004;11:585–604.
Cohen I. Speech enhancement using a noncausal a priori SNR estimator. IEEE Trans Signal Process Lett. 2004;11:725–34.
Lee KY, Jung S. Time-domain approach using multiple Kalman filters and EM algorithm to speech enhancement with nonstationary noise. IEEE Trans Speech Audio Process. 2000;8:282–310.
Zavarehei E, Vaseghi S. Speech enhancement in temporal DFT trajectories using Kalman filters. In: Interspeech, Lisbon; 2005.
Huag F, Lee T, Kleijn WB. Transform-domain wiener filter for speech periodicity. In: IEEE International Conference Acoustic Speech Signal Processing (ICASSP); 2012. p. 4577–84.
Hu Y, Loizou PC. A subspace approach for enhancing speech corrupted by colored noise. IEEE Signal Process Lett. 2002;9:204–13.
Hardwick J, Yoo CD, Lim JS. Speech enhancement using the dual excitation model. In: Proceedings of IEEE International Conferrence Acoustic Speech Signal Processing (ICASSP); 1993; 367–74.
Dubost S, Cappe O. Enhancement of speech based on non-parametric estimation of a time varying harmonic representation. In: Proceedings of IEEE International Conferrence Acoustic Speech Signal Processing (ICASSP); 2000. p. 1859–64.
Deisher ME, Spanias AS. HMM-based speech enhancement using harmonic modeling. In: Proceedings of IEEE International Conferrence Acoustic Speech Signal Processing (ICASSP); 1997; 1175–84.
Jensen J, Hansen JHL. Speech enhancement using a constrained iterative sinusoidal model. IEEE Trans Speech Audio Process. 2001;9:731–810.
Squartini S, Schuller B, Hussain A. Cognitive and emotional information processing for human–machine interaction. Cognit Comput. 2012;4(4):383–93.
Espinosa-Duro V, Faundez-Zanuy M, Mekyska J. Beyond cognitive signals. Cognit Comput. 2011;3(2):374–8.
Esposito A. The perceptual and cognitive role of visual and auditory channels in conveying emotional information. Cognit Comput. 2009;1(3):268–311.
Abel A, Hussain A. Novel two-stage audiovisual speech filtering in noisy environments. Cognit Comput. 2014;6:200–18.
Abel A, Hussain A. Cognitively inspired audiovisual speech filtering: towards an intelligent, fuzzy based, multimodal, two-stage speech enhancement system. Springer Briefs in Cognitive Computation, Springer International Publishing; 2015.
Rotili R, Principi E, Squartini S, Schuller B. A Real-time speech enhancement framework in noisy and reverberated acoustic scenarios. Cognit Comput. 2013;5:504–13.
Narayanan A, Wang DL. Ideal ratio mask estimation using deep neural networks for robust speech recognition. In: Proceedings of ICASSP; 2013. pp. 1520–6149.
Xu Y, Du J, Dai L, Lee C. An experimental study on speech enhancement based on deep neural networks. IEEE Signal Process Lett. 2014;21:65–74.
Cho E, Smith JO, Widrow B. Exploiting the harmonic structure for speech enhancement. In: Proceedings of IEEE International Conferrence Acoustic Speech Signal Processing (ICASSP); 2012.
George E, Smith M. Speech analysis/synthesis and modification using an analysis-by-synthesis/overlap-add sinusoidal model. IEEE Trans Speech Audio Process. 1997;5:389–418.
Nehorai A, Porat B. Adaptive comb filtering for harmonic signal enhancement. IEEE Trans Acoust Speech Signal Process. 1986;34:1124–215.
Chen JH, Gersho A. Adaptive postfiltering for quality enhancement of coded speech. IEEE Trans Speech Audio Process. 1995;3:59–113.
Grancharov V, Plasberg JH, Samuelsson J, Kleijn WB. Generalized postfilter for speech quality enhancement. IEEE Trans Audio Speech Lang Process. 2008;16:57–8.
Jin W, Liu X, Scordilis MS. Speech enhancement using harmonic emphasis and comb filtering. IEEE Trans Audio Speech Lang Process. 2010;18:356–413.
Ahmadi S, Spanias A. Cepstrum-based pitch detection using a new statistical V/UV classification algorithm. IEEE Trans Speech Audio Process. 1999;7:333–6.
Fisher E, Tabrikian J, Dubnov S. Generalized likelihood ratio test for voiced–unvoiced decision in noisy speech using the harmonic model. IEEE Trans Audio Speech Lang Process. 2006;14:502–9.
Nakatani T, Amano S, Irino T, Ishizuka K, Kondo T. A method for fundamental frequency estimation and voicing decision: application to infant utterances recorded in real acoustical environments. Speech Commun. 2008;50:203–12.
Talkin D. A robust algorithm for pitch tracking (RAPT). In: Talkin D, editor. Speech coding and synthesis. Amsterdam: Elsevier; 1995. p. 495–518.
de Cheveigné A, Kawahara H. YIN, a fundamental frequency estimator for speech and music. J Acoust Soc Am. 2002;111:1917–2014.
Beritelli F, Casale S, Russo S, Serrano S. Adaptive V/UV speech detection based on characterization of background noise. EURASIP J Audio Speech Music Process. 2009;. doi:10.1155/2009/965436.
Ben Messaoud MA, Bouzid A, Ellouze, N. Estimation du pitch et décision de voisement par compression spectrale de l’autocorrélation du produit multi-échelle. In: Proceedings of Journée d’Etude de la parole (JEP-TALN-RECITAL 2012); 2012; pp. 201–8.
Bouzid A, Ellouze N. Electroglottographic measures based on GCI and GOI detection using multiscale product. Int J Comput Commun Control. 2008;3:21–32.
Ben Messaoud MA, Bouzid A, Ellouze N. Using multi-scale product spectrum for single and multi-pitch estimation. IET Signal Process J. 2011;5:344–412.
Xu Y, Weaver JB, Healy DM, Lu J. Wavelet transform domain filters: a spatially selective noise filtration technique. IEEE Trans Image Process. 1994;3:747–812.
Sadler BM, Swami A. Analysis of multi-scale products for step detection and estimation. IEEE Trans Inf Theory. 1999;45:1043–9.
Mallat S. A wavelet tour of signal processing. 3rd ed. San Diego: Academic Press; 2008.
Touzi A, Ben Messaoud MA. New approach for conception and implementation of object oriented expert system using UML. Int Arab J Inf Technol. 2009;6:99–108.
Ben Messaoud MA, Bouzid A, Ellouze N. An efficient method for fundamental frequency determination of noisy speech. In: Drugman T, Dutoit T, editors. LNCS 7911. Springer: Berlin; 2013. p. 33–41.
Hu Y, Loizou PC. Subjective comparison and evaluation of speech enhancement algorithms. Speech Commun. 2007;49:588–614.
ITU-T P.862. Perceptual evaluation of speech quality (PESQ), and objective method for end-to-end speech quality assessment of narrowband telephone networks and speech codecs. In: ITU-T Recommendation; 2000; p. 862.
Camacho A, Harris JG. A sawtooth waveform inspired pitch estimator for speech and music. J Acoust Soc Am. 2008;124:1638–715.
Ben Messaoud MA, Bouzid A, Ellouze N. Autocorrelation of the speech multi-scale product for voicing decision and pitch estimation. Cognit Comput. 2010;2:151–9.
Loizou PC. Speech enhancement: theory and practice. Dallas: CRC Press; 2007.
Acknowledgments
The authors would like to thank Dr. A. Abel for his help throughout the revision of paper by her thesis.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of Interest
M. A. Ben Messaoud, A. Bouzid and N. Ellouze declare that they have no conflict of interest.
Informed Consent
All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2008 (5). Additional informed consent was obtained from all patients for which identifying information is included in this article.
Human and Animal Rights
This article does not contain any studies with human or animal subjects performed by the any of the authors.
Rights and permissions
About this article
Cite this article
Ben Messaoud, M.A., Bouzid, A. & Ellouze, N. A New Biologically Inspired Fuzzy Expert System-Based Voiced/Unvoiced Decision Algorithm for Speech Enhancement. Cogn Comput 8, 478–493 (2016). https://doi.org/10.1007/s12559-015-9376-2
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s12559-015-9376-2