Multitaper chirp group delay Hilbert envelope coefficients for robust speaker verification

Ahmed Krobba¹,
Mohamed Debyeche¹ &
Sid-Ahmed Selouani²

Abstract

In this paper, we propose two new feature extraction methods for robust automatic speaker verification under noisy conditions. The first method, called Multi-taper Gammatone Hilbert Envelope Coefficients (MGHECs), employs multi-taper magnitude spectra, which offer considerable advantages for spectrum estimation. The second method, called Multi-taper Chirp Group Delay Zeros-Phase Hilbert Envelope Coefficients (MCGDZPHECs), is based on multi-taper phase spectra. The chirp group delay technique is used to estimate the vocal tract from the phase of the chirp Fourier transform. The proposed methods and their extended variants are evaluated on the NIST 2008 corpus under noisy conditions, using noise drawn from NOISEX-92 at various SNR levels. Experimental results show that the proposed methods provide a better representation of the speech spectrum. Moreover, we obtain a significant improvement in performance under noisy conditions compared with conventional Mean Hilbert Envelope Coefficients (MHECs) feature extraction.
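The two spectral building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state which taper family is used, so orthonormal sine tapers with uniform weights are assumed here for simplicity, and the function names, the radius `rho`, and the FFT size `nfft` are hypothetical choices. The Gammatone filterbank, zero-phase, and Hilbert envelope stages are not shown.

```python
import numpy as np

def sine_tapers(frame_len, num_tapers):
    """Orthonormal sine tapers, shape (num_tapers, frame_len). Assumed taper family."""
    n = np.arange(1, frame_len + 1)
    k = np.arange(1, num_tapers + 1)[:, None]
    return np.sqrt(2.0 / (frame_len + 1)) * np.sin(np.pi * k * n / (frame_len + 1))

def multitaper_power_spectrum(frame, num_tapers=4, nfft=512):
    """Average the periodograms of the frame under each taper (uniform weights)."""
    tapers = sine_tapers(len(frame), num_tapers)
    spectra = np.fft.rfft(tapers * frame, n=nfft, axis=1)
    return np.mean(np.abs(spectra) ** 2, axis=0)

def chirp_group_delay(frame, rho=1.05, nfft=512):
    """Group delay of the z-transform evaluated on the circle |z| = rho.

    Evaluating X(z) at z = rho * exp(j*w) equals the DFT of x[n] * rho**(-n).
    Group delay is then Re(Y / X), where Y is the DFT of n * x[n] * rho**(-n).
    """
    n = np.arange(len(frame), dtype=float)
    scaled = frame * rho ** (-n)
    X = np.fft.rfft(scaled, n=nfft)
    Y = np.fft.rfft(n * scaled, n=nfft)
    # Small floor on |X|^2 avoids division by zero at spectral nulls.
    return (X.real * Y.real + X.imag * Y.imag) / np.maximum(np.abs(X) ** 2, 1e-12)

# Usage: a 1 kHz tone sampled at 8 kHz, and a unit impulse delayed by 5 samples.
fs = 8000
tone = np.cos(2 * np.pi * 1000 * np.arange(256) / fs)
power = multitaper_power_spectrum(tone)
freqs = np.fft.rfftfreq(512, d=1.0 / fs)

impulse = np.zeros(64)
impulse[5] = 1.0
delay = chirp_group_delay(impulse)
```

Averaging over several orthogonal tapers trades some frequency resolution for lower variance than a single-window periodogram. Moving the analysis circle off the unit circle with `rho` is what distinguishes chirp group delay from ordinary group delay.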

Author information

Authors and Affiliations

Speech Communication and Signal Processing Laboratory, University of USTHB, Algiers, Algeria
Ahmed Krobba & Mohamed Debyeche
LARIHS Laboratory, Campus Shippagan, University of Moncton, Moncton, Canada
Sid-Ahmed Selouani

Corresponding author

Correspondence to Ahmed Krobba.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Krobba, A., Debyeche, M. & Selouani, SA. Multitaper chirp group delay Hilbert envelope coefficients for robust speaker verification. Multimed Tools Appl 78, 19525–19542 (2019). https://doi.org/10.1007/s11042-019-7154-y

Received: 03 May 2018
Revised: 26 December 2018
Accepted: 02 January 2019
Published: 14 February 2019
Issue Date: 30 July 2019
DOI: https://doi.org/10.1007/s11042-019-7154-y
