Abstract
In this paper, we propose two new feature extraction methods for robust automatic speaker verification under noisy conditions. The first method, called Multi-taper Gammatone Hilbert Envelope Coefficients (MGHECs), employs multi-taper magnitude spectra, which offer considerable advantages for spectrum estimation. The second method, called Multi-taper Chirp Group Delay Zeros-Phase Hilbert Envelope Coefficients (MCGDZPHECs), is based on multi-taper phase spectra. The chirp group delay technique is used to estimate the vocal tract from the phase of the chirp Fourier transform. The proposed methods and their extended variants are evaluated on the NIST 2008 corpus under noisy conditions, using noise taken from NOISEX-92 at various SNR levels. Experimental results show that the proposed methods provide a better representation of the speech spectrum. Moreover, we obtained a significant improvement in performance under noisy conditions compared to conventional Mean Hilbert Envelope Coefficients (MHECs) feature extraction.
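As a rough illustration of the multi-taper magnitude spectra mentioned in the abstract, the following minimal sketch averages the power spectra of one speech frame under several orthogonal tapers. It is not the authors' implementation: the use of Slepian (DPSS) tapers with uniform weights, the function name `multitaper_power_spectrum`, and the parameter values (`n_tapers=6`, `nw=3.5`, `n_fft=512`) are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np
from scipy.signal.windows import dpss


def multitaper_power_spectrum(frame, n_tapers=6, nw=3.5, n_fft=512):
    """Uniformly weighted multitaper power spectrum of a single frame.

    Assumed setup (not from the paper): DPSS tapers, equal weights.
    """
    frame = np.asarray(frame, dtype=float)
    # One row per taper; each taper has unit L2 norm when Kmax is given.
    tapers = dpss(len(frame), nw, Kmax=n_tapers)
    # Taper the frame n_tapers times and take each one-sided power spectrum.
    spectra = np.abs(np.fft.rfft(tapers * frame, n=n_fft, axis=1)) ** 2
    # Averaging across tapers lowers the variance of the spectrum estimate.
    return spectra.mean(axis=0)


if __name__ == "__main__":
    fs = 8000                      # assumed telephone-band sampling rate
    t = np.arange(200) / fs        # one 25 ms frame
    frame = np.sin(2 * np.pi * 1000 * t)
    spectrum = multitaper_power_spectrum(frame)
    print(spectrum.shape, int(np.argmax(spectrum)))
```

Compared with a single Hamming-windowed periodogram, averaging over tapers trades some frequency resolution for a lower-variance estimate, which is the property the abstract refers to as an advantage for spectrum estimation.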
Cite this article
Krobba, A., Debyeche, M. & Selouani, SA. Multitaper chirp group delay Hilbert envelope coefficients for robust speaker verification. Multimed Tools Appl 78, 19525–19542 (2019). https://doi.org/10.1007/s11042-019-7154-y
DOI: https://doi.org/10.1007/s11042-019-7154-y