research-article

A Pitch and Noise Robust Keyword Spotting System Using SMAC Features with Prosody Modification

Published: 01 April 2021

Abstract

A keyword spotting (KWS) system detects keywords in a continuous speech signal with the aid of a computer. A variety of strategies have been suggested in the literature to detect keywords in adults’ speech effectively. However, only a limited number of studies have been reported for KWS in children’s speech. Owing to differences in physiological properties, the pitch and speaking rate of children differ from those of adults. Consequently, KWS model parameters trained on adults’ speech data yield poor performance on children’s speech. In this paper, we develop a KWS system for spotting keywords in children’s speech using models trained on adults’ speech. The proposed approach uses the spectral moment time–frequency distribution augmented by low-order cepstral coefficients (SMAC) as the front-end feature. The mismatches due to differences in pitch and speaking rate between child and adult speakers are further mitigated by data-augmented training using explicit pitch and speaking-rate modifications. The experimental findings presented in this paper show that the SMAC feature yields significantly better performance than conventional Mel-frequency cepstral coefficients under both clean and noisy test conditions.
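As a rough illustration of the quantity that spectral moment features build on, the sketch below computes the first-order spectral moment (the power-weighted mean frequency) of a single frame. It is not the paper's SMAC front end: the filterbank analysis, the time–frequency distribution, and the cepstral augmentation are not reproduced here. The function names, frame length, sampling rate, and test tones are all assumptions chosen for the example.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def first_spectral_moment(frame, sample_rate):
    """Power-weighted mean frequency of one frame, in Hz (0.0 for silence)."""
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0.0:
        return 0.0
    n = len(frame)
    # Bin k of an N-point DFT corresponds to k * sample_rate / N Hz.
    return sum((k * sample_rate / n) * p for k, p in enumerate(power)) / total


# Hypothetical test tones: the second has twice the frequency of the first.
sample_rate = 8000
frame_len = 64
low = [math.sin(2 * math.pi * 250 * t / sample_rate) for t in range(frame_len)]
high = [math.sin(2 * math.pi * 500 * t / sample_rate) for t in range(frame_len)]

print(first_spectral_moment(low, sample_rate))
print(first_spectral_moment(high, sample_rate))
```

Both tones fall exactly on DFT bins (125 Hz spacing), so the moment of each frame sits at the tone's frequency. Real speech frames would normally be windowed and analysed per sub-band before any moment is taken.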




        Published In

        Circuits, Systems, and Signal Processing  Volume 40, Issue 4
        Apr 2021
        522 pages

        Publisher

        Birkhauser Boston Inc.

        United States

        Publication History

        Published: 01 April 2021
        Accepted: 06 October 2020
        Revision received: 01 October 2020
        Received: 21 December 2019

        Author Tags

        1. Keyword spotting
        2. Children’s speech
        3. SMAC feature
        4. Pitch modification
        5. Duration modification

        Qualifiers

        • Research-article

        Cited By

        • (2023) Noise robust automatic speech recognition: review and analysis. International Journal of Speech Technology 26:2 (475-519). DOI: 10.1007/s10772-023-10033-0. Online publication date: 1-Jul-2023
        • (2023) Addressing Effects of Formant Dispersion and Pitch Sensitivity for the Development of Children’s KWS System. Speech and Computer (520-534). DOI: 10.1007/978-3-031-48309-7_42. Online publication date: 29-Nov-2023
        • (2022) Analysis of Short-Time Magnitude Spectra for Improving Intelligibility Assessment of Dysarthric Speech. Circuits, Systems, and Signal Processing 41:10 (5676-5698). DOI: 10.1007/s00034-022-02047-x. Online publication date: 1-Oct-2022
        • (2022) Data-Adaptive Single-Pole Filtering of Magnitude Spectra for Robust Keyword Spotting. Circuits, Systems, and Signal Processing 41:5 (3023-3039). DOI: 10.1007/s00034-021-01923-2. Online publication date: 1-May-2022
        • (2021) Training augmentation with TANDEM acoustic modelling in Punjabi adult speech recognition system. International Journal of Speech Technology 24:2 (473-481). DOI: 10.1007/s10772-021-09797-0. Online publication date: 1-Jun-2021
