
Robust Speech Rate Estimation for Spontaneous Speech

Published: 01 November 2007

Abstract

In this paper, we propose a direct method for estimating speech rate from acoustic features, without requiring any automatic speech transcription. We compare various spectral and temporal signal analysis and smoothing strategies to better characterize the underlying syllable structure from which speech rate is derived. The proposed algorithm extends spectral subband correlation methods by including temporal correlation and by using prominent spectral subbands to improve the signal correlation essential for syllable detection. Furthermore, to address some of the practical robustness issues in previously proposed methods, we introduce several novel components into the algorithm: pitch confidence for filtering spurious syllable envelope peaks, a magnifying window for tackling neighboring syllable smearing, and relative peak measure thresholds for pseudo-peak rejection. We also describe an automated approach for learning algorithm parameters from data, and find the optimal settings through Monte Carlo simulations and parameter sensitivity analysis. Final experimental evaluations are conducted on a portion of the Switchboard corpus for which both manual phonetic segmentation and published results for direct comparison are available. The results show a correlation coefficient of 0.745 with respect to the ground truth based on manual segmentation. This is about a 17% improvement over the current best single estimator and an 11% improvement over the multiestimator evaluated on the same Switchboard database.
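
To make the pipeline described in the abstract more concrete, the following is a minimal sketch of a syllable-peak counter built from per-frame subband energies. It is not the authors' algorithm: all function names, parameters, and default values are hypothetical, a plain moving average stands in for the temporal correlation step, the magnifying window is omitted, and the pitch confidence values are assumed to come from some external pitch tracker.

```python
def subband_correlation(frame, top_k=None):
    """Sum of pairwise products of subband energies for one frame.

    If top_k is given, only the top_k most energetic ("prominent")
    subbands take part in the correlation.
    """
    bands = sorted(frame, reverse=True)[:top_k]
    total = 0.0
    for i in range(len(bands)):
        for j in range(i + 1, len(bands)):
            total += bands[i] * bands[j]
    return total


def smooth(values, half_width):
    """Centered moving average (a stand-in for temporal correlation)."""
    out = []
    for t in range(len(values)):
        lo = max(0, t - half_width)
        hi = min(len(values), t + half_width + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def pick_peaks(envelope, rel_threshold=0.1):
    """Local maxima that rise above the preceding valley by at least
    rel_threshold times the global maximum of the envelope."""
    if not envelope:
        return []
    floor = rel_threshold * max(envelope)
    peaks = []
    valley = envelope[0]
    for t in range(1, len(envelope) - 1):
        valley = min(valley, envelope[t])
        is_local_max = envelope[t - 1] < envelope[t] >= envelope[t + 1]
        if is_local_max and envelope[t] - valley >= floor:
            peaks.append(t)
            valley = envelope[t]  # restart the valley search after a peak
    return peaks


def speech_rate(frames, frame_rate_hz, top_k=3, half_width=2,
                rel_threshold=0.1, pitch_conf=None, min_conf=0.5):
    """Estimated syllables per second.

    frames: list of per-frame subband energy lists (assumed input format).
    pitch_conf: optional per-frame pitch confidence in [0, 1] (assumed to
    be supplied by an external pitch tracker); peaks on frames below
    min_conf are discarded as spurious.
    """
    if not frames:
        return 0.0
    raw = [subband_correlation(f, top_k) for f in frames]
    envelope = smooth(raw, half_width)
    peaks = pick_peaks(envelope, rel_threshold)
    if pitch_conf is not None:
        peaks = [t for t in peaks if pitch_conf[t] >= min_conf]
    return len(peaks) / (len(frames) / frame_rate_hz)
```

Calling `speech_rate(frames, 100.0)` on two seconds of 100 frames-per-second subband energies returns the number of accepted envelope peaks divided by 2.0. Raising `rel_threshold` rejects more low-amplitude pseudo peaks, at the cost of missing weak syllables, which is the kind of trade-off that parameter learning from data would have to settle.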



    Published In

    IEEE Transactions on Audio, Speech, and Language Processing  Volume 15, Issue 8
    November 2007
    389 pages

    Publisher

    IEEE Press


    Author Tags

    1. Rich speech transcription
    2. speech prosody
    3. speech rate estimation
    4. spontaneous speech processing

    Qualifiers

    • Research-article


    Cited By

    • (2024) Speed-Aware Audio-Driven Speech Animation using Adaptive Windows. ACM Transactions on Graphics 44:1 (1-14). DOI: 10.1145/3691341. Online publication date: 31-Aug-2024
    • (2024) A Comprehensive Analysis of Speech Recognition Systems in Healthcare: Current Research Challenges and Future Prospects. SN Computer Science 5:1. DOI: 10.1007/s42979-023-02466-w. Online publication date: 3-Jan-2024
    • (2021) A robust speech rate estimation based on the activation profile from the selected acoustic unit dictionary. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (5400-5404). DOI: 10.1109/ICASSP.2016.7472709. Online publication date: 11-Mar-2021
    • (2021) Supervised and unsupervised active learning for automatic speech recognition of low-resource languages. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (5320-5324). DOI: 10.1109/ICASSP.2016.7472693. Online publication date: 11-Mar-2021
    • (2021) A Robust Speaking Rate Estimator Using a CNN-BLSTM Network. Circuits, Systems, and Signal Processing 40:12 (6098-6120). DOI: 10.1007/s00034-021-01754-1. Online publication date: 1-Dec-2021
    • (2019) Quantle. Proceedings of the 18th International Conference on Information Processing in Sensor Networks (253-264). DOI: 10.1145/3302506.3310405. Online publication date: 16-Apr-2019
    • (2018) Sequential use of spectral models to reduce deletion and insertion errors in vowel detection. Computer Speech and Language 50:C (105-125). DOI: 10.1016/j.csl.2017.12.008. Online publication date: 1-Jul-2018
    • (2017) Sonority Measurement Using System, Source, and Suprasegmental Information. IEEE/ACM Transactions on Audio, Speech and Language Processing 25:3 (505-518). DOI: 10.1109/TASLP.2016.2641901. Online publication date: 1-Mar-2017
    • (2017) Automatic detection of syllable stress using sonority based prominence features for pronunciation evaluation. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (5845-5849). DOI: 10.1109/ICASSP.2017.7953277. Online publication date: 5-Mar-2017
    • (2017) Using speech technology for quantifying behavioral characteristics in peer-led team learning sessions. Computer Speech and Language 46:C (343-366). DOI: 10.1016/j.csl.2017.04.002. Online publication date: 1-Nov-2017