Lu et al., 2013 - Google Patents

Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis

Document ID
17506550162325821482
Author
Lu H
King S
Watts O
Publication year
2013
Publication venue
8th ISCA Speech Synthesis Workshop

Snippet

Conventional statistical parametric speech synthesis relies on decision trees to cluster together similar contexts, resulting in tied-parameter context-dependent hidden Markov models (HMMs). However, decision tree clustering has a major weakness: it uses hard …
Continue reading at www.research.ed.ac.uk (PDF) (other versions)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
    • G10L15/142 Hidden Markov Models [HMMs]
    • G10L15/144 Training of HMMs
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 Adaptation
    • G10L15/07 Adaptation to the speaker
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using predictive techniques
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/06 Elementary speech units used in speech synthesisers; Concatenation rules
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data

Similar Documents

Publication Publication Date Title
Lu et al. Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis
US11361751B2 (en) Speech synthesis method and device
Zen et al. Statistical parametric speech synthesis using deep neural networks
Sun et al. Voice conversion using deep bidirectional long short-term memory based recurrent neural networks
Kirchhoff et al. Combining acoustic and articulatory feature information for robust speech recognition
Wu et al. Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis
Athanaselis et al. ASR for emotional speech: clarifying the issues and enhancing performance
Hashimoto et al. The effect of neural networks in statistical parametric speech synthesis
CN106935239A (en) Method and device for constructing a pronunciation dictionary
Chengalvarayan et al. Speech trajectory discrimination using the minimum classification error learning
Hashimoto et al. Trajectory training considering global variance for speech synthesis based on neural networks
EP1647970B1 (en) Hidden conditional random field models for phonetic classification and speech recognition
Rosenberg et al. Modeling phrasing and prominence using deep recurrent learning.
Shahin et al. Talking condition recognition in stressful and emotional talking environments based on CSPHMM2s
Ng et al. Spoken language recognition with prosodic features
Du et al. Phone-level prosody modelling with GMM-based MDN for diverse and controllable speech synthesis
CN110415725A (en) Method and system for assessing second-language pronunciation quality using first-language data
Lai et al. Phone-aware LSTM-RNN for voice conversion
Kuzdeuov et al. Speech command recognition: Text-to-speech and speech corpus scraping are all you need
Zangar et al. Duration modelling and evaluation for Arabic statistical parametric speech synthesis
Ribeiro et al. Learning word vector representations based on acoustic counts
Saba et al. Urdu text-to-speech conversion using deep learning
Barman et al. State of the art review of speech recognition using genetic algorithm
Pour et al. Persian automatic speech recognition by the use of whisper model
Vazirnezhad et al. Hybrid statistical pronunciation models designed to be trained by a medium-size corpus