Variani et al., 2020 - Google Patents
Neural oracle search on n-best hypotheses
- Document ID
- 17178186380429881932
- Author
- Variani E
- Chen T
- Apfel J
- Ramabhadran B
- Lee S
- Moreno P
- Publication year
- 2020
- Publication venue
- ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Snippet
In this paper, we propose a neural search algorithm to select the most likely hypothesis using a sequence of acoustic representations and multiple hypotheses as input. The algorithm provides a sequence level score for each audio-hypothesis pair that is obtained by …
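The snippet describes assigning a sequence-level score to each (audio, hypothesis) pair and selecting the top-scoring hypothesis from the n-best list. A minimal sketch of that selection loop, assuming a toy setup (one-hot token embeddings, mean-pooled dot-product scoring) that stands in for the paper's trained neural scorer:

```python
import numpy as np

def score_pair(audio_frames, hyp_ids, token_emb):
    """Toy sequence-level score for one (audio, hypothesis) pair:
    dot product of mean-pooled acoustic frames and mean-pooled
    hypothesis token embeddings. The paper learns this score with
    a neural network; the pooling here is only a stand-in."""
    return float(audio_frames.mean(axis=0) @ token_emb[hyp_ids].mean(axis=0))

def select_best(audio_frames, nbest, token_emb):
    """Score every hypothesis in the n-best list against the same
    audio and return (index of the best hypothesis, all scores)."""
    scores = [score_pair(audio_frames, h, token_emb) for h in nbest]
    return int(np.argmax(scores)), scores

# Hypothetical data: a 10-token vocabulary with one-hot embeddings,
# and "audio" frames that are noisy copies of the embeddings of
# tokens 1, 2, 3 -- so the matching hypothesis should win.
rng = np.random.default_rng(0)
token_emb = np.eye(10)
audio = token_emb[[1, 2, 3]] + 0.01 * rng.normal(size=(3, 10))
nbest = [[4, 5, 6], [1, 2, 3], [7, 8, 9]]
best, scores = select_best(audio, nbest, token_emb)
print(best)  # prints 1: hypothesis [1, 2, 3] matches the audio
```

The point of the pairwise formulation is that every hypothesis is scored against the same acoustic evidence, so the selection can recover a better candidate than the decoder's original 1-best.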
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/20—Handling natural language data
          - G06F17/27—Automatic analysis, e.g. parsing
            - G06F17/2705—Parsing
          - G06F17/28—Processing or translating of natural language
            - G06F17/2809—Data driven translation
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L13/00—Speech synthesis; Text to speech systems
      - G10L15/00—Speech recognition
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
          - G10L15/065—Adaptation
            - G10L15/07—Adaptation to the speaker
        - G10L15/08—Speech classification or search
          - G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
            - G10L15/142—Hidden Markov Models [HMMs]
              - G10L15/144—Training of HMMs
          - G10L15/18—Speech classification or search using natural language modelling
            - G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
              - G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
            - G10L15/1822—Parsing for meaning understanding
        - G10L15/28—Constructional details of speech recognition systems
      - G10L17/00—Speaker identification or verification
      - G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
        - G10L19/04—Speech or audio signal analysis-synthesis techniques using predictive techniques
          - G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
Similar Documents
Publication | Title
---|---
Variani et al. | Hybrid autoregressive transducer (HAT)
US12254865B2 (en) | Multi-dialect and multilingual speech recognition
JP7070894B2 (en) | Time series information learning system, method and neural network model
Variani et al. | Neural oracle search on n-best hypotheses
US20220223066A1 (en) | Method, device, and computer program product for English pronunciation assessment
EP0635820A1 (en) | Minimum error rate training of combined string models
CN110459208B (en) | Knowledge migration-based sequence-to-sequence speech recognition model training method
Nagy et al. | Automatic punctuation restoration with BERT models
Masumura et al. | Large context end-to-end automatic speech recognition via extension of hierarchical recurrent encoder-decoder models
Matsoukas et al. | Advances in transcription of broadcast news and conversational telephone speech within the combined EARS BBN/LIMSI system
JPH07506198A (en) | Composite expert
Bluche et al. | Predicting detection filters for small footprint open-vocabulary keyword spotting
Chuangsuwanich | Multilingual techniques for low resource automatic speech recognition
US20250149032A1 (en) | End-to-end automatic speech recognition system for both conversational and command-and-control speech
CN115204143A (en) | Method and system for calculating text similarity based on prompt
Alsayadi et al. | Dialectal Arabic speech recognition using CNN-LSTM based on end-to-end deep learning
Joshi et al. | Multiple softmax architecture for streaming multilingual end-to-end ASR systems
Collobert et al. | Word-level speech recognition with a letter to word encoder
Bai et al. | Integrating knowledge into end-to-end speech recognition from external text-only data
Moriya et al. | Improving scheduled sampling for neural transducer-based ASR
Fukuda et al. | Global RNN transducer models for multi-dialect speech recognition
US20250104717A9 (en) | End-to-end speech recognition adapted for multi-speaker applications
Fosler-Lussier et al. | Crandem systems: Conditional random field acoustic models for hidden Markov models
Wang et al. | Speech-and-text transformer: Exploiting unpaired text for end-to-end speech recognition
Deng et al. | History utterance embedding transformer LM for speech recognition