Variani et al., 2020 - Google Patents
Neural oracle search on n-best hypotheses
- Document ID
- 17178186380429881932
- Author
- Variani E
- Chen T
- Apfel J
- Ramabhadran B
- Lee S
- Moreno P
- Publication year
- 2020
- Publication venue
- ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Snippet
In this paper, we propose a neural search algorithm to select the most likely hypothesis using a sequence of acoustic representations and multiple hypotheses as input. The algorithm provides a sequence level score for each audio-hypothesis pair that is obtained by …
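The snippet describes assigning a sequence-level score to each (audio, hypothesis) pair and selecting the top-scoring hypothesis from the n-best list. A minimal sketch of that selection loop, assuming a toy setup (one-hot token embeddings, mean-pooled dot-product scoring) that stands in for the paper's trained neural scorer:

```python
import numpy as np

def score_pair(audio_frames, hyp_ids, token_emb):
    """Toy sequence-level score for one (audio, hypothesis) pair:
    dot product of mean-pooled acoustic frames and mean-pooled
    hypothesis token embeddings. The paper learns this score with
    a neural network; the pooling here is only a stand-in."""
    return float(audio_frames.mean(axis=0) @ token_emb[hyp_ids].mean(axis=0))

def select_best(audio_frames, nbest, token_emb):
    """Score every hypothesis in the n-best list against the same
    audio and return (index of the best hypothesis, all scores)."""
    scores = [score_pair(audio_frames, h, token_emb) for h in nbest]
    return int(np.argmax(scores)), scores

# Hypothetical data: a 10-token vocabulary with one-hot embeddings,
# and "audio" frames that are noisy copies of the embeddings of
# tokens 1, 2, 3 -- so the matching hypothesis should win.
rng = np.random.default_rng(0)
token_emb = np.eye(10)
audio = token_emb[[1, 2, 3]] + 0.01 * rng.normal(size=(3, 10))
nbest = [[4, 5, 6], [1, 2, 3], [7, 8, 9]]
best, scores = select_best(audio, nbest, token_emb)
print(best)  # prints 1: hypothesis [1, 2, 3] matches the audio
```

The point of the pairwise formulation is that every hypothesis is scored against the same acoustic evidence, so the selection can recover a better candidate than the decoder's original 1-best.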
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/20—Handling natural language data
          - G06F17/27—Automatic analysis, e.g. parsing
            - G06F17/2705—Parsing
          - G06F17/28—Processing or translating of natural language
            - G06F17/2809—Data driven translation
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L13/00—Speech synthesis; Text to speech systems
      - G10L15/00—Speech recognition
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
          - G10L15/065—Adaptation
            - G10L15/07—Adaptation to the speaker
        - G10L15/08—Speech classification or search
          - G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
            - G10L15/142—Hidden Markov Models [HMMs]
              - G10L15/144—Training of HMMs
          - G10L15/18—Speech classification or search using natural language modelling
            - G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
              - G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
            - G10L15/1822—Parsing for meaning understanding
        - G10L15/28—Constructional details of speech recognition systems
      - G10L17/00—Speaker identification or verification
      - G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
        - G10L19/04—Speech or audio signal analysis-synthesis techniques using predictive techniques
          - G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
Similar Documents
Publication | Title
---|---
Variani et al. | Hybrid autoregressive transducer (HAT)
US12254865B2 (en) | Multi-dialect and multilingual speech recognition
JP7070894B2 (en) | Time series information learning system, method and neural network model
Variani et al. | Neural oracle search on n-best hypotheses
US20220223066A1 (en) | Method, device, and computer program product for English pronunciation assessment
EP0635820A1 (en) | Minimum error rate training of combined string models
CN110459208B (en) | Knowledge migration-based sequence-to-sequence speech recognition model training method
Nagy et al. | Automatic punctuation restoration with BERT models
Masumura et al. | Large context end-to-end automatic speech recognition via extension of hierarchical recurrent encoder-decoder models
Matsoukas et al. | Advances in transcription of broadcast news and conversational telephone speech within the combined EARS BBN/LIMSI system
JPH07506198A (en) | Composite expert
Bluche et al. | Predicting detection filters for small footprint open-vocabulary keyword spotting
Chuangsuwanich | Multilingual techniques for low resource automatic speech recognition
US20250149032A1 (en) | End-to-end automatic speech recognition system for both conversational and command-and-control speech
CN115204143A (en) | Method and system for calculating text similarity based on prompt
Alsayadi et al. | Dialectal Arabic speech recognition using CNN-LSTM based on end-to-end deep learning
Joshi et al. | Multiple softmax architecture for streaming multilingual end-to-end ASR systems
Collobert et al. | Word-level speech recognition with a letter to word encoder
Bai et al. | Integrating knowledge into end-to-end speech recognition from external text-only data
Moriya et al. | Improving scheduled sampling for neural transducer-based ASR
Fukuda et al. | Global RNN transducer models for multi-dialect speech recognition
US20250104717A9 (en) | End-to-end speech recognition adapted for multi-speaker applications
Fosler-Lussier et al. | Crandem systems: Conditional random field acoustic models for hidden Markov models
Wang et al. | Speech-and-text transformer: Exploiting unpaired text for end-to-end speech recognition
Deng et al. | History utterance embedding transformer LM for speech recognition