Dash et al., 2024 - Google Patents

A TRANSFORMER APPROACH TO BILINGUAL AUTOMATED SPEECH RECOGNITION USING CODE-SWITCHED SPEECH

Document ID: 13992071895485161965
Author: Dash P; Babu S; Singaravel L; Balasubramanian D
Publication year: 2024
Publication venue: Obstetrics and Gynaecology Forum

Snippet

In a bilingual and linguistically diverse country like India, where a significant portion of the population is fluent in multiple languages, the conventional bilingual Transformer neural network architecture faces challenges in accurately translating conversations that …

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/19—Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
- G10L15/197—Probabilistic grammars, e.g. word n-grams
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G06F17/2827—Example based machine translation; Alignment
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2872—Rule based translation
- G06F17/2881—Natural language generation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/277—Lexical analysis, e.g. tokenisation, collocates
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/289—Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using predictive techniques

Similar Documents

Publication	Publication Date	Title
Kheddar et al.	2023	Deep transfer learning for automatic speech recognition: Towards better generalization
CN109887484B (en)	2023-08-04	Dual learning-based voice recognition and voice synthesis method and device
US20220230628A1 (en)	2022-07-21	Generation of optimized spoken language understanding model through joint training with integrated knowledge-language module
US20090248394A1 (en)	2009-10-01	Machine translation in continuous space
US11798529B2 (en)	2023-10-24	Generation of optimized knowledge-based language model through knowledge graph multi-alignment
Păiş et al.	2022	Capitalization and punctuation restoration: a survey
Kumar et al.	2022	A comprehensive review of recent automatic speech summarization and keyword identification techniques
Sreeram et al.	2020	Exploration of end-to-end framework for code-switching speech recognition task: Challenges and enhancements
Shao et al.	2023	Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition
CN115547293A (en)	2022-12-30	Multi-language voice synthesis method and system based on layered prosody prediction
US20220230629A1 (en)	2022-07-21	Generation of optimized spoken language understanding model through joint training with integrated acoustic knowledge-speech module
Singh et al.	2024	MECOS: A bilingual Manipuri–English spontaneous code-switching speech corpus for automatic speech recognition
Singh et al.	2023	An integrated model for text to text, image to text and audio to text linguistic conversion using machine learning approach
WO2022159198A1 (en)	2022-07-28	Generation of optimized knowledge-based language model through knowledge graph multi-alignment
WO2022159211A1 (en)	2022-07-28	Generation of optimized spoken language understanding model through joint training with integrated knowledge-language module
Yolchuyeva	2021	Novel NLP Methods for Improved Text-To-Speech Synthesis
Safonova et al.	2022	Automatic speech recognition of low-resource languages based on Chukchi
Bogdanoski et al.	2023	Exploring ASR Models in Low-Resource Languages: Use-Case the Macedonian Language
Monesh Kumar et al.	2024	A New Robust Deep Learning‐Based Automatic Speech Recognition and Machine Transition Model for Tamil and Gujarati
Bekarystankyzy et al.	2024	Integrated End-to-End automatic speech recognition for languages for agglutinative languages
Rahmati et al.	2024	GE2PE: Persian End-to-End Grapheme-to-Phoneme Conversion
CN117524193B (en)	2024-03-29	Training method, device, equipment and medium for Chinese-English mixed speech recognition system
Gong et al.	2021	A Review of End-to-End Chinese–Mandarin Speech Synthesis Techniques
Lamichhane et al.	2021	English Speech Recognition Using Convolution Neural Network, Gated Recurrent Unit and Connectionist Temporal Classification