Parcollet et al., 2024

LeBenchmark 2.0: A standardized, replicable and enhanced framework for self-supervised representations of French speech

Document ID: 10702343771019640492
Author: Parcollet T; Nguyen H; Evain S; Boito M; Pupier A; Mdhaffar S; Le H; Alisamir S; Tomashenko N; Dinarelli M; Zhang S; Allauzen A; Coavoux M; Estève Y; Rouvier M; Goulian J; Lecouteux B; Portet F; Rossato S; Ringeval F; Schwab D; Besacier L
Publication year: 2024
Publication venue: Computer Speech & Language

Snippet

Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains including computer vision and natural language processing. Speech processing drastically benefitted from SSL as most of the current domain-related tasks are …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G06F17/30675—Query execution
- G06F17/30684—Query execution using natural language analysis
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/277—Lexical analysis, e.g. tokenisation, collocates
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2785—Semantic analysis
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2872—Rule based translation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/289—Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass

Similar Documents

Publication	Publication Date	Title
Lakhotia et al.	2021	On generative spoken language modeling from raw audio
Liao et al.	2023	Improving readability for automatic speech recognition transcription
Evain et al.	2021	Task agnostic and task specific self-supervised learning from speech with LeBenchmark
Parcollet et al.	2024	LeBenchmark 2.0: A standardized, replicable and enhanced framework for self-supervised representations of French speech
Bellegarda et al.	2016	State of the art in statistical methods for language and speech processing
Long et al.	2020	Acoustic data augmentation for Mandarin-English code-switching speech recognition
Zheng et al.	2018	BLSTM-CRF Based End-to-End Prosodic Boundary Prediction with Context Sensitive Embeddings in a Text-to-Speech Front-End
Belinkov	2018	On internal language representations in deep learning: An analysis of machine translation and speech recognition
Dunbar et al.	2022	Self-supervised language learning from raw audio: Lessons from the zero resource speech challenge
Nasr et al.	2023	End-to-end speech recognition for Arabic dialects
Algayres et al.	2022	DP-Parse: Finding word boundaries from raw speech with an instance lexicon
Das et al.	2024	Speechverse: A large-scale generalizable audio language model
Victor et al.	2019	Application of extractive text summarization algorithms to speech-to-text media
Singh et al.	2024	MECOS: A bilingual Manipuri–English spontaneous code-switching speech corpus for automatic speech recognition
Kazakova et al.	2022	Analysis of natural language processing technology: Modern problems and approaches
Suni et al.	2014	The Simple4All entry to the Blizzard Challenge 2014
Anidjar et al.	2023	Speech and multilingual natural language framework for speaker change detection and diarization
NithyaKalyani et al.	2019	Speech summarization for Tamil language
Pelloin et al.	2022	ASR-generated text for language model pre-training applied to speech tasks
Liu et al.	2023	Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech
Yu et al.	2016	Abstractive headline generation for spoken content by attentive recurrent neural networks with ASR error modeling
Safonova et al.	2022	Automatic speech recognition of low-resource languages based on Chukchi
Woldemariam et al.	2020	Adapting language specific components of cross-media analysis frameworks to less-resourced languages: the case of Amharic
Yolchuyeva	2021	Novel NLP Methods for Improved Text-To-Speech Synthesis
Krishna et al.	2023	Representation learning with hidden unit clustering for low resource speech applications