Parcollet et al., 2024 - Google Patents
LeBenchmark 2.0: A standardized, replicable and enhanced framework for self-supervised representations of French speechParcollet et al., 2024
View PDF- Document ID
- 10702343771019640492
- Author
- Parcollet T
- Nguyen H
- Evain S
- Boito M
- Pupier A
- Mdhaffar S
- Le H
- Alisamir S
- Tomashenko N
- Dinarelli M
- Zhang S
- Allauzen A
- Coavoux M
- Estève Y
- Rouvier M
- Goulian J
- Lecouteux B
- Portet F
- Rossato S
- Ringeval F
- Schwab D
- Besacier L
- Publication year
- Publication venue
- Computer Speech & Language
External Links
Snippet
Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains including computer vision and natural language processing. Speech processing drastically benefitted from SSL as most of the current domain-related tasks are …
- 238000012549 training 0 abstract description 132
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G06F17/30675—Query execution
- G06F17/30684—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/277—Lexical analysis, e.g. tokenisation, collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2785—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2872—Rule based translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/289—Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lakhotia et al. | On generative spoken language modeling from raw audio | |
Liao et al. | Improving readability for automatic speech recognition transcription | |
Evain et al. | Task agnostic and task specific self-supervised learning from speech with lebenchmark | |
Parcollet et al. | LeBenchmark 2.0: A standardized, replicable and enhanced framework for self-supervised representations of French speech | |
Bellegarda et al. | State of the art in statistical methods for language and speech processing | |
Long et al. | Acoustic data augmentation for Mandarin-English code-switching speech recognition | |
Zheng et al. | BLSTM-CRF Based End-to-End Prosodic Boundary Prediction with Context Sensitive Embeddings in a Text-to-Speech Front-End. | |
Belinkov | On internal language representations in deep learning: An analysis of machine translation and speech recognition | |
Dunbar et al. | Self-supervised language learning from raw audio: Lessons from the zero resource speech challenge | |
Nasr et al. | End-to-end speech recognition for arabic dialects | |
Algayres et al. | DP-Parse: Finding word boundaries from raw speech with an instance lexicon | |
Das et al. | Speechverse: A large-scale generalizable audio language model | |
Victor et al. | Application of extractive text summarization algorithms to speech-to-text media | |
Singh et al. | MECOS: A bilingual Manipuri–English spontaneous code-switching speech corpus for automatic speech recognition | |
Kazakova et al. | Analysis of natural language processing technology: Modern problems and approaches | |
Suni et al. | The simple4all entry to the blizzard challenge 2014 | |
Anidjar et al. | Speech and multilingual natural language framework for speaker change detection and diarization | |
NithyaKalyani et al. | Speech summarization for tamil language | |
Pelloin et al. | ASR-generated text for language model pre-training applied to speech tasks | |
Liu et al. | Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech | |
Yu et al. | Abstractive headline generation for spoken content by attentive recurrent neural networks with ASR error modeling | |
Safonova et al. | Automatic speech recognition of low-resource languages based on Chukchi | |
Woldemariam et al. | Adapting language specific components of cross-media analysis frameworks to less-resourced languages: the case of Amharic | |
Yolchuyeva | Novel NLP Methods for Improved Text-To-Speech Synthesis | |
Krishna et al. | Representation learning with hidden unit clustering for low resource speech applications |