Bui et al., 2021 - Google Patents

Multi-task Text Normalization Approach for Speech Synthesis

Bui et al., 2021

Document ID: 7065142421152839345
Author: Bui C; Truong T; Nguyen C; Phung L; Nguyen M; Nguyen H
Publication year: 2021
Publication venue: PRICAI 2021: Trends in Artificial Intelligence: 18th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2021, Hanoi, Vietnam, November 8–12, 2021, Proceedings, Part II 18

External Links

Cited by

Snippet

Abstract Text normalization for Text-To-Speech includes several challenging tasks in natural language processing such as verbalizing abbreviations, Out-Of-Vocabulary (OOV) words, and chunking phrases for long sentences without punctuation. Instead of dealing with these …

Continue reading at link.springer.com (other versions)

238000010606 normalization 0 title abstract description 35

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2872—Rule based translation
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/289—Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L13/10—Prosody rules derived from text; Stress or intonation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/06—Elementary speech units used in speech synthesisers; Concatenation rules
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B19/00—Teaching not covered by other main groups of this subclass
- G09B19/06—Foreign languages
- G09B19/08—Printed or written appliances, e.g. text books, bilingual letter assemblies, charts
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints

Similar Documents

Publication	Publication Date	Title
Santhanavijayan et al.	2021	A semantic-aware strategy for automatic speech recognition incorporating deep learning models
Hasegawa-Johnson et al.	2020	Grapheme-to-phoneme transduction for cross-language ASR
Mohammed	2020	Using machine learning to build POS tagger for under-resourced language: the case of Somali
Khomitsevich et al.	2015	A bilingual Kazakh-Russian system for automatic speech recognition and synthesis
Kubis et al.	2019	Open challenge for correcting errors of speech recognition systems
CN113823259A (en)	2021-12-21	Method and device for converting text data into phoneme sequence
Besacier et al.	2006	ASR and translation for under-resourced languages
Rao et al.	2015	Language identification using excitation source features
Park et al.	2019	Jejueo datasets for machine translation and speech synthesis
Abdelghany et al.	2020	Doc2Vec: An approach to identify Hadith Similarities
Bui et al.	2021	Multi-task Text Normalization Approach for Speech Synthesis
Domokos et al.	2015	Romanian phonetic transcription dictionary for speeding up language technology development
Safarik et al.	2017	Unified approach to development of ASR systems for East Slavic languages
Devi et al.	2021	Development of ManiTo: a Manipuri tonal contrast dataset
Hlaing et al.	2018	Myanmar Number Normalization for Text-to-Speech
Volín et al.	2021	Human and transformer-based prosodic phrasing in two speech genres
Sefara	2019	The development of an automatic pronunciation assistant
Boughariou et al.	2021	Classification based method for disfluencies detection in spontaneous spoken Tunisian dialect
Carson-Berndsen	2002	Multilingual time maps: portable phonotactic models for speech technology
Tran	2015	SentiVoice-a system for querying hotel service reviews via phone
Soundarya et al.	2023	Analysis of Mispronunciation Detection and Diagnosis Based on Conventional Deep Learning Techniques
Lyes et al.	2019	Building a pronunciation dictionary for the Kabyle language
Iman et al.	2022	Neural Network for Arabic Text Diacritization on a New Dataset
Menshikova et al.	2019	Prosodic boundaries prediction in russian using morphological and syntactic features
Ghosh et al.	2023	Boosting Rule-Based Grapheme-to-Phoneme Conversion with Morphological Segmentation and Syllabification in Bengali