Niu et al., 2017 - Google Patents

A study on landmark detection based on CTC and its application to pronunciation error detection

Document ID: 15952675735315899758
Author: Niu C; Zhang J; Yang X; Xie Y
Publication year: 2017
Publication venue: 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

Snippet

Acoustic features extracted in the vicinity of landmarks have demonstrated their usefulness for detecting mispronunciation in our recent work [1, 2]. Traditional approaches of detecting acoustic landmarks rely on annotations by linguists with prior knowledge of speech …

Full text: www.apsipa.org (PDF)
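The snippet does not describe how the CTC-based detector works, so the following is only an illustrative sketch under stated assumptions, not the authors' procedure. CTC-trained models tend to produce "peaky" outputs, where most frames are assigned to the blank symbol and each token fires at one or a few frames. One simple way to turn such outputs into candidate time positions is best-path (greedy) decoding: keep the first frame of every new non-blank label. The helper names `ctc_spike_frames` and `spikes_to_ms`, the blank index 0, and the 10 ms frame shift are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple


def ctc_spike_frames(
    frame_posteriors: Sequence[Sequence[float]], blank: int = 0
) -> List[Tuple[int, int]]:
    """Return (frame_index, label) pairs where best-path CTC decoding
    emits a new non-blank token. Hypothetical helper; blank=0 is assumed."""
    spikes: List[Tuple[int, int]] = []
    prev = blank
    for t, probs in enumerate(frame_posteriors):
        # Best-path choice for this frame: index of the largest posterior.
        label = max(range(len(probs)), key=lambda k: probs[k])
        # CTC collapse rule: consecutive repeats of the same label (with no
        # blank in between) form one token, so only its first frame is kept.
        if label != blank and label != prev:
            spikes.append((t, label))
        prev = label
    return spikes


def spikes_to_ms(
    spikes: Sequence[Tuple[int, int]], frame_shift_ms: int = 10
) -> List[Tuple[int, int]]:
    """Convert frame indices to milliseconds (a 10 ms frame shift is assumed)."""
    return [(t * frame_shift_ms, label) for t, label in spikes]


if __name__ == "__main__":
    # Toy posteriors over {0: blank, 1: token A, 2: token B}, one row per frame.
    posteriors = [
        [0.90, 0.05, 0.05],
        [0.10, 0.80, 0.10],
        [0.20, 0.70, 0.10],  # repeat of token A: collapsed into the same token
        [0.90, 0.05, 0.05],
        [0.10, 0.10, 0.80],
        [0.80, 0.10, 0.10],
        [0.10, 0.10, 0.80],  # token B again after a blank: a new token
    ]
    print(spikes_to_ms(ctc_spike_frames(posteriors)))
```

Run on the toy input, this prints `[(10, 1), (40, 2), (60, 2)]`. Taking the first frame of each collapsed token is just one possible convention; a real system might instead use the frame with the highest posterior within each token, and the mapping from CTC tokens to acoustic landmarks is left open here.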

Classifications

- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L15/18—Speech classification or search using natural language modelling
            - G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
              - G10L15/187—Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
          - G10L2015/088—Word spotting
          - G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
            - G10L15/142—Hidden Markov Models [HMMs]
              - G10L15/144—Training of HMMs
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
          - G10L15/065—Adaptation
            - G10L15/07—Adaptation to the speaker
        - G10L15/04—Segmentation; Word boundary detection
        - G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
        - G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
        - G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
          - G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
            - G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
      - G10L13/00—Speech synthesis; Text to speech systems
      - G10L17/00—Speaker identification or verification
      - G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/20—Handling natural language data
          - G06F17/27—Automatic analysis, e.g. parsing
            - G06F17/2765—Recognition
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
  - G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    - G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
      - G09B19/00—Teaching not covered by other main groups of this subclass
        - G09B19/06—Foreign languages

Similar Documents

Publication	Publication Date	Title
CN110517663B (en)	2021-09-21	Language identification method and system
Polzehl et al.	2011	Anger recognition in speech using acoustic and linguistic cues
Lee et al.	2013	An information-extraction approach to speech processing: Analysis, detection, verification, and recognition
Le et al.	2009	Automatic speech recognition for under-resourced languages: application to Vietnamese language
Lee et al.	2012	A comparison-based approach to mispronunciation detection
Tachbelie et al.	2014	Using different acoustic, lexical and language modeling units for ASR of an under-resourced language–Amharic
Arora et al.	2018	Phonological feature-based speech recognition system for pronunciation training in non-native language learning
Tu et al.	2018	Investigating the role of L1 in automatic pronunciation evaluation of L2 speech
Vlasenko et al.	2014	Modeling phonetic pattern variability in favor of the creation of robust emotion classifiers for real-life applications
Kruspe et al.	2016	Bootstrapping a System for Phoneme Recognition and Keyword Spotting in Unaccompanied Singing.
Serrino et al.	2019	Contextual Recovery of Out-of-Lattice Named Entities in Automatic Speech Recognition.
Qian et al.	2010	Discriminative acoustic model for improving mispronunciation detection and diagnosis in computer-aided pronunciation training (CAPT).
Mao et al.	2018	Applying multitask learning to acoustic-phonemic model for mispronunciation detection and diagnosis in l2 english speech
Lee	2016	Language-independent methods for computer-assisted pronunciation training
Mary et al.	2018	Searching speech databases: features, techniques and evaluation measures
Niu et al.	2017	A study on landmark detection based on CTC and its application to pronunciation error detection
Shen et al.	2022	Self-supervised pre-trained speech representation based end-to-end mispronunciation detection and diagnosis of Mandarin
Zhang et al.	2016	Wake-up-word spotting using end-to-end deep neural network system
Joshi et al.	2015	Vowel mispronunciation detection using DNN acoustic models with cross-lingual training.
Wang et al.	2018	L2 mispronunciation verification based on acoustic phone embedding and Siamese networks
Qu et al.	2018	Combining articulatory features with end-to-end learning in speech recognition
Chen et al.	2022	An Alignment Method Leveraging Articulatory Features for Mispronunciation Detection and Diagnosis in L2 English.
Manjunath et al.	2017	Development of multilingual phone recognition system for Indian languages
Lin et al.	2023	Multi-lingual pronunciation assessment with unified phoneme set and language-specific embeddings
Wana et al.	2020	A multi-view approach for Mandarin non-native mispronunciation verification