CN109272992A - Spoken language assessment method and apparatus, and apparatus for generating a spoken language assessment model - Google Patents
Spoken language assessment method and apparatus, and apparatus for generating a spoken language assessment model
- Publication number
- CN109272992A CN109272992A CN201811422784.1A CN201811422784A CN109272992A CN 109272992 A CN109272992 A CN 109272992A CN 201811422784 A CN201811422784 A CN 201811422784A CN 109272992 A CN109272992 A CN 109272992A
- Authority
- CN
- China
- Prior art keywords
- voice data
- measured
- spoken
- text
- voice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0631—Creating reference templates; Clustering
Abstract
The present application provides a spoken language assessment method and apparatus, and an apparatus for generating a spoken language assessment model. The spoken language assessment method, applied to the data-processing end of a spoken language assessment system, includes: obtaining voice data to be assessed; extracting spoken-attribute feature information from the voice data to be assessed; and obtaining a scoring result from the spoken-attribute feature information of the voice data to be assessed, using a pre-generated spoken language assessment model.
Description
Technical field
This application relates to the technical field of language learning, and in particular to a spoken language assessment method and apparatus, an apparatus for generating a spoken language assessment model, a computing device, and a computer-readable storage medium.
Background art
With the development of global economic integration, English plays an increasingly important role as an international language. Commercial, cultural, and cross-border tourism exchanges are ever more frequent, and more and more people need to learn a foreign language, so improving oral communicative competence has become an urgent need of foreign-language learners.
The vast majority of existing spoken language assessment technologies target adults, and their voice training data are essentially adult speech. Because a child's vocal tract is not yet fully developed, and labeled training data for children are scarce, these models struggle to perform well on children's spoken language assessment. Other models require a large number of example recordings, which makes them very inflexible.
Summary of the invention
In view of this, embodiments of the present application provide a spoken language assessment method and apparatus, an apparatus for generating a spoken language assessment model, a computing device, and a computer-readable storage medium, to address the above deficiencies in the prior art.
An embodiment of the present application discloses a spoken language assessment method, applied to the data-processing end of a spoken language assessment system, including:
obtaining voice data to be assessed;
extracting spoken-attribute feature information from the voice data to be assessed;
obtaining a scoring result from the spoken-attribute feature information of the voice data to be assessed, using a pre-generated spoken language assessment model.
Optionally, after obtaining the voice data to be assessed and before extracting its spoken-attribute feature information, the method includes:
performing end-of-speech detection on the voice data to be assessed;
performing vocal tract length normalization on the voice data after end-of-speech detection;
performing speech recognition on the normalized voice data to obtain its voice frame sequence and the corresponding transcribed text;
forcibly aligning the transcribed text with a reference text, and generating forced-alignment information that records the correspondence between the voice frame sequence of the transcribed text and that of the reference text.
Optionally, performing end-of-speech detection on the voice data to be assessed includes:
recognizing the voice data to be assessed, and converting the recognized voice data into end-detection speech text that carries timing information;
matching the reference text against the end-detection speech text to obtain a matched-text time point;
determining the end time point of the voice data to be assessed from a preset voice activity detection method and the matched-text time point.
Optionally, the spoken-attribute feature information includes:
pronunciation attribute feature information;
intonation attribute feature information;
and fluency attribute feature information.
Optionally, the spoken-attribute feature information further includes word-marking feature information;
after generating the forced-alignment information, the method further includes:
judging, from the forced-alignment information, whether the transcribed text matches the reference text, and if not, adding word-marking feature information to the unmatched words in the reference text.
Optionally, the voice data to be assessed is fragment-compressed voice data, and obtaining it includes:
receiving the fragment-compressed voice data to be assessed;
decompressing the fragment-compressed voice data to be assessed.
Optionally, the forced-alignment processing uses a decoding graph with a Kleene closure to filter out, according to the reference text, repeated readings in the voice data to be assessed.
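What the Kleene-closure decoding graph achieves can be sketched as follows. A real system builds this as a weighted finite-state transducer in which each reference word may repeat; the greedy matcher below is only an illustrative stand-in for that graph, and the function name and word lists are hypothetical, not from the patent.

```python
# Toy illustration of filtering repeated readings against a reference text:
# a word that matches the next expected reference word is consumed; an
# immediate repetition of the previous word is dropped; anything else is
# kept as a genuine insertion.
def collapse_repeats(words, reference):
    out, idx = [], 0
    for w in words:
        if idx < len(reference) and w == reference[idx]:
            out.append(w)           # consume the next reference word
            idx += 1
        elif out and w == out[-1]:
            continue                # repeated reading: filter out
        else:
            out.append(w)           # genuine insertion, keep
    return out

hyp = ["I", "I", "like", "like", "apples"]
cleaned = collapse_repeats(hyp, ["I", "like", "apples"])
```

With this input, the doubled "I" and "like" are removed, while a filler word not present in the reference would be preserved.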
Optionally, obtaining the voice data to be assessed and extracting its spoken-attribute feature information includes:
obtaining initial voice data;
extracting target voice data from the initial voice data as the voice data to be assessed, using a speech separation method based on deep clustering;
extracting the spoken-attribute feature information of the voice data to be assessed.
In another aspect, the present application also provides a method for generating a spoken language assessment model, including:
generating a spoken-language training sample set from spoken-language training sample groups, each group containing training samples, and each training sample containing spoken-attribute feature information together with the scoring result corresponding to it;
training the spoken language assessment model on the training sample set, so that the model associates the spoken-attribute feature information of each training sample with its corresponding scoring result.
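The training step above can be sketched as follows. The patent does not specify the model family, so a simple linear regressor stands in for the trained assessment model; the feature vectors (pronunciation, intonation, fluency) and human scores below are synthetic, illustrative values.

```python
import numpy as np

# Each training sample pairs a spoken-attribute feature vector with a
# human scoring result on a 100-point scale (synthetic data).
X = np.array([[0.9, 0.8, 0.9],
              [0.4, 0.7, 0.3],
              [0.7, 0.2, 0.6],
              [0.2, 0.5, 0.1]])
y = np.array([88.0, 52.0, 63.0, 35.0])

Xb = np.hstack([X, np.ones((len(X), 1))])   # append a bias column
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)  # least-squares fit

def predict(feats):
    """Score a new spoken-attribute feature vector."""
    return float(np.append(feats, 1.0) @ w)
```

Once fitted, the model maps any new feature vector directly to a score, which is the association between features and scoring results that the claim describes.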
Optionally, before training the spoken language assessment model on the training sample set, the method includes:
performing vocal tract length normalization on the voice data of the training samples, so that the vocal tract lengths of the training samples in all sample groups are unified.
Optionally, the spoken-attribute feature information includes:
pronunciation attribute feature information;
intonation attribute feature information;
and fluency attribute feature information.
In another aspect, the present application also provides a spoken language assessment apparatus, including:
a voice-data obtaining module, configured to obtain voice data to be assessed;
an attribute-feature extraction module, configured to extract the spoken-attribute feature information of the voice data to be assessed;
a scoring-result generation module, configured to obtain a scoring result from the spoken-attribute feature information of the voice data to be assessed, using a pre-generated spoken language assessment model.
Optionally, the apparatus further includes:
an end-of-speech detection module, configured to perform end-of-speech detection on the voice data to be assessed;
a vocal tract length normalization module, configured to perform vocal tract length normalization on the voice data after end-of-speech detection;
a speech recognition module, configured to perform speech recognition on the normalized voice data to obtain its transcribed text;
a forced-alignment module, configured to forcibly align the transcribed text with the reference text and to generate forced-alignment information recording the correspondence between the voice frame sequence of the transcribed text and that of the reference text.
Optionally, the end-of-speech detection module includes:
a recognition unit, configured to recognize the voice data to be assessed and to convert the recognized voice data into end-detection speech text carrying timing information;
a matching unit, configured to match the reference text against the end-detection speech text to obtain a matched-text time point;
a determination unit, configured to determine the end time point of the voice data to be assessed from a preset voice activity detection method and the matched-text time point.
Optionally, the apparatus further includes:
a marking module, configured to judge, from the forced-alignment information, whether the transcribed text matches the reference text, and if not, to add word-marking feature information to the unmatched words in the reference text.
Optionally, the voice-data obtaining module includes:
a first obtaining unit, configured to obtain fragment-compressed voice data;
a decompression unit, configured to decompress the fragment-compressed voice data to be assessed.
Optionally, the forced-alignment module uses a decoding graph with a Kleene closure to filter out, according to the reference text, repeated readings in the voice data to be assessed.
Optionally, the voice-data obtaining module includes:
a second obtaining unit, configured to obtain initial voice data;
an extraction unit, configured to extract target voice data from the initial voice data as the voice data to be assessed, using a speech separation method based on deep clustering.
In another aspect, the present application also provides an apparatus for generating a spoken language assessment model, including:
a training-sample generation module, configured to generate a spoken-language training sample set from spoken-language training sample groups, each group containing training samples, and each training sample containing spoken-attribute feature information together with the scoring result corresponding to it;
a model generation module, configured to train the spoken language assessment model on the training sample set, so that the model associates the spoken-attribute feature information of each training sample with its corresponding scoring result.
Optionally, the apparatus further includes a vocal tract length normalization module, configured to perform vocal tract length normalization on the voice data of the training samples so that the vocal tract lengths of the training samples in all sample groups are unified.
In another aspect, the present application also provides a computing device, including a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the processor, when executing the instructions, implements the steps of the spoken language assessment method or of the method for generating a spoken language assessment model.
In another aspect, the present application also provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the steps of the spoken language assessment method or of the method for generating a spoken language assessment model.
In the spoken language assessment method and apparatus, the apparatus for generating a spoken language assessment model, the computing device, and the computer-readable storage medium provided by the present application, vocal tract length normalization of the training samples followed by model training enables spoken language assessment for different populations with accurate results, and in particular solves the problem of assessing children's spoken language. Meanwhile, the end-of-speech detection in the method improves the fluency of the whole assessment and lowers the barrier to use. In addition, the Kleene closure used in the method solves the problem of repeated readings, and the speech separation method based on deep clustering effectively prevents overlapping or alternating voices from appearing in the data to be assessed.
Brief description of the drawings
Fig. 1 is a structural block diagram of a computing device according to an embodiment of the application;
Fig. 2 is a schematic flow chart of a spoken language assessment method according to an embodiment of the application;
Fig. 3 is a schematic flow chart of a spoken language assessment method according to an embodiment of the application;
Fig. 4 is a schematic flow chart of generating a spoken language assessment model according to an embodiment of the application;
Fig. 5 is a schematic diagram of a TDNN network structure according to an embodiment of the application;
Fig. 6 is a partial structural diagram of the TDNN network structure according to an embodiment of the application;
Fig. 7 is a structural schematic diagram of a spoken language assessment apparatus according to an embodiment of the application;
Fig. 8 is a structural schematic diagram of an apparatus for generating a spoken language assessment model according to an embodiment of the application.
Specific embodiments
Many details are set forth in the following description to facilitate a full understanding of the application. However, the application can be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from its spirit; the application is therefore not limited by the specific implementations disclosed below.
The present application provides a spoken language assessment method and apparatus, an apparatus for generating a spoken language assessment model, a computing device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.
Fig. 1 shows a structural block diagram of a computing device 100 according to an embodiment of this specification. The components of the computing device 100 include, but are not limited to, a memory 110 and a processor 120, connected by a bus 130; a database 150 is used to save data, and a network 160 to receive and transmit the data the database 150 stores.
The computing device 100 further includes an access device 140 that enables it to communicate via one or more networks 160. Examples of these networks include the public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. The access device 140 may include one or more wired or wireless network interfaces of any kind (for example, a network interface card (NIC)), such as an IEEE 802.11 wireless local area network (WLAN) interface, a Worldwide Interoperability for Microwave Access (Wi-MAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a near-field communication (NFC) interface, and so on.
In an embodiment of this specification, the above components of the computing device 100 and other components not shown in Fig. 1 may also be connected to each other, for example by a bus. It should be appreciated that the block diagram in Fig. 1 is for exemplary purposes only and is not a limitation on the scope of this specification; those skilled in the art may add or replace components as needed.
The computing device 100 can be any type of static or mobile computing device, including a mobile computer or mobile computing device (for example, a tablet computer, personal digital assistant, laptop computer, notebook computer, or netbook), a mobile phone (for example, a smartphone), a wearable computing device (for example, a smartwatch or smart glasses) or another type of mobile device, or a static computing device such as a desktop computer or PC. The computing device 100 can also be a mobile or stationary server.
The processor 120 can execute the steps of the method shown in Fig. 2. Fig. 2 shows a schematic flow chart of a spoken language assessment method according to an embodiment of the application; the method, applied to the data-processing end of a spoken language assessment system, includes steps 202 to 206:
Step 202: obtain voice data to be assessed.
Step 204: extract the spoken-attribute feature information of the voice data to be assessed.
The spoken-attribute feature information includes pronunciation attribute feature information, intonation attribute feature information, and fluency attribute feature information.
Step 206: obtain a scoring result from the spoken-attribute feature information of the voice data to be assessed, using a pre-generated spoken language assessment model.
After the voice data to be assessed is obtained, the spoken language assessment model can produce a scoring result directly from its spoken-attribute feature information; the steps are simple and greatly shorten the time a user waits for a result. Moreover, the spoken language assessment model is pre-trained and can score the voice data to be assessed without stocking a large number of example recordings, which broadens the range of uses of the method.
The content of the spoken-attribute feature information reflects the scoring dimensions of the spoken language assessment model: the model scores the pronunciation, intonation, and fluency of the voice data to be assessed and gives a score that evaluates the accuracy of the voice data to be assessed.
For example, suppose the model scores on a 100-point scale. After a segment of speech is input to the spoken language assessment model, the model scores the three aspects of pronunciation, intonation, and fluency of that speech, and gives a scoring result of, say, 98 points.
Referring to Fig. 3, which shows a schematic flow chart of a spoken language assessment method according to an embodiment of the application, the method includes steps 302 to 322:
Step 302: obtain fragment-compressed voice data to be assessed; the voice data to be assessed is the initial voice data.
Where the data-processing end is a server, if the server obtains uncompressed voice data to be assessed directly from the client, the user spends considerable network traffic, latency increases, and the server's network bandwidth comes under pressure. The voice data to be assessed is therefore fragment-compressed and transmitted fragment by fragment.
The fragment size of the voice data to be assessed affects the performance of the whole spoken language assessment: if fragments are too short, transmission latency is low but server CPU/GPU utilization is poor; if they are too long, CPU/GPU utilization is high but network latency grows. In actual use, the optimal fragment length for the voice data to be assessed can be determined through extensive experiments.
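The fragmentation step above can be sketched as follows. The 0.5 s fragment length and 16 kHz sample rate are illustrative placeholders; as the text notes, the best length is found experimentally.

```python
# Slice a PCM sample buffer into fixed-length fragments prior to
# compression and upload.
def fragment_pcm(samples, sample_rate, fragment_seconds=0.5):
    """Split a list of PCM samples into fragments of fragment_seconds each."""
    frag_len = int(sample_rate * fragment_seconds)
    return [samples[i:i + frag_len] for i in range(0, len(samples), frag_len)]

audio = [0] * (16000 * 3)           # 3 s of placeholder samples at 16 kHz
frags = fragment_pcm(audio, 16000)  # six 0.5 s fragments
```

Each fragment would then be compressed (e.g. with a lossy codec) and sent to the server independently, so recognition can begin before the user finishes speaking.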
Audio compression algorithms fall into two broad classes, lossless and lossy. Lossless compression loses no information, but its compression ratio is too low; lossy compression loses some information, but achieves a higher compression ratio with almost no effect on speech recognition results. Many mature compression codecs exist; one embodiment of the application uses AAC (Advanced Audio Coding) compression.
Step 304: decompress the fragment-compressed voice data to be assessed (the inverse of the AAC transform).
Decompressing the fragment-compressed voice data to be assessed facilitates the subsequent processing of the voice data.
Step 306: using a speech separation method based on deep clustering, extract the target voice data from the initial voice data as the voice data to be assessed.
In the scenario where a parent accompanies a child's study, overlapping or alternating speech may occur. The deep-clustering speech separation method can accurately extract the child's voice data from the voice data mixed with the accompanying parent's, improving the precision of the spoken language assessment.
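The clustering stage of deep-clustering separation can be sketched as follows. In a real system a pre-trained network maps each time-frequency bin to an embedding; here the embeddings are synthetic 2-D points forming two well-separated blobs (standing in for the child's and the parent's bins), and k-means over them yields a per-speaker mask assignment. All values are illustrative.

```python
import numpy as np

# Synthetic per-bin embeddings: two blobs, one per speaker.
rng = np.random.default_rng(0)
emb_child = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(50, 2))
emb_parent = rng.normal(loc=(3.0, 3.0), scale=0.1, size=(50, 2))
embeddings = np.vstack([emb_child, emb_parent])

def kmeans2(points, iters=10):
    """Tiny 2-cluster k-means; returns a cluster label per point."""
    centers = points[[0, -1]].copy()   # seed one center from each end
    for _ in range(iters):
        d = np.linalg.norm(points[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for k in range(2):
            centers[k] = points[labels == k].mean(axis=0)
    return labels

labels = kmeans2(embeddings)   # binary mask assignment per T-F bin
```

The labels act as a binary mask: bins assigned to the child's cluster are kept as the voice data to be assessed, the rest are discarded.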
Step 308: perform end-of-speech detection on the voice data to be assessed.
This step specifically includes the following:
recognizing the voice data to be assessed, and converting the recognized voice data into end-detection speech text that carries timing information;
matching the reference text against the end-detection speech text to obtain a matched-text time point, i.e. the time point after a time point in the timing information of the reference text has been matched with one in the timing information of the end-detection speech text;
determining the end time point of the voice data to be assessed from a preset voice activity detection method and the matched-text time point.
The reference text is a text containing the user's speech content and timing information, and the preset voice activity detection method may include, but is not limited to, VAD (Voice Activity Detection, also called speech endpoint detection or speech boundary detection). During end-of-speech detection, determining the end time point of the voice data to be assessed from VAD and the matched-text time point makes it possible to judge accurately whether the user has finished reading, so that the end of the user's speech is detected automatically without manual intervention. This improves the fluency of the whole spoken language assessment and lowers the barrier to using the method, making the method especially suitable for scenarios where a child uses it independently, without a parent present.
Traditional VAD based on dominant frequency components produces many false-positive errors in scenes with heavy environmental noise. Because the spoken-language training samples in an embodiment of the application contain various real noise scenes, and the text the user reads is preset, the recognition-based VAD is more robust and its false-positive rate drops significantly.
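The recognition-based end detection described above can be sketched as follows: the recognizer emits words with end timestamps, the matcher finds the point at which the reference (read-aloud) text has been fully covered, and a short trailing-silence margin stands in for the final VAD check. The function name, timestamps, and 0.3 s margin are illustrative assumptions, not values from the patent.

```python
# Decide when the user has finished reading the reference text.
def end_of_reading(recognized, reference_words, vad_margin=0.3):
    """recognized: list of (word, end_time) pairs in order.
    Returns the detected end time, or None if the reference is unfinished."""
    idx = 0
    matched_end = None
    for word, end_t in recognized:
        if idx < len(reference_words) and word == reference_words[idx]:
            idx += 1
            matched_end = end_t     # matched-text time point so far
        if idx == len(reference_words):
            return matched_end + vad_margin   # add a VAD-style hangover
    return None

hyp = [("the", 0.4), ("cat", 0.8), ("uh", 1.1), ("sat", 1.6)]
end = end_of_reading(hyp, ["the", "cat", "sat"])
```

Because the expected text is known in advance, the detector can stop as soon as the last reference word is matched, rather than waiting out a long generic silence timeout.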
Step 310: perform vocal tract length normalization on the voice data to be assessed after end-of-speech detection.
This step uses VTLN (Vocal Tract Length Normalisation) to eliminate the effect of vocal-tract shape on recognition performance, widening the range of application of the method while keeping the assessment accurate.
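One common realization of VTLN is a piecewise-linear frequency warp, which can be sketched as follows; the patent does not specify the warping scheme, so this is an assumption, and the break frequency of 6800 Hz and the warp factor are illustrative (a factor above 1 compresses a child's higher formants toward the adult range).

```python
# Piecewise-linear VTLN warp: scale frequencies by alpha below a break
# frequency f0, then use a second linear segment that maps the Nyquist
# frequency onto itself, keeping the warp continuous and onto.
def vtln_warp(freq, alpha, f_nyq=8000.0, f0=6800.0):
    if freq <= f0:
        return alpha * freq
    slope = (f_nyq - alpha * f0) / (f_nyq - f0)
    return alpha * f0 + slope * (freq - f0)
```

In practice the warp is applied to the mel filterbank center frequencies during feature extraction, and alpha is chosen per speaker (e.g. by maximum likelihood over a grid of factors).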
Step 312: perform speech recognition on the voice data to be assessed after vocal tract length normalization, obtaining its voice frame sequence and the corresponding transcribed text.
Recognizing the normalized voice data yields the true content of the voice data to be assessed, which helps improve the accuracy of the scoring.
Step 314: the converting text and referenced text are subjected to pressure registration process, and generate pressure alignment information,
The pressure alignment information includes the corresponding relationship of the voice frame sequence of converting text Yu the voice frame sequence of the referenced text.
In this step, using Viterbi algorithm, which is a kind of dynamic programming algorithm, optimal minor structure such as formula
(1):
α (t, j)=bj(ot)miniα (t-1, i) aijFormula (1)
Wherein, α (t, j) is indicated until t moment, and hidden state is the probability of the optimal path of j;bj(ot) indicate given
O is generated under hidden state jtEmission probability;aijIndicate the transition probability from hidden state i to j;otIndicate the observation of t moment.
The forced-alignment information is the detailed alignment between the user's speech and the speech frame sequence of the reference text, and provides the basis for the subsequent speech evaluation.
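Formula (1) can be sketched in code. The log-domain recursion below is a generic Viterbi over a toy two-state HMM; the states, probabilities, and observation alphabet are illustrative assumptions, not the patent's acoustic model:

```python
import math

def viterbi(obs, states, log_a, log_b, log_pi):
    """Viterbi recursion of formula (1) in the log domain:
    alpha[t][j] = log_b[j](o_t) + max_i (alpha[t-1][i] + log_a[i][j])."""
    alpha = [{j: log_pi[j] + log_b[j](obs[0]) for j in states}]
    back = []
    for t in range(1, len(obs)):
        row, ptr = {}, {}
        for j in states:
            best_i = max(states, key=lambda i: alpha[-1][i] + log_a[i][j])
            row[j] = log_b[j](obs[t]) + alpha[-1][best_i] + log_a[best_i][j]
            ptr[j] = best_i
        alpha.append(row)
        back.append(ptr)
    # backtrace the optimal state sequence
    last = max(states, key=lambda j: alpha[-1][j])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Toy HMM (illustrative): state A prefers observation 'x', state B prefers 'y'.
LOG_PI = {'A': math.log(0.9), 'B': math.log(0.1)}
LOG_A = {'A': {'A': math.log(0.7), 'B': math.log(0.3)},
         'B': {'A': math.log(0.3), 'B': math.log(0.7)}}
LOG_B = {'A': lambda o: math.log(0.9 if o == 'x' else 0.1),
         'B': lambda o: math.log(0.9 if o == 'y' else 0.1)}
```

In forced alignment the state graph is constrained to the reference text, so the backtraced state sequence directly maps each speech frame to a reference phoneme.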
During the forced alignment, a decoding graph with a Kleene closure is used to filter out, according to the reference text, the repeated speech in the voice data to be evaluated. The Kleene closure is widely used in speech recognition based on weighted finite-state transducer (WFST) decoding, and solves the problem of repeated reading.
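The WFST decoding graph itself is beyond a short sketch, but the effect the passage describes, each reference word accepted under a closure (w → w+) so that immediate repeats are absorbed, can be illustrated at the token level. This is a simplification for intuition, not the patent's decoder:

```python
def filter_repeats(recognized, reference):
    """Token-level analogue of decoding under a Kleene closure: a word
    the reader repeats back-to-back is collapsed to one occurrence, and
    the retained sequence follows the reference text."""
    out, i = [], 0
    for tok in recognized:
        if i < len(reference) and tok == reference[i]:
            out.append(tok)  # first occurrence: keep it, advance in reference
            i += 1
        elif out and tok == out[-1]:
            continue         # immediate repeat of the word just read: drop
        # any other token is an insertion and is ignored in this sketch
    return out
```

In the real decoder the same idea is expressed as arcs in the graph, so repeats are removed during the Viterbi search rather than in a post-pass.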
Step 316: extract the spoken-attribute feature information of the voice data to be evaluated.
The spoken-attribute feature information includes: pronunciation attribute feature information;
intonation attribute feature information;
and fluency attribute feature information.
The content of the spoken-attribute feature information reflects the scoring dimensions of the spoken evaluation model; that is, the pronunciation, intonation, and fluency of the voice data to be evaluated are scored.
The pronunciation attribute feature information consists of the speech-recognition prediction probabilities corresponding to each word, syllable, and phoneme in the voice data to be evaluated; the intonation attribute feature information is determined by the duration and fundamental frequency corresponding to each word, syllable, and phoneme; and the fluency attribute feature information is determined by the duration, fundamental frequency, and speaking rate corresponding to each word, syllable, and phoneme.
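Under the description above, the three feature groups could be assembled roughly as follows; the key names and layout are illustrative assumptions, since the patent fixes only which measurements feed which group:

```python
def spoken_attribute_features(units):
    """Assemble the three attribute feature groups from per-unit
    (word/syllable/phoneme) measurements.  `units` is a list of dicts
    with hypothetical keys 'posterior' (recognition probability),
    'dur' (seconds), and 'f0' (Hz)."""
    total_dur = sum(u["dur"] for u in units)
    rate = len(units) / total_dur  # speaking rate: units per second
    return {
        "pronunciation": [u["posterior"] for u in units],
        "intonation":    [(u["dur"], u["f0"]) for u in units],
        "fluency":       [(u["dur"], u["f0"], rate) for u in units],
    }
```

The forced alignment of step 314 is what supplies the per-unit durations; the posteriors come from the recognizer of step 312.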
Step 318: obtain the scoring result from the spoken-attribute feature information of the voice data to be evaluated, based on the pre-generated spoken evaluation model.
The series of preprocessing steps applied to the voice data to be evaluated, such as decompression, deep-clustering speech separation, and forced alignment, improves the scoring accuracy of the spoken evaluation model and widens the scope of application of the method, so that different groups of people can use it for spoken evaluation and obtain accurate scoring results.
Referring to Fig. 4, which shows a schematic flow chart of generating a spoken evaluation model according to one embodiment of the application. In this embodiment, the spoken evaluation model is obtained through steps 3092 to 3096:
Step 3092: generate an oral training sample set from the oral training sample groups. Each oral training sample group contains oral training samples, and each oral training sample includes spoken-attribute feature information and the scoring result corresponding to that spoken-attribute feature information.
As usage grows and oral training samples accumulate, this embodiment of the application uses an unsupervised method to continuously expand the oral training sample set and iterate the model. In this way, user speech can be evaluated without providing example audio, making the method more general and more robust.
Step 3094: perform vocal tract length normalization on the voice data of the oral training samples, unifying the vocal tract lengths of the oral training samples in all the oral training sample groups.
In this step, VTLN (vocal tract length normalization) technology is used to eliminate the effect of the differing vocal tract shapes in the oral training sample set on recognition performance, ensuring the accuracy of the spoken evaluation model when evaluating users with different vocal tract shapes.
Step 3096: train the spoken evaluation model on the oral training sample set. The spoken evaluation model associates the spoken-attribute feature information of each oral training sample with the corresponding scoring result.
The spoken evaluation model uses a TDNN (time-delay neural network) with subsampling. The TDNN is a special convolutional network that can efficiently model long temporal dependencies; it has the advantages of low computational cost and good recognition performance, making it suitable for real-time speech recognition scenarios. Extensive experimental results show that a TDNN using a symmetric context outperforms an asymmetric TDNN. A specific TDNN network structure is shown in Fig. 5, where layers 1 to 6 are hidden layers and layer 7 is the output layer.
Referring to Fig. 6, which shows a partial structure 500 of the TDNN network in Fig. 5. After speech frames are extracted from the input layer, a splice layer splices them into multiple input channels; the splice layer then passes the spliced speech frames to a fully connected layer, which fully connects them; the fully connected output is then passed to a pooling layer, which reduces its output size.
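One concrete property of a subsampled TDNN is how the sparse splice offsets of stacked layers compound into a wide temporal receptive field. A small sketch; the layer offsets below are illustrative examples of the dense and subsampled contexts discussed above, not the offsets of Fig. 5:

```python
def tdnn_receptive_field(layer_offsets):
    """Compute the range of input-frame offsets one output frame depends
    on, for a stack of TDNN layers each given by its spliced context
    offsets (input layer first).  With subsampling, higher layers use
    sparse offsets such as [-3, 0, 3] instead of dense windows, which is
    where the low computational cost comes from."""
    field = {0}
    for offsets in reversed(layer_offsets):
        field = {f + o for f in field for o in offsets}
    return (min(field), max(field))
```

For example, a dense first layer [-1, 0, 1] under a subsampled layer [-3, 0, 3] already sees nine input frames while computing far fewer activations than two dense layers would.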
Step 320: the spoken-attribute feature information further includes word-marking feature information.
According to the forced-alignment information, judge whether the converted text matches the reference text; if not, add word-marking feature information to the unmatched words in the reference text. Specifically, when judging from the forced-alignment information whether the converted text matches the reference text: if a speech key-frame sequence in the frame sequence corresponding to the reference text is aligned with a blank-frame sequence in the frame sequence corresponding to the converted text, the reference-text content corresponding to that key-frame segment is judged to be a skipped part, and the remaining unmatched content in the reference text is judged to be a misread part.
From the word-marking feature information, the words the user skipped or misread can be determined, so that the spoken evaluation model can score according to the pronunciation, intonation, and fluency of the voice data to be evaluated as well as the words the user skipped or misread. Compared with a spoken evaluation model that scores only on the pronunciation, intonation, and fluency of the voice data to be evaluated, the scoring result is more accurate.
In addition, the word-marking feature information can apply different marks to words according to their error modes (misread versus skipped), so that skipped and misread words can later be displayed in different ways. The different ways include, but are not limited to, adding different font colors, font sizes, or/and font types to skipped and misread words respectively, to distinguish which words were skipped and which were misread.
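The per-error-mode marking described here can be sketched as follows; the mode names and the display styles in the comments are placeholders, not values fixed by the patent:

```python
def mark_words(reference, skipped, misread):
    """Attach a display mark to each reference word according to its
    error mode, so the front end can render each mode differently."""
    marks = []
    for w in reference:
        if w in skipped:
            marks.append((w, "skipped"))   # e.g. grey strikethrough
        elif w in misread:
            marks.append((w, "misread"))   # e.g. red bold
        else:
            marks.append((w, "ok"))
    return marks
```

The `skipped` and `misread` sets are exactly what the forced-alignment judgment of step 320 produces.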
Step 322: according to the word-marking feature information, display the unmatched words, and display the scoring result.
Displaying the unmatched words together with the scoring result lets the user know not only the final score but also which words they skipped or misread. Through vocal tract length normalization, spoken evaluation can be performed for different groups of people and accurate spoken evaluation results can be obtained, in particular solving the problem of evaluating children's spoken language.
By performing vocal tract length normalization and model training on the oral training samples, the spoken evaluation method can evaluate the speech of different groups of people and obtain accurate spoken evaluation results, in particular solving the problem of evaluating children's spoken language. Meanwhile, the end-of-speech detection used in the method improves the fluency of the whole spoken evaluation and lowers the barrier to use. In addition, the Kleene closure solves the problem of repeated reading, and the deep-clustering speech separation method effectively prevents superposition or alternation of the voice data to be detected.
One embodiment of this specification further provides a spoken evaluation apparatus. Referring to Fig. 7, which shows a structural schematic diagram of the spoken evaluation apparatus 700 according to one embodiment of the application, comprising:
a voice data acquisition module 702, configured to acquire voice data to be evaluated;
an attribute feature extraction module 704, configured to extract the spoken-attribute feature information of the voice data to be evaluated;
a scoring result generation module 706, configured to obtain the scoring result from the spoken-attribute feature information of the voice data to be evaluated, based on a pre-generated spoken evaluation model.
After the voice data to be evaluated is acquired, the attribute feature extraction module extracts its spoken-attribute feature information, so that the scoring result generation module can use the spoken evaluation model to obtain the scoring result from that information. The steps are simple and greatly shorten the time the user waits for a result. Moreover, the spoken evaluation model is pre-trained, so the voice data to be evaluated can be scored without stocking a large amount of example audio, which widens the scope of application of the apparatus.
One embodiment of this specification further provides a spoken evaluation apparatus, comprising:
a voice data acquisition module, configured to acquire voice data to be evaluated;
an attribute feature extraction module, configured to extract the spoken-attribute feature information of the voice data to be evaluated;
a scoring result generation module, configured to obtain the scoring result from the spoken-attribute feature information of the voice data to be evaluated, based on a pre-generated spoken evaluation model.
In an alternative embodiment, the spoken evaluation apparatus further includes:
an end-of-speech detection module, configured to perform end-of-speech detection on the voice data to be evaluated;
a vocal tract length normalization module, configured to perform vocal tract length normalization on the voice data to be evaluated after end-of-speech detection;
a speech recognition module, configured to perform speech recognition on the voice data to be evaluated after vocal tract length normalization to obtain the converted text of the voice data to be evaluated;
a forced alignment module, configured to perform forced alignment between the converted text and the reference text and to generate forced-alignment information, the forced-alignment information including the correspondence between the speech frame sequence of the converted text and the speech frame sequence of the reference text.
By performing vocal tract length normalization and model training on the oral training samples, the spoken evaluation apparatus can evaluate the speech of different groups of people and obtain accurate spoken evaluation results, in particular solving the problem of evaluating children's spoken language. Meanwhile, the end-of-speech detection used in the apparatus improves the fluency of the whole spoken evaluation and lowers the barrier to use. In addition, the Kleene closure solves the problem of repeated reading, and the deep-clustering speech separation method effectively prevents superposition or alternation of the voice data to be detected.
In an alternative embodiment, the end-of-speech detection module includes:
a recognition unit, configured to recognize the voice data to be evaluated and to convert the recognized voice data to be evaluated into an end-detection speech text containing time information;
a matching unit, configured to match the reference text against the end-detection speech text to obtain a matched-text time point;
a determination unit, configured to determine the end time point of the voice data to be evaluated according to a preset voice activity detection method and the matched-text time point.
During end-of-speech detection, determining the end time point of the voice data to be evaluated from the matched-text time point makes it possible to judge accurately whether the user has finished reading along, so that the end of speech can be detected automatically without manual intervention. This improves the fluency of the whole spoken evaluation and lowers the barrier to using the apparatus, which is especially suitable for scenes where a child uses it alone without a parent's company.
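A minimal sketch of how the determination unit might combine the matched-text time point with a VAD cue; the trailing-silence threshold is an assumed value, not one the patent specifies:

```python
def end_point(matched_text_time, vad_silence_start, min_trailing_silence=0.5):
    """Declare the utterance finished only when the last reference word
    has been matched (matched_text_time) AND the VAD has seen the start
    of a trailing silence after it; otherwise return None (keep
    listening)."""
    if vad_silence_start is None or vad_silence_start < matched_text_time:
        return None  # still speaking, or silence began before the text ended
    return vad_silence_start + min_trailing_silence
```

Requiring both cues is what prevents the false endings that a purely energy-based VAD would trigger in noisy scenes.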
In an alternative embodiment, the spoken-attribute feature information includes:
pronunciation attribute feature information, intonation attribute feature information, fluency attribute feature information, and word-marking feature information.
In an alternative embodiment, the spoken evaluation apparatus further includes:
a marking module, which judges from the forced-alignment information whether the converted text matches the reference text and, if not, adds word-marking feature information to the unmatched words in the reference text.
From the word-marking feature information, the marking module can determine the words the user skipped or misread, so that the spoken evaluation model can score according to the pronunciation, intonation, and fluency of the voice data to be evaluated as well as the words the user skipped or misread. Compared with a spoken evaluation model that scores only on the pronunciation, intonation, and fluency of the voice data to be evaluated, the scoring result is more accurate.
In addition, from the word-marking feature information, the marking module can apply different marks to the words corresponding to different error modes, so that misread and skipped words can later be displayed in different ways; the different ways include, but are not limited to, adding different font colors, font sizes, or/and font types to skipped and misread words respectively.
In an alternative embodiment, the spoken evaluation apparatus further includes:
a display module, configured to display the unmatched words according to the word-marking feature information, and to display the scoring result.
The display module shows the unmatched words and the scoring result, so that the user knows not only the score but also which words they skipped or misread. Through vocal tract length normalization, spoken evaluation can be performed for different groups of people and accurate spoken evaluation results can be obtained, in particular solving the problem of evaluating children's spoken language.
In an alternative embodiment, the voice data acquisition module includes:
a first acquisition unit, configured to acquire slice-compressed voice data;
a decompression unit, configured to decompress the slice-compressed voice data to be evaluated.
The first acquisition unit acquires the slice-compressed voice data, that is, the slice-compressed voice data received by the spoken evaluation apparatus.
The decompression unit decompresses the slice-compressed voice data to be evaluated, which facilitates the subsequent processing of the voice data to be evaluated.
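A minimal sketch of the acquisition-and-decompression path; zlib stands in for whatever slice codec the client actually uses, since the patent does not name one:

```python
import zlib

def reassemble(fragments):
    """Decompress slice-compressed audio fragments and splice them back
    into one contiguous byte stream for the downstream pipeline."""
    return b"".join(zlib.decompress(frag) for frag in fragments)
```

Slicing the upload this way lets the client stream audio while the user is still speaking, which is consistent with the real-time end-of-speech detection described earlier.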
In an alternative embodiment, the forced alignment module uses a decoding graph with a Kleene closure to filter out, according to the reference text, the repeated voice data in the voice data to be evaluated.
With the decoding graph with a Kleene closure, the forced alignment module can filter out the repeated voice data in the voice data to be evaluated, thereby solving the problem of repeated reading.
In an alternative embodiment, the voice data acquisition module includes:
a second acquisition unit, configured to acquire initial voice data;
an extraction unit, configured to extract, using a deep-clustering speech separation method, the target voice data from the initial voice data as the voice data to be evaluated.
The second acquisition unit acquires the initial voice data, and the extraction unit extracts the target voice data from it as the voice data to be evaluated, so that a true and reliable score can be given for the voice data to be evaluated.
One embodiment of this specification further provides an apparatus for generating a spoken evaluation model. Referring to Fig. 8, which shows a structural schematic diagram of the apparatus for generating a spoken evaluation model according to one embodiment of the application, comprising:
a training sample generation module 802, configured to generate an oral training sample set from the oral training sample groups, each oral training sample group containing oral training samples, and each oral training sample including spoken-attribute feature information and the scoring result corresponding to that spoken-attribute feature information;
a model generation module 804, configured to train the spoken evaluation model on the oral training sample set to obtain the spoken evaluation model, the spoken evaluation model associating the spoken-attribute feature information of each oral training sample with the corresponding scoring result.
In the training stage of the spoken evaluation model, the spoken-attribute feature information of each oral training sample is associated with the corresponding scoring result, enabling the model to score the voice data to be evaluated accurately from the spoken-attribute feature information. The spoken evaluation model can be realized with various neural network models; in this embodiment, a TDNN network model may be used.
In an alternative embodiment, the apparatus for generating a spoken evaluation model further includes: a vocal tract length normalization module, configured to perform vocal tract length normalization on the voice data of the oral training samples, unifying the vocal tract lengths of the oral training samples in all the oral training sample groups.
In an alternative embodiment, the spoken-attribute feature information includes:
pronunciation attribute feature information;
intonation attribute feature information;
and fluency attribute feature information.
Extending the content of the spoken-attribute feature information diversifies the dimensions the spoken evaluation model assesses, making its scoring results more true and reliable.
One embodiment of this specification further provides a computing device, including a memory, a processor, and computer instructions stored in the memory and runnable on the processor, the processor implementing the steps of the spoken evaluation method or of the method for generating a spoken evaluation model when executing the instructions.
One embodiment of this specification further provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the steps of the spoken evaluation method or of the method for generating a spoken evaluation model.
The above is an exemplary scheme of the computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the spoken evaluation method or of the method for generating a spoken evaluation model described above; for details not elaborated in the technical solution of the storage medium, reference may be made to the description of the technical solution of the spoken evaluation method above.
The computer instructions include computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disc, computer memory, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
It should be noted that, for simplicity of description, the foregoing method embodiments are expressed as a series of action combinations, but those skilled in the art should understand that the application is not limited by the described order of actions, because according to the application some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the application.
In the above embodiments, each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
The preferred embodiments of the application disclosed above are intended only to help illustrate the application. The alternative embodiments do not describe every detail, nor do they limit the invention to the specific embodiments described. Obviously, many modifications and variations can be made according to the content of this specification. These embodiments were chosen and specifically described in order to better explain the principles and practical applications of the application, so that those skilled in the art can better understand and use the application. The application is limited only by the claims and their full scope and equivalents.
Claims (21)
1. A spoken evaluation method, applied to the data processing end of a spoken evaluation system, characterized by comprising:
acquiring voice data to be evaluated;
extracting the spoken-attribute feature information of the voice data to be evaluated;
obtaining a scoring result from the spoken-attribute feature information of the voice data to be evaluated, based on a pre-generated spoken evaluation model.
2. The spoken evaluation method according to claim 1, characterized in that, after acquiring the voice data to be evaluated and before extracting the spoken-attribute feature information of the voice data to be evaluated, the method comprises:
performing end-of-speech detection on the voice data to be evaluated;
performing vocal tract length normalization on the voice data to be evaluated after end-of-speech detection;
performing speech recognition on the voice data to be evaluated after vocal tract length normalization to obtain the speech frame sequence of the voice data to be evaluated and the converted text corresponding to the speech frame sequence;
performing forced alignment between the converted text and a reference text, and generating forced-alignment information, the forced-alignment information including the correspondence between the speech frame sequence of the converted text and the speech frame sequence of the reference text.
3. The spoken evaluation method according to claim 2, characterized in that performing end-of-speech detection on the voice data to be evaluated comprises:
recognizing the voice data to be evaluated, and converting the recognized voice data to be evaluated into an end-detection speech text containing time information;
matching the reference text against the end-detection speech text to obtain a matched-text time point;
determining the end time point of the voice data to be evaluated according to a preset voice activity detection method and the matched-text time point.
4. The spoken evaluation method according to claim 3, characterized in that the spoken-attribute feature information includes:
pronunciation attribute feature information;
intonation attribute feature information;
and fluency attribute feature information.
5. The spoken evaluation method according to claim 4, characterized in that the spoken-attribute feature information further includes word-marking feature information;
and that, after generating the forced-alignment information, the method further comprises:
judging, according to the forced-alignment information, whether the converted text matches the reference text, and if not, adding word-marking feature information to the unmatched words in the reference text.
6. The spoken evaluation method according to claim 1, characterized in that acquiring the voice data to be evaluated comprises:
acquiring slice-compressed voice data;
decompressing the slice-compressed voice data to obtain the voice data to be evaluated.
7. The spoken evaluation method according to claim 2, characterized in that, during the forced alignment, a decoding graph with a Kleene closure is used to filter out the repeated voice data in the voice data to be evaluated.
8. The spoken evaluation method according to claim 1, characterized in that acquiring the voice data to be evaluated comprises:
acquiring initial voice data;
extracting, using a deep-clustering speech separation method, the target voice data from the initial voice data as the voice data to be evaluated.
9. A method for generating a spoken evaluation model, characterized by comprising:
generating an oral training sample set from oral training sample groups, each oral training sample group containing oral training samples, and each oral training sample including spoken-attribute feature information and the scoring result corresponding to the spoken-attribute feature information;
training the spoken evaluation model on the oral training sample set, the spoken evaluation model associating the spoken-attribute feature information of each oral training sample with the corresponding scoring result.
10. The method according to claim 9, characterized in that, before training the spoken evaluation model on the oral training sample set, the method comprises:
performing vocal tract length normalization on the voice data of the oral training samples, unifying the vocal tract lengths of the oral training samples in all the oral training sample groups.
11. A spoken evaluation apparatus, characterized by comprising:
a voice data acquisition module, configured to acquire voice data to be evaluated;
an attribute feature extraction module, configured to extract the spoken-attribute feature information of the voice data to be evaluated;
a scoring result generation module, configured to obtain a scoring result from the spoken-attribute feature information of the voice data to be evaluated, based on a pre-generated spoken evaluation model.
12. The apparatus according to claim 11, characterized by further comprising:
an end-of-speech detection module, configured to perform end-of-speech detection on the voice data to be evaluated;
a vocal tract length normalization module, configured to perform vocal tract length normalization on the voice data to be evaluated after end-of-speech detection;
a speech recognition module, configured to perform speech recognition on the voice data to be evaluated after vocal tract length normalization to obtain the converted text of the voice data to be evaluated;
a forced alignment module, configured to perform forced alignment between the converted text and a reference text and to generate forced-alignment information, the forced-alignment information including the correspondence between the speech frame sequence of the converted text and the speech frame sequence of the reference text.
13. The apparatus according to claim 12, characterized in that the end-of-speech detection module comprises:
a recognition unit, configured to recognize the voice data to be evaluated and to convert the recognized voice data to be evaluated into an end-detection speech text containing time information;
a matching unit, configured to match the reference text against the end-detection speech text to obtain a matched-text time point;
a determination unit, configured to determine the end time point of the voice data to be evaluated according to a preset voice activity detection method and the matched-text time point.
14. The apparatus according to claim 12, characterized by further comprising:
a marking module, configured to judge, according to the forced-alignment information, whether the converted text matches the reference text, and if not, to add word-marking feature information to the unmatched words in the reference text.
15. The apparatus according to claim 12, characterized in that the voice data acquisition module comprises:
a first acquisition unit, configured to acquire slice-compressed voice data;
a decompression unit, configured to decompress the slice-compressed voice data to obtain the voice data to be evaluated.
16. The apparatus according to claim 12, characterized in that the forced alignment module uses a decoding graph with a Kleene closure to filter out, according to the reference text, the repeated voice data in the voice data to be evaluated.
17. The device according to claim 11, wherein the voice data acquisition module comprises:
a second acquisition unit, configured to acquire initial voice data;
an extraction unit, configured to extract target voice data from the initial voice data, using a speech separation method based on deep clustering, as the voice data to be evaluated.
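Deep-clustering separation embeds each time-frequency bin with a neural network and then clusters the embeddings so that bins in the same cluster form one speaker's mask. The network is beyond a short sketch; the clustering step alone, on toy pre-computed embeddings, looks like this (all names and data are illustrative):

```python
def dist(a, b):
    # Squared Euclidean distance between two embedding vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(group):
    # Component-wise centroid of a group of embedding vectors.
    return [sum(c) / len(group) for c in zip(*group)]

def kmeans2(points, iters=10):
    # Two-cluster k-means over time-frequency embeddings: bins assigned
    # to the same cluster belong to the same speaker's mask.
    c0, c1 = points[0], points[-1]
    for _ in range(iters):
        g0 = [p for p in points if dist(p, c0) <= dist(p, c1)]
        g1 = [p for p in points if dist(p, c0) > dist(p, c1)]
        c0, c1 = mean(g0), mean(g1)
    return [0 if dist(p, c0) <= dist(p, c1) else 1 for p in points]

# Toy embeddings: two well-separated groups of T-F bins (two "speakers").
emb = [(0.1, 0.0), (0.2, 0.1), (0.0, 0.2), (0.9, 1.0), (1.0, 0.9), (0.8, 1.1)]
print(kmeans2(emb))  # [0, 0, 0, 1, 1, 1]
```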
18. A device for generating a spoken language assessment model, comprising:
a training sample generation module, configured to generate a spoken training sample set from spoken training sample groups, each spoken training sample group comprising spoken training samples, and each spoken training sample comprising spoken attribute characteristic information and a scoring result corresponding to that spoken attribute characteristic information;
a model generation module, configured to train a spoken language assessment model on the spoken training sample set to obtain the spoken language assessment model, which associates the spoken attribute characteristic information of each spoken training sample with its corresponding scoring result.
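The claim only requires that training associate features with scores; the model family is unspecified. A minimal stand-in, fitting a least-squares line from a single spoken-attribute feature (e.g. a fluency measure, assumed here) to its marked score:

```python
def train_assessment_model(samples):
    # samples: list of (feature, score) pairs. Fit y = w*x + b by
    # ordinary least squares and return the trained scoring function.
    xs = [f for f, _ in samples]
    ys = [s for _, s in samples]
    n = len(samples)
    mx, my = sum(xs) / n, sum(ys) / n
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - w * mx
    return lambda x: w * x + b

model = train_assessment_model([(0.2, 40.0), (0.5, 55.0), (0.8, 70.0)])
print(round(model(0.6), 1))  # 60.0
```

A deployed assessment model would take a vector of pronunciation, fluency, and completeness features and would typically be a neural or tree-based regressor rather than a single-feature line.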
19. The device according to claim 18, further comprising:
a vocal tract length normalization module, configured to perform vocal tract length normalization on the voice data of the spoken training samples, so that the vocal tract lengths of the spoken training samples in all spoken training sample groups are unified.
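Vocal tract length normalization is commonly realized as a piecewise-linear frequency warp; the patent does not disclose its variant, so the cutoff and parameter names below are assumptions:

```python
def vtln_warp(freq, alpha, f_max=8000.0, f_cut=6800.0):
    # Piecewise-linear VTLN warp: frequencies below the cutoff are
    # scaled by the speaker-specific factor alpha; above it a linear
    # segment maps the full band [0, f_max] back onto itself.
    if freq <= f_cut:
        return alpha * freq
    # segment joining (f_cut, alpha*f_cut) to (f_max, f_max)
    slope = (f_max - alpha * f_cut) / (f_max - f_cut)
    return alpha * f_cut + slope * (freq - f_cut)

print(vtln_warp(1000.0, alpha=1.1))  # a shorter vocal tract is stretched down
print(vtln_warp(8000.0, alpha=1.1))  # the band edge maps to itself
```

Warping every speaker's spectrum toward a canonical vocal tract length is what makes the training sample groups comparable, as the claim requires.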
20. A computing device, comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the processor, when executing the instructions, implements the steps of the method of any one of claims 1-10.
21. A computer-readable storage medium storing computer instructions, wherein the instructions, when executed by a processor, implement the steps of the method of any one of claims 1-10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811422784.1A CN109272992B (en) | 2018-11-27 | 2018-11-27 | Spoken language evaluation method and device for generating spoken language evaluation model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109272992A true CN109272992A (en) | 2019-01-25 |
CN109272992B CN109272992B (en) | 2022-03-18 |
Family
ID=65190827
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811422784.1A Active CN109272992B (en) | 2018-11-27 | 2018-11-27 | Spoken language evaluation method and device for generating spoken language evaluation model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109272992B (en) |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109859741A (en) * | 2019-01-31 | 2019-06-07 | 成都终身成长科技有限公司 | Voice assessment method, device, electronic equipment and storage medium |
CN109903594A (en) * | 2019-04-02 | 2019-06-18 | 北京儒博科技有限公司 | Spoken language exercise householder method, device, equipment and storage medium |
CN109979256A (en) * | 2019-04-24 | 2019-07-05 | 上海乂学教育科技有限公司 | English pronunciation teaching system |
CN110136721A (en) * | 2019-04-09 | 2019-08-16 | 北京大米科技有限公司 | A kind of scoring generation method, device, storage medium and electronic equipment |
CN110491369A (en) * | 2019-07-24 | 2019-11-22 | 北京大米科技有限公司 | Appraisal procedure, device, storage medium and the electronic equipment of spoken grade |
CN111402924A (en) * | 2020-02-28 | 2020-07-10 | 联想(北京)有限公司 | Spoken language evaluation method and device and computer readable storage medium |
CN111477251A (en) * | 2020-05-21 | 2020-07-31 | 北京百度网讯科技有限公司 | Model evaluation method and device and electronic equipment |
CN111554143A (en) * | 2020-03-31 | 2020-08-18 | 北京学格科技有限公司 | Evaluation method and device based on CO-MIRT algorithm model |
CN111639220A (en) * | 2020-05-14 | 2020-09-08 | 广东小天才科技有限公司 | Spoken language evaluation method and device, electronic equipment and storage medium |
CN111639635A (en) * | 2020-05-26 | 2020-09-08 | 广东小天才科技有限公司 | Processing method and device for shot picture, electronic equipment and storage medium |
CN111831820A (en) * | 2020-03-11 | 2020-10-27 | 昆明理工大学 | News and case correlation analysis method based on case element guidance and deep clustering |
CN111986680A (en) * | 2020-08-26 | 2020-11-24 | 天津洪恩完美未来教育科技有限公司 | Method and device for evaluating spoken language of object, storage medium and electronic device |
CN112309371A (en) * | 2019-07-30 | 2021-02-02 | 上海流利说信息技术有限公司 | Intonation detection method, apparatus, device and computer readable storage medium |
CN112614509A (en) * | 2020-12-11 | 2021-04-06 | 北京读我网络技术有限公司 | Method and device for generating voice evaluation installation package and executing voice evaluation |
CN113380257A (en) * | 2021-06-08 | 2021-09-10 | 深圳市同行者科技有限公司 | Multi-terminal smart home response method, device, equipment and storage medium |
CN115346421A (en) * | 2021-05-12 | 2022-11-15 | 北京猿力未来科技有限公司 | Spoken language fluency scoring method, computing device and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101145283A (en) * | 2006-09-12 | 2008-03-19 | 董明 | Embedded type language teaching machine with pronunciation quality evaluation |
CN101739868A (en) * | 2008-11-19 | 2010-06-16 | 中国科学院自动化研究所 | Automatic evaluation and diagnosis method of text reading level for oral test |
US20120065977A1 (en) * | 2010-09-09 | 2012-03-15 | Rosetta Stone, Ltd. | System and Method for Teaching Non-Lexical Speech Effects |
CN106782502A (en) * | 2016-12-29 | 2017-05-31 | 昆山库尔卡人工智能科技有限公司 | A kind of speech recognition equipment of children robot |
CN108766415A (en) * | 2018-05-22 | 2018-11-06 | 清华大学 | A kind of voice assessment method |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109859741A (en) * | 2019-01-31 | 2019-06-07 | 成都终身成长科技有限公司 | Voice assessment method, device, electronic equipment and storage medium |
CN109903594A (en) * | 2019-04-02 | 2019-06-18 | 北京儒博科技有限公司 | Spoken language exercise householder method, device, equipment and storage medium |
CN110136721A (en) * | 2019-04-09 | 2019-08-16 | 北京大米科技有限公司 | A kind of scoring generation method, device, storage medium and electronic equipment |
CN109979256A (en) * | 2019-04-24 | 2019-07-05 | 上海乂学教育科技有限公司 | English pronunciation teaching system |
CN110491369A (en) * | 2019-07-24 | 2019-11-22 | 北京大米科技有限公司 | Appraisal procedure, device, storage medium and the electronic equipment of spoken grade |
CN112309371A (en) * | 2019-07-30 | 2021-02-02 | 上海流利说信息技术有限公司 | Intonation detection method, apparatus, device and computer readable storage medium |
CN111402924A (en) * | 2020-02-28 | 2020-07-10 | 联想(北京)有限公司 | Spoken language evaluation method and device and computer readable storage medium |
CN111402924B (en) * | 2020-02-28 | 2024-04-19 | 联想(北京)有限公司 | Spoken language evaluation method, device and computer readable storage medium |
CN111831820A (en) * | 2020-03-11 | 2020-10-27 | 昆明理工大学 | News and case correlation analysis method based on case element guidance and deep clustering |
CN111831820B (en) * | 2020-03-11 | 2022-07-19 | 昆明理工大学 | News and case correlation analysis method based on case element guidance and deep clustering |
CN111554143A (en) * | 2020-03-31 | 2020-08-18 | 北京学格科技有限公司 | Evaluation method and device based on CO-MIRT algorithm model |
CN111639220A (en) * | 2020-05-14 | 2020-09-08 | 广东小天才科技有限公司 | Spoken language evaluation method and device, electronic equipment and storage medium |
CN111477251B (en) * | 2020-05-21 | 2023-09-05 | 北京百度网讯科技有限公司 | Model evaluation method and device and electronic equipment |
CN111477251A (en) * | 2020-05-21 | 2020-07-31 | 北京百度网讯科技有限公司 | Model evaluation method and device and electronic equipment |
CN111639635A (en) * | 2020-05-26 | 2020-09-08 | 广东小天才科技有限公司 | Processing method and device for shot picture, electronic equipment and storage medium |
CN111639635B (en) * | 2020-05-26 | 2024-02-27 | 广东小天才科技有限公司 | Processing method and device for shooting pictures, electronic equipment and storage medium |
CN111986680A (en) * | 2020-08-26 | 2020-11-24 | 天津洪恩完美未来教育科技有限公司 | Method and device for evaluating spoken language of object, storage medium and electronic device |
CN112614509A (en) * | 2020-12-11 | 2021-04-06 | 北京读我网络技术有限公司 | Method and device for generating voice evaluation installation package and executing voice evaluation |
CN112614509B (en) * | 2020-12-11 | 2022-11-01 | 北京读我网络技术有限公司 | Method and device for generating voice evaluation installation package and executing voice evaluation |
CN115346421A (en) * | 2021-05-12 | 2022-11-15 | 北京猿力未来科技有限公司 | Spoken language fluency scoring method, computing device and storage medium |
CN113380257A (en) * | 2021-06-08 | 2021-09-10 | 深圳市同行者科技有限公司 | Multi-terminal smart home response method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109272992B (en) | 2022-03-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109272992A (en) | A kind of spoken language assessment method, device and a kind of device for generating spoken appraisal model | |
CN102800314B (en) | English sentence recognizing and evaluating system with feedback guidance and method | |
CN106847260B (en) | Automatic English spoken language scoring method based on feature fusion | |
US20230267916A1 (en) | Text-based virtual object animation generation method, apparatus, storage medium, and terminal | |
Kelly et al. | Deep neural network based forensic automatic speaker recognition in VOCALISE using x-vectors | |
CN110246488B (en) | Voice conversion method and device of semi-optimized cycleGAN model | |
CN109326283A (en) | Multi-to-multi phonetics transfer method under non-parallel text condition based on text decoder | |
CN109243494B (en) | Children emotion recognition method based on multi-attention mechanism long-time memory network | |
CN104050965A (en) | English phonetic pronunciation quality evaluation system with emotion recognition function and method thereof | |
CN109508402A (en) | Violation term detection method and device | |
CN109979486B (en) | Voice quality assessment method and device | |
CN109377981B (en) | Phoneme alignment method and device | |
CN114023300A (en) | Chinese speech synthesis method based on diffusion probability model | |
CN102945673A (en) | Continuous speech recognition method with speech command range changed dynamically | |
CN107767881A (en) | A kind of acquisition methods and device of the satisfaction of voice messaging | |
CN115394287A (en) | Mixed language voice recognition method, device, system and storage medium | |
CN114613387A (en) | Voice separation method and device, electronic equipment and storage medium | |
CN111785236A (en) | Automatic composition method based on motivational extraction model and neural network | |
Han et al. | [Retracted] The Modular Design of an English Pronunciation Level Evaluation System Based on Machine Learning | |
CN112820266A (en) | Parallel end-to-end speech synthesis method based on skip coder | |
CN111341346A (en) | Language expression capability evaluation method and system for fusion depth language generation model | |
CN117095669A (en) | Emotion voice synthesis method, system, equipment and medium based on variation automatic coding | |
CN111259188A (en) | Lyric alignment method and system based on seq2seq network | |
CN117012230A (en) | Evaluation model for singing pronunciation and character biting | |
Bao et al. | [Retracted] An Auxiliary Teaching System for Spoken English Based on Speech Recognition Technology |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Address after: Units F01-03 and 05-10 on the 6th floor of No.1 Building, No.8 Courtyard, Guangshun South Street, Chaoyang District, Beijing. Applicant after: Beijing Ape Power Future Technology Co., Ltd. Address before: Units F01-03 and 05-10 on the 6th floor of No.1 Building, No.8 Courtyard, Guangshun South Street, Chaoyang District, Beijing. Applicant before: Beijing Chalk Future Technology Co., Ltd. ||
GR01 | Patent grant | ||