
CN105719662B - Dysarthria detection method and system - Google Patents

Dysarthria detection method and system

Info

Publication number
CN105719662B
CN105719662B (granted publication of application CN201610264854.XA; earlier publication CN105719662A)
Authority
CN
China
Prior art keywords
pronunciation
motion track
track information
voice
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610264854.XA
Other languages
Chinese (zh)
Other versions
CN105719662A (en)
Inventor
Li Ming (李明)
Zhao Zhijie (赵志洁)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
SYSU CMU Shunde International Joint Research Institute
Original Assignee
Sun Yat Sen University
SYSU CMU Shunde International Joint Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University and SYSU CMU Shunde International Joint Research Institute
Priority to CN201610264854.XA
Publication of CN105719662A
Application granted
Publication of CN105719662B
Current legal status: Active

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, specially adapted for particular use, for comparison or discrimination
    • G10L25/66: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, specially adapted for particular use, for comparison or discrimination, for extracting parameters related to health condition

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The present invention relates to a dysarthria detection method and system. The detection method includes the following steps: reading the data generated by an electromagnetic articulograph to obtain the audio data and the synchronized motion-coordinate data; extracting, from the motion trajectory information according to the audio data, the sub-trajectory information corresponding to each word pronunciation; performing a feature operation between the sub-trajectory information and the reference trajectory information corresponding to each word pronunciation in a reference speech library, to obtain a likelihood probability value, where the reference speech library is a speech database containing the normal pronunciation of each word; and performing dysarthria detection on the user according to the likelihood probability value. Because the method and system exploit both the pronunciation of each word in the data and its corresponding sub-trajectory information, the accuracy of the detection result is improved.

Description

Dysarthria detection method and system
Technical field
The present invention relates to the field of speech-processing technology, and in particular to a dysarthria detection method and system.
Background art
At present, research on dysarthria detection techniques is at an early stage of development. In hospital rehabilitation departments, patients with dysarthria are currently diagnosed mainly through the clinician's diagnostic experience and subjective auditory perception, which is time-consuming, laborious, and insufficiently objective and stable. Moreover, diagnosis using radiographic imaging or nuclear-magnetic-resonance medical devices can adversely affect the patient's body and requires expensive medical equipment. Existing assessment methods for dysarthria mainly include graphical methods, phonetic-symbol methods, standardized test batteries, and instrumental examination techniques.
The above dysarthria detection schemes mainly involve speech-intelligibility assessment, diadochokinetic-rate assessment, nasal-airflow detection, and the like, which easily compromise the accuracy of the detection result.
Summary of the invention
On this basis, it is necessary to provide a dysarthria detection method and system addressing the technical problem that conventional schemes easily compromise detection accuracy.
A dysarthria detection method includes the following steps:
reading the voice data generated by an electromagnetic articulograph, and obtaining from the voice data the audio data and the corresponding motion trajectory information; wherein the sensors of the electromagnetic articulograph are mounted at the articulation positions of the user, and the voice data are the data acquired by the articulograph at the sensed articulation positions while the user pronounces the set words;
extracting, from the motion trajectory information according to the audio data, the sub-trajectory information corresponding to each word pronunciation;
performing a feature operation between the sub-trajectory information and the reference trajectory information corresponding to each word pronunciation in a reference speech library, to obtain a likelihood probability value; wherein the reference speech library is a speech database containing the normal pronunciation of each word; and
performing dysarthria detection on the user according to the likelihood probability value.
A dysarthria detection system comprises:
a reading module for reading the voice data generated by an electromagnetic articulograph and obtaining from the voice data the audio data and the corresponding motion trajectory information; wherein the sensors of the articulograph are mounted at the articulation positions of the user, and the voice data are the data acquired at the sensed articulation positions while the user pronounces the set words;
an extraction module for extracting, from the motion trajectory information according to the audio data, the sub-trajectory information corresponding to each word pronunciation;
an obtaining module for performing a feature operation between the sub-trajectory information and the reference trajectory information corresponding to each word pronunciation in the reference speech library, to obtain a likelihood probability value; wherein the reference speech library is a speech database containing the normal pronunciation of each word; and
a detection module for performing dysarthria detection on the user according to the likelihood probability value.
In the above dysarthria detection method and system, the voice data generated by the electromagnetic articulograph are read, the sub-trajectory information corresponding to each word pronunciation is extracted from the motion trajectory information, and a feature operation is performed between each sub-trajectory and the corresponding reference trajectory information in the reference speech library to obtain a likelihood probability value, thereby realizing dysarthria detection for the user. Because the scheme exploits both the pronunciation of each word in the data and its corresponding sub-trajectory information, the accuracy of the detection result is improved.
Brief description of the drawings
Fig. 1 is a flowchart of the dysarthria detection method of one embodiment;
Fig. 2 is a sensor-placement schematic of one embodiment;
Fig. 3 is a probability-distribution schematic of one embodiment;
Fig. 4 is a structural diagram of the dysarthria detection system of one embodiment.
Specific embodiments
Specific embodiments of the dysarthria detection method and system of the present invention are described in detail below with reference to the accompanying drawings.
Referring to Fig. 1, which shows the flowchart of the dysarthria detection method of one embodiment, the method includes the following steps:
S10: read the voice data generated by the electromagnetic articulograph, and obtain from the voice data the audio data and the corresponding motion trajectory information; the sensors of the electromagnetic articulograph are mounted at the articulation positions of the user, and the voice data are the data acquired by the articulograph at the sensed articulation positions while the user pronounces the set words.
The speech-research system installed with the electromagnetic articulograph is a non-line-of-sight motion-capture system. Through this system, the articulograph acquires two synchronized files: the audio data, in WAV format, and the synchronized motion trajectory information, in TSV format. The articulograph sensors may be mounted at the articulation positions of the user, including the positions of the user's vocal organs, to acquire data while the user pronounces the set words. The set words may be one or more of the words whose normal pronunciations are contained in the reference speech library.
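As an illustration of the trajectory half of the two synchronized files, the following is a minimal sketch of parsing a TSV export in Python. The column layout and names (`time`, `tongue_tip_x`, and so on) are assumptions for illustration only; the actual export format of the articulograph is not specified here.

```python
import csv
import io

# Hypothetical EMA export: a time column followed by per-sensor coordinate
# columns. The column names are invented for this sketch.
SAMPLE_TSV = "time\ttongue_tip_x\ttongue_tip_y\n0.000\t1.2\t3.4\n0.005\t1.3\t3.5\n"

def read_trajectory(tsv_text):
    """Parse a TSV trajectory export into a list of (time, coords) rows."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        t = float(row.pop("time"))
        coords = {name: float(value) for name, value in row.items()}
        rows.append((t, coords))
    return rows

trajectory = read_trajectory(SAMPLE_TSV)
print(len(trajectory))                   # 2
print(trajectory[0][1]["tongue_tip_x"])  # 1.2
```

In practice the same timestamps also index the WAV audio, which is what makes the word-level slicing in step S20 possible.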
In one embodiment, the reference sensor of the articulograph may be attached at the glabella (between the eyebrows) of the user, and the six micro-sensors of the articulograph are attached, in order, to the back of the tongue surface, the front of the tongue surface, the tongue tip, the lower gums, the upper lip, and the lower lip.
The attachment order of the articulograph sensors may be as follows: the reference sensor is first attached at the glabella, and the oral sensors are attached next. Specifically, the six micro-sensors are attached in order to the back of the tongue surface, the front of the tongue surface, the tongue tip, the lower gums, the upper lip, and the lower lip; the attachment positions of the six micro-sensors can be as shown in Fig. 2. The adhesive used for the intraoral micro-sensors is a medical-grade quick-drying oral adhesive. Before attachment, the oral cavity must first be cleaned and the tongue surface dried with gauze so that the micro-sensors can be glued to the corresponding intraoral positions; note that when attaching the micro-sensors on the tongue surface, the spacing between the three micro-sensors should be about 10 mm (millimeters). Because the bonding force of the quick-drying glue is relatively weak, the intraoral micro-sensors also need to be secured afterwards with oral mixed glue. Since the micro-sensor leads are very thin, easily torn, and of some length, the leads also need to be secured after the micro-sensors are attached. Before data acquisition, to let the subject adapt to speaking with micro-sensors attached in the mouth, the subject may first practice speaking so as to get used to holding the micro-sensors in the mouth; the formal acquisition of voice data begins only after the subject feels adapted.
S20: extract, from the motion trajectory information according to the audio data, the sub-trajectory information corresponding to each word pronunciation.
In this step, the audio data and the coordinate data of the motion trajectory information are synchronized signals; the speech signal can therefore be segment-aligned to obtain the start and end times of each word pronunciation, and the corresponding coordinate data are then segmented using the synchronized times.
In one embodiment, the step of extracting the sub-trajectory information corresponding to each word pronunciation from the motion trajectory information according to the audio data may include:
segmenting the audio data to obtain the start and end times, within the audio data, of each word pronunciation in the voice data; and
synchronizing the audio data with the motion trajectory information to obtain the sub-trajectory information corresponding to each word pronunciation.
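The two steps above can be sketched as follows: once segmentation of the audio yields per-word start and end times, the synchronized trajectory is sliced on those times. The toy trajectory and word boundaries below are invented for illustration.

```python
def slice_trajectory(trajectory, start, end):
    """Return the trajectory samples whose timestamps fall in [start, end)."""
    return [(t, coords) for (t, coords) in trajectory if start <= t < end]

# Toy synchronized trajectory: (time, coordinates) pairs at 50 ms intervals.
trajectory = [(0.00, (1.0,)), (0.05, (1.1,)), (0.10, (1.3,)), (0.15, (1.2,))]
# Per-word start/end times as produced by segmenting the audio (invented).
word_times = {"ba": (0.00, 0.10), "da": (0.10, 0.20)}

sub_trajectories = {word: slice_trajectory(trajectory, s, e)
                    for word, (s, e) in word_times.items()}
print(len(sub_trajectories["ba"]))   # 2
print(sub_trajectories["da"][0][0])  # 0.1
```

Because the coordinate data carry the same clock as the audio, no resampling is needed for this slicing; each word's sub-trajectory is simply the samples inside its time window.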
This embodiment may segment the audio data using Mel-frequency cepstral coefficients (MFCC) as the segmentation features of the speech signal, align the speech signal using the dynamic time warping (DTW) alignment algorithm, and compare the similarity of speech signals using a Gaussian mixture model (GMM), thereby realizing automatic segmentation and alignment of the speech. While completing the segmentation and alignment of the speech signal with speech-recognition technology, the recognition system can also produce a likelihood score for the signal; this score can serve as a measure of the audio data's resolvability and can be used to judge whether the audio data need manual segmentation and alignment.
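As a minimal sketch of the DTW alignment step named above, the following is textbook dynamic time warping over 1-D sequences; the patent's full pipeline would apply it to MFCC frame sequences rather than raw numbers.

```python
import math

def dtw_distance(a, b):
    """Dynamic-time-warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    # D[i][j]: minimal cumulative cost of aligning a[:i] with b[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

# A repeated frame costs nothing under DTW, unlike a point-wise distance.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

This time-elastic matching is what lets a slow or hesitant pronunciation still align frame-by-frame against a reference of different duration.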
As one embodiment, after obtaining the sub-trajectory information corresponding to each word pronunciation, the method may further include:
obtaining the likelihood score corresponding to each word pronunciation in the voice data; and
when the likelihood score is lower than a preset likelihood threshold, obtaining the sub-trajectory information corresponding to each pronunciation with a manual audio-annotation tool.
The likelihood threshold may be set according to the normal-pronunciation features of the words corresponding to the segmented audio data.
A likelihood score below the preset likelihood threshold may indicate that the user voice data acquired by the articulograph are too blurred for speech-recognition technology to segment-align automatically; such audio data need manual segment alignment. The manual audio-annotation tool Praat can then be used to obtain the sub-trajectory information corresponding to each pronunciation.
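The gating rule described above can be sketched as follows; the score values and threshold are made up, and real recognizer likelihoods would come from the automatic alignment step.

```python
def needs_manual_alignment(likelihood_scores, threshold):
    """Return the words whose recognizer likelihood falls below threshold."""
    return [word for word, score in likelihood_scores.items()
            if score < threshold]

# Invented log-likelihood scores from automatic alignment; "da" is too
# blurred to align automatically, so it is routed to manual annotation.
scores = {"ba": -12.0, "da": -55.0, "ga": -20.0}
print(needs_manual_alignment(scores, -40.0))  # ['da']
```

Only the flagged words are sent to the manual tool, so the expensive hand annotation is limited to the utterances the recognizer could not resolve.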
As one embodiment, the basic steps of annotating speech with Praat may include:
Creating a new annotation object. Select the audio data to be annotated in the object list, click "To TextGrid…" under "Annotate-", enter the names of the tiers to be annotated in the new window and confirm, then select the TextGrid object and click the "Edit" button to enter the editing page.
Saving the annotation file. Since the Praat software does not save automatically, the annotation object must be saved regularly to avoid losing the annotated content.
Extracting the required tier data from the annotation object. With the TextGrid object selected, click the "Extract tier…" button, enter the required tier number in the new window and confirm, select the newly created object and click the "Into TextGrid" button, then select the newly generated object and click "Edit" to view the extracted tier data.
Extracting the required interval data from the annotation object. With the TextGrid object selected, click the "Extract part…" button, enter the start and end times of the required interval in the new window, tick "Preserve times" and confirm; selecting the newly created object and clicking "Edit" shows the extracted data.
Querying the data in the TextGrid object. This is done with the sub-options of the Query menu.
Obtaining the data in the annotation files. A shell script extracts all the TextGrid files and saves the extracted data into text files; the resulting data can then be imported into an Excel table for further analysis and processing.
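As a sketch of that script-driven extraction step: rather than parsing Praat's TextGrid format directly, this assumes a hypothetical intermediate export with one tab-separated row per labelled interval (tier, start, end, label), such as a Praat script might write out.

```python
import csv
import io

# Hypothetical export of labelled intervals, one tab-separated row per
# interval: tier name, start time, end time, label. Format is assumed.
EXPORT = "words\t0.00\t0.42\tba\nwords\t0.42\t0.91\tda\n"

def load_intervals(text):
    """Parse the interval export into a list of dictionaries."""
    intervals = []
    for tier, start, end, label in csv.reader(io.StringIO(text),
                                              delimiter="\t"):
        intervals.append({"tier": tier, "start": float(start),
                          "end": float(end), "label": label})
    return intervals

intervals = load_intervals(EXPORT)
print([i["label"] for i in intervals])  # ['ba', 'da']
```

The resulting start/end times play the same role as the automatic alignment output: they index the synchronized trajectory to recover each word's sub-trajectory.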
S30: perform a feature operation between the sub-trajectory information and the reference trajectory information corresponding to each word pronunciation in the reference speech library, to obtain a likelihood probability value; the reference speech library is a speech database containing the normal pronunciation of each word.
This step may use a GUI in MATLAB to display the sub-trajectory information and the corresponding reference trajectory information dynamically, intuitively showing the motion of the oral-organ points during the user's pronunciation, and then perform the corresponding feature operation to obtain the likelihood probability value of the user's word pronunciation; the lower this value, the more serious the user's dysarthria.
The reference speech library pre-stores the normal pronunciations of all words suitable for detecting whether the user's articulation is normal, and can be used to detect whether other pronunciations of the corresponding words are normal.
In one embodiment, the step of performing a feature operation between the sub-trajectory information and the reference trajectory information corresponding to each word pronunciation in the reference speech library to obtain a likelihood probability value may include:
obtaining the coordinate-point sequences from the sub-trajectory information and from its corresponding reference trajectory information, respectively, giving a speech coordinate sequence and a reference coordinate sequence, where the coordinate points of the speech coordinate sequence and of the reference coordinate sequence correspond one-to-one;
normalizing the speech coordinate sequence and the reference coordinate sequence, respectively;
fitting the normalized speech coordinate sequence to a speech coordinate curve, and obtaining the speech fitting coefficients of each order of the speech coordinate curve;
fitting the normalized reference coordinate sequence to a reference coordinate curve, and obtaining the reference fitting coefficients of each order of the reference coordinate curve; and
obtaining the likelihood probability value from the speech fitting coefficients and the reference fitting coefficients.
In this embodiment, the coordinate-point sequences of the sub-trajectory information are obtained using the segment-alignment result of the speech, and the time and amplitude of the trajectory information corresponding to the sub-trajectory are normalized. After normalization, the speech coordinate curve and the reference coordinate curve can each be written as a polynomial function; the fitting coefficients of each order of the speech coordinate curve are the coefficients of the polynomial corresponding to that curve, and the reference fitting coefficients of each order of the reference coordinate curve are likewise the coefficients of its polynomial.
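A minimal sketch of the normalization and curve-fitting steps follows. For brevity it normalizes only the amplitude and fits a first-order (linear) curve in closed form; the actual fit may use higher-order polynomials, and the coefficient values here are purely illustrative.

```python
def normalize(values):
    """Rescale a sequence of amplitudes to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero on a flat sequence
    return [(v - lo) / span for v in values]

def linear_fit(ys):
    """Closed-form least-squares line y = a*x + b over x = 0 .. n-1."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return a, mean_y - a * mean_x

ys = normalize([2.0, 4.0, 6.0, 8.0])  # [0.0, 1/3, 2/3, 1.0]
a, b = linear_fit(ys)
print(a, b)  # slope close to 1/3, intercept close to 0
```

The fitting coefficients (here `a` and `b`) are the compact per-word features that are subsequently scored against the reference coefficients.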
As one embodiment, the step of obtaining the likelihood probability value from the speech fitting coefficients and the reference fitting coefficients may include:
establishing a multivariate Gaussian probability-density model from the reference fitting coefficients; and
substituting the speech fitting coefficients into the multivariate Gaussian probability-density model to obtain the likelihood probability value of the speech fitting coefficients.
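A sketch of that scoring step, using a diagonal-covariance multivariate Gaussian log-density for simplicity (the covariance structure is not specified here, and all coefficient values are invented for illustration):

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    """Log-density of a multivariate Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

# Reference fitting coefficients define the model; a user's coefficients
# are then scored against it. Values are invented for illustration.
ref_mean = [0.3, -0.1]
ref_var = [0.04, 0.01]

typical = diag_gaussian_logpdf([0.31, -0.09], ref_mean, ref_var)
atypical = diag_gaussian_logpdf([0.90, 0.40], ref_mean, ref_var)
print(typical > atypical)  # True: coefficients near the reference score higher
```

Coefficients that deviate from the reference distribution receive a low density, which is exactly the behavior the likelihood probability value relies on: the further the user's articulation trajectory departs from the normal-pronunciation model, the lower the score.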
S40: perform dysarthria detection on the user according to the likelihood probability value.
In one embodiment, the process of performing dysarthria detection on the user according to the likelihood probability value may include:
judging the magnitude relation between the likelihood probability value and a preset articulation threshold; and
if the likelihood probability value is less than the preset articulation threshold, determining that the user has dysarthria.
The preset articulation threshold may be set according to the features of the specific words; if the likelihood probability value is below the articulation threshold, the user is determined to have dysarthria. In one case, a multivariate Gaussian probability-density model is established from the reference coefficients, and the probabilities obtained by substituting the user's speech coefficients into the model are as shown in Fig. 3, where the abscissa is the sensor index and the ordinate is the probability value; if the probability obtained by substituting the user's speech coefficients into the model is less than the articulation threshold, the user is determined to have dysarthria.
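The threshold decision of step S40 can be sketched as a per-word comparison; the likelihood values and threshold below are invented for illustration.

```python
def detect_dysarthria(likelihoods, threshold):
    """Per-word decision: a likelihood below the threshold flags impairment."""
    return {word: value < threshold for word, value in likelihoods.items()}

# Invented per-word likelihood probability values and articulation threshold.
result = detect_dysarthria({"ba": 0.72, "da": 0.05}, threshold=0.2)
print(result)  # {'ba': False, 'da': True}
```

Keeping the decision per-word preserves diagnostic detail: the clinician sees which specific pronunciations fall below the threshold, not only an overall verdict.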
In the dysarthria detection method provided by the present invention, the audio data generated by the electromagnetic articulograph and its synchronized motion trajectory information are read; the sub-trajectory information corresponding to each word pronunciation is extracted from the motion trajectory information; and a feature operation is performed between the sub-trajectory information and the reference trajectory information corresponding to each word pronunciation in the reference speech library, to obtain a likelihood probability value, thereby realizing dysarthria detection for the user. Because the scheme exploits both the pronunciation of each word in the data and its corresponding sub-trajectory information, the accuracy of the detection result is improved.
Referring to Fig. 4, which shows the structure of the dysarthria detection system of one embodiment, the system comprises:
a reading module 10 for reading the voice data generated by the electromagnetic articulograph and obtaining from the voice data the audio data and the corresponding motion trajectory information; the sensors of the articulograph are mounted at the articulation positions of the user, and the voice data are the data acquired at the sensed articulation positions while the user pronounces the set words;
an extraction module 20 for extracting, from the motion trajectory information according to the audio data, the sub-trajectory information corresponding to each word pronunciation;
an obtaining module 30 for performing a feature operation between the sub-trajectory information and the reference trajectory information corresponding to each word pronunciation in the reference speech library, to obtain a likelihood probability value; the reference speech library is a speech database containing the normal pronunciation of each word; and
a detection module 40 for performing dysarthria detection on the user according to the likelihood probability value.
In one embodiment, the extraction module may be further configured to:
segment the audio data to obtain the start and end times, within the audio data, of each word pronunciation in the voice data; and
synchronize the audio data with the motion trajectory information to obtain the sub-trajectory information corresponding to each word pronunciation.
As one embodiment, the extraction module may be further configured to:
obtain the likelihood score corresponding to each word pronunciation in the voice data; and
when the likelihood score is lower than a preset likelihood threshold, obtain the sub-trajectory information corresponding to each pronunciation with a manual audio-annotation tool.
The dysarthria detection system provided by the present invention corresponds one-to-one to the dysarthria detection method provided by the present invention; the technical features described in the embodiments of the dysarthria detection method, and their beneficial effects, apply equally to the embodiments of the dysarthria detection system, and are not repeated here.
The technical features of the embodiments described above may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features of the above embodiments are described; nevertheless, as long as a combination of these technical features involves no contradiction, it should be considered within the scope described in this specification.
The embodiments described above express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be pointed out that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the invention, and these all fall within the protection scope of the invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A dysarthria detection method, characterized by comprising the following steps:
reading the voice data generated by an electromagnetic articulograph, and obtaining from the voice data the audio data and the corresponding motion trajectory information; wherein the sensors of the electromagnetic articulograph are mounted at the articulation positions of a user, the voice data are the data acquired by the articulograph at the sensed articulation positions while the user pronounces set words, and the set words are one or more of the words whose normal pronunciations are contained in a reference speech library;
extracting, from the motion trajectory information according to the audio data, the sub-trajectory information corresponding to each word pronunciation;
performing a feature operation between the sub-trajectory information and the reference trajectory information corresponding to each word pronunciation in the reference speech library, to obtain a likelihood probability value; wherein the reference speech library is a speech database containing the normal pronunciation of each word; and
performing dysarthria detection on the user according to the likelihood probability value.
2. The dysarthria detection method according to claim 1, characterized in that the reference sensor of the electromagnetic articulograph is attached at the glabella of the user, and the six micro-sensors of the articulograph are attached, in order, to the back of the tongue surface, the front of the tongue surface, the tongue tip, the lower gums, the upper lip, and the lower lip.
3. The dysarthria detection method according to claim 1, characterized in that the step of extracting, from the motion trajectory information according to the audio data, the sub-trajectory information corresponding to each word pronunciation comprises:
segmenting the audio data to obtain the start and end times, within the audio data, of each word pronunciation in the voice data; and
synchronizing the audio data with the motion trajectory information to obtain the sub-trajectory information corresponding to each word pronunciation.
4. The dysarthria detection method according to claim 3, characterized by, after obtaining the sub-trajectory information corresponding to each word pronunciation, further comprising:
obtaining the likelihood score corresponding to each word pronunciation in the voice data; and
when the likelihood score is lower than a preset likelihood threshold, obtaining the sub-trajectory information corresponding to each pronunciation with a manual audio-annotation tool.
5. The dysarthria detection method according to claim 1, characterized in that the step of performing a feature operation between the sub-trajectory information and the reference trajectory information corresponding to each word pronunciation in the reference speech library to obtain a likelihood probability value comprises:
obtaining the coordinate-point sequences from the sub-trajectory information and from its corresponding reference trajectory information, respectively, giving a speech coordinate sequence and a reference coordinate sequence; wherein the coordinate points of the speech coordinate sequence and of the reference coordinate sequence correspond one-to-one;
normalizing the speech coordinate sequence and the reference coordinate sequence, respectively;
fitting the normalized speech coordinate sequence to a speech coordinate curve, and obtaining the speech fitting coefficients of each order of the speech coordinate curve;
fitting the normalized reference coordinate sequence to a reference coordinate curve, and obtaining the reference fitting coefficients of each order of the reference coordinate curve; and
obtaining the likelihood probability value from the speech fitting coefficients and the reference fitting coefficients.
6. The dysarthria detection method according to claim 5, characterized in that the step of obtaining the likelihood probability value from the speech fitting coefficients and the reference fitting coefficients comprises:
establishing a multivariate Gaussian probability-density model from the reference fitting coefficients; and
substituting the speech fitting coefficients into the multivariate Gaussian probability-density model to obtain the likelihood probability value of the speech fitting coefficients.
7. The dysarthrosis detection method according to claim 1, wherein the process of performing dysarthrosis detection on the user according to the likelihood probability value comprises:
judging the magnitude relation between the likelihood probability value and a preset articulation threshold; and
if the likelihood probability value is less than the preset articulation threshold, determining that the user has dysarthrosis.
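The decision rule of claim 7 reduces to a single comparison; the threshold value itself is a tuning choice the patent leaves open, and the numbers below are placeholders.

```python
def detect_dysarthrosis(likelihood, threshold):
    """Claim 7 as a predicate: a likelihood probability value below the
    preset articulation threshold indicates dysarthrosis."""
    return likelihood < threshold

suspected = detect_dysarthrosis(0.02, 0.10)  # low likelihood: flagged
healthy = detect_dysarthrosis(0.80, 0.10)    # high likelihood: not flagged
```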
8. A dysarthrosis detection system, comprising:
a reading module, configured to read voice data generated by an electromagnetic articulography instrument and to obtain, according to the voice data, audio data and its corresponding motion track information, wherein a sensor of the electromagnetic articulography instrument is mounted at an articulation position of the user, the voice data is the data sensed by the electromagnetic articulography instrument at the articulation position while the user pronounces set words, and the set words are one or more words whose normal pronunciations are included in a reference voice library;
an extraction module, configured to extract, from the motion track information according to the audio data, the sub motion track information corresponding to each word pronunciation;
an obtaining module, configured to perform a feature operation on the sub motion track information and the reference motion track information corresponding to each word pronunciation in the reference voice library to obtain a likelihood probability value, wherein the reference voice library is a speech database including the normal pronunciation of each word; and
a detection module, configured to perform dysarthrosis detection on the user according to the likelihood probability value.
9. The dysarthrosis detection system according to claim 8, wherein the extraction module is further configured to:
segment the audio data to obtain the start and stop times, within the audio data, of each word pronunciation in the voice data; and
synchronize the audio data with the motion track information to obtain the sub motion track information corresponding to each word pronunciation.
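Once the per-word start and stop times are known, the synchronization step of claim 9 amounts to cutting the motion-track stream at the corresponding sample indices. This sketch assumes the segmentation itself (e.g. by forced alignment) is already done and that the track is uniformly sampled; the sampling rate and span values are made up.

```python
def extract_sub_tracks(motion_track, word_spans, rate_hz):
    """Cut a motion-track sample stream (rate_hz samples per second)
    into per-word sub motion tracks using the (start, stop) times,
    in seconds, produced by segmenting the audio data."""
    sub_tracks = []
    for start, stop in word_spans:
        i, j = int(start * rate_hz), int(stop * rate_hz)
        sub_tracks.append(motion_track[i:j])
    return sub_tracks

# One second of made-up samples at 100 Hz, two word spans:
track = list(range(100))
subs = extract_sub_tracks(track, [(0.0, 0.2), (0.3, 0.5)], 100)
```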
10. The dysarthrosis detection system according to claim 9, wherein the extraction module is further configured to:
obtain a likelihood score corresponding to each word pronunciation in the voice data; and
when the likelihood score is lower than a preset likelihood threshold, obtain the sub motion track information corresponding to each pronunciation using a manual audio annotation tool.
CN201610264854.XA 2016-04-25 2016-04-25 Dysarthrosis detection method and system Active CN105719662B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610264854.XA CN105719662B (en) 2016-04-25 2016-04-25 Dysarthrosis detection method and system

Publications (2)

Publication Number Publication Date
CN105719662A CN105719662A (en) 2016-06-29
CN105719662B true CN105719662B (en) 2019-10-25

Family

ID=56161689

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610264854.XA Active CN105719662B (en) 2016-04-25 2016-04-25 Dysarthrosis detection method and system

Country Status (1)

Country Link
CN (1) CN105719662B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107452370A (en) * 2017-07-18 2017-12-08 太原理工大学 A kind of application method of the judgment means of Chinese vowel followed by a nasal consonant dysphonia patient
CN109360645B (en) * 2018-08-01 2021-06-11 太原理工大学 Statistical classification method for dysarthria pronunciation and movement abnormal distribution
CN111276130A (en) * 2020-01-21 2020-06-12 河南优德医疗设备股份有限公司 MFCC cepstrum coefficient calculation method for computer language knowledge education system
CN113496696A (en) * 2020-04-03 2021-10-12 中国科学院深圳先进技术研究院 Speech function automatic evaluation system and method based on voice recognition

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2399931A (en) * 2003-03-28 2004-09-29 Barnsley Distr General Hospita Assistive technology
CN103337247A (en) * 2013-06-17 2013-10-02 天津大学 Data annotation analysis system for electromagnetic pronunciation recorder
CN103337055A (en) * 2013-06-24 2013-10-02 暨南大学 Deblurring method for text image based on gradient fitting
CN103383845A (en) * 2013-07-08 2013-11-06 上海昭鸣投资管理有限责任公司 Multi-dimensional dysarthria measuring system and method based on real-time vocal tract shape correction
CN103405217A (en) * 2013-07-08 2013-11-27 上海昭鸣投资管理有限责任公司 System and method for multi-dimensional measurement of dysarthria based on real-time articulation modeling technology
CN103705218A (en) * 2013-12-20 2014-04-09 中国科学院深圳先进技术研究院 Dysarthria identifying method, system and device
CN104123934A (en) * 2014-07-23 2014-10-29 泰亿格电子(上海)有限公司 Speech composition recognition method and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8275624B2 (en) * 2008-10-16 2012-09-25 Thomas David Kehoe Electronic speech aid and method for use thereof to treat hypokinetic dysarthria
US20140163070A1 (en) * 2012-05-17 2014-06-12 Bruce Roseman Treatment for cerebral palsy impaired speech in children
US9911358B2 (en) * 2013-05-20 2018-03-06 Georgia Tech Research Corporation Wireless real-time tongue tracking for speech impairment diagnosis, speech therapy with audiovisual biofeedback, and silent speech interfaces

Also Published As

Publication number Publication date
CN105719662A (en) 2016-06-29

Similar Documents

Publication Publication Date Title
Klein et al. A multidimensional investigation of children's /r/ productions: Perceptual, ultrasound, and acoustic measures
Ramanarayanan et al. An investigation of articulatory setting using real-time magnetic resonance imaging
Wiget et al. How stable are acoustic metrics of contrastive speech rhythm?
JP4725948B2 (en) System and method for synchronizing text display and audio playback
Gobl et al. 11 voice source variation and its communicative functions
CN105719662B (en) Dysarthrosis detection method and system
Lawson et al. The role of gesture delay in coda /r/ weakening: An articulatory, auditory and acoustic study
CN106782603B (en) Intelligent voice evaluation method and system
CN104252872B (en) Lyric generating method and intelligent terminal
Bombien et al. Articulatory overlap as a function of voicing in French and German consonant clusters
Carignan Using ultrasound and nasalance to separate oral and nasal contributions to formant frequencies of nasalized vowels
Gallagher Vowel height allophony and dorsal place contrasts in Cochabamba Quechua
Vojtech et al. Refining algorithmic estimation of relative fundamental frequency: Accounting for sample characteristics and fundamental frequency estimation method
Stewart et al. Earbuds: A method for analyzing nasality in the field
Paroni et al. Vocal drum sounds in human beatboxing: An acoustic and articulatory exploration using electromagnetic articulography
Kochetov Research methods in articulatory phonetics II: Studying other gestures and recent trends
CN107625527B (en) Lie detection method and device
Hussain et al. An acoustic and articulatory study of laryngeal and place contrasts of Kalasha (Indo-Aryan, Dardic)
JP2013088552A (en) Pronunciation training device
CN109166629A (en) The method and system of aphasia evaluation and rehabilitation auxiliary
Cai et al. The DKU-JNU-EMA electromagnetic articulography database on Mandarin and Chinese dialects with tandem feature based acoustic-to-articulatory inversion
Gilbert et al. Restoring speech following total removal of the larynx by a learned transformation from sensor data to acoustics
Yeung et al. Subglottal resonances of American English speaking children
CN107591163B (en) Pronunciation detection method and device and voice category learning method and system
Maddieson Articulatory Phonology and Sukuma "Aspirated Nasals"

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant