Summary of the invention
According to one aspect of the present invention, there is provided a method for automatically identifying natural speech pauses in a text string, the pauses being used in text-to-speech conversion performed on an electronic device, the method comprising:
Obtaining a text string having two ends, namely a starting end and a finishing end;
Analyzing at least one word in the text string to determine whether a natural speech pause exists adjacent to the word, the analysis being based on at least one predetermined threshold for the word, the predetermined threshold being associated with the number of syllables between the word and one of the two ends of the text string; and
Inserting the natural speech pause into a synthesized speech signal output representation of the text string.
Preferably, the at least one predetermined threshold includes a P word (P_word) threshold, which is based on the number of syllables between the starting end and the word.
Preferably, the at least one predetermined threshold includes an F word (F_word) threshold, which is based on the number of syllables between the finishing end and the word.
Preferably, the at least one predetermined threshold is determined by the following steps:
Providing a training set of transcriptions having at least one natural speech pause identified by an inserted identifier;
Identifying words in each transcription as P words and F words;
Statistically analyzing the P words and F words in the training set; and
Determining the F word threshold and the P word threshold from the results of the statistical analysis.
Preferably, the inserted natural speech pauses may also include pauses identified as part-of-speech (POS) pattern natural pauses.
Preferably, the inserted natural speech pauses may also include pauses identified as compound word natural pauses.
Embodiment
Referring to Fig. 1, there is shown an electronic device 100 in the form of a radio telephone, comprising a device processor 102 operatively coupled by a bus 103 to a user interface 104, which is typically a touch screen or a display screen and keypad. The electronic device 100 also has a speech corpus 106, a speech synthesizer 110, a non-volatile memory 120, a read-only memory 118 and a radio communications module 116, all operatively coupled to the processor 102 by the bus 103. The speech synthesizer 110 has an output coupled to drive a speaker 112. The corpus 106 includes representations of words or phonemes and associated sampled, digitized and processed speech waveforms PUW. In addition, as described below, the non-volatile memory 120 (memory module) provides text strings for text-to-speech (TTS) synthesis (the text may be received by the module 116 or by other devices). The waveform corpus also includes transcriptions representing phrases, together with the corresponding sampled and digitized speech waveforms and the positions within the text strings that are associated with natural pause boundaries, as described below.
As will be apparent to a person skilled in the art, the radio frequency communications unit 116 is typically a combined receiver and transmitter having a common antenna. The radio frequency communications unit 116 has a transceiver coupled to the antenna via a radio frequency amplifier. The transceiver is also coupled to a combined modulator/demodulator that couples the communications unit 116 to the processor 102. Also, in this embodiment, the non-volatile memory 120 (memory module) stores a programmable phonebook database Db, and the read-only memory 118 stores operating code (OC) for the device processor 102.
Referring to Fig. 2, there is illustrated a method 200 for determining thresholds associated with natural speech pauses in a text string. The thresholds are based on the numbers of preceding and following syllables in the transcriptions of a training set TS. After a start step 210, the method 200 performs a providing step 220, which provides a training set TS of transcriptions (typically sentences) having at least one natural speech pause identified by a manually inserted punctuation mark or identifier "|". Figs. 3A to 3D illustrate examples of such transcriptions or sentences. One of these transcriptions, 300, is "Based on our history|in China,", which has a natural speech pause 310 between the words "history" and "in". The transcription 300 has a starting end 305 and a finishing end 315. As will be apparent to a person skilled in the art, all of the transcriptions 300 in Figs. 3A to 3D have at least one natural speech pause 310, a starting end 305 and a finishing end 315. The transcription is further analyzed as follows:
Based = 2 syllables
on = 1 syllable
our = 1 syllable
history = 3 syllables
in = 1 syllable
China = 2 syllables
In addition, each word in a transcription can be identified as: (i) a P word: a word immediately after a natural pause identified by the punctuation mark "|", so that the pause precedes the word; (ii) an F word: a word immediately before a natural pause identified by the punctuation mark "|", so that the pause follows the word; or (iii) a middle word: a word with no adjacent natural speech pause in the transcription. After step 220, an identifying step 230 identifies each word in each transcription as (i) a P word, (ii) an F word, or (iii) a middle word. Thus, for the transcription "Based on our history|in China,", Table 1 below identifies the attributes of each word in the transcription:
Word | P word | F word | Number of syllables | Pause |
Based | N | N | 0 | N |
on | N | N | 2 | N |
our | N | N | 3 | N |
history | N | Y | 4 | After |
in | Y | N | 7 | Before |
China | N | N | 1 | N |
Table 1: Analysis of the transcription "Based on our history in China"
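The identifying step 230 can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of the embodiment: the function name `label_words`, the tuple layout and the punctuation stripping are assumptions introduced here, and a word is treated as a P word when the "|" marker precedes it and as an F word when the marker follows it, in line with Table 1.

```python
def label_words(transcription, marker="|"):
    """Label each word of a transcription as P word and/or F word.

    Returns a list of (word, is_p_word, is_f_word) tuples. A word with
    both flags False is a middle word. (Hypothetical helper; the names
    and the tuple layout are assumptions made for illustration.)
    """
    # Make sure the marker is a separate token even when written "a|b".
    tokens = transcription.replace(marker, f" {marker} ").split()
    rows = []             # each row: [word, is_p_word, is_f_word]
    pause_before = False  # True when the previous token was the marker
    for token in tokens:
        if token == marker:
            if rows:
                rows[-1][2] = True  # pause follows the previous word: F word
            pause_before = True
            continue
        word = token.strip(",.;:!?")
        if not word:
            continue
        rows.append([word, pause_before, False])  # pause precedes: P word
        pause_before = False
    return [tuple(row) for row in rows]


labels = label_words("Based on our history|in China,")
```

For the example transcription, "history" is flagged as an F word and "in" as a P word, and the remaining words are middle words.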
The method 200 then performs a statistical analysis step 240. In this step 240, suppose that the training set TS provided has 90,000 transcriptions (for example, sentences) and that the word "in" occurs 10,000 times in the training set. For these 10,000 instances of "in", the following statistics may be observed:
(i) number of occurrences of "in" as a P word (OPW) = 8,000 instances;
(ii) number of occurrences of "in" as an F word (OFW) = 1,000 instances;
(iii) number of occurrences of "in" as a middle word, being neither a P word nor an F word (ONW) = 1,000 instances;
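The counts OPW, OFW and ONW can be tallied per word as sketched below. This is an illustrative sketch only: the function name `count_roles`, the input format of (word, is_p_word, is_f_word) tuples and the lower-casing of words are assumptions, not details given in this description.

```python
from collections import Counter


def count_roles(labelled_words):
    """Tally OPW, OFW and ONW for every word in a labelled training set.

    labelled_words is an iterable of (word, is_p_word, is_f_word) tuples.
    Words are lower-cased so that "In" and "in" share one tally; that
    normalisation is an assumption made for this sketch.
    """
    counts = {}
    for word, is_p_word, is_f_word in labelled_words:
        tally = counts.setdefault(word.lower(), Counter())
        if is_p_word:
            tally["OPW"] += 1  # occurrence as a P word
        if is_f_word:
            tally["OFW"] += 1  # occurrence as an F word
        if not is_p_word and not is_f_word:
            tally["ONW"] += 1  # occurrence as a middle word
    return counts


sample = [
    ("in", True, False),
    ("in", False, False),
    ("history", False, True),
    ("In", True, False),
]
role_counts = count_roles(sample)
```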
Further, for the 8,000 occurrences of "in" identified as a P word in the training set TS, the following statistics may be observed:
(i) occurrences with 8 or more preceding syllables (OPS) = 0;
(ii) occurrences with 7 preceding syllables (OPS) = 400;
(iii) occurrences with 6 preceding syllables (OPS) = 600;
(iv) occurrences with 5 preceding syllables (OPS) = 2,000;
(v) occurrences with 4 preceding syllables (OPS) = 3,000;
(vi) occurrences with 3 preceding syllables (OPS) = 1,000;
(vii) occurrences with 2 preceding syllables (OPS) = 1,000;
(viii) occurrences with 1 preceding syllable (OPS) = 0;
A heuristic ratio HR of 0.75, selected by intuition and testing, is used to determine the P word pause threshold PT for the word "in". The threshold PT is determined in a determining thresholds step 250 as follows:
Starting from the OPS for the largest observed number of syllables, sum the OPS values in order from the largest observed number of syllables to the smallest, until:
(sum of OPS) / OPW ≥ 0.75
Select PT as the observed number of syllables identified by the last OPS added to the sum;
End.
Thus, the PT for "in" is determined in step 250 as follows:
400 / 8,000 = 0.05 for 7 preceding syllables;
(400 + 600) / 8,000 = 0.125 for 6 preceding syllables;
(400 + 600 + 2,000) / 8,000 = 0.375 for 5 preceding syllables;
(400 + 600 + 2,000 + 3,000) / 8,000 = 0.75 for 4 preceding syllables;
PT is therefore selected as 4.
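The determination of PT in step 250 can be sketched as a cumulative sum over the OPS counts. This is an illustrative sketch under stated assumptions: the function name `pause_threshold` and the dictionary input are hypothetical, the "8 or more" bucket is keyed as 8, and the stopping test is taken to be "greater than or equal to the heuristic ratio".

```python
def pause_threshold(ops_by_syllables, heuristic_ratio=0.75):
    """Return the syllable count at which the cumulative share of
    occurrences, summed from the largest syllable count downwards,
    first reaches heuristic_ratio. Returns None if there is no data.

    ops_by_syllables maps a number of syllables to the number of
    observed occurrences (OPS). The name and input shape are
    assumptions made for this sketch.
    """
    total = sum(ops_by_syllables.values())  # equals OPW for a P word
    if total == 0:
        return None
    running = 0
    for syllables in sorted(ops_by_syllables, reverse=True):
        running += ops_by_syllables[syllables]
        if running / total >= heuristic_ratio:
            return syllables
    return None


# OPS counts for "in" as a P word, taken from the example above.
ops_in = {8: 0, 7: 400, 6: 600, 5: 2000, 4: 3000, 3: 1000, 2: 1000, 1: 0}
pt_in = pause_threshold(ops_in)  # 4, matching the worked example
```

The same function could be applied to counts of following syllables to obtain an F word threshold, since the description states that a similar analysis is used for FT.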
A similar statistical analysis is used in step 250 to determine the F word pause threshold FT for "in", again using the heuristic ratio HR of 0.75. Likewise, PT and FT values are determined (using the heuristic ratio HR of 0.75) for the P word and F word instances of all other words in the training set TS. The method 200 then ends at step 260, with the P word and F word thresholds for all words in the training set TS stored in the non-volatile memory 120.
Referring to Fig. 4, there is illustrated a method 400 for automatically identifying natural speech pauses in a text string STR, the pauses being used in text-to-speech conversion performed on the electronic device 100. After a start step 410, the method 400 performs a step 420 of obtaining the text string STR, which has two ends, namely a starting end SE and a finishing end FE. A selecting step 430 selects a word (or a compound word CW), and an analyzing step 440 analyzes the at least one word (or compound word CW) in the text string STR to determine whether a natural speech pause exists adjacent to the word (or compound word CW). The analysis is based on at least one predetermined threshold (PT or FT) for the word, the threshold being associated with the number of syllables between the word and one of the two ends of the text string. The thresholds include the P word threshold PT, which is based on the number of syllables between the starting end and the word, and the F word threshold FT, which is based on the number of syllables between the finishing end and the word.
If a test step 450 determines that step 440 has identified a pause, then at step 460 a natural speech pause is inserted for use in speech synthesis. Otherwise, no pause is inserted for the word selected in step 430. Then, at step 470, a check determines whether all words in the text string STR have been analyzed; if any words remain to be analyzed, the method returns to step 430. Otherwise, a speech synthesis step 480 performs speech synthesis at the synthesizer 110 using the corpus 106, in which the one or more natural speech pauses that occur (inserted into the text string STR at step 460) are inserted into the synthesized speech signal output representation of the text string STR.
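The loop formed by steps 430 to 470 can be sketched as follows. This is a minimal sketch, not the embodiment itself: the function name `mark_pauses`, the `analyze` callback standing in for step 440, its return values "before" and "after", and the use of a "|" token to represent an inserted pause are all assumptions made for illustration.

```python
PAUSE = "|"  # assumed in-text marker for an inserted natural speech pause


def mark_pauses(words, analyze):
    """Visit every word once and insert pause markers where analyze
    reports one.

    analyze(words, index) is a caller-supplied stand-in for step 440. It
    returns "before" for a pause preceding the word, "after" for a pause
    following it, or None for no pause.
    """
    marked = []
    for index, word in enumerate(words):           # step 430: select a word
        decision = analyze(words, index)           # step 440: analyze it
        # Steps 450/460: insert a pause, skipping duplicates and a pause
        # at the very start of the string.
        if decision == "before" and marked and marked[-1] != PAUSE:
            marked.append(PAUSE)
        marked.append(word)
        if decision == "after":
            marked.append(PAUSE)
    return marked                                  # step 470: all words done


def toy_analyze(words, index):
    # Toy rule for demonstration only: always pause before "in".
    return "before" if words[index] == "in" else None


result = mark_pauses(["Based", "on", "our", "history", "in", "China"], toy_analyze)
```

The marked word list would then be handed to the synthesizer, which renders each marker as a pause in the output signal.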
Referring to Fig. 5, there is illustrated the analyzing step 440 in more detail. First, at step 441, the text string STR is checked to determine whether it contains a part-of-speech (POS) pattern natural pause. Examples of POS pattern natural pauses are as follows:
1. numeral + noun
For example: two thousand books
2. verb + adverb
For example: look carefully
3. preposition + noun
For example: with telescopes
4. adjective + noun
For example: beautiful city
If a pause is determined to exist at step 441, step 446 is performed and the pause is identified as an F word pause. If no pause is determined at step 441, then at step 442 the text string STR is checked to determine whether it contains a compound word natural pause. Examples of compound words associated with natural pauses are as follows:
a bit of
a body of
a few
a fleet of
a flooding of
a fraction of
a function of
a good deal
a good deal of
a great deal
a great deal of
a hint of
a large body of
a large number of
a lot of
a majority of
If a pause is determined to exist at step 442, step 446 is performed and the pause is identified as an F word pause. If no pause is identified at step 442, then at step 443 a test is performed to determine whether the P word threshold PT of the selected word has been reached. This determination is made by comparing the number of syllables between the starting end and the selected word in the text string STR with the threshold PT. If the P word threshold PT of the selected word has been reached, a natural pause is determined to exist and is identified as a P word pause at step 444. Otherwise, if no pause is identified at step 443, then at step 445 a test is performed to determine whether the F word threshold FT of the selected word has been reached. This determination is made by comparing the number of syllables between the finishing end and the selected word in the text string STR with the threshold FT. If the F word threshold FT of the selected word has been reached, a natural pause is determined to exist and is identified as an F word pause at step 446. Otherwise, at step 447, no pause is identified.
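The order of the tests in steps 441 to 447 can be sketched as a simple cascade. This is a sketch under stated assumptions rather than the embodiment's implementation: the function name `analyze_word`, the optional `pos_pause` and `cw_pause` callbacks, the dictionary inputs, and the reading of "threshold reached" as "syllable count greater than or equal to the threshold" are all introduced here for illustration. The FT value used for "history" in the example is hypothetical.

```python
def analyze_word(words, index, syllables, pt, ft, pos_pause=None, cw_pause=None):
    """Return "F" for an F word pause, "P" for a P word pause, or None.

    syllables maps each word to its syllable count; pt and ft map words
    to their P word and F word thresholds. pos_pause and cw_pause are
    optional caller-supplied predicates standing in for steps 441 and 442.
    """
    if pos_pause is not None and pos_pause(words, index):  # step 441
        return "F"                                         # step 446
    if cw_pause is not None and cw_pause(words, index):    # step 442
        return "F"                                         # step 446
    word = words[index]
    # Step 443: syllables between the starting end and the selected word.
    preceding = sum(syllables[w] for w in words[:index])
    if word in pt and preceding >= pt[word]:
        return "P"                                         # step 444
    # Step 445: syllables between the selected word and the finishing end.
    following = sum(syllables[w] for w in words[index + 1:])
    if word in ft and following >= ft[word]:
        return "F"                                         # step 446
    return None                                            # step 447


words = ["Based", "on", "our", "history", "in", "China"]
# Syllable counts as listed for the example transcription.
syllables = {"Based": 2, "on": 1, "our": 1, "history": 3, "in": 1, "China": 2}
pt = {"in": 4}        # PT for "in" from the worked example
ft = {"history": 3}   # hypothetical FT value, for illustration only

decision_in = analyze_word(words, 4, syllables, pt, ft)
decision_on = analyze_word(words, 1, syllables, pt, ft)
decision_history = analyze_word(words, 3, syllables, pt, ft)
```

Here "in" has 7 preceding syllables, which reaches its PT of 4, so a P word pause is reported; "on" has no stored thresholds, so no pause is reported.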
Advantageously, the invention allows natural speech pauses in a text string to be identified for use in text-to-speech (TTS) synthesis, thereby improving the quality of the synthesized speech.
The above detailed description provides preferred exemplary embodiments only, and is not intended to limit the scope, applicability or configuration of the invention. Rather, the detailed description of the preferred exemplary embodiments is intended to enable those skilled in the art to implement a preferred exemplary embodiment of the invention. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the invention as set forth in the appended claims.