CN1770261A - Speech synthesis system and method - Google Patents
Speech synthesis system and method
- Publication number
- CN1770261A
- Authority
- CN
- China
- Prior art keywords
- word
- data
- affix
- root
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A speech synthesis system and method perform pre-analysis on a word and decompose it into a combination of roots and affixes. The speech synthesis system comprises at least a database, a parsing module, a query module, a sound-cutting module, and a synthesis module. The system and method decompose a word into several roots and affixes and retrieve the best speech waveform data corresponding to each root and affix, so that the speech data of the word is synthesized automatically and with a better pronunciation effect.
Description
Technical field
The invention relates to a speech synthesis system and method, and more particularly to a system and method that can automatically synthesize the pronunciation data of a word.
Background technology
Electronic dictionaries are compact, offer large storage capacity, and provide functions such as recorded human pronunciation and expandable resources. They have therefore become an indispensable tool for many people learning foreign languages.
Most current electronic dictionaries implement the pronunciation function in one of two ways. In the first, the pronunciation of every word in the dictionary is recorded in advance, stored in the dictionary as recording files, and linked to the corresponding word data, so that the correct pronunciation is played when the user clicks a word. However, the speech data cannot be updated in step with words that are added later through expansion, so no pronunciation can be provided for these newly added words. In the second, a TTS (Text-To-Speech) engine synthesizes the speech automatically. Speech synthesized in this way sounds rather stiff, however, and does not give the user satisfactory pronunciation information.
Therefore, how to provide a system that can automatically synthesize the speech of a word while also achieving a better speech effect is a technical problem that urgently needs to be solved.
Summary of the invention
To overcome the above shortcomings of the prior art, the main objective of the present invention is to provide a speech synthesis system and method that can automatically parse the word-formation form of a word and thereby synthesize the speech data corresponding to that word.
To achieve the above objective, the present invention provides a speech synthesis system and method. The speech synthesis system comprises at least: a database, used to store a plurality of word data and the corresponding speech waveform data; a parsing module, used to analyze the word-formation form of a word and to decompose the word, according to the analysis result, into a combination formed by a root and affixes; a query module, used to query the database, according to the decomposition result of the parsing module, for the word data related to each root and affix, and to obtain the corresponding speech waveform data; a sound-cutting module, used to cut the word data found by the query module in units of syllables, with reference to the root and affix data produced by the parsing module, and to obtain the speech data corresponding to the root and the affix; and a synthesis module, used to arrange and combine the speech waveform data obtained by the sound-cutting module so that it corresponds one-to-one with the combination of the root and affixes, thereby synthesizing the speech data of the word.
The above system is applicable to an electronic dictionary. The parsing module decomposes a word, according to word-formation rules, into a combination of a plurality of roots and affixes. The query module then uses the content of each root or affix to retrieve from the database all word data that contain that root or affix. The query module also comprises a screening unit. According to the root and affix data produced by the parsing module, the screening unit compares all the word data found by the query module that relate to the root or affix and selects the best word data, which the sound-cutting module then processes.
The speech synthesis method of the present invention comprises: (1) providing a parsing module, used to analyze the word-formation form of a word and to decompose the word, according to the analysis result, into a combination formed by a root and affixes; (2) providing a query module, used to query the database, according to the decomposition result of the parsing module, for the word data related to each root and affix, and to obtain the corresponding speech waveform data; (3) providing a sound-cutting module, used to cut the word data found by the query module in units of syllables, with reference to the root and affix data produced by the parsing module, and to obtain the speech data corresponding to the root and the affix; and (4) providing a synthesis module, used to arrange and combine the speech waveform data obtained by the sound-cutting module so that it corresponds one-to-one with the combination of the root and affixes, thereby synthesizing the speech data of the word.
In the above method, the data generated in each step is stored in the database, which also stores a plurality of word data and the corresponding speech waveform data. In step (1), the parsing module decomposes a word, according to word-formation rules, into a combination of a plurality of roots and affixes. In step (2), the query module uses the content of each root or affix to retrieve from the database all word data that contain that root or affix. Step (2) also includes having the screening unit, according to the root and affix data produced by the parsing module, compare all the word data found by the query module that relate to the root or affix and select the best word data, which the sound-cutting module then processes. If the comparison by the screening unit finds word data in the database that is fully identical to the root or affix data, that word data is the best word data. If there are several candidate word data, the screening unit takes as the best word data the candidate whose word-formation type is the same and whose difference from the root or affix is smallest.
Therefore, the speech synthesis system and method of the present invention can decompose a word into several roots and affixes and retrieve the best speech waveform data corresponding to each root and affix, so that the speech data of the word is synthesized automatically and with a better speech effect.
Description of drawings
Fig. 1 is a schematic block diagram of the basic structure of the speech synthesis system of the present invention; and
Fig. 2 is a schematic flow chart of the operation of the speech synthesis method of the present invention.
Embodiment
The embodiments of the present invention are described below by way of a specific example.
Fig. 1 is a schematic diagram of the speech synthesis system of the present invention applied in an electronic dictionary. As shown in the figure, the speech synthesis system 100 is used in the electronic dictionary 1 to synthesize the speech of a word automatically. The speech synthesis system 100 comprises a database 110, a parsing module 120, a query module 130, a sound-cutting module 140, and a synthesis module 150. The query module 130 additionally comprises a screening unit 131.
The query module 130 is used to query the database 110, according to the decomposition result of the parsing module 120, for the word data related to each root and affix, so as to obtain the corresponding speech waveform data. The query module 130 also includes the screening unit 131. According to the root and affix data produced by the parsing module 120, the screening unit 131 compares all the word data found by the query module 130 that relate to the root or affix and selects the best word data, which the sound-cutting module 140 then processes (described in detail later). The selection principle is as follows: if the database 110 contains word data that is fully identical to the root or affix data (usually the case for a root), that word data is the best word data; if there are several candidate word data (usually the case for an affix), the best word data is the candidate whose word-formation type is the same and whose difference from the root or affix is smallest.
In the present embodiment, the query module 130 first takes the root "method" produced by the parsing module 120 and retrieves from the database 110 all words that contain this root, such as "method", "methodic", "methodist", and "unmethodical". The screening unit 131 then compares each retrieved word with the root "method". Because the word "method" in the database 110 is fully identical to the root data, the word "method" is regarded as the best word data for this root.
Next, the query module 130 retrieves from the database 110 all word data that contain the affix "ology", such as "technology", "sociology", and "biology". The screening unit 131 compares these word data one by one and finds no word that is fully identical to the affix "ology". It therefore analyzes the word-formation position of "ology": because this affix is a suffix in the word "methodology", the screening unit 131 keeps only the word data in which "ology" is a suffix. Finally, it compares the difference between each word and the affix "ology", for example by treating the word with the fewest letters left after "ology" is removed as the best. The comparison shows that the word "biology" is closest to the affix "ology", so it is selected as the best word data.
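The retrieval and screening described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `query_words`, `position_of`, and `select_best` are hypothetical, the database is stood in for by a plain list of spellings, and "difference" is taken to be the number of letters left once the root or affix is removed, which is the example measure the description gives.

```python
from typing import List

def query_words(morpheme: str, vocabulary: List[str]) -> List[str]:
    """Return every vocabulary word that contains the root or affix."""
    return [w for w in vocabulary if morpheme in w]

def position_of(morpheme: str, word: str) -> str:
    """Classify where the root or affix sits inside the word."""
    if word.startswith(morpheme):
        return "start"
    if word.endswith(morpheme):
        return "end"
    return "middle"

def select_best(morpheme: str, position: str, candidates: List[str]) -> str:
    """Pick the best word: exact match first, else closest same-position word."""
    # Rule 1: a word identical to the root or affix wins outright.
    if morpheme in candidates:
        return morpheme
    # Rule 2: keep words where the morpheme occupies the same position,
    # then take the one with the fewest letters left over.
    same = [w for w in candidates if position_of(morpheme, w) == position]
    if not same:
        raise LookupError(f"no candidate for {morpheme!r} at {position!r}")
    return min(same, key=lambda w: len(w) - len(morpheme))

# Words taken from the embodiment; the list itself is a stand-in database.
vocabulary = ["method", "methodic", "methodist", "unmethodical",
              "technology", "sociology", "biology"]
best_root = select_best("method", "start", query_words("method", vocabulary))
best_affix = select_best("ology", "end", query_words("ology", vocabulary))
```

With the words from the embodiment, the root resolves to "method" by exact match and the suffix resolves to "biology", which leaves only two letters after "ology" is removed.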
The sound-cutting module 140 is used to cut the word data found by the query module 130 in units of syllables, with reference to the root and affix data produced by the parsing module 120, and thereby to obtain the speech data corresponding to the root and the affix. In the present embodiment, the query result of the query module 130 is the word "method" for the root "method" and the word "biology" for the affix "ology". Because the word "method" is fully identical to the root, its "speech waveform 1" data (not shown) can be used directly. For the affix "ology", the sound-cutting module 140 cuts the speech waveform data of the word "biology" in units of syllables (vowel or word sound), making the cut at the vowel so as to extract the trailing portion of the word's speech waveform data, that is, the "speech waveform 2" data (not shown) corresponding to "ology".
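The cut at a syllable boundary can be illustrated with a minimal Python sketch. The text does not say how syllable boundaries are located, so the sketch assumes they are already known as sample indices; the function name `cut_tail`, the toy eight-sample waveform, and the boundary values are all hypothetical.

```python
from typing import List, Sequence

def cut_tail(samples: Sequence[float],
             syllable_starts: Sequence[int],
             first_syllable: int) -> List[float]:
    """Keep the samples from the start of the given syllable to the end."""
    start = syllable_starts[first_syllable]
    return list(samples[start:])

# Toy stand-in for the waveform of "biology" (bi-ol-o-gy): four syllables,
# two samples each. Real data would hold thousands of samples per syllable.
biology_wave = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
biology_syllable_starts = [0, 2, 4, 6]  # assumed to be known in advance

# "ology" begins at the second syllable (index 1), so cut there.
ology_wave = cut_tail(biology_wave, biology_syllable_starts, 1)
```

Here `ology_wave` plays the role of "speech waveform 2": everything in the recording of "biology" from the first vowel of "ology" onward.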
The synthesis module 150 is used to arrange and combine the speech waveform data obtained by the sound-cutting module 140 so that it corresponds one-to-one with the combination of the root and affixes, thereby synthesizing the speech data of the word. In the present embodiment, the synthesis module 150 arranges the "speech waveform 1" data and the "speech waveform 2" data obtained by the sound-cutting module 140 according to the positions of the corresponding root and affix, that is, method (speech waveform 1) + ology (speech waveform 2), to synthesize the speech data of the word "methodology".
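The arrangement step amounts to joining the waveform segments in the same order as the root and affixes. The following Python sketch shows this under simple assumptions: waveforms are plain lists of samples, and the name `synthesize` and the sample values are hypothetical.

```python
from typing import Dict, List

def synthesize(morphemes: List[str],
               waveforms: Dict[str, List[float]]) -> List[float]:
    """Join the segment for each root or affix in word order."""
    out: List[float] = []
    for m in morphemes:
        # One segment per morpheme keeps the one-to-one correspondence.
        out.extend(waveforms[m])
    return out

# Toy segments standing in for "speech waveform 1" and "speech waveform 2".
segments = {"ology": [0.3, 0.4], "method": [0.1, 0.2]}
methodology_wave = synthesize(["method", "ology"], segments)
```

The order of the result follows the list of morphemes, not the order in which the segments were stored, so the root segment always precedes the suffix segment.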
Fig. 2 is a flow chart showing the operating procedure of the speech synthesis method of the present invention, which is used in an electronic dictionary. As shown in the figure, step S210 is performed first: the database 110 is built in advance to store the definition data of all words in the electronic dictionary 1 and the corresponding speech waveform data. The flow then proceeds to step S220.
In step S220, the parsing module 120 analyzes the word-formation form of the word "methodology" and, according to the analysis result, decomposes the word into the root "method" + the suffix "ology". The flow then proceeds to step S230.
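A toy version of the decomposition in step S220 is sketched below in Python. The text does not list the word-formation rules the parsing module uses, so the sketch assumes a small hand-written affix table and strips at most one prefix and one suffix, longest match first; `PREFIXES`, `SUFFIXES`, and `decompose` are hypothetical names.

```python
from typing import List, Optional, Tuple

# Hypothetical affix tables; a real system would need far larger ones.
PREFIXES = ("un",)
SUFFIXES = ("ology", "ical", "ist", "ic")

def decompose(word: str) -> List[Tuple[str, str]]:
    """Split a word into (kind, text) parts: prefix, root, suffix."""
    parts: List[Tuple[str, str]] = []
    rest = word
    # Strip the longest matching prefix, leaving a non-empty remainder.
    for p in sorted(PREFIXES, key=len, reverse=True):
        if rest.startswith(p) and len(rest) > len(p):
            parts.append(("prefix", p))
            rest = rest[len(p):]
            break
    # Strip the longest matching suffix in the same way.
    suffix: Optional[str] = None
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if rest.endswith(s) and len(rest) > len(s):
            suffix = s
            rest = rest[:-len(s)]
            break
    parts.append(("root", rest))
    if suffix is not None:
        parts.append(("suffix", suffix))
    return parts

result = decompose("methodology")
```

For "methodology" this yields the root "method" followed by the suffix "ology", matching the decomposition in the embodiment. A table-driven split like this will mis-split words whose endings merely resemble an affix, which is one reason an actual parser would need richer rules.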
In step S230, the query module 130 queries the database 110, according to the decomposition result of the parsing module 120, for the word data related to each root and affix, so as to obtain the corresponding speech waveform data. In the present embodiment, for the root "method" the query module 130 retrieves word data such as "method", "methodic", "methodist", and "unmethodical" from the database 110; for the suffix "ology" it retrieves word data such as "technology", "sociology", and "biology". The screening unit 131 then compares these word data one by one and selects the best word data "method" for the root "method" and the best word data "biology" for the affix "ology". The flow then proceeds to step S240.
In step S240, the sound-cutting module 140, with reference to the root and affix data, cuts the best word data obtained by the query module 130 in units of syllables, obtaining the "speech waveform 1" data corresponding to the root "method" and the "speech waveform 2" data corresponding to the affix "ology". The flow then proceeds to step S250.
In step S250, the synthesis module 150 arranges the "speech waveform 1" data and the "speech waveform 2" data obtained by the sound-cutting module 140 according to the order of the corresponding root "method" and affix "ology", that is, method (speech waveform 1) + ology (speech waveform 2), to synthesize the speech data of the word.
In summary, the speech synthesis system and method of the present invention are applicable to an electronic dictionary. The method first performs pre-analysis on a word to identify the root and affixes that form it, then retrieves the best speech parameters for each root and affix from the language database of the electronic dictionary, and finally arranges and combines all the retrieved speech parameters according to a smoothing algorithm to synthesize the speech data of the word.
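The smoothing algorithm is not specified in the text. As one possible illustration only, the Python sketch below joins two segments with a linear crossfade over a short overlap, a common way to soften the seam between concatenated audio segments. The technique, the name `crossfade`, and the `overlap` parameter are assumptions and are not taken from the patent.

```python
from typing import List, Sequence

def crossfade(a: Sequence[float], b: Sequence[float],
              overlap: int) -> List[float]:
    """Join a and b, blending the last `overlap` samples of a into b."""
    if overlap <= 0 or overlap > min(len(a), len(b)):
        raise ValueError("overlap must be between 1 and the shorter length")
    head = list(a[:-overlap])
    mixed = []
    for i in range(overlap):
        # The weight rises linearly, so a fades out while b fades in.
        w = (i + 1) / (overlap + 1)
        mixed.append((1 - w) * a[len(a) - overlap + i] + w * b[i])
    return head + mixed + list(b[overlap:])

# Toy segments: a constant 1.0 segment followed by a constant 0.0 segment.
joined = crossfade([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], 1)
```

The joined signal is shorter than plain concatenation by `overlap` samples, because the overlapping region is blended rather than repeated.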
Claims (17)
1. A speech synthesis system, characterized in that the system comprises at least:
A database, used to store a plurality of word data and the corresponding speech waveform data;
A parsing module, used to analyze the word-formation form of a word and to decompose the word, according to the analysis result, into a combination formed by a root and affixes;
A query module, used to query the database, according to the decomposition result of the parsing module, for the word data related to each root and affix, and to obtain the corresponding speech waveform data;
A sound-cutting module, used to cut the word data found by the query module in units of syllables, with reference to the root and affix data produced by the parsing module, and to obtain the speech data corresponding to the root and the affix; and
A synthesis module, used to arrange and combine the speech waveform data obtained by the sound-cutting module so that it corresponds one-to-one with the combination of the root and affixes, thereby synthesizing the speech data of the word.
2. The speech synthesis system as claimed in claim 1, characterized in that the system is applicable to an electronic dictionary.
3. The speech synthesis system as claimed in claim 1, characterized in that the parsing module decomposes a word, according to word-formation rules, into a combination of a plurality of roots and affixes.
4. The speech synthesis system as claimed in claim 3, characterized in that the affixes comprise prefixes and suffixes.
5. The speech synthesis system as claimed in claim 1, characterized in that the query module uses the content of the root or affix to retrieve from the database all word data that contain that root or affix.
6. The speech synthesis system as claimed in claim 1 or 5, characterized in that the query module further comprises a screening unit, used to select by comparison, according to the root and affix data produced by the parsing module, the best word data from all the word data found by the query module that relate to the root or affix, for processing by the sound-cutting module.
7. The speech synthesis system as claimed in claim 6, characterized in that if the database contains word data that is fully identical to the root or affix data, that word data is the best word data.
8. The speech synthesis system as claimed in claim 6, characterized in that when there are a plurality of candidate word data, the best word data is the candidate whose word-formation type is the same and whose difference from the root or affix is smallest.
9. A speech synthesis method, applicable to a speech synthesis system, the method comprising:
(1) providing a parsing module, used to analyze the word-formation form of a word and to decompose the word, according to the analysis result, into a combination formed by a root and affixes;
(2) providing a query module, used to query a database, according to the decomposition result of the parsing module, for the word data related to each root and affix, and to obtain the corresponding speech waveform data;
(3) providing a sound-cutting module, used to cut the word data found by the query module in units of syllables, with reference to the root and affix data produced by the parsing module, and to obtain the speech data corresponding to the root and the affix; and
(4) providing a synthesis module, used to arrange and combine the speech waveform data obtained by the sound-cutting module so that it corresponds one-to-one with the combination of the root and affixes, thereby synthesizing the speech data of the word.
10. The speech synthesis method as claimed in claim 9, characterized in that the speech synthesis system is applicable to an electronic dictionary.
11. The speech synthesis method as claimed in claim 9, characterized in that the data generated in each step of the method is stored in a database.
12. The speech synthesis method as claimed in claim 11, characterized in that the database is also used to store a plurality of word data and the corresponding speech waveform data.
13. The speech synthesis method as claimed in claim 9, characterized in that in step (1), the parsing module decomposes a word, according to word-formation rules, into a combination of a root and at least one affix.
14. The speech synthesis method as claimed in claim 9, characterized in that in step (2), the query module uses the content of the root or affix to retrieve from the database all word data that contain at least one of the root and the affix.
15. The speech synthesis method as claimed in claim 14, characterized in that step (2) further comprises providing a screening unit that selects by comparison, according to the root and affix data produced by the parsing module, the best word data from all the word data found by the query module that relate to the root or affix, for processing by the sound-cutting module.
16. The speech synthesis method as claimed in claim 15, characterized in that in step (2), if the comparison by the screening unit finds word data in the database that is fully identical to the root or affix data, that word data is the best word data.
17. The speech synthesis method as claimed in claim 15, characterized in that in step (2), when there are a plurality of candidate word data, the screening unit takes as the best word data the candidate whose word-formation type is the same and whose difference from the root or affix is smallest.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2004100871367A CN100517463C (en) | 2004-11-01 | 2004-11-01 | Speech synthesis system and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2004100871367A CN100517463C (en) | 2004-11-01 | 2004-11-01 | Speech synthesis system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1770261A true CN1770261A (en) | 2006-05-10 |
CN100517463C CN100517463C (en) | 2009-07-22 |
Family
ID=36751506
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2004100871367A Expired - Fee Related CN100517463C (en) | 2004-11-01 | 2004-11-01 | Speech synthesis system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN100517463C (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101645190B (en) * | 2009-07-22 | 2011-03-30 | 合肥讯飞数码科技有限公司 | Word inquiring system and inquiring method thereof |
CN103680261A (en) * | 2012-08-31 | 2014-03-26 | 英业达科技有限公司 | Vocabulary learning system and method |
CN105531757A (en) * | 2013-09-20 | 2016-04-27 | 株式会社东芝 | Voice selection assistance device, voice selection method, and program |
CN108962218A (en) * | 2017-05-27 | 2018-12-07 | 北京搜狗科技发展有限公司 | A kind of word pronunciation method and apparatus |
CN109271037A (en) * | 2017-07-13 | 2019-01-25 | 北京搜狗科技发展有限公司 | A kind of method for building up and device of error correction dictionary |
CN109545014A (en) * | 2018-12-28 | 2019-03-29 | 杭州晶智能科技有限公司 | A kind of foreign language word exercising method based on interactive voice |
CN110444190A (en) * | 2019-08-13 | 2019-11-12 | 广州国音智能科技有限公司 | Method of speech processing, device, terminal device and storage medium |
CN111681467A (en) * | 2020-06-01 | 2020-09-18 | 广东小天才科技有限公司 | Vocabulary learning method, electronic equipment and storage medium |
CN112434521A (en) * | 2020-11-13 | 2021-03-02 | 北京搜狗科技发展有限公司 | Vocabulary processing method and device |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1178022A (en) * | 1995-03-07 | 1998-04-01 | 英国电讯有限公司 | Speech sound synthesizing device |
JPH1039895A (en) * | 1996-07-25 | 1998-02-13 | Matsushita Electric Ind Co Ltd | Speech synthesising method and apparatus therefor |
CN1113330C (en) * | 1997-08-15 | 2003-07-02 | 英业达股份有限公司 | Speech Regularization Method in Speech Synthesis |
-
2004
- 2004-11-01 CN CNB2004100871367A patent/CN100517463C/en not_active Expired - Fee Related
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101645190B (en) * | 2009-07-22 | 2011-03-30 | 合肥讯飞数码科技有限公司 | Word inquiring system and inquiring method thereof |
CN103680261A (en) * | 2012-08-31 | 2014-03-26 | 英业达科技有限公司 | Vocabulary learning system and method |
CN103680261B (en) * | 2012-08-31 | 2017-03-08 | 英业达科技有限公司 | Lexical learning system and its method |
CN105531757A (en) * | 2013-09-20 | 2016-04-27 | 株式会社东芝 | Voice selection assistance device, voice selection method, and program |
CN108962218A (en) * | 2017-05-27 | 2018-12-07 | 北京搜狗科技发展有限公司 | A kind of word pronunciation method and apparatus |
CN109271037A (en) * | 2017-07-13 | 2019-01-25 | 北京搜狗科技发展有限公司 | A kind of method for building up and device of error correction dictionary |
CN109545014A (en) * | 2018-12-28 | 2019-03-29 | 杭州晶智能科技有限公司 | A kind of foreign language word exercising method based on interactive voice |
CN110444190A (en) * | 2019-08-13 | 2019-11-12 | 广州国音智能科技有限公司 | Method of speech processing, device, terminal device and storage medium |
CN111681467A (en) * | 2020-06-01 | 2020-09-18 | 广东小天才科技有限公司 | Vocabulary learning method, electronic equipment and storage medium |
CN112434521A (en) * | 2020-11-13 | 2021-03-02 | 北京搜狗科技发展有限公司 | Vocabulary processing method and device |
Also Published As
Publication number | Publication date |
---|---|
CN100517463C (en) | 2009-07-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP0867859B1 (en) | Speech recognition language models | |
US6243680B1 (en) | Method and apparatus for obtaining a transcription of phrases through text and spoken utterances | |
US6092044A (en) | Pronunciation generation in speech recognition | |
US6910012B2 (en) | Method and system for speech recognition using phonetically similar word alternatives | |
US8412528B2 (en) | Back-end database reorganization for application-specific concatenative text-to-speech systems | |
EP1154405B1 (en) | Method and device for speech recognition in surroundings with varying noise levels | |
CN1280782C (en) | Extensible speech recognition system that provides user audio feedback | |
EP0978823B1 (en) | Speech recognition | |
EP0867858A2 (en) | Pronunciation generation in speech recognition | |
KR101169074B1 (en) | Segmental tonal modeling for tonal languages | |
EP1515306A1 (en) | Enrolment in speech recognition | |
CN1167307A (en) | Runtime audio unit selection method and system for speech synthesis | |
WO2016048350A1 (en) | Improving automatic speech recognition of multilingual named entities | |
WO2004111869A1 (en) | Exceptional pronunciation dictionary generation method for the automatic pronunciation generation in korean | |
CN100517463C (en) | Speech synthesis system and method | |
Gavalda | SOUP: A parser for real-world spontaneous speech | |
EP0845139A1 (en) | Speech synthesizer having an acoustic element database | |
Ordelman et al. | Compound decomposition in dutch large vocabulary speech recognition. | |
JPH08505957A (en) | Voice recognition system | |
Möbius et al. | The Bell Labs German text-to-speech system: an overview | |
KR20050032759A (en) | Automatic expansion method and device for foreign language transliteration | |
CN112711654B (en) | Chinese character interpretation technique generation method, system, equipment and medium for voice robot | |
EP1803116B1 (en) | Voice recognition method comprising a temporal marker insertion step and corresponding system | |
CN111696530B (en) | Target acoustic model obtaining method and device | |
JP2007163667A (en) | Speech synthesis apparatus and speech synthesis program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C17 | Cessation of patent right | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20090722 Termination date: 20101101 |