CN1770261A - Speech synthesis system and method - Google Patents

Info

Publication number
CN1770261A
CN1770261A CNA2004100871367A CN200410087136A CN1770261A CN 1770261 A CN1770261 A CN 1770261A CN A2004100871367 A CNA2004100871367 A CN A2004100871367A CN 200410087136 A CN200410087136 A CN 200410087136A CN 1770261 A CN1770261 A CN 1770261A
Authority
CN
China
Prior art keywords
word
data
affix
root
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2004100871367A
Other languages
Chinese (zh)
Other versions
CN100517463C (en)
Inventor
邱全成
马飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inventec Corp
Original Assignee
Inventec Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inventec Corp
Priority to CNB2004100871367A
Publication of CN1770261A
Application granted
Publication of CN100517463C
Anticipated expiration
Expired - Fee Related

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A speech synthesis system and method pre-analyze a word and decompose it into a combination of roots and affixes. The speech synthesis system comprises at least a database, a parsing module, a query module, a sound-cutting module, and a synthesis module. By decomposing a word into several roots and affixes and retrieving the best speech waveform data for each root and affix, the system and method automatically synthesize the speech data of the word while achieving good pronunciation quality.

Description

Speech synthesis system and method
Technical field
The invention relates to a speech synthesis system and method, and in particular to a system and method that can automatically synthesize the pronunciation data of a word.
Background
Electronic dictionaries are compact, offer large storage capacity, and provide functions such as recorded human pronunciation and unlimited resource expansion, so they have become an indispensable tool for many people learning foreign languages.
Most electronic dictionaries today implement spoken pronunciation in one of two ways. In the first, the pronunciation of every word in the dictionary is recorded in advance, stored in the dictionary, and linked to the corresponding word data; when the user selects a word, its pronunciation can be played back. However, the audio data is usually not updated in step when new words are later added through expansion, so no pronunciation can be provided for those new words. In the second, a TTS (Text-To-Speech) engine synthesizes the speech automatically, but speech synthesized this way sounds rather stiff and does not give the user satisfactory pronunciation.
Therefore, providing a technique that can automatically synthesize the speech of a word while still achieving good voice quality is a technical problem in urgent need of a solution.
Summary of the invention
To overcome the above shortcomings of the prior art, the main purpose of the present invention is to provide a speech synthesis system and method that can automatically parse the word-formation structure of a word and thereby synthesize the audio data corresponding to that word.
To achieve the above purpose, the present invention provides a speech synthesis system and method. The speech synthesis system comprises at least: a database for storing a plurality of word data and their corresponding speech waveform data; a parsing module for analyzing the word-formation structure of a word and, according to the analysis result, decomposing the word into a combination of roots and affixes; a query module for querying the database, according to the decomposition result of the parsing module, for word data related to each root and affix, and obtaining the corresponding speech waveform data; a sound-cutting module for cutting, syllable by syllable and with reference to the roots and affixes produced by the parsing module, the word data found by the query module, to obtain the audio data corresponding to each root and affix; and a synthesis module for arranging and combining the speech waveform data produced by the sound-cutting module in one-to-one correspondence with the root-and-affix composition of the word, to synthesize the audio data of the word.
The system is applicable to an electronic dictionary. The parsing module decomposes a word into a combination of several roots and affixes according to word-formation rules. The query module retrieves from the database all word data that contain a given root or affix. The query module further comprises a screening unit which, according to the roots and affixes produced by the parsing module, compares all the word data that the query module found for a given root or affix and selects the best word data for subsequent processing by the sound-cutting module.
The speech synthesis method of the present invention comprises: (1) providing a parsing module for analyzing the word-formation structure of a word and, according to the analysis result, decomposing the word into a combination of roots and affixes; (2) providing a query module for querying the database, according to the decomposition result of the parsing module, for word data related to each root and affix, and obtaining the corresponding speech waveform data; (3) providing a sound-cutting module for cutting, syllable by syllable and with reference to the roots and affixes produced by the parsing module, the word data found by the query module, to obtain the audio data corresponding to each root and affix; and (4) providing a synthesis module for arranging and combining the speech waveform data produced by the sound-cutting module in one-to-one correspondence with the root-and-affix composition of the word, to synthesize the audio data of the word.
The data generated in each step of the method is stored in the database, which also stores a plurality of word data and their corresponding speech waveform data. In step (1), the parsing module decomposes a word into a combination of several roots and affixes according to word-formation rules. In step (2), the query module retrieves from the database all word data that contain a given root or affix. Step (2) further comprises having a screening unit, according to the roots and affixes produced by the parsing module, compare all the word data that the query module found for a given root or affix and select the best word data for subsequent processing by the sound-cutting module. If the comparison shows that the database contains word data identical to the root or affix, that word data is the best word data. If there are several candidate word data, the screening unit selects as best the one whose word-formation type is consistent and which differs least from the root or affix.
The speech synthesis system and method of the present invention can therefore decompose a word into several roots and affixes and retrieve the best speech waveform data for each root and affix, so as to synthesize the audio data of the word automatically while achieving good voice quality.
Description of the drawings
Fig. 1 is a block diagram of the basic structure of the speech synthesis system of the present invention; and
Fig. 2 is a flowchart of the operation of the speech synthesis method of the present invention.
Embodiment
Embodiments of the present invention are described below by way of a specific example.
Fig. 1 is a schematic diagram of the speech synthesis system of the present invention applied in an electronic dictionary. As shown in the figure, the speech synthesis system 100 is used in the electronic dictionary 1 to synthesize the speech of a word automatically. The speech synthesis system 100 comprises a database 110, a parsing module 120, a query module 130, a sound-cutting module 140, and a synthesis module 150. The query module 130 further comprises a screening unit 131.
The database 110 stores a plurality of word data and their corresponding speech waveform data. In this embodiment, the database 110 is divided into a word library and a sound bank (not shown). The word library stores the data related to every word in the electronic dictionary 1, such as phonetic symbols, part of speech, and textual and pictorial definitions, and can be expanded and updated by the user. The sound bank stores the speech waveform data of the words; it is linked to the word library, and its entries correspond to the words in the word library.
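The two-part layout of the database 110 can be pictured with a minimal sketch. The class and field names below are hypothetical, chosen only for illustration; the patent does not specify a storage format, and waveforms are represented here as plain lists of samples.

```python
from dataclasses import dataclass, field


@dataclass
class WordEntry:
    """One record in the word library (field names are illustrative)."""
    spelling: str
    phonetic: str = ""
    part_of_speech: str = ""
    definition: str = ""


@dataclass
class Database:
    """Hypothetical stand-in for database 110: a word library plus a sound bank."""
    word_library: dict = field(default_factory=dict)  # spelling -> WordEntry
    sound_bank: dict = field(default_factory=dict)    # spelling -> list of samples

    def add(self, entry, waveform):
        # Both halves are keyed by spelling, so each word is linked to its waveform.
        self.word_library[entry.spelling] = entry
        self.sound_bank[entry.spelling] = list(waveform)

    def waveform_of(self, spelling):
        return self.sound_bank[spelling]


db = Database()
db.add(WordEntry("method", part_of_speech="noun"), [0.1, 0.2, 0.3])
```

Keying both structures by spelling is one simple way to keep the word library and the sound bank in correspondence when the user adds new words.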
The parsing module 120 analyzes the word-formation structure of a word and, according to the analysis result, decomposes the word into a combination of roots and affixes. Most English words are derivatives formed by combining roots and affixes (prefixes and suffixes). The main patterns are: "root + suffix", as in paint + -er forming painter; "prefix + root", as in inter- + vene forming intervene; "root + root", as in tele + scope forming telescope; and "prefix + root + suffix", as in in- + aud + -ible forming inaudible. In this embodiment, the parsing module 120 applies these word-formation rules to decompose a word into several roots and affixes; for example, the word methodology can be decomposed into the root method and the suffix ology.
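The four word-formation patterns can be illustrated with a small sketch. The affix and root lists below are tiny hypothetical samples that cover only the examples in this paragraph; the patent does not say how the parsing module 120 stores or applies its rules.

```python
# Hypothetical sample inventories; a real system would need far larger lists.
PREFIXES = ("inter", "in")
SUFFIXES = ("ology", "ible", "er")
ROOTS = ("method", "paint", "vene", "aud", "tele", "scope")


def split_roots(core):
    """Return [root] or [root, root] if core is made of known roots, else None."""
    if core in ROOTS:
        return [core]
    for i in range(1, len(core)):
        if core[:i] in ROOTS and core[i:] in ROOTS:
            return [core[:i], core[i:]]
    return None


def decompose(word):
    """Try every optional prefix and optional suffix around one or two roots."""
    prefixes = [""] + [p for p in PREFIXES if word.startswith(p)]
    suffixes = [""] + [s for s in SUFFIXES if word.endswith(s)]
    for prefix in prefixes:
        for suffix in suffixes:
            core = word[len(prefix):len(word) - len(suffix)]
            roots = split_roots(core)
            if roots is None:
                continue
            parts = []
            if prefix:
                parts.append((prefix, "prefix"))
            parts.extend((root, "root") for root in roots)
            if suffix:
                parts.append((suffix, "suffix"))
            return parts
    return None  # no known decomposition
```

For instance, `decompose("methodology")` yields the root `method` followed by the suffix `ology`, matching the example in the text.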
The query module 130 uses the decomposition result of the parsing module 120 to query the database 110 for word data related to each root and affix, and thereby obtains the corresponding speech waveform data. The query module 130 also includes a screening unit 131 which, according to the roots and affixes produced by the parsing module 120, compares all the word data that the query module 130 found for a given root or affix and selects the best word data for subsequent processing by the sound-cutting module 140 (described later). The selection principle is as follows: if the database 110 contains word data identical to the root or affix (usually the case for a root), that word data is the best; if there are several candidate word data (usually the case for an affix), the best is the one whose word-formation type is consistent and which differs least from the root or affix.
In this embodiment, the query module 130 first takes the root method produced by the parsing module 120 and retrieves from the database 110 all words containing that root, such as "method", "methodic", "methodist", and "unmethodical". The screening unit 131 then compares each retrieved word with the root "method". Because the database 110 contains the word "method", which matches the root exactly, that word is taken as the best word data for the root.
Next, the query module 130 retrieves from the database 110 all word data containing the affix ology, such as "technology", "sociology", and "biology". The screening unit 131 compares these words one by one and finds none that matches the affix "ology" exactly. It then analyzes the position of ology within the word: because ology is a suffix in "methodology", the screening unit 131 keeps only the words in which ology is a suffix. Finally, it compares how much each word differs from the affix ology, for example by taking the word with the fewest letters left after ology is removed as the best. The comparison shows that "biology" is the closest to the affix "ology", so it is selected as the best word data.
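The query and screening steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: the word list is the sample from the text, the function names `query` and `screen` are hypothetical, and "difference" is measured as the number of letters left after removing the root or affix, which is the example criterion the text gives.

```python
WORDS = ["method", "methodic", "methodist", "unmethodical",
         "technology", "sociology", "biology"]


def query(words, part):
    """Query module: every word whose spelling contains the root or affix."""
    return [w for w in words if part in w]


def screen(candidates, part, kind):
    """Screening unit: exact match first, else same position and fewest extra letters."""
    if part in candidates:
        return part  # a word identical to the root or affix is the best choice
    if kind == "suffix":
        same_position = [w for w in candidates if w.endswith(part)]
    elif kind == "prefix":
        same_position = [w for w in candidates if w.startswith(part)]
    else:
        same_position = list(candidates)
    if not same_position:
        return None
    # The smallest remainder after removing the part counts as the least difference.
    return min(same_position, key=lambda w: len(w) - len(part))


best_root = screen(query(WORDS, "method"), "method", "root")
best_suffix = screen(query(WORDS, "ology"), "ology", "suffix")
```

With the sample list, `best_root` is `"method"` (exact match) and `best_suffix` is `"biology"`, since only two letters remain after removing `ology`.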
The sound-cutting module 140 refers to the roots and affixes produced by the parsing module 120 and cuts, syllable by syllable, the word data found by the query module 130, thereby obtaining the audio data corresponding to each root and affix. In this embodiment, the query module 130 returns the word "method" for the root "method" and the word "biology" for the affix "ology". Because the word method matches the root exactly, its "speech waveform 1" data (not shown) can be used directly. For the affix "ology", the sound-cutting module 140 processes the speech waveform data of the word "biology" in units of syllables (vowel or word sound), cutting at the vowel to extract the trailing portion of the word's waveform, namely the "speech waveform 2" data (not shown) corresponding to "ology".
The synthesis module 150 arranges and combines the speech waveform data produced by the sound-cutting module 140 in one-to-one correspondence with the root-and-affix composition of the word, so as to synthesize the audio data of the word. In this embodiment, the synthesis module 150 takes the "speech waveform 1" and "speech waveform 2" data produced by the sound-cutting module 140 and arranges them according to the positions of their corresponding root and affix, that is, method (speech waveform 1) + ology (speech waveform 2), to synthesize the audio data of the word "methodology".
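In its simplest form, this arrangement step is an ordered concatenation. The sketch below assumes each root or affix has already been mapped to a list of samples; the function name `synthesize` and the toy sample values are illustrative only.

```python
def synthesize(parts, segments):
    """Concatenate the waveform of each part in the order the parts appear in the word.

    `parts` is the ordered list of roots and affixes; `segments` maps each
    part to its list of samples (one segment per part).
    """
    output = []
    for part in parts:
        output.extend(segments[part])
    return output


segments = {
    "method": [0.1, 0.2, 0.3],  # stands in for "speech waveform 1"
    "ology": [0.4, 0.5],        # stands in for "speech waveform 2"
}
methodology_wave = synthesize(["method", "ology"], segments)
```

Plain concatenation can leave an audible discontinuity at the join, which is presumably why the closing summary of the description mentions a smoothing algorithm.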
Fig. 2 is a flowchart showing the operating procedure of the speech synthesis method of the present invention, which is applied in an electronic dictionary. As shown in the figure, step S210 is carried out first: the database 110 is built in advance to store the definition data of all the words in the electronic dictionary 1 and their corresponding speech waveform data. The procedure then continues to step S220.
In step S220, the parsing module 120 analyzes the word-formation structure of the word "methodology" and, according to the analysis result, decomposes it into the root method + the suffix ology. The procedure then continues to step S230.
In step S230, the query module 130 uses the decomposition result of the parsing module 120 to query the database 110 for word data related to each root and affix, and thereby obtains the corresponding speech waveform data. In this embodiment, the query module 130 retrieves "method", "methodic", "methodist", "unmethodical", and similar word data from the database 110 for the root "method", and retrieves "technology", "sociology", "biology", and similar word data for the suffix "ology". The screening unit 131 then compares these word data one by one and selects the best word data "method" for the root "method" and the best word data "biology" for the affix "ology". The procedure then continues to step S240.
In step S240, the sound-cutting module 140 refers to the root and affix data and cuts, syllable by syllable, the best word data obtained by the query module 130, yielding the "speech waveform 1" data corresponding to the root "method" and the "speech waveform 2" data corresponding to the affix "ology". The procedure then continues to step S250.
In step S250, the synthesis module 150 takes the "speech waveform 1" and "speech waveform 2" data produced by the sound-cutting module 140 and arranges them in the order of the corresponding root method and affix ology, that is, method (speech waveform 1) + ology (speech waveform 2), to synthesize the audio data of the word.
In summary, the speech synthesis system and method of the present invention are applicable to an electronic dictionary. The method first pre-analyzes a word to identify the roots and affixes that form it, then retrieves the best speech parameters for each root and affix from the database of the electronic dictionary, and finally arranges and combines all the retrieved speech parameters using a smoothing algorithm to synthesize the audio data of the word.
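The description does not say which smoothing algorithm is used. Purely as an assumption, the sketch below shows one common way to smooth a join between two segments, a linear crossfade over a short overlap; the function name `crossfade_join` and the overlap length are illustrative, not taken from the patent.

```python
def crossfade_join(a, b, overlap):
    """Join sample lists a and b, blending the last `overlap` samples of a
    with the first `overlap` samples of b using linearly changing weights."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return list(a) + list(b)
    head = list(a[:len(a) - overlap])
    tail = list(b[overlap:])
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of b rises from near 0 to near 1
        mixed.append((1 - w) * a[len(a) - overlap + i] + w * b[i])
    return head + mixed + tail


joined = crossfade_join([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], overlap=2)
```

The blended region replaces an abrupt step between the two segments with a gradual transition, at the cost of shortening the result by `overlap` samples.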

Claims (17)

1. A speech synthesis system, characterized in that the system comprises at least:
a database for storing a plurality of word data and their corresponding speech waveform data;
a parsing module for analyzing the word-formation structure of a word and, according to the analysis result, decomposing the word into a combination of roots and affixes;
a query module for querying the database, according to the decomposition result of the parsing module, for word data related to each root and affix, and obtaining the corresponding speech waveform data;
a sound-cutting module for cutting, syllable by syllable and with reference to the roots and affixes produced by the parsing module, the word data found by the query module, to obtain the audio data corresponding to each root and affix; and
a synthesis module for arranging and combining the speech waveform data produced by the sound-cutting module in one-to-one correspondence with the root-and-affix composition of the word, to synthesize the audio data of the word.
2. The speech synthesis system as claimed in claim 1, characterized in that the system is applicable to an electronic dictionary.
3. The speech synthesis system as claimed in claim 1, characterized in that the parsing module decomposes a word into a combination of several roots and affixes according to word-formation rules.
4. The speech synthesis system as claimed in claim 3, characterized in that the affixes comprise prefixes and suffixes.
5. The speech synthesis system as claimed in claim 1, characterized in that the query module retrieves from the database all word data that contain the root or affix.
6. The speech synthesis system as claimed in claim 1 or 5, characterized in that the query module further comprises a screening unit for selecting by comparison, according to the roots and affixes produced by the parsing module, the best word data from all the word data that the query module found for the root or affix, for processing by the sound-cutting module.
7. The speech synthesis system as claimed in claim 6, characterized in that if the database contains word data identical to the root or affix, that word data is the best word data.
8. The speech synthesis system as claimed in claim 6, characterized in that when there are several candidate word data, the best word data is the one whose word-formation type is consistent and which differs least from the root or affix.
9. A speech synthesis method, applicable to a speech synthesis system, the method comprising:
(1) providing a parsing module for analyzing the word-formation structure of a word and, according to the analysis result, decomposing the word into a combination of roots and affixes;
(2) providing a query module for querying a database, according to the decomposition result of the parsing module, for word data related to each root and affix, and obtaining the corresponding speech waveform data;
(3) providing a sound-cutting module for cutting, syllable by syllable and with reference to the roots and affixes produced by the parsing module, the word data found by the query module, to obtain the audio data corresponding to each root and affix; and
(4) providing a synthesis module for arranging and combining the speech waveform data produced by the sound-cutting module in one-to-one correspondence with the root-and-affix composition of the word, to synthesize the audio data of the word.
10. The speech synthesis method as claimed in claim 9, characterized in that the speech synthesis system is applicable to an electronic dictionary.
11. The speech synthesis method as claimed in claim 9, characterized in that the data generated in each step of the method is stored in a database.
12. The speech synthesis method as claimed in claim 11, characterized in that the database also stores a plurality of word data and their corresponding speech waveform data.
13. The speech synthesis method as claimed in claim 9, characterized in that in step (1), the parsing module decomposes a word into a combination of a root and at least one affix according to word-formation rules.
14. The speech synthesis method as claimed in claim 9, characterized in that in step (2), the query module retrieves from the database all word data that contain at least one of the root and the affix.
15. The speech synthesis method as claimed in claim 14, characterized in that step (2) further comprises providing a screening unit for selecting by comparison, according to the roots and affixes produced by the parsing module, the best word data from all the word data that the query module found for the root or affix, for processing by the sound-cutting module.
16. The speech synthesis method as claimed in claim 15, characterized in that in step (2), if the comparison by the screening unit shows that the database contains word data identical to the root or affix, that word data is the best word data.
17. The speech synthesis method as claimed in claim 15, characterized in that in step (2), when there are several candidate word data, the screening unit selects as the best word data the one whose word-formation type is consistent and which differs least from the root or affix.
CNB2004100871367A 2004-11-01 2004-11-01 Speech synthesis system and method Expired - Fee Related CN100517463C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2004100871367A CN100517463C (en) 2004-11-01 2004-11-01 Speech synthesis system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2004100871367A CN100517463C (en) 2004-11-01 2004-11-01 Speech synthesis system and method

Publications (2)

Publication Number Publication Date
CN1770261A true CN1770261A (en) 2006-05-10
CN100517463C CN100517463C (en) 2009-07-22

Family

ID=36751506

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2004100871367A Expired - Fee Related CN100517463C (en) 2004-11-01 2004-11-01 Speech synthesis system and method

Country Status (1)

Country Link
CN (1) CN100517463C (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1178022A (en) * 1995-03-07 1998-04-01 英国电讯有限公司 Speech sound synthesizing device
JPH1039895A (en) * 1996-07-25 1998-02-13 Matsushita Electric Ind Co Ltd Speech synthesising method and apparatus therefor
CN1113330C (en) * 1997-08-15 2003-07-02 英业达股份有限公司 Speech Regularization Method in Speech Synthesis

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101645190B (en) * 2009-07-22 2011-03-30 合肥讯飞数码科技有限公司 Word inquiring system and inquiring method thereof
CN103680261A (en) * 2012-08-31 2014-03-26 英业达科技有限公司 Vocabulary learning system and method
CN103680261B (en) * 2012-08-31 2017-03-08 英业达科技有限公司 Lexical learning system and its method
CN105531757A (en) * 2013-09-20 2016-04-27 株式会社东芝 Voice selection assistance device, voice selection method, and program
CN108962218A (en) * 2017-05-27 2018-12-07 北京搜狗科技发展有限公司 A kind of word pronunciation method and apparatus
CN109271037A (en) * 2017-07-13 2019-01-25 北京搜狗科技发展有限公司 A kind of method for building up and device of error correction dictionary
CN109545014A (en) * 2018-12-28 2019-03-29 杭州晶智能科技有限公司 A kind of foreign language word exercising method based on interactive voice
CN110444190A (en) * 2019-08-13 2019-11-12 广州国音智能科技有限公司 Method of speech processing, device, terminal device and storage medium
CN111681467A (en) * 2020-06-01 2020-09-18 广东小天才科技有限公司 Vocabulary learning method, electronic equipment and storage medium
CN112434521A (en) * 2020-11-13 2021-03-02 北京搜狗科技发展有限公司 Vocabulary processing method and device

Also Published As

Publication number Publication date
CN100517463C (en) 2009-07-22


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090722

Termination date: 20101101