CN103531196A - Sound selection method for waveform concatenation speech synthesis - Google Patents
- Publication number: CN103531196A
- Application number: CN201310481306.9A
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Claims (9)
- 1. A unit selection method for waveform-concatenation speech synthesis, characterized in that the method comprises the following steps:
  - Step S1: performing hidden Markov model (HMM) based training on original audio extracted from an audio database, to obtain a set of acoustic models and the corresponding feature decision trees;
  - Step S2: inputting a number of training texts and searching the feature decision trees to obtain the associated acoustic models, and from them the corresponding target speech and target syllables;
  - Step S3: training a similarity classifier from the similarity between the target speech and its corresponding candidate units, together with the likelihood of each acoustic parameter of each candidate unit under the current acoustic model;
  - Step S4: inputting any text to be synthesized, using the similarity classifier to reject dissimilar candidate units, selecting the best units from the remaining candidates by the minimum-concatenation-cost principle, and concatenating them to obtain synthetic speech.
- 2. The method according to claim 1, characterized in that Step S1 further comprises:
  - Step S11: obtaining the original audio in the audio database;
  - Step S12: extracting spectral parameters and fundamental frequency (F0) parameters frame by frame from the original audio;
  - Step S13: synchronously annotating the text corresponding to the original audio, labeling the contextual feature information of each syllable in the original audio, and at the same time labeling the segment boundaries of the original audio;
  - Step S14: performing conventional HMM training on the spectral and F0 parameters of the original audio, the contextual feature labels and the segment-boundary labels, to obtain model sets for duration, F0 and spectrum, each with its own feature decision tree.
- 3. The method according to claim 2, characterized in that Step S12 further comprises:
  - Step S121: dividing the original audio into frames and applying a window to each frame;
  - Step S122: extracting the Mel-cepstral coefficients of every resulting audio frame;
  - Step S123: computing the F0 parameter of every audio frame.
- 4. The method according to claim 1, characterized in that Step S2 further comprises:
  - Step S21: inputting a plurality of syllable-balanced training texts and performing text analysis on them to obtain the corresponding contextual feature sequences;
  - Step S22: feeding the contextual feature sequences into the feature decision trees to obtain an acoustic model sequence that matches the current context;
  - Step S23: generating target speech parameters from the acoustic model sequence with a parameter generation algorithm;
  - Step S24: synthesizing the target sentence speech from the target speech parameters with a vocoder, and segmenting the target sentence speech into target syllables.
- 5. The method according to claim 4, characterized in that the text analysis extracts features from the text.
- 6. The method according to claim 4, characterized in that in Step S22, decisions are made on the clustering trees for the duration, F0 and spectral parameters respectively, according to the contextual features in the contextual feature sequence, to obtain the corresponding acoustic model sequence and duration model.
- 7. The method according to claim 4, characterized in that the target speech parameters comprise F0 and spectral parameters.
- 8. The method according to claim 4, characterized in that Step S3 further comprises:
  - Step S31: segmenting the sentences in the audio database by syllable, the resulting syllable-sized segments being the candidate units; grouping identical syllables into one class to build a candidate-unit library; and assigning, frame by frame, the spectral and F0 parameters extracted in Step S12 to each candidate unit in the library;
  - Step S32: substituting, in turn, the acoustic parameters of each unit corresponding to each target syllable into the context-dependent acoustic models obtained in Step S22, computing the probabilities of the duration, F0 and spectrum of each unit under its corresponding acoustic model, and taking the set of all these probabilities as the feature set;
  - Step S33: having a number of native Chinese speakers give a binary label (similar or dissimilar) for the similarity between each target syllable and each candidate unit, and using the result as the class attribute;
  - Step S34: training the similarity classifier on the class attribute and the feature set.
- 9. The method according to claim 8, characterized in that Step S4 further comprises:
  - Step S41: inputting the text to be synthesized and obtaining the corresponding acoustic models as in Step S22;
  - Step S42: computing, as in Step S32, the set of likelihoods of each acoustic parameter of each unit under the current acoustic models, and using it as the feature set;
  - Step S43: feeding the feature set into the similarity classifier, which predicts whether each unit belongs to the similar class or the dissimilar class;
  - Step S44: removing all units in the dissimilar class and selecting among the remaining units by the minimum-concatenation-cost principle;
  - Step S45: applying windowed smoothing to the selected units to obtain the final synthetic speech.
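Steps S121 and S123 of claim 3 (framing with a window, then one F0 value per frame) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions: the claims name neither a window type nor an F0 extractor, so the Hamming window, the autocorrelation pitch estimator, the helper names `frame_signal` and `estimate_f0`, and their default search range and voicing threshold are all hypothetical choices. Step S122 (Mel-cepstral analysis) is left out to keep the sketch short.

```python
import math

def frame_signal(x, frame_len, hop):
    """Step S121 (sketch): cut x into overlapping frames and apply a Hamming window."""
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    return [[x[start + n] * window[n] for n in range(frame_len)]
            for start in range(0, len(x) - frame_len + 1, hop)]

def estimate_f0(frame, sr, f0_min=60.0, f0_max=400.0, voicing_threshold=0.3):
    """Step S123 (sketch): autocorrelation F0 estimate in Hz; 0.0 marks an unvoiced frame."""
    n = len(frame)
    energy = sum(s * s for s in frame)
    if energy == 0.0:
        return 0.0
    lag_min = int(sr / f0_max)
    lag_max = min(int(sr / f0_min), n - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Normalized autocorrelation at this lag (1.0 would be a perfect repeat).
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag == 0 or best_r < voicing_threshold:
        return 0.0
    return sr / best_lag

sr = 8000
x = [math.sin(2.0 * math.pi * 200.0 * n / sr) for n in range(sr)]  # 1 s test tone at 200 Hz
frames = frame_signal(x, frame_len=400, hop=80)  # 50 ms frames with a 10 ms hop
print(len(frames), estimate_f0(frames[10], sr))  # 96 200.0
```

Production systems usually rely on a more robust pitch tracker; plain autocorrelation is used here only because it fits in a few lines and shows where the per-frame F0 of Step S123 comes from.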
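Step S22 and claim 6 describe passing each syllable's context through separate clustering trees for duration, F0 and spectrum. A minimal sketch of that lookup follows. The question set, the feature names (`tone`, `next_tone`, `initial`, `phrase_final`) and the leaf model IDs are hypothetical toy values; the trees of a real system would be much larger and learned during Step S14.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

Context = Dict[str, object]  # contextual features of one syllable (hypothetical names)

@dataclass
class Leaf:
    model_id: str  # ID of a clustered acoustic model (hypothetical)

@dataclass
class Question:
    name: str                        # human-readable context question
    test: Callable[[Context], bool]  # answers the question for a given context
    yes: "Node"
    no: "Node"

Node = Union[Leaf, Question]

def descend(node: Node, ctx: Context) -> str:
    """Follow yes/no answers from the root until a leaf is reached."""
    while isinstance(node, Question):
        node = node.yes if node.test(ctx) else node.no
    return node.model_id

def lookup_models(trees: Dict[str, Node], ctx: Context) -> Dict[str, str]:
    """Query one tree per stream (duration, F0, spectrum) with the same context."""
    return {stream: descend(tree, ctx) for stream, tree in trees.items()}

# Toy trees: one per stream, as in claim 6.
trees = {
    "duration": Question("phrase_final?", lambda c: c["phrase_final"],
                         Leaf("dur_long"), Leaf("dur_short")),
    "f0": Question("tone == 3?", lambda c: c["tone"] == 3,
                   Question("next_tone == 3?", lambda c: c["next_tone"] == 3,
                            Leaf("f0_sandhi"), Leaf("f0_t3")),
                   Leaf("f0_other")),
    "spectrum": Question("nasal initial?", lambda c: c["initial"] in {"m", "n"},
                         Leaf("sp_nasal"), Leaf("sp_oral")),
}
ctx = {"tone": 3, "next_tone": 2, "initial": "m", "phrase_final": True}
print(lookup_models(trees, ctx))
# {'duration': 'dur_long', 'f0': 'f0_t3', 'spectrum': 'sp_nasal'}
```

Keeping one tree per stream lets the same syllable context map to different clustered models for duration, F0 and spectrum, which is the point of deciding on each clustering tree separately.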
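Steps S31 and S32 of claim 8 build a per-syllable candidate library and turn each candidate into a feature set of likelihoods under the target model. The sketch below makes a strong simplifying assumption that is not in the claims: each stream of the target model is a single diagonal Gaussian per syllable (no HMM states or alignment), F0 is scored in the log domain, and frame scores are averaged. The dictionary keys, the `UNVOICED_FLOOR` constant and all numbers are hypothetical.

```python
import math
from collections import defaultdict

UNVOICED_FLOOR = -50.0  # hypothetical feature value when a unit has no voiced frames

def gauss_loglik(x, mean, var):
    """Log-density of vector x under a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def build_library(units):
    """Step S31 (sketch): group syllable-sized candidate units by syllable label."""
    library = defaultdict(list)
    for unit in units:
        library[unit["syllable"]].append(unit)
    return library

def likelihood_features(unit, model):
    """Step S32 (sketch): [duration, log-F0, spectrum] log-likelihoods of one candidate."""
    dur = gauss_loglik([len(unit["mcep"])], model["dur_mean"], model["dur_var"])
    voiced = [f for f in unit["f0"] if f > 0.0]
    if voiced:
        lf0 = sum(gauss_loglik([math.log(f)], model["lf0_mean"], model["lf0_var"])
                  for f in voiced) / len(voiced)
    else:
        lf0 = UNVOICED_FLOOR
    spec = sum(gauss_loglik(frame, model["mcep_mean"], model["mcep_var"])
               for frame in unit["mcep"]) / len(unit["mcep"])
    return [dur, lf0, spec]

# Toy target model for one syllable (all numbers hypothetical).
model = {"dur_mean": [10.0], "dur_var": [4.0],
         "lf0_mean": [math.log(200.0)], "lf0_var": [0.01],
         "mcep_mean": [0.0, 0.0], "mcep_var": [1.0, 1.0]}
units = [
    {"id": "u1", "syllable": "ma3", "f0": [200.0] * 10, "mcep": [[0.0, 0.0]] * 10},
    {"id": "u2", "syllable": "ma3", "f0": [100.0] * 20, "mcep": [[2.0, 2.0]] * 20},
]
library = build_library(units)
feature_set = {u["id"]: likelihood_features(u, model) for u in library["ma3"]}
```

Here `u1` matches the toy model exactly, so each of its three features is higher than the corresponding feature of `u2`, whose duration, pitch and spectrum all sit far from the model means.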
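The claims do not say which classifier Steps S34 and S43 use. As an assumption, the sketch below swaps in plain logistic regression trained by batch gradient descent, with the per-candidate likelihood features standardized first (raw log-likelihoods are large negative numbers). Labels follow Step S33 (1 = similar, 0 = dissimilar). The function names, learning rate, epoch count and toy data are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_standardizer(X):
    """Per-feature mean and standard deviation (std of 0 is replaced by 1)."""
    n, d = len(X), len(X[0])
    mean = [sum(x[j] for x in X) / n for j in range(d)]
    std = [math.sqrt(sum((x[j] - mean[j]) ** 2 for x in X) / n) or 1.0 for j in range(d)]
    return mean, std

def standardize(x, mean, std):
    return [(xj - m) / s for xj, m, s in zip(x, mean, std)]

def train_similarity_classifier(X, y, lr=0.5, epochs=300, l2=1e-3):
    """Step S34 (sketch): logistic regression; y[i] = 1 means 'similar'."""
    mean, std = fit_standardizer(X)
    Z = [standardize(x, mean, std) for x in X]
    n, d = len(Z), len(Z[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for z, target in zip(Z, y):
            err = sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + b) - target
            for j in range(d):
                gw[j] += err * z[j]
            gb += err
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return {"w": w, "b": b, "mean": mean, "std": std}

def is_similar(clf, features, threshold=0.5):
    """Step S43 (sketch): predict whether a candidate falls in the 'similar' class."""
    z = standardize(features, clf["mean"], clf["std"])
    return sigmoid(sum(wj * zj for wj, zj in zip(clf["w"], z)) + clf["b"]) >= threshold

# Toy [duration, F0, spectrum] log-likelihood features with listener labels.
X = [[-1.0, -1.2, -0.8], [-1.5, -0.9, -1.1], [-0.7, -1.3, -1.0],
     [-6.0, -5.5, -7.0], [-5.2, -6.8, -6.1], [-7.5, -6.0, -5.8]]
y = [1, 1, 1, 0, 0, 0]
clf = train_similarity_classifier(X, y)
```

Any binary classifier that accepts the Step S32 feature set could take its place; logistic regression is chosen only because its decision is easy to inspect and it needs no third-party library.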
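Step S44 first discards every candidate classified as dissimilar and then picks, among the survivors, the sequence with the lowest total concatenation cost. A Viterbi-style dynamic program is one standard way to find that minimum; the sketch below assumes it, and it also assumes a hypothetical join cost (Euclidean distance between the last frame of one unit and the first frame of the next), since the claims do not define the cost. Unit IDs and frame values are toy data.

```python
import math
from typing import List, Sequence, Tuple

# A candidate unit: (unit_id, first_frame, last_frame); frames are feature vectors.
Unit = Tuple[str, Sequence[float], Sequence[float]]

def prune(scored: List[List[Tuple[Unit, bool]]]) -> List[List[Unit]]:
    """Drop every candidate the classifier marked dissimilar (flag False)."""
    return [[unit for unit, similar in position if similar] for position in scored]

def join_cost(left: Unit, right: Unit) -> float:
    """Hypothetical concatenation cost: distance between the boundary frames."""
    return math.dist(left[2], right[1])

def select_units(candidates: List[List[Unit]]) -> Tuple[List[str], float]:
    """Viterbi search for the unit sequence with minimum total concatenation cost."""
    if not candidates or any(not pos for pos in candidates):
        raise ValueError("every target syllable needs at least one surviving candidate")
    cost = [0.0] * len(candidates[0])
    backptrs = []
    for t in range(1, len(candidates)):
        prev = candidates[t - 1]
        new_cost, ptr = [], []
        for cur in candidates[t]:
            # Best predecessor for this candidate given accumulated cost so far.
            k = min(range(len(prev)), key=lambda i: cost[i] + join_cost(prev[i], cur))
            ptr.append(k)
            new_cost.append(cost[k] + join_cost(prev[k], cur))
        cost = new_cost
        backptrs.append(ptr)
    j = min(range(len(cost)), key=cost.__getitem__)
    total, path = cost[j], [j]
    for ptr in reversed(backptrs):  # trace the best path back to the first syllable
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [candidates[t][k][0] for t, k in enumerate(path)], total

A1 = ("A1", [0.0, 0.0], [1.0, 0.0])
A2 = ("A2", [0.0, 0.0], [5.0, 5.0])
B1 = ("B1", [1.5, 0.0], [2.0, 0.0])
B2 = ("B2", [1.0, 0.0], [2.0, 0.0])  # cheapest join, but judged dissimilar
C1 = ("C1", [2.0, 0.0], [3.0, 0.0])
scored = [[(A1, True), (A2, True)], [(B1, True), (B2, False)], [(C1, True)]]
print(select_units(prune(scored)))  # (['A1', 'B1', 'C1'], 0.5)
```

Without pruning, the search would pick `B2` because its join is free; removing it first means the similarity decision takes priority over smooth joins, which is the ordering Step S44 describes.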
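Step S45 only states that the selected units are windowed and smoothed. One common way to do this, assumed here rather than taken from the claims, is to overlap adjacent units by a short span and cross-fade them with complementary raised-cosine ramps. The function name `crossfade_concat` and the overlap length are hypothetical.

```python
import math

def crossfade_concat(units, overlap):
    """Concatenate waveform units, blending each join over `overlap` samples.

    The outgoing tail is weighted by (1 - ramp) and the incoming head by ramp,
    where ramp rises from near 0 to near 1 along a raised cosine.
    """
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        for i in range(n):
            ramp = 0.5 - 0.5 * math.cos(math.pi * (i + 1) / (n + 1))
            pos = len(out) - n + i
            out[pos] = out[pos] * (1.0 - ramp) + unit[i] * ramp
        out.extend(unit[n:])
    return out

# Joining a silent unit to a constant unit shows the smooth rise across the join.
joined = crossfade_concat([[0.0] * 50, [1.0] * 50], overlap=10)
```

Because the two weights always sum to one, a signal that is identical on both sides of a join passes through unchanged, while a level mismatch is spread over the overlap instead of producing an audible click.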
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310481306.9A (CN103531196B) | 2013-10-15 | 2013-10-15 | A unit selection method for waveform-concatenation speech synthesis |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310481306.9A (CN103531196B) | 2013-10-15 | 2013-10-15 | A unit selection method for waveform-concatenation speech synthesis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103531196A (en) | 2014-01-22 |
CN103531196B (en) | 2016-04-13 |
Family ID: 49933149
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310481306.9A (Active; CN103531196B) | A unit selection method for waveform-concatenation speech synthesis | 2013-10-15 | 2013-10-15 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103531196B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH04313034A (en) * | 1990-10-16 | 1992-11-05 | Internatl Business Mach Corp <Ibm> | Synthesized-speech generating method |
CN101178896A (en) * | 2007-12-06 | 2008-05-14 | 安徽科大讯飞信息科技股份有限公司 | Unit selection voice synthetic method based on acoustics statistical model |
CN101471071A (en) * | 2007-12-26 | 2009-07-01 | 中国科学院自动化研究所 | Speech synthesis system based on mixed hidden Markov model |
CN102496363A (en) * | 2011-11-11 | 2012-06-13 | 北京宇音天下科技有限公司 | Correction method for Chinese speech synthesis tone |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104575488A (en) * | 2014-12-25 | 2015-04-29 | 北京时代瑞朗科技有限公司 | Text information-based waveform concatenation voice synthesizing method |
WO2017028003A1 (en) * | 2015-08-14 | 2017-02-23 | 华侃如 | Hidden markov model-based voice unit concatenation method |
CN105304081A (en) * | 2015-11-09 | 2016-02-03 | 上海语知义信息技术有限公司 | Smart household voice broadcasting system and voice broadcasting method |
CN105719641A (en) * | 2016-01-19 | 2016-06-29 | 百度在线网络技术(北京)有限公司 | Voice selection method and device used for waveform splicing of voice synthesis |
CN105719641B (en) * | 2016-01-19 | 2019-07-30 | 百度在线网络技术(北京)有限公司 | Sound method and apparatus are selected for waveform concatenation speech synthesis |
CN105654940A (en) * | 2016-01-26 | 2016-06-08 | 百度在线网络技术(北京)有限公司 | Voice synthesis method and device |
CN106356052A (en) * | 2016-10-17 | 2017-01-25 | 腾讯科技(深圳)有限公司 | Voice synthesis method and device |
CN106356052B (en) * | 2016-10-17 | 2019-03-15 | 腾讯科技(深圳)有限公司 | Phoneme synthesizing method and device |
US10832652B2 (en) | 2016-10-17 | 2020-11-10 | Tencent Technology (Shenzhen) Company Limited | Model generating method, and speech synthesis method and apparatus |
CN106652986A (en) * | 2016-12-08 | 2017-05-10 | 腾讯音乐娱乐(深圳)有限公司 | Song audio splicing method and device |
CN106652986B (en) * | 2016-12-08 | 2020-03-20 | 腾讯音乐娱乐(深圳)有限公司 | Song audio splicing method and equipment |
CN106970950A (en) * | 2017-03-07 | 2017-07-21 | 腾讯音乐娱乐(深圳)有限公司 | The lookup method and device of similar audio data |
CN106970950B (en) * | 2017-03-07 | 2021-08-24 | 腾讯音乐娱乐(深圳)有限公司 | Similar audio data searching method and device |
CN107492371A (en) * | 2017-07-17 | 2017-12-19 | 广东讯飞启明科技发展有限公司 | A kind of big language material sound storehouse method of cutting out |
CN107507619A (en) * | 2017-09-11 | 2017-12-22 | 厦门美图之家科技有限公司 | Phonetics transfer method, device, electronic equipment and readable storage medium storing program for executing |
CN107507619B (en) * | 2017-09-11 | 2021-08-20 | 厦门美图之家科技有限公司 | Voice conversion method and device, electronic equipment and readable storage medium |
CN109147799A (en) * | 2018-10-18 | 2019-01-04 | 广州势必可赢网络科技有限公司 | A kind of method, apparatus of speech recognition, equipment and computer storage medium |
CN109686358A (en) * | 2018-12-24 | 2019-04-26 | 广州九四智能科技有限公司 | The intelligent customer service phoneme synthesizing method of high-fidelity |
CN111899715A (en) * | 2020-07-14 | 2020-11-06 | 升智信息科技(南京)有限公司 | Speech synthesis method |
CN111899715B (en) * | 2020-07-14 | 2024-03-29 | 升智信息科技(南京)有限公司 | Speech synthesis method |
CN113011127A (en) * | 2021-02-08 | 2021-06-22 | 杭州网易云音乐科技有限公司 | Text phonetic notation method and device, storage medium and electronic equipment |
CN113096650A (en) * | 2021-03-03 | 2021-07-09 | 河海大学 | Acoustic decoding method based on prior probability |
CN113096650B (en) * | 2021-03-03 | 2023-12-08 | 河海大学 | Acoustic decoding method based on prior probability |
CN113421544A (en) * | 2021-06-30 | 2021-09-21 | 平安科技(深圳)有限公司 | Singing voice synthesis method and device, computer equipment and storage medium |
CN113421544B (en) * | 2021-06-30 | 2024-05-10 | 平安科技(深圳)有限公司 | Singing voice synthesizing method, singing voice synthesizing device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN103531196B (en) | 2016-04-13 |
Similar Documents
Publication | Title |
---|---|
CN103531196B (en) | A unit selection method for waveform-concatenation speech synthesis |
CN102779508B (en) | Sound bank generates Apparatus for () and method therefor, speech synthesis system and method thereof |
CN101178896B (en) | Unit selection voice synthetic method based on acoustics statistical model |
CN107452379B (en) | Dialect language identification method and virtual reality teaching method and system |
CN104112444B (en) | A kind of waveform concatenation phoneme synthesizing method based on text message |
Xie et al. | Sequence error (SE) minimization training of neural network for voice conversion. |
CN101000765A (en) | Speech synthetic method based on rhythm character |
CN108228576B (en) | Text translation method and device |
CN1835075B (en) | Speech synthetizing method combined natural sample selection and acaustic parameter to build mould |
CN105023573A (en) | Speech syllable/vowel/phone boundary detection using auditory attention cues |
CN105760852A (en) | Driver emotion real time identification method fusing facial expressions and voices |
CN101751922A (en) | Text-independent speech conversion system based on HMM model state mapping |
JP4829477B2 (en) | Voice quality conversion device, voice quality conversion method, and voice quality conversion program |
CN109346056A (en) | Phoneme synthesizing method and device based on depth measure network |
Xie et al. | A KL divergence and DNN approach to cross-lingual TTS |
CN106297765B (en) | Phoneme synthesizing method and system |
CN102254554A (en) | Method for carrying out hierarchical modeling and predicating on mandarin accent |
CN109036376A (en) | A kind of the south of Fujian Province language phoneme synthesizing method |
CN108172211A (en) | Adjustable waveform concatenation system and method |
CN106297766B (en) | Phoneme synthesizing method and system |
Shah et al. | Nonparallel emotional voice conversion for unseen speaker-emotion pairs using dual domain adversarial network & virtual domain pairing |
CN104575488A (en) | Text information-based waveform concatenation voice synthesizing method |
CN104916282A (en) | Speech synthesis method and apparatus |
Kayte et al. | A Marathi Hidden-Markov Model Based Speech Synthesis System |
CN102511061A (en) | Method and apparatus for fusing voiced phoneme units in text-to-speech |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 2017-06-02. Patentee before: Institute of Automation, Chinese Academy of Sciences, No. 95 Zhongguancun East Road, Haidian District, Beijing 100190. Patentee after: Beijing Rui Heng Heng Xun Technology Co., Ltd., No. 405-346, 4th Floor, Building A, No. 1, Courtyard 2, Yongcheng North Road, Haidian District, Beijing 100094. |
TR01 | Transfer of patent right | Effective date of registration: 2018-12-24. Patentee before: Beijing Rui Heng Heng Xun Technology Co., Ltd., No. 405-346, 4th Floor, Building A, No. 1, Courtyard 2, Yongcheng North Road, Haidian District, Beijing 100094. Patentee after: Institute of Automation, Chinese Academy of Sciences, Zhongguancun East Road, Haidian District, Beijing 100190. |
TR01 | Transfer of patent right | Effective date of registration: 2019-05-28. Patentee before: Institute of Automation, Chinese Academy of Sciences, Zhongguancun East Road, Haidian District, Beijing 100190. Patentee after: Limit Element (Hangzhou) Intelligent Technology Co., Ltd., Room 1105, 11/F, Building 4, No. 9 Jiuhuan Road, Jianggan District, Hangzhou, Zhejiang 310019. |
CP01 | Change in the name or title of a patent holder | Patentee before: Limit Element (Hangzhou) Intelligent Technology Co., Ltd. Patentee after: Zhongke Extreme Element (Hangzhou) Intelligent Technology Co., Ltd. Address (unchanged): Room 1105, 11/F, Building 4, No. 9 Jiuhuan Road, Jianggan District, Hangzhou, Zhejiang 310019. |