
CN103531196A - Sound selection method for waveform concatenation speech synthesis - Google Patents

Sound selection method for waveform concatenation speech synthesis Download PDF

Info

Publication number
CN103531196A
CN103531196A
Authority
CN
China
Prior art keywords
primitive
obtains
candidate
syllable
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310481306.9A
Other languages
Chinese (zh)
Other versions
CN103531196B (en)
Inventor
陶建华
张冉
温正棋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Extreme Element (Hangzhou) Intelligent Technology Co., Ltd.
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation, Chinese Academy of Sciences
Priority to CN201310481306.9A
Publication of CN103531196A
Application granted
Publication of CN103531196B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a sound selection method for waveform concatenation speech synthesis. The method comprises the following steps: on the basis of the original audio, carrying out hidden-Markov-model training to obtain an acoustic model set and the corresponding feature decision trees; inputting a number of training texts and, on the basis of the feature decision trees, searching out the related acoustic models to obtain the corresponding target speech and target syllables; training a similarity classifier from the similarity between the target speech and its corresponding candidate primitives and from the likelihood of each acoustic parameter of the candidate primitives under the current acoustic model; and, for an arbitrary input text to be synthesized, rejecting the dissimilar candidate primitives with the similarity classifier, selecting the best primitive from the remaining candidates under the concatenation-cost-minimization criterion, and concatenating the selected primitives to obtain the synthetic speech. The method can synthesize speech of higher sound quality.

Description

Sound selection method for waveform concatenation speech synthesis
Technical field
The present invention relates to the field of intelligent information processing, and in particular to a sound selection method for waveform concatenation speech synthesis.
Background art
Speech is one of the main means of human information exchange, and speech synthesis technology aims to enable computers to produce continuous speech of high intelligibility and high naturalness. Early research on speech synthesis mainly adopted parametric synthesis methods; later, with the development of computer technology, waveform concatenation methods appeared. As speech corpora keep growing, the number of candidate primitives grows with them, and how to select the best primitives for a given input text and concatenate them attracts more and more attention.
The parametric speech synthesis system based on hidden Markov models and the concatenation system based on unit selection have been the mainstream speech synthesis technologies of the past decade or so. Hybrid speech synthesis systems combine the advantages of both: the acoustic models trained for the former guide unit selection, so that more suitable primitives are selected for concatenation. The sound selection method of such hybrid systems is more stable than traditional concatenation methods and requires less manual intervention, but it still has many deficiencies, mainly the following:
1. The sound selection method does not reflect the perceptual effect on the human ear: a high score under the existing selection method does not mean that the speech best suited to human hearing has been selected;
2. The sound selection method selects units by weighted summation of factors: a sub-cost is computed for each feature of a primitive, each sub-cost is given a weight, and the weighted sub-costs are summed into a total selection cost. This assumes that all factors contribute linearly to the acceptability of a primitive, which clearly does not match the facts.
Summary of the invention
To solve one or more of the above problems, the invention provides a sound selection method for waveform concatenation speech synthesis. The method incorporates human subjective auditory perception, selects the primitives best suited to the human ear, and finally concatenates them into good speech.
The sound selection method for waveform concatenation speech synthesis provided by the invention comprises the following steps:
Parameters are extracted from the original speech corpus and, combined with the corresponding text annotation, hidden Markov model training is carried out. A number of training texts are input and analyzed; the decision trees are searched for the relevant models, a parameter generation algorithm synthesizes the corresponding target speech, and the speech is cut into syllables to obtain the target syllables. The similarity between each synthesized syllable and its candidate primitives, judged by human listeners, serves as the class attribute, while the likelihood of each acoustic parameter of a candidate primitive under the current model serves as the input feature vector; from these a similarity classifier is trained. Given any text to be synthesized, the classifier rejects the dissimilar candidate primitives; from the remaining candidates the best primitive is selected under the concatenation-cost-minimization criterion, and the selected primitives are finally concatenated into the synthetic speech.
From the above technical scheme it can be seen that the sound selection method for waveform concatenation speech synthesis of the invention has the following beneficial effects:
(1) Primitives similar to a parametrically synthesized syllable share its stress and intonation, so speech selected by this criterion and concatenated is both stable and consistent;
(2) Primitives similar to a parametrically synthesized syllable are also easier to concatenate, because their features tend to agree at the boundaries; little or no smoothing is needed, which preserves the smoothness and naturalness of the original speech;
(3) Human subjective hearing is introduced into the selection, so the selected result better matches human subjective preference.
Brief description of the drawings
Fig. 1 is a flow diagram of the sound selection method for waveform concatenation speech synthesis according to an embodiment of the invention;
Fig. 2 is the acoustic model training flow according to an embodiment of the invention;
Fig. 3 is the hidden Markov model training flow diagram according to an embodiment of the invention;
Fig. 4 is the generation flow diagram of the target syllables according to an embodiment of the invention;
Fig. 5 is the classifier training flow diagram according to an embodiment of the invention;
Fig. 6 is the flow diagram of sound selection with the classifier according to an embodiment of the invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the invention clearer, the invention is described below in more detail with reference to specific embodiments and the accompanying drawings.
It should be noted that similar or identical parts use the same figure numbers throughout the drawings and description. Implementations not shown or described in the drawings are forms known to those of ordinary skill in the art. In addition, although parameters with particular values may be given herein, the parameters need not exactly equal the corresponding values; they may approximate them within acceptable error margins or design constraints.
Fig. 1 is a flow diagram of the sound selection method for waveform concatenation speech synthesis according to an embodiment of the invention. As shown in Fig. 1, the method comprises the following steps:
Step S1: based on the original audio extracted from an audio database, carry out hidden Markov model training to obtain an acoustic model set and the corresponding feature decision trees.
As shown in Fig. 2, step S1 further comprises the following steps:
Step S11: obtain the original audio in the audio database.
Step S12: extract the spectrum parameters and fundamental frequency (F0) parameters frame by frame from the original audio.
Step S12 further comprises the following steps:
Step S121: apply framing and windowing to the original audio.
Framing and windowing are audio signal processing techniques conventional in the prior art and are not repeated here.
Step S122: for every frame obtained, extract its Mel cepstral coefficients, for example with the STRAIGHT algorithm.
In an embodiment of the invention, 25th-order static Mel cepstral coefficients are extracted first, and their first-order and second-order differences are then computed, giving a final 75-dimensional Mel cepstral feature.
Step S123: compute the F0 parameters of every frame.
In an embodiment of the invention, the F0 of every frame is computed first, and its first-order and second-order differences are likewise computed, giving a final 3-dimensional F0 feature.
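To illustrate the feature layout just described, the following is a minimal Python sketch of stacking first- and second-order differences onto the static parameters. The exact difference formula is not specified in the patent, so the simple numpy gradient used here is an assumption, and the input arrays are stand-ins for real STRAIGHT output.

    import numpy as np

    def add_deltas(static):
        """Append first- and second-order differences to a (frames, dims) matrix."""
        delta = np.gradient(static, axis=0)     # first-order difference
        delta2 = np.gradient(delta, axis=0)     # second-order difference
        return np.concatenate([static, delta, delta2], axis=1)

    mcep = np.random.randn(200, 25)             # stand-in: 25 static Mel cepstral coefficients per frame
    print(add_deltas(mcep).shape)               # (200, 75): the 75-dimensional spectrum feature
    f0 = np.random.rand(200, 1)                 # stand-in: per-frame F0
    print(add_deltas(f0).shape)                 # (200, 3): the 3-dimensional F0 feature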
Step S13: synchronously annotate the text corresponding to the original audio, marking the contextual feature information of each syllable in the original audio, and at the same time segmentally label the original audio.
In an embodiment of the invention, contextual feature annotation is carried out in units of syllables, using 66-dimensional prosodic structure features and 24-dimensional pronunciation features; the annotation is done mainly by hand.
The exact boundaries in the segmental labeling are not critical, and the invention adopts the result of automatic segmentation.
Step S14: based on the spectrum and F0 parameters of the original audio, the contextual feature annotation and the segmental labeling, carry out conventional hidden Markov model training to obtain a model set covering duration, F0 and spectrum, together with a feature decision tree for each.
In this step, multi-space probability distributions are used for modeling. In an embodiment of the invention, 10-state hidden Markov models are trained for the given parameters and feature sequences. The concrete training flow is shown in Fig. 3.
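For orientation only, the following is a rough Python sketch of fitting one 10-state left-to-right HMM with the hmmlearn library. It is a simplified stand-in for what this step describes: systems of this kind model F0 with multi-space probability distributions and cluster states with decision trees, neither of which hmmlearn provides, and the feature matrix below is random placeholder data.

    import numpy as np
    from hmmlearn import hmm

    n_states = 10
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                            init_params="mc",   # let fit() initialize means/covariances only
                            params="tmc",       # re-estimate transitions, means, covariances
                            n_iter=20)
    # Left-to-right topology: each state either loops or advances to the next.
    trans = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        trans[i, i] = trans[i, i + 1] = 0.5
    trans[-1, -1] = 1.0
    model.startprob_ = np.eye(n_states)[0]      # always start in the first state
    model.transmat_ = trans

    # Placeholder observations: 75-dim spectrum + 3-dim F0 per frame, two utterances.
    feats = np.random.randn(400, 78)
    model.fit(feats, lengths=[200, 200])
    print(model.score(feats[:200]))             # log-likelihood of one utterance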
Step S2: input a number of training texts, search the feature decision trees for the associated acoustic models, and from them obtain the corresponding target speech and target syllables.
As shown in Fig. 4, step S2 further comprises the following steps:
Step S21: input a number of syllable-balanced training texts; front-end text analysis, using methods such as maximum entropy, extracts the features in the text and produces the corresponding contextual feature sequence.
Maximum-entropy-based text analysis is a text analysis technique conventional in the prior art and is not repeated here.
Chinese has more than 1300 common syllables; therefore, in an embodiment of the invention, 500 syllable-balanced texts are input and passed through front-end text analysis to obtain the corresponding context attributes.
Step S22: input the contextual feature sequence into the feature decision trees to obtain the acoustic model sequence matching the current context.
In this step, according to the contextual features in the contextual feature sequence, decisions are made separately on the clustering trees for duration, F0 and spectrum parameters, yielding the corresponding acoustic model sequence and duration models.
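As an illustration of this lookup, a minimal Python sketch of walking one context clustering tree follows; the node layout, question predicates and model identifiers are hypothetical stand-ins, not the trees actually produced by the training in step S14.

    from dataclasses import dataclass
    from typing import Callable, Optional

    @dataclass
    class Node:
        question: Optional[Callable[[dict], bool]] = None   # yes/no context question
        yes: Optional["Node"] = None
        no: Optional["Node"] = None
        model_id: Optional[str] = None                      # set only at leaves

    def decide(tree, context):
        """Follow the yes/no context questions down to a leaf and return its model id."""
        node = tree
        while node.model_id is None:
            node = node.yes if node.question(context) else node.no
        return node.model_id

    # Hypothetical two-question tree for one F0 state cluster.
    tree = Node(question=lambda c: c["tone"] == 4,
                yes=Node(model_id="f0_cluster_17"),
                no=Node(question=lambda c: c["position_in_phrase"] == "final",
                        yes=Node(model_id="f0_cluster_3"),
                        no=Node(model_id="f0_cluster_8")))
    print(decide(tree, {"tone": 2, "position_in_phrase": "final"}))   # f0_cluster_3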
Step S23: based on the acoustic model sequence, obtain the target speech parameters with a parameter generation algorithm.
The target speech parameters comprise the F0 and spectrum parameters.
Step S24: based on the target speech parameters, synthesize the target sentence speech with a vocoder, and cut the target sentence speech into target syllables.
In this step, the target syllables obtained by cutting serve as the target speech for the similarity comparison.
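Since the syllable boundaries are known from the duration models (or a forced alignment), the cutting itself reduces to slicing the synthesized waveform; a minimal Python sketch, with the boundary list assumed as given:

    import numpy as np

    def cut_syllables(wave, boundaries, sr):
        """Slice a waveform into syllable segments; boundaries is a list of
        (start_sec, end_sec) per syllable, assumed known from the duration
        models or a forced alignment."""
        return [wave[int(s * sr):int(e * sr)] for s, e in boundaries]

    sr = 16000
    wave = np.random.randn(2 * sr)               # stand-in for the vocoder output
    target_syllables = cut_syllables(wave, [(0.0, 0.31), (0.31, 0.64)], sr)
    print([len(s) for s in target_syllables])    # samples per target syllable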
Step S3: train a similarity classifier from the similarity between the target speech and its corresponding candidate primitives, together with the likelihood of each acoustic parameter of the candidate primitives under the current acoustic model.
As shown in Fig. 5, step S3 further comprises the following steps:
Step S31: cut the sentences in the audio database by syllable; the segments obtained, one per syllable, are the candidate primitives. Group identical syllables into one class to build the candidate primitive library, and assign the spectrum and F0 parameters extracted in step S12 frame by frame to each candidate primitive in the library.
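A minimal Python sketch of such a candidate primitive library as a mapping from syllable label to candidates; the corpus contents and field names are hypothetical placeholders:

    from collections import defaultdict, namedtuple

    # Hypothetical record for one syllable segment cut from a database sentence.
    Syllable = namedtuple("Syllable", "label wave mcep f0 dur")

    corpus = [                                   # placeholder for the audio database
        [Syllable("ni3", wave=[0.0] * 4800, mcep=None, f0=None, dur=0.30),
         Syllable("hao3", wave=[0.0] * 5600, mcep=None, f0=None, dur=0.35)],
    ]

    primitive_db = defaultdict(list)             # syllable label -> candidate primitives
    for sentence in corpus:
        for syl in sentence:                     # one candidate primitive per syllable segment
            primitive_db[syl.label].append(syl)  # mcep/f0 hold the frame-wise parameters of step S12

    print({label: len(cands) for label, cands in primitive_db.items()})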
Step S32: bring the acoustic parameters of each primitive corresponding to each target syllable, in turn, into the context acoustic models obtained in step S22; compute the probability of the duration, F0 and spectrum of each primitive under its corresponding acoustic model; and take the set of all these probabilities as the feature set.
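A minimal Python sketch of turning those per-stream probabilities into one feature vector per candidate. The patent does not spell out the model form at this level, so the scalar Gaussian duration model and diagonal Gaussians over frames are illustrative assumptions:

    import numpy as np
    from scipy.stats import norm, multivariate_normal

    def likelihood_features(primitive, models):
        """Log-likelihoods of a candidate's duration, spectrum and F0 under the
        target-context models, concatenated into one feature vector."""
        feats = [norm.logpdf(primitive["dur"], models["dur_mean"], models["dur_std"])]
        for stream in ("mcep", "f0"):
            gauss = multivariate_normal(models[stream + "_mean"],
                                        np.diag(models[stream + "_var"]))
            feats.append(gauss.logpdf(primitive[stream]).mean())  # mean per-frame score
        return np.array(feats)

    # Placeholder candidate and target-context models (diagonal Gaussians).
    primitive = {"dur": 0.30,
                 "mcep": np.random.randn(30, 75),
                 "f0": np.random.randn(30, 3)}
    models = {"dur_mean": 0.28, "dur_std": 0.05,
              "mcep_mean": np.zeros(75), "mcep_var": np.ones(75),
              "f0_mean": np.zeros(3), "f0_var": np.ones(3)}
    print(likelihood_features(primitive, models))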
Step S33: convene a number of native Chinese speakers to give a binary annotation, similar or dissimilar, of the similarity between the target syllables and the candidate primitives, and take the result as the class attribute.
The number of syllables per class varies; to reduce manual effort, in an embodiment of the invention at most 30 syllables per class are used for the similarity comparison.
Step S34: based on the class attribute and the feature set, train the similarity classifier.
In an embodiment of the invention, the similarity classifier may be a CART classifier or an SVM classifier; experiments show that an SVM with a second-order polynomial kernel classifies better.
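A minimal Python sketch of that classifier with scikit-learn, using the second-order polynomial kernel the experiments favored; X stands for the likelihood feature vectors of step S32 and y for the binary listener judgments of step S33 (placeholder data here):

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 7))                # placeholder likelihood feature vectors (step S32)
    y = rng.integers(0, 2, size=300)             # placeholder labels: 1 similar, 0 dissimilar (step S33)

    clf = SVC(kernel="poly", degree=2)           # second-order polynomial kernel
    clf.fit(X, y)

    candidates = rng.normal(size=(20, 7))
    keep = clf.predict(candidates) == 1          # steps S43/S44: drop predicted-dissimilar units
    print(int(keep.sum()), "of", len(candidates), "candidates kept")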
Step S4: input any text to be synthesized, reject the dissimilar candidate primitives with the similarity classifier, select the best primitive from the remaining candidates under the concatenation-cost-minimization criterion, and concatenate the selected primitives into the synthetic speech.
As shown in Fig. 6, step S4 further comprises the following steps:
Step S41: input the text to be synthesized and obtain the corresponding acoustic models according to step S22.
Step S42: compute, according to step S32, the set of likelihoods of each acoustic parameter of every primitive under the current acoustic model, and take it as the feature set.
Step S43: input the feature set into the similarity classifier, which predicts whether each primitive belongs to the similar class or the dissimilar class.
Step S44: remove all primitives in the dissimilar class, and select among the remaining primitives under the concatenation-cost-minimization criterion.
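Selection under the concatenation-cost-minimization criterion is naturally solved by dynamic programming over the lattice of remaining candidates. A minimal Python sketch follows; the patent does not define the exact join cost, so the spectral-distance placeholder is an assumption:

    import numpy as np

    def select_units(candidates, join_cost):
        """Pick one candidate per target position minimizing the summed
        concatenation cost, via Viterbi-style dynamic programming."""
        best = np.zeros(len(candidates[0]))      # best path cost ending in each candidate
        back = []                                # backpointers, one array per transition
        for prev, cur in zip(candidates, candidates[1:]):
            costs = np.array([[best[i] + join_cost(p, c) for i, p in enumerate(prev)]
                              for c in cur])
            back.append(costs.argmin(axis=1))
            best = costs.min(axis=1)
        path = [int(best.argmin())]
        for bp in reversed(back):                # trace the cheapest path backwards
            path.append(int(bp[path[-1]]))
        return list(reversed(path))

    def join_cost(prev_unit, cur_unit):
        """Assumed join cost: distance between boundary feature vectors."""
        return float(np.linalg.norm(prev_unit - cur_unit))

    # Placeholder lattice: 3 target syllables with a few surviving candidates each,
    # each candidate summarized here by a 4-dim boundary feature vector.
    rng = np.random.default_rng(1)
    candidates = [rng.normal(size=(n, 4)) for n in (3, 2, 4)]
    print(select_units(candidates, join_cost))   # one chosen index per target syllable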
Step S45: apply windowing to smooth the selected primitives and obtain the final synthetic speech.
In summary, the invention proposes a sound selection method for waveform concatenation speech synthesis, and the method can synthesize speech of higher sound quality.
It should be noted that the above implementations of the components are not limited to the ones mentioned in the embodiments; those of ordinary skill in the art can simply substitute them, for example:
(1) The spectrum parameter used in training is the Mel cepstral coefficient; it can be replaced by other parameters, such as line spectral pair parameters of a different order.
(2) The number of input sentences used in classifier training can be increased or decreased according to the desired computational accuracy.
The specific embodiments described above further explain the objects, technical solutions and beneficial effects of the invention. It should be understood that the above are only specific embodiments of the invention and do not limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (9)

  1. A sound selection method for waveform concatenation speech synthesis, characterized in that the method comprises the following steps:
    Step S1: based on the original audio extracted from an audio database, carry out hidden Markov model training to obtain an acoustic model set and the corresponding feature decision trees;
    Step S2: input a number of training texts, search the feature decision trees for the associated acoustic models, and from them obtain the corresponding target speech and target syllables;
    Step S3: train a similarity classifier from the similarity between the target speech and its corresponding candidate primitives, together with the likelihood of each acoustic parameter of the candidate primitives under the current acoustic model;
    Step S4: input any text to be synthesized, reject the dissimilar candidate primitives with the similarity classifier, select the best primitive from the remaining candidates under the concatenation-cost-minimization criterion, and concatenate the selected primitives into the synthetic speech.
  2. The method according to claim 1, characterized in that step S1 further comprises the following steps:
    Step S11: obtain the original audio in the audio database;
    Step S12: extract the spectrum parameters and fundamental frequency (F0) parameters frame by frame from the original audio;
    Step S13: synchronously annotate the text corresponding to the original audio, marking the contextual feature information of each syllable in the original audio, and at the same time segmentally label the original audio;
    Step S14: based on the spectrum and F0 parameters of the original audio, the contextual feature annotation and the segmental labeling, carry out conventional hidden Markov model training to obtain a model set covering duration, F0 and spectrum, together with a feature decision tree for each.
  3. The method according to claim 2, characterized in that step S12 further comprises the following steps:
    Step S121: apply framing and windowing to the original audio;
    Step S122: extract the Mel cepstral coefficients of every frame obtained;
    Step S123: compute the F0 parameters of every frame.
  4. The method according to claim 1, characterized in that step S2 further comprises the following steps:
    Step S21: input a number of syllable-balanced training texts and obtain the corresponding contextual feature sequence through text analysis;
    Step S22: input the contextual feature sequence into the feature decision trees to obtain the acoustic model sequence matching the current context;
    Step S23: based on the acoustic model sequence, obtain the target speech parameters with a parameter generation algorithm;
    Step S24: based on the target speech parameters, synthesize the target sentence speech with a vocoder, and cut the target sentence speech into target syllables.
  5. The method according to claim 4, characterized in that the text analysis extracts the features in the text.
  6. The method according to claim 4, characterized in that, in step S22, according to the contextual features in the contextual feature sequence, decisions are made separately on the clustering trees for duration, F0 and spectrum parameters, yielding the corresponding acoustic model sequence and duration models.
  7. The method according to claim 4, characterized in that the target speech parameters comprise the F0 and spectrum parameters.
  8. The method according to claim 4, characterized in that step S3 further comprises the following steps:
    Step S31: cut the sentences in the audio database by syllable, the segments obtained, one per syllable, being the candidate primitives; group identical syllables into one class to build the candidate primitive library, and assign the spectrum and F0 parameters extracted in step S12 frame by frame to each candidate primitive in the library;
    Step S32: bring the acoustic parameters of each primitive corresponding to each target syllable, in turn, into the context acoustic models obtained in step S22, compute the probability of the duration, F0 and spectrum of each primitive under its corresponding acoustic model, and take the set of all these probabilities as the feature set;
    Step S33: convene a number of native Chinese speakers to give a binary annotation, similar or dissimilar, of the similarity between the target syllables and the candidate primitives, and take the result as the class attribute;
    Step S34: based on the class attribute and the feature set, train the similarity classifier.
  9. The method according to claim 8, characterized in that step S4 further comprises the following steps:
    Step S41: input the text to be synthesized and obtain the corresponding acoustic models according to step S22;
    Step S42: compute, according to step S32, the set of likelihoods of each acoustic parameter of every primitive under the current acoustic model, and take it as the feature set;
    Step S43: input the feature set into the similarity classifier, which predicts whether each primitive belongs to the similar class or the dissimilar class;
    Step S44: remove all primitives in the dissimilar class, and select among the remaining primitives under the concatenation-cost-minimization criterion;
    Step S45: apply windowing to smooth the selected primitives and obtain the final synthetic speech.
CN201310481306.9A 2013-10-15 2013-10-15 Sound selection method for waveform concatenation speech synthesis Active CN103531196B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310481306.9A CN103531196B (en) 2013-10-15 2013-10-15 Sound selection method for waveform concatenation speech synthesis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310481306.9A CN103531196B (en) 2013-10-15 2013-10-15 Sound selection method for waveform concatenation speech synthesis

Publications (2)

Publication Number Publication Date
CN103531196A 2014-01-22
CN103531196B CN103531196B (en) 2016-04-13

Family

ID=49933149

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310481306.9A Active CN103531196B (en) 2013-10-15 2013-10-15 Sound selection method for waveform concatenation speech synthesis

Country Status (1)

Country Link
CN (1) CN103531196B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04313034A (en) * 1990-10-16 1992-11-05 International Business Machines Corp Synthesized-speech generating method
CN101178896A (en) * 2007-12-06 2008-05-14 安徽科大讯飞信息科技股份有限公司 Unit selection voice synthetic method based on acoustics statistical model
CN101471071A (en) * 2007-12-26 2009-07-01 中国科学院自动化研究所 Speech synthesis system based on mixed hidden Markov model
CN102496363A (en) * 2011-11-11 2012-06-13 北京宇音天下科技有限公司 Correction method for Chinese speech synthesis tone

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104575488A (en) * 2014-12-25 2015-04-29 北京时代瑞朗科技有限公司 Text information-based waveform concatenation voice synthesizing method
WO2017028003A1 (en) * 2015-08-14 2017-02-23 华侃如 Hidden markov model-based voice unit concatenation method
CN105304081A (en) * 2015-11-09 2016-02-03 上海语知义信息技术有限公司 Smart household voice broadcasting system and voice broadcasting method
CN105719641A (en) * 2016-01-19 2016-06-29 百度在线网络技术(北京)有限公司 Voice selection method and device used for waveform splicing of voice synthesis
CN105719641B (en) * 2016-01-19 2019-07-30 百度在线网络技术(北京)有限公司 Sound method and apparatus are selected for waveform concatenation speech synthesis
CN105654940A (en) * 2016-01-26 2016-06-08 百度在线网络技术(北京)有限公司 Voice synthesis method and device
CN106356052A (en) * 2016-10-17 2017-01-25 腾讯科技(深圳)有限公司 Voice synthesis method and device
CN106356052B (en) * 2016-10-17 2019-03-15 腾讯科技(深圳)有限公司 Phoneme synthesizing method and device
US10832652B2 (en) 2016-10-17 2020-11-10 Tencent Technology (Shenzhen) Company Limited Model generating method, and speech synthesis method and apparatus
CN106652986A (en) * 2016-12-08 2017-05-10 腾讯音乐娱乐(深圳)有限公司 Song audio splicing method and device
CN106652986B (en) * 2016-12-08 2020-03-20 腾讯音乐娱乐(深圳)有限公司 Song audio splicing method and equipment
CN106970950A (en) * 2017-03-07 2017-07-21 腾讯音乐娱乐(深圳)有限公司 The lookup method and device of similar audio data
CN106970950B (en) * 2017-03-07 2021-08-24 腾讯音乐娱乐(深圳)有限公司 Similar audio data searching method and device
CN107492371A (en) * 2017-07-17 2017-12-19 广东讯飞启明科技发展有限公司 A kind of big language material sound storehouse method of cutting out
CN107507619A (en) * 2017-09-11 2017-12-22 厦门美图之家科技有限公司 Phonetics transfer method, device, electronic equipment and readable storage medium storing program for executing
CN107507619B (en) * 2017-09-11 2021-08-20 厦门美图之家科技有限公司 Voice conversion method and device, electronic equipment and readable storage medium
CN109147799A (en) * 2018-10-18 2019-01-04 广州势必可赢网络科技有限公司 A kind of method, apparatus of speech recognition, equipment and computer storage medium
CN109686358A (en) * 2018-12-24 2019-04-26 广州九四智能科技有限公司 The intelligent customer service phoneme synthesizing method of high-fidelity
CN111899715A (en) * 2020-07-14 2020-11-06 升智信息科技(南京)有限公司 Speech synthesis method
CN111899715B (en) * 2020-07-14 2024-03-29 升智信息科技(南京)有限公司 Speech synthesis method
CN113011127A (en) * 2021-02-08 2021-06-22 杭州网易云音乐科技有限公司 Text phonetic notation method and device, storage medium and electronic equipment
CN113096650A (en) * 2021-03-03 2021-07-09 河海大学 Acoustic decoding method based on prior probability
CN113096650B (en) * 2021-03-03 2023-12-08 河海大学 Acoustic decoding method based on prior probability
CN113421544A (en) * 2021-06-30 2021-09-21 平安科技(深圳)有限公司 Singing voice synthesis method and device, computer equipment and storage medium
CN113421544B (en) * 2021-06-30 2024-05-10 平安科技(深圳)有限公司 Singing voice synthesizing method, singing voice synthesizing device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN103531196B (en) 2016-04-13

Similar Documents

Publication Publication Date Title
CN103531196B (en) Sound selection method for waveform concatenation speech synthesis
CN102779508B (en) Speech corpus generation apparatus and method, and speech synthesis system and method
CN101178896B (en) Unit selection speech synthesis method based on acoustic statistical model
CN107452379B (en) Dialect language identification method and virtual reality teaching method and system
CN104112444B (en) Waveform concatenation speech synthesis method based on text information
Xie et al. Sequence error (SE) minimization training of neural network for voice conversion.
CN101000765A (en) Speech synthesis method based on prosodic features
CN108228576B (en) Text translation method and device
CN1835075B (en) Speech synthesis method combining natural sample selection and acoustic parameter modeling
CN105023573A (en) Speech syllable/vowel/phone boundary detection using auditory attention cues
CN105760852A (en) Driver emotion real time identification method fusing facial expressions and voices
CN101751922A (en) Text-independent speech conversion system based on HMM model state mapping
JP4829477B2 (en) Voice quality conversion device, voice quality conversion method, and voice quality conversion program
CN109346056A (en) Speech synthesis method and device based on deep metric network
Xie et al. A KL divergence and DNN approach to cross-lingual TTS
CN106297765B (en) Speech synthesis method and system
CN102254554A (en) Method for hierarchical modeling and prediction of Mandarin stress
CN109036376A (en) Minnan speech synthesis method
CN108172211A (en) Adjustable waveform concatenation system and method
CN106297766B (en) Speech synthesis method and system
Shah et al. Nonparallel emotional voice conversion for unseen speaker-emotion pairs using dual domain adversarial network & virtual domain pairing
CN104575488A (en) Text information-based waveform concatenation voice synthesizing method
CN104916282A (en) Speech synthesis method and apparatus
Kayte et al. A Marathi Hidden-Markov Model Based Speech Synthesis System
CN102511061A (en) Method and apparatus for fusing voiced phoneme units in text-to-speech

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20170602

Address after: No. 405-346, 4th Floor, Building A, No. 1, Courtyard 2, Yongcheng North Road, Haidian District, Beijing 100094

Patentee after: Beijing Rui Heng Heng Xun Technology Co., Ltd.

Address before: No. 95 Zhongguancun East Road, Haidian District, Beijing 100190

Patentee before: Institute of Automation, Chinese Academy of Sciences

TR01 Transfer of patent right

Effective date of registration: 20181224

Address after: No. 95 Zhongguancun East Road, Haidian District, Beijing 100190

Patentee after: Institute of Automation, Chinese Academy of Sciences

Address before: No. 405-346, 4th Floor, Building A, No. 1, Courtyard 2, Yongcheng North Road, Haidian District, Beijing 100094

Patentee before: Beijing Rui Heng Heng Xun Technology Co., Ltd.

TR01 Transfer of patent right

Effective date of registration: 20190528

Address after: Room 1105, 11/F, Building 4, No. 9 Jiuhuan Road, Jianggan District, Hangzhou, Zhejiang 310019

Patentee after: Extreme Element (Hangzhou) Intelligent Technology Co., Ltd.

Address before: No. 95 Zhongguancun East Road, Haidian District, Beijing 100190

Patentee before: Institute of Automation, Chinese Academy of Sciences

CP01 Change in the name or title of a patent holder

Address after: Room 1105, 11/F, Building 4, No. 9 Jiuhuan Road, Jianggan District, Hangzhou, Zhejiang 310019

Patentee after: Zhongke Extreme Element (Hangzhou) Intelligent Technology Co., Ltd.

Address before: Room 1105, 11/F, Building 4, No. 9 Jiuhuan Road, Jianggan District, Hangzhou, Zhejiang 310019

Patentee before: Extreme Element (Hangzhou) Intelligent Technology Co., Ltd.