JP2709198B2 - Voice synthesis method - Google Patents
Voice synthesis method
Info
- Publication number
- JP2709198B2
- Authority
- JP
- Japan
- Prior art keywords
- waveform
- waveforms
- representative
- speech
- voice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Abstract
Description
[0001]
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a speech synthesis method for synthesizing speech from waveform information. It applies both to synthesizing arbitrary spoken words by rule and to compressing the information of a speech signal for transmission or storage and resynthesizing speech from the compressed signal.
[0002]
2. Description of the Related Art
A speech synthesis system that stores all of the waveform information of speech in a storage device and synthesizes speech from it requires a very large storage device. In systems that synthesize arbitrary spoken words, the conventional approach for reducing storage and producing a wide variety of synthesized sounds has mainly been the so-called parameter editing synthesis method, in which only feature parameters extracted from the speech waveform are stored and read out as needed to create the synthesized speech. A waveform editing synthesis method, which stores the waveform information as it is, has also been used for the purpose of obtaining high-quality synthesized speech.
[0003]
However, while the parameter editing synthesis method allows flexible parameter control, the quality of the synthesized speech it produces falls far short of natural speech. The waveform editing synthesis method, on the other hand, yields high-quality synthesized speech but requires a large-capacity storage device (for example, Japanese Patent Application No. 63-115721). As a method of using only representative waveforms of a speech segment instead of storing the entire speech waveform, there is an example applied to the residual signal obtained by linear prediction analysis (Japanese Patent Application No. 56-179915), but it does not actively use pitch-synchronous waveform interpolation. Pitch-synchronous waveform interpolation has been used for connecting CV (consonant-vowel) / VC (vowel-consonant) segments (三留 et al., May 1981, Proceedings of the Acoustical Society of Japan, p. 431), but there has been no synthesis method that applies it to the entire synthesis unit for the purpose of information compression and higher quality.
[0004]
An object of the present invention is to provide a speech synthesis method which, in synthesizing spoken words from waveforms, greatly reduces the required storage capacity while still producing high-quality synthesized speech, and which can also produce high-quality synthesized speech from speech that has been efficiently compressed and then transmitted or stored.
[0005]
Means for Solving the Problems
According to the present invention, from a plurality of representative waveforms extracted at different points in time from a speech waveform, the speech waveform between those representative waveforms is interpolated using the representative waveforms in accordance with a given pitch period and duration, and continuous speech is thereby resynthesized. As described above, in order to obtain high-quality synthesized speech with a waveform editing type synthesis method, the speech waveforms needed for synthesis must be stored in advance; compared with storing parameters, the storage device then becomes very large, so an efficient way of reducing the waveform information is needed to make the system economical and compact. Moreover, with conventional techniques it is very difficult to control the pitch period and duration, that is, the pitch and speaking rate of the synthesized speech, without degrading its quality. The present invention focuses on the pitch period of the speech: the speech waveform is compressed efficiently by using only selected representative waveforms, and at synthesis time it is processed and reproduced so that no distortion is audibly perceptible. Besides the reduction in storage capacity, speech whose pitch period or duration has been changed can be synthesized with high quality.
[0006]
Embodiment
FIG. 1 shows an embodiment in which the present invention is applied to a waveform editing synthesis method. This speech synthesis has two stages: an analysis and storage process and a synthesis process. In the analysis process, the original speech signal from a speech input terminal 12 and phoneme labeling information from a labeling information input terminal 13 are input, within an analysis unit 11, to an optimum waveform peak position search circuit 14. The input original speech signal is segmented into phonemes by inspection, and a phoneme symbol and position information are attached to each segment.
[0007]
Next, a peak-neighborhood waveform extraction circuit 15 searches, for each phoneme, for the peak positions near about three representative waveforms, cuts out the waveform centered on each of these positions, and stores it in a storage device 16 as a representative waveform together with its position information, phoneme information, and so on. For example, as shown in FIG. 2A, an original speech waveform sequence P1, P2, P3, ..., Pm, ... constituting one phoneme is input, and from it three representative waveforms P1, Pm and Pn at different points in time are extracted as shown in FIG. 2B.
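As an illustration only (this code is not part of the patent), the peak-neighborhood extraction performed by circuit 15 could be sketched in Python as follows. The function name, the choice of candidate positions, and the use of one pitch period as the cut-out length are assumptions made for the example.

```python
import numpy as np

def extract_representative_waveforms(phoneme_wave, candidate_positions,
                                     pitch_period, search_radius=None):
    """Sketch: cut out a representative waveform around the peak nearest
    each candidate position, roughly what circuit 15 in FIG. 1 might do.

    phoneme_wave        : 1-D array of samples belonging to one phoneme
    candidate_positions : sample indices, e.g. three evenly spaced points
    pitch_period        : cut-out length in samples (assumed one pitch period)
    """
    if search_radius is None:
        search_radius = pitch_period // 2
    half = pitch_period // 2
    representatives = []
    for pos in candidate_positions:
        lo = max(pos - search_radius, half)
        hi = max(min(pos + search_radius, len(phoneme_wave) - half), lo + 1)
        # take the largest absolute peak in the neighbourhood of the candidate
        peak = lo + int(np.argmax(np.abs(phoneme_wave[lo:hi])))
        frame = phoneme_wave[peak - half: peak + half].copy()
        representatives.append({"position": peak, "waveform": frame})
    return representatives
```

In the analysis stage, each entry would be stored in the storage device 16 together with the phoneme symbol, corresponding to the position and phoneme information mentioned above.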
[0008]
In the speech synthesis process, in a synthesis unit 21, the text to be synthesized is input from a text input terminal 22 to a text analysis unit 23, where it is analyzed and converted into a phoneme sequence. A waveform readout circuit 24 reads the speech waveforms required for that phoneme sequence from the storage device 16 and supplies them to a pitch-synchronous interpolation circuit 25. In accordance with the synthesis pitch period and connection time given by a prosody information generation circuit 26, the pitch-synchronous interpolation circuit 25 interpolates the speech waveform between the representative waveforms in synchronism with the synthesis pitch period and overlaps the resulting waveforms pitch by pitch. A trapezoidal window, a cosine function window, or the like can be used as the overlap window for each pitch.
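The patent names these window shapes but gives no formulas. Purely as a sketch, the two kinds of per-pitch overlap windows mentioned could be generated as below; the window length and ramp size are illustrative parameters, not values from the patent.

```python
import numpy as np

def trapezoid_window(length, ramp):
    """Trapezoidal window: linear fade-in/out of `ramp` samples (1 <= ramp <= length//2)."""
    w = np.ones(length)
    w[:ramp] = np.linspace(0.0, 1.0, ramp)
    w[-ramp:] = np.linspace(1.0, 0.0, ramp)
    return w

def cosine_window(length):
    """Raised-cosine (Hann-shaped) window, one possible 'cosine function window' (length >= 2)."""
    n = np.arange(length)
    return 0.5 * (1.0 - np.cos(2.0 * np.pi * n / (length - 1)))
```

Each interpolated pitch waveform could be multiplied by such a window before being overlap-added at intervals of the synthesis pitch period.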
[0009]
For example, when the representative waveforms P1 and Pm shown in FIG. 2B are read out and a synthesis pitch period TP and a duration L are given, the time between the representative waveforms P1 and Pm is taken to be L, and for each period TP the two waveforms are interpolated as follows to obtain the synthesized waveforms P1m(0), P1m(1), P1m(2), ..., P1m(k):

P1m(i) = P1 × α(i) + Pm × β(i)    (1)

Here P1m(0) = P1 and P1m(k) = Pm, and α(i) and β(i) are weighting coefficients for the representative waveforms P1 and Pm, expressed by equations (2) and (3), respectively.
[0010]
α(i) = 0.5 × [1 + cos{π × (L − Ti)/L}]    (2)
β(i) = 1 − α(i)    (3)

Here i is the synthesized-waveform number, Ti is the time interval from P1m(0) to P1m(i), and L is the time interval between the representative waveforms P1 and Pm. The same interpolation is performed for the next pair of temporally adjacent representative waveforms, Pm and Pn. The synthesized waveforms obtained in this way by interpolating between each pair of temporally adjacent representative waveforms are connected phoneme by phoneme in a connection and synthesis circuit 27 and output as continuous speech.
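A minimal Python sketch of this pitch-synchronous interpolation between two representative waveforms is given below; it is not part of the patent. The cosine weight in the sketch is written so that the stated boundary conditions P1m(0) = P1 and P1m(k) = Pm hold, i.e. α falls from 1 to 0 as Ti goes from 0 to L, and the linear weight of equation (4), introduced below, is offered as an alternative. All function and parameter names are illustrative.

```python
import numpy as np

def interpolate_pitch_synchronously(p1, pm, duration, pitch_period,
                                    weight="cosine"):
    """Generate the interpolated pitch waveforms P1m(0), P1m(1), ..., P1m(k)
    between two representative waveforms, one waveform per pitch period TP.

    p1, pm       : equal-length 1-D arrays (representative waveforms)
    duration     : L, time between P1 and Pm, in samples
    pitch_period : TP, synthesis pitch period in samples
    weight       : "cosine" for a raised-cosine cross-fade, "linear" for eq. (4)
    """
    assert len(p1) == len(pm) and duration > 0
    frames = []
    ti = 0.0
    while ti <= duration:
        if weight == "cosine":
            alpha = 0.5 * (1.0 + np.cos(np.pi * ti / duration))  # 1 at Ti = 0, 0 at Ti = L
        else:
            alpha = (duration - ti) / duration                   # equation (4)
        beta = 1.0 - alpha                                       # equation (3)
        frames.append(alpha * p1 + beta * pm)                    # equation (1)
        ti += pitch_period
    return frames
```

Each returned waveform would then be windowed and overlap-added at intervals of the synthesis pitch period, so that changing pitch_period or duration changes the pitch and the speaking rate of the output.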
[0011]
In the above processing, the representative waveforms can be selected, for example, by dividing the phoneme segment into several sub-segments, finding the waveform peak in each sub-segment, and taking that waveform as a representative waveform; or by using the spectral information of the phoneme segment or the dynamics of the waveform information as a guide, that is, by finding the largest peak within the phoneme segment and then, on both the earlier and later sides of it, examining the differences between successive adjacent waveforms and taking the waveforms where they change greatly, together with the maximum-peak waveform, as the representative waveforms (the second approach is sketched below). A simple linear function such as equation (4) can also be used as the weighting coefficient α(i).
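The following Python sketch (not part of the patent) illustrates selection by waveform dynamics as just described; the per-pitch frame representation, the RMS difference measure, and the threshold are assumptions made for the example.

```python
import numpy as np

def select_by_waveform_dynamics(frames, change_threshold):
    """Keep the frame with the largest peak, then scan towards both ends of the
    phoneme and also keep frames that differ strongly from their neighbour.

    frames           : list of equal-length 1-D arrays, one per pitch period
    change_threshold : RMS difference above which a frame counts as a change point
    """
    peak_idx = int(np.argmax([np.max(np.abs(f)) for f in frames]))
    selected = {peak_idx}
    for step in (-1, 1):                 # scan backward, then forward
        i = peak_idx
        while 0 <= i + step < len(frames):
            diff = np.sqrt(np.mean((frames[i + step] - frames[i]) ** 2))
            if diff > change_threshold:
                selected.add(i + step)
            i += step
    return sorted(selected)              # indices of the representative frames
```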
[0012]
α(i) = (L − Ti)/L    (4)

In this embodiment a speech synthesis system using the phoneme as the synthesis unit has been described as an example, but it is clear that the method can also be used for speech synthesis based on other synthesis units such as syllables. The invention can further be applied to speech synthesis in high-efficiency speech coding: only the representative waveforms of FIG. 2B are encoded and transmitted or stored, so that the information on the transmission path or in the storage device is compressed, and analysis-synthesis speech is obtained on the receiving or readout side by synthesizing speech according to the method of the present invention described above. Furthermore, in the case of waveform editing speech synthesis, a clustering technique such as vector quantization can be applied to the whole set of representative waveforms obtained for all synthesis units, so that several similar representative waveforms are represented by a single waveform, which raises the information compression ratio further.
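The patent names vector quantization as one possible clustering technique but does not specify a procedure. The bare-bones k-means style sketch below is an assumption, not the patent's own method; the codebook size and iteration count are illustrative parameters.

```python
import numpy as np

def cluster_representative_waveforms(waveforms, n_codewords, n_iter=20, seed=0):
    """Merge similar representative waveforms: each cluster is replaced by its
    centroid, and only the codebook needs to be stored.

    Assumes all waveforms have been padded or resampled to a common length.
    """
    data = np.stack(waveforms).astype(float)      # shape: (N, frame_length)
    rng = np.random.default_rng(seed)
    codebook = data[rng.choice(len(data), n_codewords, replace=False)]
    for _ in range(n_iter):
        # assign every waveform to its nearest codeword (squared error)
        dists = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # move each codeword to the mean of the waveforms assigned to it
        for k in range(n_codewords):
            members = data[labels == k]
            if len(members):
                codebook[k] = members.mean(axis=0)
    return codebook, labels
```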
[0013]
In the above description three representative waveforms were used for one phoneme, but it is sufficient that there are two or more representative waveforms.
[0014]
Effects of the Invention
As described above, according to the present invention, speech waveform information can be compressed and reproduced efficiently by using representative waveforms, so the storage capacity of a waveform editing type speech synthesis system can be greatly reduced, high-efficiency coding for transmission and storage becomes possible, and high-quality synthesized speech can still be obtained. In addition, by performing the waveform interpolation in synchronism with the pitch period and by choosing the connection time, the speaking rate and the pitch of the voice can be controlled without degrading the quality of the original speech.
FIG. 1 is a block diagram showing one embodiment of the method of the present invention.
FIG. 2 is a diagram showing examples of the original waveform, representative waveforms, and synthesized waveforms.
Claims (1)
1. A speech synthesis method in which a speech waveform of each speech unit is stored in a storage device and, at synthesis time, speech waveforms corresponding to a given speech unit sequence are read from the storage device to synthesize continuous speech, characterized in that: in the storage device, at least two waveforms at different points in time, selected as representative waveforms from among the waveforms obtained from the speech waveform within the speech unit in correspondence with the pitch period, are stored; at synthesis time, the representative waveforms corresponding to the speech units of the given speech unit sequence are read from the storage device; for each pair of temporally adjacent representative waveforms, a sequence of waveforms synchronized with a given pitch period is generated from the two representative waveforms by weighted combination, such that the speech waveform between the two representative waveforms continues for a length corresponding to a given duration; and the speech waveform of each speech unit is synthesized by interpolating between the two adjacent representative waveforms with the generated sequence of waveforms.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP3044928A JP2709198B2 (en) | 1991-03-11 | 1991-03-11 | Voice synthesis method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP3044928A JP2709198B2 (en) | 1991-03-11 | 1991-03-11 | Voice synthesis method |
Publications (2)
Publication Number | Publication Date |
---|---|
JPH04281499A JPH04281499A (en) | 1992-10-07 |
JP2709198B2 (en) | 1998-02-04 |
Family
ID=12705138
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP3044928A Expired - Fee Related JP2709198B2 (en) | 1991-03-11 | 1991-03-11 | Voice synthesis method |
Country Status (1)
Country | Link |
---|---|
JP (1) | JP2709198B2 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003108178A (en) | 2001-09-27 | 2003-04-11 | Nec Corp | Voice synthesizing device and element piece generating device for voice synthesis |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS56168700A (en) * | 1980-05-30 | 1981-12-24 | Nippon Electric Co | Waveform edition type voice synthesizer |
1991
- 1991-03-11: JP application JP3044928A patented as JP2709198B2 (not active: Expired - Fee Related)
Also Published As
Publication number | Publication date |
---|---|
JPH04281499A (en) | 1992-10-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | LAPS | Cancellation because of no payment of annual fees | |