JPH0376479B2

Info

Publication number: JPH0376479B2
Application number: JP56016331A
Authority: JP
Inventors: Shinichi Tamura; Hiroshi Yasuda
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1981-02-06
Filing date: 1981-02-06
Publication date: 1991-12-05
Also published as: JPS57130097A

Description

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a novel and useful method for creating phoneme segment data, in which the phoneme segment data is created from digital data that has a phoneme-determining spectral envelope characteristic formed by cepstral analysis.

In human speech, a voiced sound accompanied by vocal-cord vibration, such as "a", consists of repetitions of similarly shaped waveforms when viewed over a short interval of several tens of milliseconds, although the details vary with speaking rate. Therefore, to synthesize a voiced sound, it suffices to prepare several repeating unit waveforms (the length L of such a unit is called the pitch; see Fig. 1), for example a waveform for "a" and a waveform for "i", and to reproduce each unit waveform repeatedly, the required number of times, in a predetermined order. To synthesize an unvoiced sound that involves no vocal-cord vibration, such as the initial sound of "shi", either the unvoiced sound is prepared as it is or, as with a voiced sound, a waveform of a certain length is prepared and reproduced several times.
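
The repetition scheme described above can be sketched in a few lines. This is only an illustration: the sample values, the labels "a" and "i", and the function name `synthesize` are assumptions made for the example and do not come from the patent.

```python
def synthesize(units, schedule):
    """Concatenate unit waveforms according to (label, repeat_count) pairs."""
    samples = []
    for label, repeat_count in schedule:
        # Each unit waveform holds one pitch period; repeat it as requested.
        samples.extend(units[label] * repeat_count)
    return samples


# Hypothetical unit waveforms; each list is one pitch period of samples.
units = {
    "a": [0.0, 0.8, 0.3, -0.5],
    "i": [0.0, 0.4, -0.6],
}

# Reproduce "a" for three periods, then "i" for two periods.
wave = synthesize(units, [("a", 3), ("i", 2)])
print(len(wave))
```

Changing the length of a unit waveform changes the pitch of the reproduced sound, and changing the repeat count changes its duration.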

This kind of speech synthesis is called phoneme segment synthesis, and the waveform that serves as the unit of repetition is called a phoneme segment.

In phoneme segment synthesis, therefore, the quality of the synthesized speech depends on how efficiently phoneme segments with good acoustic (physical) characteristics can be extracted from the original speech.

An example of a conventionally known method for creating phoneme segment data is as follows.

First, to extract phoneme segments from the original speech, there is a method in which the original speech signal is A-D converted and stored in memory, read out from memory when needed, and supplied to a display device, so that a unit waveform to serve as the phoneme segment is extracted while the original speech waveform is displayed. For example, in Fig. 2, interval K is a repetition of similar waveforms, so one of the waveforms in that interval is extracted as the phoneme segment.

With this method, however, any of those waveforms can be used as the phoneme segment, so there is no criterion for choosing which one to extract. Moreover, strictly speaking the waveforms differ slightly from one another, so if extraction is fixed to one of them, the extracted phoneme segment may bear no relation to the physical properties of the speech.

Furthermore, because this method represents interval K by the frequency characteristic of a single unit waveform, it degrades sound quality. When interval K is subjected to a discrete Fourier transform, a frequency-amplitude characteristic, that is, a spectral intensity characteristic, such as that shown in Fig. 3 is obtained. The sharp ripples in this characteristic arise because interval K is a repetition of similar waveforms, and the phonemic information of interval K is carried by the spectral envelope shown by the broken line. If the frequency characteristic of interval K is represented by that of a single unit waveform as described above, the spectral envelope shown in Fig. 3 is not obtained, and the sound quality deteriorates as a result.
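
The origin of the ripples can be illustrated with a small numerical sketch. It assumes an idealized interval K made of exact copies of one unit waveform, whereas real speech repeats only approximately; the sample values and the naive `dft` helper are assumptions made for the example.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform, adequate for short sequences."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


unit = [0.0, 1.0, 0.5, -0.7]   # hypothetical single pitch period
repeats = 3
interval = unit * repeats      # stands in for interval K (exact repetition)

magnitudes = [abs(v) for v in dft(interval)]
unit_magnitudes = [abs(v) for v in dft(unit)]

# Energy appears only at bins that are multiples of `repeats` (the pitch
# harmonics); the bins in between are zero up to rounding error, which is
# what produces the comb-like ripples in the spectrum.
for k, m in enumerate(magnitudes):
    print(k, round(m, 6))
```

In this idealized case the harmonic bins equal the DFT of the unit waveform scaled by the number of repetitions. With real speech the periods differ slightly, so a single period no longer reproduces the envelope of the whole interval, which is the problem the text describes.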

Therefore, a method of creating phoneme segment data that relies on this kind of extraction cannot achieve speech synthesis that is faithful to the original speech.

In another method, LPC (Linear Predictive Coding) analysis is performed on each interval in which the speech signal can be regarded as stationary, and the phoneme segment is created by LPC synthesis so that its spectral characteristic is the spectral envelope obtained by the LPC analysis of that interval. However, because this method assumes an all-pole model, it approximates the peaks of the spectral envelope of interval K better than the valleys, as shown in Fig. 4.

Consequently, phoneme segments such as nasals, in which a sharp valley in the spectral envelope plays an essential role in perception, cannot be formed by this method.

There is also a method that exploits the fact that the human ear is relatively insensitive to the phase of the spectrum over a short interval (several tens of milliseconds). When the phoneme segment data is stored, the phoneme segment is converted to frequency-domain data by a discrete Fourier transform, the phase of each amplitude value is assigned to either 0 or π, and an inverse discrete Fourier transform is then applied to form a signal with a symmetric waveform (an even function) such as that shown in Fig. 5. Only one half of the axially symmetric data then needs to be stored, which reduces the memory capacity used.

However, when phoneme segment data is formed in this way, the value at the two ends e of the symmetric waveform (Fig. 5) differs from one voiced sound to another and is not always the same. Therefore, when speech is synthesized from this phoneme segment data, the junction between different phoneme segments V₁ and V₂ can become discontinuous, as shown in Fig. 6, and this degrades the quality of the synthesized speech.

As a method of giving a phoneme segment a pitch, there is a method of padding with "0" data. As shown in Fig. 7, for example, when a phoneme segment shorter than the pitch P₀ is to be made as long as P₀, the missing portion is filled with "0" data to create a phoneme segment having the desired pitch P₀. However, because "0" data has been added, the result differs from the original phoneme segment data; the frequency characteristic of the phoneme segment changes, and the quality of the synthesized speech naturally deteriorates.

In view of these points, the present invention makes it possible to extract phoneme segments with good acoustic characteristics from the original speech and to form phoneme segment data from them.
An example of the present invention is now described in detail with reference to Fig. 8.

First, the original speech is converted into an analog speech signal (an electrical signal) using a microphone or the like, and this signal is supplied to gate 1, which cuts out an appropriate time length l (about 20 msec) (Fig. 9A). A-D converter 2 then converts it into a digital signal Sa with a predetermined number of bits. The sampling rate of A-D converter 2 is set to about 6 kHz. Circuit 3 in the following stage determines from this digital signal Sa whether the original speech is voiced or unvoiced and outputs the discrimination output V/VL; if the speech is voiced, the circuit also extracts the pitch period P₀ and outputs that data. For convenience of explanation, this output is also denoted P₀.

Next, the spectral envelope, which carries the phonemic information of the phoneme segment, is extracted from this digital signal Sa. Cepstrum analyzer 10 is used for this extraction.

That is, the digital signal Sa is converted by a discrete fast Fourier transform into a frequency-domain power spectrum, which has a more direct physical meaning for speech (step (イ)). The outline of this power spectrum Sb is shown in Fig. 9B. In step (ロ), the logarithm of the absolute value of the power spectrum Sb is taken. Then, to extract the information of the spectral envelope Sc of the original speech shown in Fig. 3, the result is regarded as a signal waveform and subjected to an inverse fast Fourier transform (step (ハ)). The waveform obtained by this inverse transform is the cepstrum Se shown in Fig. 9C.
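
The three operations just described (power spectrum, logarithm, inverse transform) can be sketched as follows. This is a minimal illustration, not the patent's implementation: a naive DFT stands in for the fast Fourier transform, the small `floor` added before the logarithm is an assumption to avoid log(0), and the test frame is a made-up decaying pulse train.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive DFT; inverse=True gives the inverse transform (scaled by 1/N)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def cepstrum(frame, floor=1e-12):
    """Power spectrum, then logarithm, then inverse transform."""
    power = [abs(v) ** 2 for v in dft(frame)]            # power spectrum Sb
    log_power = [math.log(p + floor) for p in power]     # log of the spectrum
    # The log power spectrum of a real frame is symmetric, so the inverse
    # transform is real up to rounding; keep only the real part.
    return [v.real for v in dft(log_power, inverse=True)]  # cepstrum Se


# Hypothetical frame: a decaying pulse repeated every 16 samples.
period = 16
frame = [0.6 ** (t % period) * 0.98 ** t for t in range(64)]
ceps = cepstrum(frame)

# Location of the largest cepstral value in a plausible pitch range.
peak = max(range(8, 32), key=lambda q: ceps[q])
print(peak)
```

The low-quefrency samples of `ceps` describe the slowly varying envelope of the log spectrum, while the pitch shows up as a peak at a quefrency equal to the pitch period, which is why the two can be separated.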

Next, in step (ニ), the data in the high-quefrency part at or above the pitch period P₀ is set to zero on the basis of P₀ (Fig. 9D). The resulting low-quefrency part Sf is fast Fourier transformed again in step (ホ), which yields the low-frequency (slowly varying) component Sg of the fast Fourier transform output of step (イ). This component is then antilogarithmized in step (ヘ) to obtain the spectral envelope Sh (the first digital signal) shown in Fig. 9E.
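
The zeroing of the high-quefrency part and the return to the spectral domain can be sketched as below. The sketch makes several assumptions that the text does not state: the pitch period is given in samples, the mirrored half of the cepstrum is zeroed symmetrically, a naive DFT replaces the fast transform, and `floor` guards the logarithm.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive DFT; inverse=True gives the inverse transform (scaled by 1/N)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def spectral_envelope(frame, pitch_samples, floor=1e-12):
    """Cepstrally smoothed power-spectrum envelope of one frame."""
    n = len(frame)
    power = [abs(v) ** 2 for v in dft(frame)]
    log_power = [math.log(p + floor) for p in power]
    ceps = [v.real for v in dft(log_power, inverse=True)]
    # Zero every quefrency at or above the pitch period. The cepstrum of a
    # real frame is symmetric, so index q and index n - q are treated alike.
    low = [c if min(q, n - q) < pitch_samples else 0.0
           for q, c in enumerate(ceps)]
    smooth_log = [v.real for v in dft(low)]     # back to the log-spectral domain
    return [math.exp(v) for v in smooth_log]    # antilogarithm gives the envelope


# Hypothetical frame: a decaying pulse repeated every 16 samples.
frame = [0.6 ** (t % 16) * 0.98 ** t for t in range(64)]
envelope = spectral_envelope(frame, pitch_samples=16)
print([round(v, 3) for v in envelope[:8]])
```

Because the envelope is obtained by smoothing the logarithm of the spectrum, peaks and valleys are treated alike, in contrast to the all-pole LPC fit discussed earlier.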

The spectral envelope Sh extracted by cepstral analysis in this way approximates both the peaks and the valleys of the spectral envelope Sc of the original data uniformly. It therefore retains the phonemic information of the original speech sufficiently, so that a consistent sound quality is ensured for any original speech.

As described above, the plural encoded digital data obtained by A-D converter 2 are supplied to cepstrum analyzer 10 to form a single spectral envelope Sh for the original speech, but this spectral envelope Sh contains no information at all about the pitch period P₀. Therefore, in step (ト), the spectral envelope Sh is reorganized on the basis of the pitch period P₀. That is, the data forming the spectral envelope Sh are reorganized (resampled) by interpolation into f_s·P₀ data points (the second digital signal), which is the number of samples contained in one pitch period P₀ as determined by the sampling frequency f_s of A-D converter 2.
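
The resampling to f_s·P₀ points can be sketched as follows. The text says only that interpolation is used, so linear interpolation is an assumption here, as are the treatment of the envelope as periodic over the frequency grid, the 5 ms pitch period, and the made-up envelope values.

```python
def resample_periodic(values, count):
    """Linearly interpolate a periodic sequence onto `count` evenly spaced points."""
    n = len(values)
    out = []
    for k in range(count):
        pos = k * n / count          # fractional position on the original grid
        i = int(pos)
        frac = pos - i
        # Wrap around at the end because a DFT spectrum is periodic.
        out.append((1.0 - frac) * values[i % n] + frac * values[(i + 1) % n])
    return out


sampling_rate_hz = 6000       # f_s, the approximate rate given in the text
pitch_period_s = 0.005        # hypothetical P0 of 5 ms
points = round(sampling_rate_hz * pitch_period_s)   # f_s * P0 samples per period

# Hypothetical symmetric envelope Sh on a 64-point frequency grid.
envelope = [1.0 + 0.5 * min(k, 64 - k) / 32 for k in range(64)]
resampled = resample_periodic(envelope, points)
print(points, len(resampled))
```

After this step the envelope has exactly as many points as one pitch period has samples, so the later inverse transform yields a waveform whose length equals the pitch period without any zero padding.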

In the next step (チ), a phoneme-determining element and a function-conversion element that produces an odd function are added to the data constituting the spectral envelope Sh reorganized on the basis of the pitch period P₀. That is, the data constituting the spectral envelope Sh are phase-converted into data on the imaginary axis (the +π/2 and −π/2 axes) so that the output waveform is entirely an odd function, and at the same time the data phases are allocated between the +π/2 axis and the −π/2 axis so that the spectrum is concentrated at a predetermined quefrency. The quefrency at which the spectrum concentrates depends on how the data phases are allocated, and this difference corresponds to a difference in phoneme. This allocation of the data therefore adds a phonemic element to the spectral envelope Sh.
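
The effect of placing the data on the imaginary axis can be checked numerically. In the sketch below, the amplitude values and the sign pattern are hypothetical; the text does not specify how the +π/2 and −π/2 allocation is chosen for a given phoneme, nor whether the envelope values are used directly as amplitudes, so both are assumptions.

```python
import cmath


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def odd_phase_spectrum(amplitudes, signs):
    """Place amplitudes on the imaginary axis with phase +pi/2 or -pi/2.

    signs[k] (+1 or -1) selects the axis for bin k in the lower half; the
    mirrored bin n - k gets the opposite sign so the waveform is real.
    """
    n = len(amplitudes)
    spectrum = [0j] * n
    for k in range(1, (n + 1) // 2):
        spectrum[k] = 1j * signs[k] * amplitudes[k]
        spectrum[n - k] = -spectrum[k]
    # Bins 0 and n/2 stay zero, as a real odd waveform requires.
    return spectrum


# Hypothetical symmetric amplitudes and sign allocation for a 12-point period.
amplitudes = [1.0 + 0.5 * min(k, 12 - k) / 6 for k in range(12)]
signs = [1, 1, -1, 1, -1, -1]

spectrum = odd_phase_spectrum(amplitudes, signs)
wave = [v.real for v in idft(spectrum)]
print([round(v, 4) for v in wave])
```

The resulting waveform is real and antisymmetric, with zeros at sample 0 and at the midpoint, so its value is zero wherever one period meets the next regardless of the amplitudes or the sign pattern.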

When the original speech being handled is unvoiced, the data may be allocated using random numbers or the like. The allocation therefore refers to the voiced/unvoiced discrimination output V/VL.

After the phoneme-determining element and the function-conversion element have been added in this way, the spectral envelope Sh is subjected to an inverse discrete Fourier transform in step (リ) to form time-domain data (the digital signal Si for the phoneme segment) from the frequency-domain data. This digital signal Si is information-compressed time-domain data that corresponds to the extracted speech interval, that is, to the phoneme segment, and has a length equal to the pitch period P₀; its output waveform is antisymmetric (Fig. 9F). Data indicating the number of repetitions n of the same data (n = l/P₀) is then appended, and in step (ル) the result is stored in memory as the final phoneme segment data.

The phoneme segment data processing described above is also performed on the next speech interval, and the operation is repeated until no speech signal remains.

To synthesize speech, the phoneme segment data stored in memory or the like is used repeatedly n times, and the required speech is synthesized by carrying out this processing in a predetermined order.
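
Storing a segment with its repetition count and reading it back for synthesis can be sketched as follows. The record layout, the names `PhonemeSegment` and `make_segment`, the sample values, and the rounding of l/P₀ to an integer are all assumptions for illustration; the text does not say how a non-integer ratio is handled.

```python
from dataclasses import dataclass


@dataclass
class PhonemeSegment:
    samples: list      # one pitch period of the digital signal Si
    repetitions: int   # n = l / P0


def make_segment(samples, frame_length_s, pitch_period_s):
    """Attach the repetition count n = l / P0 to one period of samples."""
    repetitions = round(frame_length_s / pitch_period_s)
    return PhonemeSegment(samples=list(samples), repetitions=repetitions)


def synthesize(memory):
    """Read segments in stored order and repeat each one n times."""
    output = []
    for segment in memory:
        output.extend(segment.samples * segment.repetitions)
    return output


memory = []
# Hypothetical antisymmetric period, 20 ms frame, 5 ms pitch period.
memory.append(make_segment([0.0, 0.4, 0.0, -0.4], 0.020, 0.005))
memory.append(make_segment([0.0, 0.2, -0.2], 0.020, 0.010))

speech = synthesize(memory)
print(memory[0].repetitions, memory[1].repetitions, len(speech))
```

Only one period per frame is stored, so the memory needed per frame shrinks by roughly the factor n compared with storing the whole frame.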

As explained above, the present invention has the following features compared with conventional data forming methods.

First, the phonemic information of speech is carried by the envelope of the power spectrum. When cepstrum analyzer 10 is used to extract the spectral envelope from the speech signal, a spectral envelope Sh that uniformly approximates both the peaks and the valleys of the spectral envelope Sc can be extracted. With the data creation method of the present invention, which creates phoneme segment data from this spectral envelope Sh, almost all of the phonemic information of the target speech can be stored as data, so a consistent sound quality can always be ensured.

Second, in the present invention the information of the pitch period P₀ is added to the spectral envelope Sh and the data forming the envelope are reorganized accordingly. Compared with the conventional approach of padding with "0" data, which is entirely unrelated to the phoneme segment data, in order to match the length of the phoneme segment to the desired pitch period, the frequency characteristic of the phoneme segment is not degraded and the quality of the synthesized speech therefore does not deteriorate. In other words, a sound quality closer to the original speech is obtained.

Third, in the present invention the data are converted so that the waveform of the phoneme segment becomes an antisymmetric waveform, so both ends e of the waveform are always zero. No discontinuity therefore arises between the waveforms of different phoneme segments, and the deterioration in sound quality seen in the conventional method does not occur.

In the embodiment described above, steps (ハ) and (ホ) may be interchanged. All of the data of the antisymmetric waveform may be stored as the phoneme segment data; however, because the waveform is antisymmetric, it is also possible to store only one half of the waveform data and, at read-out, to form the remaining waveform data from it. The pitch period P₀ may be extracted from the high-quefrency part of the cepstrum.
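
The half-storage option can be sketched as follows, assuming the antisymmetry is expressed as wave[n − t] = −wave[t] over one period of n samples. The function names and sample values are hypothetical.

```python
def store_half(wave):
    """Keep samples 0 .. n//2 of an antisymmetric waveform."""
    return wave[: len(wave) // 2 + 1]


def restore(half, n):
    """Rebuild all n samples using wave[n - t] == -wave[t]."""
    missing = n - len(half)
    full = list(half) + [0.0] * missing
    for t in range(1, missing + 1):
        full[n - t] = -half[t]
    return full


# Hypothetical antisymmetric period of 8 samples.
wave = [0.0, 0.5, 0.9, 0.3, 0.0, -0.3, -0.9, -0.5]
half = store_half(wave)
rebuilt = restore(half, len(wave))
print(len(half), rebuilt == wave)
```

The same functions work for an odd number of samples. Since sample 0 (and the midpoint, when n is even) is always zero for such a waveform, a tighter format could omit those samples as well.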

[Brief explanation of drawings]

Fig. 1 shows an example of a voiced-sound waveform; Fig. 2 is an explanatory diagram of phoneme segment extraction; Fig. 3 is a power spectrum diagram of a phoneme segment; Fig. 4 is a power spectrum diagram based on LPC synthesis; Fig. 5 is an explanatory diagram of a waveform obtained by symmetrizing a phoneme segment; Fig. 6 is an explanatory diagram of synthesis using this symmetrized waveform; Fig. 7 is an explanatory diagram of pitch addition; Fig. 8 is a signal processing flow diagram showing an example of the method for creating phoneme segment data according to the present invention; and Fig. 9 is a waveform diagram used to explain its operation.
10: cepstrum analyzer; P₀: pitch period; n: number of repetitions of P₀; Sh: spectral envelope.

Claims

[Claims]
1. A method for creating phoneme segment data on the basis of a spectral envelope characteristic obtained by cepstral analysis of a speech signal, characterized in that a digital signal representing the spectral envelope characteristic is resampled on the basis of pitch information of the speech signal, and the digital signal obtained by the resampling is used as the phoneme segment data.
2. A method for creating phoneme segment data on the basis of a spectral envelope characteristic obtained by cepstral analysis of a speech signal, characterized in that a digital signal representing the spectral envelope characteristic is resampled on the basis of pitch information of the speech signal, a function conversion process that adds an odd-function element is applied to the digital signal obtained by the resampling, and the digital signal subjected to the function conversion process is used as the phoneme segment data.