WO2004072951A1

WO2004072951A1 - Multiple speech synthesizer using pitch alteration method

Info

Publication number: WO2004072951A1
Application number: PCT/KR2003/001238
Authority: WO
Inventors: Myungjin Bae
Original assignee: Kwangwoon Foundation
Priority date: 2003-02-13
Filing date: 2003-06-24
Publication date: 2004-08-26
Also published as: KR20030031936A

Abstract

This invention relates to a speech synthesizer that alters pitch, an important characteristic parameter of speech, so that a single voice picked up by a microphone can be synthesized into several different voices. In the speech production model, speech is generally regarded as a signal produced by passing an excitation through a filter, and the filter can be modeled by its formant components. The formant structure changes with the geometrical shape of the vocal tract. Pitch is produced by the periodic vibration of the vocal cords and is a parameter to which the human auditory system is highly sensitive. Because of these characteristics, pitch is used to distinguish speakers and strongly affects the naturalness of speech. Accurate pitch analysis is therefore an essential factor in determining the sound quality of speech synthesis and speech coding.

Description

MULTIPLE SPEECH SYNTHESIZER USING PITCH ALTERATION METHOD

Technical Field

This invention belongs to the field of speech communication technology or audio signal processing, since it concerns converting a single voice into a group of voices by altering pitch. Existing techniques synthesize one voice into one voice with a different pitch and therefore have the disadvantage that they cannot synthesize diverse voices. This invention remedies that disadvantage and produces diverse voices.

Background Art

This invention proposes converting a single voice into multiple voices by altering pitch, an important characteristic parameter of sound. Fig. 1 shows the speech production model. The excitation passing from the lungs through the vocal cords to the vocal tract can be divided into two types: voiced sound is modeled as an impulse train whose period is the pitch period, and unvoiced sound is modeled as random noise.

A switch selects between these two signals; the selected signal is multiplied by the input signal energy (gain) and passed through a filter to produce the speech signal. Interpreted according to this model, the speech signal contains excitation information, which conveys the speaker's characteristics and emotions, and vocal tract filter (formant) information, which conveys the linguistic content. Pitch, which represents the excitation information, is produced by the vibration of the vocal cords and is a parameter to which the human auditory system is highly sensitive. It is therefore used to distinguish speakers and strongly affects the naturalness of speech. By altering the pitch, diverse synthesized sounds can be produced.
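The production model described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names `excitation` and `synthesize`, the all-pole form of the filter, and the coefficient list `a` are assumptions made only for illustration.

```python
import random

def excitation(n_samples, voiced, pitch_period=80, seed=0):
    """Return an impulse train (voiced) or white noise (unvoiced)."""
    if voiced:
        # One unit impulse at the start of every pitch period.
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]

def synthesize(n_samples, voiced, gain, a, pitch_period=80):
    """Scale the excitation by `gain` and pass it through an all-pole filter.

    Assumed filter form: y[n] = gain * e[n] - sum_k a[k] * y[n - 1 - k]
    """
    e = excitation(n_samples, voiced, pitch_period)
    y = []
    for n in range(n_samples):
        acc = gain * e[n]
        for k, ak in enumerate(a):
            if n - 1 - k >= 0:
                acc -= ak * y[n - 1 - k]
        y.append(acc)
    return y

# Hypothetical usage: a voiced frame with an 80-sample pitch period.
voiced_frame = synthesize(400, voiced=True, gain=1.0, a=[-0.9])
unvoiced_frame = synthesize(400, voiced=False, gain=0.3, a=[-0.9])
```

In this sketch only `pitch_period` controls the perceived pitch, while `a` stands in for the vocal tract filter, which is why pitch can be altered without changing the filter.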

Disclosure of the Invention

The pitch alteration system is organized as shown in Fig. 2. The pitch analysis part detects the pitch of the original signal from the microphone input and the pitch of the target signal, and sends them to the alteration rule generation part. The alteration rule generation part uses them to determine the pitch change rate and to find a suitable alteration rule.

This pitch alteration rule is supplied to the actual pitch alteration part, which changes the pitch of the original signal by the pitch change rate. The synthesis part then uses the result to produce the altered synthetic voice. This process requires accurate pitch detection and pitch alteration with little error. Many methods have been proposed for detecting the pitch of a speech signal. One well-known example is the autocorrelation method, which computes the correlation function between neighboring sections of the speech waveform to detect the period of the waveform (References).
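As an illustration of the autocorrelation method mentioned above, the following Python sketch estimates the pitch period of one frame. The function name `autocorr_pitch_period`, the lag search range, and the 8 kHz sampling rate are assumptions for the example and are not specified in this document.

```python
import math

def autocorr_pitch_period(frame, min_lag, max_lag):
    """Return the lag (in samples) that maximizes the autocorrelation.

    The frame is compared with a copy of itself shifted by each candidate
    lag; a periodic waveform matches itself best at its pitch period.
    """
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return best_lag

# Hypothetical test signal: a 100 Hz sine sampled at 8 kHz (period 80 samples).
fs = 8000
frame = [math.sin(2 * math.pi * 100.0 * n / fs) for n in range(400)]
period = autocorr_pitch_period(frame, min_lag=20, max_lag=160)
pitch_hz = fs / period
```

Restricting the search to a plausible lag range keeps the estimator from locking onto lag zero or onto multiples of the true period.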

Pitch alteration must be performed after pitch detection is complete. Many pitch alteration methods have also been proposed. One example is PSOLA (Pitch Synchronous Overlap and Add) (Reference 3). The PSOLA method divides the speech waveform in the time domain into pitch-period units and reconstructs the waveform by overlapping and adding them.
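The overlap-and-add idea behind PSOLA can be sketched in Python as follows. This is a simplified illustration, not the implementation of this invention: it assumes a constant pitch period with pitch marks at exact multiples of that period, and the function name `psola_shift` is hypothetical.

```python
import math

def psola_shift(x, period, factor):
    """Multiply the pitch of x by `factor` while keeping its duration.

    Assumes a constant pitch period (in samples), pitch marks at multiples
    of `period`, and len(x) > 2 * period.
    """
    n = len(x)
    new_period = period / factor          # spacing of the synthesis marks
    y = [0.0] * n
    # Hann window spanning two original pitch periods.
    win = [0.5 - 0.5 * math.cos(math.pi * k / period) for k in range(2 * period)]
    marks = list(range(period, n - period, period))   # analysis pitch marks
    t = float(period)
    while t < n - period:
        m = min(marks, key=lambda a: abs(a - t))      # nearest analysis mark
        c = int(round(t))                             # synthesis mark
        for k in range(2 * period):
            # Overlap-add the windowed two-period segment at its new position.
            y[c - period + k] += win[k] * x[m - period + k]
        t += new_period
    return y

# Hypothetical usage: raise the pitch of an 80-sample-period tone by 25%.
tone = [math.sin(2 * math.pi * n / 80) for n in range(1600)]
higher = psola_shift(tone, period=80, factor=1.25)
```

With `factor` equal to 1.0 the windows sum to one and the interior of the signal is reproduced unchanged; for other factors the segments are re-spaced, which changes the pitch period but also the overlap gain, so a practical system would normalize the amplitude.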

Brief Description of the Drawings

FIG. 1 is a block diagram of the existing speech production model;

FIG. 2 is a block diagram of a general pitch alteration system;

FIG. 3 is a block diagram of the pitch alteration system;

FIG. 4 is a block diagram of pitch point detection;

FIG. 5 is a block diagram of the pitch alteration system;

FIG. 6 shows the hardware organization of the multiple speech synthesizer;

FIG. 7 is a flow chart of the multiple speech synthesizer software.

Best Mode for Carrying Out the Invention

Fig. 3 is a block diagram of the pitch alteration system applied to this invention. For pitch detection, this invention uses the method shown in Fig. 4.

First, the speech is passed through a pre-emphasis filter that emphasizes the high-frequency region and then through the filter defined by the linear predictive coefficients; the periodicity and the amplitude characteristics of the glottal signal in each analysis interval are then used to complete the pitch detection. The pitch detected in this way is used with the PSOLA process to alter the sound, producing versions with the pitch extended to 140% and 120% and reduced to 80% and 60%. Synthesizing these pitch-altered versions together with small time differences produces a multiple-voice synthesized sound.
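The final mixing step can be sketched in Python as below. The sketch assumes the pitch-altered versions have already been produced (for example at the 140%, 120%, 80% and 60% settings mentioned above); the function name `mix_voices`, the delay values, and the equal-weight averaging are illustrative assumptions.

```python
def mix_voices(voices, delays):
    """Sum several voices, each shifted by its own delay in samples.

    Each voice is weighted by 1 / number of voices so that the mix
    stays within the amplitude range of the inputs.
    """
    length = max(len(v) + d for v, d in zip(voices, delays))
    out = [0.0] * length
    weight = 1.0 / len(voices)
    for v, d in zip(voices, delays):
        for n, s in enumerate(v):
            out[n + d] += weight * s
    return out

# Hypothetical usage with two short placeholder "voices".
mixed = mix_voices([[1.0, 1.0], [1.0, 1.0]], delays=[0, 1])
```

The small, different delays keep the voices from lining up exactly, which is what gives the impression of several speakers rather than one louder voice.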

Hardware Organization

Fig. 6 shows the equipment that receives an analog voice signal (600) from a microphone, alters its pitch, and synthesizes voices. The analog voice signal (600) is amplified by an amplifier (601) and passed through a low-pass filter to prevent aliasing. It then passes through an analog-to-digital converter, where it is quantized and coded into a PCM digital signal. The remaining processing is performed in software or firmware on a CPU or DSP.

During digital processing, the processor can use external peripheral devices, and it can use external memory to store processing results or the input digital signal.

The multiple-voice digital signal synthesized by the pitch alteration software on the CPU is converted into a sampled analog signal. Passing this signal through a low-pass filter yields an analog signal free of quantization noise. Amplifying that signal by a suitable gain produces an analog signal that can be heard through a speaker.

Software Processing

The multiple speech synthesizer using the pitch alteration method is implemented by adding software or firmware that uses a multiple pitch alteration method rather than the original single pitch alteration. Fig. 7 shows the flow chart of the multiple speech synthesizer used in this invention. The data samples (701) from the analog-to-digital converter (ADC) are processed one frame at a time. First, each frame is classified as voiced or unvoiced; if it is unvoiced, the Buffer Rate (703) is calculated. The memory buffer in which processed data waits for output is called the Ring Buffer (710).

The Buffer Rate indicates how much processed data is waiting in the Ring Buffer. If the current frame is unvoiced and the delay accumulated in the Ring Buffer exceeds a set value (e.g., BT = 1.5), the program shortens the processing of that frame to reduce the handling time. In this way, the delay introduced by multiple pitch alteration is recovered. Voiced sections are processed slowly so that the pitch is altered correctly, while unvoiced sections are sped up to make up for the delay.
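The catch-up rule described above can be expressed as a small Python sketch. The function name `should_shorten` is hypothetical, and the unit of the threshold BT is not stated in this document, so the sketch simply compares two numbers expressed in the same unit.

```python
def should_shorten(is_voiced, buffer_delay, bt=1.5):
    """Decide whether the current frame should be shortened to catch up.

    Only unvoiced frames are shortened, and only when the delay held in
    the ring buffer exceeds the threshold `bt` (assumed same unit as
    `buffer_delay`).
    """
    return (not is_voiced) and buffer_delay > bt

# Hypothetical usage: an unvoiced frame arriving while the buffer is 2.0 behind.
catch_up = should_shorten(is_voiced=False, buffer_delay=2.0)
```

Shortening only unvoiced frames leaves the pitch-altered voiced sections untouched, so the delay is recovered where the change is least audible.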

Many methods have been proposed for deciding whether the current frame is voiced; a simple approach uses the energy level. If the average energy of the current frame falls below a threshold value, the frame is treated as unvoiced.
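A minimal Python sketch of this energy test follows. The function name `is_voiced` and the threshold value used in the example are assumptions; the document does not give a specific threshold.

```python
def is_voiced(frame, energy_threshold):
    """Classify a frame as voiced when its average energy reaches the threshold."""
    average_energy = sum(s * s for s in frame) / len(frame)
    return average_energy >= energy_threshold

# Hypothetical usage with samples scaled to the range [-1, 1].
loud_frame = [0.5, -0.5, 0.5, -0.5]       # average energy 0.25
quiet_frame = [0.01, -0.01, 0.01, -0.01]  # average energy 0.0001
decisions = (is_voiced(loud_frame, 0.1), is_voiced(quiet_frame, 0.1))
```

In practice the threshold would have to be tuned to the microphone gain and background noise level.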

If the data lies in a voiced section, the pitch period must be detected by the Pitch Point Detection process. Many methods for detecting pitch points in speech signals have been proposed over the past 40 years. For example, the autocorrelation method is commonly used; it detects the period of the waveform by computing the correlation between neighboring sections of the speech waveform.

In addition, to restrict the range over which intonation changes within a voiced section (e.g., a maximum of 1.3x), the pitch period is detected across consecutive voiced frames and the rate of change per frame is obtained. If the rate of change exceeds the limit, the voice is moderated by altering the pitch period (706). Pitch period alteration depends on pitch period detection. Many methods for altering the pitch period have also been proposed. This invention uses PSOLA, which divides the speech waveform in the time domain into pitch-period units and reconstructs the waveform by overlapping and adding them, to achieve multiple pitch alteration.
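One possible reading of the per-frame limit is sketched below in Python: the pitch period of the current frame is clamped so that it differs from the previous frame's period by at most the stated ratio. The function name `limit_pitch_change` and the symmetric treatment of increases and decreases are assumptions.

```python
def limit_pitch_change(prev_period, new_period, max_ratio=1.3):
    """Clamp new_period to within a factor of max_ratio of prev_period."""
    lowest = prev_period / max_ratio
    highest = prev_period * max_ratio
    return min(max(new_period, lowest), highest)

# Hypothetical usage: a jump from 80 to 160 samples is limited to 1.3 x 80.
limited = limit_pitch_change(80.0, 160.0)
unchanged = limit_pitch_change(80.0, 85.0)
```

Clamping the frame-to-frame change avoids abrupt jumps in intonation when a pitch detection error doubles or halves the measured period.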

The fully processed voice data is stored in the Ring Buffer (709) and output through the digital-to-analog converter in the order in which it was stored. The multiple speech synthesizer operates in real time. The processing of each frame must therefore finish after that frame's data has been received from the analog-to-digital converter and before the next frame's data arrives.

References

MyungJin Bae, SangHyo Lee, 'Digital Speech Analysis', Published by Dong-Young, 1998

MyungJin Bae, 'Digital Speech Synthesis', Published by Dong-Young, 1999

MyungJin Bae, 'Digital Speech Coding', Published by Dong-Young, 2000

Rabiner and Schafer, 'Digital Processing of Speech Signals', Prentice Hall, 1978

HyungBin Park, MyungJin Bae, 'On a Detection of Pitch Point for Voice Color Conversion', J. Acoust. Soc. Korea, Vol. 19, No. 1, pp. 149-152, July 7-8, 2000.

Industrial Applicability

As explained above, this invention synthesizes multiple voices from a single voice by altering pitch, an important speech parameter. Speech information technology was selected by M.I.T. as one of the top 10 technologies of the 21st century and by the Samsung Economic Research Institute as one of the top 10 most promising technologies. Beyond the importance of the technology itself, the speech technology market is expected to grow at the highest rate. At present, the domestic speech technology market is at an early stage, with a predicted scale of about 20 billion dollars. However, its growth rate has consistently exceeded 50 percent, so by 2005 the domestic speech technology market is expected to exceed 100 billion dollars.

Accordingly, this invention can be used in products such as a cheering synthesizer that gives the effect of a group cheer from a single person's voice, a congratulation synthesizer for birthdays or parties, and a toy that sings in rounds. It can also be used to produce sound effects for movies or plays and for home security systems for working people. Modulating the sound can also help imitate the voices of famous actors or cartoon characters, such as Mask-man. The expected effect is large.

Claims

A multiple speech synthesizer using a pitch alteration method, which synthesizes multiple voices from a single voice by altering the pitch, an important speech parameter, wherein, to control the sound in real time in the time domain while preserving the characteristics and accuracy of the voice, a pitch point detection method using linear predictive analysis is used to detect pitch points based on the core pitch of the voice, and the PSOLA synthesis method is applied in the time domain to alter the pitch in real time, so that the multiple voices altered from the single voice are synthesized together.