JP2000250572A

JP2000250572A - Device and method for preparing voice database, device and method for preparing singing voice database

Info

Publication number: JP2000250572A
Application number: JP11052173A
Authority: JP
Inventors: Kimito Tanaka; 公人田中; Hideyuki Mizuno; 秀之水野; Masanobu Abe; 匡伸阿部; Shinya Nakajima; 信弥中嶌
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1999-03-01
Filing date: 1999-03-01
Publication date: 2000-09-14

Abstract

PROBLEM TO BE SOLVED: To accurately and efficiently record speech with the required text, accent/intonation, tempo and voice quality, or a singing voice with the required lyrics, melody and voice quality. SOLUTION: The text, accent/intonation, tempo and voice quality corresponding to the required speech are entered through a text input editor 1, a prosody design editor 3 and a voice quality design editor 6. A reading assignment unit 2, a prosody parameter generation unit 4, a speech synthesis unit 5 and a voice quality conversion unit 7 then produce the required synthesized speech, which is output from a loudspeaker 8. The speaker listens to it and imitates it, and the utterance is recorded through a microphone 9. The recording is checked by ear on the spot and, if it is the intended speech, it is registered in a database 12 by operating a switch 10.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a speech database creation device and method, and to a singing voice database creation device and method. It concerns database creation work in which the utterance content, accent/intonation, tempo, voice quality and so on, or the lyrics, melody, voice quality and so on, are decided in advance. Examples include speech samples used for training speech recognition devices, speech units used in waveform-concatenation speech synthesizers, response speech used in voice response devices, and speech or singing voices for animation, film and multimedia content. In such work, the invention allows the speech or singing voice that the creator has in mind to be recorded accurately and efficiently in order to build a speech or singing voice database.

[0002]

[Prior art] A conventional speech database creation device consists of a microphone that picks up the speech uttered by a speaker and a recording medium that stores it. The content to be uttered is given to the speaker in a written memo, while the prosody, the readings of kanji, the voice quality and so on are conveyed by the database creator explaining them to the speaker.

[0003] Similarly, a conventional singing voice database creation device consists of a microphone that picks up the singing voice and a recording medium that stores it. The lyrics, melody and so on are given to the singer in a written memo, while the readings of the lyrics, the voice quality and so on are conveyed by the database creator explaining them to the singer.

[0004]

[Problems to be solved by the invention] With such a conventional speech database creation device, however, a memo describing the speech to be registered is handed to the speaker, the database creator spends a great deal of time explaining the readings, accent/intonation, tempo, voice quality and so on, and only then does the speaker utter the speech accordingly. This takes a great deal of effort, manpower and time. Moreover, intonation, tempo, voice quality and the like are hard to explain to a speaker, so it is very difficult for the database creator to collect the desired speech.

[0005] Likewise, with a conventional singing voice database creation device, a memo describing the lyrics and melody of the singing voice to be registered is handed to the singer, the database creator spends a great deal of time explaining the voice quality and so on, and only then does the singer sing accordingly. This also takes a great deal of effort, manpower and time. Moreover, voice quality and the like are hard to explain to the person producing the voice, so it is very difficult for the database creator to collect the desired singing voice.

[0006] An object of the present invention is to provide a speech database creation device and method that can accurately and efficiently record speech with the text, accent/intonation, tempo and voice quality desired by the database creator and register it in a speech database, and a singing voice database creation device and method that can accurately and efficiently record a singing voice with the lyrics, melody and voice quality desired by the database creator and register it in a singing voice database.

[0007]

[Means for solving the problems] To achieve this object, the speech database creation device of the present invention comprises a text input unit for entering the text corresponding to the speech desired by the speech database creator, a prosody input unit for entering a desired prosody pattern, and a voice quality input unit for entering information indicating a desired voice quality. It further comprises a speech synthesis unit that synthesizes speech exactly according to the parameters entered through those input units.

[0008] At the actual recording site, the speaker listens to the synthesized speech and imitates it immediately afterwards. In this way a set of speech samples with the text, accent/intonation, tempo and voice quality desired by the database creator can be recorded accurately and efficiently and registered in the speech database.

[0009] The singing voice database creation device of the present invention comprises a lyrics input unit for entering the lyrics corresponding to the singing voice desired by the singing voice database creator, a melody input unit for entering a desired melody, and a voice quality input unit for entering information indicating a desired voice quality. It further comprises a singing voice synthesis unit that synthesizes a singing voice exactly according to the parameters entered through those input units.

[0010] At the actual recording site, the singer listens to the synthesized singing voice and imitates it immediately afterwards. In this way a set of singing voice samples with the lyrics, melody and voice quality desired by the database creator can be recorded accurately and efficiently and registered in the singing voice database.

[0011]

[Embodiments of the invention] The present invention will now be described with reference to the drawings.

[0012] FIG. 1 shows an example of an embodiment of the speech database creation device of the present invention. In the figure, 1 is a text input editor, 2 is a reading assignment unit, 3 is a prosody design editor, 4 is a prosody parameter generation unit, 5 is a speech synthesis unit, 6 is a voice quality design editor, 7 is a voice quality conversion unit, 8 is a loudspeaker, 9 is a microphone, 10 is a switch, 11 is an instruction display monitor, and 12 is a storage device.

[0013] The text input editor 1 is used to enter the text corresponding to the speech desired by the speech database creator. The reading assignment unit 2 analyzes the pronunciation (reading in kana) of the entered text and outputs the text with the analyzed pronunciation attached to it (that is, with the reading assigned).

[0014] The prosody design editor 3 is used to design (edit) and enter a desired prosody pattern, such as accent/intonation and tempo. The prosody parameter generation unit 4 generates prosodic parameters of speech from the entered prosody pattern. The speech synthesis unit 5 synthesizes speech based on the text with its assigned reading and on the prosodic parameters.

[0015] The voice quality design editor 6 is used to design (edit) and enter information indicating a desired voice quality. The voice quality conversion unit 7 converts the synthesized speech to the desired voice quality according to that information.

[0016] The loudspeaker 8 outputs the synthesized speech after voice quality conversion. The microphone 9 picks up the speech uttered by the speaker in imitation of the synthesized speech output from the loudspeaker 8.

[0017] The switch 10 is used to enter the result of a judgment: the speech database creator listens to the speech recorded through the microphone 9 and judges whether it was uttered closely enough to the creator's intention to be registered in the database. The instruction display monitor 11 informs the speaker of the output of the switch 10.

[0018] The storage device 12 holds the speech database. The speech recorded through the microphone 9 and the corresponding text (entered through the text input editor 1) are registered in this database together.
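
The pairing of recorded speech with its text can be illustrated with a minimal sketch. The patent does not specify a storage format, so the class names `Entry` and `SpeechDatabase`, the in-memory list, and the use of raw bytes for audio are all assumptions made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    text: str     # text entered through the text input editor 1
    audio: bytes  # speech recorded through the microphone 9


@dataclass
class SpeechDatabase:
    """Hypothetical in-memory stand-in for the database held in storage device 12."""
    entries: list = field(default_factory=list)

    def register(self, text: str, audio: bytes) -> int:
        """Store the recording together with its text and return the entry index."""
        self.entries.append(Entry(text=text, audio=audio))
        return len(self.entries) - 1


db = SpeechDatabase()
index = db.register("こんにちは", b"\x00\x01\x02")
print(index, db.entries[index].text)
```

Registering text and audio in a single call keeps the two associated from the moment of storage, which is the property the paragraph above describes.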

[0019] Next, the creation of a speech database using this device is described along with the operation of the device.

[0020] Before the recording session, the database creator uses the text input editor 1 built into the speech database creation device to enter the text corresponding to the desired speech, and uses the prosody design editor 3 to design and enter the desired prosody pattern of that text, such as its accent/intonation and tempo.

[0021] One example of such a prosody design editor is the one described in Mizuno, Abe and Nakajima, "A speech production tool with improved expressiveness and production efficiency: Sesign98," Proceedings of the Acoustical Society of Japan 1998 Autumn Meeting, 2-P-12, pp. 309-310, October 1998.

[0022] The database creator also uses the voice quality design editor 6 to design and enter information indicating the desired voice quality.

[0023] The entered text is given a reading by the reading assignment unit 2, and the entered prosody pattern is converted into prosodic parameters of speech by the prosody parameter generation unit 4. Both are then fed to the speech synthesis unit 5, which produces synthesized speech. The voice quality conversion unit 7 then converts this speech to a voice quality that follows the entered voice quality information.
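
The order of processing described here can be sketched as a chain of stub functions. This is only an outline of the data flow between units 2, 4, 5 and 7: the function names, the dictionary-based lexicon, the parameter keys and the `SyntheticSpeech` record are hypothetical, and no real synthesis or voice conversion is performed.

```python
from dataclasses import dataclass


@dataclass
class SyntheticSpeech:
    reading: str        # reading (kana) assigned to the text
    prosody: dict       # prosodic parameters used for synthesis
    voice_quality: str  # label of the voice quality applied


def assign_reading(text: str, lexicon: dict) -> str:
    """Reading assignment unit 2 (stub): look the text up in a toy lexicon."""
    return lexicon.get(text, text)


def generate_prosody_parameters(pattern: dict) -> dict:
    """Prosody parameter generation unit 4 (stub): turn an edited pattern into numbers."""
    return {"f0_hz": list(pattern["pitch_hz"]), "speech_rate": float(pattern["tempo"])}


def synthesize(reading: str, prosody: dict) -> SyntheticSpeech:
    """Speech synthesis unit 5 (stub): combine the reading and the prosodic parameters."""
    return SyntheticSpeech(reading=reading, prosody=prosody, voice_quality="neutral")


def convert_voice_quality(speech: SyntheticSpeech, voice_quality: str) -> SyntheticSpeech:
    """Voice quality conversion unit 7 (stub): relabel the speech with the target quality."""
    return SyntheticSpeech(speech.reading, speech.prosody, voice_quality)


def make_prompt(text: str, pattern: dict, voice_quality: str, lexicon: dict) -> SyntheticSpeech:
    """Run the units in the order given in the description: 2 and 4, then 5, then 7."""
    reading = assign_reading(text, lexicon)
    prosody = generate_prosody_parameters(pattern)
    return convert_voice_quality(synthesize(reading, prosody), voice_quality)


prompt = make_prompt(
    "今日は晴れ",
    {"pitch_hz": [120, 140, 110], "tempo": 1.2},
    "bright",
    {"今日は晴れ": "きょうははれ"},
)
print(prompt)
```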

[0024] One example of such a voice quality conversion unit (device) is the one described in Abe, "Voice quality conversion device based on waveform processing (VarioVoice)," Proceedings of the Acoustical Society of Japan 1997 Spring Meeting, 3-7-6, pp. 269-270, March 1997.

[0025] Through the above processing, the speech database creator can obtain synthesized speech with the same acoustic properties as the speech he or she has in mind. The speech database creator prepares such synthesized speech in advance.

[0026] At the recording site, the synthesized utterances are output from the loudspeaker 8 and played to the speaker one after another. The speaker imitates each one, and the resulting utterance is recorded through the microphone 9.

[0027] The recording operator (who may or may not be the database creator) listens to the recorded speech. If the operator judges that it is acoustically close to the synthesized speech played just before and matches the database creator's intention well enough, the operator operates the switch 10. The speech recorded through the microphone 9 is then registered, together with the entered text, in the speech database in the storage device 12.

[0028] The output of the switch 10 is displayed on the instruction display monitor 11, so the speaker knows immediately whether the utterance was good enough.
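
The recording procedure can be outlined as a loop over the prepared prompts. The sketch below is hypothetical: playback, recording, the operator's judgment (switch 10) and the monitor (11) are passed in as plain callables, and the `max_takes` retry limit is an assumption, since the description does not say how many attempts are allowed.

```python
from typing import Callable, List, Tuple


def run_session(
    prompts: List[Tuple[str, bytes]],
    record_take: Callable[[bytes], bytes],
    operator_accepts: Callable[[bytes, bytes], bool],
    show_on_monitor: Callable[[str], None],
    max_takes: int = 3,  # assumed retry limit, not taken from the patent
) -> List[Tuple[str, bytes]]:
    """Play each synthesized prompt, record the imitation, and register accepted takes."""
    database = []
    for text, synthetic in prompts:
        for _ in range(max_takes):
            take = record_take(synthetic)                 # speaker imitates the prompt
            accepted = operator_accepts(synthetic, take)  # operator's judgment via the switch
            show_on_monitor("OK" if accepted else "RETRY")
            if accepted:
                database.append((text, take))             # register speech with its text
                break
    return database


# Toy run: the first take is rejected, the next two are accepted.
takes = iter([b"bad", b"good", b"good"])
monitor_log: List[str] = []
registered = run_session(
    prompts=[("おはよう", b"good"), ("こんばんは", b"good")],
    record_take=lambda synthetic: next(takes),
    operator_accepts=lambda synthetic, take: take == synthetic,
    show_on_monitor=monitor_log.append,
)
print(monitor_log, registered)
```

Showing the result on the monitor after every take, accepted or not, mirrors the immediate feedback to the speaker that the description emphasizes.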

[0029] FIG. 2 shows an example of an embodiment of the singing voice database creation device of the present invention. In the figure, 21 is a lyrics input editor, 22 is a reading assignment unit, 23 is a melody design editor, 24 is a melody parameter generation unit, 25 is a singing voice synthesis unit, 26 is a voice quality design editor, 27 is a voice quality conversion unit, 28 is a loudspeaker, 29 is a microphone, 30 is a switch, 31 is an instruction display monitor, and 32 is a storage device.

[0030] The lyrics input editor 21 is used to enter the lyrics corresponding to the singing voice desired by the singing voice database creator. The reading assignment unit 22 analyzes the pronunciation (reading in kana) of the entered lyrics and outputs the lyrics with the analyzed pronunciation attached to them (that is, with the reading assigned).

[0031] The melody design editor 23 is used to design (edit) and enter a desired melody pattern. The melody parameter generation unit 24 generates melody parameters of the singing voice from the entered melody pattern. The singing voice synthesis unit 25 synthesizes a singing voice based on the lyrics with their assigned reading and on the melody parameters.
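
The patent does not define what the melody parameters are. As one possible illustration, the sketch below assumes the melody pattern is a list of (MIDI note number, duration in seconds) pairs and expands it into a frame-by-frame fundamental frequency contour, using equal temperament with A4 = 440 Hz. The note representation, the 10 ms frame length and both function names are assumptions, not part of the source.

```python
from typing import List, Tuple


def midi_to_hz(note: int) -> float:
    """Equal-tempered pitch of a MIDI note number, with note 69 (A4) at 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12)


def melody_parameters(melody: List[Tuple[int, float]], frame_s: float = 0.01) -> List[float]:
    """Expand (midi_note, duration_s) pairs into a frame-wise F0 contour in Hz."""
    contour: List[float] = []
    for note, duration in melody:
        frames = round(duration / frame_s)  # number of frames covered by this note
        contour.extend([midi_to_hz(note)] * frames)
    return contour


# A4 held for 0.5 s followed by C5 held for 0.25 s.
contour = melody_parameters([(69, 0.5), (72, 0.25)])
print(len(contour), contour[0], round(contour[-1], 2))
```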

[0032] The voice quality design editor 26 is used to design (edit) and enter information indicating a desired voice quality. The voice quality conversion unit 27 converts the synthesized singing voice to the desired voice quality according to that information.

[0033] The loudspeaker 28 outputs the synthesized singing voice after voice quality conversion. The microphone 29 picks up the singing voice sung by the singer in imitation of the synthesized singing voice output from the loudspeaker 28.

[0034] The switch 30 is used to enter the result of a judgment: the singing voice database creator listens to the singing voice recorded through the microphone 29 and judges whether it was sung closely enough to the creator's intention to be registered in the database. The instruction display monitor 31 informs the singer of the output of the switch 30.

[0035] The storage device 32 holds the singing voice database. The singing voice recorded through the microphone 29 and the corresponding lyrics (entered through the lyrics input editor 21) are registered in this database together.

[0036] Next, the creation of a singing voice database using this device is described along with the operation of the device.

[0037] Before the recording session, the database creator uses the lyrics input editor 21 built into the singing voice database creation device to enter the lyrics corresponding to the desired singing voice, and uses the melody design editor 23 to design and enter the desired melody pattern of those lyrics.

[0038] One example of such a melody design editor is the one described in Mizuno, Abe and Nakajima, "A speech production tool with improved expressiveness and production efficiency: Sesign98," Proceedings of the Acoustical Society of Japan 1998 Autumn Meeting, 2-P-12, pp. 309-310, October 1998.

[0039] The database creator also uses the voice quality design editor 26 to design and enter information indicating the desired voice quality.

[0040] The entered lyrics are given a reading by the reading assignment unit 22, and the entered melody pattern is converted into melody parameters of the singing voice by the melody parameter generation unit 24. Both are then fed to the singing voice synthesis unit 25, which produces a synthesized singing voice. The voice quality conversion unit 27 then converts this singing voice to a voice quality that follows the entered voice quality information.

[0041] One example of such a voice quality conversion unit (device) is the one described in Abe, "Voice quality conversion device based on waveform processing (VarioVoice)," Proceedings of the Acoustical Society of Japan 1997 Spring Meeting, 3-7-6, pp. 269-270, March 1997.

[0042] Through the above processing, the singing voice database creator can obtain a synthesized singing voice with the same acoustic properties as the singing voice he or she has in mind. The singing voice database creator prepares such synthesized singing voices in advance.

[0043] At the recording site, the synthesized singing voices are output from the loudspeaker 28 and played to the singer one after another. The singer imitates each one, and the resulting singing voice is recorded through the microphone 29.

[0044] The recording operator (who may or may not be the database creator) listens to the recorded singing voice. If the operator judges that it is acoustically close to the synthesized singing voice played just before and matches the database creator's intention well enough, the operator operates the switch 30. The singing voice recorded through the microphone 29 is then registered, together with the entered lyrics, in the singing voice database in the storage device 32.

[0045] The output of the switch 30 is displayed on the instruction display monitor 31, so the singer knows immediately whether the singing was good enough.

[0046]

[Effects of the invention] As described above, according to the present invention, speech samples uttered with the accent/intonation, tempo and voice quality that the database creator has in mind, or singing voice samples sung with the desired lyrics, melody and voice quality, can be recorded accurately and in a short time, exactly as the database creator wishes, to build a speech database or a singing voice database. Furthermore, once the database creator has prepared the synthesized speech or synthesized singing voice, a proxy can carry out the recording work without the creator being present at the recording site.

[0047] The speaker or singer, for their part, learns immediately whether the utterance or singing matched the database creator's intention well enough, which improves the accuracy of the utterance or singing.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an example of an embodiment of the speech database creation device of the present invention.

FIG. 2 is a block diagram showing an example of an embodiment of the singing voice database creation device of the present invention.

[Explanation of symbols]

1: text input editor; 2, 22: reading assignment unit; 3: prosody design editor; 4: prosody parameter generation unit; 5: speech synthesis unit; 6, 26: voice quality design editor; 7, 27: voice quality conversion unit; 8, 28: loudspeaker; 9, 29: microphone; 10, 30: switch; 11, 31: instruction display monitor; 12, 32: storage device; 21: lyrics input editor; 23: melody design editor; 24: melody parameter generation unit; 25: singing voice synthesis unit.

Continuation of front page: (72) Inventor: Masanobu Abe, c/o Nippon Telegraph and Telephone Corporation, 3-19-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo. (72) Inventor: Shinya Nakajima, c/o Nippon Telegraph and Telephone Corporation, 3-19-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo. F-term (reference): 5D045 AA07 AA08 AA09

Claims


1. A speech database creation device that creates a speech database consisting of text and corresponding speech data by recording speech uttered by a human, comprising: a text input unit for inputting text corresponding to the speech; a reading assignment unit that analyzes the reading (kana) of the input text and assigns the reading to the text; a prosody input unit for inputting a desired prosody pattern; a prosody parameter generation unit that generates prosodic parameters of speech from the prosody pattern; a speech synthesis unit that synthesizes speech based on the text to which the reading has been assigned and on the prosodic parameters; a voice quality input unit for inputting information indicating a desired voice quality; a voice quality conversion unit that converts the synthesized speech to the desired voice quality according to the information indicating the voice quality; a speech reproduction unit that reproduces the synthesized speech after voice quality conversion; a speech recording unit that records the speech of a speaker imitating the reproduced synthesized speech; a determination input unit for inputting the result of a determination as to whether to register the recorded speech in the speech database; and an instruction display unit that displays the determination result, wherein the recorded speech and the corresponding text are registered in the speech database in association with each other.

2. The speech database creation device according to claim 1, wherein the prosody input unit and the voice quality input unit have means for editing the prosody pattern and the information indicating the voice quality, respectively.

3. A speech database creation method for creating a speech database consisting of text and corresponding speech data by recording speech uttered by a human, comprising: inputting text corresponding to the speech; analyzing the reading (kana) of the input text and assigning the reading to the text; inputting a desired prosody pattern; generating prosodic parameters of speech from the prosody pattern; synthesizing speech based on the text to which the reading has been assigned and on the prosodic parameters; inputting information indicating a desired voice quality; converting the synthesized speech to the desired voice quality according to the information indicating the voice quality; reproducing the synthesized speech after voice quality conversion; recording the speech of a speaker imitating the reproduced synthesized speech; determining whether to register the recorded speech in the speech database; and registering the recorded speech and the corresponding text in the speech database in association with each other.

4. A singing voice database creation device that creates a singing voice database consisting of lyrics and corresponding singing voice data by recording a singing voice produced by a human, comprising: a lyrics input unit for inputting lyrics corresponding to the singing voice; a reading assignment unit that analyzes the reading (kana) of the input lyrics and assigns the reading to the lyrics; a melody input unit for inputting a desired melody; a melody parameter generation unit that generates melody parameters of the singing voice from the melody; a singing voice synthesis unit that synthesizes a singing voice based on the lyrics to which the reading has been assigned and on the melody parameters; a voice quality input unit for inputting information indicating a desired voice quality; a voice quality conversion unit that converts the synthesized singing voice to the desired voice quality according to the information indicating the voice quality; a singing voice reproduction unit that reproduces the synthesized singing voice after voice quality conversion; a singing voice recording unit that records the singing voice of a singer imitating the reproduced synthesized singing voice; a determination input unit for inputting the result of a determination as to whether to register the recorded singing voice in the singing voice database; and an instruction display unit that displays the determination result, wherein the recorded singing voice and the corresponding lyrics are registered in the singing voice database in association with each other.

5. The singing voice database creation device according to claim 4, wherein the melody input unit and the voice quality input unit have means for editing the melody and the information indicating the voice quality, respectively.

6. A singing voice database creation method for creating a singing voice database consisting of lyrics and corresponding singing voice data by recording a singing voice produced by a human, comprising: inputting lyrics corresponding to the singing voice; analyzing the reading (kana) of the input lyrics and assigning the reading to the lyrics; inputting a desired melody; generating melody parameters of the singing voice from the melody; synthesizing a singing voice based on the lyrics to which the reading has been assigned and on the melody parameters; inputting information indicating a desired voice quality; converting the synthesized singing voice to the desired voice quality according to the information indicating the voice quality; reproducing the synthesized singing voice after voice quality conversion; recording the singing voice of a singer imitating the reproduced synthesized singing voice; determining whether to register the recorded singing voice in the singing voice database; and registering the recorded singing voice and the corresponding lyrics in the singing voice database in association with each other.