JP6003195B2

JP6003195B2 - Apparatus and program for performing singing synthesis

Info

Publication number: JP6003195B2
Application number: JP2012104092A
Authority: JP
Inventors: Eiji Akazawa (赤澤 英治); Tatsuya Iriyama (入山 達也)
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2012-04-27
Filing date: 2012-04-27
Publication date: 2016-10-05
Anticipated expiration: 2032-04-27
Also published as: JP2013231872A

Description

The present invention relates to singing synthesis technology, and in particular to an apparatus and program suited to singing synthesis for rap.

A singing synthesis apparatus synthesizes a singing voice from information indicating the individual notes that make up a song and information indicating the lyrics to be sung on those notes (see, for example, Patent Document 1). To have the apparatus perform singing synthesis, the user must therefore specify, for every note in the song, the lyrics to be sung on that note.

Patent Document 1: JP 2011-128186 A

In recent years, a singing style called rap has become popular. Rap is a style in which the performer sings rhythmically, as if talking, with little melody and with rhymes at points such as the ends of measures. No singing synthesis apparatus suited to rap has been available so far. If a conventional singing synthesis apparatus were used to synthesize rap, lyrics containing a relatively long phoneme string would have to be associated with a single note, and the operation required to do so would be cumbersome.

The present invention was made in view of these circumstances, and its object is to provide a singing synthesis apparatus suited to rap singing synthesis.

The present invention provides a singing synthesis apparatus comprising: phrase data generation means that divides the lyrics indicated by lyrics data into a plurality of fragments, generates from the fragments a plurality of phrase data items each indicating the manner of utterance of a phrase of synthesized singing sound, and writes them to storage means; and phrase reproduction means that, in response to a reproduction instruction targeting desired phrase data, reads that phrase data from the storage means and generates a synthesized singing sound using it.

According to this invention, the phrase data generation means divides the lyrics into a plurality of fragments, generates from the fragments a plurality of phrase data items each indicating the manner of utterance of a phrase of synthesized singing sound, and stores them in the storage means. When a reproduction instruction targeting desired phrase data is issued, the phrase reproduction means reads that phrase data from the storage means and generates a synthesized singing sound using it. Phrase data that serves as material for rap can therefore be generated easily, and rap using that phrase data can be performed easily.

The present invention can also be realized as a sequence data editing apparatus that generates sequence data for rap. This sequence data editing apparatus comprises operation means, display means, and sequence data generation means. The sequence data generation means displays a sequence data editing area having a time axis on the display means; lays out, in the sequence data editing area and in accordance with operation of the operation means, blocks each indicating a phrase, that is, the synthesized singing sound of a fragment of lyrics; determines the reproduction timing of the phrase corresponding to each block from the block's position along the time axis in the sequence data editing area; and generates sequence data in which the reproduction instructions for the phrases corresponding to the blocks are arranged in time series.

According to this aspect, sequence data for a rap performance can be created by the simple operation of laying out blocks indicating phrases in the sequence data editing area.

The drawings, in order, are as follows.
A block diagram showing the configuration of a singing synthesis apparatus according to a first embodiment of the invention.
A block diagram showing the configuration of the phrase data generation unit 100 in that embodiment.
A block diagram showing the configuration of the singing synthesis unit 200 in that embodiment.
A diagram explaining the operation of that embodiment.
A block diagram showing the configuration of a singing synthesis apparatus according to a second embodiment of the invention.
A time chart showing the operation of that embodiment.
A block diagram showing the configuration of a singing synthesis apparatus according to a third embodiment of the invention.
A diagram showing the sequence data editing process in that embodiment.
A diagram showing a piano roll display used in another embodiment of the invention.
A diagram explaining the phrase volume operation in another embodiment of the invention.
A diagram explaining the block layout performed in another embodiment of the invention.

Embodiments of the present invention are described below with reference to the drawings.

<First Embodiment>

FIG. 1 is a block diagram showing a configuration example of a singing synthesis apparatus according to the first embodiment of the present invention. This singing synthesis apparatus is, for example, a mobile terminal with an audio output function, such as a portable game machine, a mobile phone, or a smartphone, on which programs for singing synthesis have been installed. As shown in FIG. 1, the singing synthesis apparatus has a CPU 10, a nonvolatile memory 20, a ROM 30, a RAM 40, a display input unit 50, a sound system 60, and an external interface group 70.

The CPU 10 is the control center that controls each part of the singing synthesis apparatus. The ROM 30 is a read-only memory that stores control programs, such as a loader, for controlling the basic operation of the singing synthesis apparatus. The RAM 40 is a volatile memory used by the CPU 10 as a work area.

The display input unit 50 is, for example, a touch panel. It is a user interface with a display function, which shows the operating state of the apparatus, input data, and messages to the operator (user), and an input function, which accepts operations performed by the user. These operations include entering information indicating lyrics, entering information indicating notes, and entering an instruction to reproduce the synthesized singing sound for one phrase. The display input unit 50 may integrate the display and input functions, as a touch panel does, or may separate them, as a display and keyboard do.

The external interface group 70 includes a network interface for data communication with other devices over a network, a driver for exchanging data with an external storage medium such as a flash memory, and the like.

The sound system 60 outputs, as sound, the time-series digital data representing the waveform of the synthesized singing sound produced by the singing synthesis apparatus. It consists of a D/A converter that converts this time-series digital data into an analog audio signal, an amplifier that amplifies the analog audio signal, a speaker that outputs the amplifier's output signal as sound, and the like.

The nonvolatile memory 20 is a storage device for information such as programs and databases; for example, an EEPROM (Electrically Erasable Programmable Read-Only Memory) is used. The contents of the nonvolatile memory 20 that are specific to this embodiment are a singing synthesis program 21, a dictionary 25, a phrase database 26, and a speech segment database 28. In accordance with instructions entered through the display input unit 50, the CPU 10 loads a program from the nonvolatile memory 20 into the RAM 40 and executes it.

The programs and other data stored in the nonvolatile memory 20 may be distributed by download over a network. In that case, they are downloaded from a site on the Internet through an appropriate interface in the external interface group 70 and installed in the nonvolatile memory 20. They may also be distributed stored on a computer-readable storage medium. In that case, they are installed in the nonvolatile memory 20 via an external storage medium such as a flash memory.

The feature of this embodiment lies in the singing synthesis program 21, which has a phrase data generation unit 100 and a singing synthesis unit 200. The phrase data generation unit 100 is a program that receives lyrics data through the display input unit 50 serving as the user interface, divides the lyrics indicated by the lyrics data into a plurality of fragments, generates a phonetic symbol string from each fragment, and generates, from each phonetic symbol string, phrase data corresponding to one of a plurality of phrases. Here, a phrase is a fragment of singing voice that is uttered rapidly in one breath when rapping, and phrase data is time-series data that specifies how that phrase is uttered. In this embodiment, the dictionary 25 is consulted when phrase data is generated. Each phrase data item generated by the phrase data generation unit 100 is stored in the phrase database 26. The singing synthesis unit 200 synthesizes singing sound from phrase data.

FIG. 2 is a block diagram showing the configuration of the phrase data generation unit 100. To make the function of each part of the phrase data generation unit 100 easier to understand, FIG. 2 also shows the dictionary 25, the phrase database 26, and other elements related to those parts.

The phrase data generation unit 100 in this embodiment has the lyrics analysis unit 101, the phonetic symbol generation unit 102, and the phrase data synthesis unit 103 shown in FIG. 2. The lyrics analysis unit 101 stores the lyrics data entered through the display input unit 50 in a work area in the RAM 40 and divides the lyrics indicated by that data into fragments, specifically into clauses (bunsetsu). A clause here is a unit of pronunciation consisting of an independent word (a noun, verb, etc.) followed by a clitic (a particle, etc.); the clitic may be absent.

The lyrics analysis unit 101 consults the dictionary 25 when dividing the lyrics into clauses. The dictionary 25 is a collection of word data and a rule base. The word data contains information indicating each word's part of speech, such as noun, verb, or particle, and the phonetic symbol information for the word. It also contains a pitch model that represents the usual pitch change when the word is pronounced. In addition to this per-word information, the dictionary 25 contains rules for deriving, from the structure of a sentence, the pitch trajectory of the voice when a person utters that sentence normally.

The lyrics analysis unit 101 compares the word data in the dictionary 25 with the character string of the lyrics to find the start position of each word in the lyrics. It then determines the part of speech of each word from the part-of-speech information in the word data and classifies each word as an independent word or a clitic. From the order of the independent words and clitics, it determines the clause boundaries in the lyrics and divides the lyrics into clauses. Finally, by consulting the dictionary 25, the lyrics analysis unit 101 derives the pitch trajectory of the voice when a person utters the lyrics normally and divides that trajectory clause by clause.
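The segmentation step described above can be sketched roughly as follows. This is a minimal illustration, not the implementation in the patent: the toy dictionary, its part-of-speech labels, the greedy longest-match lookup, and all function names are assumptions made for the example. The sample string is the lyric the description uses for FIG. 4.

```python
# Toy dictionary (hypothetical): surface form -> part of speech.
TOY_DICTIONARY = {
    "英治": "noun",
    "は": "particle",
    "走っ": "verb",
    "た": "auxiliary",
    "そして": "conjunction",
    "全力": "noun",
    "で": "particle",
    "飛ん": "verb",
    "だ": "auxiliary",
    "ああっ": "interjection",
}
# Parts of speech treated as clitics, which attach to the preceding word.
CLITIC_POS = {"particle", "auxiliary"}
PUNCTUATION = "。、"


def tokenize(lyrics):
    """Find word boundaries by greedy longest-match dictionary lookup."""
    max_len = max(len(word) for word in TOY_DICTIONARY)
    words = []
    i = 0
    while i < len(lyrics):
        if lyrics[i] in PUNCTUATION or lyrics[i].isspace():
            i += 1
            continue
        for n in range(min(max_len, len(lyrics) - i), 0, -1):
            candidate = lyrics[i:i + n]
            if candidate in TOY_DICTIONARY:
                words.append((candidate, TOY_DICTIONARY[candidate]))
                i += n
                break
        else:
            raise ValueError(f"no dictionary entry at position {i}")
    return words


def split_into_clauses(lyrics):
    """Group each independent word with the clitics that follow it."""
    clauses = []
    for surface, pos in tokenize(lyrics):
        if pos in CLITIC_POS and clauses:
            clauses[-1] += surface   # clitic joins the current clause
        else:
            clauses.append(surface)  # independent word starts a new clause
    return clauses


if __name__ == "__main__":
    print(split_into_clauses("英治は走った。そして英治は全力で飛んだ。ああっ。"))
```

With this toy dictionary the sample lyric splits into seven clauses. A real dictionary would also have to resolve ambiguous word boundaries, which this sketch ignores.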

The clauses that the lyrics analysis unit 101 generates from the lyrics are passed to the phonetic symbol generation unit 102. In this embodiment, these clauses are the material for the phrases sung when rapping. The phonetic symbol generation unit 102 consults the dictionary 25 and converts each clause received from the lyrics analysis unit 101 into a phonetic symbol string.

For each clause into which the lyrics were divided, the phrase data synthesis unit 103 synthesizes phrase data using the phonetic symbol string obtained by the phonetic symbol generation unit 102. Phrase data consists of a phonetic symbol track and a pitch track that share a common time axis. The phonetic symbol track specifies the timing at which each phonetic symbol in the string is pronounced. The phrase data synthesis unit 103 maps the phonetic symbols of the string onto the phonetic symbol track, for example at fixed time intervals. The pitch track represents how the pitch changes over time during the phrase. The way the pitch track is generated depends on the kind of rap the singing synthesis apparatus is to perform.
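One possible in-memory shape for such phrase data is sketched below. The field names, the millisecond time unit, the 80 ms spacing, the flat 220 Hz pitch, and the romanized phonetic symbols are all assumptions for illustration; the description only states that the two tracks share a time axis and that symbols may be placed at fixed intervals.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PhraseData:
    # Both tracks use the same time axis (milliseconds from phrase start).
    symbol_track: List[Tuple[int, str]]    # (onset_ms, phonetic symbol)
    pitch_track: List[Tuple[int, float]]   # (time_ms, pitch in Hz)


def build_symbol_track(symbols, interval_ms=80):
    """Place each phonetic symbol at a fixed interval on the time axis."""
    return [(i * interval_ms, symbol) for i, symbol in enumerate(symbols)]


def make_phrase(symbols, pitch_hz=220.0, interval_ms=80):
    """Build phrase data with a flat pitch track spanning the whole phrase."""
    end_ms = len(symbols) * interval_ms
    return PhraseData(
        symbol_track=build_symbol_track(symbols, interval_ms),
        pitch_track=[(0, pitch_hz), (end_ms, pitch_hz)],
    )


if __name__ == "__main__":
    # Hypothetical romanized symbols for one clause.
    print(make_phrase(["e", "i", "j", "i", "w", "a"]))
```

Keeping both tracks as lists of timestamped points lets a reader step through them together during reproduction.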

The singing synthesis apparatus of this embodiment can perform three kinds of rap: a first rap, which is ordinary rap with little pitch change; a second rap, in which the pitch changes so that the utterance sounds like singing; and a third rap, which is uttered like speech. The user can specify which kind of rap to perform by operating the display input unit 50.

When the user specifies the first rap, the phrase data synthesis unit 103 maps a constant pitch onto the pitch track from the start to the end of the phrase. In a preferred aspect, to make the rap sound natural, the pitch mapped onto the pitch track of each phrase data item is varied using pseudo-random numbers or the like, so that the pitch differs subtly from phrase to phrase.

When the user specifies the second rap, the phrase data synthesis unit 103 refers to the clause from which the phrase data is derived and to the dictionary 25, reads from the dictionary 25 the pitch model representing the pitch change when the words in that clause are pronounced, and maps this pitch model onto the pitch track.

When the user specifies the third rap, the phrase data synthesis unit 103 takes, from the per-clause pitch trajectories generated by the lyrics analysis unit 101, the trajectory of the clause from which the phrase data is derived and maps it onto the pitch track.
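The three pitch-track strategies can be summarized in a short sketch. The function name, the mode numbering as integers, the 3% jitter range, and the representation of pitch models and trajectories as (relative position, Hz) points are assumptions for this example; the description does not specify any of them.

```python
import random


def pitch_track_for(mode, duration_ms, base_hz=200.0,
                    word_model=None, sentence_contour=None, rng=None):
    """Return a pitch track as a list of (time_ms, Hz) points.

    mode 1: constant pitch with a small pseudo-random offset per phrase.
    mode 2: the word's pitch model from the dictionary, stretched to fit.
    mode 3: this clause's slice of the spoken-sentence pitch trajectory.
    """
    if mode == 1:
        rng = rng or random.Random()
        hz = base_hz * (1.0 + rng.uniform(-0.03, 0.03))
        return [(0, hz), (duration_ms, hz)]
    if mode == 2:
        source = word_model
    elif mode == 3:
        source = sentence_contour
    else:
        raise ValueError(f"unknown rap mode: {mode}")
    if not source:
        raise ValueError("a pitch source is required for modes 2 and 3")
    # Points are given as (relative position 0..1, Hz); scale to the phrase.
    return [(int(round(pos * duration_ms)), hz) for pos, hz in source]


if __name__ == "__main__":
    print(pitch_track_for(1, 400, rng=random.Random(0)))
    print(pitch_track_for(2, 400, word_model=[(0.0, 180.0), (0.5, 240.0), (1.0, 200.0)]))
```

Modes 2 and 3 share the same mapping code here and differ only in where the pitch data comes from, which mirrors the description: a per-word model in one case and the analyzed sentence trajectory in the other.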

In this embodiment, phrase data is thus generated for each clause into which the lyrics indicated by the lyrics data are divided. Each clause and the phrase data obtained from it are stored in the phrase database 26 together with the lyrics data from which they originated.

As shown in FIG. 2, the area that stores the phrase database 26 is divided into a lyrics data area and a phrase data area. The lyrics data area stores the lyrics data processed by the phrase data generation unit 100 together with an ID that identifies that lyrics data. The phrase data area stores, for each clause into which the lyrics were divided, the ID of the lyrics data to which the clause belongs, the serial number FN of the clause within the lyrics, the clause itself, and the phrase data obtained from the clause.
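The two-area layout can be pictured with a small in-memory sketch. The class and method names, the dictionary-based records, and the choice to start the serial number FN at 1 are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class PhraseDatabase:
    # Lyrics data area: lyrics ID -> lyrics text.
    lyrics_area: dict = field(default_factory=dict)
    # Phrase data area: one record per clause.
    phrase_area: list = field(default_factory=list)

    def store(self, lyrics_id, lyrics, clauses_with_data):
        """Store lyrics and their (clause, phrase_data) pairs in order."""
        self.lyrics_area[lyrics_id] = lyrics
        for fn, (clause, phrase_data) in enumerate(clauses_with_data, start=1):
            self.phrase_area.append({
                "id": lyrics_id,        # ID of the lyrics the clause belongs to
                "fn": fn,               # serial number within the lyrics
                "clause": clause,
                "phrase_data": phrase_data,
            })

    def load(self, lyrics_id):
        """Return all records for one lyrics ID, ordered by serial number."""
        rows = [r for r in self.phrase_area if r["id"] == lyrics_id]
        return sorted(rows, key=lambda r: r["fn"])


if __name__ == "__main__":
    db = PhraseDatabase()
    db.store("L001", "英治は走った。", [("英治は", "data-1"), ("走った", "data-2")])
    print(db.load("L001"))
```

Looking records up by lyrics ID matches how the read control unit is later described as fetching every clause and phrase data pair that shares the selected ID.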

In FIG. 1, the singing synthesis unit 200 is a program that receives, through the display input unit 50 serving as the user interface, a reproduction instruction targeting desired phrase data, reads the targeted phrase data from the phrase database 26, and uses it to synthesize time-series digital data representing the waveform of the phrase's synthesized singing sound. In this embodiment, the speech segment database 28 is consulted when this time-series digital data is generated.

The speech segment database 28 is a collection of speech segment data representing the various speech segments that serve as material for a singing voice. The speech segment data is created from speech segments extracted from speech waveforms actually uttered by people. The speech segment database 28 provides, for each singer with a different voice quality, a group of speech segment data obtained from that singer's singing voice waveforms. When the singing synthesis unit 200 performs singing synthesis, the user can select which group of speech segment data to use.

FIG. 3 is a block diagram showing the configuration of the singing synthesis unit 200. As shown in FIG. 3, the singing synthesis unit 200 includes the following software modules: a read control unit 220, a pitch conversion unit 230, a segment concatenation unit 240, and a volume control unit 250.

The read control unit 220 has two functions. The first provides the user, through the display input unit 50, with a GUI for receiving reproduction instructions for phrase data. The second receives, through this GUI, a reproduction instruction targeting desired phrase data and performs the control needed to reproduce the targeted phrase data as a phrase.

The first function is described first. The read control unit 220 performs processing to identify the lyrics data that the user wants sung as rap. This processing can take various forms. For example, for each lyrics data item for which phrase data has already been generated, a character string such as the first predetermined number of characters of the lyrics may be displayed as a menu on the display input unit 50, and the user may select the desired lyrics data from it. Alternatively, the user may enter part of the desired lyrics as a keyword, the lyrics data containing that keyword may be displayed as a menu on the display input unit 50, and the user may select the desired lyrics data from it.

The read control unit 220 reads the ID of the lyrics data selected in this way from the lyrics data area of the phrase database 26, reads every pair of clause and phrase data associated with that ID from the phrase data area of the phrase database 26, and stores the pairs in the work area in the RAM 40. It then displays on the display input unit 50 blocks that visually show the content of each clause in the work area. The user can then enter a reproduction instruction targeting the phrase data of a desired clause by pointing to that clause's block on the display input unit 50.

The second function is described next. When the read control unit 220 detects that one of the blocks displayed on the display input unit 50 has been pointed to, it identifies the phrase data corresponding to that block and reads the phonetic symbols and pitch data from the phonetic symbol track and pitch track of that phrase data in the work area. The read control unit 220 also reads the speech segment data corresponding to the phonetic symbols it has read from the speech segment database 28. It then sends the speech segment data and pitch data it has read to the pitch conversion unit 230.
These are the functions of the read control unit 220.

The pitch conversion unit 230 applies pitch conversion to the speech segment data read by the read control unit 220. It converts the pitch of the speech segment data according to the pitch data that the read control unit 220 read together with that speech segment data, and generates speech segment data having the pitch indicated by the pitch data.

The segment concatenation unit 240 smoothly joins the speech segment data processed by the pitch conversion unit 230 and outputs the result as time-series data representing a singing voice waveform.

The volume control unit 250 controls the volume of the time-series data output from the segment concatenation unit 240, for example in accordance with a volume setting operation performed through the display input unit 50, and outputs the result to the sound system 60 as time-series data representing the final synthesized singing sound.
This completes the description of the configuration of the singing synthesis apparatus according to this embodiment.
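The pitch conversion, segment concatenation, and volume control stages can be pictured as a simple chain. The sketch below is only a stand-in under stated assumptions: pitch conversion is approximated by naive linear-interpolation resampling (which also changes duration, unlike a real pitch converter), joining is a short linear crossfade, and waveforms are plain lists of floats. None of these choices is taken from the patent, and all function names are hypothetical.

```python
def resample_pitch(samples, ratio):
    """Naive pitch shift: read the waveform `ratio` times faster."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    n_out = max(1, int(len(samples) / ratio))
    out = []
    for k in range(n_out):
        pos = k * ratio
        i = int(pos)
        frac = pos - i
        a = samples[min(i, len(samples) - 1)]
        b = samples[min(i + 1, len(samples) - 1)]
        out.append(a + (b - a) * frac)   # linear interpolation
    return out


def crossfade_concat(units, overlap):
    """Join segments, blending `overlap` samples at each boundary."""
    out = []
    for unit in units:
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for k in range(n):
            w = (k + 1) / (n + 1)        # fade-in weight of the new segment
            out[start + k] = out[start + k] * (1 - w) + unit[k] * w
        out.extend(unit[n:])
    return out


def apply_volume(samples, gain):
    """Scale the waveform by a user-selected gain."""
    return [s * gain for s in samples]


def synthesize(units_with_ratio, overlap=2, gain=1.0):
    """Chain the three stages: pitch conversion, joining, volume."""
    shifted = [resample_pitch(unit, ratio) for unit, ratio in units_with_ratio]
    return apply_volume(crossfade_concat(shifted, overlap), gain)


if __name__ == "__main__":
    print(synthesize([([1.0, 1.0, 1.0], 1.0), ([0.0, 0.0, 0.0], 1.0)], gain=0.5))
```

The crossfade replaces the abrupt jump between two segments with a short ramp, which is one common way to make a join sound smooth.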

FIG. 4 shows an operation example of this embodiment, and the operation is described below with reference to it. When the user wants the singing synthesis apparatus to rap with desired lyrics, the user first enters lyrics data indicating those lyrics by operating the display input unit 50. FIG. 4(A) shows the lyrics indicated by the lyrics data entered in this way: 「英治は走った。そして英治は全力で飛んだ。ああっ。」 ("Eiji ran. Then Eiji jumped with all his might. Aah."). This lyrics data is stored in the work area in the RAM 40.

After entering the lyrics data, the user operates the display input unit 50 to select one of the first to third raps described above and enters an instruction to generate phrase data. The lyrics analysis unit 101 of the phrase data generation unit 100 then consults the dictionary 25, divides the lyrics data in the work area into clauses as shown in FIG. 4(B), and derives the pitch trajectory for each clause as described above. In FIG. 4(B), the symbol "|" marks a clause boundary.

Each clause obtained in this way is passed to the phonetic symbol generation unit 102, which converts each clause received from the lyrics analysis unit 101 into a phonetic symbol string.

The phrase data synthesis unit 103 then generates phrase data for each clause. As described above, pitch is mapped onto the pitch track of the phrase data by a method that depends on the kind of rap the user specified. FIG. 4(C) illustrates the content of the pitch track of the phrase data generated for each clause. Each clause and its corresponding phrase data are stored in the phrase database 26 together with the lyrics data.

To have the singing synthesis apparatus rap, the user operates the display input unit 50 to select lyrics data. When the user selects the lyrics data shown in FIG. 4(A), the read control unit 220 of the singing synthesis unit 200 reads each clause of those lyrics and the phrase data corresponding to each clause from the phrase database 26 and stores them in the work area in the RAM 40. It then displays on the display input unit 50 blocks that visually represent the clauses stored in the work area. In this example, as illustrated in FIG. 4(D), the display input unit 50 shows block B1 for the clause 「英治は」 ("Eiji"), block B2 for 「走った」 ("ran"), block B3 for 「そして」 ("then"), block B4 for 「英治は」 ("Eiji"), block B5 for 「全力で」 ("with all his might"), block B6 for 「飛んだ」 ("jumped"), and block B7 for 「ああっ」 ("Aah"). Preparation for responding to reproduction instructions from the user is now complete.

In this state, the user can have the singing synthesis apparatus perform a rap by tapping the block that represents a desired phrase (clause) displayed on the display input unit 50. Here, a tap is an operation of lightly striking a displayed block with a fingertip.

FIG. 4(E) shows an example of the phrase singing synthesis executed in response to such taps. In this example, blocks B1, B1, B5, B7, B7, and B6 are tapped at times T10, T11, T12, T13, T14, and T15, respectively.

When block B1, representing the clause "英治は", is tapped at time T10, the read control unit 220 identifies the phrase data of the clause "英治は" corresponding to the tapped block B1 and reads the phonetic symbols and the pitch data from the phonetic symbol track and the pitch track, respectively, of that phrase data in the work area. The read control unit 220 also reads the speech segment data corresponding to the read phonetic symbols from the speech segment database 28. It then sends the speech segment data and the pitch data to the pitch conversion unit 230.

The pitch conversion unit 230 applies pitch conversion to the speech segment data read by the read control unit 220, generating speech segment data that has the pitch indicated by the pitch data read by the read control unit 220. The segment concatenation unit 240 smoothly joins the speech segment data processed by the pitch conversion unit 230 and outputs time-series data representing the singing voice waveform. The volume control unit 250 controls the volume of the time-series data output from the segment concatenation unit 240 and outputs the result as time-series data representing the final synthesized singing sound. In this way, the synthesized singing sound of the clause "英治は" is emitted.

The same applies thereafter: when block B1 is tapped at time T11, the synthesized singing sound of the clause "英治は" is emitted; when block B5 is tapped at time T12, that of the clause "全力で" is emitted; and so on, until block B6 is tapped at time T15 and the synthesized singing sound of the clause "飛んだ" is emitted.

If a block is tapped and the next tap occurs while the phrase data corresponding to that block is still being emitted as synthesized singing sound, the sounding of the previous phrase data may be terminated at the moment of the next tap, and the synthesized singing sound based on the phrase data designated by the next tap may be sounded instead. For example, if block B1 for "英治は" is tapped and then tapped again when only "英" ("Ei") has been sounded, the output is "英英治は" ("Ei-Eiji wa"). This yields an effect similar to scratching (a disc jockey technique of repeatedly playing back the same portion) and makes it easy to rap along with a more rhythmic piece of music.
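
The interruption behavior described above can be sketched as follows. This is only an illustrative toy model, not the apparatus's actual implementation: the class `TapPlayer`, its `tap` and `tick` methods, and the idea of sounding one syllable per tick are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TapPlayer:
    """Toy model of tap-driven playback: one syllable sounds per tick."""
    phrases: dict                                  # block id -> list of syllables
    output: list = field(default_factory=list)     # syllables sounded so far
    _pending: list = field(default_factory=list)   # rest of the current phrase

    def tap(self, block_id: str) -> None:
        # A new tap discards whatever is left of the phrase still sounding
        # and starts the tapped phrase from its beginning.
        self._pending = list(self.phrases[block_id])

    def tick(self) -> None:
        # Sound the next syllable of the current phrase, if any remains.
        if self._pending:
            self.output.append(self._pending.pop(0))


player = TapPlayer({"B1": ["ei", "ji", "wa"]})
player.tap("B1")
player.tick()            # only "ei" has sounded
player.tap("B1")         # second tap arrives mid-phrase
for _ in range(3):
    player.tick()
print(player.output)     # ['ei', 'ei', 'ji', 'wa']
```

Restarting on every tap, rather than queueing taps, is what produces the scratch-like repetition in this sketch.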

As described above, according to this embodiment, when the user inputs the lyrics data of desired lyrics into the singing synthesis apparatus, the phrase data generation unit 100 of the apparatus generates phrase data from each clause of the lyrics indicated by the lyrics data. For a user who wishes to perform a rap, the singing synthesis apparatus then displays blocks representing the clauses of the lyrics on the display input unit 50. In this state, by tapping the block of a desired clause at a desired timing, the user can have the phrase data of that clause emitted as synthesized singing sound. The user can therefore perform a rap using the singing synthesis apparatus according to this embodiment.

＜Second Embodiment＞
Next, a singing synthesis apparatus according to a second embodiment of the present invention will be described. In this apparatus, the singing synthesis unit 200 of the singing synthesis program of the first embodiment is replaced with the singing synthesis unit 200A shown in FIG. 5. The singing synthesis unit 200A is configured by adding a pitch change control unit 320 to the singing synthesis unit 200 of the first embodiment.

The singing synthesis unit 200 of the first embodiment outputs the phrase data corresponding to a block as synthesized singing sound in response to a playback instruction given by tapping that block on the display input unit 50. In contrast, the singing synthesis unit 200A of this embodiment outputs the phrase data corresponding to a block as synthesized singing sound in response to a playback instruction given by flicking that block on the display input unit 50. Here, a flick is a motion of lightly sweeping a finger across a block displayed on the display input unit 50. When generating synthesized singing sound from the phrase data of the flicked block, the singing synthesis unit 200A gives the sound a pitch change in a direction that corresponds to the direction of the flick. The pitch change control unit 320 provided in the singing synthesis unit 200A is the means for applying this flick-direction-dependent pitch change to the synthesized singing sound.

In this embodiment, information indicating the position at which a finger touches the screen of the display input unit 50 is supplied to the read control unit 220 and the pitch change control unit 320. When a block is flicked, the read control unit 220 identifies the flicked block from the contact position information first produced by the flick and makes the phrase data corresponding to that block the playback target. Thus, in this embodiment, the phrase data to be played back is fixed at the moment the finger first touches a point inside a block. Once it is fixed, the playback target does not change, however the contact position moves, as long as the finger stays on the screen of the display input unit 50. The user can therefore flick without being constrained by the block's outline.
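
As a rough illustration of fixing the playback target at first contact, the following sketch hit-tests only the first point of a touch trace. The `Block` rectangle type, the pixel coordinates, and the function name `resolve_target` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Block:
    block_id: str
    x: float   # left edge in screen pixels
    y: float   # top edge in screen pixels
    w: float   # width
    h: float   # height

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


def resolve_target(blocks, touch_points) -> Optional[str]:
    """Return the block hit by the FIRST contact point; later points are ignored."""
    if not touch_points:
        return None
    px, py = touch_points[0]
    for block in blocks:
        if block.contains(px, py):
            return block.block_id
    return None


blocks = [Block("B1", 0, 0, 100, 40), Block("B2", 110, 0, 100, 40)]
# The finger starts inside B1, crosses B2, and leaves every block.
trace = [(50, 20), (150, 20), (300, -50)]
target = resolve_target(blocks, trace)   # "B1"
```

Because only `touch_points[0]` is examined, the rest of the trace is free to wander anywhere on the screen, which matches the statement that the flick is not constrained by the block's outline.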

Once the phrase data to be played back has been determined in this way, the read control unit 220 reads the phonetic symbols and the pitch data from the phonetic symbol track and the pitch track of that phrase data, and reads the speech segment data corresponding to the phonetic symbols from the speech segment database 28. The read control unit 220 then sends the speech segment data to the pitch conversion unit 230 and the pitch data to the pitch change control unit 320.

The pitch change control unit 320 determines the direction of the flick from the change in the contact position indicated by the contact position information, and generates a pitch change amount that varies with a time gradient corresponding to that direction. Specifically, when the flick is directed diagonally up and to the right, the pitch change control unit 320 generates a pitch change amount that varies with a positive time gradient. When the flick is directed diagonally down and to the right, it generates a pitch change amount that varies with a negative time gradient. When the flick is horizontal, it generates 0 as the pitch change amount. For flicks directed diagonally up or down to the right, the absolute value of the time gradient may be made larger as the angle between the flick direction and the horizontal increases. The pitch change control unit 320 then adds this pitch change amount to the pitch data supplied from the read control unit 220 and outputs the resulting pitch data to the pitch conversion unit 230. The functions of the pitch conversion unit 230, the segment concatenation unit 240, and the volume control unit 250 are as described in the first embodiment.
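
The mapping from flick direction to pitch change amount might be sketched as below. The patent does not specify units or a formula, so the linear angle-to-slope mapping, the semitones-per-second unit, the `max_slope` and `frame_sec` parameters, and both function names are assumptions made for illustration.

```python
import math


def pitch_slope(start, end, max_slope=12.0):
    """Map a flick from `start` to `end` (screen pixels) to a pitch slope.

    The unit (semitones per second) and `max_slope` are assumptions.
    """
    dx = abs(end[0] - start[0])
    dy = start[1] - end[1]          # screen y grows downward, so flip the sign
    angle = math.atan2(dy, dx)      # 0 for horizontal, +/- pi/2 for vertical
    return max_slope * angle / (math.pi / 2)


def add_pitch_change(pitch_track, slope, frame_sec=0.01):
    """Add a linearly growing pitch change amount to each pitch-track frame."""
    return [p + slope * i * frame_sec for i, p in enumerate(pitch_track)]


flat = pitch_slope((0, 100), (80, 100))      # horizontal flick: slope is 0
rising = pitch_slope((0, 100), (80, 60))     # up and to the right: positive
falling = pitch_slope((0, 100), (80, 140))   # down and to the right: negative
```

A steeper flick gives a larger absolute slope in this sketch, reflecting the optional behavior described above; a fixed positive or negative slope would equally satisfy the basic description.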

FIG. 6 is a time chart showing an example of the operation of this embodiment. In this example, blocks B1, B3, B1, and B7 are flicked at times T20, T21, T22, and T23, respectively. At times T20 and T22, block B1 is flicked horizontally. Accordingly, at times T20 and T22, the synthesized singing sound of the clause "英治は" corresponding to block B1 is sounded with exactly the pitch specified by the pitch track of the phrase data. At time T21, block B3 is flicked diagonally down and to the right. Accordingly, a pitch change amount with a negative time gradient is generated, and the clause "そして" corresponding to block B3 is sounded with a pitch that falls toward the end of the phrase. At time T23, block B7 is flicked diagonally up and to the right. Accordingly, a pitch change amount with a positive time gradient is generated, and the clause "ああっ" corresponding to block B7 is sounded with a pitch that rises toward the end of the phrase.

Thus, according to this embodiment, when a block displayed on the screen is flicked, the phrase corresponding to that block is sounded with a pitch change that follows the direction of the flick, taking the pitch of the phrase data as the reference. The pitch at the end of a phrase can therefore be raised or lowered, and the user can give a live performance in which intonation is added to the phrases by the simple operation of flicking.

When the pitch of a phrase is changed by a flick, the pitch may be changed from the beginning of the phrase to its end, or it may be changed from a pitch change start point partway through the phrase to its end. In the latter case, the portion from the beginning of the phrase to the pitch change start point is sounded at the pitch specified by the pitch track of the phrase data, and the portion from the pitch change start point to the end is sounded with the pitch change that follows the flick. In this case, the user may be allowed to specify the pitch change start point within the phrase by operating the display input unit 50.
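
A pitch change that begins partway through the phrase could look like the following minimal sketch. It assumes a frame-based pitch track and a linear pitch change; the function name `apply_from_start_point` and the parameters `start_index` and `frame_sec` are hypothetical.

```python
def apply_from_start_point(pitch_track, slope, start_index, frame_sec=0.01):
    """Leave frames before `start_index` untouched; ramp the pitch after it.

    `slope` is the pitch change per second (unit assumed), and each frame
    of `pitch_track` is assumed to last `frame_sec` seconds.
    """
    result = []
    for i, pitch in enumerate(pitch_track):
        elapsed = max(0, i - start_index) * frame_sec   # time since start point
        result.append(pitch + slope * elapsed)
    return result


track = [60.0] * 5
shaped = apply_from_start_point(track, slope=100.0, start_index=2)
# Frames 0 to 2 keep the pitch-track value; frames 3 and 4 rise.
```

Setting `start_index=0` reduces this to the first alternative, in which the pitch changes from the beginning of the phrase.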

The speed of the flick (fast or slow) may also be detected and used as a parameter of phrase sounding; for example, the flick speed may be mapped to the speed at which the phrase is played back. The distance of the flick may likewise be detected and used as a sounding parameter; for example, the phrase of the clause may be played to the end when a long flick is detected and only partway when a short flick is detected.
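
One way these two parameters might be derived is sketched below. The reference speed, the full-phrase distance, the clamping range, and the function name `playback_params` are all assumptions; the patent only states that speed and distance may serve as sounding parameters.

```python
def playback_params(distance_px, duration_sec,
                    full_distance_px=200.0, ref_speed_px_per_sec=400.0):
    """Derive (playback rate, fraction of phrase to play) from a flick.

    Rate 1.0 means normal speed; the thresholds are illustrative only.
    """
    speed = distance_px / duration_sec if duration_sec > 0 else 0.0
    rate = max(0.5, min(2.0, speed / ref_speed_px_per_sec))     # clamp the rate
    fraction = max(0.0, min(1.0, distance_px / full_distance_px))
    return rate, fraction


long_flick = playback_params(distance_px=200.0, duration_sec=0.5)
short_flick = playback_params(distance_px=50.0, duration_sec=0.5)
```

In this sketch a long flick plays the whole phrase at normal speed, while a short, slow flick plays only the first quarter at the minimum rate.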

In addition to simple flicks in the horizontal direction or diagonally up or down to the right, various other flick motions may be recognized and reflected in the pitch change. For example, moving the contact position of the finger horizontally to the right for the first part of the flick and then diagonally up and to the right may raise the pitch from partway through the phrase. Moving the contact position in a wave-like path may give the phrase a pitch change that rises and falls in small steps, like vibrato. In this case, the phrase may continue to sound for the duration of the flick (the time during which the finger remains in contact with the screen of the display input unit 50). For example, when the block corresponding to the clause "英治は" is flicked and the flick lasts a long time, the vowel "あ" ("a") of the final "は" ("wa") may be prolonged to extend the sounding of the phrase, and during that time the voice of "あ" may be given a complex pitch change that follows the complex flick motion.

＜Third Embodiment＞
FIG. 7 is a block diagram showing the configuration of a singing synthesis apparatus according to a third embodiment of the present invention. In this apparatus, the nonvolatile memory 20 stores a singing synthesis program 21B and a sequence database 27. The singing synthesis program 21B has the phrase data generation unit 100 of the first and second embodiments, the singing synthesis unit 200A of the second embodiment, a sequence data editing unit 300, and a sequence data playback unit 400.

The sequence data editing unit 300 is a program that edits sequence data in response to operations on the display input unit 50 and stores it in the sequence database 27. Sequence data is time-series data in which playback instructions for phrase data are mapped onto a time axis. In this embodiment, the playback instruction for one item of phrase data includes information identifying that phrase data (for example, the pair of lyrics ID and serial number FN of the first embodiment) and information instructing the generation of the pitch change amount described in the second embodiment. The sequence data playback unit 400 is a program that reads the sequence data designated by the user from the sequence database 27, plays it back, and, following each phrase data playback instruction in the sequence data, outputs the designated phrase data as synthesized singing sound.

The function of the sequence data editing unit 300 in this embodiment will now be described with reference to FIG. 8. As described later, the sequence data playback unit 400 plays back sequence data using the phrase data in the phrase database 26. For sequence data to be playable, therefore, the phrase data whose playback it instructs must be stored in the phrase database 26. The lyrics data area of the phrase database 26, meanwhile, stores the lyrics data from which the phrase data in the phrase data area was derived (see FIG. 2). The sequence data editing unit 300 therefore has the user perform an operation to select lyrics data containing the desired phrases (clauses). When the user selects, for example, the lyrics data of FIG. 4(A), the sequence data editing unit 300 reads the pairs of clauses and phrase data associated with that lyrics data from the phrase data area of the phrase database 26 and, as illustrated in FIG. 8, causes the display input unit 50 to display block B1 for the clause "英治は", block B2 for "走った", block B3 for "そして", block B4 for "英治は", block B5 for "全力で", block B6 for "飛んだ", and block B7 for "ああっ". The sequence data editing unit 300 also displays a sequence data editing area below the display area of these blocks. The sequence data editing area has time on the horizontal axis and pitch on the vertical axis.

With the sequence data editing area displayed, the user can edit sequence data. Specifically, when the user touches the block corresponding to a desired phrase and moves the contact position, a copy of that block follows the contact position. By moving the copy to a desired position on the time axis of the sequence data editing area, the user can tell the sequence data editing unit 300 when the playback instruction for the phrase data corresponding to that block is to be issued.

In this state, when the user flicks the right end of a block copy on the time axis from top to bottom, the copy tilts down to the right. In this way the user can add, to the playback instruction for the phrase data represented by that copy, an instruction to generate a pitch change amount that varies with a negative gradient. When the user flicks the left end of a block copy on the time axis from top to bottom, the copy tilts up to the right. In this way the user can add an instruction to generate a pitch change amount that varies with a positive gradient.

In the example shown in FIG. 8, the following are arranged on the time axis in order from the left: block CB1a, a copy of block B1; block CB1b, a copy of block B1; block CB5, a copy of block B5; block CB7a, a copy of block B7; block CB7b, a copy of block B7; and block CB6, a copy of block B6. Blocks CB1a, CB5, and CB7b have not been flicked since being placed on the time axis and remain horizontal. Accordingly, an instruction to generate a pitch change amount of 0 is added to the phrase data playback instructions corresponding to them. Blocks CB1b and CB6 have been tilted down to the right by flicks made after placement. Accordingly, an instruction to generate a pitch change amount that varies with a negative gradient is added to the corresponding playback instructions. Block CB7a has been tilted up to the right by a flick made after placement. Accordingly, an instruction to generate a pitch change amount that varies with a positive gradient is added to the corresponding playback instruction.

The sequence data editing unit 300 generates sequence data from the blocks laid out in the sequence data editing area in this way and stores it in the sequence database 27. That is, it determines the timing of the playback instruction for the phrase data of each block from the block's position along the time axis, determines the content of the pitch change amount generation instruction to be added for each block from the block's orientation, and arranges the resulting phrase data playback instructions in time order to produce the sequence data.
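
The conversion from block layout to sequence data can be sketched as follows. The record types `PlacedBlock` and `PlaybackInstruction`, the pixel-to-second and degree-to-slope scale factors, and the use of block labels as phrase identifiers (in place of the lyrics ID and serial number FN pair) are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlacedBlock:
    phrase_id: str     # stand-in for the (lyrics ID, serial number FN) pair
    x_px: float        # left edge of the block copy on the time axis
    tilt_deg: float    # positive = rising to the right, 0 = horizontal


@dataclass(frozen=True)
class PlaybackInstruction:
    time_sec: float    # when the phrase starts sounding
    phrase_id: str
    pitch_slope: float # gradient of the pitch change amount (unit assumed)


def build_sequence(blocks, px_per_sec=100.0, slope_per_deg=0.2):
    """Turn laid-out block copies into time-ordered playback instructions."""
    instructions = [
        PlaybackInstruction(
            time_sec=b.x_px / px_per_sec,              # position -> timing
            phrase_id=b.phrase_id,
            pitch_slope=b.tilt_deg * slope_per_deg,    # orientation -> gradient
        )
        for b in blocks
    ]
    return sorted(instructions, key=lambda ins: ins.time_sec)


layout = [
    PlacedBlock("B5", 250.0, 0.0),     # horizontal copy
    PlacedBlock("B1", 0.0, 0.0),       # horizontal copy
    PlacedBlock("B1", 120.0, -15.0),   # copy tilted down to the right
]
sequence = build_sequence(layout)
```

Sorting by time means the user may place block copies in any order; the playback side only ever sees a time-ordered list.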

The sequence data playback unit 400 reads the sequence data designated by operating the display input unit 50 from the sequence database 27, scans it from the beginning to extract the phrase data playback instructions, and supplies them to the read control unit 220 of the singing synthesis unit 200A (see FIG. 5). It also supplies the pitch change amount generation instruction contained in each playback instruction to the pitch change control unit 320.

In the second embodiment, the pitch change control unit 320 generates the pitch change amount from the flick direction. In contrast, during sequence data playback in this embodiment, the pitch change control unit 320 generates the pitch change amount according to the pitch change amount generation instruction extracted from the phrase data playback instruction, instead of the flick direction. The pitch change control unit 320 adds this pitch change amount to the pitch data supplied from the read control unit 220 and sends the resulting pitch data to the pitch conversion unit 230. The pitch conversion unit 230, the segment concatenation unit 240, and the volume control unit 250 operate as already described.

According to this embodiment, as described with reference to FIG. 8, simple operations of moving and rotating blocks suffice to generate sequence data that plays back desired phrases at desired timings with desired pitch changes, and a rap can be played back using this sequence data.

＜Other Embodiments＞
The first to third embodiments of the present invention have been described above, but other embodiments are also conceivable. Examples are as follows.

(1) In each of the above embodiments, the lyrics are divided into clauses and a phrase is generated from each clause. However, a single phrase may be generated from several clauses joined together through settings or edits made by the user. Although the lyrics are divided into clauses in the above embodiments, the way the lyrics are divided is not limited to this. For English lyrics, for example, the lyrics may be divided into words and a phrase generated from each word. In short, the lyrics may be divided into fragments according to any rule and a phrase generated from each fragment.

(2) In the third embodiment, a piano roll display consisting of a piano keyboard image 301 and a sequence data editing area 302 to its right, as shown in FIG. 9, may be adopted as the means for accepting input of sequence data. In the sequence data editing area 302, the horizontal axis represents time and the vertical axis represents pitch (the keys of the piano keyboard).

In the example shown in FIG. 9, block CB21 for the clause "英治は", block CB22 for the clause "全力で", and block CB23 for the clause "飛んだ" are placed in the sequence data editing area 302. In this mode, the sequence data editing unit generates sequence data in which phrase data playback instructions containing a pitch offset generation instruction, in addition to the pitch change amount generation instruction, are arranged in time order.

In the phrase data playback instruction corresponding to each block, the pitch offset designated by the pitch offset generation instruction is determined from the block's vertical position on the staff (or from the piano key located to the left of the block), and the time gradient of the pitch change amount designated by the pitch change amount generation instruction is determined from the block's tilt.

During sequence data playback, the pitch change control unit 320 of the singing synthesis unit 200A (see FIG. 5) generates a pitch offset according to the pitch offset generation instruction contained in the playback instruction extracted from the sequence data, and generates a pitch change amount according to the pitch change amount generation instruction. It then ignores the pitch track of the phrase data to be played back, adds the pitch offset and the pitch change amount, and outputs the sum as pitch data to the pitch conversion unit 230.
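
The piano roll variant, in which the phrase's own pitch track is ignored, might be sketched as below. Expressing pitch as MIDI-style note numbers, the frame length, and the function name `piano_roll_pitch` are assumptions; the note numbers in the example are hypothetical and are not the actual pitches of keys KY1 to KY3.

```python
def piano_roll_pitch(num_frames, pitch_offset, slope, frame_sec=0.01):
    """Build pitch data from the offset and the pitch change amount only.

    The phrase's own pitch track is deliberately not an input here.
    `pitch_offset` is a note number (assumed unit); `slope` is per second.
    """
    return [pitch_offset + slope * i * frame_sec for i in range(num_frames)]


level = piano_roll_pitch(3, pitch_offset=61.0, slope=0.0)       # horizontal block
falling = piano_roll_pitch(3, pitch_offset=63.0, slope=-100.0)  # tilted block
```

The only difference from the third embodiment's playback in this sketch is the base value: a constant offset taken from the block's vertical position replaces the per-frame pitch track.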

In this mode, when the phrase data playback instruction generated from block CB21 is read from the sequence data during playback, the phrase of the clause "英治は" is sounded at the pitch of black key KY1. When the playback instruction generated from block CB22 is read, the phrase of the clause "全力で" is sounded at a pitch that starts from the pitch of black key KY2 and falls toward the end of the phrase. When the playback instruction generated from block CB23 is read, the phrase of the clause "飛んだ" is sounded at the pitch of white key KY3. This mode allows the pitch of phrases to be varied widely in sequence playback and broadens the range of rap performance.

(4) In the second embodiment, the speed at which a block is flicked may be detected, and the volume at which the phrase is sounded may be controlled according to that speed.

(5) In the third embodiment, when sequence data is edited, the shape of a block laid out in the sequence data editing area may be changed according to an operation performed by the user, and the volume at which the phrase corresponding to the block is sounded may be varied to match the shape of the block.

FIG. 10 illustrates this volume control by block deformation. In this example, the sequence data editing unit 300 causes the display input unit 50 to display two volume change instruction buttons, SB1 and SB2.

Suppose that the user designates, for example, block CB30 representing the clause “英治は” (“Eiji wa”) in the sequence data editing area and then designates the volume change instruction button SB1. In this case, the sequence data editing unit 300 deforms block CB30 into block CB30a, which becomes thicker toward the right, generates volume control data that increases the volume toward the end of the phrase, and appends this data to the reproduction instruction for the phrase data.

Suppose instead that the user designates block CB30 in the sequence data editing area and then designates the volume change instruction button SB2. In this case, the sequence data editing unit 300 deforms block CB30 into block CB30b, which becomes thinner toward the right, generates volume control data that reduces the volume toward the end of the phrase, and appends this data to the reproduction instruction for the phrase data.

When the sequence data is reproduced, the volume of the phrase is controlled according to the volume control data appended in this way to the reproduction instruction for the phrase data. According to this aspect, a volume change can be given to a phrase in addition to a pitch change, enabling a powerful rap performance.
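
One possible form of the volume control data is a per-frame gain envelope, sketched below. The function name `volume_envelope`, the shape labels, the linear ramp, and the `low`/`high` gain values are assumptions made for illustration; the patent does not specify how the volume control data is encoded.

```python
def volume_envelope(shape, num_frames, low=0.25, high=1.0):
    """Return a per-frame gain list for a phrase of num_frames frames.

    shape: "widening"  (block thickens to the right, like CB30a) -> crescendo
           "narrowing" (block thins to the right, like CB30b)    -> decrescendo
           "flat"      (undeformed block)                        -> constant gain
    """
    if shape == "widening":
        start, end = low, high
    elif shape == "narrowing":
        start, end = high, low
    elif shape == "flat":
        start, end = high, high
    else:
        raise ValueError(f"unknown block shape: {shape!r}")
    if num_frames <= 1:
        return [start] * num_frames
    step = (end - start) / (num_frames - 1)   # linear ramp across the phrase
    return [start + step * i for i in range(num_frames)]
```

During reproduction, each synthesized frame would be multiplied by the corresponding gain.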

(6) An aspect combining the first and second embodiments is also conceivable. In this aspect, when a block displayed on the display input unit 50 is tapped, the singing synthesis unit generates a phrase using the phrase data corresponding to the tapped block, as in the first embodiment. When a block displayed on the display input unit 50 is flicked, the singing synthesis unit generates a phrase using the phrase data corresponding to the flicked block, as in the second embodiment, and gives the phrase a pitch change according to the flick direction. In this aspect, the singing synthesis unit may also generate, when a block is tapped, a phrase whose duration is determined by the phrase data corresponding to the block, and, when a block is flicked, a phrase that uses the phrase data corresponding to the block and continues for a time corresponding to the duration of the flick.

(7) In the first and second embodiments, the singing synthesis apparatus may be provided with a function that evaluates whether the timing at which the user taps or flicks a block is appropriate (for example, whether the taps or flicks are synchronized with a constant rhythm) and displays the evaluation result. Alternatively, the singing synthesis apparatus may be provided with a correction function that, when a tap or flick is performed at an inappropriate timing, treats it as having been performed at the appropriate timing immediately following.
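
The evaluation and correction functions could be sketched as follows. This assumes, purely for illustration, that the “appropriate” timings form a regular beat grid starting at time 0 and that a tap within a fixed tolerance of a beat counts as on time; `evaluate_and_correct`, `beat_interval`, and `tolerance` are hypothetical names.

```python
import math


def evaluate_and_correct(tap_time, beat_interval, tolerance=0.05):
    """Return (on_time, effective_time) for a tap or flick, in seconds.

    A tap within `tolerance` of the nearest beat is judged on time and
    is used as is. Otherwise it is treated as if it had been performed
    at the next beat after the tap.
    """
    nearest_beat = round(tap_time / beat_interval) * beat_interval
    if abs(tap_time - nearest_beat) <= tolerance:
        return True, tap_time
    next_beat = math.ceil(tap_time / beat_interval) * beat_interval
    return False, next_beat
```

For example, with a beat every 0.5 s, a tap at 1.2 s is judged off the beat and deferred to 1.5 s.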

(8) An aspect in which the technique of the third embodiment is introduced into the first embodiment is also conceivable. In this aspect, with blocks displayed on the display input unit 50 as in FIG. 4(D) described above, the user can perform an operation of tilting a desired block. When a block is tilted, the singing synthesis unit generates a time series of pitch change amounts that varies with a time gradient corresponding to the block's inclination, and adds this time series to the pitch data of the pitch track of the phrase data corresponding to the block.

When this block is tapped, a phrase with a pitch change corresponding to the block's inclination is generated. By tilting a block in this way, a phrase with a desired pitch change can be generated. In this aspect, it may also be possible to create a copy of a block and to tilt the copied block. When a tilted copy of a block is created in this way, a copy of the phrase data corresponding to the original block is created, and a pitch change amount corresponding to the block's inclination is added to the pitch track of the copied phrase data. When the original block is tapped, a phrase is synthesized using the phrase data corresponding to the original block; when the tilted copy is tapped, a phrase is synthesized using the copied phrase data. According to this aspect, phrases for the same clause can be synthesized with a variety of pitch changes, enabling a varied performance.
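
The copy-and-tilt behavior can be sketched as follows. The `PhraseData` class, its fields, and the per-frame gradient are hypothetical simplifications introduced for this example; actual phrase data would also carry phoneme and timing information.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class PhraseData:
    text: str                  # clause the phrase was generated from
    pitch_track: List[float]   # per-frame pitch (semitones, an assumed unit)


def make_tilted_copy(original, gradient_per_frame):
    """Return a copy of `original` whose pitch track has a linear ramp added.

    The original phrase data is left untouched, so the original block and
    the tilted copy can be tapped independently.
    """
    tilted_track = [
        pitch + gradient_per_frame * i
        for i, pitch in enumerate(original.pitch_track)
    ]
    return replace(original, pitch_track=tilted_track)
```

Unlike the sequence-reproduction case, the ramp here is added to the existing pitch track rather than replacing it, as the paragraph above describes.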

(9) The following is conceivable as another aspect in which the technique of the third embodiment is introduced into the first embodiment. First, as shown in FIG. 11, a piano keyboard image 301 and a block layout area 303 to its right are displayed on the display input unit 50. In the block layout area 303, the vertical axis represents pitch (or the keys of the piano keyboard), but there is no horizontal axis. The block layout area 303 extends horizontally so that a plurality of blocks can be arranged side by side. In this aspect, the user can place a block corresponding to a desired phrase in the block layout area 303, and can also tilt the block when doing so. In the example shown in FIG. 11, blocks CB21 to CB26 are arranged in the block layout area 303.

Suppose that, in this state, the user taps block CB25, which corresponds to the phrase for the clause “全力で” (“zenryoku de”). In this case, the singing synthesis unit generates a pitch offset corresponding to the white key KY3 to the left of block CB25. Because block CB25 is horizontal, the singing synthesis unit generates 0 as the pitch change amount. When synthesizing the singing sound of the phrase corresponding to block CB25, the singing synthesis unit ignores the pitch track of the phrase data corresponding to block CB25, adds the pitch offset corresponding to the white key KY3 and the pitch change amount of 0, and synthesizes a singing sound with the pitch given by this sum. Accordingly, in this case, the phrase “全力で” is sounded at the pitch of the white key KY3.

Suppose instead that the user taps block CB22, which also corresponds to the phrase for the clause “全力で” but slopes down to the right. In this case, the singing synthesis unit generates a pitch offset corresponding to the black key KY2 to the left of block CB22. Because block CB22 slopes down to the right, the singing synthesis unit generates a pitch change amount that varies with a negative time gradient. When synthesizing the singing sound of the phrase corresponding to block CB22, the singing synthesis unit adds the pitch offset corresponding to the black key KY2 and the pitch change amount varying with the negative time gradient, and synthesizes a singing sound with the pitch given by this sum. Accordingly, in this case, the phrase “全力で” is sounded starting at the pitch of the black key KY2, with the pitch falling toward the end of the phrase.
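
As a rough illustration of how a key position and a block inclination might be turned into an actual frequency, the sketch below assumes that each key is identified by a MIDI note number and that the tuning is equal temperament with A4 = 440 Hz. Neither assumption comes from the patent, and the function names are hypothetical.

```python
def key_to_hz(midi_note):
    """Convert a (possibly fractional) MIDI note number to a frequency in Hz."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12)


def phrase_pitch_hz(key_midi_note, semitones_per_second, t):
    """Pitch of a tapped block's phrase at time t seconds after onset.

    key_midi_note:        key to the left of the block (the pitch offset).
    semitones_per_second: slope derived from the block's inclination;
                          0 for a horizontal block, negative if the block
                          slopes down to the right.
    """
    return key_to_hz(key_midi_note + semitones_per_second * t)
```

A horizontal block keeps the phrase at the key's pitch for its whole duration, while a negative slope lowers the pitch steadily from that starting point.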

(10) In each of the above embodiments, the singing synthesis unit synthesizes time-series data representing a singing voice waveform from the phrase data when a phrase reproduction instruction is given. Instead of performing singing synthesis at reproduction time in this way, however, time-series data representing the singing voice waveform of each fragment may be generated when the lyrics are divided into fragments and stored as the phrase data of that fragment; when a reproduction instruction occurs, the target phrase data may then be output as sound from the sound system as is.

(11) In each of the above embodiments, when a block is rotated 180 degrees by a flick, the phrase corresponding to the block may be played in reverse.

(12) The third embodiment describes a singing synthesis apparatus having a phrase data generation function, a function of reproducing phrase data in response to a tap or flick, a sequence data editing function, and a sequence data reproduction function. However, a sequence data editing apparatus having only the sequence data editing function may be configured instead.

(13) The singing synthesis apparatus in each of the above embodiments stores the dictionary 25, the phrase database 26, the speech segment database 28, and the sequence database 27 in the non-volatile memory 20. However, a singing synthesis apparatus may be configured in which the dictionary 25, the phrase database 26, the speech segment database 28, and the sequence database 27 are stored not in storage means provided in the apparatus but in external storage means, and in which singing synthesis is performed by accessing these data via a network such as the Internet.

(14) In the second embodiment, the pitch at which the phrase corresponding to a block is uttered is changed in response to a flick of the block. However, an aspect of the phrase's utterance other than pitch may be changed. For example, the volume or playback speed of the phrase corresponding to the block may be changed in response to a flick of the block.

(15) In the third embodiment, when an operation of tilting a block is performed, a temporal change corresponding to the block's inclination is given to the pitch at which the phrase corresponding to the block is uttered. However, a temporal change corresponding to the block's inclination may be given to an aspect of the phrase's utterance other than pitch. For example, a temporal change corresponding to the block's inclination may be given to the volume or playback speed of the phrase corresponding to the block.

DESCRIPTION OF SYMBOLS: 10 ... CPU, 20 ... non-volatile memory, 21, 21B ... singing synthesis program, 100 ... phrase data generation unit, 200, 200A ... singing synthesis unit, 300 ... sequence data editing unit, 400 ... sequence data generation unit, 25 ... dictionary, 26 ... phrase database, 27 ... sequence database, 28 ... speech segment database, 30 ... ROM, 40 ... RAM, 50 ... display input unit, 60 ... sound system, 70 ... external interface group, 101 ... lyric analysis unit, 102 ... phonetic symbol generation unit, 103 ... phrase data synthesis unit, 220 ... read control unit, 230 ... pitch conversion unit, 240 ... segment connection unit, 250 ... volume control unit, 320 ... pitch change control unit, 301 ... piano keyboard image, 302 ... sequence data editing area, 303 ... block layout area.

Claims

A phrase synthesizer comprising:
phrase data generating means for dividing lyrics indicated by lyric data into a plurality of fragments, generating from the fragments a plurality of phrase data each indicating the utterance mode of a phrase of a synthesized singing sound, and writing the phrase data into storage means;
display means;
pointing position detecting means; and
phrase reproducing means for reading, in response to a reproduction instruction for desired phrase data, the phrase data from the storage means and generating a synthesized singing sound using the read phrase data, wherein the phrase reproducing means displays on the display means a plurality of blocks each corresponding to one of the plurality of phrase data, generates a synthesized singing sound using the phrase data corresponding to a block when the pointing position detecting means detects that the block displayed on the display means has been designated, and, when an operation of tilting a block displayed on the display means has been performed, gives the synthesized singing sound a change corresponding to the inclination of the block when generating the synthesized singing sound in response to a reproduction instruction for the phrase data corresponding to the block.

A singing voice synthesizing apparatus comprising:
phrase data generating means for dividing lyrics indicated by lyric data into a plurality of fragments, generating from the fragments a plurality of phrase data each indicating the utterance mode of a phrase of a synthesized singing sound, and writing the phrase data into storage means;
display means;
pointing position detecting means; and
phrase reproducing means for reading, in response to a reproduction instruction for desired phrase data, the phrase data from the storage means and generating a synthesized singing sound using the read phrase data, wherein the phrase reproducing means displays on the display means a block layout area having a pitch axis, displays on the display means blocks each representing one of the phrase data obtained from the lyric data, lays out in the block layout area, in accordance with an operation performed by the user, a block corresponding to desired phrase data among these blocks, and, when the pointing position detecting means detects that a block in the block layout area has been designated, generates a synthesized singing sound using the phrase data corresponding to the block and controls the pitch of the synthesized singing sound on the basis of the position of the block in the pitch axis direction.

A method for synthesizing a song, comprising:
a phrase data generation process of dividing lyrics indicated by lyric data into a plurality of fragments, generating from the fragments a plurality of phrase data each indicating the utterance mode of a phrase of a synthesized singing sound, and writing the phrase data into storage means; and
a phrase reproduction process of reading, in response to a reproduction instruction for desired phrase data, the phrase data from the storage means and generating a synthesized singing sound using the read phrase data, wherein the phrase reproduction process displays on display means a plurality of blocks each corresponding to one of the plurality of phrase data, generates a synthesized singing sound using the phrase data corresponding to a block when pointing position detecting means detects that the block displayed on the display means has been designated, and, when an operation of tilting a block displayed on the display means has been performed, gives the synthesized singing sound a change corresponding to the inclination of the block when generating the synthesized singing sound in response to a reproduction instruction for the phrase data corresponding to the block.