JP2018156417A - Input device and voice synthesis device

Info

Publication number: JP2018156417A
Application number: JP2017052950A
Authority: JP
Inventors: Ushio Okabe; Ryosuke Ishiura; Kohei Otake; Yuma Takeuchi; Toshifumi Yagi
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2017-03-17
Filing date: 2017-03-17
Publication date: 2018-10-04
Anticipated expiration: 2037-03-17
Also published as: JP6888351B2

Abstract

PROBLEM TO BE SOLVED: To provide an input device that allows vowels and consonants to be input to a speech synthesizer by a simple operation. SOLUTION: An input device 10 designates one of the vowel and the consonant of the lyrics of a singing voice synthesized by a singing synthesis control device 20 according to an operation on an operator (S11, S12). The input device 10 designates the other of the vowel and the consonant according to the movement of the input device 10 (S13, S14). The input device 10 transmits the vowel and the consonant to the singing synthesis control device 20 (S15). The singing synthesis control device 20 generates and outputs a synthesized voice having the vowel and consonant received from the input device 10 and a pitch determined according to an operation on an operator of the singing synthesis control device 20 itself (S21 to S26). SELECTED DRAWING: Figure 5

Description

The present invention relates to a technique for synthesizing a singing voice in real time in response to user operations.

Techniques are known for synthesizing and reproducing a singing voice in real time in response to a performance and lyrics input by a user. For example, Non-Patent Document 1 describes a singing voice synthesizer that has keys for inputting vowels and keys for inputting a performance.

Non-Patent Document 1: "Singing Keyboard Pocket Miku", [online], April 3, 2014, [retrieved March 6, 2017], Internet <URL: http://otonanokagaku.net/nsx39/>

The technique described in Non-Patent Document 1 has the problem that only vowels can be input as lyrics, so the synthesized singing voice is monotonous.
In view of this, an object of the present invention is to provide an input device that allows vowels and consonants to be input to a speech synthesizer by a simple operation.

The present invention provides an input device having: a first designation unit that designates one of the vowel and the consonant of the lyrics of a singing voice synthesized by a singing synthesis control device, according to an operation on an operator; a second designation unit that designates the other of the vowel and the consonant according to the movement of the input device itself; and a transmission unit that transmits the designated vowel and consonant to the singing synthesis control device.

The input device may have a gripped portion with a contact surface that touches the user's fingers in the state of use, and the operator may be provided on the contact surface of the gripped portion.

The second designation unit may designate the other of the vowel and the consonant according to the direction in which the input device is moved.

The present invention also provides a speech synthesizer comprising an input device having any of the above configurations and a singing synthesis control device, wherein the singing synthesis control device includes: a reception unit that receives the designated vowel and consonant from the input device; one or more operators; an operation detection unit that detects an operation on the one or more operators; a determination unit that determines a pitch according to the operator on which the operation detection unit detected the operation; and a speech synthesis unit that generates synthesized speech having the vowel and consonant received by the reception unit and the pitch determined by the determination unit.

According to the present invention, an input device can be provided that allows vowels and consonants to be input to a speech synthesizer by a simple operation.

FIG. 1: A diagram illustrating the schematic configuration of a speech synthesizer according to an embodiment of the present invention.
FIG. 2: A diagram illustrating the configuration of the gripped portion 11.
FIG. 3: A diagram illustrating the relationship between the movement of the input device 10 and the designated consonant.
FIG. 4: A diagram illustrating the functional configuration of the input device 10 and the singing synthesis control device 20.
FIG. 5: A flowchart showing the operation of the input device 10 and the singing synthesis control device 20.
FIG. 6: A diagram illustrating the structure of the gripped portion 11 according to a modification.
FIG. 7: A diagram illustrating the relationship between the movement of the gripped portion 11 according to the modification and the designated consonant.
FIG. 8: A diagram illustrating the structure of the gripped portion 11 according to another modification.
FIG. 9: A diagram showing the relationship between the movement of the gripped portion 11 according to the other modification and the designated consonant.
FIG. 10: A flowchart showing the operation of the input device and the singing synthesis control device according to a modification.

1. Configuration
FIG. 1 is a diagram illustrating the schematic configuration of a speech synthesizer 1 according to an embodiment of the present invention. The speech synthesizer 1 synthesizes a singing voice in real time and includes an input device 10 and a singing synthesis control device 20. Synthesizing a singing voice requires at least lyric and pitch information. In this example, lyrics are input on the input device 10 and pitches are input on the singing synthesis control device 20. To convey the lyrics input on the input device 10, the input device 10 and the singing synthesis control device 20 are connected by a cable 30 for transmitting and receiving information. The two devices may, however, be connected wirelessly rather than by wire.

The singing synthesis control device 20 is a device that performs singing synthesis. In this example, the singing synthesis control device 20 has an external appearance modeled on a keyboard instrument such as an electronic piano. The singing synthesis control device 20 has an operation unit 21 on its front surface. The operation unit 21 has a plurality of operators 211 modeled on keys. The singing synthesis control device 20 controls the synthesis of the singing voice based on the lyrics input from the input device 10 and the pitch determined according to the operation of pressing one of the operators 211.

The input device 10 is a device for inputting lyrics. Lyrics are composed of combinations of vowels and consonants. The input device 10 is rod-shaped and includes a gripped portion 11 and a light emitting unit 12. The gripped portion 11 is the part held by the user. The light emitting unit 12 is the part that emits light. The input device 10 thus also functions as a lighting device, such as a chemical light (glow stick). Known techniques are used for the light emitting unit 12 and its control.

FIG. 2 is a diagram illustrating the configuration of the gripped portion 11. The gripped portion 11 has a contact surface 11A that touches the user's fingers in the state of use. A plurality of switches 111 to 116 are provided on the contact surface 11A. The switches provided on the contact surface 11A are, for example, momentary push switches. With such a switch, ON is input while the switch is pressed and OFF is input while it is not pressed. The switches need not be push switches as long as they allow on/off input.

In this embodiment, of the vowels and consonants that make up the lyrics, the vowel is designated by operating the switches 111 to 114. For example, while only the switch 111 is pressed, [a] (あ) is designated. While only the switch 112 is pressed, [i] (い) is designated. While only the switch 113 is pressed, [u] (う) is designated. While only the switches 111 and 112 are pressed, [e] (え) is designated. While only the switches 111 and 113 are pressed, [o] (お) is designated.

While the switch 114 is pressed, use of a palatalized sound (yōon, formed with a semivowel) is designated. For example, the palatalized form of [ka] (か) is [kja] (きゃ). Therefore, while the switch 114 is pressed, insertion of the semivowel [j] immediately before the vowel [a] is designated in order to express the palatalized sound.
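The switch-to-vowel assignment described above can be sketched as a lookup table. This is only an illustrative sketch: the function name `designate_vowel`, the tuple return value, and the choice to return `None` for combinations the text does not define (for example switches 112 and 113 together) are assumptions, not part of the described device.

```python
# Switch combinations taken from the description; everything else is hypothetical.
VOWEL_BY_SWITCHES = {
    frozenset({111}): "a",
    frozenset({112}): "i",
    frozenset({113}): "u",
    frozenset({111, 112}): "e",
    frozenset({111, 113}): "o",
}


def designate_vowel(pressed):
    """Return (semivowel, vowel) for the pressed switch numbers, or None.

    `pressed` is any iterable of switch numbers that are currently ON.
    Returning None for undefined combinations is an assumption.
    """
    pressed = set(pressed)
    vowel = VOWEL_BY_SWITCHES.get(frozenset(pressed & {111, 112, 113}))
    if vowel is None:
        return None
    # Switch 114 requests the semivowel [j] in front of the vowel.
    semivowel = "j" if 114 in pressed else ""
    return semivowel, vowel
```

For instance, pressing switches 111 and 114 together would yield `("j", "a")`, which combines with a consonant such as [k] to give [kja].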

Of the vowels and consonants of the lyrics, the consonant is designated by operating the switches 115 and 116 and by the movement of the input device 10. In this example, the "movement" of the input device 10 is the change in position (that is, the displacement) of the input device 10 caused by swinging it.

In this embodiment, an unvoiced consonant (seion) is designated by the movement of the input device 10, whether the voiced form (dakuon) is used is designated by operating the switch 115, and whether the semi-voiced form (handakuon) is used is designated by operating the switch 116. For example, when [k] (か行, the ka row) is designated as the consonant and use of the voiced form is designated, [g] (が行, the ga row) is designated. When [h] (は行, the ha row) is designated as the consonant and use of the semi-voiced form is designated, [p] (ぱ行, the pa row) is designated.
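A minimal sketch of how the switches 115 and 116 could modify the consonant chosen by movement. The text gives only [k] to [g] and [h] to [p] as examples; the remaining voiced pairs ([s] to [z], [t] to [d], [h] to [b]) follow the standard Japanese dakuon correspondences and are filled in here as an inference. The function name and the priority given to switch 116 when both switches are pressed are assumptions.

```python
# Voiced (dakuon) and semi-voiced (handakuon) counterparts of unvoiced rows.
VOICED = {"k": "g", "s": "z", "t": "d", "h": "b"}
SEMI_VOICED = {"h": "p"}


def apply_voicing(consonant, sw115_on=False, sw116_on=False):
    """Return the consonant after applying switch 115 (voiced) / 116 (semi-voiced).

    Consonants without a voiced or semi-voiced counterpart are returned
    unchanged. Checking switch 116 first is an assumption; the text does
    not say what happens when both switches are ON.
    """
    if sw116_on and consonant in SEMI_VOICED:
        return SEMI_VOICED[consonant]
    if sw115_on and consonant in VOICED:
        return VOICED[consonant]
    return consonant
```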

FIG. 3 is a diagram illustrating the relationship between the movement of the input device 10 and the designated consonant. Here, the central axis extending from the bottom to the top of the input device 10 is defined as "L". While the input device 10 is swung in the axial direction of the central axis L, [k] is designated; while it is swung in the opposite direction, [h] is designated. While the input device 10 is swung in the direction rotated 45 degrees clockwise from the central axis L, [s] (さ行, the sa row) is designated; while it is swung in the opposite direction, [m] (ま行, the ma row) is designated. While the input device 10 is swung in the direction rotated 90 degrees clockwise from the central axis L, [t] (た行, the ta row) is designated; while it is swung in the opposite direction, [y] (や行, the ya row) is designated. While the input device 10 is swung in the direction rotated 135 degrees clockwise from the central axis L, [n] (な行, the na row) is designated; while it is swung in the opposite direction, [r] (ら行, the ra row) is designated. When the input device 10 is not being swung in any direction, [a] (あ行, the a row) is designated. When the input device 10 is swung in a direction other than those indicated by the arrows in FIG. 3, the consonant corresponding to the closest direction is designated.
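The "closest direction" rule can be sketched as quantizing the swing direction into eight 45-degree sectors. In this sketch the swing is given as a displacement vector `(dx, dy)` in the plane of FIG. 3, with `+dy` along the central axis L and `+dx` a quarter turn clockwise from it. The function name, the coordinate convention, the `min_magnitude` threshold for "not swung", and representing the a row as an empty consonant string are all assumptions made for illustration.

```python
import math

# Consonants at 0, 45, 90, ... 315 degrees clockwise from the central axis L.
CONSONANT_BY_SECTOR = ["k", "s", "t", "n", "h", "m", "y", "r"]


def designate_consonant(dx, dy, min_magnitude=0.1):
    """Map a swing displacement to the consonant of the nearest direction.

    Returns "" (the a row, i.e. no consonant) when the displacement is
    smaller than `min_magnitude`, a hypothetical "not swung" threshold.
    """
    if math.hypot(dx, dy) < min_magnitude:
        return ""
    # atan2(dx, dy) measures the angle clockwise from the +dy axis (axis L).
    angle = math.degrees(math.atan2(dx, dy)) % 360.0
    sector = int(round(angle / 45.0)) % 8
    return CONSONANT_BY_SECTOR[sector]
```

A swing slightly off the axis, such as `(0.1, 1.0)`, still falls in the sector around 0 degrees and is treated as [k].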

The relationship between the movement of the input device 10 and the designated consonant is not limited to the example of FIG. 3. In the example of FIG. 3, when the input device 10 is held upright, perpendicular to the ground, consonants are defined according to the movement of the input device 10 in a plane substantially perpendicular to the ground. However, the central axis L in the example of FIG. 3 may instead be set sideways on the input device 10 (specifically, for example, in the direction perpendicular to the surface of the gripped portion 11 on which the switches are provided). In that case, consonants are defined according to the movement of the input device 10 in a plane substantially parallel to the ground.

The sounds [wa] (わ), [wo] (を), and [n] (ん) are designated, for example, by turning on the switch 115 without moving the input device 10. To express the small kana [ゃ], [ゅ], and [ょ], a separate switch for designating them may be provided.

FIG. 4 is a diagram illustrating the functional configuration of the input device 10 and the singing synthesis control device 20. The input device 10 includes an operation detection unit 101, a first designation unit 102, a motion detection unit 103, a second designation unit 104, and a transmission unit 105. The operation detection unit 101 detects the operation state of the switches 111 to 116 based on the signals input from each of them. The first designation unit 102 designates the vowel, of the vowels and consonants of the lyrics, according to the operation state of the switches 111 to 114 detected by the operation detection unit 101. The motion detection unit 103 detects the movement of the input device 10. In this embodiment, the motion detection unit 103 detects at least the direction in which the input device 10 was moved (swung), based on information from a sensor (not shown). The sensor includes, for example, a two-axis or three-axis acceleration sensor. The motion detection unit 103 detects the movement of the input device 10 based on, for example, the acceleration measured by the acceleration sensor, the velocity obtained from the acceleration, and the magnitude of the displacement. The motion detection unit 103 may detect the movement of the input device 10 using a sensor other than an acceleration sensor. The second designation unit 104 designates the consonant, of the vowels and consonants of the lyrics, according to the movement of the input device 10 detected by the motion detection unit 103 and the operation state of the switches 115 and 116 detected by the operation detection unit 101. The transmission unit 105 transmits the vowel designated by the first designation unit 102 and the consonant designated by the second designation unit 104 to the singing synthesis control device 20.

The functions of the input device 10 are implemented by a processor equipped with an arithmetic processing unit such as a CPU (Central Processing Unit), memory such as ROM (Read Only Memory) and RAM (Random Access Memory), a communication module, and the like. Each function of the input device 10 is implemented, for example, by the processor and a program that the processor executes. The functions of the input device 10 may also be implemented by two or more processors or programs.

The singing synthesis control device 20 includes a reception unit 201, an operation detection unit 202, a determination unit 203, a synthesis instruction unit 204, a speech synthesis unit 205, and an audio output unit 206. The reception unit 201 receives the vowel and consonant of the lyrics from the input device 10 (transmission unit 105). The operation detection unit 202 detects the operation state of each operator 211 of the operation unit 21 based on the signal input from that operator. Based on the detection result of the operation detection unit 202, the determination unit 203 determines the pitch corresponding to the operator 211 pressed by the user. The synthesis instruction unit 204 instructs the speech synthesis unit 205 to synthesize a singing voice based on the consonant and vowel received by the reception unit 201 and the pitch determined by the determination unit 203. The speech synthesis unit 205 synthesizes the singing voice according to the synthesis instruction from the synthesis instruction unit 204 and thereby generates a singing voice (synthesized speech). The speech synthesis unit 205 outputs a sound signal representing the synthesized singing voice to the audio output unit 206. The audio output unit 206 outputs sound according to the sound signal output from the speech synthesis unit 205.

The functions of the reception unit 201, the operation detection unit 202, the determination unit 203, the synthesis instruction unit 204, and the speech synthesis unit 205 are implemented by a processor equipped with an arithmetic processing unit such as a CPU, memory such as ROM and RAM, a communication module, and the like. Each function of the singing synthesis control device 20 is implemented, for example, by the processor and a program that the processor executes. The functions of the singing synthesis control device 20 may also be implemented by two or more processors or programs. The audio output unit 206 includes, for example, a signal processing circuit, an amplifier, and a speaker.

2. Operation
FIG. 5 is a flowchart showing the operation of the input device 10 and the singing synthesis control device 20. The flow of FIG. 5 is executed, for example, while the input device 10 and the singing synthesis control device 20 are powered on.

In the input device 10, the first designation unit 102 determines, based on the detection result of the operation detection unit 101, whether at least one of the switches 111 to 113 has been pressed (step S11). If it determines that none of these switches is pressed (step S11; NO), the first designation unit 102 waits. If it determines that at least one of the switches 111 to 113 has been pressed (step S11; YES), the first designation unit 102 designates a vowel (step S12). The first designation unit 102 designates one of the vowels [a], [i], [u], [e], and [o] according to the operation state of the switches 111 to 113, and designates the semivowel for expressing a palatalized sound according to the operation state of the switch 114.

Next, the motion detection unit 103 detects the movement of the input device 10 (step S13). The second designation unit 104 designates a consonant according to the direction in which the input device 10 was moved, as detected by the motion detection unit 103, and the operation state of the switches 115 and 116, as detected by the operation detection unit 101 (step S14). The second designation unit 104 designates one of the consonants [a], [k], [s], [t], [n], [h], [m], [g], [z] (ざ行, the za row), [d] (だ行, the da row), [b] (ば行, the ba row), and [p] (ぱ行, the pa row).

Next, the transmission unit 105 transmits the designated vowel and consonant to the singing synthesis control device 20 (step S15). After this transmission, the processing of the input device 10 returns to step S11. That is, as long as at least one of the switches 111 to 113 is pressed, the transmission unit 105 keeps transmitting the vowel and consonant to the singing synthesis control device 20.
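One pass through steps S11 to S15 on the input device side might look like the following sketch. The function `run_input_step` and its callback parameters (`read_motion`, `designate_vowel`, `designate_consonant`, `send`) are hypothetical stand-ins for the motion detection unit 103, the designation units 102 and 104, and the transmission unit 105; the text does not specify any such interface.

```python
def run_input_step(pressed, read_motion, designate_vowel, designate_consonant, send):
    """Run steps S11 to S15 once; return True if a vowel/consonant pair was sent.

    pressed: set of switch numbers currently ON (operation detection unit 101).
    The four callbacks are hypothetical stand-ins for units 103, 102, 104, 105.
    """
    pressed = set(pressed)
    # S11: wait unless at least one of the vowel switches 111 to 113 is pressed.
    if not pressed & {111, 112, 113}:
        return False
    vowel = designate_vowel(pressed)                  # S12
    motion = read_motion()                            # S13
    consonant = designate_consonant(motion, pressed)  # S14
    send((consonant, vowel))                          # S15
    return True
```

Calling this function repeatedly while the device is powered on reproduces the loop back to step S11 after each transmission.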

In the singing synthesis control device 20, the reception unit 201 determines whether a vowel and consonant have been received from the input device 10 (step S21). If it determines that no vowel and consonant have been received (step S21; NO), the reception unit 201 waits. If it is determined that a vowel and consonant have been received (step S21; YES), the determination unit 203 determines, based on the detection result of the operation detection unit 202, whether at least one of the operators 211 has been pressed (step S22). If it determines that none of the operators 211 is pressed (step S22; NO), the processing of the singing synthesis control device 20 returns to step S21.

If the determination unit 203 determines that at least one of the operators 211 has been pressed (step S22; YES), it determines the pitch corresponding to the pressed operator 211 (step S23). The determination unit 203 determines the pitch that is unique to this operator 211. The operators 211 are modeled on keys, so the determination unit 203 should determine a higher pitch the higher the pitch of the key to which the pressed operator 211 corresponds.
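The text only requires that each operator 211 has its own pitch and that higher keys give higher pitches. One common way to satisfy this, shown here purely as an assumption, is to number the keys from left to right and use twelve-tone equal temperament with MIDI-style note numbers (note 69 = A4 = 440 Hz). The function name, the `lowest_note` default, and the tuning are not taken from the document.

```python
def key_to_frequency(key_index, lowest_note=48):
    """Return the frequency in Hz for the key at `key_index` (0 = leftmost key).

    Assumes equal temperament and that the leftmost key corresponds to the
    MIDI-style note number `lowest_note`; both are illustrative choices.
    """
    note = lowest_note + key_index
    return 440.0 * 2.0 ** ((note - 69) / 12.0)
```

With these assumptions the mapping is strictly increasing in the key index, which matches the requirement stated above.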

The synthesis instruction unit 204 instructs the speech synthesis unit 205 to synthesize a singing voice based on the received consonant and vowel and the determined pitch (step S24). Specifically, the synthesis instruction unit 204 converts the lyric determined by the consonant and vowel into phonetic symbols, generates an instruction to synthesize speech with these phonetic symbols at the determined pitch, and outputs the instruction to the speech synthesis unit 205. The speech synthesis unit 205 synthesizes the singing voice according to the input synthesis instruction (step S25). Known techniques can be used for singing voice synthesis, so only an outline is given here. The speech synthesis unit 205 has a segment library. The segment library is a database containing musical segments (fragments of singing voice) sampled from the voice of a particular singer. The segment library contains multiple pieces of segment data taken from that singer's singing voice waveform. Segment data is audio data obtained by cutting out and encoding the phonetically characteristic portions of a singing voice waveform.

Segment data is explained here using the example of synthesizing a singing voice for the lyric [さいた] ("saita"). The lyric [さいた] is written in phonetic symbols as [saita]. Analyzing the waveform of the speech represented by the phonetic symbols [saita] by its features gives the rising portion of the [s] sound, the [s] sound, the transition from the [s] sound to the [a] sound, the [a] sound, and so on, ending with the decay portion of the [a] sound. Each piece of segment data is audio data corresponding to one of these phonetically characteristic portions. The segment library stores segment data for every sound and every combination of sounds. In the following description, segment data corresponding to the rising portion of the sound represented by a phonetic symbol is written with [#] in front of the symbol, as in [#s]. Segment data corresponding to the decay portion of the sound represented by a phonetic symbol is written with [#] after the symbol, as in [a#]. Segment data corresponding to the transition from the sound represented by one phonetic symbol to the sound represented by another is written with [-] between the symbols, as in [s-a].

For example, the speech [ぱ] (pa) is synthesized by arranging and concatenating the segment data [#p], [p], [p-a], and [a] in that order. After combining these pieces of segment data, the speech synthesis unit 205 adjusts the pitch. The speech synthesis unit 205 outputs the sound signal of the pitch-adjusted synthesized speech. The audio output unit 206 outputs the synthesized speech according to the sound signal output from the speech synthesis unit 205 (step S26).
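The segment naming convention ([#x] for a rise, [x-y] for a transition, [x#] for a decay) can be sketched as a function that lists the segment names for a phoneme sequence. This only builds the list of names to look up in the segment library; the function name and the `release` flag are assumptions, and no audio processing is shown.

```python
def segment_sequence(phonemes, release=False):
    """Return the ordered segment names for a list of phonetic symbols.

    With release=False the sequence ends on the last sound so that it can be
    sustained, as in the [pa] example; release=True appends its decay segment.
    """
    if not phonemes:
        return []
    sequence = ["#" + phonemes[0]]  # rising portion of the first sound
    for i, phoneme in enumerate(phonemes):
        sequence.append(phoneme)    # the sound itself
        if i + 1 < len(phonemes):
            # transition to the next sound
            sequence.append(phoneme + "-" + phonemes[i + 1])
    if release:
        sequence.append(phonemes[-1] + "#")  # decay portion of the last sound
    return sequence
```

For [pa] this gives `["#p", "p", "p-a", "a"]`, matching the order stated in the text; for [saita] with `release=True` the sequence starts with `#s` and ends with `a#`.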

Next, the synthesis instruction unit 204 determines whether the vowel or consonant received from the input device 10 has changed (step S27). Specifically, the synthesis instruction unit 204 determines whether at least one of the vowel and the consonant has changed, and whether the vowel and consonant are no longer being received. If it determines that the vowel and consonant have not changed (step S27; NO), the synthesis instruction unit 204 determines whether the pitch has changed (step S28). Specifically, the synthesis instruction unit 204 determines whether the operator 211 is no longer pressed (the finger has been released from the operator 211) and whether another operator 211 has been pressed. If it is determined that the pitch has not changed (step S28; NO), the synthesis instruction unit 204 does not instruct synthesis of a new singing voice. Specifically, the processing of the singing synthesis control device 20 returns to step S25, and the speech synthesis unit 205 continues to output, through the audio output unit 206, the synthesized speech of the same lyric (character) (steps S25, S26). The speech synthesis unit 205 outputs a sound signal that keeps sustaining the final vowel ([a] in the earlier example).

On the other hand, if the synthesis instruction unit 204 determines that the vowel or consonant received from the input device 10 has changed (step S27; YES), or that the pitch has changed (step S28; YES), the processing of the singing synthesis control device 20 returns to step S21.
Then, when a vowel and consonant are received from the input device 10 (step S21; YES) and a pitch is designated by operating an operator 211 (step S22; YES), the synthesis instruction unit 204 instructs the speech synthesis unit 205 to synthesize a new singing voice, so that the singing voice is synthesized and the synthesized speech is output (steps S23 to S26).
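The branching in steps S21, S22, S27, and S28 can be summarized as a small decision function. This is a sketch under assumptions: the function name `next_action`, the action labels, and representing "nothing received" or "no key pressed" as `None` are illustrative choices, not part of the described device.

```python
def next_action(previous, lyric, pitch):
    """Decide what the singing synthesis control device does next.

    previous: the (lyric, pitch) pair currently sounding, or None.
    lyric:    the (consonant, vowel) pair just received, or None if none (S21; NO).
    pitch:    the pitch of the pressed operator, or None if none (S22; NO).
    Returns "wait", "sustain", or "synthesize".
    """
    if lyric is None:                 # S21; NO: nothing received
        return "wait"
    if pitch is None:                 # S22; NO: no operator pressed
        return "wait"
    if previous == (lyric, pitch):    # S27; NO and S28; NO: nothing changed
        return "sustain"              # keep extending the final vowel (S25, S26)
    return "synthesize"               # S23 to S26 with the new lyric or pitch
```

In this reading, a change of either the lyric or the pitch leads back through step S21 and ends in a fresh synthesis, while an unchanged pair simply keeps the current vowel sounding.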

With the speech synthesizer 1 described above, the user holds the input device 10 in one hand and can designate the vowels and consonants of the lyrics by pressing the switches and moving the input device 10. In addition, the user can designate the pitch of the lyrics by operating the singing synthesis control device 20 with the other hand. The user can therefore easily designate the vowels, consonants, and pitches of the lyrics and cause the singing synthesis control device 20 to output synthesized speech.

3. Modifications
The present invention is not limited to the embodiment described above, and various modifications are possible. Several modifications are described below. Two or more of the following modifications may be used in combination.

3-1. Gripped portion 11
FIG. 6 is a diagram illustrating the structure of the gripped portion 11 according to a modification, and FIG. 7 is a diagram illustrating the relationship between the movement of the gripped portion 11 and the designated consonant. The type and number of switches provided on the contact surface 11A of the gripped portion 11 are not limited to the example of FIG. 2. In this modification, as shown in FIG. 6, the contact surface 11A has neither the switch 115 for designating voiced sounds (dakuon) nor the switch 116 for designating semi-voiced sounds (handakuon); instead, a switch 117 for switching modes is provided. While the switch 117 is off, the first designation unit 102 allows [a], [k], [s], [t], [n], [h], [m], [y], and [r] to be designated as consonants, as shown on the left side of FIG. 7. While the switch 117 is on, the first designation unit 102 allows [y], [w], [g], [z], [d], [b], and [p] to be designated as consonants, as shown on the right side of FIG. 7.
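The mode switching described above amounts to choosing between two lookup tables. The sketch below illustrates that idea only. The two consonant sets are taken from the text, but FIG. 7 is not reproduced here, so the way a movement is turned into a table index (`direction_index`) is a hypothetical stand-in, and `consonant_for_mode` is a name introduced for this example.

```python
from typing import Optional

# Consonant sets named in the text for each state of switch 117.
MODE_OFF = ("a", "k", "s", "t", "n", "h", "m", "y", "r")
MODE_ON = ("y", "w", "g", "z", "d", "b", "p")


def consonant_for_mode(switch_117_on: bool, direction_index: int) -> Optional[str]:
    """Pick a consonant from the table selected by switch 117.

    direction_index is an assumed encoding of the detected movement
    (0, 1, 2, ...); the real assignment is defined by FIG. 7.
    """
    table = MODE_ON if switch_117_on else MODE_OFF
    if 0 <= direction_index < len(table):
        return table[direction_index]
    return None  # movement not assigned in this mode
```

Because the "on" table is shorter, some movements that select a consonant with the switch off would select nothing with the switch on under this assumed indexing.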

FIG. 8 is a diagram illustrating the structure of the gripped portion 11 according to another modification, and FIG. 9 is a diagram illustrating the relationship between the movement of the gripped portion 11 and the designated consonant. The input device 10 may designate a consonant according to the operation of the switches provided on the contact surface 11A, and a vowel according to the movement of the input device 10. In this example, as shown in FIG. 8, switches 111 to 114 and 118 are provided on the contact surface 11A as switches for designating consonants. In this case, the on/off combinations of the four switches 111 to 113 and 118 can designate a total of 16 consonants, including unvoiced sounds (seion), voiced sounds (dakuon), and semi-voiced sounds (handakuon). As in the embodiment described above, the switch 114 is a switch for designating whether a contracted sound (yōon) is used. As shown in FIG. 9, in this modification, a first designation unit 106 and a second designation unit 107 are provided in place of the first designation unit 102 and the second designation unit 104. The first designation unit 106 designates the consonant, out of the vowel and consonant of the lyrics, according to the operation state of the switches 111 to 113 and 118 detected by the operation detection unit 101. The second designation unit 107 designates the vowel, out of the vowel and consonant of the lyrics of the singing voice, according to the operation state of the switch 114 and the movement of the input device 10 detected by the motion detection unit 103. The transmission unit 105 transmits the consonant designated by the first designation unit 106 and the vowel designated by the second designation unit 107 to the singing synthesis control device 20. The configuration of the singing synthesis control device 20 may be the same as in the embodiment described above.
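The count of 16 follows from treating the four switches as a 4-bit code (2^4 = 16 on/off combinations). A minimal sketch of that encoding is shown below; the bit order and the function name `switch_code` are assumptions, and the text does not say which combination maps to which consonant, so no consonant table is given.

```python
from itertools import product

# The four consonant switches named in the text; the bit order is an assumption.
SWITCHES = (111, 112, 113, 118)


def switch_code(pressed: set) -> int:
    """Encode the set of pressed switches as an integer from 0 to 15."""
    code = 0
    for bit, switch in enumerate(SWITCHES):
        if switch in pressed:
            code |= 1 << bit
    return code


# Enumerate every on/off combination to confirm there are 16 distinct codes.
all_codes = {
    switch_code({s for s, on in zip(SWITCHES, states) if on})
    for states in product((False, True), repeat=len(SWITCHES))
}
```

A consonant table indexed by this code would then complete the designation; that table would have to come from the actual device design.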

FIG. 10 is a flowchart showing the operation of the input device 10 and the singing synthesis control device 20 according to this modification. The flow of FIG. 10 is executed, for example, while the input device 10 and the singing synthesis control device 20 are powered on. In the input device 10, the first designation unit 106 determines, based on the detection result of the operation detection unit 101, whether at least one of the switches 111 to 113 and 118 has been pressed (step S31). If it determines that none of the switches has been pressed (step S31; NO), the first designation unit 106 waits. If it determines that at least one of the switches 111 to 113 and 118 has been pressed (step S31; YES), the first designation unit 106 designates a consonant (step S32).

Next, the motion detection unit 103 detects the movement of the input device 10 (step S33). The second designation unit 107 designates a vowel according to the direction in which the input device 10 was moved, as detected by the motion detection unit 103 (step S34).

Next, the transmission unit 105 transmits the designated consonant and vowel to the singing synthesis control device 20 (step S35). After this transmission, the processing of the input device 10 returns to step S31. That is, while at least one of the switches 111 to 113 and 118 is pressed, the transmission unit 105 transmits the consonant and the vowel to the singing synthesis control device 20.
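One pass through steps S31 to S35 can be sketched as a single function that the device would call repeatedly. This is an illustrative outline only: `input_device_step` and its callback parameters (`read_direction`, `consonant_for`, `vowel_for`, `send`) are hypothetical names standing in for the motion detection unit 103, the first designation unit 106, the second designation unit 107, and the transmission unit 105.

```python
from typing import Callable, Set


def input_device_step(
    pressed: Set[int],
    read_direction: Callable[[], str],
    consonant_for: Callable[[Set[int]], str],
    vowel_for: Callable[[str], str],
    send: Callable[[str, str], None],
) -> bool:
    """Run one pass of the flow; return True if a phoneme pair was sent."""
    # Step S31: wait unless at least one consonant switch is pressed.
    if not pressed:
        return False
    consonant = consonant_for(pressed)  # Step S32: designate the consonant
    direction = read_direction()        # Step S33: detect the movement
    vowel = vowel_for(direction)        # Step S34: designate the vowel
    send(consonant, vowel)              # Step S35: transmit to the control device
    return True
```

Calling this function in a loop reproduces the behavior described above: as long as a switch stays pressed, a consonant and vowel are sent on every pass.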

3-2. Relationship between the movement of the input device 10 and consonants
The relationship between the direction of movement of the input device 10 and the designated consonant described in the above embodiment is merely an example. For example, a three-axis orthogonal coordinate system may be defined, and a different consonant may be associated with each axial direction. The movement of the input device 10 is not limited to swinging the input device 10, and may be a change in its posture (rotation, twisting) or the like. The input device 10 need only be configured to designate a consonant or vowel according to its movement.
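As one way to picture the three-axis variant, the sketch below picks the axis along which a measured acceleration is largest and looks up a consonant for that signed direction. Everything specific here is an assumption made for illustration: the function name, the threshold value and its units, and the particular consonant assigned to each direction.

```python
from typing import Optional, Tuple

# Hypothetical assignment of consonants to the six signed axis directions.
AXIS_CONSONANTS = {
    "+x": "k", "-x": "s",
    "+y": "t", "-y": "n",
    "+z": "h", "-z": "m",
}


def consonant_for_motion(
    accel: Tuple[float, float, float], threshold: float = 1.0
) -> Optional[str]:
    """Map an acceleration vector (x, y, z) to a consonant, or None if too weak."""
    # Find the axis with the largest absolute acceleration.
    idx = max(range(3), key=lambda i: abs(accel[i]))
    if abs(accel[idx]) < threshold:
        return None  # treat small readings as "no movement"
    sign = "+" if accel[idx] >= 0 else "-"
    return AXIS_CONSONANTS[sign + "xyz"[idx]]
```

The threshold keeps sensor noise from being read as a gesture; a real device would likely need filtering over several samples rather than a single reading.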

3-3. Other modifications
The specific shape of the input device 10 is not limited to the one illustrated in the embodiment. For example, the input device 10 may be a rod-shaped device such as a guide light used for traffic control. The input device 10 need not have a lighting function, and may be, for example, a cane or a conductor's baton. The shape of the input device 10 is not limited to a rod; it may be a device that is not rod-shaped, such as a dumbbell or a device worn on a part of the user's body (for example, a glove-type device). The input device 10 may also be a portable device (for example, a smartphone). In this case, the input device 10 may detect the movement of the user's finger tracing the surface of the touch screen and designate a vowel or consonant according to that movement. In this case, it suffices that the direction in which the finger moves on the touch screen is associated with a vowel or consonant.
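For the smartphone variant, a swipe direction could be classified from the start and end points of the touch. The sketch below is a minimal illustration under stated assumptions: screen coordinates with y increasing downward, a hypothetical minimum swipe distance in pixels, and a direction-to-vowel assignment chosen here for the example (tap for [a], left for [i], up for [u], right for [e], down for [o]). None of these specifics come from the patent.

```python
import math
from typing import Tuple


def vowel_for_swipe(
    start: Tuple[float, float], end: Tuple[float, float], min_distance: float = 20.0
) -> str:
    """Classify a touch gesture into one of five vowels (assumed mapping)."""
    dx = end[0] - start[0]
    dy = end[1] - start[1]  # positive dy means the finger moved down the screen
    # A very short movement is treated as a tap.
    if math.hypot(dx, dy) < min_distance:
        return "a"
    # Otherwise pick the dominant direction of travel.
    if abs(dx) >= abs(dy):
        return "e" if dx > 0 else "i"
    return "o" if dy > 0 else "u"
```

The same structure would work for consonants by swapping in a different table, since the text only requires that finger direction be associated with a vowel or consonant.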

The input device 10 may detect its movement only while a switch used for designating a vowel or consonant is being pressed. Compared with detecting the movement of the input device 10 at all times, this can be expected to reduce the power consumption of the input device 10.

The operator used in the input device for designating a vowel or consonant is not limited to a momentary switch. An alternate (latching) switch may be used instead of, or in addition to, the momentary switch. Alternatively, a lever, a slider, a dial, or the like may be used instead of, or in addition to, the switch.

The singing synthesis control device 20 need not have an appearance imitating an electronic keyboard instrument. It may have an appearance imitating another instrument, such as a stringed instrument or a wind instrument, or it may not imitate an instrument at all. The singing synthesis control device 20 need only have at least a function of controlling the synthesis of singing voice. The number of operators included in the operation unit 21 may be any number of one or more.

Part of the configuration or operation of the input device 10 and the singing synthesis control device 20 described in the above embodiment may be omitted. For example, the input device 10 may be configured not to designate at least one of contracted sounds (yōon), voiced sounds (dakuon), and semi-voiced sounds (handakuon).

Description of reference signs: 1 ... speech synthesizer, 10 ... input device, 101 ... operation detection unit, 102 ... first designation unit, 103 ... motion detection unit, 104 ... second designation unit, 105 ... transmission unit, 106 ... first designation unit, 107 ... second designation unit, 11 ... gripped portion, 11A ... contact surface, 111 to 118 ... switches, 12 ... light emitting unit, 20 ... singing synthesis control device, 201 ... reception unit, 202 ... operation detection unit, 203 ... determination unit, 204 ... synthesis instruction unit, 205 ... speech synthesis unit, 206 ... speech output unit, 21 ... operation unit, 211 ... operator, 30 ... cable.

Claims

An input device comprising:
a first designation unit that designates one of a vowel and a consonant of lyrics of a singing voice to be synthesized by a singing synthesis control device, according to an operation on an operator;
a second designation unit that designates the other of the vowel and the consonant according to a movement of the input device; and
a transmission unit that transmits the designated vowel and consonant to the singing synthesis control device.

The input device according to claim 1, further comprising a gripped portion having a contact surface that comes into contact with a user's finger during use,
wherein the operator is provided on the contact surface of the gripped portion.

The input device according to claim 1, wherein the second designation unit designates the other of the vowel and the consonant according to a direction in which the input device is moved.

A speech synthesizer comprising:
the input device according to any one of claims 1 to 3; and
a singing synthesis control device,
wherein the singing synthesis control device comprises:
a reception unit that receives the designated vowel and consonant from the input device;
one or more operators;
an operation detection unit that detects an operation on the one or more operators;
a determination unit that determines a pitch according to the operator for which the operation is detected by the operation detection unit; and
a speech synthesis unit that generates synthesized speech having the vowel and consonant received by the reception unit and the pitch determined by the determination unit.