JP5131220B2

JP5131220B2 - Singing pitch difference identification device and program

Info

Publication number: JP5131220B2
Application number: JP2009029886A
Authority: JP
Inventors: 典昭阿瀬見
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2009-02-12
Filing date: 2009-02-12
Publication date: 2013-01-30
Anticipated expiration: 2029-02-12
Also published as: JP2010186032A

Description

The present invention relates to a singing pitch difference identification device and a program that generate information used to implement functions of a karaoke apparatus, such as suggesting recommended songs and performing scoring.

Conventionally, among karaoke apparatuses with which a user enjoys singing along to accompaniment music, some are known to suggest, as recommended songs, songs that the user is estimated to be able to sing, and others are known to score how well the user sings.

In general, these karaoke apparatuses are expected, when suggesting recommended songs, to suggest songs that are easy for the user to sing, and, when scoring, to derive scoring results that most people find convincing.

To address this, it has been proposed, for example, that when suggesting a recommended song, a song whose constituent notes change pitch within the range the user can sing be selected as the recommended song (see, for example, Patent Document 1). It has also been proposed that, when scoring, in addition to scoring such that the result improves as the degree of agreement between the fundamental frequency of the singing voice and a reference value prepared in advance increases (hereinafter, basic scoring), a scoring result be derived in which a weight is given to the range the singer can sing (see, for example, Patent Document 2).

In other words, the karaoke apparatuses described in Patent Documents 1 and 2 use the range the user can sing (hereinafter, the singing range) to determine recommended songs or to derive the weight added to the basic scoring result.

Patent Document 1: JP-A-2005-107313; Patent Document 2: Japanese Patent No. 3290945

However, how easy a song is to sing and how well it is sung are not determined by the singing range alone; they also depend on various other singing abilities.

The karaoke apparatuses described in Patent Documents 1 and 2, however, use only the singing range, out of all the user's singing abilities, to determine recommended songs and to score (evaluate) singing skill. As a result, they can neither recommend songs that are sufficiently easy for the user to sing nor accurately score the user's singing skill.

Accordingly, an object of the present invention is to provide a technique capable of generating information used to determine recommended songs that are easy for the user to sing and to accurately score singing skill.

The present invention, made to achieve the above object, is a singing pitch difference identification device used in a karaoke apparatus that plays songs in accordance with music data representing the pitch and note value of each constituent note of each song.

In the singing pitch difference identification device of the present invention, voice signal acquisition means acquires the voice input to the karaoke apparatus while a song is being played (hereinafter, the voice signal), in association with identification information for identifying the user who produced the voice and the song being played. Singing data generation means then performs frequency analysis on the acquired voice signal to generate singing data representing the pitch transition.

Transition section identification means then identifies, in the generated singing data, the singing section (hereinafter, the singing transition section) from the end of the singing section for the transition source note (hereinafter, the section end) to the start of the singing section for the arrival note (hereinafter, the section start). Here, of two consecutive constituent notes between which the pitch changes in the song identified from the identification information (hereinafter, the specific constituent notes), the arrival note is the constituent note reached after the pitch changes, and the transition source note is the other specific constituent note (the constituent note from which the pitch changes).

Further, for each identified singing transition section, transition value derivation means derives a pitch transition value that becomes larger as the degree of agreement between the singing transition section and a pitch transition model increases. Here, the pitch transition model is the transition pattern obtained when the vocal pitch moves ideally from the pitch corresponding to the transition source note to the pitch corresponding to the arrival note.

Singing pitch difference identification means aggregates the pitch transition values derived in this way for each pitch difference of the singing transition sections, and identifies each pitch difference whose aggregate result (hereinafter, the aggregate transition value) is equal to or greater than a specified value as a pitch difference that the user identified from the identification information can sing.
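The aggregation and thresholding just described can be sketched as follows. This is only a minimal illustration: the function name, the representation of each transition as a `(pitch_difference, pitch_transition_value)` pair, and the numeric values are assumptions, not part of the specification.

```python
from collections import defaultdict


def singable_pitch_differences(transitions, specified_value):
    """Sum pitch transition values per pitch difference and apply the threshold.

    transitions: iterable of (pitch_difference, pitch_transition_value) pairs,
    one pair per singing transition section (hypothetical representation).
    """
    totals = defaultdict(float)
    for pitch_difference, value in transitions:
        totals[pitch_difference] += value  # aggregate transition value
    singable = sorted(d for d, total in totals.items() if total >= specified_value)
    return singable, dict(totals)


# Usage with made-up values: only the 2-semitone difference reaches the threshold.
example = [(2, 0.75), (2, 0.5), (7, 0.25), (12, 0.125), (7, 0.5)]
singable, totals = singable_pitch_differences(example, specified_value=1.0)
```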

With this singing pitch difference identification device, the pitch differences the user can sing, that is, the intervals the user can sing, can be identified.

In particular, when a user strains to produce a voice, it is difficult to sing the pitch transition from the pitch corresponding to the transition source note to the pitch corresponding to the arrival note smoothly. The singing pitch difference identification device of the present invention can therefore identify the pitch differences the user can sing without strain.

If a karaoke apparatus having the singing pitch difference identification device of the present invention suggests to the user, as recommended songs, songs in which the pitch differences between consecutive constituent notes fall mainly within the range of the identified singable pitch differences, the user can enjoy karaoke more. Likewise, if such a karaoke apparatus gives the basic scoring result a weight that becomes larger as the identified singable pitch difference becomes larger, the user's singing skill can be scored more accurately.

In general, when the pitch difference between the transition source note and the arrival note (hereinafter, the inter-note pitch difference) is large, it is difficult to sing smoothly through the section of the song from the end of the transition source note to the start of the arrival note (hereinafter, the specific section). However, even if the inter-note pitch difference is large, the longer the specific section, the easier it is to sing it smoothly.

That is, even for the same inter-note pitch difference, whether the user can sing it depends on the duration of the specific section.

For this reason, in the present invention, the singing pitch difference identification means is preferably configured, as recited in claim 2, to classify and aggregate the pitch transition values by the section length of each singing transition section.
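One way to picture the classification by section length is to key the aggregation on a pair of (length class, pitch difference). The sketch below is illustrative only: the class boundaries in `length_edges`, the tuple layout, and the function name are assumptions chosen for the example.

```python
from bisect import bisect_right
from collections import defaultdict


def aggregate_by_section_length(transitions, length_edges):
    """Aggregate pitch transition values per (length class, pitch difference).

    transitions: iterable of (pitch_difference, section_length_s, value).
    length_edges: ascending class boundaries in seconds (hypothetical values).
    """
    totals = defaultdict(float)
    for pitch_difference, section_length, value in transitions:
        # Class 0 is shorter than the first edge, class 1 lies between the
        # first and second edges, and so on.
        length_class = bisect_right(length_edges, section_length)
        totals[(length_class, pitch_difference)] += value
    return dict(totals)


# Usage with made-up values: the same 7-semitone difference is tracked
# separately for short, medium, and long transition sections.
example = [(7, 0.05, 0.25), (7, 0.2, 0.5), (7, 0.25, 0.25), (7, 0.5, 1.0)]
by_length = aggregate_by_section_length(example, length_edges=[0.1, 0.3])
```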

With the singing pitch difference identification device configured in this way, the pitch differences the user can sing can be identified for each duration of the singing section. In other words, the user's singing ability can be identified in detail.

Moreover, the distribution of inter-note pitch differences within a song usually varies greatly from song to song. If the user sings only songs that contain many instances of a particular inter-note pitch difference, the aggregate transition value for that pitch difference becomes large.

Therefore, in the singing pitch difference identification device of the present invention, it is desirable to normalize the aggregate transition values.

When the aggregate transition values are normalized in this way, their distribution is as shown in FIG. 9. For a user who sings well (that is, has high singing ability), the aggregate results (that is, the normalized aggregate transition values) vary little across pitch differences and are close to the maximum value for every pitch difference. For a user who sings poorly (that is, has low singing ability), the aggregate results vary across pitch differences, and the gap between the results for singable pitch differences and those for unsingable pitch differences becomes large.

Therefore, if, as recited in claim 3, the specified value used by the singing pitch difference identification means is set to a predetermined proportion of the maximum aggregate transition value, the pitch differences the user can sing can be identified appropriately for the user's level (skill).
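A relative threshold of this kind can be sketched as follows. The text does not state how the aggregate transition values are normalized, so this example assumes, purely for illustration, that each total is divided by the number of transitions observed for that pitch difference; the `ratio` parameter and all values are likewise hypothetical.

```python
def singable_by_relative_threshold(values_by_difference, ratio):
    """Identify singable pitch differences with a threshold relative to the maximum.

    values_by_difference: dict mapping a pitch difference to the list of pitch
    transition values observed for it.
    ratio: predetermined proportion of the maximum (hypothetical, e.g. 0.5).
    """
    # Assumed normalization: mean pitch transition value per pitch difference.
    normalized = {
        diff: sum(values) / len(values)
        for diff, values in values_by_difference.items()
        if values
    }
    specified_value = ratio * max(normalized.values())
    singable = sorted(d for d, v in normalized.items() if v >= specified_value)
    return singable, normalized


# Usage with made-up values: the threshold is half of the best-scoring
# pitch difference, so it adapts to the user's overall level.
observed = {2: [1.0, 0.75, 0.5], 7: [0.5], 12: [0.25, 0.0]}
singable, normalized = singable_by_relative_threshold(observed, ratio=0.5)
```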

In the present invention, the transition section identification means may be configured as recited in claim 4. In that configuration, variance derivation means sets, on the singing data, a plurality of time windows that each have a preset length and are consecutive along the progress of time, and derives, for each time window, a variance value that is the variance of the singing data within that window. Start/end identification means then derives the differences between the variance values of mutually consecutive time windows, and identifies the centers of the two time windows at which the difference is minimum and maximum as the section end and the section start, respectively.
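The variance-based boundary search can be sketched on a short pitch contour. Several details here are assumptions made for the example rather than statements from the specification: the windows slide one frame at a time, each difference is attributed to the later window of the pair, and the earlier of the two selected window centers is treated as the section end because the section end must precede the section start.

```python
def window_variances(pitches, window):
    """Population variance of the pitch contour in each sliding window."""
    variances = []
    for start in range(len(pitches) - window + 1):
        frame = pitches[start:start + window]
        mean = sum(frame) / window
        variances.append(sum((p - mean) ** 2 for p in frame) / window)
    return variances


def find_transition_section(pitches, window=4):
    """Return (section_end, section_start) as frame positions."""
    variances = window_variances(pitches, window)
    # diffs[i] compares window i + 1 with window i; it is attributed to
    # window i + 1 (an assumption of this sketch).
    diffs = [variances[i + 1] - variances[i] for i in range(len(variances) - 1)]
    k_max = max(range(len(diffs)), key=diffs.__getitem__) + 1
    k_min = min(range(len(diffs)), key=diffs.__getitem__) + 1
    half = (window - 1) / 2  # offset from a window's first frame to its center
    section_end, section_start = sorted((k_max + half, k_min + half))
    return section_end, section_start


# Usage: a contour that holds pitch 60, glides through 61-63, then holds 64.
contour = [60] * 8 + [61, 62, 63] + [64] * 8
section_end, section_start = find_transition_section(contour, window=4)
```

In this toy contour the two selected window centers bracket the glide between the held notes, which is the behavior the paragraph describes.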

In the transition section identification means configured in this way, the variance derivation means may in particular be configured such that switching section identification means identifies, in the singing data, a switching section consisting of the singing section for the transition source note, the singing section for the arrival note, and the singing section sandwiched between them, and sets the plurality of time windows within the identified switching section. The singing sections may be identified by acquiring the music data identified from the identification information and matching the singing data against the music data.

The present invention may also be a program executed by a computer used in a karaoke apparatus.

In this case, as recited in claim 5, the program of the present invention must be such that: a voice signal acquisition procedure acquires the voice signal in association with the identification information; a singing data generation procedure performs frequency analysis on the acquired voice signal to generate singing data representing the pitch transition; a transition section identification procedure identifies, in the generated singing data, the singing section from the section end to the section start as a singing transition section; a transition value derivation procedure derives, for each identified singing transition section, a pitch transition value that becomes larger as the degree of agreement between the singing transition section and the pitch transition model increases; and a singing pitch difference identification procedure aggregates the derived pitch transition values for each pitch difference of the singing transition sections and identifies each pitch difference whose aggregate result is equal to or greater than the specified value as a pitch difference that the user identified from the identification information can sing.

That is, the program recited in claim 5 causes a computer to implement each of the means constituting the singing pitch difference identification device recited in claim 1. Causing a computer to execute each procedure therefore provides the same effects as the singing pitch difference identification device recited in claim 1.

Such a program can be used, for example, by recording it on a computer-readable recording medium such as a DVD-ROM, a CD-ROM, or a hard disk and loading it into a computer and starting it as needed, or by having a computer acquire it over a communication line and start it as needed.

FIG. 1 is a block diagram showing the schematic configuration of the karaoke system.
FIG. 2 is a flowchart showing the procedure of the singing pitch difference identification processing.
FIG. 3 is a flowchart showing the procedure of the singing transition section extraction processing.
FIG. 4 is a diagram schematically showing the singing pitch difference identification processing.
FIG. 5 is a diagram schematically showing the singing pitch difference identification processing.
FIG. 6 is a diagram schematically showing the singing transition section extraction processing.
FIG. 7 is a diagram schematically showing the singing transition section extraction processing.
FIG. 8 is a diagram schematically showing the singing pitch difference identification processing.
FIG. 9 is an explanatory diagram of the normalized aggregate transition values (outline of the invention according to claim 3).

Embodiments of the present invention are described below with reference to the drawings.

First, FIG. 1 is a block diagram showing the schematic configuration of a karaoke system with which a user sings along to the performance of a song processed in advance for karaoke (hereinafter, a karaoke song).

<Configuration of the entire karaoke system>
As shown in FIG. 1, the karaoke system 1 includes a karaoke apparatus 20 that plays back a karaoke song specified by the user, and a server 30 that distributes music data, which is the data required to play back karaoke songs, to the karaoke apparatus 20. The karaoke apparatus 20 and the server 30 are connected via a network (for example, a dedicated line or a WAN). That is, the karaoke system 1 is configured as a so-called networked karaoke system.

Of these, the server 30 is a well-known karaoke service server, that is, an information processing device built mainly around a storage device (not shown) that stores the processing programs required to play karaoke songs (hereinafter, karaoke processing programs) and the music data, and a well-known microcomputer (not shown) having at least a ROM, a RAM, and a CPU.

That is, the server 30 centrally manages information about users who have used the karaoke system 1 (more precisely, the karaoke apparatus 20) (hereinafter, user information), karaoke scoring results sent from the karaoke apparatus 20, and the like, and sends music data and karaoke processing programs in response to requests from the karaoke apparatus 20.

The music data is prepared in advance for each karaoke song and is well-known karaoke playback data described, for example, in MIDI (Musical Instrument Digital Interface) format.

Accordingly, the music data consists of song information, which is data about the karaoke song; a guide melody, which is data about the melody the user should sing; and lyrics information, which is data about the lyrics of the karaoke song.

The song information includes song number data for identifying the karaoke song, title data indicating the song title, and time data indicating the playing time of the karaoke song.

The guide melody consists of the pitch and note value of each constituent note forming the melody of the karaoke song. Specifically, the guide melody of this embodiment expresses the note output start time and note output end time of each constituent note together with its pitch. Here, the note output start time is the time from the start of the karaoke song's performance until output of that constituent note begins, and the note output end time is the time from the start of the performance until output of that constituent note ends.
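A guide melody of this form can be modeled as a list of notes, each with a pitch and its output start and end times. The sketch below is a hypothetical representation: the class and field names are illustrative, pitch is assumed to be a MIDI note number, and the signed semitone difference is a convention chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuideNote:
    pitch: int    # assumed to be a MIDI note number
    start: float  # note output start time, seconds from the start of the performance
    end: float    # note output end time, seconds from the start of the performance

    @property
    def duration(self) -> float:
        """Note value expressed as a duration in seconds."""
        return self.end - self.start


def pitch_changes(melody):
    """List consecutive note pairs whose pitch differs, with their pitch difference."""
    changes = []
    for source, arrival in zip(melody, melody[1:]):
        if source.pitch != arrival.pitch:
            changes.append((source, arrival, arrival.pitch - source.pitch))
    return changes


# Usage with a made-up four-note melody; the repeated pitch 60 yields no change.
melody = [
    GuideNote(60, 0.0, 0.5),
    GuideNote(60, 0.5, 1.0),
    GuideNote(64, 1.0, 1.5),
    GuideNote(62, 1.5, 2.0),
]
differences = [difference for _, _, difference in pitch_changes(melody)]
```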

<Karaoke apparatus>
Next, the configuration of the karaoke apparatus 20 is described.

The karaoke apparatus 20 includes a communication unit 22 for data communication with the server 30, a storage unit 21 that stores the karaoke processing programs and music data acquired from the server 30 via the communication unit 22, a display unit 23 for displaying various images, and an operation reception unit 24 that receives instructions from the user. The karaoke apparatus 20 further includes a microphone 26 for inputting voice, a speaker 27 for outputting sound, an audio input/output unit 25 that controls the input and output of sound via the microphone 26 and the speaker 27, and a control unit 28 that controls the units 21, 22, 23, 24, and 25 of the karaoke apparatus 20.

Of these, the communication unit 22 is a communication interface that connects the karaoke apparatus 20 to a network (for example, a dedicated line or a WAN) to communicate with external devices. It outputs various data to the server 30 and acquires various data and processing programs from the server 30.

The display unit 23 is a display device such as a liquid crystal display. The operation reception unit 24 consists of, for example, an input device made up of a plurality of key switches and a receiver that accepts instructions entered via a well-known remote control.

The audio input/output unit 25 is configured as an AD converter that converts the voice (an analog signal) input via the microphone 26 into a digital signal and inputs that digital signal to the control unit 28. The audio input/output unit 25 is also configured to control the output of sound from the speaker 27. Hereinafter, voice that has been input via the microphone 26 and converted into a digital signal is referred to as voice data.

The storage unit 21 is a storage device (for example, a hard disk drive) that retains its contents even when the power is turned off and whose contents can be read and written. The storage unit 21 has a program storage area that stores the karaoke processing programs, a music data storage area that stores the music data, and an identification data storage area that stores the voice data. The identification data storage area consists of voice data storage areas, each of which stores voice data for a predetermined number of karaoke songs (for example, five songs), provided for a preset number of users (for example, five users).

The control unit 28 is built mainly around a well-known microcomputer having at least a ROM 28a that stores programs and data that must be retained even when the power is turned off, a RAM 28b that temporarily stores programs and data, and a CPU 28c that controls the units 21, 22, 23, 24, and 25 of the karaoke apparatus 20 and performs various computations in accordance with the programs and data stored in the ROM 28a and the RAM 28b.

The karaoke processing programs are loaded from the storage unit 21 into the RAM 28b, and the CPU 28c executes each process in accordance with the karaoke processing programs stored in the RAM 28b.

The karaoke processing programs include a karaoke performance processing program for executing well-known karaoke performance processing, which plays (plays back) the karaoke song specified via the operation reception unit 24 and displays the lyrics on the display unit 23. They also include a scoring processing program for executing well-known scoring processing, which compares the pitch and tempo of the singing, extracted from the voice input via the microphone 26 during the performance of a karaoke song, with the scoring reference (that is, the guide melody) and converts the degree of agreement into a score as the scoring result.

The karaoke processing programs further include a user information processing program for executing user information processing, which receives user information (for example, name, gender, identification number (ID), and age) via the operation reception unit 24. They also include an accumulation processing program for executing accumulation processing, which accumulates data generated during the performance of a karaoke song (for example, voice data) and the usage history of the karaoke apparatus 20 in the storage unit 21 or the server 30.

The karaoke processing programs also include a pitch difference identification processing program for executing singing pitch difference identification processing, which identifies the pitch differences the user can sing (hereinafter, singing pitch differences) based on the voice data stored in the identification data storage area of the storage unit 21. That is, when the CPU 28c executes the singing pitch difference identification processing, the karaoke apparatus 20 (more precisely, the control unit 28) functions as the singing pitch difference identification device of the present invention.

<Operation of the karaoke system 1>
Next, the operation of the karaoke system 1 is described.

When the karaoke system 1 is used, the karaoke apparatus 20 (more precisely, the control unit 28) executes the user information processing program to receive user information for each of the users of the karaoke apparatus 20 and associates each piece of received user information with a voice data storage area.

The karaoke apparatus 20 then executes the karaoke performance processing program to play the karaoke song specified by the user and display its lyrics on the display unit 23. Before the performance of each karaoke song starts, the control unit 28 acquires user information via the operation reception unit 24.

When the performance of a karaoke song starts, the user sings along to the karaoke song being played (hereinafter, the current karaoke song). When the performance of the current karaoke song ends, the CPU 28c of the control unit 28 executes the accumulation processing program, associates the voice data generated from the user's singing voice with the song number data of the current karaoke song (hereinafter, data in which voice data and song number data are associated is also called correspondence data), and stores it in a voice data storage area. The voice data storage area in which the correspondence data is stored is the one associated with the user information of the user who sang the current karaoke song. The voice data, the song number data, and the user information are thereby associated with one another.

以降、カラオケ装置２０は、カラオケ演奏処理プログラムの実行から、音声データ格納領域に音声データを格納するまでの一連のサイクルを、ユーザがカラオケ楽曲の指定を終了するまで繰り返す。 Thereafter, the karaoke apparatus 20 repeats a series of cycles from the execution of the karaoke performance processing program to storing the voice data in the voice data storage area until the user finishes specifying the karaoke music.

〈歌唱音高差特定処理〉 <Singing pitch difference specifying process>
次に、ＣＰＵ２８ｃが実行する歌唱音高差特定処理について説明する。 Next, the singing pitch difference specifying process executed by the CPU 28c will be described.

ここで、図２は、歌唱音高差特定処理の処理手順を示したフローチャートである。 Here, FIG. 2 is a flowchart showing a processing procedure of the singing pitch difference identification processing.

この歌唱音高差特定処理は、記憶部２１の音声データ格納領域に規定数の音声データが格納された場合、即ち、一人のユーザが規定数分のカラオケ楽曲を歌唱した場合に起動されるものである。 This singing pitch difference specifying process is started when a prescribed number of pieces of voice data have been stored in the voice data storage area of the storage unit 21, that is, when one user has sung a prescribed number of karaoke songs.

図２に示すように、歌唱音高差特定処理が起動されると、まず、Ｓ１１０にて、記憶部２１の音声データ格納領域に格納されている全音声データのうち、一つの音声データを取得する。すなわち、図４（Ａ）に示すように、カラオケ楽曲の演奏（即ち、演奏時間の進行）に沿って信号レベルが変化する音声信号を、音声データとして取得する。 As shown in FIG. 2, when the singing pitch difference specifying process is started, first, in S110, one piece of voice data is acquired from all the voice data stored in the voice data storage area of the storage unit 21. That is, as shown in FIG. 4A, an audio signal whose signal level changes along with the performance of the karaoke song (that is, with the progress of the performance time) is acquired as the voice data.

続く、Ｓ１２０では、演奏時間の進行に沿って連続するように設定された基準時間窓ｍ毎に、Ｓ１１０で取得した音声データを抽出して、その抽出した音声データ毎に周波数解析（本実施形態では、ＦＦＴ（ＦａｓｔＦｏｕｒｉｅｒＴｒａｎｓｆｏｒｍ））する。これにより、周波数解析の結果として、各基準時間窓ｍ内での音声データの周波数スペクトル（即ち、周波数成分の分布）が、設定された基準時間窓ｍの数分だけ生成される。なお、基準時間窓ｍは、予め規定された時間長Ｔｗ（例えば、数十ｍｓ）を有した期間であり、添え字ｍは、基準時間窓のインデックス番号（したがって、ｍは、１以上の自然数）を表す。 Subsequently, in S120, the voice data acquired in S110 is extracted for each reference time window m, the windows being set so as to be consecutive along the progress of the performance time, and frequency analysis (in this embodiment, an FFT (Fast Fourier Transform)) is performed on each extracted piece of voice data. As a result of the frequency analysis, a frequency spectrum (that is, a distribution of frequency components) of the voice data within each reference time window m is generated, one for each reference time window m that has been set. The reference time window m is a period having a predetermined time length Tw (for example, several tens of ms), and the subscript m is the index number of the reference time window (therefore, m is a natural number of 1 or more).
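A minimal Python sketch of the S120 framing step, assuming non-overlapping windows of length Tw and a plain magnitude spectrum. The function name `frame_spectra` and its parameters are illustrative and do not appear in the source, and the naive DFT stands in for the FFT only to keep the example free of dependencies.

```python
import cmath
import math

def frame_spectra(samples, sample_rate, tw_seconds):
    """Split samples into consecutive reference time windows and
    return one magnitude spectrum per window (hypothetical helper)."""
    n = round(sample_rate * tw_seconds)  # samples per reference time window
    spectra = []
    for start in range(0, len(samples) - n + 1, n):
        frame = samples[start:start + n]
        mags = []
        # Naive O(n^2) DFT; a real system would call an FFT routine instead.
        for b in range(n // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * b * i / n)
                      for i, x in enumerate(frame))
            mags.append(abs(acc))
        spectra.append(mags)
    return spectra
```

In this sketch, bin b of a spectrum corresponds to the frequency b * sample_rate / n in Hz.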

さらに、Ｓ１３０では、Ｓ１２０での周波数解析の結果（周波数スペクトル）に基づいて、本発明の歌唱データに相当する基本周波数遷移ｆ０ｖ（ｍ）を導出する。 Furthermore, in S130, based on the result (frequency spectrum) of the frequency analysis in S120, a fundamental frequency transition f0v (m) corresponding to the song data of the present invention is derived.

具体的に、本実施形態におけるＳ１３０では、櫛型形状で表される調波構造モデルを予め用意し、その調波構造モデルと、各周波数スペクトルとを照合する周知の手法を用いて、基準時間窓ｍにおける基本周波数ｆ０を周波数スペクトルから検出する。そして、その検出した基本周波数ｆ０を、基準時間窓ｍの時間遷移、即ち、該当カラオケ楽曲の演奏時間の進行に従って、基本周波数遷移ｆ０ｖ（ｍ）としてまとめる。 Specifically, in S130 of this embodiment, a harmonic structure model represented by a comb shape is prepared in advance, and the fundamental frequency f0 in each reference time window m is detected from the frequency spectrum using a known method of matching the harmonic structure model against each frequency spectrum. The detected fundamental frequencies f0 are then arranged, in accordance with the time transition of the reference time windows m, that is, the progress of the performance time of the corresponding karaoke song, into the fundamental frequency transition f0v (m).
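The source only states that a known comb-model matching method is used. As one simple stand-in, the sketch below scores each candidate fundamental by summing the spectrum at its first few harmonics. The name `detect_f0`, the number of comb teeth, and the lower bin limit are all assumptions made for illustration.

```python
def detect_f0(magnitudes, bin_hz, min_bin=2, n_harmonics=3):
    """Return the candidate fundamental (in Hz) whose comb of harmonics
    collects the most spectral magnitude, or None for a silent frame."""
    best_bin, best_score = None, 0.0
    for b in range(min_bin, len(magnitudes)):
        # Comb with n_harmonics teeth at b, 2b, 3b, ... (teeth past the
        # end of the spectrum are ignored).
        score = sum(magnitudes[h * b]
                    for h in range(1, n_harmonics + 1)
                    if h * b < len(magnitudes))
        if score > best_score:
            best_bin, best_score = b, score
    return None if best_bin is None else best_bin * bin_hz
```

Using a fixed number of teeth keeps a sub-harmonic candidate (for example half the true f0) from collecting as many true harmonics as the correct candidate.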

これにより、図４（Ｂ）に示すような、該当カラオケ楽曲をユーザが歌唱した時の基本周波数ｆ０の時間遷移を表す基本周波数遷移ｆ０ｖ（ｍ）が導出される。 Thereby, the fundamental frequency transition f0v (m) representing the temporal transition of the fundamental frequency f0 when the user sings the corresponding karaoke piece as shown in FIG. 4B is derived.

続くＳ１４０では、Ｓ１１０で取得した音声データと対応付けられている曲番号データに従って、その曲番号データに対応するカラオケ楽曲のガイドメロディを記憶部２１から取得する。 In subsequent S140, according to the music number data associated with the audio data acquired in S110, the guide melody of the karaoke music corresponding to the music number data is acquired from the storage unit 21.

そして、Ｓ１５０では、Ｓ１３０で導出された基本周波数遷移ｆ０ｖ（ｍ）と、Ｓ１４０にて取得したガイドメロディとを照合することにより、時間遅延量ｔｌ（ｋ）を導出する。 In S150, the fundamental frequency transition f0v (m) derived in S130 is collated with the guide melody acquired in S140 to derive the time delay amount tl (k).

ただし、時間遅延量ｔｌ（ｋ）とは、音高が切り替わるように連続し、かつ歌詞が対応付けられた２つの構成音（以下、特定構成音と称す）のうち、音高が切り替わった後に到達する構成音（以下、到達音と称す）に対する発声遅れ時間を表すものである。なお、以下では、特定構成音のうち、音高が切り替わる前の遷移元となる構成音を遷移元音と称す。 Here, the time delay amount tl (k) represents the vocalization delay relative to the arrival sound, that is, of two constituent sounds that are consecutive with a change in pitch and that both have lyrics associated with them (hereinafter referred to as specific constituent sounds), the constituent sound reached after the pitch changes (hereinafter referred to as the arrival sound). In the following, of the specific constituent sounds, the constituent sound that is the transition source before the pitch changes is referred to as the transition source sound.

ここで、本実施形態における時間遅延量ｔｌ（ｋ）の算出方法について詳しく説明する。ただし、本実施形態において、ｋは、該当する構成音（ここでは、到達音）の演奏の順番を表すものであり、該当カラオケ楽曲の演奏開始から、ｋ番目に演奏されることを表している。よって、ｋは、構成音の総数を最大値とした自然数である。 Here, the method of calculating the time delay amount tl (k) in this embodiment will be described in detail. In this embodiment, k represents the playing order of the constituent sound in question (here, the arrival sound), meaning that it is the k-th sound played from the start of the performance of the corresponding karaoke song. Therefore, k is a natural number whose maximum value is the total number of constituent sounds.

まず、ガイドメロディによって表された全ての構成音の音高の時間変化（即ち、ガイドメロディによる旋律）をガイドメロディ音高とする。 First, the time change of the pitches of all the constituent sounds represented by the guide melody (that is, the melody by the guide melody) is set as the guide melody pitch.

そして、図４（Ｃ）に示すように、基本周波数遷移ｆ０ｖ（ｍ）にガイドメロディ音高を照合することで、基本周波数遷移ｆ０ｖ（ｍ）中において、構成音ｋに対する発声を開始したとみなせるタイミング（以下、構成音歌唱開始タイミングと称す）を検出する。なお、基本周波数遷移ｆ０ｖ（ｍ）にガイドメロディ音高を照合する手法としては、特開２００５−１０７３３０号公報に記載された手法を用いれば良い。 Then, as shown in FIG. 4C, by comparing the guide melody pitch with the fundamental frequency transition f0v (m), it can be considered that the utterance to the constituent sound k is started in the fundamental frequency transition f0v (m). Timing (hereinafter referred to as constituent sound singing start timing) is detected. In addition, as a method for collating the guide melody pitch with the fundamental frequency transition f0v (m), a method described in JP-A-2005-107330 may be used.

さらに、その検出した構成音歌唱開始タイミングと、構成音ｋについての楽音出力開始時刻ｓｔ（ｋ）との差を時間遅延量ｔｌ（ｋ）として算出する。 Furthermore, the difference between the detected component sound singing start timing and the musical sound output start time st (k) for the component sound k is calculated as a time delay amount tl (k).

このように、ガイドメロディ音高と基本周波数遷移ｆ０ｖ（ｍ）との照合から、時間遅延量ｔｌ（ｋ）の算出までの一連の流れを、カラオケ楽曲の時間進行に従って（即ち、ｋが１から最大となるまで、ｋを順次増加させながら）繰り返す。これにより、全ての構成音歌唱開始タイミングが検出され、それら全構成音歌唱開始タイミングについての時間遅延量ｔｌ（ｋ）が求められる。 In this way, the series of steps from collating the guide melody pitch with the fundamental frequency transition f0v (m) to calculating the time delay amount tl (k) is repeated in accordance with the time progress of the karaoke song (that is, while incrementing k one at a time from 1 until it reaches its maximum). As a result, all the constituent sound singing start timings are detected, and the time delay amount tl (k) is obtained for every one of them.

続くＳ１６０では、基本周波数遷移ｆ０ｖ（ｍ）の中で、遷移元音に対する区間の終端（以下、区間終端とする）、及び到達音に対する区間の始端（以下、区間始端とする）を特定すると共に、それらの特定した区間終端から区間始端までの間の連続する区間を歌唱遷移区間ｆ０ｖｔ（ｍ）として抽出する歌唱遷移区間抽出処理を実行する。 In subsequent S160, in the fundamental frequency transition f0v (m), the end of the section for the transition source sound (hereinafter referred to as section end) and the start of the section for the arrival sound (hereinafter referred to as section start) are specified. Then, a singing transition section extraction process is performed to extract a continuous section from the end of the specified section to the start of the section as a singing transition section f0vt (m).

〈歌唱遷移区間抽出処理〉 <Singing transition section extraction process>
ここで、歌唱遷移区間抽出処理について詳細に説明する。 Here, the singing transition section extraction process will be described in detail.

なお、図３は、歌唱遷移区間抽出処理の処理手順を示したフローチャートであり、図６，７は、歌唱遷移区間抽出処理の処理内容を示した模式図である。 FIG. 3 is a flowchart showing a processing procedure of the singing transition section extraction process, and FIGS. 6 and 7 are schematic diagrams showing processing contents of the singing transition section extraction process.

この歌唱遷移区間抽出処理は、歌唱音高差特定処理のＳ１６０にて起動されると、図３に示すように、まず、Ｓ４００にて、先に検出した全ての構成音歌唱開始タイミングの中で、一つの構成音歌唱開始タイミングを含む基準時間窓（以下、探査中心時間窓と称す）ｍ₀（ｋ）を、下記（１）式に従って導出する。 When this singing transition section extraction process is started in S160 of the singing pitch difference specifying process, as shown in FIG. 3, first, in S400, all the constituent sound singing start timings detected earlier are included. A reference time window (hereinafter referred to as a search center time window) m ₀ (k) including one component sound singing start timing is derived according to the following equation (1).

ただし、（１）式において、ｒｏｕｎｄは、整数値への丸めを意味する。 However, in the expression (1), round means rounding to an integer value.
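Equation (1) itself is not reproduced in this text. The sketch below shows one form that would be consistent with the surrounding definitions: the singing start timing is taken as st(k) + tl(k), and the search center window is that time divided by the window length Tw and rounded. Both function names and the exact form of the expression are assumptions, not the patent's stated formula.

```python
def time_delay(onset_time, st_k):
    """tl(k): detected singing start timing minus the musical sound
    output start time st(k), both in seconds."""
    return onset_time - st_k

def search_center_window(st_k, tl_k, tw):
    """Assumed form of equation (1): index of the reference time window
    containing the singing start timing. Python's round() resolves exact
    .5 ties to the nearest even integer."""
    return round((st_k + tl_k) / tw)
```

Whether the window index starts at 0 or 1 at the beginning of the song is not settled by this sketch.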

そして、Ｓ４１０では、基本周波数遷移ｆ０ｖ（ｍ）に対して、予め規定された区間の長さ（以下、窓区間長と称す）を有した一つの分析時間窓＃ｐを設定する。なお、本実施形態において、窓区間長は、２Ｍ＋１であり、Ｍは、基準時間窓ｍの個数である。 In S410, one analysis time window #p having a predetermined section length (hereinafter referred to as a window section length) is set for the fundamental frequency transition f0v (m). In the present embodiment, the window section length is 2M + 1, and M is the number of reference time windows m.

具体的に、その分析時間窓＃ｐは、窓区間長の中心（即ち、中心となる基準時間窓ｍ）が規定中心値となるように設定される。ただし、その規定中心値は、初期値として、探査中心時間窓ｍ₀（ｋ）が設定されている。つまり、本歌唱遷移区間抽出処理が起動されて最初にＳ４１０へと進んだ場合、分析時間窓＃ｐの窓区間長の中心は、探査中心時間窓ｍ₀（ｋ）となる。 Specifically, the analysis time window #p is set so that the center of the window section length (that is, the reference time window m at its center) is the specified center value. The initial value of the specified center value is the search center time window m ₀ (k). That is, when this singing transition section extraction process is started and first proceeds to S410, the center of the window section length of the analysis time window #p is the search center time window m ₀ (k).

ただし、ここで言う＃ｐは、分析時間窓のインデックス番号であり、探査中心時間窓ｍ₀（ｋ）を原点として、負（マイナス）の値から正の値までをとる。 Here, #p is the index number of the analysis time window, and takes from a negative value to a positive value with the search center time window m ₀ (k) as the origin.

さらに、Ｓ４２０では、Ｓ４１０にて設定された分析時間窓＃ｐ内の基本周波数遷移ｆ０ｖ（ｍ）に対する分散値ｖｆ０（＃ｐ）を、下記（２）式により導出する。ただし、（２）式において、ｍｆ０（＃ｐ）は、分析時間窓＃ｐ内での基本周波数遷移ｆ０ｖ（ｍ）の平均値であり、下記（３）式により導出されるものである。 Further, in S420, a variance value vf0 (#p) for the fundamental frequency transition f0v (m) within the analysis time window #p set in S410 is derived by the following equation (2). However, in the equation (2), mf0 (#p) is an average value of the fundamental frequency transition f0v (m) within the analysis time window #p, and is derived from the following equation (3).

続いて、Ｓ４３０では、互いに連続する分析時間窓＃ｐに対する分散値ｖｆ０（＃ｐ）の差分を、下記（４）式により求める。以下、（４）式にて求められる値を分散値差分ｄｖｆ０（＃ｐ）と称す。 Subsequently, in S430, the difference of the variance value vf0 (#p) with respect to the analysis time windows #p continuous with each other is obtained by the following equation (4). Hereinafter, the value obtained by the equation (4) is referred to as a variance difference dvf0 (#p).
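Equations (2) to (4) are not reproduced in this text. The following sketch assumes the usual population mean and variance over the 2M+1 frames of an analysis window, and takes the variance difference between two windows whose centers are `step` frames apart. The function names and the normalization by the window length are assumptions.

```python
def window_variance(f0v, center, M):
    """vf0(#p): variance of f0v over the 2M+1 frames centered on `center`."""
    if center - M < 0 or center + M >= len(f0v):
        raise ValueError("analysis window falls outside f0v")
    window = f0v[center - M:center + M + 1]
    mean = sum(window) / len(window)  # mf0(#p), assumed form of equation (3)
    return sum((x - mean) ** 2 for x in window) / len(window)

def variance_difference(f0v, center, M, step=1):
    """dvf0(#p): variance of this window minus that of the window whose
    center lies `step` frames earlier (assumed form of equation (4))."""
    return window_variance(f0v, center, M) - window_variance(f0v, center - step, M)
```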

続く、Ｓ４４０では、今サイクルのＳ４３０で導出した分散値差分（以下、今分散値差分と称す）ｄｖｆ０（＃ｐ）を記憶する。以下、前サイクルのＳ４３０にて導出された分散値差分ｄｖｆ０（＃ｐ）を、前分散値差分ｄｖｆ０と称す。 Subsequently, in S440, the variance difference dvf0 (#p) derived in S430 of the current cycle (hereinafter referred to as the current variance difference) is stored. Hereinafter, the variance difference dvf0 (#p) derived in S430 of the previous cycle is referred to as the previous variance difference dvf0.

そして、Ｓ４５０では、今分散値差分ｄｖｆ０（＃ｐ）の絶対値が、予め規定された許容値未満であるか否かを判定する。そして、判定の結果、今分散値差分ｄｖｆ０（＃ｐ）の絶対値が許容値未満であれば、即ち、今分散値差分ｄｖｆ０（＃ｐ）が０とみなせる場合には、Ｓ４６０へと進む。 In S450, it is determined whether or not the absolute value of the current variance difference dvf0 (#p) is less than a predetermined allowable value. If it is less than the allowable value, that is, if the current variance difference dvf0 (#p) can be regarded as 0, the process proceeds to S460.

そのＳ４６０では、前分散値差分ｄｖｆ０の絶対値が許容値以上であるか否かを判定する。そして、判定の結果、前分散値差分ｄｖｆ０の絶対値が許容値以上であれば、即ち、前分散値差分ｄｖｆ０が０とみなせない場合には、Ｓ４７０へと進む。 In S460, it is determined whether or not the absolute value of the previous variance difference dvf0 is greater than or equal to the allowable value. As a result of the determination, if the absolute value of the previous variance value difference dvf0 is greater than or equal to the allowable value, that is, if the previous variance value difference dvf0 cannot be regarded as 0, the process proceeds to S470.

そのＳ４７０では、変極回数カウンタをインクリメントして、Ｓ４８０へと進む。ただし、変極回数カウンタの初期値は、０である。 In S470, the pole change counter is incremented, and the process proceeds to S480. The initial value of the pole change counter is 0.

そのＳ４８０では、変極回数カウンタが２以上であるか否かを判定し、その判定の結果、変極回数カウンタが２未満であれば、Ｓ４９０へと進む。 In S480, it is determined whether or not the pole change counter is 2 or more. If the pole change counter is less than 2 as a result of the determination, the process proceeds to S490.

なお、Ｓ４５０での判定の結果、今分散値差分ｄｖｆ０（＃ｐ）の絶対値が許容値以上である場合や、Ｓ４６０での判定の結果、前分散値差分ｄｖｆ０の絶対値が許容値未満である場合にも、Ｓ４９０へと進む。 As a result of the determination in S450, when the absolute value of the current variance value difference dvf0 (#p) is greater than or equal to the allowable value, or as a result of the determination in S460, the absolute value of the previous variance value difference dvf0 is less than the allowable value. In some cases, the process proceeds to S490.

そのＳ４９０では、規定中心値を、今サイクルにて設定されている規定中心値から、基準時間窓ｍをｐ個（ただし、ｐ≦Ｍ）だけずらして設定する。これにより、次にＳ４１０へと戻った際に、窓区間長の中心について、基準時間窓ｍがｐ個分だけ変位された分析時間窓＃ｐが設定される。 In S490, the specified center value is set by shifting the reference time window m by p (where p ≦ M) from the specified center value set in the current cycle. Thereby, when returning to S410 next time, an analysis time window #p in which the reference time window m is displaced by p pieces is set for the center of the window section length.

ただし、本実施形態のＳ４９０では、変極回数カウンタが０である時には、規定中心値を、基本周波数遷移ｆ０ｖ（ｍ）の時間進行において、時間の進行方向とは反対の方向に沿って変位させる。また、変極回数カウンタが１である時には、規定中心値を、基本周波数遷移ｆ０ｖ（ｍ）の時間進行において、時間の進行方向に沿って変位させる。 In S490 of this embodiment, when the pole change counter is 0, the specified center value is displaced in the direction opposite to the direction of time progress of the fundamental frequency transition f0v (m). When the pole change counter is 1, the specified center value is displaced along the direction of time progress of the fundamental frequency transition f0v (m).

ここで、Ｓ４１０からＳ４９０の処理（以下、特定処理と称す）を繰り返した時の動作を説明する。 Here, the operation when the processing from S410 to S490 (hereinafter referred to as specific processing) is repeated will be described.

まず、図６（Ａ）に示すように、変極回数カウンタが０である時に特定処理を繰り返すと、探査中心時間窓ｍ₀（ｋ）を基準として、基本周波数遷移ｆ０ｖ（ｍ）における時間の進行とは反対の方向に沿って（図６（Ａ）中の左方向へと）、分析時間窓＃ｐが設定される。なお、この時に、Ｓ４１０にて設定される分析時間窓のインデックス番号＃ｐは、分析時間窓＃ｐが設定される毎に、値を減少させる（値は、マイナス方向に大きくなる）。 First, as shown in FIG. 6A, when the specific process is repeated while the pole change counter is 0, analysis time windows #p are set, starting from the search center time window m ₀ (k), in the direction opposite to the progress of time in the fundamental frequency transition f0v (m) (toward the left in FIG. 6A). At this time, the index number #p of the analysis time window set in S410 decreases each time an analysis time window #p is set (its magnitude grows in the negative direction).

一方、変極回数カウンタが１である時に特定処理を繰り返すと、探査中心時間窓ｍ₀（ｋ）を基準として、基本周波数遷移ｆ０ｖ（ｍ）における時間の進行に沿って（図６（Ａ）中の右方向へと）、分析時間窓＃ｐが設定される。なお、この時に、Ｓ４１０にて設定される分析時間窓のインデックス番号＃ｐは、分析時間窓＃ｐが設定される毎に、値を増加する（値は、プラス方向に大きくなる）。 On the other hand, when the specific process is repeated while the pole change counter is 1, analysis time windows #p are set, starting from the search center time window m ₀ (k), along the progress of time in the fundamental frequency transition f0v (m) (toward the right in FIG. 6A). At this time, the index number #p of the analysis time window set in S410 increases each time an analysis time window #p is set (its value grows in the positive direction).

そして、Ｓ４１０にて設定される分析時間窓＃ｐは、窓区間長の中心が、ｐずつ変位されることから、特定処理を繰り返すことで、少なくとも互いに隣接する複数の分析時間窓＃ｐが設定される。 Since the center of the window section length of the analysis time window #p set in S410 is displaced by p each time, repeating the specific process sets a plurality of analysis time windows #p that are at least adjacent to one another.

さらに、特定処理を繰り返す際にＳ４２０にて導出される分散値ｖｆ０（＃ｐ）は、図６（Ｂ）に示すように、遷移元音から到達音へと歌唱音高が遷移する区間（以下、遷移区間と称す）内でのものであれば、分析時間窓＃ｐ毎の変化は小さいものの、値自体は大きなものとなる。一方、遷移元音及び到達音の歌唱音高に対する区間（以下、定常区間と称す）内でのものであれば、分析時間窓＃ｐ毎の変化は小さく、値自体も小さなものとなる。 Furthermore, as shown in FIG. 6B, when the variance value vf0 (#p) derived in S420 during repetition of the specific process falls within a section in which the singing pitch transitions from the transition source sound to the arrival sound (hereinafter referred to as a transition section), it changes little from one analysis time window #p to the next, but the value itself is large. On the other hand, when it falls within a section corresponding to the singing pitch of the transition source sound or the arrival sound (hereinafter referred to as a steady section), it changes little from one analysis time window #p to the next, and the value itself is also small.

したがって、特定処理を繰り返す際にＳ４３０にて導出される分散値差分ｄｖｆ０（＃ｐ）は、図７（Ａ）に示すように、定常区間内及び遷移区間内でのものであれば、その値自体が０とみなせる。一方、定常区間から遷移区間への切り替わりや、遷移区間から定常区間への切り替わりでは、その値自体が０とはみなすことができない。 Therefore, when the specific process is repeated, the variance difference dvf0 (#p) derived in S430 is the value if it is within the steady section and the transition section as shown in FIG. 7A. It can be regarded as 0 itself. On the other hand, the value itself cannot be regarded as 0 when switching from the steady section to the transition section or switching from the transition section to the steady section.

つまり、分散値差分ｄｖｆ０は、定常区間から遷移区間への切り替わりにて極大値となり、遷移区間から定常区間への切り替わりにて極小値となる。このため、本実施形態では、Ｓ４５０，Ｓ４６０にて、今分散値差分ｄｖｆ０（＃ｐ）の絶対値が許容値未満であるか否か、及び前分散値差分ｄｖｆ０の絶対値が許容値以上であるか否かを判定して、分散値差分ｄｖｆ０（＃ｐ）が極値となったか否かを判定している。 That is, the variance difference dvf0 becomes a maximum value when switching from the steady section to the transition section, and becomes a minimum value when switching from the transition section to the steady section. Therefore, in this embodiment, in S450 and S460, whether or not the absolute value of the current variance value difference dvf0 (#p) is less than the allowable value, and the absolute value of the previous variance value difference dvf0 is greater than or equal to the allowable value. It is determined whether or not there is a variance, and it is determined whether or not the variance difference dvf0 (#p) is an extreme value.

言い換えれば、特定処理を繰り返す際に、変極回数カウンタが２以上となると、分散値差分ｄｖｆ０（＃ｐ）から、２つの極値が検出されたことになる。 In other words, when the pole change counter reaches 2 or more during repetition of the specific process, two extreme values have been detected in the variance difference dvf0 (#p).

ここで、図３へと戻り、Ｓ４８０での判定の結果、変極回数カウンタが２以上であれば、Ｓ５００へと進む。そのＳ５００では、区間終端に対応する基準時間窓（以下、区間終端窓ｍ_fin（ｋ−１）とする）、及び区間始端に対応する基準時間窓（以下、区間始端窓ｍ_st（ｋ）とする）を検出する。 Here, returning to FIG. 3, if the result of determination in S480 is that the number of pole change counter is 2 or more, the process proceeds to S500. In S500, a reference time window corresponding to the section end (hereinafter referred to as section end window m _fin (k−1)) and a reference time window corresponding to the section start end (hereinafter referred to as section start end window m _st (k)) ) Is detected.

具体的に、本実施形態のＳ５００では、図７（Ａ）に示すように、分散値差分ｄｖｆ０（＃ｐ）が極大値となった分析時間窓＃ｐに対応する規定中心値を区間終端とし、分散値差分ｄｖｆ０（＃ｐ）が極小値となった分析時間窓＃ｐに対応する規定中心値を区間始端とする。このため、区間終端となった規定中心値から探索中心時間窓ｍ₀（ｋ）までの間の基準時間窓ｍの個数をｐｂとし、区間始端となった規定中心値から探索中心時間窓ｍ₀（ｋ）までの間の基準時間窓ｍの個数をｐｆとする。すると、区間終端窓ｍ_fin（ｋ−１）は、ｍ_fin（ｋ−１）＝ｍ₀（ｋ）−ｐｂにて表され、区間始端窓ｍ_st（ｋ）は、ｍ_st（ｋ）＝ｍ₀（ｋ）＋ｐｆにて表される。 Specifically, in S500 of this embodiment, as shown in FIG. 7A, the specified center value corresponding to the analysis time window #p at which the variance difference dvf0 (#p) takes its maximum value is taken as the section end, and the specified center value corresponding to the analysis time window #p at which the variance difference dvf0 (#p) takes its minimum value is taken as the section start. Let pb be the number of reference time windows m between the specified center value at the section end and the search center time window m ₀ (k), and let pf be the number of reference time windows m between the specified center value at the section start and the search center time window m ₀ (k). Then the section end window m _fin (k−1) is expressed as m _fin (k−1) = m ₀ (k) − pb, and the section start window m _st (k) is expressed as m _st (k) = m ₀ (k) + pf.

なお、極大値及び極小値を検出する手法としては、Ｓ４４０にて記憶された全ての分散値差分の中で、値が最大または最小となるものを、それぞれ、極大値及び極小値とすればよい。 As a technique for detecting the maximum value and the minimum value, the maximum or minimum value among all the variance value differences stored in S440 may be set as the maximum value and the minimum value, respectively. .
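The boundary detection described above can be sketched offline as follows. This is a simplification, not the incremental S410 to S490 search of the flowchart: it computes the variance for every window center within a fixed reach of m₀(k), differences adjacent windows, and takes the largest and smallest differences as the section end and section start. The names `find_transition_bounds` and `reach`, and the step of one frame between windows, are assumptions.

```python
def window_variance(f0v, center, M):
    """Population variance of f0v over the 2M+1 frames centered on `center`."""
    window = f0v[center - M:center + M + 1]
    mean = sum(window) / len(window)
    return sum((x - mean) ** 2 for x in window) / len(window)

def find_transition_bounds(f0v, m0, M, reach):
    """Return (m_fin, m_st): the window centers where the variance
    difference is largest (section end) and smallest (section start)."""
    lo = max(M, m0 - reach)
    hi = min(len(f0v) - M - 1, m0 + reach)
    var = {c: window_variance(f0v, c, M) for c in range(lo, hi + 1)}
    diff = {c: var[c] - var[c - 1] for c in range(lo + 1, hi + 1)}
    if not diff:
        raise ValueError("search range is too short")
    m_fin = max(diff, key=diff.get)  # steady -> transition: maximum
    m_st = min(diff, key=diff.get)   # transition -> steady: minimum
    return m_fin, m_st
```

For a pitch track that holds 100 Hz, ramps linearly, and then holds 200 Hz, the maximum falls near the start of the ramp and the minimum near its end.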

続いて、Ｓ５１０では、遷移元音に対する区間についての歌唱音高ｆ０（ｋ−１）、到達音に対する区間についての歌唱音高ｆ０（ｋ）を、それぞれ下記（５）式，（６）式に従って導出すると共に、それらの歌唱音高の音高差ｆ０ｄ（ｋ）を下記（７）式に従って導出する。 Subsequently, in S510, the singing pitch f0 (k-1) for the section with respect to the transition source sound and the singing pitch f0 (k) for the section with respect to the reaching sound are respectively expressed by the following formulas (5) and (6). At the same time, the pitch difference f0d (k) between these singing pitches is derived according to the following equation (7).

つまり、音高差ｆ０ｄ（ｋ）は、歌唱音高ｆ０（ｋ−１）と歌唱音高ｆ０（ｋ）との比を、半音で量子化したものであり、いわゆる音程として表されたものである。 That is, the pitch difference f0d (k) is obtained by quantizing the ratio of the singing pitch f0 (k-1) and the singing pitch f0 (k) with a semitone and expressed as a so-called pitch. is there.
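Equations (5) to (7) are not reproduced in this text. A sketch consistent with the description (a ratio of two singing pitches quantized in semitones) is given below. Averaging f0v over each section to obtain the singing pitch, the sign convention (positive for an upward step), and both function names are assumptions.

```python
import math

def mean_pitch(f0v, start, stop):
    """Assumed form of equations (5)/(6): mean of f0v over frames [start, stop)."""
    seg = f0v[start:stop]
    return sum(seg) / len(seg)

def semitone_difference(f0_prev, f0_next):
    """Assumed form of equation (7): pitch ratio expressed in semitones
    (12 per octave) and rounded to the nearest integer."""
    return round(12 * math.log2(f0_next / f0_prev))
```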

また、Ｓ５２０では、区間終端窓ｍ_fin（ｋ−１）、及び区間始端窓ｍ_st（ｋ）それぞれの該当カラオケ楽曲の演奏開始からの時間を、下記（８），（９）式に従って導出する。 In S520, the time from the start of the performance of the corresponding karaoke piece in the section end window m _fin (k-1) and the section start window m _st (k) is derived according to the following equations (8) and (9). .

これと共に、Ｓ５２０では、区間終端窓ｍ_fin（ｋ−１）から区間始端窓ｍ_st（ｋ）までの期間長を遷移期間ｔｒｔ（ｋ）として、下記（１０）式に従って導出する。 At the same time, in S520, the period length from the section end window m _fin (k−1) to the section start window m _st (k) is derived as the transition period trt (k) according to the following equation (10).

ただし、（１０）式にて導出される遷移期間ｔｒｔ（ｋ）は、基準時間窓ｍの個数によって表されている。 However, the transition period trt (k) derived by the equation (10) is represented by the number of reference time windows m.

続く、Ｓ５３０では、Ｓ５１０にて導出した音高差ｆ０ｄ（ｋ）と、Ｓ５２０にて導出した遷移期間ｔｒｔ（ｋ）とを記憶する。 Subsequently, in S530, the pitch difference f0d (k) derived in S510 and the transition period trt (k) derived in S520 are stored.

そして、Ｓ５４０では、下記（１１）式に従って、基本周波数遷移ｆ０ｖ（ｍ）において、先のＳ５２０にて導出した遷移期間ｔｒｔ（ｋ）に対応する区間を歌唱遷移区間ｆ０ｖｔ（ｍ）として抽出する。ただし、（１１）式中のｍｍは、１からｔｒｔ（ｋ）までの基準時間窓ｍである。 In S540, according to the following equation (11), in the fundamental frequency transition f0v (m), the section corresponding to the transition period trt (k) derived in the previous S520 is extracted as the singing transition section f0vt (m). However, mm in the formula (11) is a reference time window m from 1 to trt (k).
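Equations (10) and (11) are not reproduced in this text. The sketch below assumes that the transition period is the difference of the two window indices and that the singing transition section is the run of trt(k) frames following the section end window. Both the function names and the exact offsets are assumptions.

```python
def transition_period(m_fin, m_st):
    """Assumed form of equation (10): trt(k) counted in reference time windows."""
    return m_st - m_fin

def extract_transition(f0v, m_fin, m_st):
    """Assumed form of equation (11): f0vt(mm) = f0v(m_fin + mm)
    for mm = 1 .. trt(k)."""
    trt = transition_period(m_fin, m_st)
    return [f0v[m_fin + mm] for mm in range(1, trt + 1)]
```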

そして、その後、歌唱音高差特定処理のＳ１７０へと戻る。 Then, the process returns to S170 of the singing pitch difference identification process.

つまり、歌唱遷移区間抽出処理では、図７（Ｂ）に示すように、遷移元音に対する歌唱区間、到達音に対する歌唱区間、及びそれらの歌唱区間に挟まれた歌唱区間の中から、区間終端窓ｍ_fin（ｋ−１）と区間始端窓ｍ_st（ｋ）とを特定する。そして、特定した区間終端窓ｍ_fin（ｋ−１）から区間始端窓ｍ_st（ｋ）までの間の期間を遷移期間ｔｒｔ（ｋ）として導出する。これと共に、基本周波数遷移ｆ０ｖ（ｍ）中にて、遷移期間ｔｒｔ（ｋ）に対応する区間を歌唱遷移区間ｆ０ｖｔ（ｍ）として抽出している。 In short, as shown in FIG. 7B, the singing transition section extraction process identifies the section end window m _fin (k−1) and the section start window m _st (k) from among the singing section for the transition source sound, the singing section for the arrival sound, and the singing section sandwiched between them. The period from the identified section end window m _fin (k−1) to the section start window m _st (k) is then derived as the transition period trt (k). In addition, the section of the fundamental frequency transition f0v (m) corresponding to the transition period trt (k) is extracted as the singing transition section f0vt (m).

ここで、歌唱音高差特定処理（即ち、図２）のＳ１７０へと戻る。 Here, it returns to S170 of a song pitch difference specific process (namely, FIG. 2).

そのＳ１７０では、歌唱遷移区間抽出処理にて抽出した歌唱遷移区間ｆ０ｖｔ（ｍ）と、遷移元音から到達音への発声音高の理想的な遷移態様を表す音高遷移モデルとの類似度合いを、下記（１２）式から（１５）式に従って求める。以下、このＳ１７０で導出した類似度合いを、音高遷移スコアｔｓｃ（ｍ）とする。 In S170, the degree of similarity between the singing transition section f0vt (m) extracted by the singing transition section extraction process and a pitch transition model representing the ideal manner in which the vocal pitch transitions from the transition source sound to the arrival sound is obtained according to equations (12) to (15) below. Hereinafter, the degree of similarity derived in S170 is referred to as the pitch transition score tsc (m).

ここで、音高遷移スコアｔｓｃ（ｍ）を導出する具体的な手法について説明する。 Here, a specific method for deriving the pitch transition score tsc (m) will be described.

なお、本実施形態における音高遷移モデルは、遷移元音から到達音への音高の遷移が正確、かつ滑らか（スムーズ）に聞こえるように、下記（１２）式にて表された時間関数である。以下、下記（１２）式にて表される音高遷移モデルを、音高遷移モデル曲線ｆ０ｍｏｄｅｌ（ｍ）と称す。 Note that the pitch transition model in the present embodiment is a time function expressed by the following equation (12) so that the transition of the pitch from the transition source sound to the arrival sound can be heard accurately and smoothly. is there. Hereinafter, the pitch transition model represented by the following equation (12) is referred to as a pitch transition model curve f0model (m).

つまり、音高遷移モデル曲線ｆ０ｍｏｄｅｌ（ｍ）は、照合すべき歌唱遷移区間ｆ０ｖｔ（ｍ）それぞれの時間長に従って変動するものである。ただし、Ｍ₀は、遷移期間ｔｒｔ（ｋ）を表す。 That is, the pitch transition model curve f0model (m) varies according to the time length of each singing transition section f0vt (m) to be verified. However, M ₀ represents the transition period trt (k).

そして、図５（Ａ）に示すように、音高遷移モデル曲線ｆ０ｍｏｄｅｌ（ｍ）を、歌唱遷移区間ｆ０ｖｔ（ｍ）に照合することで、音高遷移スコアｔｓｃ（ｍ）を導出する。この音高遷移モデル曲線ｆ０ｍｏｄｅｌ（ｍ）と、歌唱遷移区間ｆ０ｖｔ（ｍ）との照合は、下記（１３）式により実行される。 Then, as shown in FIG. 5A, the pitch transition score tsc (m) is derived by matching the pitch transition model curve f0model (m) with the singing transition section f0vt (m). The pitch transition model curve f0model (m) and the singing transition section f0vt (m) are collated by the following equation (13).

ただし、（１３）式中のＭＶ_VOは、歌唱遷移区間ｆ０ｖｔ（ｍ）の平均値を示し、下記（１４）式に従って導出され、（１３）式中のＭＶ_MOは、音高遷移モデル曲線ｆ０ｍｏｄｅｌ（ｍ）の平均値を示し、下記（１５）式に従って導出される。 However, MV _VO in the equation (13) indicates an average value of the singing transition section f0vt (m) and is derived according to the following equation (14). The MV _MO in the equation (13) is a pitch transition model curve f0model. The average value of (m) is shown and derived according to the following equation (15).

つまり、音高遷移スコアｔｓｃ（ｋ）は、歌唱遷移区間ｆ０ｖｔ（ｍ）と、音高遷移モデル曲線ｆ０ｍｏｄｅｌ（ｍ）と類似度合いが高いほど（即ち、音高遷移が滑らかな（スムーズな）ほど）大きな値となる。 That is, the pitch transition score tsc (k) takes a larger value as the degree of similarity between the singing transition section f0vt (m) and the pitch transition model curve f0model (m) is higher (that is, as the pitch transition is smoother).
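Equations (12) to (15) are not reproduced in this text, so the sketch below is only one plausible reading. It assumes a raised-cosine glide as a hypothetical model curve whose length follows the transition period, and a mean-removed normalized correlation as the similarity, using the two averages that the text calls MV_VO and MV_MO. The shape of the model and the function names are illustrative assumptions.

```python
import math

def model_curve(n, start, end):
    """Hypothetical smooth model: raised-cosine glide from start to end
    over n frames (n must be at least 2)."""
    return [start + (end - start) * (0.5 - 0.5 * math.cos(math.pi * i / (n - 1)))
            for i in range(n)]

def transition_score(f0vt):
    """Mean-removed normalized correlation between the sung transition
    and the model curve; 1.0 means the shapes match exactly."""
    n = len(f0vt)
    if n < 2:
        return 0.0
    model = model_curve(n, f0vt[0], f0vt[-1])
    mv_vo = sum(f0vt) / n    # average of the singing transition section
    mv_mo = sum(model) / n   # average of the model curve
    num = sum((v - mv_vo) * (w - mv_mo) for v, w in zip(f0vt, model))
    den = math.sqrt(sum((v - mv_vo) ** 2 for v in f0vt)
                    * sum((w - mv_mo) ** 2 for w in model))
    return num / den if den > 0 else 0.0
```

A track that glides smoothly between the two pitches scores close to 1, while one that overshoots or wobbles scores lower.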

続くＳ１８０では、Ｓ１５０にて算出した全ての時間遅延量ｔｌ（ｋ）に対応する歌唱遷移区間ｆ０ｖｔ（ｍ）を抽出して、音高遷移モデル曲線ｆ０ｍｏｄｅｌ（ｍ）との照合を実施したか否かを判定する。その判定の結果、照合を実施済でなければ、Ｓ１６０へと戻り、全ての時間遅延量ｔｌ（ｋ）に対応する歌唱遷移区間ｆ０ｖｔ（ｍ）を抽出して、音高遷移モデル曲線ｆ０ｍｏｄｅｌ（ｍ）との照合を実施するまで、Ｓ１６０，Ｓ１７０を繰り返す。一方、判定の結果、実施済であれば、Ｓ１９０へと進む。 In subsequent S180, it is determined whether or not the singing transition sections f0vt (m) corresponding to all the time delay amounts tl (k) calculated in S150 have been extracted and collated with the pitch transition model curve f0model (m). If the collation has not been completed, the process returns to S160, and S160 and S170 are repeated until the singing transition sections f0vt (m) corresponding to all the time delay amounts tl (k) have been extracted and collated with the pitch transition model curve f0model (m). If the collation has been completed, the process proceeds to S190.

そのＳ１９０では、Ｓ１７０にて導出した音高遷移スコアｔｓｃ（ｋ）それぞれを、下記（１６）式に従って集計する。以下、（１６）式にて集計した結果を、集計遷移スコアｍｔｓｃ（ｔｒｔ，ｆ０ｄ）とする。 In S190, the pitch transition scores tsc (k) derived in S170 are totaled according to the following equation (16). Hereinafter, the result obtained by the aggregation by the equation (16) is defined as an aggregation transition score mtsc (trt, f0d).

なお、（１６）式中のｋ'は、音高差ｆ０ｄ（ｋ）、及び遷移期間ｔｒｔ（ｋ）が同一分類区分中に含まれる構成音（以下、同一構成音と称す）を表す識別番号（上述したｋ番目や、ｋ＋１番目に相当）の集合であり、Ｋ０は、同一構成音の総数である。 Note that k ′ in the equation (16) is an identification number that represents a constituent sound (hereinafter referred to as the same constituent sound) in which the pitch difference f0d (k) and the transition period trt (k) are included in the same classification category. This is a set (corresponding to the k-th and k + 1-th mentioned above), and K0 is the total number of identical constituent sounds.

つまり、この集計遷移スコアｍｔｓｃ（ｔｒｔ，ｆ０ｄ）は、Ｓ１７０にて導出された音高遷移スコアｔｓｃ（ｍ）を、音高差ｆ０ｄ（ｋ）、及び遷移期間ｔｒｔ（ｋ）が同一である分類区分（以下、音高差区分（ｔｒｔ，ｆ０ｄ）と称す。）毎に集計したものである。したがって、集計遷移スコアｍｔｓｃ（ｔｒｔ，ｆ０ｄ）をマップ化した集計遷移スコア分布は、図５（Ｂ）に示すように、音高差ｆ０ｄ（ｋ）、遷移期間ｔｒｔ（ｋ）、集計遷移スコアｍｔｓｃ（ｔｒｔ，ｆ０ｄ）をそれぞれの軸とした３次元のマトリクスとなる。ただし、図５（Ｂ）では、集計遷移スコア（ｔｒｔ，ｆ０ｄ）の大きさを色の濃淡を用いて示した。 In other words, the total transition score mtsc (trt, f0d) is classified into the pitch transition score tsc (m) derived in S170 with the same pitch difference f0d (k) and transition period trt (k). These are tabulated for each section (hereinafter referred to as pitch difference section (trt, f0d)). Therefore, the total transition score distribution obtained by mapping the total transition score mtsc (trt, f0d) is, as shown in FIG. 5B, pitch difference f0d (k), transition period trt (k), total transition score mtsc. This is a three-dimensional matrix with (trt, f0d) as the respective axes. However, in FIG. 5B, the size of the total transition score (trt, f0d) is shown using shades of color.
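Equation (16) is not reproduced in this text. Since K0 is described as the number of constituent sounds falling in the same classification section, the sketch below assumes the aggregate is the mean of the scores in each (trt, f0d) bin. The function name and the input layout (a list of `(trt, f0d, tsc)` tuples) are assumptions.

```python
from collections import defaultdict

def aggregate_scores(transitions):
    """Group pitch transition scores by (trt, f0d) and average each group.
    `transitions` is an iterable of (trt, f0d, tsc) tuples."""
    buckets = defaultdict(list)
    for trt, f0d, tsc in transitions:
        buckets[(trt, f0d)].append(tsc)
    # Assumed form of equation (16): sum over k' divided by K0.
    return {key: sum(scores) / len(scores) for key, scores in buckets.items()}
```

The resulting dictionary plays the role of the score map: its keys are the two classification axes and its values are the aggregated scores.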

なお、本実施形態では、Ｓ１９０へと進んだ回数が二回目以降である場合、そのサイクル（即ち、今回のＳ１９０）にて導出したｍｔｓｃ（ｔｒｔ，ｆ０ｄ）を、前サイクルにて（即ち、前回のＳ１９０以前に）導出されたｍｔｓｃ（ｔｒｔ，ｆ０ｄ）に積算する。つまり、最終的に生成された集計遷移スコア（ｔｒｔ、ｆ０ｄ）には、各音声データから導出される集計遷移スコアｍｔｓｃ（ｔｒｔ，ｆ０ｄ）が全て積算されている。 In this embodiment, when the process has proceeded to S190 for the second or a subsequent time, the mtsc (trt, f0d) derived in that cycle (that is, in the current S190) is added to the mtsc (trt, f0d) derived in the previous cycles (that is, in the previous S190 and earlier). In other words, the finally generated total transition score mtsc (trt, f0d) is the sum of all the total transition scores mtsc (trt, f0d) derived from the individual pieces of voice data.

さらに、Ｓ２００にて、記憶部２１の音声データ格納領域に格納されている全ての音声データに対して、Ｓ１１０からＳ１９０までの処理（ここでは、規定処理と称す）を実行したか否かを判定する。 Further, in S200, it is determined whether or not the processing from S110 to S190 (here referred to as the prescribed processing) has been executed for all the voice data stored in the voice data storage area of the storage unit 21.

その判定の結果、全ての音声データに対して規定処理を実行していなければ、Ｓ１１０へと戻り、そのＳ１１０にて、記憶部２１の音声データ格納領域に格納されている全音声データの中から、規定処理を未実行である音声データを取得して、Ｓ１２０へと進む。一方、Ｓ２００での判定の結果、全ての音声データに対して規定処理を実行済であれば、Ｓ２１０へと進む。 If the prescribed processing has not been executed for all the voice data, the process returns to S110, where a piece of voice data for which the prescribed processing has not yet been executed is acquired from all the voice data stored in the voice data storage area of the storage unit 21, and then proceeds to S120. If the determination in S200 shows that the prescribed processing has been executed for all the voice data, the process proceeds to S210.

Then, in S210, whether or not the total transition score mtsc(trt, f0d) in each pitch difference section (trt, f0d) is equal to or greater than a specified value is determined by equation (17).

That is, if the total transition score mtsc(trt, f0d) is equal to or greater than the specified value, the pitch difference section (trt, f0d) is determined to be a singable pitch difference section ITM₁(trt, f0d). On the other hand, if the total transition score mtsc(trt, f0d) is less than the specified value, the pitch difference section (trt, f0d) is determined to be an unsingable pitch difference section ITM₀(trt, f0d).

Note that the specified value in the present embodiment is a predefined ratio A (for example, one half or one third) of the maximum value Max_tsc of the total transition score mtsc(trt, f0d) (specified value = Max_tsc × A).
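
The determination in S210 can be illustrated with the following minimal Python sketch. Equation (17) itself is not reproduced in this text, so the sketch follows only the verbal description (specified value = Max_tsc × A); the function name `classify_sections` and the dictionary representation are assumptions made for the example.

```python
def classify_sections(mtsc, ratio):
    """Return a dict mapping each pitch difference section to 1 or 0.

    1 stands for a singable section (ITM1), 0 for an unsingable one (ITM0).
    ratio is the predefined proportion A of the maximum total score.
    """
    if not mtsc:
        return {}
    threshold = max(mtsc.values()) * ratio  # specified value = Max_tsc * A
    return {section: 1 if score >= threshold else 0
            for section, score in mtsc.items()}

# Hypothetical totals keyed by (trt, f0d); the values are illustrative only.
mtsc = {(1, 2): 8.0, (1, 4): 5.0, (2, 7): 3.0, (3, 12): 0.5}
itm = classify_sections(mtsc, ratio=0.5)
```

With A = 1/2 the specified value in this example is 4.0, so the first two sections are marked singable and the last two unsingable.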

Subsequently, in S220, processing that uses the singable pitch difference sections ITM₁(trt, f0d) specified in S210 (hereinafter referred to as specific pitch difference use processing) is executed.

Specifically, in the present embodiment, a result matrix ITM(trt, f0d) is generated on the basis of the singable pitch difference sections ITM₁(trt, f0d) and the unsingable pitch difference sections ITM₀(trt, f0d) specified in S210, and the generated result matrix ITM(trt, f0d) is displayed on the display unit 23. At the same time, the result matrix ITM(trt, f0d) is stored in the storage unit 21 and the server 30 in association with the corresponding user information.

As shown in FIG. 8, this result matrix ITM(trt, f0d) is formed by arranging the singable pitch difference sections ITM₁(trt, f0d) and the unsingable pitch difference sections ITM₀(trt, f0d) along the order of the pitch difference sections (trt, f0d).

Furthermore, in the specific pitch difference use processing of the present embodiment, music pieces in which the pitch differences of the specific constituent sounds fall mainly within the range of the singable pitch difference sections ITM₁(trt, f0d) are proposed to the user as recommended songs. In addition, the specific pitch difference use processing gives the scoring result of the scoring process a weight that becomes larger as the maximum pitch difference f0d of the singable pitch difference sections ITM₁(trt, f0d) becomes larger.
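
One possible reading of the specific pitch difference use processing is sketched below in Python. The text gives neither a numerical meaning for "mainly" nor a formula for the weight, so the parameter `min_fraction`, the linear weight with `gain`, and all function names are assumptions introduced purely for illustration.

```python
def singable_fraction(song_transitions, itm):
    """Fraction of a song's transitions that fall in singable sections."""
    if not song_transitions:
        return 0.0
    hits = sum(1 for section in song_transitions if itm.get(section, 0) == 1)
    return hits / len(song_transitions)

def recommend(songs, itm, min_fraction=0.8):
    """Return titles whose transitions are mostly singable.

    min_fraction is an assumed reading of "mainly"; the source gives no value.
    """
    return [title for title, transitions in songs.items()
            if singable_fraction(transitions, itm) >= min_fraction]

def scoring_weight(itm, gain=0.01):
    """Weight that grows with the largest singable pitch difference f0d.

    The linear form and the gain are assumptions; the source only states
    that the weight increases with the maximum singable pitch difference.
    """
    singable = [f0d for (trt, f0d), flag in itm.items() if flag == 1]
    return 1.0 + gain * max(singable, default=0)

# Hypothetical result matrix and song catalogue keyed by (trt, f0d).
itm = {(1, 2): 1, (1, 4): 1, (2, 7): 0, (3, 12): 0}
songs = {
    "song_a": [(1, 2), (1, 4), (1, 2), (1, 2), (1, 4)],
    "song_b": [(1, 2), (2, 7), (3, 12), (2, 7)],
}
recommended = recommend(songs, itm)
weight = scoring_weight(itm)
```

In this example only "song_a" is proposed, because every one of its transitions lies in a singable section, whereas only one of the four transitions of "song_b" does.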

Thereafter, the singing pitch difference specifying process is terminated.

In short, in the singing pitch difference specifying process of the present embodiment, each singing transition section f0vt(m) extracted by the singing transition section extraction process is compared with the corresponding pitch transition model curve f0model(m) to derive a pitch transition score tsc(k). It is then determined whether or not the total transition score mtsc(trt, f0d), obtained by totaling the derived pitch transition scores tsc(k) for each pitch difference section (trt, f0d), is equal to or greater than the specified value. According to the result of that determination, the singable pitch difference section ITM₁(trt, f0d), that is, the pitch difference section that the user can sing, is specified.

[Effect of the embodiment]
As described above, according to the karaoke apparatus 20 of the present embodiment, it is possible to specify a pitch difference that the user can sing, that is, an interval that the user can sing.

In particular, according to the karaoke apparatus 20 of the present embodiment, the total transition score mtsc(trt, f0d) is totaled for each pitch difference f0d(k) and for each transition period trt(k), so that the pitch difference f0d that the user can sing can be specified for each time length of the transition period trt.

That is, according to the karaoke apparatus 20, the user's singing ability can be specified in detail.

Note that when the user strains to produce a voice, it is difficult to sing the pitch change in the transition period trt smoothly. In such a case, however, the karaoke apparatus 20 of the present embodiment derives a small value for the pitch transition score tsc(k).

Therefore, according to the karaoke apparatus 20 of the present embodiment, it is possible to specify the pitch difference f0d that the user can sing without strain.

Furthermore, the karaoke apparatus 20 of the present embodiment proposes to the user, as recommended songs, music pieces in which the pitch differences of the specific constituent sounds fall mainly within the range of the singable pitch difference sections ITM₁(trt, f0d), and gives the scoring result of the scoring process a weight that becomes larger as the maximum pitch difference f0d of the singable pitch difference sections ITM₁(trt, f0d) becomes larger.

As a result, according to the karaoke apparatus 20 of the present embodiment, having the user sing the recommended songs makes karaoke more enjoyable, and the user's singing skill can be scored more accurately.

[Other Embodiments]
The embodiment of the present invention has been described above; however, the present invention is not limited to the above embodiment and can be implemented in various forms without departing from the gist of the present invention.

For example, the karaoke system 1 of the above embodiment has one karaoke apparatus 20 connected to one server 30, but the karaoke system 1 is not limited to this configuration: a plurality of karaoke apparatuses 20 may be connected to one server 30, or a plurality of karaoke apparatuses 20 may be connected to a plurality of servers 30.

In the singing transition section extraction process of the above embodiment, the reference time windows m at which the variance value difference dvf0(#p) between consecutive analysis time windows takes a local maximum or a local minimum are derived as the section end m_fin(k−1) and the section start m_st(k); however, the method of deriving the section end m_fin(k−1) and the section start m_st(k) is not limited to this. For example, the sections in which the transition source sound and the arrival sound are each sung stably may be identified, and the reference time windows m corresponding to the end and the start of those sections may be used as the section end m_fin(k−1) and the section start m_st(k), respectively.
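
The variance-difference approach of the above embodiment can be illustrated with the following minimal Python sketch. It is not the embodiment's procedure: the hop of one frame between windows, the sign convention of the difference (later window minus earlier window), and the function name `variance_difference_extrema` are assumptions. Which extremum corresponds to the section end m_fin(k−1) and which to the section start m_st(k) depends on the sign convention, which this section does not state, so the sketch only reports both window indices.

```python
from statistics import pvariance

def variance_difference_extrema(f0, window):
    """Locate the windows where the variance difference peaks and dips.

    f0     -- pitch sequence (e.g. in cents), one value per frame
    window -- number of frames per analysis time window (assumed hop of 1)
    Returns (index_of_max_difference, index_of_min_difference), where the
    difference is taken as variance[p + 1] - variance[p] (assumed sign).
    """
    variances = [pvariance(f0[p:p + window])
                 for p in range(len(f0) - window + 1)]
    diffs = [b - a for a, b in zip(variances, variances[1:])]
    p_max = max(range(len(diffs)), key=diffs.__getitem__)
    p_min = min(range(len(diffs)), key=diffs.__getitem__)
    return p_max, p_min

# A held note, a glide over four frames, then another held note
# (illustrative data only).
f0 = [0.0] * 6 + [50.0, 100.0, 150.0, 200.0] + [200.0] * 6
p_max, p_min = variance_difference_extrema(f0, window=4)
```

On this synthetic contour the variance is zero while a window lies entirely inside a held note, rises as the window slides into the glide, and falls as it leaves, so the largest increase precedes the largest decrease.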

In the pitch difference specifying process of the above embodiment, the pitch transition scores tsc(k) are totaled according to the pitch difference f0d(k) and the transition period trt(k); however, the method of totaling the pitch transition scores tsc(k) is not limited to this, and the pitch transition scores tsc(k) may be totaled according to the pitch difference f0d(k) alone.

[Correspondence between the Embodiment and the Present Invention]
Next, the correspondence between the above embodiment and the claims will be described.

The function obtained by executing S110 in the pitch difference specifying process of the above embodiment corresponds to the audio signal acquisition means of the present invention, and the function obtained by executing S120 and S130 corresponds to the singing data generation means. Further, the function obtained by executing S160 of the pitch difference specifying process corresponds to the transition section specifying means, and the function obtained by executing S170 corresponds to the transition value deriving means. The function obtained by executing S190 and S210 in the pitch difference specifying process corresponds to the pitch difference specifying means.

Of the functions obtained by executing S160 of the pitch difference specifying process, the function obtained by executing S410 and S420 in the singing transition section extraction process corresponds to the variance value deriving means of the present invention, and the function obtained by executing S430, S450 to S480, and S500 corresponds to the start/end specifying means of the present invention.

DESCRIPTION OF SYMBOLS: 1 ... karaoke system; 20 ... karaoke apparatus; 21 ... storage unit; 22 ... communication unit; 23 ... display unit; 24 ... operation reception unit; 25 ... audio input/output unit; 26 ... microphone; 27 ... speaker; 28 ... control unit; 30 ... server

Claims

1. A singing pitch difference specifying device used in a karaoke apparatus that plays a music piece according to music data representing the pitch and note value of each of the constituent sounds constituting the music piece, the device comprising:
audio signal acquisition means for acquiring, as an audio signal, voice input during the performance of the music piece in the karaoke apparatus, in association with identification information identifying the user who uttered the voice and the music piece being played;
singing data generation means for generating singing data representing a transition of pitch by frequency-analyzing the audio signal acquired by the audio signal acquisition means;
transition section specifying means for, where, of two constituent sounds that are consecutive such that the pitch switches in the music piece specified from the identification information, the constituent sound reached after the pitch switches is an arrival sound and the constituent sound before the pitch switches is a transition source sound, specifying as a singing transition section the section of the singing data generated by the singing data generation means from a section end, which is the end of the singing section for the transition source sound, to a section start, which is the start of the singing section for the arrival sound;
transition value deriving means for deriving, for each singing transition section specified by the transition section specifying means, a pitch transition value that becomes larger as the degree of coincidence between the singing transition section and a pitch transition model is higher, the pitch transition model being the mode of transition when the uttered pitch transitions ideally from the pitch corresponding to the transition source sound to the pitch corresponding to the arrival sound; and
singing pitch difference specifying means for totaling the pitch transition values derived by the transition value deriving means for each pitch difference in the singing transition sections, and specifying a pitch difference whose total transition value, which is the result of the totaling, is equal to or greater than a predetermined specified value as a pitch difference that can be sung by the user specified from the identification information.

2. The singing pitch difference specifying device according to claim 1, wherein the singing pitch difference specifying means classifies and totals the pitch transition values for each section length of the singing transition sections.

3. The singing pitch difference specifying device according to claim 1 or 2, wherein the singing pitch difference specifying means normalizes the total transition value and uses, as the specified value, a ratio defined in advance with respect to the maximum of the normalized total transition value.

4. The singing pitch difference specifying device according to any one of claims 1 to 3, wherein the transition section specifying means includes:
variance value deriving means for setting, in a switching section of the singing data consisting of the singing section for the transition source sound, the singing section for the arrival sound, and the singing section sandwiched between those two singing sections, a plurality of time windows each having a preset section length so as to be continuous along the progress of time, and deriving, for each of the set time windows, a variance value that is the variance of the singing data within that time window; and
start/end specifying means for deriving the differences between the variance values, derived by the variance value deriving means, for consecutive time windows, and specifying the centers of the two time windows at which the difference is a minimum and a maximum as the section end and the section start, respectively.

5. A program to be executed by a computer used in a karaoke apparatus that plays a music piece according to music data representing the pitch and note value of each of the constituent sounds constituting the music piece, the program causing the computer to execute:
an audio signal acquisition procedure for acquiring, as an audio signal, voice input during the performance of the music piece in the karaoke apparatus, in association with identification information identifying the user who uttered the voice and the music piece being played;
a singing data generation procedure for generating singing data representing a transition of pitch by frequency-analyzing the audio signal acquired in the audio signal acquisition procedure;
a transition section specifying procedure for, where, of two constituent sounds that are consecutive such that the pitch switches in the music piece specified from the identification information, the constituent sound reached after the pitch switches is an arrival sound and the constituent sound before the pitch switches is a transition source sound, specifying as a singing transition section the section of the singing data generated in the singing data generation procedure from a section end, which is the end of the singing section for the transition source sound, to a section start, which is the start of the singing section for the arrival sound;
a transition value derivation procedure for deriving, for each singing transition section specified in the transition section specifying procedure, a pitch transition value that becomes larger as the degree of coincidence between the singing transition section and a pitch transition model is higher, the pitch transition model being the mode of transition when the uttered pitch transitions ideally from the pitch corresponding to the transition source sound to the pitch corresponding to the arrival sound; and
a singing pitch difference specifying procedure for totaling the pitch transition values derived in the transition value derivation procedure for each pitch difference in the singing transition sections, and specifying a pitch difference whose total transition value, which is the result of the totaling, is equal to or greater than a predetermined specified value as a pitch difference that can be sung by the user specified from the identification information.