JP2015114361A

JP2015114361A - Acoustic signal analysis device and acoustic signal analysis program

Info

Publication number: JP2015114361A
Application number: JP2013253993A
Authority: JP
Inventors: 陽前澤; Akira Maezawa
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2013-12-09
Filing date: 2013-12-09
Publication date: 2015-06-22
Anticipated expiration: 2033-12-09
Also published as: JP6252147B2

Abstract

PROBLEM TO BE SOLVED: To provide an acoustic signal analysis device and an acoustic signal analysis program with improved accuracy in estimating beat points, chord progression, and bar line positions. SOLUTION: An acoustic signal of a musical piece is acquired, and a first beat point candidate series OR1 and a second beat point candidate series OR2, offset by half a beat from the first beat point candidate series OR1, are detected. Next, a first beat-synchronized chord feature amount series CR1 and a second beat-synchronized chord feature amount series CR2 are calculated, representing the chord features between adjacent beat point candidates of the first and second beat point candidate series, respectively. From among probability models representing the chord progression of the piece, one model for which the series CR1 satisfies a prescribed criterion and one model for which the series CR2 satisfies a prescribed criterion are selected, and the beat points, chord progression, and bar line positions of the piece are estimated on the basis of whichever of the two selected probability models has the greater likelihood.

Description

The present invention relates to an acoustic signal analysis device and an acoustic signal analysis program that analyze an acoustic signal representing a musical piece and detect the beat points (beat timings) in the piece and the chords sounded in each section of the piece.

Acoustic signal analysis devices that detect the beat points in a musical piece and the chords sounded in each section of the piece are known, for example as described in Non-Patent Document 1 below. Such a device first analyzes the acoustic signal to determine the beat points of the piece. It then detects the chord of each section and the bar line positions under the assumption that chord changes occur at the detected beat points and that chord changes occur at the beginning of bars.

Non-Patent Document 1: M. Goto et al., "Songle: A Web Service for Active Music Listening Improved by User Contributions", ISMIR, 2011, pp. 311-316

Non-Patent Document 1 states that plausible beat points are selected in consideration of the possibility of mistakenly selecting back beats as beat points (front beats) and the possibility of selecting beat points that give a tempo double the true tempo, but it does not specifically disclose the means of selection. Moreover, when back beats are selected as beat points (front beats), or when beat points giving a tempo double the true tempo are selected, the accuracy of chord detection and of bar line position detection decreases.

The present invention has been made to address the above problems, and its object is to provide an acoustic signal analysis device and an acoustic signal analysis program with improved accuracy in estimating beat points, chord progression, and bar line positions. In the following description of the constituent elements of the present invention, the reference signs of the corresponding parts of the embodiment are given in parentheses to facilitate understanding, but the constituent elements of the present invention should not be construed as limited to the configurations of the corresponding parts indicated by those reference signs.

To achieve the above object, the present invention is characterized by an acoustic signal analysis device comprising: acoustic signal acquisition means (S11) for acquiring an acoustic signal representing the performance sound of a musical piece to be analyzed; beat point candidate series detection means (S15) for detecting, on the basis of the acquired acoustic signal, a first beat point candidate series (OR1) consisting of a plurality of beat point candidates of the piece, and a second beat point candidate series (OR2) consisting of a plurality of beat point candidates each shifted by half a beat relative to the beat point candidates constituting the first beat point candidate series; beat-synchronized chord feature amount series calculation means (S16) for calculating, on the basis of the acquired acoustic signal, a first beat-synchronized chord feature amount series (CR1) composed of beat-synchronized chord feature amounts each representing the chord features between two adjacent beat point candidates among those constituting the first beat point series, and a second beat-synchronized chord feature amount series (CR2) composed of beat-synchronized chord feature amounts each representing the chord features between two adjacent beat point candidates among those constituting the second beat point series; and estimation means (S17 to S21) for selecting, from among probability models that represent the chord progression of the piece and in which the chord transition probabilities between beat points are set according to the combination of the number of beats per bar, the key of the piece, and the metrical position of the first beat point, one probability model for which the first beat-synchronized chord feature amount series satisfies a predetermined criterion and one probability model for which the second beat-synchronized chord feature amount series satisfies a predetermined criterion, and for estimating the beat points, chord progression, and bar line positions of the piece on the basis of whichever of the two selected probability models has the greater likelihood.

In this case, the beat-synchronized chord feature amount calculation means preferably comprises: chord feature amount calculation means for calculating a chord feature amount (XC) representing the chord features for each of a plurality of sections (t_i) located between the two adjacent beat point candidates; and chord feature amount smoothing means for calculating the beat-synchronized chord feature amount by smoothing the chord feature amounts of the plurality of sections located between the two adjacent beat point candidates.

Also in this case, the beat point candidate series detection means preferably comprises: beat and tempo feature amount calculation means for calculating a first feature amount representing features relating to the presence of a beat in each section of the piece and a second feature amount representing features relating to the tempo; and beat point and tempo estimation means for simultaneously estimating the beat points and the tempo transitions in the piece by selecting, from among a plurality of probability models described as series of states classified by combinations of a physical quantity relating to the presence of a beat and a physical quantity relating to the tempo in each section of the piece, a probability model for which the series of observation likelihoods, each representing the probability that the first feature amount and the second feature amount are observed simultaneously in a section of the piece, satisfies a predetermined criterion.

In general, chord changes are most likely to occur on front beats. If back beats are mistakenly selected as front beats, the chord is therefore likely to change between one beat point and the next, and the beat-synchronized chord feature amounts cannot accurately represent the chord features. In other words, when back beats are selected as front beats, the likelihood of the beat-synchronized chord feature amount series is lower than when the true beat points (the front beats) are selected. The acoustic signal analysis device according to the present invention therefore detects a first beat point candidate series and a second beat point candidate series, the latter consisting of beat point candidates shifted by half a beat relative to the beat point candidates of the first series, and calculates the likelihood of the beat-synchronized chord feature amount series for each of the two candidate series. The two likelihoods are then compared and the beat point candidate series with the higher likelihood is selected. This suppresses the erroneous selection of back beats as front beats. In addition, the combination of metrical position and key that maximizes the likelihood of the beat-synchronized chord feature amount series is selected, so that the beat points, chord progression, and bar line positions of the piece are estimated simultaneously (jointly). The acoustic signal analysis device according to the present invention can therefore estimate beat points, chord progression, and bar line positions more accurately than the conventional art.

The present invention can also be implemented as a computer program applied to a computer provided in an acoustic signal analysis device.

A block diagram showing the configuration of an acoustic signal analysis device according to an embodiment of the present invention.
A flowchart showing the beat point and chord estimation process.
A graph showing the waveform of the acoustic signal to be analyzed.
A conceptual diagram of the probability model used to calculate beat point candidates.
A block diagram of the comb filter.
A graph showing the calculation result of the BPM feature amount.
A table showing the structure of the templates.
A conceptual diagram of the chord feature amount.
A conceptual diagram of the beat-synchronized chord feature amount.

An acoustic signal analysis device 10 according to an embodiment of the present invention will now be described. As explained below, the acoustic signal analysis device 10 acquires an acoustic signal representing a musical piece and detects the beat points and tempo transitions in that piece. As shown in FIG. 1, the acoustic signal analysis device 10 includes an input operator 11, a computer unit 12, a display 13, a storage device 14, an external interface circuit 15, and a sound system 16, which are connected via a bus BS.

The input operator 11 consists of switches for on/off operations (for example, a numeric keypad for entering values), volume controls or rotary encoders for rotary operations, volume controls or linear encoders for sliding operations, a mouse, a touch panel, and the like. These operators are operated by the performer's hand and are used to select the piece to be analyzed, to start or stop the analysis of the acoustic signal, to play or stop the piece (output from, or stopping of, the sound system 16 described later), and to set various parameters for the analysis of the acoustic signal. When the input operator 11 is operated, operation information representing the operation is supplied via the bus BS to the computer unit 12 described later.

The computer unit 12 consists of a CPU 12a, a ROM 12b, and a RAM 12c, each connected to the bus BS. The CPU 12a reads the acoustic signal analysis program and its subroutines, described in detail later, from the ROM 12b and executes them. In addition to the acoustic signal analysis program and its subroutines, the ROM 12b stores various data such as initial setting parameters and the graphic data and character data used to generate display data representing the images shown on the display 13. The RAM 12c temporarily stores data needed while the acoustic signal analysis program is running.

The display 13 is a liquid crystal display (LCD). The computer unit 12 generates display data representing the content to be displayed, using the graphic data, character data, and so on, and supplies it to the display 13. The display 13 shows an image based on the display data supplied from the computer unit 12. For example, a list of piece titles is displayed when the piece to be analyzed is being selected, and a graph showing the beat points and bar lines and a series of chord names representing the chord progression are displayed when the analysis is complete.

The storage device 14 consists of large-capacity nonvolatile recording media such as an HDD, FDD, CD-ROM, MO, and DVD, together with the drive units corresponding to each medium. The storage device 14 stores a plurality of music data items, each representing a musical piece. Each music data item consists of a plurality of sample values obtained by sampling the piece at a predetermined sampling period (for example, 1/44100 seconds), and the sample values are recorded in order at consecutive addresses in the storage device 14. The music data also includes title information representing the title of the piece, data size information representing the size of the music data, and the like. The music data may be stored in the storage device 14 in advance or may be acquired from an external device via the external interface circuit 15 described later. The music data stored in the storage device 14 is read by the CPU 12a, and the beat points and tempo transitions in the piece are analyzed.

The external interface circuit 15 has connection terminals that allow the acoustic signal analysis device 10 to be connected to external devices such as electronic music apparatuses and personal computers. The acoustic signal analysis device 10 can also be connected via the external interface circuit 15 to a communication network such as a LAN (Local Area Network) or the Internet.

The sound system 16 includes a D/A converter that converts the music data into an analog sound signal, an amplifier that amplifies the converted analog sound signal, and a pair of left and right speakers that convert the amplified analog sound signal into sound and output it. When the user uses the input operator 11 to instruct playback of the piece to be analyzed, the CPU 12a supplies the music data of that piece to the sound system 16. The user can thereby listen to the piece to be analyzed.

Next, the operation of the acoustic signal analysis device 10 will be described in detail. When the user turns on a power switch (not shown) of the acoustic signal analysis device 10, the CPU 12a reads the beat point and chord estimation program shown in FIG. 2 from the ROM 12b and executes it. In FIG. 2, decision steps are drawn as hexagons.

The CPU 12a starts the beat point and chord estimation process in step S10. In step S11, it reads the title information included in each of the music data items stored in the storage device 14 and displays the titles of the pieces as a list on the display 13. The user uses the input operator 11 to select the music data to be analyzed from among the pieces shown on the display 13. The device may be configured so that, when music data is being selected in step S11, part or all of the piece represented by the music data under consideration can be played back to check its content.

Next, in step S12, the CPU 12a performs initialization for the acoustic signal analysis. Specifically, it reserves a storage area in the RAM 12c according to the data size information of the selected music data and reads the selected music data into that area. It also reserves areas in the RAM 12c for temporarily storing the onset feature amount XO, the BPM feature amount XB, and other values described later. The user then enters the time signature of the selected piece (or the number of beats per bar) using the input operator 11. In other words, this embodiment assumes that the time signature of the selected piece (or the number of beats per bar) is known.

In step S13, as shown in FIG. 3, the CPU 12a divides the selected piece at predetermined time intervals into a plurality of frames t_i {i = 0, 1, ..., I}. All frames have the same length. For simplicity, the frame length is 125 ms in this embodiment. Since the sampling period of each piece is 1/44100 seconds, as noted above, each frame consists of approximately 5000 sample values.
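The frame division of step S13 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_into_frames` and the choice to drop a trailing partial frame are assumptions. At a 1/44100 s sampling period and a 125 ms frame length, the exact frame size with integer truncation is 5512 samples, which the description rounds to approximately 5000.

```python
SAMPLE_RATE_HZ = 44100   # sampling period of 1/44100 s, as in the description
FRAME_LENGTH_S = 0.125   # 125 ms frames, as in the description


def split_into_frames(samples, sample_rate_hz=SAMPLE_RATE_HZ,
                      frame_length_s=FRAME_LENGTH_S):
    """Split a sample list into consecutive equal-length frames t_0, t_1, ...

    Assumption: a trailing partial frame is discarded.
    """
    frame_size = int(sample_rate_hz * frame_length_s)  # 5512 at the defaults
    n_frames = len(samples) // frame_size
    return [samples[i * frame_size:(i + 1) * frame_size]
            for i in range(n_frames)]
```

For one second of audio, `split_into_frames` returns eight frames of 5512 samples each.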

Next, the CPU 12a calculates a beat point candidate series OR1 and a beat point candidate series OR2, each consisting of a plurality of beat point candidates. The calculation procedure is outlined here. First, the beat point candidate series OR1 is calculated as follows. An onset feature amount XO, representing features relating to the presence of a beat, and a BPM (beats per minute) feature amount XB, representing features relating to the tempo, are calculated for each frame t_i. Then, from among probability models (hidden Markov models) described as series of states q_{b,n}, classified according to the combination of the value of the beat period b in each frame t_i (a value proportional to the reciprocal of the tempo) and the value of the number of frames n until the next beat, the model is selected for which the series of observation likelihoods, each representing the probability that the onset feature amount XO and the BPM feature amount XB are observed simultaneously as observations, is most plausible (see FIG. 4). The beat point candidate series OR1 of the piece to be analyzed is thereby detected. The detected beat point candidate series OR1 is then used to calculate the beat point candidate series OR2, which consists of a plurality of beat point candidates shifted by half a beat relative to the beat points constituting OR1 (see FIG. 9). The beat period b is expressed as a number of frames. The value of the beat period b is therefore an integer satisfying 1 ≦ b ≦ b_max, and in a state where the value of the beat period b is β, the value of the number of frames n is an integer satisfying 0 ≦ n < β.
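The state space described above can be enumerated directly from the two constraints 1 ≦ b ≦ b_max and 0 ≦ n < b. The following sketch is illustrative only; the function name `enumerate_states` and the tuple representation `(b, n)` of a state q_{b,n} are assumptions.

```python
def enumerate_states(b_max):
    """List every state q_{b,n} as a tuple (b, n).

    b is the beat period in frames (1 <= b <= b_max) and n is the number
    of frames remaining until the next beat (0 <= n < b).
    """
    return [(b, n) for b in range(1, b_max + 1) for n in range(b)]
```

The number of states grows as b_max(b_max + 1)/2, so the longest beat period considered directly determines the cost of the later maximum-likelihood search.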

Next, the calculation procedure for the beat point candidate series OR1 and the beat point candidate series OR2 will be described in detail. First, in step S14, the CPU 12a calculates the onset feature amount XO and the BPM feature amount XB for each frame.

The onset feature amount XO(t_i) of frame t_i is calculated as follows. The CPU 12a first performs a short-time Fourier transform on each frame and calculates the signal strength of each frequency bin. Next, the CPU 12a uses a mel filter bank to calculate the signal strength M(fb_x, t_i) of each frequency band fb_x (for example, x = 1, 2, ..., 20). The CPU 12a then calculates the increase R(fb_x, t_i) in the signal strength of each frequency band between frames. As expressed by equation (1), the onset feature amount XO(t_i) is the sum over the frequency bands of these increases, that is, XO(t_i) = Σ_x R(fb_x, t_i).
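The summation over frequency bands can be sketched as below, taking the mel-band strengths M(fb_x, t_i) as given. The description does not define R precisely, so this sketch assumes that R is the half-wave rectified frame-to-frame difference (decreases count as zero) and that the first frame, which has no predecessor, gets an onset feature amount of zero. The function name `onset_features` is hypothetical.

```python
def onset_features(mel_strength):
    """Compute XO(t_i) for every frame.

    mel_strength[i][x] holds M(fb_x, t_i). Assumptions: R(fb_x, t_i) is
    max(M(fb_x, t_i) - M(fb_x, t_{i-1}), 0), and XO(t_0) is 0.
    """
    xo = []
    for i, frame in enumerate(mel_strength):
        if i == 0:
            xo.append(0.0)  # no previous frame to compare against
            continue
        previous = mel_strength[i - 1]
        increases = [max(cur - prev, 0.0)
                     for cur, prev in zip(frame, previous)]
        xo.append(sum(increases))  # sum over frequency bands
    return xo
```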

The BPM feature amount XB(t_i) of frame t_i is calculated as follows. The CPU 12a first inputs the onset feature amounts XO(t_0), XO(t_1), ... in this order into a filter bank FBB (see FIG. 5). The filter bank FBB consists of a plurality of comb filters CF_b, one for each value of the beat period b. Each comb filter CF_b outputs one data item every time one data item is input. The comb filter CF_b has a FIFO (first in, first out) memory that stores as many past output data items as the value of the beat period b; it adds the input data item and the oldest data item stored in that memory at a predetermined ratio (for example, 1:1, that is, α = 0.5) and outputs the result. The series XD_b(t) {= XD_b(t_0), XD_b(t_1), ...} of data XD_b obtained by inputting the series XO(t) {= XO(t_0), XO(t_1), ...} of onset feature amounts into the filter bank FBB is reversed in time and input into the filter bank FBB again, which yields the series XB_b(t) {= XB_b(t_0), XB_b(t_1), ...} of BPM feature amounts for the beat period b. The BPM feature amount XB(t_i) of frame t_i is expressed as the set of BPM feature amounts XB_b(t_i) {b = 1, 2, ...} calculated for each beat period b (see FIG. 6).
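A feedback comb filter with a FIFO of past outputs, applied once forward and once on the time-reversed result, can be sketched as follows. Several details are assumptions rather than statements from the description: the FIFO starts filled with zeros, the input is weighted by α and the oldest output by 1 − α, and the second-pass output is reversed again so that index i lines up with frame t_i. The names `comb_filter` and `bpm_features` are hypothetical.

```python
from collections import deque


def comb_filter(xs, b, alpha=0.5):
    """Comb filter CF_b: mix each input with the output from b frames ago."""
    fifo = deque([0.0] * b, maxlen=b)  # past outputs, oldest at index 0
    out = []
    for x in xs:
        y = alpha * x + (1.0 - alpha) * fifo[0]
        fifo.append(y)  # maxlen=b discards the oldest stored output
        out.append(y)
    return out


def bpm_features(xo, b_max, alpha=0.5):
    """Return {b: XB_b series} using a forward pass and a reversed pass."""
    features = {}
    for b in range(1, b_max + 1):
        xd = comb_filter(xo, b, alpha)                # forward pass -> XD_b
        xb_reversed = comb_filter(xd[::-1], b, alpha)  # pass on reversed XD_b
        features[b] = xb_reversed[::-1]               # realign with t_0, t_1, ...
    return features
```

Running the filter in both directions removes the delay that a single causal pass would introduce, which keeps the periodicity evidence aligned with the frames that produced it.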

Next, in step S15, the CPU 12a estimates the maximum likelihood state sequence using the Viterbi algorithm. The beat point candidate series OR1 is thereby estimated. When the beat point candidate series OR1 is estimated, the series of values of the beat period b (that is, the tempo transitions) is estimated simultaneously (jointly).

Specifically, the CPU 12a first calculates the observation likelihood LO(t_i) of the onset feature amount XO(t_i) and the observation likelihood LB(t_i) of the BPM feature amount XB(t_i). Here, the onset feature amount XO(t_i) is assumed to follow a normal distribution set according to the value of the number of frames n until the next beat point. That is, the observation likelihood LO(t_i) of the onset feature amount XO is calculated by substituting XO(t_i) as the random variable into the normal distribution set according to the value of n. For example, when the value of n is 0, a normal distribution with mean 3 and variance 1 is used. When the value of the beat period b is β and the value of n is β/2, a normal distribution with mean 1 and variance 1 is used. When the value of n is neither 0 nor β/2, a normal distribution with mean 0 and variance 1 is used.
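The three cases for the onset observation likelihood can be written out as a short sketch. The means 3, 1, and 0 and the variance 1 come from the example values in the description; the function names are hypothetical, and the half-beat test `2 * n == b` assumes that the n = β/2 case applies only when β is even.

```python
import math


def normal_pdf(x, mean, variance=1.0):
    """Density of a normal distribution at x."""
    return (math.exp(-(x - mean) ** 2 / (2.0 * variance))
            / math.sqrt(2.0 * math.pi * variance))


def onset_mean(b, n):
    """Mean of the normal distribution for state q_{b,n}."""
    if n == 0:
        return 3.0      # on the beat: strong onsets expected
    if 2 * n == b:
        return 1.0      # half a beat before the next beat
    return 0.0          # elsewhere: little onset energy expected


def onset_likelihood(xo, b, n):
    """Observation likelihood LO for onset feature amount xo in q_{b,n}."""
    return normal_pdf(xo, onset_mean(b, n))
```

With these values, a large onset feature amount is most likely in a state with n = 0, which is what ties onsets to beat positions.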

The observation likelihood LB of the BPM feature amount XB(t_i) corresponds to the degree to which the BPM feature amount XB matches a template TMP_b provided for each beat period b. That is, as shown in equation (2), the observation likelihood LB(t_i) is given by the inner product of the template TMP_b and the BPM feature amount XB(t_i). In that expression, ν_b is a coefficient that determines the weight of the BPM feature amount XB(t_i) relative to the onset feature amount XO(t_i): the larger ν_b is set, the more weight the BPM feature amount XB(t_i) ultimately receives. Z(ν_b) in that expression is a normalization coefficient that depends on ν_b. The template TMP_b consists of a series of coefficients δ_{b,γ} {γ = 1, 2, ...}, by which the BPM feature amounts XB_γ(t_i) constituting the BPM feature amount XB(t_i) are respectively multiplied (see FIG. 7). The template TMP_b is set so that, among its coefficients δ_{b,γ}, those whose index γ equals the beat period b or an integer multiple of the beat period b are local maxima.
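The template matching step can be illustrated with a deliberately simple template. Everything about the template shape here is an assumption: the coefficients are set to a peak value at γ = b and its integer multiples and to a floor value elsewhere, which satisfies the stated local-maximum condition but is not the table of FIG. 7. Only the raw inner product is computed, because the weighting by ν_b and the normalization Z(ν_b) of equation (2) are not reproduced in the text above. The names `make_template` and `template_match` are hypothetical.

```python
def make_template(b, b_max, peak=1.0, floor=0.0):
    """Illustrative TMP_b: coefficients delta_{b,gamma} for gamma = 1..b_max.

    Assumption: 'peak' at gamma = b, 2b, 3b, ... and 'floor' elsewhere.
    """
    return [peak if gamma % b == 0 else floor
            for gamma in range(1, b_max + 1)]


def template_match(template, xb_frame):
    """Inner product of TMP_b with XB(t_i).

    xb_frame[gamma - 1] holds XB_gamma(t_i).
    """
    return sum(delta * xb for delta, xb in zip(template, xb_frame))
```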

Next, the CPU 12a calculates the state sequence with the maximum likelihood using the log observation likelihood LOB(t_i), which is the logarithm of the product of the observation likelihood LO(t_i) and the observation likelihood LB(t_i) (equation (3)), that is, LOB(t_i) = log(LO(t_i) · LB(t_i)). The Viterbi algorithm is used to calculate this maximum likelihood state sequence.
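As a small numerical illustration, the logarithm of the product can be computed as a sum of logarithms, which avoids underflow when both likelihoods are small. The function name is hypothetical and both inputs are assumed to be strictly positive.

```python
import math


def log_observation_likelihood(lo, lb):
    """LOB = log(LO * LB), computed as log(LO) + log(LB).

    Assumption: lo and lb are strictly positive likelihood values.
    """
    return math.log(lo) + math.log(lb)
```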

In the present embodiment, the log transition probability T from a state in which the beat period b is βs and the frame count n is ηs to a state in which the beat period b is βe and the frame count n is ηe is set as follows (see FIG. 4). When ηs = 0, βe = βs, and ηe = βe − 1, T is −0.2. When ηs = 0, βe = βs + 1, and ηe = βe − 1, T is −0.6. When ηs = 0, βe = βs − 1, and ηe = βe − 1, T is −0.6. When ηs > 0, βe = βs, and ηe = ηs − 1, T is 0. In all other cases, T is −∞. In other words, on a transition out of a state whose frame count n is 0 (ηs = 0), the beat period b may increase or decrease by 1, and the frame count n is set to one less than the beat period b after the transition. On a transition out of a state whose frame count n is not 0 (ηs ≠ 0), the beat period b is unchanged and the frame count n decreases by 1.
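The transition rule above can be written as a small function. This is a sketch using the values quoted in the text; the function name is hypothetical, and the first case is read as conditioned on ηs = 0, in line with the summary that the beat period may change only when leaving a state whose frame count is 0.

```python
NEG_INF = float("-inf")


def log_transition(beta_s, eta_s, beta_e, eta_e):
    """Log transition probability T from state (beta_s, eta_s) to (beta_e, eta_e)."""
    if eta_s == 0:
        # Leaving a beat: the countdown restarts at (new beat period - 1).
        if eta_e != beta_e - 1:
            return NEG_INF
        if beta_e == beta_s:
            return -0.2  # beat period unchanged
        if abs(beta_e - beta_s) == 1:
            return -0.6  # beat period changes by one frame
        return NEG_INF
    # Between beats: beat period fixed, frame count decreases by 1.
    if beta_e == beta_s and eta_e == eta_s - 1:
        return 0.0
    return NEG_INF
```

Because every transition not listed has log probability −∞, the tempo can drift by at most one frame per beat.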

Among the states making up the maximum-likelihood state sequence calculated as described above, the series of states whose frame count n is 0 is the beat point candidate series OR1. Among the same states, the series of states whose beat period b and frame count n satisfy Equation (4) is the beat point candidate series OR2.
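The extraction of the two candidate series from the state path might look like the sketch below. The function names are hypothetical. Note in particular that `is_half_beat` is only a placeholder: Equation (4) is not reproduced in this text, so the condition n = ⌊b/2⌋ is an assumption chosen to land halfway between beats, not the patent's formula.

```python
def extract_frames(path, predicate):
    """Indices of the frames whose state (b, n) satisfies the predicate."""
    return [i for i, (b, n) in enumerate(path) if predicate(b, n)]


def is_beat(b, n):
    # OR1: states whose frame count n is 0.
    return n == 0


def is_half_beat(b, n):
    # ASSUMPTION: stand-in for Equation (4), which is not available here.
    return n == b // 2


# State path for a constant beat period of 4 frames: n counts 3, 2, 1, 0.
path = [(4, 3), (4, 2), (4, 1), (4, 0), (4, 3), (4, 2), (4, 1), (4, 0)]
or1 = extract_frames(path, is_beat)
or2 = extract_frames(path, is_half_beat)
```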

This yields the beat point candidate series OR2, whose beat point candidates are each offset by half a beat (for example, by the length of an eighth note in a piece in 4/4 time) from the beat point candidates of the beat point candidate series OR1.

Next, in step S16, the CPU 12a calculates the beat-synchronized chord feature series CR1, which is the series of beat-synchronized chord features XBC1 representing the chord characteristics at each beat point candidate of the beat point candidate series OR1, and the beat-synchronized chord feature series CR2, which is the series of beat-synchronized chord features XBC2 representing the chord characteristics at each beat point candidate of the beat point candidate series OR2.

The beat-synchronized chord feature series CR1 and CR2 are calculated as follows. First, the power of each frequency bin of each frame t_i is mapped to the pitch whose frequency is nearest to that of the bin (for example, the fundamental frequency of each pitch in equal temperament). Of the power mapped to each pitch in this way, the power belonging to the low range (for example, B1 and below) is summed (or accumulated) for each pitch class (C, C#, D, ..., B). The 12-dimensional feature consisting of the resulting per-pitch-class powers is called the bass feature HPCP^(B) (see FIG. 8). Likewise, the power belonging to the high range (for example, C2 and above) is summed (or accumulated) for each pitch class (C, C#, D, ..., B), and the resulting 12-dimensional feature is called the treble feature HPCP^(T).
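The mapping from frequency bins to bass and treble pitch-class profiles can be sketched as follows. The function names, the A4 = 440 Hz reference, and the default split point are assumptions for the example; in particular, `split_midi=36` places C2 at MIDI note 36 (the C4 = 60 convention), and a different octave-numbering convention would move the boundary.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_midi(freq_hz, a4_hz=440.0):
    """MIDI note number of the equal-tempered pitch nearest to freq_hz."""
    return int(round(69 + 12 * math.log2(freq_hz / a4_hz)))


def hpcp_bass_treble(bin_freqs_hz, bin_powers, split_midi=36):
    """Sum bin powers per pitch class, separately below and from split_midi up."""
    bass = [0.0] * 12
    treble = [0.0] * 12
    for freq, power in zip(bin_freqs_hz, bin_powers):
        if freq <= 0.0:
            continue  # skip the DC bin, which has no pitch
        midi = nearest_midi(freq)
        target = bass if midi < split_midi else treble
        target[midi % 12] += power  # index 0 is C, index 11 is B
    return bass, treble
```

For instance, a bin at 55 Hz (A1) contributes to the "A" entry of the bass profile, while a bin at 440 Hz (A4) contributes to the "A" entry of the treble profile.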

The L2 norm of the low-range power is called the bass power ρ^(B), and the L2 norm of the high-range power is called the treble power ρ^(T).

The 26-dimensional feature for each frame t_i, consisting of the bass feature HPCP^(B), the treble feature HPCP^(T), the bass power ρ^(B), and the treble power ρ^(T), is called the chord feature XC(t_i).

The beat-synchronized chord feature XBC1(m) is a 26-dimensional feature obtained by smoothing the chord features XC(t_i) of the frames between the m-th and (m+1)-th beat point candidates of the beat point candidate series OR1. Similarly, the beat-synchronized chord feature XBC2(m) is a 26-dimensional feature obtained by smoothing the chord features XC(t_i) of the frames between the m-th and (m+1)-th beat point candidates of the beat point candidate series OR2. Here, smoothing means, for example, calculating the mean of the bass feature HPCP^(B) over the frames between the beat point candidates, and calculating the median of each of the treble feature HPCP^(T), the bass power ρ^(B), and the treble power ρ^(T) over the same frames. The beat-synchronized chord feature series CR1 is obtained by calculating XBC1 for all the beat point candidates of OR1, and the beat-synchronized chord feature series CR2 is obtained by calculating XBC2 for all the beat point candidates of OR2.
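A minimal sketch of assembling the 26-dimensional chord feature and smoothing it between two beat point candidates is given below. The function names and the layout of the vector (bass profile, treble profile, bass power, treble power, in that order) are assumptions, as is taking each L2 norm over the 12-dimensional pitch-class profile.

```python
import math
from statistics import median


def chord_feature(bass_hpcp, treble_hpcp):
    """26-dim XC: 12 bass + 12 treble + bass power + treble power."""
    # ASSUMPTION: the L2 norms are taken over the 12-dim profiles.
    bass_power = math.sqrt(sum(p * p for p in bass_hpcp))
    treble_power = math.sqrt(sum(p * p for p in treble_hpcp))
    return list(bass_hpcp) + list(treble_hpcp) + [bass_power, treble_power]


def beat_sync_feature(frames):
    """Smooth the 26-dim features of the frames between two beat candidates.

    The bass profile (dims 0-11) is averaged; the treble profile (dims 12-23)
    and the two powers (dims 24, 25) use the median.
    """
    if not frames:
        raise ValueError("no frames between the beat point candidates")
    columns = list(zip(*frames))
    bass = [sum(col) / len(col) for col in columns[:12]]
    rest = [median(col) for col in columns[12:]]
    return bass + rest
```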

Next, in step S17, the CPU 12a calculates the likelihood LK1 of the maximum-likelihood beat-synchronized chord feature series CR1 and the likelihood LK2 of the maximum-likelihood beat-synchronized chord feature series CR2. The calculation procedure is the same for both. In the following description, therefore, CR1 and CR2 are referred to simply as the beat-synchronized chord feature series CR, and XBC1 and XBC2 simply as the beat-synchronized chord feature XBC.

Here, the bass feature HPCP^(B) and the treble feature HPCP^(T) are assumed to follow a vMF (von Mises-Fisher) distribution, and the bass power ρ^(B) and the treble power ρ^(T) are assumed to follow a gamma distribution. In typical music, power is high in sections where a chord is sounding and low in sections where no chord is sounding. In addition, the distribution of power across the pitch classes of HPCP^(B) and HPCP^(T) in sections where no chord is sounding differs from that in sections where a chord is sounding. For this reason, HPCP^(B), HPCP^(T), ρ^(B), and ρ^(T) are learned jointly for both the state in which a chord is sounding and the state in which no chord is sounding. Furthermore, rather than using a single vMF distribution and a single gamma distribution for each chord, a plurality of models (a mixture model) is prepared, and the observation likelihood of the beat-synchronized chord feature is defined as their weighted linear sum. The observation likelihood of the beat-synchronized chord feature XBC is expressed by Equation (5), using the components of XBC (the bass feature HPCP^(B), the treble feature HPCP^(T), the bass power ρ^(B), and the treble power ρ^(T)) together with the mean μ_k of the vMF distribution, the concentration (dispersion) parameter κ_k of the vMF distribution, the scale parameter u of the gamma distribution, the shape parameter v of the gamma distribution, and the weight w_k of the linear sum. Here, k is an index identifying the distributions that make up the mixture model. Variables relating to the low range carry "B" in parentheses as a superscript, and variables relating to the mid-to-high range carry "T" in parentheses as a superscript. Θ_j denotes the parameters for chord j. For example, Θ_j represents the shape of HPCP^(B) and HPCP^(T) (the distribution of power across pitch classes).
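The following sketch shows one way such a mixture likelihood could be evaluated. It is not the patent's Equation (5), which is not reproduced in this text: the factorization of each mixture component into two vMF densities and two gamma densities, the dictionary keys, and the function names are all assumptions, and the pitch-class profiles are assumed to be normalized to unit length before being passed in.

```python
import math


def log_bessel_i(order, x, terms=200):
    """log of the modified Bessel function I_order(x), x > 0, via its power series."""
    half = math.log(x / 2.0)
    logs = [(2 * m + order) * half - math.lgamma(m + 1) - math.lgamma(m + order + 1)
            for m in range(terms)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def vmf_logpdf(x, mu, kappa):
    """Log density of a von Mises-Fisher distribution at unit vector x."""
    d = len(x)
    order = d / 2.0 - 1.0
    log_norm = (order * math.log(kappa) - (d / 2.0) * math.log(2.0 * math.pi)
                - log_bessel_i(order, kappa))
    return log_norm + kappa * sum(m * v for m, v in zip(mu, x))


def gamma_logpdf(x, shape, scale):
    """Log density of a gamma distribution (shape v, scale u) at x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def mixture_log_likelihood(bass, treble, bass_power, treble_power, components):
    """Log of the weighted sum over mixture components k (assumed factorized form)."""
    logs = []
    for c in components:
        logs.append(math.log(c["w"])
                    + vmf_logpdf(bass, c["mu_b"], c["kappa_b"])
                    + vmf_logpdf(treble, c["mu_t"], c["kappa_t"])
                    + gamma_logpdf(bass_power, c["v_b"], c["u_b"])
                    + gamma_logpdf(treble_power, c["v_t"], c["u_t"]))
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(v - top) for v in logs))
```

The vMF distribution models only the direction of a profile, so the overall level is carried separately by the two gamma-distributed powers.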

In general, the transition probability from one chord to another depends on the key of the piece. For example, a transition from the chord C to the chord F is likely to occur in a piece in C major. The transition probability also depends on the beat position s of the beat point (the number of beats counted from the immediately preceding bar line). For example, in a piece in 4/4 time, when the chord on the fourth beat (that is, s = 4) is G7, the chord on the first beat of the next measure is likely to be C (dominant motion). The acoustic signal analysis device 10 therefore includes a plurality of databases that store chord transition probabilities, each corresponding to a meter. That is, the acoustic signal analysis device 10 includes, for example, a database used when analyzing a piece in 3/4 time, a database used when analyzing a piece in 4/4 time, and a database used when analyzing a piece in 6/8 time. In each database, the chord-to-chord transition probabilities are stored in association with the key and the beat position s. The transition probabilities are determined by learning the chord transitions in a variety of pieces. These databases are stored in the ROM 12b. Here, the probability that the chord at the (m−1)-th beat point candidate is j′ and the chord at the m-th beat point candidate is j is written as in Equation (6). Alternatively, databases corresponding to the number of beats per measure may be provided.
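One possible in-memory layout for such databases is sketched below. The dictionary structure, the function name, and the `floor` fallback for unseen transitions are assumptions, and the probability values are arbitrary placeholders for illustration, not learned data from the patent.

```python
import math

# Hypothetical layout: one table per meter, keyed by
# (key, beat position s, previous chord, next chord).
# The numbers are placeholders, not learned values.
TRANSITIONS = {
    "4/4": {
        ("C major", 4, "G7", "C"): 0.6,
        ("C major", 4, "G7", "G7"): 0.3,
    },
    "3/4": {
        ("C major", 3, "G7", "C"): 0.5,
    },
}


def log_chord_transition(db, meter, key, beat_pos, prev_chord, chord, floor=1e-6):
    """Log transition probability, with a small floor for unseen transitions."""
    table = db[meter]
    return math.log(table.get((key, beat_pos, prev_chord, chord), floor))
```

Selecting the table by meter first mirrors the description, in which the database is chosen according to the meter entered by the user.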

Then, under the condition that the beat position s of the first beat and the key are known, the likelihood LK(s, key) of the beat-synchronized chord feature series CR is expressed by Equation (7).

In Equation (7), Z_j(m) is a binary variable defined as follows: Z_j(m) is 1 when the chord at the m-th beat point candidate is j, and 0 otherwise. Accordingly, Z_j′(m−1)Z_j(m) is 1 only when the chord at the (m−1)-th beat point candidate is j′ and the chord at the m-th beat point candidate is j, and 0 otherwise.

Here, the likelihood LK(s, key) for the beat point candidate series OR1 is written as LK^(1)(s, key). The likelihood LK1 of the maximum-likelihood beat-synchronized chord feature series CR1 is calculated based on Equation (8).

Similarly, the likelihood LK(s, key) for the beat point candidate series OR2 is written as LK^(2)(s, key). The likelihood LK2 of the maximum-likelihood beat-synchronized chord feature series CR2 is calculated based on Equation (9).

The CPU 12a calculates the likelihood LK1 and the likelihood LK2 using the Viterbi algorithm. When doing so, the CPU 12a determines the chord transition probabilities by referring to the database corresponding to the meter (or the number of beats per measure) entered by the user in step S12.
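The likelihood calculation for one candidate series might be sketched as below: a Viterbi search over chords for a given first beat position s and key, followed by a search over all (s, key) combinations. Equations (7) to (9) are not reproduced in this text, so the function names, the uniform prior on the first chord, and the convention that the transition depends on the beat position of the beat being left (as in the G7-on-beat-4 example) are assumptions.

```python
def best_chord_path(log_obs, chords, log_trans, beats_per_measure, first_pos, key):
    """Viterbi over chords for a fixed first beat position and key.

    log_obs[m][j] is the log observation likelihood of chord j at beat m;
    log_trans(key, s, prev, cur) is the log transition probability when
    leaving a beat at position s (1-based).
    """
    score = {j: log_obs[0][j] for j in chords}  # uniform prior on the first chord
    back = []
    pos = first_pos
    for m in range(1, len(log_obs)):
        new_score, pointers = {}, {}
        for j in chords:
            prev = max(chords, key=lambda p: score[p] + log_trans(key, pos, p, j))
            new_score[j] = score[prev] + log_trans(key, pos, prev, j) + log_obs[m][j]
            pointers[j] = prev
        score = new_score
        back.append(pointers)
        pos = pos % beats_per_measure + 1  # advance, wrapping at the bar line
    last = max(chords, key=lambda j: score[j])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path


def maximize_over_position_and_key(log_obs, chords, log_trans, beats_per_measure, keys):
    """Return (log likelihood, chord path, s, key) for the best (s, key) pair."""
    best = None
    for key in keys:
        for s in range(1, beats_per_measure + 1):
            ll, path = best_chord_path(log_obs, chords, log_trans,
                                       beats_per_measure, s, key)
            if best is None or ll > best[0]:
                best = (ll, path, s, key)
    return best
```

Because the first beat position s fixes where every later bar line falls, choosing the best s also determines the bar line positions.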

Next, in step S18, the CPU 12a compares the likelihood LK1 with the likelihood LK2. When LK1 is greater than LK2, the CPU 12a determines "Yes" and, in step S19, outputs the estimated beat point candidate series OR1, the maximum-likelihood beat-synchronized chord feature series CR1, and the beat position s. Specifically, a graph representing the beat points and bar lines is displayed on the display 13 based on OR1 and the beat position s. In addition, the chord progression Z1 (a series of chord names) is calculated from the maximum-likelihood series CR1 and displayed on the display 13. The chord progression Z1 is the series of chord names corresponding to the beat-synchronized chord features XBC1(m) that make up the maximum-likelihood series CR1. The CPU 12a then ends the beat point and chord estimation processing in step S20.

On the other hand, when LK1 is less than or equal to LK2 in step S18, the CPU 12a determines "No" and, in step S21, outputs the estimated beat point candidate series OR2, the maximum-likelihood beat-synchronized chord feature series CR2, and the beat position s. Specifically, a graph representing the beat points and bar lines is displayed on the display 13 based on OR2 and the beat position s. In addition, the chord progression Z2 (a series of chord names) is calculated from the maximum-likelihood series CR2 and displayed on the display 13. The chord progression Z2 is the series of chord names corresponding to the beat-synchronized chord features XBC2(m) that make up the maximum-likelihood series CR2. The CPU 12a then ends the beat point and chord estimation processing in step S20.
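The branch in steps S18, S19, and S21 reduces to a single comparison, sketched here with a hypothetical function name. The point worth noting is the tie rule: OR1 is chosen only when LK1 is strictly greater, so equal likelihoods fall through to OR2.

```python
def select_candidate(lk1, lk2, result1, result2):
    """Return result1 only when LK1 is strictly greater than LK2; otherwise result2."""
    return result1 if lk1 > lk2 else result2


chosen = select_candidate(-10.0, -12.0, "OR1", "OR2")
```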

In general, chord changes are likely to occur on the beat (on-beat). The chord feature XC(t_i) calculated for each frame t_i is therefore likely to change substantially at on-beats. Consequently, if an off-beat is mistakenly selected as the on-beat, XC(t_i) is likely to change substantially between one beat point and the next, and the beat-synchronized chord feature then fails to represent the chord characteristics accurately. In other words, when an off-beat is selected as the on-beat, the likelihood of the beat-synchronized chord feature series is lower than when the true beat points (the on-beats) are selected. The acoustic signal analysis device 10 therefore detects the beat point candidate series OR1 and the beat point candidate series OR2, whose candidates are offset by half a beat from those of OR1, and calculates the likelihoods LK1 and LK2 of the beat-synchronized chord feature series for OR1 and OR2. It then compares LK1 and LK2 and selects the beat point candidate series with the higher likelihood. This suppresses the mistaken selection of off-beats as on-beats. In addition, the combination of beat position s and key that maximizes the likelihood of the beat-synchronized chord feature series is selected, so that the beat points, the chord progression, and the bar line positions of the piece are estimated simultaneously (jointly). The acoustic signal analysis device 10 can therefore estimate the beat points, chord progression, and bar line positions more accurately than conventional techniques.

The present invention is not limited to the above embodiment, and various modifications are possible without departing from the object of the invention.

For example, in the above embodiment, the performance sound of the entire piece is stored as music data, and that data is analyzed to estimate the beat points, chord progression, and bar line positions. Alternatively, the performance sound of the piece may be captured in real time, and the data representing the captured sound may be analyzed in the same manner as in the above embodiment to estimate the beat points, chord progression, and bar line positions.

DESCRIPTION OF SYMBOLS 10 ... acoustic signal analysis device, CR1 ... beat-synchronized chord feature series, CR2 ... beat-synchronized chord feature series, HPCP^(T) ... treble feature, HPCP^(B) ... bass feature, j ... chord, key ... key, OR1 ... beat point candidate series, OR2 ... beat point candidate series, s ... beat position, XB ... BPM feature, XBC1 ... beat-synchronized chord feature, XBC2 ... beat-synchronized chord feature, XC ... chord feature, XO ... onset feature, ρ^(T) ... treble power, ρ^(B) ... bass power

Claims

An acoustic signal analysis device comprising:
acoustic signal acquisition means for acquiring an acoustic signal representing the performance sound of a piece of music to be analyzed;
beat point candidate series detection means for detecting, based on the acquired acoustic signal, a first beat point candidate series consisting of a plurality of beat point candidates of the piece, and a second beat point candidate series consisting of a plurality of beat point candidates each offset by half a beat from the respective beat point candidates of the first beat point candidate series;
beat-synchronized chord feature series calculation means for calculating, based on the acquired acoustic signal, a first beat-synchronized chord feature series consisting of beat-synchronized chord features each representing the chord characteristics between two adjacent beat point candidates of the first beat point candidate series, and a second beat-synchronized chord feature series consisting of beat-synchronized chord features each representing the chord characteristics between two adjacent beat point candidates of the second beat point candidate series; and
estimation means for selecting, from among probability models that represent the chord progression of the piece and in which chord transition probabilities between beat points are set according to the combination of the number of beats per measure, the key of the piece, and the beat position of the first beat point, one probability model for which the first beat-synchronized chord feature series satisfies a predetermined criterion and one probability model for which the second beat-synchronized chord feature series satisfies a predetermined criterion, and for estimating the beat points, chord progression, and bar line positions of the piece based on whichever of the two selected probability models has the higher likelihood.

The acoustic signal analysis device according to claim 1, wherein
the beat-synchronized chord feature series calculation means comprises:
chord feature calculation means for calculating a chord feature representing the chord characteristics of each of a plurality of sections located between the two adjacent beat point candidates; and
chord feature smoothing means for calculating the beat-synchronized chord feature by smoothing the chord features of the plurality of sections located between the two adjacent beat point candidates.

The acoustic signal analysis device according to claim 1 or 2, wherein
the beat point candidate series detection means comprises:
beat/tempo feature calculation means for calculating a first feature representing a characteristic related to the presence of a beat in each section of the piece and a second feature representing a characteristic related to the tempo; and
beat point and tempo estimation means for simultaneously estimating the beat points and the tempo transitions of the piece by selecting, from among a plurality of probability models each described as a sequence of states classified by the combination of a physical quantity related to the presence of a beat and a physical quantity related to the tempo in each section of the piece, a probability model for which the series of observation likelihoods, each representing the probability that the first feature and the second feature are observed simultaneously in the respective section of the piece, satisfies a predetermined criterion.

An acoustic signal analysis program causing a computer provided in an acoustic signal analysis device to execute:
an acoustic signal acquisition step of acquiring an acoustic signal representing the performance sound of a piece of music to be analyzed;
a beat point candidate series detection step of detecting, based on the acquired acoustic signal, a first beat point candidate series consisting of a plurality of beat point candidates of the piece, and a second beat point candidate series consisting of a plurality of beat point candidates each offset by half a beat from the respective beat point candidates of the first beat point candidate series;
a beat-synchronized chord feature series calculation step of calculating, based on the acquired acoustic signal, a first beat-synchronized chord feature series consisting of beat-synchronized chord features each representing the chord characteristics between two adjacent beat point candidates of the first beat point candidate series, and a second beat-synchronized chord feature series consisting of beat-synchronized chord features each representing the chord characteristics between two adjacent beat point candidates of the second beat point candidate series; and
an estimation step of selecting, from among probability models that represent the chord progression of the piece and in which chord transition probabilities between beat points are set according to the combination of the number of beats per measure, the key of the piece, and the beat position of the first beat point, one probability model for which the first beat-synchronized chord feature series satisfies a predetermined criterion and one probability model for which the second beat-synchronized chord feature series satisfies a predetermined criterion, and of estimating the beat points, chord progression, and bar line positions of the piece based on whichever of the two selected probability models has the higher likelihood.