JP4735398B2

JP4735398B2 - Acoustic signal analysis apparatus, acoustic signal analysis method, and acoustic signal analysis program

Info

Publication number: JP4735398B2
Application number: JP2006124940A
Authority: JP
Inventors: 一郎宍戸
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2006-04-28
Filing date: 2006-04-28
Publication date: 2011-07-27
Anticipated expiration: 2026-04-28
Also published as: JP2007298607A

Abstract

PROBLEM TO BE SOLVED: To detect music sections precisely while suppressing the influence of variation in the characteristics of the acoustic signal and of the kind of music.

SOLUTION: The acoustic signal 2 is frequency-analyzed to generate frequency components c[i][q], whose elements are the component intensities of prescribed frequencies for every unit time. Among these elements, an element that satisfies at least one of the following conditions is detected as a peak element: the element is ≥ a threshold α[q]; the element is ≥ a value calculated from the elements in its vicinity; or the harmonic element in harmonic relation to the element is ≥ a value calculated from the elements in the vicinity of that harmonic element. The number of peak elements at the same frequency is counted for every first section H, and if the count is ≥ a prescribed number, that frequency within the first section H is detected as a pitch region. The number of pitch regions is counted for every second section J, which is not shorter than the first section H, and if the count is ≥ a prescribed number, the second section J is detected as a music section.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to an acoustic signal analysis apparatus, an acoustic signal analysis method, and an acoustic signal analysis program that analyze an acoustic signal and detect signal sections containing music as music sections.

In recent years, content such as video and audio has been widely recorded on recording media such as hard disks, DVDs, and memory devices. As the capacity of these media has grown, so has the demand for functions that read out and play only a desired section of long recorded content, and for functions that shorten the playback time. Against this background, techniques have been developed for extracting various kinds of information used to divide content into small sections according to its substance, and for identifying only those signal sections of the content that contain music.

Specifically, a method is known that analyzes the sound information contained in a video signal and identifies music sections using the temporal stability of the spectrum (see, for example, Patent Documents 1 and 2). A method is also known that distinguishes music sections from speech sections using the average band energy ratio and the subband energy centroid of audio subband data (see, for example, Patent Documents 3 and 4).

Patent Document 1: Japanese Patent Laid-Open No. 10-187182
Patent Document 2: Japanese Patent Laid-Open No. 2000-315094
Patent Document 3: Japanese Patent Laid-Open No. 10-247093
Patent Document 4: Japanese Patent Laid-Open No. 2000-66691

In Patent Documents 1 and 2, a music section is determined by whether the sum of the edge strengths of the spectrogram exceeds a preset threshold. However, this sum varies with the signal level and frequency characteristics of the signal being analyzed (that is, with variation in the characteristics of the input source) and with the type (genre) of music, so it is difficult to handle diverse input sources and diverse music genres. For example, given two similar music sections, the sum of the edge strengths is large for the one with the higher average sound pressure and small for the one with the lower average sound pressure. Detection accuracy is therefore not always sufficient when music sections are to be detected across diverse input sources and diverse kinds of music.

The techniques disclosed in Patent Documents 3 and 4 have the advantage that music sections and speech sections can be distinguished from compression-coded audio data by relatively simple calculation. Even so, it remains difficult to achieve high detection accuracy when music sections are to be detected across diverse input sources and music of diverse genres. This is because, in typical music, the characteristic frequency components produced by pitched instruments and the like persist for at least a certain time, and the conventional techniques described above do not make sufficient use of this property.

Furthermore, ordinary music contains frequency components based on a scale, such as equal temperament or just intonation, but Patent Documents 1 to 4 all use frequency components that do not correspond directly to a musical scale, so their music detection accuracy is not necessarily high.

The present invention has been made in view of the above problems. Its object is to provide an acoustic signal analysis apparatus, an acoustic signal analysis method, and an acoustic signal analysis program that accurately detect signal sections containing music while keeping low the influence of variation in the characteristics of the input acoustic signal and of the type of music it contains.

To solve the above problems, the present invention provides the following.
[1] An acoustic signal analysis apparatus comprising: frequency analysis means for dividing an acoustic signal into a plurality of frequency bands and generating frequency component data whose elements are the component intensities per unit time in each of the plurality of frequency bands; peak element detection means for reading the elements of the frequency component data and detecting, as a peak element, any element that satisfies at least one of the following conditions: the read element is equal to or greater than a predetermined threshold; the read element is larger than a value calculated from the elements of frequency components in its vicinity; or a harmonic element, which is the element corresponding to a harmonic component of the read element, is larger than a value calculated from the elements of frequency components in the vicinity of that harmonic element; pitch region detection means for counting, for each of the plurality of frequency bands, the number of peak elements in each first section having a time length that includes the unit time, and detecting each first section whose count is equal to or greater than a first predetermined number as a pitch region in that frequency band; and music section detection means for counting, for each second section having a time length that includes the first section, the number of pitch regions over the plurality of frequency bands, and detecting the second section as a music section when the count is equal to or greater than a second predetermined number.
[2] An acoustic signal analysis apparatus comprising: frequency analysis means for dividing an acoustic signal into a plurality of frequency bands and generating frequency component data whose elements are the component intensities per unit time in each of the plurality of frequency bands; peak element detection means for reading the elements of the frequency component data and detecting, as a peak element, any element that satisfies at least one of the following conditions: the read element is equal to or greater than a predetermined threshold; the read element is larger than a value calculated from the elements of frequency components in its vicinity; or a harmonic element, which is the element corresponding to a harmonic component of the read element, is larger than a value calculated from the elements of frequency components in the vicinity of that harmonic element; pitch region detection means for obtaining, for each of the plurality of frequency bands and for each first section having a time length that includes the unit time, the number of peak elements and the sum of the peak element values, and detecting the first section as a pitch region in that frequency band when the number of peak elements is equal to or greater than a first predetermined number and the sum of the peak element values is equal to or greater than a first predetermined value; and music section detection means for counting, for each second section having a time length that includes the first section, the number of pitch regions over the plurality of frequency bands, and detecting the second section as a music section when the count is equal to or greater than a second predetermined value.
[3] An acoustic signal analysis apparatus comprising: frequency analysis means for dividing an acoustic signal into a plurality of frequency bands and generating frequency component data whose elements are the component intensities per unit time in each of the plurality of frequency bands; peak element detection means for reading the elements of the frequency component data and detecting, as a peak element, any element that satisfies at least one of the following conditions: the read element is equal to or greater than a predetermined threshold; the read element is larger than a value calculated from the elements of frequency components in its vicinity; or a harmonic element, which is the element corresponding to a harmonic component of the read element, is larger than a value calculated from the elements of frequency components in the vicinity of that harmonic element; pitch region detection means for obtaining, for each of the plurality of frequency bands and for each first section having a time length that includes the unit time, the number of peak elements and the sum of the peak element values, detecting the first section as a pitch region in that frequency band when the number of peak elements is equal to or greater than a first predetermined number and the sum of the peak element values is equal to or greater than a first predetermined value, and taking the sum of the peak element values as the pitch region intensity of that pitch region; and music section detection means for summing, for each second section having a time length that includes the first section, the pitch region intensities over the plurality of frequency bands, and detecting the second section as a music section when the sum is equal to or greater than a second predetermined value.
[4] The acoustic signal analysis apparatus according to any one of [1] to [3], wherein the first section has a time length that includes a plurality of the unit times, and the second section has a time length that includes a plurality of the first sections.
[5] The acoustic signal analysis apparatus according to any one of [1] to [4], wherein the plurality of frequency bands are frequency bands corresponding to the pitch frequencies of a musical scale.
[6] The acoustic signal analysis apparatus according to any one of [1] to [5], wherein the unit time differs according to each of the plurality of frequency bands.
[7] The acoustic signal analysis apparatus according to any one of [1] to [6], wherein the pitch region detection means uses, as the first predetermined number, a value smaller than the number of unit times included in the first section.
[8] The acoustic signal analysis apparatus according to any one of [1] to [7], wherein, when a peak element, an element that is not a peak element, and a peak element are temporally consecutive in one frequency band, the pitch region detection means counts the element that is not a peak element as a peak element.
[9] An acoustic signal analysis method comprising: a frequency analysis step of dividing an acoustic signal into a plurality of frequency bands and generating frequency component data whose elements are the component intensities per unit time in each of the plurality of frequency bands; a peak element detection step of reading the elements of the frequency component data and detecting, as a peak element, any element that satisfies at least one of the following conditions: the read element is equal to or greater than a predetermined threshold; the read element is larger than a value calculated from the elements of frequency components in its vicinity; or a harmonic element, which is the element corresponding to a harmonic component of the read element, is larger than a value calculated from the elements of frequency components in the vicinity of that harmonic element; a pitch region detection step of counting, for each of the plurality of frequency bands, the number of peak elements in each first section having a time length that includes the unit time, and detecting each first section whose count is equal to or greater than a first predetermined number as a pitch region in that frequency band; and a music section detection step of counting, for each second section having a time length that includes the first section, the number of pitch regions over the plurality of frequency bands, and detecting the second section as a music section when the count is equal to or greater than a second predetermined number.
[10] An acoustic signal analysis method comprising: a frequency analysis step of dividing an acoustic signal into a plurality of frequency bands and generating frequency component data whose elements are the component intensities per unit time in each of the plurality of frequency bands; a peak element detection step of reading the elements of the frequency component data and detecting, as a peak element, any element that satisfies at least one of the following conditions: the read element is equal to or greater than a predetermined threshold; the read element is larger than a value calculated from the elements of frequency components in its vicinity; or a harmonic element, which is the element corresponding to a harmonic component of the read element, is larger than a value calculated from the elements of frequency components in the vicinity of that harmonic element; a pitch region detection step of obtaining, for each of the plurality of frequency bands and for each first section having a time length that includes the unit time, the number of peak elements and the sum of the peak element values, and detecting the first section as a pitch region in that frequency band when the number of peak elements is equal to or greater than a first predetermined number and the sum of the peak element values is equal to or greater than a first predetermined value; and a music section detection step of counting, for each second section having a time length that includes the first section, the number of pitch regions over the plurality of frequency bands, and detecting the second section as a music section when the count is equal to or greater than a second predetermined value.
[11] An acoustic signal analysis method comprising: a frequency analysis step of dividing an acoustic signal into a plurality of frequency bands and generating frequency component data whose elements are the component intensities per unit time in each of the plurality of frequency bands; a peak element detection step of reading the elements of the frequency component data and detecting, as a peak element, any element that satisfies at least one of the following conditions: the read element is equal to or greater than a predetermined threshold; the read element is larger than a value calculated from the elements of frequency components in its vicinity; or a harmonic element, which is the element corresponding to a harmonic component of the read element, is larger than a value calculated from the elements of frequency components in the vicinity of that harmonic element; a pitch region detection step of obtaining, for each of the plurality of frequency bands and for each first section having a time length that includes the unit time, the number of peak elements and the sum of the peak element values, detecting the first section as a pitch region in that frequency band when the number of peak elements is equal to or greater than a first predetermined number and the sum of the peak element values is equal to or greater than a first predetermined value, and taking the sum of the peak element values as the pitch region intensity of that pitch region; and a music section detection step of summing, for each second section having a time length that includes the first section, the pitch region intensities over the plurality of frequency bands, and detecting the second section as a music section when the sum is equal to or greater than a second predetermined value.
[12] An acoustic signal analysis program that causes a computer to execute: a frequency analysis step of dividing an acoustic signal into a plurality of frequency bands and generating frequency component data whose elements are the component intensities per unit time in each of the plurality of frequency bands; a peak element detection step of reading the elements of the frequency component data and detecting, as a peak element, any element that satisfies at least one of the following conditions: the read element is equal to or greater than a predetermined threshold; the read element is larger than a value calculated from the elements of frequency components in its vicinity; or a harmonic element, which is the element corresponding to a harmonic component of the read element, is larger than a value calculated from the elements of frequency components in the vicinity of that harmonic element; a pitch region detection step of counting, for each of the plurality of frequency bands, the number of peak elements in each first section having a time length that includes the unit time, and detecting each first section whose count is equal to or greater than a first predetermined number as a pitch region in that frequency band; and a music section detection step of counting, for each second section having a time length that includes the first section, the number of pitch regions over the plurality of frequency bands, and detecting the second section as a music section when the count is equal to or greater than a second predetermined number.
[13] An acoustic signal analysis program that causes a computer to execute: a frequency analysis step of dividing an acoustic signal into a plurality of frequency bands and generating frequency component data whose elements are the component intensities per unit time in each of the plurality of frequency bands; a peak element detection step of reading the elements of the frequency component data and detecting, as a peak element, any element that satisfies at least one of the following conditions: the read element is equal to or greater than a predetermined threshold; the read element is larger than a value calculated from the elements of frequency components in its vicinity; or a harmonic element, which is the element corresponding to a harmonic component of the read element, is larger than a value calculated from the elements of frequency components in the vicinity of that harmonic element; a pitch region detection step of obtaining, for each of the plurality of frequency bands and for each first section having a time length that includes the unit time, the number of peak elements and the sum of the peak element values, and detecting the first section as a pitch region in that frequency band when the number of peak elements is equal to or greater than a first predetermined number and the sum of the peak element values is equal to or greater than a first predetermined value; and a music section detection step of counting, for each second section having a time length that includes the first section, the number of pitch regions over the plurality of frequency bands, and detecting the second section as a music section when the count is equal to or greater than a second predetermined value.
[14] An acoustic signal analysis program that causes a computer to execute: a frequency analysis step of dividing an acoustic signal into a plurality of frequency bands and generating frequency component data whose elements are the component intensities per unit time in each of the plurality of frequency bands; a peak element detection step of reading the elements of the frequency component data and detecting, as a peak element, any element that satisfies at least one of the following conditions: the read element is equal to or greater than a predetermined threshold; the read element is larger than a value calculated from the elements of frequency components in its vicinity; or a harmonic element, which is the element corresponding to a harmonic component of the read element, is larger than a value calculated from the elements of frequency components in the vicinity of that harmonic element; a pitch region detection step of obtaining, for each of the plurality of frequency bands and for each first section having a time length that includes the unit time, the number of peak elements and the sum of the peak element values, detecting the first section as a pitch region in that frequency band when the number of peak elements is equal to or greater than a first predetermined number and the sum of the peak element values is equal to or greater than a first predetermined value, and taking the sum of the peak element values as the pitch region intensity of that pitch region; and a music section detection step of summing, for each second section having a time length that includes the first section, the pitch region intensities over the plurality of frequency bands, and detecting the second section as a music section when the sum is equal to or greater than a second predetermined value.
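The three detection stages recited in [1] can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function names are hypothetical, only the first (threshold) peak condition is used, the first and second sections are taken as non-overlapping blocks, and all numeric parameters are invented for the example.

```python
from typing import List

# c[t][q]: component intensity at unit time t in frequency band q (assumed layout)
Matrix = List[List[float]]


def detect_peaks(c: Matrix, alpha: List[float]) -> List[List[bool]]:
    """Mark c[t][q] as a peak element when it is >= the per-band threshold alpha[q].

    Only the first of the three alternative conditions in [1] is applied here.
    """
    return [[value >= alpha[q] for q, value in enumerate(row)] for row in c]


def detect_pitch_regions(peaks: List[List[bool]], h: int,
                         min_peaks: int) -> List[List[bool]]:
    """Split time into first sections of h unit times; in each band, a section is
    a pitch region when it holds at least min_peaks peak elements."""
    n_bands = len(peaks[0]) if peaks else 0
    regions = []
    for start in range(0, len(peaks) - h + 1, h):
        section = peaks[start:start + h]
        regions.append([sum(row[q] for row in section) >= min_peaks
                        for q in range(n_bands)])
    return regions


def detect_music_sections(regions: List[List[bool]], j: int,
                          min_regions: int) -> List[bool]:
    """Group j first sections into one second section; it is a music section when
    it contains at least min_regions pitch regions counted over all bands."""
    result = []
    for start in range(0, len(regions) - j + 1, j):
        count = sum(sum(row) for row in regions[start:start + j])
        result.append(count >= min_regions)
    return result


# Band 0 holds a sustained tone for the first 8 unit times, followed by silence.
c = [[1.0, 0.0]] * 8 + [[0.0, 0.0]] * 8
peaks = detect_peaks(c, alpha=[0.5, 0.5])
regions = detect_pitch_regions(peaks, h=4, min_peaks=3)
print(detect_music_sections(regions, j=2, min_regions=2))  # [True, False]
```

Choosing `min_peaks` smaller than `h` mirrors [7]: a first section can still qualify as a pitch region when a few unit times within it fail the peak condition.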

According to the present invention, the sounding sections of pitched instruments, which are highly likely to be present in signal sections containing music, can be detected accurately, so music sections can be detected accurately in acoustic signals from diverse input sources.

Also according to the present invention, even if the intensity of a specific frequency component momentarily becomes large in a speech section containing no music, the erroneous determination of that speech section as a music section can be greatly reduced.

Furthermore, according to the present invention, frequency components that correspond directly to the scale used in music can be extracted. The property that, while a pitched instrument is sounding, the frequency components of pitches adjacent to the sounded pitch are small can therefore be fully reflected in the computation, so music sections can be detected still more accurately.
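As an illustration of frequency bands that correspond directly to a musical scale, the sketch below lists centre frequencies for twelve-tone equal temperament. The reference pitch (A4 = 440 Hz), the MIDI-style note numbering, and the function name are assumptions made for this example; the text does not specify them at this point.

```python
def equal_temperament_frequencies(low_note: int, high_note: int,
                                  a4_hz: float = 440.0) -> list:
    """Return centre frequencies in Hz for note numbers low_note..high_note.

    Note 69 is taken as A4 (an assumed convention), and each semitone step
    multiplies the frequency by 2 ** (1 / 12).
    """
    return [a4_hz * 2.0 ** ((n - 69) / 12.0)
            for n in range(low_note, high_note + 1)]


# One octave from A3 (note 57) to A4 (note 69): 13 bands, doubling in frequency.
bands = equal_temperament_frequencies(57, 69)
print(round(bands[0], 2), round(bands[-1], 2))  # 220.0 440.0
```

With bands placed this way, the neighbours of a sounded pitch are the adjacent semitones, which is the relationship the paragraph above relies on.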

The best mode for carrying out the present invention is described in detail below with reference to preferred embodiments.

FIG. 1 shows the overall configuration of an acoustic signal analysis apparatus according to a first embodiment of the present invention. In the figure, the acoustic signal analysis apparatus 1 includes an acoustic signal input unit 11, a frequency analysis unit 12, a peak element detection unit 13, a pitch region detection unit 14, and a music section detection unit 15. The processing units respectively include arithmetic processing circuits 11a to 15a, which perform the various arithmetic operations and control the respective units. Although not shown, the acoustic signal analysis apparatus 1 also includes a control unit with a CPU that executes the acoustic signal analysis program according to an embodiment of the present invention and controls the processing units. In the acoustic signal analysis apparatus 1, an acoustic signal 2 is input to the acoustic signal input unit 11 and processed along the signal flow shown in the figure, after which music section information 3 is output externally from the music section detection unit 15.

The acoustic signal 2 may be in any audio signal format, such as PCM data, an analog audio signal, or a digitally compressed signal.

The acoustic signal input unit 11 generates PCM data with a predetermined sampling frequency Fs from the input acoustic signal 2. Specifically, it performs analog-to-digital conversion when the acoustic signal 2 is an analog audio signal, and decoding when the acoustic signal 2 is a digitally compressed signal. When the acoustic signal 2 is PCM data, the acoustic signal input unit 11 passes it unchanged to the frequency analysis unit 12 in the next stage. If the sampling frequency of the input source differs from Fs, however, it performs rate conversion to obtain PCM data at the sampling frequency Fs.

In the following description, the PCM data output from the acoustic signal input unit 11 is written as acoustic data x[m] (m = 0 to L−1, where L is an integer of 1 or more giving the total number of samples), or simply as acoustic data.

The frequency analysis unit 12 receives the acoustic data output from the acoustic signal input unit 11, performs frequency analysis on it, and creates matrix data whose elements are the component intensities of predetermined frequency components for each unit time.

The processing of the frequency analysis unit 12 is described with reference to the flowchart of FIG. 2. In this embodiment, the frequency analysis unit 12 divides the acoustic data into fixed-length frames and processes it frame by frame. The well-known STFT (Short-Time Fourier Transform) is used as the frequency analysis method, but other methods such as a wavelet transform or a filter bank may be used instead. In this embodiment all frequency components are generated at the same time interval, but the time interval may also be varied according to the frequency band.

In the following description, the frame length is N and the frame shift length is S. The time length corresponding to the frame shift length S is the unit time.

Let M be the total number of frames. M is calculated by Equation 1.

The floor function above returns the integer obtained by discarding the fractional part. In this embodiment, L ≥ N is assumed. In the flowchart of FIG. 2, the arithmetic processing circuit 12a of the frequency analysis unit 12 first sets the control variable i, which indicates the frame number, to 0 (step S110). It then creates the i-th frame (step S120). That is, as shown in FIG. 3, N samples are extracted starting at the position offset by i×S samples from the head of the acoustic data and, as shown in Equation 2, multiplied by the window function w to generate the i-th frame data y[i][n] (n = 0 to N−1).

As the window function w, the Hamming window shown in Equation 3 can be used, for example. A rectangular window, a Hann (Hanning) window, a Blackman window, or the like may also be used.
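The framing step can be sketched as follows. Equations 1 to 3 are not reproduced in this text, so the forms used here are assumptions: the frame count is taken as M = floor((L − N) / S) + 1, which is consistent with the requirement L ≥ N and the offset i×S, and the Hamming window uses the standard textbook coefficients 0.54 and 0.46. The function names are hypothetical.

```python
import math

def frame_count(L, N, S):
    """Number of full frames; assumed form of Equation 1 (requires L >= N)."""
    return (L - N) // S + 1

def hamming(N):
    """Standard Hamming window of length N; assumed form of Equation 3."""
    if N == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def make_frame(x, i, N, S, w):
    """Frame i: N samples starting at offset i*S, multiplied by window w (step S120)."""
    start = i * S
    return [w[n] * x[start + n] for n in range(N)]
```

With this form of the frame count, the last frame always ends at or before sample L−1, so no zero padding is needed.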

Next, the arithmetic processing circuit 12a calculates the discrete Fourier transform (DFT) of the i-th frame by Equation 4 (step S130).

Next, using the real part Re{a[i][k]} and the imaginary part Im{a[i][k]} of the complex sequence a[i][k] (k = 0 to N−1) calculated in step S130, the arithmetic processing circuit 12a calculates the spectrum sequence b[i][k] (k = 0 to N/2−1) of the i-th frame by Equation 5 or Equation 6 (step S140).

When Equation 5 is applied, the spectrum sequence b[i][k] is a power spectrum; when Equation 6 is applied, it is an amplitude spectrum.
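Steps S130 and S140 can be illustrated with a direct DFT. This is a minimal sketch: the power spectrum is assumed to be Re² + Im² and the amplitude spectrum its square root, which are the usual definitions but are not confirmed by the (missing) Equations 5 and 6. A practical implementation would use an FFT; the O(N²) sum is kept here only for readability.

```python
import cmath

def dft(y):
    """Direct DFT of one frame y[0..N-1]; returns the complex sequence a[k], k = 0..N-1."""
    N = len(y)
    return [sum(y[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def spectrum(a, power=True):
    """Spectrum b[k] for k = 0..N/2-1: power (Re^2 + Im^2) or amplitude (|a[k]|)."""
    half = len(a) // 2
    if power:
        return [a[k].real ** 2 + a[k].imag ** 2 for k in range(half)]
    return [abs(a[k]) for k in range(half)]
```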

Next, from the DFT result, the arithmetic processing circuit 12a calculates the frequency component c[i][q] for frame i and frequency band q (q = 0 to Q−1, where Q is an integer of 1 or more giving the number of frequency bands) (step S150). For example, either of the following two calculation methods can be applied in step S150.

The first method of calculating the frequency component c[i][q] maps part or all of the spectrum sequence b[i][k] onto c[i][q] according to Equation 7.

Here, λ is a predetermined integer of 0 or more that determines the lowest frequency of the frequency bands. The number of frequency bands Q is set to a predetermined value not exceeding (N/2−λ). This first method has the advantage of requiring the least computation and being the simplest. However, the frequency bands are equally spaced and thus unrelated to the scale used in music, so the second method described below detects music sections more accurately.
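A plausible reading of the first method is a simple index shift, c[i][q] = b[i][q + λ]. Equation 7 is not reproduced here, so this mapping is an assumption inferred from the description of λ and the bound Q ≤ N/2 − λ; the function name is hypothetical.

```python
def bands_direct(b_i, lam, Q):
    """Map DFT bins lam .. lam+Q-1 of one frame's spectrum onto bands 0 .. Q-1.

    Assumed form of Equation 7: c[i][q] = b[i][q + lam].
    """
    if lam < 0 or Q > len(b_i) - lam:
        raise ValueError("Q must not exceed N/2 - lambda")
    return [b_i[q + lam] for q in range(Q)]
```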

The second method of calculating the frequency component c[i][q] uses Equation 8 to calculate frequency components that correspond to the musical scale.

Because this method yields frequency components that correspond to the frequencies of the scale used in music, music sections can be detected with higher accuracy.

Here, z[q][k] (q = 0 to Q−1, k = 0 to N/2−1) is a group of filters with band characteristics such as those shown in FIG. 4. The center frequency of each filter corresponds to a frequency of the scale. Each filter attenuates little at its center frequency, attenuates more as the frequency moves away from the center, and has a pass characteristic of 0 near the center frequencies of the adjacent filters. FIGS. 4(a) to 4(d) schematically show the band characteristics when pitch C1 is assigned to band 0, each subsequent semitone is assigned to one band, and finally pitch B6, five octaves higher, is assigned to band Q−1. FIG. 4(a) shows the filter that passes the frequency corresponding to C1, FIG. 4(b) the filter for C#1, a semitone above FIG. 4(a), FIG. 4(c) the filter for A#6, a semitone below the highest pitch, and FIG. 4(d) the filter for the highest pitch, B6.

The spectrum sequence b[i][k] is equally spaced in frequency, whereas in a musical scale the frequency interval between adjacent pitches widens toward the treble. The center frequencies of the filter group z[q][k] are therefore also spaced more widely toward the treble. For example, the difference between the center frequencies of z[Q−2][k] in FIG. 4(c) and z[Q−1][k] in FIG. 4(d) is larger than the difference between the center frequencies of z[0][k] in FIG. 4(a) and z[1][k] in FIG. 4(b). The bandwidth of each filter likewise widens toward the treble. For example, the bandwidth of z[Q−1][k] in FIG. 4(d) is wider than that of z[0][k] in FIG. 4(a).

The band characteristics shown in FIG. 4 are an example in which the frequency bands coincide with the pitches (semitones) of the scale, but frequency bands that divide each pitch of the scale more finely may also be used. In addition, by changing the center frequency settings, various tuning systems such as equal temperament, just intonation, and meantone temperament can be supported.
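The scale-aligned filter bank can be sketched with triangular filters. Everything specific here is an assumption for illustration: Equation 8 is taken to be the weighted sum c[i][q] = Σₖ z[q][k]·b[i][k], the filters are triangles that reach 1 at the center frequency and 0 at the neighbouring semitone centers, the tuning is equal temperament, and C1 is taken as about 32.70 Hz (A4 = 440 Hz). The actual filter shapes in FIG. 4 may differ.

```python
def semitone_centers(f0=32.7032, Q=72):
    """Equal-tempered center frequencies: band 0 = C1 (assumed 32.7032 Hz), one band per semitone."""
    return [f0 * 2 ** (q / 12) for q in range(Q)]

def triangular_bank(Fs, N, centers):
    """z[q][k] for k = 0..N/2-1: 1 at the center, falling linearly to 0 one semitone away."""
    ratio = 2 ** (1 / 12)
    z = []
    for fc in centers:
        lo, hi = fc / ratio, fc * ratio  # adjacent semitone centers
        row = []
        for k in range(N // 2):
            f = k * Fs / N  # frequency of DFT bin k
            if lo < f <= fc:
                row.append((f - lo) / (fc - lo))
            elif fc < f < hi:
                row.append((hi - f) / (hi - fc))
            else:
                row.append(0.0)
        z.append(row)
    return z

def apply_bank(z, b_i):
    """Assumed form of Equation 8: c[i][q] = sum over k of z[q][k] * b[i][k]."""
    return [sum(zk * bk for zk, bk in zip(row, b_i)) for row in z]
```

Because the filter edges scale with the center frequency, both the spacing and the bandwidth grow toward the treble, as the description of FIG. 4 requires. Other tuning systems would only change the list returned by `semitone_centers`.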

Returning to the flowchart of FIG. 2, after step S150 the arithmetic processing circuit 12a of the frequency analysis unit 12 increments the control variable i, which indicates the frame number, by 1 (step S160). It then determines whether i is smaller than the total number of frames M (step S170). If i is smaller than M, processing returns to step S120. If i is equal to or greater than M, all frames are regarded as processed and the frequency analysis ends.

When the above processing is finished, the frequency analysis unit 12 stores the frequency components c[i][q] (frame i = 0 to M−1, band q = 0 to Q−1) as matrix data in its memory. FIG. 5 schematically shows the matrix data created by the frequency analysis unit 12. In FIG. 5, the horizontal axis is the frame and the vertical axis is the frequency band, and elements with strong component intensity are drawn as black line segments. FIG. 5(a) shows matrix data for a speech section without music (for example, an announcement), which is characterized by having few frequency components that persist for a certain time or longer. FIG. 5(b), in contrast, shows a music section in which pitched instruments are sounding, which is characterized by many frequency components that persist for a certain time. FIG. 5(b) is also characterized by the presence of harmonic frequencies at integer multiples of the fundamental frequency of the instrument, in addition to the fundamental itself. In typical music, the frequency components at pitches adjacent to a sounded pitch are relatively small. The object of the present invention is to accurately exclude speech sections such as that in FIG. 5(a) and accurately detect music sections such as that in FIG. 5(b).

Next, the processing flow of the peak element detection unit 13 is described with reference to the flowchart of FIG. 6. The peak element detection unit 13 exploits two properties of a section in which a pitched instrument is sounding: the frequency component corresponding to the sounded pitch becomes large, and that component is large relative to the neighboring frequency components. Specifically, the peak element detection unit 13 reads the frequency components c[i][q] stored as matrix data in the frequency analysis unit 12 and detects a frequency component as a peak element when its value is equal to or greater than a predetermined threshold, when it is large relative to its neighboring frequency components, or when a harmonic component in a harmonic relationship with it is large relative to the frequency components neighboring that harmonic component.

First, the arithmetic processing circuit 13a of the peak element detection unit 13 sets the control variable i, which indicates the frame number at which the search starts, to 0 (step S210). It then sets the control variable q, which indicates the frequency band, to Q1, the lowest band in the target range for peak elements (step S220). Here, Q1 is a predetermined integer that is at least G2, at least G4, and at most Q2, where G2, G4, and Q2 are integers described later.

Next, the arithmetic processing circuit 13a determines whether the frequency component c[i][q] is a peak element (step S230). If it is, processing proceeds to step S240; if not, processing proceeds to step S250. As the specific determination method in step S230, any one of the seven methods described below, or a combination of several of them, can be used.

The first method of determining a peak element uses Equation 9: c[i][q] is determined to be a peak element when it is equal to or greater than the threshold α[q].

The threshold α[q] is set by one of the following two methods. The first method sets α[q] to a preset constant, which requires the least computation and is the simplest. The second method uses the per-band average of the frequency components over all M frames, as shown in Equation 10. Here, β is a preset constant.
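The second way of setting the threshold, together with the first peak test, might look like the sketch below. It assumes Equation 10 has the form α[q] = β · (1/M) · Σᵢ c[i][q] and Equation 9 is c[i][q] ≥ α[q]; the text supports both readings but the equations themselves are not shown. Function names are hypothetical.

```python
def band_mean_thresholds(c, beta):
    """alpha[q] = beta * mean over all M frames of c[i][q] (assumed form of Equation 10)."""
    M, Q = len(c), len(c[0])
    return [beta * sum(c[i][q] for i in range(M)) / M for q in range(Q)]

def is_peak_threshold(c, i, q, alpha):
    """First peak test (assumed form of Equation 9): c[i][q] >= alpha[q]."""
    return c[i][q] >= alpha[q]
```

Because the threshold follows the average level of each band, it adapts to the overall loudness of the recording, at the cost of one extra pass over the matrix.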

The second method of determining a peak element uses Equation 11.

In this method, the maximum frequency component among a predetermined number of frequency bands (the neighboring bands) at the same time as band q (the center band) is found on each side of the center band, and c[i][q] is determined to be a peak element when it is larger than both maxima. In Equation 11, G1 and G2 are integers satisfying G1 ≤ G2. When the frequency analysis unit 12 aligns each frequency band with a pitch (semitone) of the scale, G1 = G2 = 1 is sufficient. When frequency bands that divide each pitch of the scale more finely are used, G1 and G2 are set so that the neighboring bands correspond either to the frequencies of the adjacent pitches or to the unused frequencies between adjacent pitches. The symbol ∩ denotes a logical AND, and max is a function that returns the largest of its arguments.

Because this second method does not use a threshold that must be set in advance, it is particularly suitable when the level of the acoustic signal under analysis varies widely. It is also suitable when the rate of falsely detecting non-music sections as music sections must be kept low, or when the frequency resolution of the bands used in the frequency analysis unit 12 is relatively high.
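The second peak test can be written as a comparison against the largest neighbour on each side. The index ranges q−G2..q−G1 and q+G1..q+G2 are an assumption about Equation 11 based on the description of G1 and G2; the caller is expected to keep q within G2 ≤ q < Q−G2, as the bounds on Q1 and Q2 suggest.

```python
def is_peak_max(c, i, q, G1=1, G2=1):
    """Second peak test: c[i][q] exceeds the maximum neighbour on both sides.

    Neighbours are bands q-G2..q-G1 and q+G1..q+G2 of the same frame
    (assumed reading of Equation 11). Requires G2 <= q < Q - G2.
    """
    row = c[i]
    lower = max(row[q - g] for g in range(G1, G2 + 1))
    upper = max(row[q + g] for g in range(G1, G2 + 1))
    return row[q] > lower and row[q] > upper
```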

The third method of determining a peak element uses Equation 12.

In this method, the minimum frequency component among the neighboring bands is found on each side of the center band, and c[i][q] is determined to be a peak element when it is larger than both minima. G1, G2, and ∩ in Equation 12 are the same as described above, and min is a function that returns the smallest of its arguments. Because this third method does not use a threshold that must be set in advance, it is suitable when the level of the acoustic signal under analysis varies widely. It is also suitable when missed detections of music sections must be reduced, or when the frequency resolution of the bands used in the frequency analysis unit 12 is relatively low.
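The third test differs from the maximum-based test only in replacing max with min, which makes it more permissive. A minimal sketch, under the same assumption about the neighbour index ranges (q−G2..q−G1 and q+G1..q+G2):

```python
def is_peak_min(c, i, q, G1=1, G2=1):
    """Third peak test: c[i][q] exceeds the minimum neighbour on both sides.

    Assumed reading of Equation 12. Requires G2 <= q < Q - G2.
    """
    row = c[i]
    lower = min(row[q - g] for g in range(G1, G2 + 1))
    upper = min(row[q + g] for g in range(G1, G2 + 1))
    return row[q] > lower and row[q] > upper
```

With G1 = G2 = 1 the two tests coincide; they diverge only when several neighbouring bands are examined on each side, where a single strong neighbour blocks the max-based test but not this one.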

The fourth method of determining a peak element uses Equation 13.

In this method, c[i][q] is determined to be a peak element when the frequency component of the center band is larger than the sum of the frequency components of the neighboring bands multiplied by a fixed ratio γ. Here, G1 and G2 are the same as described above, and γ is a constant.
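A sketch of the fourth test follows. It assumes that the sum in Equation 13 runs over the neighbours on both sides of the center band (bands q−G2..q−G1 and q+G1..q+G2); the text does not say whether the two sides are summed together or tested separately, so this is one possible reading.

```python
def is_peak_ratio(c, i, q, gamma, G1=1, G2=1):
    """Fourth peak test: c[i][q] > gamma * (sum of neighbouring bands on both sides).

    Assumed reading of Equation 13. Requires G2 <= q < Q - G2.
    """
    row = c[i]
    neighbours = sum(row[q - g] + row[q + g] for g in range(G1, G2 + 1))
    return row[q] > gamma * neighbours
```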

In a section where a pitched instrument is sounding, there are generally d-th order harmonic components at d times the fundamental frequency, in addition to the fundamental itself. The fifth to seventh methods of determining a peak element, described below, exploit this property.

The fifth method of determining a peak element uses Equation 14.

In this method, the frequency band corresponding to the d-th harmonic of the center band (2 ≤ d ≤ D, where D is an integer of 2 or more), called the harmonic band, is identified, and the maximum frequency component among the bands neighboring the harmonic band is found on each side of it. c[i][q] is determined to be a peak element when, for every d = 2 to D, the frequency component of the harmonic band is larger than both maxima. Here, G3 and G4 are integers satisfying G3 ≤ G4. When the frequency analysis unit 12 aligns each frequency band with a pitch (semitone) of the scale, G3 = G4 = 1 is sufficient. When frequency bands that divide each pitch of the scale more finely are used, G3 and G4 are set so that the neighboring bands correspond either to the frequencies of the adjacent pitches or to the unused frequencies between adjacent pitches. The function h(d, q) returns the band number corresponding to d times the frequency of band q (the d-th harmonic).
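For semitone-aligned bands, one way to realize h(d, q) is to add the rounded number of semitones in the interval ratio d, that is q + round(12·log₂ d). This formula is an assumption, as is the handling of harmonics that fall outside the analysed range (treated here as failing the test); the text specifies neither.

```python
import math

def h(d, q, bands_per_octave=12):
    """Band of the d-th harmonic of band q, assuming equal-tempered, semitone-aligned bands."""
    return q + round(bands_per_octave * math.log2(d))

def is_peak_harmonic_max(c, i, q, D=2, G3=1, G4=1):
    """Fifth peak test: every harmonic band d = 2..D exceeds its maximum neighbour on both sides."""
    row = c[i]
    for d in range(2, D + 1):
        hq = h(d, q)
        if hq + G4 >= len(row):
            return False  # assumption: a harmonic outside the analysed bands fails the test
        lower = max(row[hq - g] for g in range(G3, G4 + 1))
        upper = max(row[hq + g] for g in range(G3, G4 + 1))
        if not (row[hq] > lower and row[hq] > upper):
            return False
    return True
```

Under this mapping the second, third, and fourth harmonics lie 12, 19, and 24 bands above the fundamental.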

The sixth method of determining a peak element uses Equation 15.

In this method, the frequency band corresponding to the d-th harmonic of the center band (2 ≤ d ≤ D, where D is an integer of 2 or more), that is, the harmonic band, is identified, and the minimum frequency component among the bands neighboring the harmonic band is found on each side of it. c[i][q] is determined to be a peak element when, for every d = 2 to D, the frequency component of the harmonic band is larger than both minima. Here, G3, G4, and h(d, q) are the same as described above.

The seventh method of determining a peak element uses Equation 16.

In this method, c[i][q] is determined to be a peak element when the frequency component of the d-th harmonic band (2 ≤ d ≤ D, where D is an integer of 2 or more) is larger than the sum of the frequency components of the bands neighboring that harmonic band multiplied by a fixed ratio. Here, G3, G4, and h(d, q) are the same as described above, and η[d] is a predetermined constant (ratio) that depends on the order of the harmonic.

Equations 11 to 16 above use only the frequency components of frame i, but the calculation is not limited to this. Frames close in time to frame i, such as i+1 and i−1, may also be used.

Furthermore, the first to seventh methods described above may be combined as appropriate with an AND condition, so that a component is determined to be a peak element only when all of the combined conditions are satisfied. For example, the first, second, and fifth methods may be combined with an AND condition as shown in Equation 17.

In particular, combining one of the second to fourth methods, which are conditions on the fundamental frequency, with one of the fifth to seventh methods, which are conditions on the harmonic components, under an AND condition makes it possible to detect music sections with higher accuracy.

Returning to the processing of the peak element detection unit 13 in FIG. 6: when the component is determined to be a peak element in step S230, the arithmetic processing circuit 13a stores the frame number i, the band number q, and the frequency component c[i][q] together as a data item (i, q, c[i][q]) in the peak element memory of the peak element detection unit 13, in the format shown in FIG. 12 (step S240).

Next, the arithmetic processing circuit 13a increments the control variable q, which indicates the frequency band, by 1 (step S250). It then determines whether q is less than or equal to Q2, the highest band in the target range for peak elements (step S260). Here, Q2 is a predetermined integer satisfying Q2 < Q−G2, Q2 < Q−G4, and Q2 ≥ Q1. If q is less than or equal to Q2, processing returns to step S230; otherwise, processing proceeds to step S270.

Next, the arithmetic processing circuit 13a increments the control variable i, which indicates the frame number, by 1 (step S270). It then determines whether i is less than the total number of frames M (step S280). If i is less than M, processing returns to step S220. If i is equal to or greater than M, the processing of the peak element detection unit 13 ends.
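The loop of steps S210 to S280 can be summarized as a double loop over frames and bands that records each detected peak. In this sketch the peak test is passed in as a function, so any of the seven methods (or an AND combination) can be plugged in; the list of tuples stands in for the peak element memory, and the names are hypothetical.

```python
def detect_peaks(c, Q1, Q2, is_peak):
    """Scan frames i = 0..M-1 and bands q = Q1..Q2; store (i, q, c[i][q]) for each peak element."""
    peaks = []
    for i in range(len(c)):              # steps S210, S270, S280
        for q in range(Q1, Q2 + 1):      # steps S220, S250, S260
            if is_peak(c, i, q):         # step S230
                peaks.append((i, q, c[i][q]))  # step S240
    return peaks

def local_max(c, i, q):
    """Example predicate: strictly larger than both adjacent bands (G1 = G2 = 1)."""
    return c[i][q] > c[i][q - 1] and c[i][q] > c[i][q + 1]
```

A combined condition can be supplied as, for example, `lambda c, i, q: local_max(c, i, q) and c[i][q] >= 2`.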

Next, the processing flow of the pitch region detection unit 14 is described with reference to the flowchart of FIG. 7. The pitch region detection unit 14 performs its processing by referring to the peak element memory of the peak element detection unit 13. First, the arithmetic processing circuit 14a of the pitch region detection unit 14 sets the control variable p, which indicates the frame number, to 0 (step S310). It then sets the control variable q, which indicates the frequency band, to the constant Q1 used in the peak element detection unit 13 (step S320).

Next, the arithmetic processing circuit 14a refers to the peak element memory of the peak element detection unit 13 and counts the number C of peak elements whose frame number is at least p and less than (p+H) and whose frequency band is q (step S330). Here, H is a preset integer.

At this point, the number C of peak elements may also be counted in a way that tolerates small fluctuations in frequency. For example, when c[i−1][q] and c[i+1][q] are peak elements and c[i][q] is not, but c[i][q−1] or c[i][q+1] is a peak element, a small frequency fluctuation is assumed to have occurred, and c[i][q] is treated as a peak element when counting C.
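The fluctuation-tolerant count can be sketched as below, with the peak element memory represented as a set of (frame, band) pairs. That representation and the function name are assumptions made for the example; the bridging rule itself follows the text.

```python
def count_peaks_tolerant(peak_set, p, H, q):
    """Count peak elements in frames p..p+H-1 of band q, bridging one-frame pitch wobbles.

    A missing (i, q) still counts when frames i-1 and i+1 have a peak in band q
    and frame i has a peak in an adjacent band (q-1 or q+1).
    """
    count = 0
    for i in range(p, p + H):
        if (i, q) in peak_set:
            count += 1
        elif ((i - 1, q) in peak_set and (i + 1, q) in peak_set
              and ((i, q - 1) in peak_set or (i, q + 1) in peak_set)):
            count += 1
    return count
```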

Next, the arithmetic processing circuit 14a determines whether the number C of peak elements counted in step S330 is equal to or greater than C1 (step S340). Here, C1 is an integer satisfying 0 < C1 ≤ H. If C is equal to or greater than C1, the region whose frame number is at least p and less than (p+H) and whose frequency band is q is determined to be a pitch region, and processing proceeds to step S350. If C is less than C1, processing proceeds to step S360.

Next, the arithmetic processing circuit 14a stores the first frame number p of the pitch region, the last frame number p+H−1 of the pitch region, and the band number q together in the pitch region memory of the pitch region detection unit 14, in the format shown in FIG. 13 (step S350).

Next, the arithmetic processing circuit 14a increments the control variable q, which indicates the frequency band, by 1 (step S360). It then determines whether q is less than or equal to the constant Q2 (step S370). Here, Q2 is the predetermined integer used in the peak element detection unit 13. If q is less than or equal to Q2, processing returns to step S330; otherwise, processing proceeds to step S380.

Next, the arithmetic processing circuit 14a increases the control variable p, which indicates the frame number, by H (step S380). It then determines whether p is less than (M−H) (step S390). Here, M is the total number of frames. If p is less than (M−H), processing returns to step S320. If p is equal to or greater than (M−H), the processing of the pitch region detection unit 14 ends.
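Steps S310 to S390 amount to sliding a non-overlapping window of H frames over each band and keeping the windows that contain at least C1 peak elements. The sketch below follows the flowchart literally, including the termination test p < M−H, and uses plain counting without the optional fluctuation tolerance. Representing the peak element memory as a set of (frame, band) pairs is an assumption.

```python
def detect_pitch_regions(peak_set, M, Q1, Q2, H, C1):
    """Return pitch regions as (first_frame, last_frame, band) tuples."""
    regions = []
    p = 0                                        # step S310
    while True:
        for q in range(Q1, Q2 + 1):              # steps S320, S360, S370
            C = sum(1 for i in range(p, p + H) if (i, q) in peak_set)  # step S330
            if C >= C1:                          # step S340
                regions.append((p, p + H - 1, q))  # step S350
        p += H                                   # step S380
        if p >= M - H:                           # step S390
            break
    return regions
```

Note that, read literally, the test in step S390 stops before a window that would start exactly at frame M−H, so up to 2H−1 trailing frames may go unexamined.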

Next, the processing flow of the music section detection unit 15 is described with reference to the flowchart of FIG. 8. The music section detection unit 15 performs its processing by referring to the pitch region memory of the pitch region detection unit 14. First, the arithmetic processing circuit 15a of the music section detection unit 15 sets the control variable i, which indicates the frame number, to 0 (step S410).

Next, the arithmetic processing circuit 15a refers to the pitch region memory of the pitch region detection unit 14 and counts the number E of pitch regions whose start position is i or more and whose end position is less than (i + J) (step S420). Here, J is a constant that determines the minimum length of a music section to be detected, and is an integer satisfying J ≧ H. Normally, J may be set to several times to several tens of times H.

Next, the arithmetic processing circuit 15a determines whether the number E of pitch regions counted in step S420 is equal to or greater than E1 (step S430). Here, E1 is an integer of 1 or more. If E is equal to or greater than E1, the section from frame number i to frame number (i + J - 1) inclusive is determined to be a music section, and the process proceeds to step S440. If E is less than E1, the process proceeds to step S450.

Next, the arithmetic processing circuit 15a associates the first frame number i of the music section with the last frame number (i + J - 1) of the music section, and stores them in the music section memory of the music section detection unit 15 in the format shown in FIG. 14 (step S440).

Next, the arithmetic processing circuit 15a increases the control variable i, which indicates the frame number, by J (step S450). It then determines whether i is less than (M - J) (step S460). Here, M is the total number of frames. If i is less than (M - J), the process returns to step S420. If i is equal to or greater than (M - J), the process proceeds to step S470.

Next, the arithmetic processing circuit 15a outputs the data in the music section memory as the music section information 3 and ends the processing of the music section detection unit 15 (step S470). When the last frame number of one music section and the first frame number of another music section are consecutive (for example, the second and third rows of FIG. 14), these sections may be merged into a single music section.
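The loop of steps S410 to S470, together with the optional merging of consecutive sections, can be sketched roughly as follows. This is only an illustrative sketch: the function names and the representation of the pitch region memory as a list of (start frame, end frame, band) tuples are assumptions made here, not the formats of FIG. 13 and FIG. 14. The loop keeps the termination test of step S460 as described, so the first window is always examined.

```python
def detect_music_sections(regions, M, J, E1):
    """Sketch of steps S410-S470 (names and data layout are assumptions)."""
    sections = []
    i = 0  # S410
    while True:
        # S420: count pitch regions that lie entirely inside frames [i, i + J)
        E = sum(1 for start, end, *_ in regions if start >= i and end < i + J)
        if E >= E1:  # S430
            sections.append((i, i + J - 1))  # S440
        i += J  # S450
        if not i < M - J:  # S460
            break
    return sections  # S470


def merge_adjacent(sections):
    """Optional merging of sections whose frame numbers are consecutive."""
    merged = []
    for start, end in sorted(sections):
        if merged and merged[-1][1] + 1 == start:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged
```

With J set to a multiple of H, each pitch region falls inside exactly one window, so every region is counted at most once.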

Here, the difference from the prior art is described with reference to FIG. 5. The prior art determines a section to be a music section when the sum of the difference values (first-order or second-order differences) between adjacent frequency components, accumulated over a predetermined time, exceeds a predetermined threshold. Consequently, when a particular frequency component is momentarily large, the difference values and their sum also become large, so that even a speech section such as that in FIG. 5(a) may be erroneously determined to be a music section. The prior art therefore has difficulty detecting music sections with sufficient accuracy across diverse input sources and diverse genres of music.

In contrast, as described above, this embodiment determines music sections without using the sum of difference values between adjacent frequency components. The peak element detection unit 13 determines, for each frame and each frequency band, whether the element is a peak element; the pitch region detection unit 14 determines a pitch region when C1 or more peak elements of the same frequency band exist within the predetermined time H; and the music section detection unit 15 determines a music section when the number of pitch regions within the predetermined time J is E1 or more. As a result, even if a particular frequency component is momentarily large in a speech section such as that in FIG. 5(a), the pitch region detection unit 14 is unlikely to determine it to be a pitch region, which suppresses erroneous determination of the section as a music section.

Furthermore, the frequency components used in the prior art do not coincide with the musical scale. Ideally, the difference between adjacent frequency components should correspond to the difference between adjacent pitches on the scale, but in practice it sometimes became a difference within the same pitch or a difference between distant pitches. For this reason, even in a section such as that in FIG. 5(b), where only pitched instruments are present, detection accuracy was sometimes insufficient depending on the pitch. In this embodiment, by contrast, the frequency analysis unit 12 can calculate frequency components corresponding to the musical scale, so that operations on adjacent frequency components correspond directly to operations on adjacent pitches, which improves the detection accuracy of music sections.

The overall configuration of the second embodiment of the present invention is the same as that shown in FIG. 1. The processing operations of the acoustic signal input unit 11, the frequency analysis unit 12, the peak element detection unit 13, and the music section detection unit 15 are the same as in the first embodiment, so their description is omitted.

The processing flow of the pitch region detection unit 14 in this embodiment is described with reference to the flowchart of FIG. 9. The pitch region detection unit 14 executes its processing with reference to the peak element memory of the peak element detection unit 13. First, the arithmetic processing circuit 14a of the pitch region detection unit 14 sets the control variable p, which indicates the frame number, to 0 (step S310). Next, the arithmetic processing circuit 14a sets the control variable q, which indicates the frequency band, to the constant Q1 used in the peak element detection unit 13 (step S320).

Next, the arithmetic processing circuit 14a refers to the peak element memory of the peak element detection unit 13 and counts the number C of peak elements whose frame number is p or more and less than (p + H) and whose frequency band is q (step S330). Here, H is a preset integer of 1 or more.

Next, the arithmetic processing circuit 14a refers to the peak element memory of the peak element detection unit 13 and, as shown in Equation 18, calculates the sum R of the values of the peak elements c[i][q] whose frame number is p or more and less than (p + H) and whose frequency band is q (step S331).
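Equation 18 itself is not reproduced in this text. Based only on the description of steps S330 and S331, the count C and the sum R could plausibly be written as follows, where the indicator \delta[i][q] is a symbol introduced here for illustration (1 if the element of frame i and band q is a peak element, 0 otherwise):

```latex
C = \sum_{i=p}^{p+H-1} \delta[i][q], \qquad
R = \sum_{i=p}^{p+H-1} \delta[i][q]\, c[i][q]
```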

Next, the arithmetic processing circuit 14a determines whether the number C of peak elements counted in step S330 is equal to or greater than C1 and the sum R calculated in step S331 is equal to or greater than a predetermined value R1 (step S332). Here, C1 is an integer satisfying 0 < C1 ≦ H. If C is equal to or greater than C1 and R is equal to or greater than R1, the region whose frame number is p or more and less than (p + H) and whose frequency band is q is determined to be a pitch region, and the process proceeds to step S350. Otherwise, the process proceeds to step S360.

Next, the arithmetic processing circuit 14a associates the frequency band q, the first frame number p of the pitch region, and the last frame number p + H - 1 of the pitch region with one another, and stores them in the pitch region memory of the pitch region detection unit 14 in the format shown in FIG. 13 (step S350).

Next, the arithmetic processing circuit 14a increments the control variable q, which represents the frequency band, by 1 (step S360). It then determines whether the control variable q is equal to or less than the constant Q2 (step S370). Here, Q2 is the predetermined integer used in the peak element detection unit 13. If q is equal to or less than Q2, the process returns to step S330; otherwise, it proceeds to step S380.

Next, the arithmetic processing circuit 14a increases the control variable p, which indicates the frame number, by H (step S380). It then determines whether p is less than (M - H) (step S390). Here, M is the total number of frames. If p is less than (M - H), the process returns to step S320; if p is equal to or greater than (M - H), the processing of the pitch region detection unit 14 ends.

In this embodiment, step S332 of the pitch region detection unit 14 determines a pitch region only when C1 or more peak elements of the same frequency band exist within the predetermined time H and the sum R of the peak element values is equal to or greater than the predetermined value R1. This further reduces the possibility of erroneously determining a speech section to be a music section.
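The flow of steps S310 to S390 in this embodiment can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the function name is hypothetical, and the peak element memory is represented here as a dictionary mapping (frame, band) to the element value c[i][q], which is not necessarily the format of FIG. 12. Passing R1=None reduces the test to the count-only condition of the first embodiment, and R is returned with each region so that the same sketch also covers the storage of step S351 in the third embodiment.

```python
def detect_pitch_regions(peaks, M, H, Q1, Q2, C1, R1=None):
    """Sketch of steps S310-S390 (names and data layout are assumptions).

    peaks: dict mapping (frame, band) -> value c[frame][band] of a peak element.
    Returns a list of (first frame, last frame, band, R) tuples.
    """
    regions = []
    p = 0  # S310
    while True:
        for q in range(Q1, Q2 + 1):  # S320, S360, S370
            values = [peaks[(i, q)] for i in range(p, p + H) if (i, q) in peaks]
            C = len(values)  # S330: number of peak elements in the window
            R = sum(values)  # S331: sum of the peak element values
            if C >= C1 and (R1 is None or R >= R1):  # S332
                regions.append((p, p + H - 1, q, R))  # S350
        p += H  # S380
        if not p < M - H:  # S390
            break
    return regions
```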

The overall configuration of the third embodiment of the present invention is the same as that shown in FIG. 1. The processing operations of the acoustic signal input unit 11, the frequency analysis unit 12, and the peak element detection unit 13 are the same as in the first embodiment, so their description is omitted.

The processing flow of the pitch region detection unit 14 in this embodiment is described based on the flowchart shown in FIG. 10. The steps in that figure other than step S351 are the same as in the second embodiment, so their description is omitted and only step S351 is described.

In step S351 of that figure, the arithmetic processing circuit 14a of the pitch region detection unit 14 associates the frequency band q, the first frame number p of the pitch region, the last frame number p + H - 1 of the pitch region, and the sum R calculated in step S331 described above with one another, and stores them in the pitch region memory of the pitch region detection unit 14 in the format shown in FIG. 15.

Next, the operation of the music section detection unit 15 is described with reference to the flowchart of FIG. 11. First, the arithmetic processing circuit 15a of the music section detection unit 15 sets the control variable i, which indicates the frame number, to 0 (step S410). Next, the arithmetic processing circuit 15a refers to the pitch region memory of the pitch region detection unit 14 and calculates the sum Y of the intensities R of the pitch regions whose start position is i or more and whose end position is less than (i + J) (step S421). Here, J is a constant that determines the minimum length of a music section to be detected, and is an integer satisfying J ≧ H. Normally, J may be set to several times to several tens of times H.

Next, the arithmetic processing circuit 15a determines whether Y calculated in step S421 is equal to or greater than Y1 (step S431). Here, Y1 is a predetermined constant. If Y is equal to or greater than Y1, the section from frame number i to frame number (i + J - 1) inclusive is determined to be a music section, and the process proceeds to step S440. If Y is less than Y1, the process proceeds to step S450.

Next, the arithmetic processing circuit 15a associates the first frame number i of the music section with the last frame number (i + J - 1) of the music section, and stores them in the music section memory of the music section detection unit 15 (step S440).

Next, the arithmetic processing circuit 15a increases the control variable i, which indicates the frame number, by J (step S450). It then determines whether i is less than (M - J) (step S460). Here, M is the total number of frames. If i is less than (M - J), the process returns to step S421; if i is equal to or greater than (M - J), the process proceeds to step S470.

Next, the arithmetic processing circuit 15a outputs the data in the music section memory as the music section information 3 and ends the processing of the music section detection unit 15 (step S470). When the last frame number of one music section and the first frame number of another music section are consecutive (for example, the second and third rows of FIG. 14), these sections may be merged into a single music section.

As described above, in this embodiment, step S332 of the pitch region detection unit 14 determines a pitch region when C1 or more peak elements of the same frequency band exist within the predetermined time H and the sum R of the peak element values is equal to or greater than the predetermined value R1. In addition, step S431 of the music section detection unit 15 determines a music section using the sum Y of the values of R within the predetermined time J. This embodiment is therefore well suited to detecting music sections in which pitched instruments are played relatively loudly but sound only a small number of notes.
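The intensity-based decision of steps S421 and S431 can be sketched roughly as follows. As before, this is an illustrative sketch: the function name is hypothetical, and the pitch region memory of FIG. 15 is assumed here to be a list of (first frame, last frame, band, R) tuples.

```python
def detect_music_sections_by_intensity(regions, M, J, Y1):
    """Sketch of steps S410, S421, S431, S440-S470 (layout is an assumption)."""
    sections = []
    i = 0  # S410
    while True:
        # S421: sum the intensities R of pitch regions inside frames [i, i + J)
        Y = sum(R for start, end, _band, R in regions if start >= i and end < i + J)
        if Y >= Y1:  # S431
            sections.append((i, i + J - 1))  # S440
        i += J  # S450
        if not i < M - J:  # S460
            break
    return sections  # S470
```

Because the decision uses the summed intensity rather than the number of regions, a window containing only one or two strong pitch regions can still qualify as a music section.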

As described above, the present invention can detect music sections from content that includes an acoustic signal, and is therefore useful for video equipment and content search apparatuses that have a function of dividing content into small sections (chapters) according to its contents.

FIG. 1 is a block diagram showing the overall configuration of the acoustic signal analysis apparatus 1 in the first to third embodiments of the present invention.
FIG. 2 is a flowchart for explaining the processing operation of the frequency analysis unit 12 in the first to third embodiments of the present invention.
FIG. 3 is a diagram for explaining the frame creation operation of the frequency analysis unit 12 in the first to third embodiments of the present invention.
FIG. 4 is a diagram schematically showing the characteristics of the filter group used in the frequency component calculation operation of the frequency analysis unit 12 in the first to third embodiments of the present invention.
FIG. 5 is a diagram schematically showing the characteristics of the matrix data created by the frequency analysis unit 12 in the first to third embodiments of the present invention.
FIG. 6 is a flowchart for explaining the processing operation of the peak element detection unit 13 in the first to third embodiments of the present invention.
FIG. 7 is a flowchart for explaining the processing operation of the pitch region detection unit 14 in the first embodiment of the present invention.
FIG. 8 is a flowchart for explaining the processing operation of the music section detection unit 15 in the first and second embodiments of the present invention.
FIG. 9 is a flowchart for explaining the processing operation of the pitch region detection unit 14 in the second embodiment of the present invention.
FIG. 10 is a flowchart for explaining the processing operation of the pitch region detection unit 14 in the third embodiment of the present invention.
FIG. 11 is a flowchart for explaining the processing operation of the music section detection unit 15 in the third embodiment of the present invention.
FIG. 12 is a diagram showing the data storage format of the peak element detection unit 13 in the first to third embodiments of the present invention.
FIG. 13 is a diagram showing the data storage format of the pitch region detection unit 14 in the first and second embodiments of the present invention.
FIG. 14 is a diagram showing the data storage format of the music section detection unit 15 in the first to third embodiments of the present invention.
FIG. 15 is a diagram showing the data storage format of the pitch region detection unit 14 in the third embodiment of the present invention.

Explanation of symbols

1 Acoustic signal analysis apparatus
2 Acoustic signal
3 Music section information
11 Acoustic signal input unit
12 Frequency analysis unit
13 Peak element detection unit
14 Pitch region detection unit
15 Music section detection unit
11a to 15a Arithmetic processing circuits

Claims

An acoustic signal analysis apparatus comprising:
frequency analysis means for dividing an acoustic signal into a plurality of frequency bands and generating frequency component data whose elements are the component intensities per unit time in each of the plurality of frequency bands;
peak element detection means for reading the elements of the frequency component data and detecting, as a peak element, an element that satisfies at least one of the following conditions: the read element is equal to or greater than a predetermined threshold; the read element is larger than a value calculated based on frequency component elements in the vicinity of the read element; or a harmonic element, which is an element corresponding to a harmonic component of the read element, is larger than a value calculated based on frequency component elements in the vicinity of the harmonic element;
pitch region detection means for counting, for each of the plurality of frequency bands, the number of the peak elements in each first section having a time length that includes the unit time, and detecting the first section as a pitch region in that frequency band when the count is equal to or greater than a first predetermined number; and
music section detection means for counting, for each second section having a time length that includes the first section, the number of the pitch regions over the plurality of frequency bands, and detecting the second section as a music section when the count is equal to or greater than a second predetermined number.

An acoustic signal analysis apparatus comprising:
frequency analysis means for dividing an acoustic signal into a plurality of frequency bands and generating frequency component data whose elements are the component intensities per unit time in each of the plurality of frequency bands;
peak element detection means for reading the elements of the frequency component data and detecting, as a peak element, an element that satisfies at least one of the following conditions: the read element is equal to or greater than a predetermined threshold; the read element is larger than a value calculated based on frequency component elements in the vicinity of the read element; or a harmonic element, which is an element corresponding to a harmonic component of the read element, is larger than a value calculated based on frequency component elements in the vicinity of the harmonic element;
pitch region detection means for obtaining, for each of the plurality of frequency bands, the number of the peak elements and the sum of the peak element values in each first section having a time length that includes the unit time, and detecting the first section as a pitch region in that frequency band when the number of the peak elements is equal to or greater than a first predetermined number and the sum of the peak element values is equal to or greater than a first predetermined value; and
music section detection means for counting, for each second section having a time length that includes the first section, the number of the pitch regions over the plurality of frequency bands, and detecting the second section as a music section when the count is equal to or greater than a second predetermined value.

An acoustic signal analysis apparatus comprising:
frequency analysis means for dividing an acoustic signal into a plurality of frequency bands and generating frequency component data whose elements are the component intensities per unit time in each of the plurality of frequency bands;
peak element detection means for reading the elements of the frequency component data and detecting, as a peak element, an element that satisfies at least one of the following conditions: the read element is equal to or greater than a predetermined threshold; the read element is larger than a value calculated based on frequency component elements in the vicinity of the read element; or a harmonic element, which is an element corresponding to a harmonic component of the read element, is larger than a value calculated based on frequency component elements in the vicinity of the harmonic element;
pitch region detection means for obtaining, for each of the plurality of frequency bands, the number of the peak elements and the sum of the peak element values in each first section having a time length that includes the unit time, detecting the first section as a pitch region in that frequency band when the number of the peak elements is equal to or greater than a first predetermined number and the sum of the peak element values is equal to or greater than a first predetermined value, and setting the sum of the peak element values as the pitch region intensity of that pitch region; and
music section detection means for calculating, for each second section having a time length that includes the first section, the sum of the pitch region intensities over the plurality of frequency bands, and detecting the second section as a music section when the sum is equal to or greater than a second predetermined value.

4. The acoustic signal analysis apparatus according to claim 1, wherein the first section has a time length including a plurality of the unit times, and the second section has a time length including a plurality of the first sections.

The acoustic signal analysis apparatus according to claim 1, wherein the plurality of frequency bands are frequency bands corresponding to the pitch frequencies of a musical scale.

The acoustic signal analysis apparatus according to any one of claims 1 to 5, wherein the unit time differs for each of the plurality of frequency bands.

The acoustic signal analysis apparatus, wherein the pitch region detection means uses, as the first predetermined number, a value smaller than the number of unit times included in the first section.

The acoustic signal analysis apparatus according to claim 1, wherein, when a peak element, an element that is not a peak element, and a peak element are temporally consecutive in one frequency band, the pitch region detection means treats the element that is not a peak element as a peak element.

An acoustic signal analysis method comprising:
a frequency analysis step of dividing an acoustic signal into a plurality of frequency bands and generating frequency component data whose elements are the component intensities per unit time in each of the plurality of frequency bands;
a peak element detection step of reading the elements of the frequency component data and detecting, as a peak element, an element that satisfies at least one of the following conditions: the read element is equal to or greater than a predetermined threshold; the read element is larger than a value calculated based on frequency component elements in the vicinity of the read element; or a harmonic element, which is an element corresponding to a harmonic component of the read element, is larger than a value calculated based on frequency component elements in the vicinity of the harmonic element;
a pitch region detection step of counting, for each of the plurality of frequency bands, the number of the peak elements in each first section having a time length that includes the unit time, and detecting the first section as a pitch region in that frequency band when the count is equal to or greater than a first predetermined number; and
a music section detection step of counting, for each second section having a time length that includes the first section, the number of the pitch regions over the plurality of frequency bands, and detecting the second section as a music section when the count is equal to or greater than a second predetermined number.

An acoustic signal analysis method comprising:
a frequency analysis step of dividing an acoustic signal into a plurality of frequency bands and generating frequency component data whose elements are the component intensities per unit time in each of the plurality of frequency bands;
a peak element detection step of reading the elements of the frequency component data and detecting, as a peak element, an element that satisfies at least one of the following conditions: the read element is equal to or greater than a predetermined threshold; the read element is larger than a value calculated based on frequency component elements in the vicinity of the read element; or a harmonic element, which is an element corresponding to a harmonic component of the read element, is larger than a value calculated based on frequency component elements in the vicinity of the harmonic element;
a pitch region detection step of obtaining, for each of the plurality of frequency bands, the number of the peak elements and the sum of the peak element values in each first section having a time length that includes the unit time, and detecting the first section as a pitch region in that frequency band when the number of the peak elements is equal to or greater than a first predetermined number and the sum of the peak element values is equal to or greater than a first predetermined value; and
a music section detection step of counting, for each second section having a time length that includes the first section, the number of the pitch regions over the plurality of frequency bands, and detecting the second section as a music section when the count is equal to or greater than a second predetermined value.

An acoustic signal analysis method comprising:
a frequency analysis step of dividing an acoustic signal into a plurality of frequency bands and generating frequency component data whose elements are the component intensities per unit time in each of the plurality of frequency bands;
a peak element detection step of reading the elements of the frequency component data and detecting, as a peak element, an element that satisfies at least one of the following conditions: the read element is equal to or greater than a predetermined threshold; the read element is larger than a value calculated based on frequency component elements in the vicinity of the read element; or a harmonic element, which is an element corresponding to a harmonic component of the read element, is larger than a value calculated based on frequency component elements in the vicinity of the harmonic element;
a pitch region detection step of obtaining, for each of the plurality of frequency bands, the number of the peak elements and the sum of the peak element values in each first section having a time length that includes the unit time, detecting the first section as a pitch region in that frequency band when the number of the peak elements is equal to or greater than a first predetermined number and the sum of the peak element values is equal to or greater than a first predetermined value, and setting the sum of the peak element values as the pitch region intensity of that pitch region; and
a music section detection step of calculating, for each second section having a time length that includes the first section, the sum of the pitch region intensities over the plurality of frequency bands, and detecting the second section as a music section when the sum is equal to or greater than a second predetermined value.

On the computer,
A frequency analysis step of dividing an acoustic signal into a plurality of frequency bands and generating frequency component data whose elements are the component intensities per unit time in each of the plurality of frequency bands;
A peak element detection step of reading the elements of the frequency component data and detecting, as a peak element, an element that satisfies at least one of the following conditions: the read element is equal to or greater than a predetermined threshold; the read element is larger than a value calculated based on frequency component elements in the vicinity of the read element; or a harmonic element, which is an element corresponding to a harmonic component of the read element, is larger than a value calculated based on frequency component elements in the vicinity of the harmonic element;
A pitch region detection step of counting, for each of the plurality of frequency bands, the number of the peak elements in each first section having a time length that includes the unit time, and detecting the first section as a pitch region in the frequency band when the count result is equal to or greater than a first predetermined number;
A music section detection step of counting, for each second section having a time length that includes the first section, the number of the pitch regions detected for each of the plurality of frequency bands, and detecting the second section as a music section when the count result is equal to or greater than a second predetermined number; an acoustic signal analysis program for causing the computer to execute the above steps.
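
The count-only pitch region detection step can be illustrated with the short sketch below. The function name `detect_pitch_regions`, the boolean table `is_peak[t][q]` (unit time t, band q), and the parameters `units_per_first` and `min_peaks` are hypothetical, and the sketch assumes that first sections are non-overlapping runs of unit times.

```python
from typing import List


def detect_pitch_regions(is_peak: List[List[bool]],
                         units_per_first: int,
                         min_peaks: int) -> List[List[bool]]:
    """Return regions[h][q]: True if first section h is a pitch region in band q."""
    n_bands = len(is_peak[0]) if is_peak else 0
    regions = []
    # Assumption: first sections do not overlap.
    for start in range(0, len(is_peak), units_per_first):
        window = is_peak[start:start + units_per_first]
        # A band qualifies when enough unit times in the window hold a peak
        # at that same band, i.e. the tone is sustained at one frequency.
        regions.append([
            sum(1 for frame in window if frame[q]) >= min_peaks
            for q in range(n_bands)
        ])
    return regions
```

Counting peaks per band over a window rewards tones that stay at one frequency, which is typical of musical notes and less typical of speech or noise.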

On the computer,
A frequency analysis step of dividing an acoustic signal into a plurality of frequency bands and generating frequency component data whose elements are the component intensities per unit time in each of the plurality of frequency bands;
A peak element detection step of reading the elements of the frequency component data and detecting, as a peak element, an element that satisfies at least one of the following conditions: the read element is equal to or greater than a predetermined threshold; the read element is larger than a value calculated based on frequency component elements in the vicinity of the read element; or a harmonic element, which is an element corresponding to a harmonic component of the read element, is larger than a value calculated based on frequency component elements in the vicinity of the harmonic element;
A pitch region detection step of counting, for each of the plurality of frequency bands, the number of the peak elements and the sum of the peak element values in each first section having a time length that includes the unit time, and detecting the first section as a pitch region in the frequency band when the number of the peak elements is equal to or greater than a first predetermined number and the sum of the peak element values is equal to or greater than a first predetermined value;
A music section detection step of counting, for each second section having a time length that includes the first section, the number of the pitch regions detected for each of the plurality of frequency bands, and detecting the second section as a music section when the count result is equal to or greater than a second predetermined value; an acoustic signal analysis program for causing the computer to execute the above steps.
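
The three alternative conditions of the peak element detection step can be sketched as follows. The claim does not fix how the neighbourhood value is calculated or where the harmonic element lies, so this sketch makes assumptions that are purely illustrative: the neighbourhood value is `factor` times the mean of the bands within `radius` of the element, and the harmonic element sits `harmonic_offset` bands above it (for example 12 if bands were spaced one semitone apart). All function and parameter names are hypothetical.

```python
from typing import List, Sequence


def neighbourhood_mean(frame: Sequence[float], q: int, radius: int) -> float:
    """Mean of the bands within `radius` of band q, excluding q itself."""
    lo, hi = max(0, q - radius), min(len(frame), q + radius + 1)
    neigh = [frame[k] for k in range(lo, hi) if k != q]
    return sum(neigh) / len(neigh) if neigh else 0.0


def is_peak_element(frame: Sequence[float], q: int, alpha: Sequence[float],
                    radius: int = 1, factor: float = 1.5,
                    harmonic_offset: int = 12) -> bool:
    """True if band q of one unit-time frame meets at least one condition."""
    # Condition 1: the element reaches the per-band threshold alpha[q].
    if frame[q] >= alpha[q]:
        return True
    # Condition 2 (assumed form): the element exceeds a scaled mean of
    # its neighbouring bands.
    if frame[q] > factor * neighbourhood_mean(frame, q, radius):
        return True
    # Condition 3 (assumed form): the harmonic element exceeds a scaled
    # mean of its own neighbouring bands.
    h = q + harmonic_offset
    if h < len(frame) and frame[h] > factor * neighbourhood_mean(frame, h, radius):
        return True
    return False


def detect_peaks(c: List[List[float]], alpha: Sequence[float],
                 **kwargs) -> List[List[bool]]:
    """Apply is_peak_element to every element c[t][q]."""
    return [[is_peak_element(frame, q, alpha, **kwargs)
             for q in range(len(frame))] for frame in c]
```

Because any single condition suffices, a weak fundamental can still be accepted when its harmonic stands out from the surrounding bands.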

On the computer,
A frequency analysis step of dividing an acoustic signal into a plurality of frequency bands and generating frequency component data whose elements are the component intensities per unit time in each of the plurality of frequency bands;
A peak element detection step of reading the elements of the frequency component data and detecting, as a peak element, an element that satisfies at least one of the following conditions: the read element is equal to or greater than a predetermined threshold; the read element is larger than a value calculated based on frequency component elements in the vicinity of the read element; or a harmonic element, which is an element corresponding to a harmonic component of the read element, is larger than a value calculated based on frequency component elements in the vicinity of the harmonic element;
A pitch region detection step of counting, for each of the plurality of frequency bands, the number of the peak elements and the sum of the peak element values in each first section having a time length that includes the unit time, detecting the first section as a pitch region in the frequency band when the number of the peak elements is equal to or greater than a first predetermined number and the sum of the peak element values is equal to or greater than a first predetermined value, and taking the sum of the peak element values as the pitch region intensity of the pitch region;
A music section detection step of calculating, for each second section having a time length that includes the first section, the sum of the pitch region intensities obtained for each of the plurality of frequency bands, and detecting the second section as a music section when the sum is equal to or greater than a second predetermined value; an acoustic signal analysis program for causing the computer to execute the above steps.
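
The frequency analysis step, which produces the data consumed by all later steps, could be approximated as in the sketch below. The claim does not prescribe a particular analysis method; this sketch substitutes a plain discrete Fourier transform over non-overlapping frames and groups DFT bins into bands, both of which are assumptions made for illustration. The function name `frequency_components` and the parameters `frame_len` and `band_edges` are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def frequency_components(signal: Sequence[float], frame_len: int,
                         band_edges: Sequence[int]) -> List[List[float]]:
    """Return c[t][q]: summed DFT magnitude of frame t in band q.

    Band q covers DFT bins band_edges[q] .. band_edges[q + 1] - 1.
    Assumption: one frame of `frame_len` samples is one unit time.
    """
    c = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        # Naive DFT magnitudes for the non-negative frequency bins.
        mags = [
            abs(sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n, x in enumerate(frame)))
            for k in range(frame_len // 2 + 1)
        ]
        c.append([sum(mags[lo:hi])
                  for lo, hi in zip(band_edges, band_edges[1:])])
    return c
```

The naive transform costs O(N²) per frame and is shown only for clarity; a practical implementation would more likely use an FFT or a bank of band-pass filters.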