JP3488626B2 - Video division method, apparatus and recording medium recording video division program - Google Patents
- Publication number
- JP3488626B2
- Authority
- JP
- Japan
- Prior art keywords
- video
- sound information
- image
- voice
- music
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Landscapes
- Electrophonic Musical Instruments (AREA)
- Television Signal Processing For Recording (AREA)
Description
[0001]
[Technical Field of the Invention] The present invention relates to a video division method and apparatus that analyze the background sound in the sound information contained in a video and divide the video based on the similarity of the resulting feature quantities, and to a recording medium on which a video division program is recorded.
[0002]
[Prior Art] Methods of dividing a video mainly rely on image information. One example detects cut points, which are the points where the camera switches, and divides the video into shots.
[0003]
[Problem to Be Solved by the Invention] One application of dividing image information by cut-point detection is a video presentation method that takes the first image of each shot as a still image representing that shot (a representative image) and displays these images side by side, so that the content of the video can be seen at a glance. However, because cut points occur frequently, the number of representative images becomes too large when a long video is processed. To reduce the number of representative images, the video must be divided more coarsely.
[0004] From the standpoint of video production, a set of shots forms a scene, so the video could instead be divided by scene. However, a scene is usually a sequence of shots of the same setting, and dividing a video into scenes automatically has been difficult.
[0005] An object of the present invention is to divide a video coarsely by exploiting the fact that background sounds are likely to be similar within the same scene.
[0006]
[Means for Solving the Problem] To achieve the above object, the present invention inputs a video, stores the input video, and detects music and speech in the sound information of the video. For the sections of the sound information that contain neither music nor speech, it extracts a feature quantity, namely a time-averaged spectrum obtained by frequency analysis of the sound information, and calculates a similarity as the mutual correlation of the extracted feature quantities. Sections with high similarity, together with the video containing the sound information sandwiched between those sections, are regarded as one coherent section and divided off as a segment. In this way the video is divided coarsely.
[0007]
[0008]
[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the schematic configuration of a video division apparatus according to an embodiment of the present invention. The video division apparatus of this embodiment comprises a video input unit 101 that inputs a video, a video storage unit 102 that stores the video, a music detection unit 103 that detects music, a speech detection unit 104 that detects speech, a feature extraction unit 105 that extracts a feature quantity from the sections containing neither music nor speech, and a video division unit 106 that calculates the similarity of the extracted feature quantities and divides off, as one segment, the sections with high similarity together with the video containing the sound information sandwiched between those sections.
[0009] FIG. 2 is a flowchart showing the processing flow of the video division apparatus according to an embodiment of the present invention. The processing flow is the same when the present invention is implemented in software. Each loop iteration processes a video segment of about one second.
[0010] First, the video is stored in video storage processing 201, and music detection processing 202 is applied to the sound information of the video. Decision 203 determines whether the sound is music; if it is, processing jumps to decision 208. If it is not music, speech detection processing 204 is applied. Decision 205 determines whether the sound is speech; if it is, processing jumps to decision 208. Music can be detected using the property that the peaks of the frequency spectrum of the sound information remain stable in frequency over time. For speech detection, methods such as one using a comb filter are effective (Minami et al., "Video Indexing by Sound Analysis," IEICE General Conference, D-12-64, 1997).
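The branching of decisions 203 and 205 can be illustrated with a minimal Python sketch. The function name `classify_segments` and the two detector callables are hypothetical; the actual music and speech detectors are not reproduced here, and the toy detectors in the usage lines only stand in for them.

```python
def classify_segments(segments, is_music, is_speech):
    """Classify each (roughly one-second) segment in the order of FIG. 2.

    `is_music` and `is_speech` are placeholder detectors supplied by the
    caller (assumption: any callable returning True or False).
    """
    kinds = []
    for audio in segments:               # the loop ends when the video ends (decision 208)
        if is_music(audio):              # music detection 202, decision 203
            kinds.append("music")
        elif is_speech(audio):           # speech detection 204, decision 205
            kinds.append("speech")
        else:                            # neither: background sound, passed on
            kinds.append("background")   # to feature extraction 206
    return kinds


# Toy usage: each "segment" is just a tag that the toy detectors read back.
toy_segments = ["music", "noise", "speech", "noise"]
kinds = classify_segments(
    toy_segments,
    is_music=lambda s: s == "music",
    is_speech=lambda s: s == "speech",
)
print(kinds)  # ['music', 'background', 'speech', 'background']
```

As in the flowchart, speech detection is attempted only on segments that were not judged to be music.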
[0011]
[0012] If the sound is not speech, the period is treated as background sound containing neither music nor speech, that is, as a segment corresponding to background sound, and feature extraction processing 206 is applied. Feature extraction processing 206 performs frequency analysis of the sound information and obtains the long-time average spectrum. The long-time average spectrum is the time average of the spectral power at each frequency.
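A long-time average spectrum of this kind can be sketched in Python as follows. The text does not specify the frequency analysis, so the naive DFT, the frame length of 64 samples, the non-overlapping rectangular frames, and both function names are assumptions made only for illustration.

```python
import cmath
import math


def power_spectrum(frame):
    """Return the power |X[k]|^2 for bins k = 0 .. N/2 of a naive DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(abs(acc) ** 2)
    return spectrum


def long_time_average_spectrum(samples, frame_len=64):
    """Average the per-frame power spectra over all full frames of a segment."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    if not frames:
        raise ValueError("segment is shorter than one frame")
    avg = [0.0] * (frame_len // 2 + 1)
    for frame in frames:
        for k, p in enumerate(power_spectrum(frame)):
            avg[k] += p
    return [p / len(frames) for p in avg]


# Usage: four frames of a pure tone that falls exactly on bin 8.
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64 * 4)]
ltas = long_time_average_spectrum(tone)
peak_bin = max(range(len(ltas)), key=ltas.__getitem__)
print(peak_bin)  # 8
```

A practical implementation would more likely use an FFT with windowed, overlapping frames; the naive DFT is chosen here only to keep the sketch free of external dependencies.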
[0013] Next, video division processing 207 computes the correlation between the long-time average spectrum calculated one loop earlier and the current long-time average spectrum. If the correlation is high, the two segments are regarded as belonging to the same scene and are labeled accordingly. Music or speech segments lying between the two segments whose correlation was computed are also labeled as belonging to the same scene. The label information is saved in the video storage unit 102 together with the time information of the segments.
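The labeling step can be sketched in Python as follows. Several details are assumptions: the text only says that the correlation is "high", so the Pearson coefficient and the threshold of 0.9 are illustrative choices; the comparison is made against the most recent background segment; and music or speech segments whose neighboring background segments do not correlate are left unlabeled (`None`), because the text does not say how to treat them. The names `correlation` and `label_scenes` are hypothetical.

```python
import math


def correlation(a, b):
    """Pearson correlation coefficient between two equal-length spectra."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                     * sum((y - mean_b) ** 2 for y in b))
    return cov / norm if norm > 0 else 0.0


def label_scenes(segments, threshold=0.9):
    """Label segments with scene numbers.

    `segments` is a list of (kind, spectrum) pairs; kind is "background",
    "music" or "speech", and spectrum is None unless kind is "background".
    """
    labels = [None] * len(segments)
    scene = 0
    prev = None                      # index of the previous background segment
    for i, (kind, spectrum) in enumerate(segments):
        if kind != "background":
            continue
        if prev is not None:
            if correlation(segments[prev][1], spectrum) >= threshold:
                # Same scene: the sandwiched music/speech segments join it too.
                for j in range(prev + 1, i):
                    labels[j] = scene
            else:
                scene += 1           # low correlation: a new scene starts here
        labels[i] = scene
        prev = i
    return labels


segments = [
    ("background", [1.0, 5.0, 2.0, 1.0]),
    ("speech", None),
    ("background", [1.1, 4.8, 2.1, 0.9]),
    ("music", None),
    ("background", [4.0, 1.0, 1.0, 5.0]),
]
labels = label_scenes(segments)
print(labels)  # [0, 0, 0, None, 1]
```

In the example, the first two background spectra correlate strongly, so the speech segment between them joins scene 0, while the third background spectrum starts scene 1.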
[0014] Although the division of video has been described above, this mode of division can be held in the form of a program executable by a data processing apparatus, and the present invention also covers a recording medium on which such a program is recorded.
[0015]
[Effects of the Invention] (1) The inventions of claims 1, 2 and 3 input a video, store the input video, and detect music and speech in the sound information of the video. For the sections of the sound information that contain neither music nor speech, they extract a feature quantity, namely a time-averaged spectrum obtained by frequency analysis of the sound information, and calculate a similarity as the mutual correlation of the extracted feature quantities. Sections with high similarity, together with the video containing the sound information sandwiched between those sections, can then be regarded as one coherent section and divided off as a segment, which makes it possible to divide the video coarsely.
[0016]
[Brief Description of the Drawings]
[FIG. 1] A block diagram showing the schematic configuration of a video division apparatus according to an embodiment of the present invention.
[FIG. 2] A flowchart showing the processing flow of the video division apparatus according to an embodiment of the present invention, and the processing flow when the present invention is implemented in software.
101 Video input unit
102 Video storage unit
103 Music detection unit
104 Speech detection unit
105 Feature extraction unit
106 Video division unit
201 Video storage processing
202 Music detection processing
203 Music determination processing
204 Speech detection processing
205 Speech determination processing
206 Feature extraction processing
207 Video division processing
208 Video end determination processing
Continuation of the front page
(56) References: JP-A-9-214879 (JP, A); JP-A-8-95596 (JP, A)
(58) Fields searched (Int. Cl.7, DB name): H04N 5/76 - 5/956; G10L 3/00
Claims (3)
1. A video division method for dividing a given video in correspondence with scenes, the method comprising:
a video input step of inputting a video;
a video storage step of storing the video;
a music detection step of detecting music from sound information in the video;
a speech detection step of detecting speech from the sound information;
a feature extraction step of extracting, for sections of the sound information that contain neither music nor speech, a feature quantity that is a time-averaged spectrum obtained by frequency analysis of the sound information; and
a video division step of dividing the video into a plurality of segments by calculating a similarity as the mutual correlation of the extracted feature quantities, regarding the video of sections with high similarity and the video containing the sound information sandwiched between those sections as one coherent section, and grouping them as a segment.
2. A video division apparatus for dividing a given video in correspondence with scenes, the apparatus comprising:
a video input unit that inputs a video;
a video storage unit that stores the video;
a music detection unit that detects music from sound information in the video;
a speech detection unit that detects speech from the sound information;
a feature extraction unit that extracts, for sections of the sound information that contain neither music nor speech, a feature quantity that is a time-averaged spectrum obtained by frequency analysis of the sound information; and
a video division unit that divides the video into a plurality of segments by calculating a similarity as the mutual correlation of the extracted feature quantities, regarding the video of sections with high similarity and the video containing the sound information sandwiched between those sections as one coherent section, and grouping them as a segment.
3. A recording medium on which is recorded a program for dividing a given video in correspondence with scenes, the recording medium having recorded thereon a video division program for causing a computer to execute:
video input processing of inputting a video;
video storage processing of storing the video;
music detection processing of detecting music from sound information in the video;
speech detection processing of detecting speech from the sound information;
feature extraction processing of extracting, for sections of the sound information that contain neither music nor speech, a feature quantity that is a time-averaged spectrum obtained by frequency analysis of the sound information; and
video division processing of dividing the video into a plurality of segments by calculating a similarity as the mutual correlation of the extracted feature quantities, regarding the video of sections with high similarity and the video containing the sound information sandwiched between those sections as one coherent section, and grouping them as a segment.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP06816098A JP3488626B2 (en) | 1998-03-18 | 1998-03-18 | Video division method, apparatus and recording medium recording video division program |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP06816098A JP3488626B2 (en) | 1998-03-18 | 1998-03-18 | Video division method, apparatus and recording medium recording video division program |
Publications (2)
Publication Number | Publication Date |
---|---|
JPH11266428A JPH11266428A (en) | 1999-09-28 |
JP3488626B2 true JP3488626B2 (en) | 2004-01-19 |
Family
ID=13365737
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP06816098A Expired - Lifetime JP3488626B2 (en) | 1998-03-18 | 1998-03-18 | Video division method, apparatus and recording medium recording video division program |
Country Status (1)
Country | Link |
---|---|
JP (1) | JP3488626B2 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016080660A1 (en) * | 2014-11-18 | 2016-05-26 | Samsung Electronics Co., Ltd. | Content processing device and method for transmitting segment of variable size |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114586068A (en) * | 2019-10-28 | 2022-06-03 | 索尼集团公司 | Information processing apparatus, proposal apparatus, information processing method, and proposal method |
-
1998
- 1998-03-18 JP JP06816098A patent/JP3488626B2/en not_active Expired - Lifetime
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016080660A1 (en) * | 2014-11-18 | 2016-05-26 | Samsung Electronics Co., Ltd. | Content processing device and method for transmitting segment of variable size |
US9910919B2 (en) | 2014-11-18 | 2018-03-06 | Samsung Electronics Co., Ltd. | Content processing device and method for transmitting segment of variable size, and computer-readable recording medium |
Also Published As
Publication number | Publication date |
---|---|
JPH11266428A (en) | 1999-09-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7127120B2 (en) | Systems and methods for automatically editing a video | |
US8467610B2 (en) | Video summarization using sparse basis function combination | |
US7483624B2 (en) | System and method for indexing a video sequence | |
JPH10294923A (en) | Scene change detection method and scene change detector | |
Kolekar et al. | Semantic concept mining based on hierarchical event detection for soccer video indexing | |
WO2014022254A2 (en) | Identifying key frames using group sparsity analysis | |
JP2960939B2 (en) | Scene extraction processing method | |
CN102222104A (en) | Method for intelligently extracting video abstract based on time-space fusion | |
Chen et al. | Detection of soccer goal shots using joint multimedia features and classification rules | |
Chen et al. | Scene change detection by audio and video clues | |
CN102067589A (en) | Digital video recorder system and operating method thereof | |
Wang et al. | Soccer replay detection using scene transition structure analysis | |
KR101195613B1 (en) | Apparatus and method for partitioning moving image according to topic | |
JP3488626B2 (en) | Video division method, apparatus and recording medium recording video division program | |
Truong et al. | Improved fade and dissolve detection for reliable video segmentation | |
JP5096259B2 (en) | Summary content generation apparatus and summary content generation program | |
Ma et al. | An indexing and browsing system for home video | |
CN1330175C (en) | Content editing device | |
JP3785068B2 (en) | Video analysis apparatus, video analysis method, video analysis program, and program recording medium | |
JP2008236729A (en) | Method and apparatus for generating digest | |
JP3469122B2 (en) | Video segment classification method and apparatus for editing, and recording medium recording this method | |
Černeková et al. | Entropy metrics used for video summarization | |
Volkmer et al. | Gradual transition detection using average frame similarity | |
Lehane et al. | Action Sequence Detection in Motion Pictures. | |
EP2401700B1 (en) | Digital data stream processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20071031; Year of fee payment: 4 |
| FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20081031; Year of fee payment: 5 |
| FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20091031; Year of fee payment: 6 |
| FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20101031; Year of fee payment: 7 |
| FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20101031; Year of fee payment: 7 |
| FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20111031; Year of fee payment: 8 |
| FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20111031; Year of fee payment: 8 |
| FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20121031; Year of fee payment: 9 |
| FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20121031; Year of fee payment: 9 |
| FPAY | Renewal fee payment (event date is renewal date of database) | Free format text: PAYMENT UNTIL: 20131031; Year of fee payment: 10 |
| S531 | Written request for registration of change of domicile | Free format text: JAPANESE INTERMEDIATE CODE: R313531 |
| R350 | Written notification of registration of transfer | Free format text: JAPANESE INTERMEDIATE CODE: R350 |
| EXPY | Cancellation because of completion of term | |