JP3488626B2 - Video division method, apparatus and recording medium recording video division program - Google Patents

Video division method, apparatus and recording medium recording video division program

Info

Publication number
JP3488626B2
Authority
JP
Japan
Prior art keywords
video
sound information
image
voice
music
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP06816098A
Other languages
Japanese (ja)
Other versions
JPH11266428A (en)
Inventor
憲一 南
明人 阿久津
佳伸 外村
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP06816098A
Publication of JPH11266428A
Application granted
Publication of JP3488626B2
Anticipated expiration
Expired - Lifetime

Landscapes

  • Electrophonic Musical Instruments (AREA)
  • Television Signal Processing For Recording (AREA)

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a video division method and apparatus that analyze the background sound in the sound information contained in a video and divide the video based on the similarity of the extracted feature values, and to a recording medium on which a video division program is recorded.

[0002]

[Prior Art] Most methods of dividing a video rely on image information. For example, one such method detects cut points, where the camera switches, and divides the video into shots.

[0003]

[Problems to Be Solved by the Invention] One application of dividing video by cut-point detection is a video presentation method in which the first frame of each shot is taken as a still image representing that shot (a representative image), and the representative images are laid out spatially so that the content of the video can be viewed at a glance. Because cut points occur frequently, however, the number of representative images becomes too large when the video is long. Reducing the number of representative images requires dividing the video more coarsely.

[0004] From the viewpoint of video production, a set of shots forms a scene, so the video could instead be divided scene by scene. However, a scene is usually a succession of shots of the same setting, and dividing a video into scenes automatically has been difficult.

[0005] An object of the present invention is to divide a video coarsely by exploiting the fact that the background sound is likely to be similar throughout the same scene.

[0006]

[Means for Solving the Problems] To achieve the above object, the present invention inputs a video; stores the input video; detects music and voice in the sound information of the video; for the sections of the sound information that contain neither music nor voice, extracts as a feature value the time-averaged spectrum obtained by frequency analysis of that sound information; calculates a similarity as the correlation between the extracted feature values; and treats the sections with high similarity, together with the video containing the sound information sandwiched between them, as one coherent section and separates it as a segment. The video is thereby divided coarsely.

[0007]

[0008]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the schematic configuration of a video division apparatus according to one embodiment of the present invention. The video division apparatus of this embodiment comprises a video input unit 101 that inputs a video; a video storage unit 102 that stores the video; a music detection unit 103 that detects music; a voice detection unit 104 that detects voice; a feature extraction unit 105 that extracts a feature value from the sections containing neither music nor voice; and a video division unit 106 that calculates the similarity between the extracted feature values and separates, as one segment, the sections with high similarity together with the video containing the sound information sandwiched between them.

[0009] FIG. 2 is a flowchart showing the processing flow of the video division apparatus according to one embodiment of the present invention. The processing flow is the same when the present invention is implemented in software. Each pass through the loop processes a video segment of about one second.
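As a rough illustration of the loop unit, the sketch below cuts an audio track into consecutive fixed-length chunks and keeps the start and end time of each chunk. The name `iter_segments`, the sample rate, and the exact one-second length are assumptions made for illustration; the patent only states that each pass handles a segment of about one second.

```python
def iter_segments(samples, sample_rate, seconds=1.0):
    """Yield (start_s, end_s, chunk) for consecutive fixed-length chunks.

    The last chunk may be shorter than the others.
    """
    step = int(sample_rate * seconds)
    if step <= 0:
        raise ValueError("segment length must cover at least one sample")
    for start in range(0, len(samples), step):
        chunk = samples[start:start + step]
        yield start / sample_rate, (start + len(chunk)) / sample_rate, chunk


# Usage: 20 samples at a hypothetical 8 Hz rate give two full chunks and one partial chunk.
segments = list(iter_segments(list(range(20)), sample_rate=8))
```

Keeping the times alongside each chunk is convenient because a later step stores segment time information together with the scene label.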

[0010] First, the video is stored in video storage processing 201, and music detection processing 202 is applied to the sound information of the video. Decision 203 determines whether the sound is music; if it is, processing jumps to decision 208. If it is not music, voice detection processing 204 is applied. Decision 205 determines whether the sound is voice; if it is, processing jumps to decision 208. Music can be detected effectively by using the characteristic that the peaks of the frequency spectrum of the sound information are temporally stable in the frequency direction, and voice can be detected effectively by, for example, a method using a comb filter (Minami et al., "Video Indexing by Sound Analysis," IEICE General Conference, D-12-64, 1997).
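The music cue described above (spectral peaks that stay put in frequency over time) can be sketched as a simple peak-tracking test. This is only an illustrative heuristic, not the method of the cited paper: the function names and the thresholds `max_drift` and `min_stable_ratio` are hypothetical, only the single strongest peak per frame is tracked, and comb-filter voice detection is not shown.

```python
def peak_bins(spectra):
    """Return the index of the strongest frequency bin in each frame."""
    return [max(range(len(s)), key=s.__getitem__) for s in spectra]


def looks_like_music(spectra, max_drift=1, min_stable_ratio=0.8):
    """Guess 'music' when the strongest peak rarely moves between frames.

    spectra: list of per-frame power spectra (lists of floats).
    max_drift, min_stable_ratio: hypothetical tuning values.
    """
    peaks = peak_bins(spectra)
    if len(peaks) < 2:
        return False
    stable = sum(1 for a, b in zip(peaks, peaks[1:]) if abs(a - b) <= max_drift)
    return stable / (len(peaks) - 1) >= min_stable_ratio


# Usage: a held note keeps its peak in bin 2; the noisy frames jump around.
steady = [[0.0, 1.0, 9.0, 1.0, 0.0] for _ in range(6)]
noisy = [
    [9.0, 0.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0, 9.0],
    [0.0, 9.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 9.0, 1.0],
    [9.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 9.0],
]
steady_is_music = looks_like_music(steady)
noisy_is_music = looks_like_music(noisy)
```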

[0011]

[0012] If the sound is not voice, the period is regarded as background sound containing neither music nor voice, that is, as a segment corresponding to background sound, and feature extraction processing 206 is applied to it. Feature extraction processing 206 performs frequency analysis on the sound information and obtains the long-term average spectrum. The long-term average spectrum is the temporal average of the spectral power at each frequency.
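A minimal sketch of the long-term average spectrum follows: the segment is cut into short frames, the power spectrum of each frame is computed, and the power in each frequency bin is averaged over the frames. The frame length, hop size, absence of a window function, and the naive DFT are all assumptions chosen to keep the example dependency-free; the patent does not specify them.

```python
import cmath
import math


def power_spectrum(frame):
    """Power |X[k]|^2 for bins k = 0..N/2 of one frame, using a naive DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def long_term_average_spectrum(samples, frame_len=64, hop=32):
    """Average the per-frame power spectra over all full frames of a segment."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, hop)]
    if not frames:
        raise ValueError("segment is shorter than one frame")
    spectra = [power_spectrum(f) for f in frames]
    return [sum(s[k] for s in spectra) / len(spectra)
            for k in range(len(spectra[0]))]


# Usage: a pure tone with exactly 8 cycles per 64-sample frame peaks in bin 8.
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
ltas = long_term_average_spectrum(tone)
peak_bin = max(range(len(ltas)), key=ltas.__getitem__)
```

In practice an FFT routine would replace the naive DFT; the averaging step is the part that corresponds to the description above.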

[0013] Next, video division processing 207 computes the correlation between the long-term average spectrum calculated one loop earlier and the current long-term average spectrum. If the correlation is high, the two segments are regarded as belonging to the same scene and are labeled accordingly. Any music or voice segments lying between the two correlated segments are labeled as belonging to the same scene as well. The label information is saved in the video storage unit 102 together with the time information of the segments.
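The labeling step can be sketched as follows. Several details are assumptions, since the text does not fix them: the Pearson coefficient stands in for "correlation", the threshold 0.9 is hypothetical, the comparison uses the most recent background spectrum rather than strictly the previous loop, and music or voice segments (represented as `None`) that precede a scene change stay with the earlier scene.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length spectra."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def label_scenes(spectra, threshold=0.9):
    """Assign a scene label to every segment.

    spectra: one entry per segment; a long-term average spectrum for a
    background-sound segment, or None for a music/voice segment.
    """
    labels = []
    current = 0
    previous = None  # most recent background spectrum seen so far
    for spec in spectra:
        if spec is not None:
            if previous is not None and pearson(previous, spec) < threshold:
                current += 1  # background changed: start a new scene
            previous = spec
        labels.append(current)  # music/voice inherits the running label
    return labels


# Usage: two street-like segments around a music segment, then a different room.
street_a = [1.0, 5.0, 2.0, 1.0]
street_b = [1.1, 5.2, 2.0, 0.9]
room_a = [4.0, 1.0, 1.0, 6.0]
room_b = [4.2, 1.1, 0.9, 6.1]
labels = label_scenes([street_a, None, street_b, room_a, None, room_b])
# labels == [0, 0, 0, 1, 1, 1]
```

Because a music or voice segment simply inherits the running label, it ends up in the same scene as the two correlated background segments on either side of it.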

[0014] Although the division of video has been described above, the division procedure can be held in the form of a program executable by a data processing apparatus, and the present invention also covers a recording medium on which such a program is recorded.

[0015]

[Effects of the Invention] (1) The inventions of claims 1, 2 and 3 input a video, store the input video, detect music and voice in the sound information of the video, extract as a feature value the time-averaged spectrum obtained by frequency analysis of the sound information in the sections that contain neither music nor voice, calculate a similarity as the correlation between the extracted feature values, and treat the sections with high similarity, together with the video containing the sound information sandwiched between them, as one coherent section that forms a segment. This makes it possible to divide a video coarsely.

[0016]

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the schematic configuration of a video division apparatus according to one embodiment of the present invention.

FIG. 2 is a flowchart showing the processing flow of the video division apparatus according to one embodiment of the present invention, which is also the processing flow when the present invention is implemented in software.

[Explanation of Reference Numerals]

101 Video input unit
102 Video storage unit
103 Music detection unit
104 Voice detection unit
105 Feature extraction unit
106 Video division unit
201 Video storage processing
202 Music detection processing
203 Music determination processing
204 Voice detection processing
205 Voice determination processing
206 Feature extraction processing
207 Video division processing
208 Video end determination processing

Continuation of the front page
(56) References: JP-A-9-214879 (JP, A); JP-A-8-95596 (JP, A)
(58) Fields searched (Int. Cl.7, DB name): H04N 5/76 - 5/956; G10L 3/00

Claims (3)

(57) [Claims]

1. A video division method for dividing a given video in correspondence with scenes, the method comprising:
a video input step of inputting a video;
a video storage step of storing the video;
a music detection step of detecting music from sound information in the video;
a voice detection step of detecting voice from the sound information;
a feature extraction step of extracting, for sections of the sound information that contain neither music nor voice, a feature value that is a time-averaged spectrum obtained by frequency analysis of the sound information; and
a video division step of calculating a similarity as the correlation between the extracted feature values and dividing the video into a plurality of segments by treating the video of sections with high similarity, together with the video containing the sound information sandwiched between those sections, as one coherent section and grouping it into a segment.
2. A video division apparatus for dividing a given video in correspondence with scenes, the apparatus comprising:
a video input unit that inputs a video;
a video storage unit that stores the video;
a music detection unit that detects music from sound information in the video;
a voice detection unit that detects voice from the sound information;
a feature extraction unit that extracts, for sections of the sound information that contain neither music nor voice, a feature value that is a time-averaged spectrum obtained by frequency analysis of the sound information; and
a video division unit that calculates a similarity as the correlation between the extracted feature values and divides the video into a plurality of segments by treating the video of sections with high similarity, together with the video containing the sound information sandwiched between those sections, as one coherent section and grouping it into a segment.
3. A recording medium on which is recorded a program for dividing a given video in correspondence with scenes, the recording medium having recorded thereon a video division program for causing a computer to execute:
video input processing of inputting a video;
video storage processing of storing the video;
music detection processing of detecting music from sound information in the video;
voice detection processing of detecting voice from the sound information;
feature extraction processing of extracting, for sections of the sound information that contain neither music nor voice, a feature value that is a time-averaged spectrum obtained by frequency analysis of the sound information; and
video division processing of calculating a similarity as the correlation between the extracted feature values and dividing the video into a plurality of segments by treating the video of sections with high similarity, together with the video containing the sound information sandwiched between those sections, as one coherent section and grouping it into a segment.
JP06816098A 1998-03-18 1998-03-18 Video division method, apparatus and recording medium recording video division program Expired - Lifetime JP3488626B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP06816098A JP3488626B2 (en) 1998-03-18 1998-03-18 Video division method, apparatus and recording medium recording video division program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP06816098A JP3488626B2 (en) 1998-03-18 1998-03-18 Video division method, apparatus and recording medium recording video division program

Publications (2)

Publication Number Publication Date
JPH11266428A JPH11266428A (en) 1999-09-28
JP3488626B2 true JP3488626B2 (en) 2004-01-19

Family

ID=13365737

Family Applications (1)

Application Number Title Priority Date Filing Date
JP06816098A Expired - Lifetime JP3488626B2 (en) 1998-03-18 1998-03-18 Video division method, apparatus and recording medium recording video division program

Country Status (1)

Country Link
JP (1) JP3488626B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016080660A1 (en) * 2014-11-18 2016-05-26 Samsung Electronics Co., Ltd. Content processing device and method for transmitting segment of variable size

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114586068A (en) * 2019-10-28 2022-06-03 索尼集团公司 Information processing apparatus, proposal apparatus, information processing method, and proposal method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016080660A1 (en) * 2014-11-18 2016-05-26 Samsung Electronics Co., Ltd. Content processing device and method for transmitting segment of variable size
US9910919B2 (en) 2014-11-18 2018-03-06 Samsung Electronics Co., Ltd. Content processing device and method for transmitting segment of variable size, and computer-readable recording medium

Also Published As

Publication number Publication date
JPH11266428A (en) 1999-09-28

Similar Documents

Publication Publication Date Title
US7127120B2 (en) Systems and methods for automatically editing a video
US8467610B2 (en) Video summarization using sparse basis function combination
US7483624B2 (en) System and method for indexing a video sequence
JPH10294923A (en) Scene change detection method and scene change detector
Kolekar et al. Semantic concept mining based on hierarchical event detection for soccer video indexing
WO2014022254A2 (en) Identifying key frames using group sparsity analysis
JP2960939B2 (en) Scene extraction processing method
CN102222104A (en) Method for intelligently extracting video abstract based on time-space fusion
Chen et al. Detection of soccer goal shots using joint multimedia features and classification rules
Chen et al. Scene change detection by audio and video clues
CN102067589A (en) Digital video recorder system and operating method thereof
Wang et al. Soccer replay detection using scene transition structure analysis
KR101195613B1 (en) Apparatus and method for partitioning moving image according to topic
JP3488626B2 (en) Video division method, apparatus and recording medium recording video division program
Truong et al. Improved fade and dissolve detection for reliable video segmentation
JP5096259B2 (en) Summary content generation apparatus and summary content generation program
Ma et al. An indexing and browsing system for home video
CN1330175C (en) Content editing device
JP3785068B2 (en) Video analysis apparatus, video analysis method, video analysis program, and program recording medium
JP2008236729A (en) Method and apparatus for generating digest
JP3469122B2 (en) Video segment classification method and apparatus for editing, and recording medium recording this method
Černeková et al. Entropy metrics used for video summarization
Volkmer et al. Gradual transition detection using average frame similarity
Lehane et al. Action Sequence Detection in Motion Pictures.
EP2401700B1 (en) Digital data stream processing

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20071031

Year of fee payment: 4

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20081031

Year of fee payment: 5

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20091031

Year of fee payment: 6

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20101031

Year of fee payment: 7

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20101031

Year of fee payment: 7

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20111031

Year of fee payment: 8

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20111031

Year of fee payment: 8

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20121031

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20121031

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20131031

Year of fee payment: 10

S531 Written request for registration of change of domicile

Free format text: JAPANESE INTERMEDIATE CODE: R313531

R350 Written notification of registration of transfer

Free format text: JAPANESE INTERMEDIATE CODE: R350

EXPY Cancellation because of completion of term