JP4923517B2

JP4923517B2 - Imaging device, imaging method, and semiconductor device

Info

Publication number: JP4923517B2
Application number: JP2005313490A
Authority: JP
Inventors: 雅保井口; 敏志近藤; 孝啓西; 敏康杉尾; 正真遠間; 寿郎笹井
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 2004-10-27
Filing date: 2005-10-27
Publication date: 2012-04-25
Anticipated expiration: 2025-10-27
Also published as: JP2006157893A

Description

The present invention relates to an imaging device, an imaging method, and a semiconductor device, and more particularly to an improvement in data processing for setting edit points in audio-video data obtained by shooting with an imaging device.

In recent years, digital imaging devices that shoot a subject, encode the digital video data, and record it on a recording medium have become widespread, and even ordinary households increasingly handle large amounts of digital video data.

However, although one would like to edit and organize video once it has been shot, searching for the start point of an edit is troublesome. For example, video data shot at an athletic meet or a wedding may be recorded on a recording medium and then left without ever being watched, because editing it is too much trouble.

Moreover, even when an editing start point is found, the picture corresponding to that start point is an inter-picture predicted picture in the predictive coding process and therefore cannot easily be used as a start position.

Thus, video data shot with a conventional imaging device requires troublesome editing work before only its important parts can be easily viewed or kept on a recording medium.

Japanese Unexamined Patent Application Publication No. 2003-299010 discloses a video content editing support system. This system has an imaging device that captures images and records video content data, and an editor terminal device that receives, in real time via a network or the like, the video content data obtained by shooting with the imaging device and displays it.

The imaging device of this editing support system has an electronic mark generation unit that generates electronic mark data based on user operations and the like, and an electronic mark insertion unit that writes the generated electronic mark data into the video content data obtained by shooting, in association with its time code. The editor terminal device of the editing support system has a list creation unit that creates electronic mark list data based on the electronic mark data from the imaging device, and a display unit that displays the video content data from the imaging device. The display unit displays the image of the video content data synchronized with the timing corresponding to the electronic mark data.

In such a video content editing support system, electronic mark data is added by user operation during shooting to the video content data, which is the captured data of the subject, so that the video content data obtained by imaging can be edited automatically, based on the electronic mark data, on an editor terminal device such as a personal computer.

JP 2003-299010 A

However, in the video content editing support system described in the above document, electronic mark data indicating edit positions must be added to the video content data during shooting if the post-shooting editing is to be performed automatically. The photographer therefore has to perform the bothersome operation of adding markers, while shooting, to the important portions that should be edited and kept.

The present invention has been made to solve the conventional problems described above. Its object is to obtain an imaging device, an imaging method, and a semiconductor device that make it possible to edit the portions of a shot that seem important to the photographer, either automatically or by a simple selection operation in response to guidance.

The invention according to claim 1 of the present application is an imaging device that acquires image information and audio information by shooting a subject and records an audio-video stream including the image information and audio information, the device comprising: an imaging unit that images the subject and outputs an image signal; an image processing unit that applies signal processing to the image signal obtained by imaging the subject and extracts image information including an image feature amount indicating a characteristic of change in the image; an audio acquisition unit that acquires audio and outputs an audio signal; an audio processing unit that applies signal processing to the audio signal obtained by acquiring the audio and extracts audio information including an audio feature amount indicating a characteristic of change in the audio; a feature amount determination unit that, when the image feature amount or the audio feature amount is larger than a predetermined threshold, determines that the shooting timing at which the image or audio changed is appropriate as an edit point; and an information generation unit that generates edit point information indicating the shooting timing determined to be appropriate as the edit point. The audio-video stream including the image information, the audio information, and the edit point information is stored on a recording medium. Further, the information generation unit determines whether buffer data, which is image information before encoding, is held in the image processing unit. When the pre-encoding buffer data is held, it sets the edit point to the picture corresponding to the shooting timing at which the image, audio, or shooting state changed. When the pre-encoding buffer data is not held, it sets the edit point to the first picture of the VOB unit, the unit of random access, that is closest to the shooting timing at which the image or audio changed in the stream obtained by encoding the image signal in the image processing unit. When the pre-encoding buffer data is held in the image processing unit, the image processing unit forms the VOB units so that the picture corresponding to the edit point becomes the first picture of a VOB unit.
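
The edit-point placement rule of claim 1 can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `place_edit_point`, the use of picture indices, and the representation of VOB-unit heads as a sorted, non-empty list are all assumptions made for illustration.

```python
from bisect import bisect_left


def place_edit_point(event_picture, vobu_heads, raw_buffer_held):
    """Return the picture index to use as the edit point (illustrative only).

    event_picture   -- index of the picture shot when the change was detected
    vobu_heads      -- sorted, non-empty list of the indices of the first
                       picture of each VOB unit in the encoded stream
    raw_buffer_held -- True if the not-yet-encoded picture is still buffered
    """
    if raw_buffer_held:
        # The encoder can still start a new VOB unit exactly at this picture.
        return event_picture
    # Otherwise snap to the nearest existing VOB-unit head.
    i = bisect_left(vobu_heads, event_picture)
    candidates = vobu_heads[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda head: abs(head - event_picture))
```

When the raw picture is still buffered the edit point is exact; otherwise it is only as precise as the VOB-unit spacing, which is the trade-off the claim describes.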

The invention according to claim 2 of the present application is the imaging device according to claim 1, further comprising a unique identification information acquisition unit that acquires unique identification information indicating the shooting state, and a unique identification information processing unit that applies signal processing to the acquired unique identification information and extracts a unique feature amount indicating a characteristic of change in the shooting state, wherein the feature amount determination unit determines that the shooting timing at which the image, audio, or shooting state changed is appropriate as an edit point when the image feature amount, the audio feature amount, or the unique feature amount is larger than a predetermined threshold.

The invention according to claim 3 of the present application is the imaging device according to claim 2, wherein the unique feature amount indicates the magnitude of a physiological change in the photographer that occurred during shooting, or the magnitude of an adjustment made by the photographer's operation.

The invention according to claim 4 of the present application is the imaging device according to claim 3, wherein the physiological change in the photographer that occurred during shooting is at least one of a change in the photographer's amount of perspiration, a change in alpha waves, a change in the number of blinks, a change in the pupils, and a change in pulse, and the unique identification information acquisition unit has a sensor, suited to the type of physiological change, that measures the photographer's physiological change.

The invention according to claim 5 of the present application is the imaging device according to claim 1, wherein the image processing unit applies, to the image signal obtained by imaging the subject, inter-picture predictive coding in which a picture to be encoded is predictively coded with reference to already encoded pictures, and extracts the image feature amount based on the motion vectors, used in the inter-picture predictive coding, that indicate the magnitude of motion in the image; the audio processing unit applies, to the audio signal obtained by acquiring the audio, an encoding process corresponding to the encoding process applied to the image signal; and the information generation unit sets, based on the shooting timing determined to be appropriate as the edit point, a specific picture in the image stream obtained by encoding the image signal as the edit point.
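
As a rough illustration of deriving an image feature amount from motion vectors, the sketch below uses the mean vector magnitude of one picture. The claim only says the feature is "based on" the motion vectors, so the choice of the mean, the function name `motion_feature`, and the `(dx, dy)` tuple layout are assumptions.

```python
import math


def motion_feature(motion_vectors):
    """Mean magnitude of one picture's motion vectors.

    motion_vectors -- iterable of (dx, dy) displacements, one per block,
                      as an encoder's motion search might produce them
    """
    vectors = list(motion_vectors)
    if not vectors:
        # An intra-coded picture has no motion vectors; treat it as no motion.
        return 0.0
    return sum(math.hypot(dx, dy) for dx, dy in vectors) / len(vectors)
```

Reusing vectors the encoder has already computed means the feature costs almost nothing extra to obtain during recording.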

The invention according to claim 6 of the present application is the imaging device according to claim 1, wherein the audio processing unit extracts the audio feature amount based on the magnitude of change in the audio signal.
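
One possible reading of "magnitude of change in the audio signal" is the change in level between consecutive audio frames. The sketch below uses the difference in RMS level; the RMS measure, the frame-based comparison, and the names `rms` and `audio_feature` are assumptions, not details from the patent.

```python
import math


def rms(samples):
    """Root-mean-square level of a non-empty frame of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def audio_feature(previous_frame, current_frame):
    """Absolute change in RMS level between two consecutive frames."""
    return abs(rms(current_frame) - rms(previous_frame))
```

A sudden cheer or a starting pistol would show up as a large value, while steady background noise would stay near zero.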

The invention according to claim 7 of the present application is the imaging device according to claim 2, further comprising a control unit that sets, based on a manual operation signal from the user, a threshold level for each of the image feature amount or audio feature amount and the unique feature amount, wherein the feature amount determination unit judges each feature amount against the corresponding threshold level set by the control unit and decides whether the shooting timing at which the image, audio, or shooting state changed is appropriate as an edit point.

The invention according to claim 8 of the present application is the imaging device according to claim 2, further comprising a control unit that holds table information indicating the correspondence between each of a plurality of scenarios and a combination of threshold levels for the image feature amount or audio feature amount and the unique feature amount, and that sets the threshold levels of the feature amounts based on the scenario specified by the user's manual operation and the table information, wherein the feature amount determination unit judges each feature amount against the corresponding threshold level set by the control unit and decides whether the shooting timing at which the image, audio, or shooting state changed is appropriate as an edit point.
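
The scenario table of claim 8 can be pictured as a lookup from scenario name to a set of per-feature thresholds. In the sketch below the scenario names, the numeric threshold values, and the function name `is_edit_point` are all hypothetical; only the structure (scenario to threshold combination, then a per-feature comparison) follows the claim.

```python
# Hypothetical table: each scenario maps to one threshold per feature amount.
SCENARIO_THRESHOLDS = {
    "athletic_meet": {"image": 8.0, "audio": 0.6, "unique": 0.5},
    "wedding": {"image": 4.0, "audio": 0.3, "unique": 0.4},
}


def is_edit_point(features, scenario, table=SCENARIO_THRESHOLDS):
    """True if any feature amount exceeds its threshold for the scenario.

    features -- dict of measured feature amounts keyed like the table rows;
                a missing feature is treated as 0.0 (no change observed)
    """
    thresholds = table[scenario]
    return any(features.get(name, 0.0) > level
               for name, level in thresholds.items())
```

For example, the same measurement `{"image": 5.0, "audio": 0.1}` would qualify as an edit point under the hypothetical "wedding" row but not under "athletic_meet", which is how scenario selection changes detection sensitivity.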

The invention according to claim 9 of the present application is the imaging device according to claim 8, wherein the table information is information downloaded and obtained from an information terminal on a network.

The invention according to claim 10 of the present application is the imaging device according to claim 1, wherein the information generation unit sets the edit point to a shooting timing that depends on the delay time from the occurrence of an event that causes a change in the image or audio until the image or audio actually changes.

The invention according to claim 11 of the present application is an imaging device that acquires image information and audio information by shooting a subject and records an audio-video stream including the image information and audio information, the device comprising: an imaging unit that images the subject and outputs an image signal; an image processing unit that applies signal processing to the image signal obtained by imaging the subject and extracts image information including an image feature amount indicating a characteristic of change in the image; an audio acquisition unit that acquires audio and outputs an audio signal; an audio processing unit that applies signal processing to the audio signal obtained by acquiring the audio and extracts audio information including an audio feature amount indicating a characteristic of change in the audio; a feature amount determination unit that, when the image feature amount or the audio feature amount is larger than a predetermined threshold, determines that the shooting timing at which the image or audio changed is appropriate as an edit point; and an information generation unit that generates edit point information indicating the shooting timing determined to be appropriate as the edit point. The audio-video stream including the image information, the audio information, and the edit point information is stored on a recording medium. Further, the information generation unit determines whether buffer data, which is image information before encoding, is held in the image processing unit. When the pre-encoding buffer data is held, it sets the edit point to the picture corresponding to the shooting timing at which the image or audio changed. When the pre-encoding buffer data is not held, it compares the remaining time available for encoding in the image processing unit with the time required for re-encoding. If the time required for re-encoding exceeds the remaining time available for encoding in the image processing unit, it sets the edit point to the first picture of the VOB unit, the unit of random access, that is closest to the shooting timing at which the image or audio changed in the image stream obtained by encoding the image signal in the image processing unit. If the time required for re-encoding does not exceed the remaining time available for encoding in the image processing unit, it instructs the image processing unit to re-encode the image stream. When the pre-encoding buffer data is held, the image processing unit forms the VOB units so that the picture corresponding to the edit point becomes the first picture of a VOB unit. When the pre-encoding buffer data is not held and the time required for re-encoding does not exceed the remaining time available for encoding in the image processing unit, the image processing unit re-encodes the image stream so that the picture corresponding to the edit point becomes an I picture located at the head of a VOB unit.
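
The three-way decision in claim 11 can be summarized as a small sketch. The enum `Action`, the function name `choose_action`, and the use of plain numbers (for example seconds) for the two times are assumptions introduced for illustration.

```python
from enum import Enum


class Action(Enum):
    START_VOBU_AT_EVENT = "form a VOB unit starting at the event picture"
    REENCODE = "re-encode so the event picture is an I picture at a VOB-unit head"
    SNAP_TO_NEAREST_VOBU = "use the nearest existing VOB-unit head"


def choose_action(raw_buffer_held, reencode_time, remaining_time):
    """Pick how the edit point is realized, following the order in claim 11."""
    if raw_buffer_held:
        # Pre-encoding data is still available: no re-encoding is needed.
        return Action.START_VOBU_AT_EVENT
    if reencode_time > remaining_time:
        # Re-encoding would not finish in time: fall back to the nearest head.
        return Action.SNAP_TO_NEAREST_VOBU
    return Action.REENCODE
```

Note that when the two times are equal the claim's wording ("does not exceed") selects re-encoding, which the comparison above reproduces.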

The invention according to claim 12 of the present application is the imaging device according to claim 1 or 11, wherein the time at which an event that causes a change in the image or audio occurred is recorded in the audio-video stream as the edit point.

The invention according to claim 13 of the present application is the imaging device according to claim 12, wherein the occurrence time of the event is recorded in the audio-video stream as a playlist indicating playback conditions.

The invention according to claim 14 of the present application is the imaging device according to claim 12, wherein information indicating whether the edit point is attributable to the image or to the audio is embedded in the audio-video stream.

The invention according to claim 15 of the present application is the imaging device according to claim 1 or 11, wherein the information generation unit embeds, in the audio-video stream, the picture corresponding to the time at which an event that causes a change in the image or audio occurred, as an out-of-sequence picture used for thumbnail display during editing.

The invention according to claim 16 of the present application is an imaging method for acquiring image information and audio information by shooting a subject and recording an audio-video stream including the image information and audio information, the method comprising: an imaging step of imaging the subject and outputting an image signal; an image processing step of applying signal processing to the image signal obtained by imaging the subject and extracting image information including an image feature amount indicating a characteristic of change in the image; an audio acquisition step of acquiring audio and outputting an audio signal; an audio processing step of applying signal processing to the audio signal obtained by acquiring the audio and extracting audio information including an audio feature amount indicating a characteristic of change in the audio; a feature amount determination step of determining, when the image feature amount or the audio feature amount is larger than a predetermined threshold, that the shooting timing at which the image or audio changed is appropriate as an edit point; an information generation step of generating edit point information indicating the shooting timing determined to be appropriate as the edit point; and a step of storing the audio-video stream including the image information, the audio information, and the edit point information on a recording medium. Further, the information generation step determines whether buffer data, which is image information before encoding, is held in the image processing unit that executes the image processing step. When the pre-encoding buffer data is held, it sets the edit point to the picture corresponding to the shooting timing at which the image, audio, or shooting state changed. When the pre-encoding buffer data is not held, it sets the edit point to the first picture of the VOB unit, the unit of random access, that is closest to the shooting timing at which the image or audio changed in the stream obtained by encoding the image signal in the image processing unit. When the pre-encoding buffer data is held in the image processing unit, the image processing step forms the VOB units so that the picture corresponding to the edit point becomes the first picture of a VOB unit.

The invention according to claim 17 of the present application is a semiconductor device that acquires image information and audio information by shooting a subject and records an audio-video stream including the image information and audio information, the device comprising: an image processing unit that applies signal processing to the image signal obtained by imaging the subject and extracts image information including an image feature amount indicating a characteristic of change in the image; an audio acquisition unit that acquires audio and outputs an audio signal; an audio processing unit that applies signal processing to the audio signal obtained by acquiring the audio and extracts audio information including an audio feature amount indicating a characteristic of change in the audio; a feature amount determination unit that, when the image feature amount or the audio feature amount is larger than a predetermined threshold, determines that the shooting timing at which the image or audio changed is appropriate as an edit point; and an information generation unit that generates edit point information indicating the shooting timing determined to be appropriate as the edit point. The audio-video stream including the image information, the audio information, and the edit point information is stored on a recording medium. Further, the information generation unit determines whether buffer data, which is image information before encoding, is held in the image processing unit. When the pre-encoding buffer data is held, it sets the edit point to the picture corresponding to the shooting timing at which the image, audio, or shooting state changed. When the pre-encoding buffer data is not held, it sets the edit point to the first picture of the VOB unit, the unit of random access, that is closest to the shooting timing at which the image or audio changed in the stream obtained by encoding the image signal in the image processing unit. When the pre-encoding buffer data is held in the image processing unit, the image processing unit forms the VOB units so that the picture corresponding to the edit point becomes the first picture of a VOB unit.

According to the invention of claim 1 of the present application, an imaging device that acquires image information and audio information by shooting a subject and records an audio-video stream including the image information and audio information comprises: an imaging unit that images the subject and outputs an image signal; an image processing unit that applies signal processing to the image signal obtained by imaging the subject and extracts image information including an image feature amount indicating a characteristic of change in the image; an audio acquisition unit that acquires audio and outputs an audio signal; an audio processing unit that applies signal processing to the audio signal obtained by acquiring the audio and extracts audio information including an audio feature amount indicating a characteristic of change in the audio; a feature amount determination unit that, when the image feature amount or the audio feature amount is larger than a predetermined threshold, determines that the shooting timing at which the image or audio changed is appropriate as an edit point; and an information generation unit that generates edit point information indicating the shooting timing determined to be appropriate as the edit point. The audio-video stream including the image information, the audio information, and the edit point information is stored on a recording medium. Further, the information generation unit determines whether buffer data, which is image information before encoding, is held in the image processing unit. When the pre-encoding buffer data is held, it sets the edit point to the picture corresponding to the shooting timing at which the image, audio, or shooting state changed. When the pre-encoding buffer data is not held, it sets the edit point to the first picture of the VOB unit, the unit of random access, that is closest to the shooting timing at which the image or audio changed in the stream obtained by encoding the image signal in the image processing unit. When the pre-encoding buffer data is held in the image processing unit, the image processing unit forms the VOB units so that the picture corresponding to the edit point becomes the first picture of a VOB unit. As a result, the edit point can be set accurately when pre-encoding buffer data exists and can be set simply when it does not, and the portions of the audio-video stream obtained by shooting that seem important to the photographer can be edited automatically or by a simple selection operation in response to guidance.

According to the invention of claim 2 of the present application, the imaging device according to claim 1 comprises a unique identification information acquisition unit that acquires unique identification information indicating the shooting state, and a unique identification information processing unit that applies signal processing to the acquired unique identification information and extracts a unique feature amount indicating a characteristic of change in the shooting state, and the feature amount determination unit determines that the shooting timing at which the image, audio, or shooting state changed is appropriate as an edit point when the image feature amount, the audio feature amount, or the unique feature amount is larger than a predetermined threshold. A shooting timing at which the shooting state changed greatly can therefore be set as an edit point.

According to the invention of claim 3 of the present application, in the imaging device according to claim 2, the unique feature amount indicates the magnitude of a physiological change in the photographer that occurred during shooting, or the magnitude of an adjustment made by the photographer's operation. This has the effect that shooting timings at which the photographer operated the imaging device unconsciously, or at which the photographer concentrated or became excited, can be set as edit points.

According to the invention of claim 4 of the present application, in the imaging device according to claim 3, at least one of a change in the photographer's amount of perspiration, a change in alpha waves, the frequency of blinking, a change in the pupils, and a change in pulse is measured by a sensor, and the shooting timing at which such a physiological change in the photographer occurred during shooting is used as an edit point. Scenes that are important to the photographer can therefore be edited based on the photographer's physiological changes.

本願請求項５の発明によれば、請求項１記載の撮像装置において、上記画像処理部は、被写体の撮像により得られた画像信号に対して、符号化の対象となるピクチャを、符号化済みのピクチャを参照して予測符号化する画面間予測符号化処理を施し、上記画像特徴量を、該画面間予測符号化処理で用いる、画像の動きの大きさを示す動きベクトルに基づいて抽出するので、画像の動きに関する画像特徴量を、予測符号化処理で用いる動きベクトルに基づいて正確に抽出することができる。 According to the invention of claim 5 of the present application, in the imaging apparatus according to claim 1, the image processing unit applies, to the image signal obtained by imaging the subject, inter-picture predictive coding in which a picture to be coded is predictively coded with reference to an already coded picture, and extracts the image feature amount based on the motion vectors used in that inter-picture predictive coding, which indicate the magnitude of image motion. The image feature amount relating to image motion can therefore be extracted accurately from the motion vectors used in the predictive coding.

本願請求項６の発明によれば、請求項１記載の撮像装置において、上記音声処理部は、上記音声特徴量を、音声信号の変化の大きさに基づいて抽出するので、音の大きさに関する音声特徴量を、音声信号に基づいて正確に抽出することができる。 According to the invention of claim 6 of the present application, in the imaging apparatus according to claim 1, the sound processing unit extracts the sound feature amount based on the amount of change in the sound signal. The voice feature amount can be accurately extracted based on the voice signal.

本願請求項７の発明によれば、請求項２記載の撮像装置において、上記画像特徴量あるいは音声特徴量、並びに固有特徴量のそれぞれに対する閾値レベルをマニュアル操作信号に基づいて設定する制御部を有するので、画像特徴量あるいは音声特徴量、並びに固有特徴量の検出強度を、ユーザが設定することができ、これにより、撮影した映像データの自動編集にユーザの嗜好などを反映することができる。 According to the seventh aspect of the present invention, in the imaging apparatus according to the second aspect, the image processing apparatus includes a control unit that sets a threshold level for each of the image feature amount, the sound feature amount, and the unique feature amount based on a manual operation signal. Therefore, the user can set the detection intensity of the image feature amount or the voice feature amount and the unique feature amount, and thus the user's preference can be reflected in the automatic editing of the captured video data.

本願請求項８の発明によれば、請求項２記載の撮像装置において、複数の異なるシナリオのそれぞれと、画像特徴量あるいは音声特徴量、並びに固有特徴量に対する閾値レベルの組合せとの対応関係を示すテーブル情報を保持し、ユーザのマニュアル操作によるシナリオの選択により、上記各特徴量に対する閾値レベルを設定するので、運動会や結婚式といった撮影場所に応じたシナリオを選択するという簡単な操作により、運動会や結婚式などの撮影が行われる場所に応じた自動編集が可能となる。 According to the invention of claim 8 of the present application, in the imaging apparatus according to claim 2, table information is held that indicates the correspondence between each of a plurality of different scenarios and a combination of threshold levels for the image feature amount or audio feature amount and for the unique feature amount, and the threshold level for each feature amount is set when the user manually selects a scenario. The simple operation of selecting a scenario that matches the shooting location, such as an athletic meet or a wedding, therefore enables automatic editing suited to that location.
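
The scenario table of claim 8 can be pictured as a lookup from a scenario name to a set of threshold levels. The following is a minimal sketch only: the scenario names, the numeric values, and the names `Thresholds`, `SCENARIO_TABLE`, `select_thresholds`, and `is_edit_point` are assumptions made for illustration and do not appear in the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    image: float   # threshold level for the image feature amount
    audio: float   # threshold level for the audio feature amount
    unique: float  # threshold level for the unique (shooting-state) feature amount


# Hypothetical table contents; the patent does not give concrete values.
SCENARIO_TABLE = {
    "athletic_meet": Thresholds(image=0.8, audio=0.6, unique=0.5),
    "wedding": Thresholds(image=0.4, audio=0.7, unique=0.6),
}


def select_thresholds(scenario: str) -> Thresholds:
    """Return the threshold combination for the scenario chosen by the user."""
    return SCENARIO_TABLE[scenario]


def is_edit_point(image_f: float, audio_f: float, unique_f: float,
                  th: Thresholds) -> bool:
    """A timing qualifies when any feature amount exceeds its threshold."""
    return image_f > th.image or audio_f > th.audio or unique_f > th.unique
```

Under these assumed values, selecting "wedding" makes the check more sensitive to image changes than "athletic_meet" does, which is the kind of per-scenario tuning the claim describes.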

本願請求項９の発明によれば、請求項８記載の撮像装置において、上記テーブル情報は、ネットワーク上の情報端末からダウンロードして取得するので、撮像装置のメーカのホームページなどを利用して、上記画像、音声あるいは撮影状態の変化である各特徴量に対して、画像、音声、あるいは撮影状態が変化した撮影タイミングが編集点として妥当であるか否かを判定する、シナリオに合った適切な判定強度を設定することができる。 According to the ninth aspect of the present invention, in the imaging apparatus according to the eighth aspect, the table information is downloaded and acquired from an information terminal on a network. Appropriate judgment that suits the scenario for determining whether the shooting timing at which the image, sound, or shooting state has changed is appropriate as an edit point for each feature that is a change in the image, sound, or shooting state The intensity can be set.

本願請求項１０の発明によれば、請求項１記載の撮像装置において、上記情報生成部は、上記編集点を、画像、あるいは音声に変化を与えるイベントが発生した時点から、実際に画像、あるいは音声が変化するまでの遅延時間に応じた撮影タイミングに設定するので、編集点を、ほぼ、イベントが実際に発生したタイミングに設定することができる。 According to the invention of claim 10 of the present application, in the imaging apparatus according to claim 1, the information generation unit sets the edit point at a shooting timing that takes into account the delay time from the occurrence of an event that changes the image or audio until the image or audio actually changes. The edit point can therefore be set at approximately the timing at which the event actually occurred.
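
One way to read the delay handling of claim 10 is as a simple subtraction: the change is detected some time after the event, so the edit point is moved back by that delay. This is a sketch under that assumption; the function name `compensate_delay` and the use of seconds as the unit are illustrative, and the patent does not state how the delay is obtained.

```python
def compensate_delay(detected_at_s: float, delay_s: float) -> float:
    """Shift a detected change time back by the assumed event-to-change delay.

    detected_at_s: time (seconds from the start of recording) at which the
                   image or audio change was detected.
    delay_s:       assumed delay between the event and the observable change.
    The result is clamped so that it never precedes the start of recording.
    """
    return max(0.0, detected_at_s - delay_s)
```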

本願請求項１１の発明によれば、被写体の撮影により画像情報及び音声情報を取得して、該画像情報及び音声情報を含むオーディオビデオストリームを記録する撮像装置であって、被写体を撮像して画像信号を出力する撮像部と、上記被写体の撮像により得られた画像信号に信号処理を施して、画像の変化の特徴を示す画像特徴量を含む画像情報を抽出する画像処理部と、音声を取得して音声信号を出力する音声取得部と、上記音声の取得により得られた音声信号に信号処理を施して、音声の変化の特徴を示す音声特徴量を含む音声情報を抽出する音声処理部と、上記画像特徴量あるいは音声特徴量が所定の閾値より大きい場合に、上記画像あるいは音声が変化した撮影タイミングが編集点として妥当であると判定する特徴量判定部と、該編集点として妥当であると判定された撮影タイミングを示す編集点情報を生成する情報生成部とを備え、上記画像情報、音声情報、及び編集点情報を含むオーディオビデオストリームを記録媒体に格納し、さらに、上記情報生成部は、上記画像処理部に符号化前の画像情報であるバッファデータが保持されているか否かを判定し、該符号化前のバッファデータが保持されている場合には、上記編集点を、上記画像、あるいは音声が変化した撮影タイミングに対応するピクチャに設定し、該符号化前のバッファデータが保持されていない場合には、上記画像処理部での符号化処理に利用可能な残り時間と、再符号化に要する時間とを比較判定し、該再符号化に要する時間が上記画像処理部での符号化処理に利用可能な残り時間を超えている場合は、上記編集点を、上記画像処理部により画像信号を符号化して得られた画像ストリームにおける、上記画像、あるいは音声が変化した撮影タイミングに最も近い、ランダムアクセスの単位であるＶＯＢユニットの先頭ピクチャに設定し、上記再符号化に要する時間が上記画像処理部での符号化処理に利用可能な残り時間を超えていない場合は、上記画像ストリームの再符号化を画像処理部に指令し、上記画像処理部は、上記符号化前のバッファデータが保持されている場合には、上記編集点に対応するピクチャが、ＶＯＢユニットの先頭ピクチャとなるようＶＯＢユニットを形成し、上記符号化前のバッファデータが保持されておらず、かつ上記再符号化に要する時間が上記画像処理部での符号化処理に利用可能な残り時間を超えていない場合は、上記画像ストリームを、上記編集点に対応するピクチャが、ＶＯＢユニットの先頭に位置するＩピクチャとなるよう再符号化するので、符号化前のバッファデータがある場合、また、符号化前のバッファデータがない場合でも再符号化のために必要な時間が符号化に使える残り時間を超えていない場合には、上記編集点を正確に設定し、符号化前のバッファデータがない場合で再符号化のために必要な時間が符号化に使える残り時間を超えているときには編集点を簡単に設定することができ、撮影により得られたオーディオビデオストリームの、撮影者にとって重要と思われる部分を、自動であるいはガイダンスに対する簡単な選択操作により編集することができる。 According to the invention of claim 11 of the present application, there is provided an imaging apparatus that acquires image information and audio information by photographing a subject and records an audio video stream including the image information and audio information. 
The imaging apparatus comprises: an imaging unit that images the subject and outputs an image signal; an image processing unit that applies signal processing to the image signal obtained by imaging the subject and extracts image information including an image feature amount indicating a feature of a change in the image; an audio acquisition unit that acquires audio and outputs an audio signal; an audio processing unit that applies signal processing to the audio signal obtained by acquiring the audio and extracts audio information including an audio feature amount indicating a feature of a change in the audio; a feature amount determination unit that determines, when the image feature amount or the audio feature amount is greater than a predetermined threshold, that the shooting timing at which the image or audio changed is appropriate as an edit point; and an information generation unit that generates edit point information indicating the shooting timing determined to be appropriate as the edit point. An audio video stream including the image information, audio information, and edit point information is stored on a recording medium.
The information generation unit determines whether buffer data, that is, image information before encoding, is held in the image processing unit. When the pre-encoding buffer data is held, it sets the edit point to the picture corresponding to the shooting timing at which the image or audio changed. When the pre-encoding buffer data is not held, it compares the remaining time available for the encoding process in the image processing unit with the time required for re-encoding. If the time required for re-encoding exceeds the remaining time, it sets the edit point to the first picture of the VOB unit, the unit of random access, that is closest to the shooting timing at which the image or audio changed in the image stream obtained by encoding the image signal in the image processing unit. If the time required for re-encoding does not exceed the remaining time, it instructs the image processing unit to re-encode the image stream.
When the pre-encoding buffer data is held, the image processing unit forms a VOB unit so that the picture corresponding to the edit point becomes the first picture of the VOB unit. When the pre-encoding buffer data is not held and the time required for re-encoding does not exceed the remaining time, the image processing unit re-encodes the image stream so that the picture corresponding to the edit point becomes an I picture located at the head of a VOB unit. Consequently, the edit point is set accurately when pre-encoding buffer data exists, and also when it does not exist but the time required for re-encoding does not exceed the remaining time available for encoding; when no pre-encoding buffer data exists and the time required for re-encoding exceeds the remaining time, the edit point can be set simply. The portions of the audio video stream obtained by shooting that are considered important to the photographer can thus be edited automatically or by a simple selection operation in response to guidance.
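
The three-way decision described for claim 11 can be sketched as a small function. This is an illustration under stated assumptions, not the patented implementation: pictures are identified by integer indices, VOB unit heads are given as a list of such indices, and the names `Action` and `place_edit_point` are hypothetical.

```python
from enum import Enum


class Action(Enum):
    FORM_VOBU_AT_PICTURE = "form_vobu_at_picture"            # pre-encoding buffer available
    REENCODE_AS_I_PICTURE = "reencode_as_i_picture"          # no buffer, enough time left
    SNAP_TO_NEAREST_VOBU_HEAD = "snap_to_nearest_vobu_head"  # no buffer, not enough time


def place_edit_point(event_picture, has_prebuffer, reencode_time_s,
                     remaining_time_s, vobu_heads):
    """Return (action, picture index) for the edit point.

    event_picture:    index of the picture at the detected shooting timing
    has_prebuffer:    True if pre-encoding buffer data is still held
    reencode_time_s:  estimated time needed for re-encoding
    remaining_time_s: remaining time available for the encoding process
    vobu_heads:       indices of the first pictures of already encoded VOB units
    """
    if has_prebuffer:
        # Encode so that the event picture itself starts a new VOB unit.
        return Action.FORM_VOBU_AT_PICTURE, event_picture
    if reencode_time_s <= remaining_time_s:
        # Re-encode so that the event picture becomes an I picture at a VOB unit head.
        return Action.REENCODE_AS_I_PICTURE, event_picture
    # Otherwise fall back to the closest existing random-access point.
    nearest_head = min(vobu_heads, key=lambda head: abs(head - event_picture))
    return Action.SNAP_TO_NEAREST_VOBU_HEAD, nearest_head
```

With VOB unit heads at pictures 0, 15, 30, 45, and 60, an event at picture 47 stays at 47 in the first two cases and snaps to 45 in the third, which trades accuracy for the ability to finish within the available time.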

本願請求項１２の発明によれば、請求項１または１１記載の撮像装置において、上記画像、あるいは音声に変化を与えるイベントが発生した時刻を、上記編集点としてオーディオビデオストリームに記録するので、編集点を非常に簡単に設定することができる。 According to the twelfth aspect of the present invention, in the imaging apparatus according to the first or eleventh aspect , the time when the event that changes the image or the sound is recorded is recorded in the audio video stream as the edit point. The point can be set very easily.

本願請求項１３の発明によれば、請求項１２記載の撮像装置において、イベント発生時刻を、再生条件を示すプレイリストとしてオーディオビデオストリームに記録するので、編集点を非常に簡単に設定することができる。 According to the thirteenth aspect of the present invention, in the imaging device according to the twelfth aspect , since the event occurrence time is recorded in the audio-video stream as a playlist indicating the reproduction condition, the editing point can be set very easily. it can.

本願請求項１４の発明によれば、請求項１２記載の撮像装置において、上記編集点が、画像、あるいは音声のいずれの要因によるものであるかを示す情報を、オーディオビデオストリームに埋め込むので、編集時には、編集点がどのような要因によるものであるかによって編集点の間引きを行うことも可能である。 According to the invention of claim 14 of the present application, in the imaging apparatus according to claim 12, information indicating whether the edit point was caused by the image or by the audio is embedded in the audio video stream. At editing time, the edit points can therefore also be thinned out according to the factor that caused each of them.
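
Thinning edit points by cause, as claim 14 allows, amounts to filtering on the embedded cause information. The record layout below is an assumption for illustration; the patent does not specify how the cause is encoded in the stream, and `EditPoint` and `thin_by_cause` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditPoint:
    time_s: float  # event time in seconds from the start of the stream
    cause: str     # assumed labels: "image" or "audio"


def thin_by_cause(points, keep_causes):
    """Keep only the edit points whose cause is in keep_causes, preserving order."""
    return [p for p in points if p.cause in keep_causes]
```

For example, an editor interested only in starting-pistol moments at an athletic meet could keep the audio-triggered points and discard the rest.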

本願請求項１５の発明によれば、請求項１または１１記載の撮像装置において、イベント発生時刻に対応するピクチャを、編集時のサムネイル表示に用いるシーケンス外ピクチャとしてストリームに埋め込むので、編集時には、編集点として適切なピクチャを、サムネイル表示により一目で確認することができる。 According to the fifteenth aspect of the present invention, in the imaging device according to the first or eleventh aspect , the picture corresponding to the event occurrence time is embedded in the stream as an out-of-sequence picture used for thumbnail display at the time of editing. A picture suitable as a point can be confirmed at a glance by a thumbnail display.

本願請求項１６の発明によれば、被写体の撮影により画像情報及び音声情報を取得して、該画像情報及び音声情報を含むオーディオビデオストリームを記録する撮像方法であって、被写体を撮像して画像信号を出力する撮像ステップと、上記被写体の撮像により得られた画像信号に信号処理を施して、画像の変化の特徴を示す画像特徴量を含む画像情報を抽出する画像処理ステップと、音声を取得して音声信号を出力する音声取得ステップと、上記音声の取得により得られた音声信号に信号処理を施して、音声の変化の特徴を示す音声特徴量を含む音声情報を抽出する音声処理ステップと、上記画像特徴量あるいは音声特徴量が所定の閾値よりも大きい場合に、上記画像あるいは音声が変化した撮影タイミングが編集点として妥当であると判定する特徴量判定ステップと、該編集点として妥当であると判定された撮影タイミングを示す編集点情報を生成する情報生成ステップと、上記画像情報、音声情報、及び編集点情報を含むオーディオビデオストリームを記録媒体に格納するステップとを含み、さらに、上記情報生成ステップは、上記画像処理ステップを実行する画像処理部に符号化前の画像情報であるバッファデータが保持されているか否かを判定し、該符号化前のバッファデータが保持されている場合には、上記編集点を、上記画像、音声、あるいは撮影状態が変化した撮影タイミングに対応するピクチャに設定し、該符号化前のバッファデータが保持されていない場合には、上記編集点を、上記画像処理部により画像信号を符号化して得られたストリームにおける、上記画像、あるいは音声が変化した撮影タイミングに最も近い、ランダムアクセスの単位であるＶＯＢユニットの先頭ピクチャに設定し、上記画像処理ステップは、上記符号化前のバッファデータが上記画像処理部に保持されている場合には、上記編集点に対応するピクチャが、ＶＯＢユニットの先頭ピクチャとなるようＶＯＢユニットを形成するので、符号化前のバッファデータがある場合に、編集点を正確に設定し、符号化前のバッファデータがない場合に、編集点を簡単に設定することができ、撮影により得られたオーディオビデオストリームの、撮影者にとって重要と思われる部分を、自動であるいはガイダンスに対する簡単な選択操作により編集することが可能となる。
According to the invention of claim 16 of the present application, there is provided an imaging method for acquiring image information and audio information by photographing a subject and recording an audio video stream including the image information and audio information. The method includes: an imaging step of imaging the subject and outputting an image signal; an image processing step of applying signal processing to the image signal obtained by imaging the subject and extracting image information including an image feature amount indicating a feature of a change in the image; an audio acquisition step of acquiring audio and outputting an audio signal; an audio processing step of applying signal processing to the audio signal obtained by acquiring the audio and extracting audio information including an audio feature amount indicating a feature of a change in the audio; a feature amount determination step of determining, when the image feature amount or the audio feature amount is greater than a predetermined threshold, that the shooting timing at which the image or audio changed is appropriate as an edit point; an information generation step of generating edit point information indicating the shooting timing determined to be appropriate as the edit point; and a step of storing an audio video stream including the image information, audio information, and edit point information on a recording medium.
In the information generation step, it is determined whether buffer data, that is, image information before encoding, is held in the image processing unit that executes the image processing step. When the pre-encoding buffer data is held, the edit point is set to the picture corresponding to the shooting timing at which the image, audio, or shooting state changed. When the pre-encoding buffer data is not held, the edit point is set to the first picture of the VOB unit, the unit of random access, that is closest to the shooting timing at which the image or audio changed in the stream obtained by encoding the image signal in the image processing unit.
In the image processing step, when the pre-encoding buffer data is held in the image processing unit, a VOB unit is formed so that the picture corresponding to the edit point becomes the first picture of the VOB unit. Consequently, the edit point is set accurately when pre-encoding buffer data exists and can be set simply when it does not, and the portions of the audio video stream obtained by shooting that are considered important to the photographer can be edited automatically or by a simple selection operation in response to guidance.

本願請求項１７の発明によれば、被写体の撮影により画像情報及び音声情報を取得して、該画像情報及び音声情報を含むオーディオビデオストリームを記録する半導体装置であって、上記被写体の撮像により得られた画像信号に信号処理を施して、画像の変化の特徴を示す画像特徴量を含む画像情報を抽出する画像処理部と、音声を取得して音声信号を出力する音声取得部と、上記音声の取得により得られた音声信号に信号処理を施して、音声の変化の特徴を示す音声特徴量を含む音声情報を抽出する音声処理部と、上記画像特徴量あるいは音声特徴量が所定の閾値よりも大きい場合に、上記画像あるいは音声が変化した撮影タイミングが編集点として妥当であると判定する特徴量判定部と、該編集点として妥当であると判定された撮影タイミングを示す編集点情報を生成する情報生成部とを備え、上記画像情報、音声情報、及び編集点情報を含むオーディオビデオストリームを記録媒体に格納し、さらに、上記情報生成部は、上記画像処理部に符号化前の画像情報であるバッファデータが保持されているか否かを判定し、該符号化前のバッファデータが保持されている場合には、上記編集点を、上記画像、音声、あるいは撮影状態が変化した撮影タイミングに対応するピクチャに設定し、該符号化前のバッファデータが保持されていない場合には、上記編集点を、上記画像処理部により画像信号を符号化して得られたストリームにおける、上記画像、あるいは音声が変化した撮影タイミングに最も近い、ランダムアクセスの単位であるＶＯＢユニットの先頭ピクチャに設定し、上記画像処理部は、上記符号化前のバッファデータが上記画像処理部に保持されている場合には、上記編集点に対応するピクチャが、ＶＯＢユニットの先頭ピクチャとなるようＶＯＢユニットを形成するので、符号化前のバッファデータがある場合、また、符号化前のバッファデータがない場合でも再符号化のために必要な時間が符号化に使える残り時間を超えていない場合には、上記編集点を正確に設定し、符号化前のバッファデータがない場合で再符号化のために必要な時間が符号化に使える残り時間を超えているときには編集点を簡単に設定することができ、撮影により得られたオーディオビデオストリームを、その撮影者にとって重要と思われる部分を、自動であるいはガイダンスに対する簡単な選択操作により編集可能なストリームとすることができる半導体装置を得ることができる。 According to the invention of claim 17 of the present application, a semiconductor device that acquires image information and audio information by photographing a subject and records an audio-video stream including the image information and audio information is obtained by imaging the subject. An image processing unit that performs signal processing on the received image signal to extract image information including an image feature amount indicating a change feature of the image, an audio acquisition unit that acquires audio and outputs an audio signal, and the audio A voice processing unit that performs signal processing on the voice signal obtained by acquiring the voice information including the voice feature quantity indicating the feature of the voice change, and the image feature quantity or the voice feature quantity is greater than a predetermined threshold value. 
The semiconductor device further comprises a feature amount determination unit that determines, when the image feature amount or the audio feature amount exceeds the predetermined threshold, that the shooting timing at which the image or audio changed is appropriate as an edit point, and an information generation unit that generates edit point information indicating the shooting timing determined to be appropriate as the edit point, and it stores an audio video stream including the image information, audio information, and edit point information on a recording medium. The information generation unit determines whether buffer data, that is, image information before encoding, is held in the image processing unit. When the pre-encoding buffer data is held, it sets the edit point to the picture corresponding to the shooting timing at which the image, audio, or shooting state changed. When the pre-encoding buffer data is not held, it sets the edit point to the first picture of the VOB unit, the unit of random access, that is closest to the shooting timing at which the image or audio changed in the stream obtained by encoding the image signal in the image processing unit. When the pre-encoding buffer data is held in the image processing unit, the image processing unit forms a VOB unit so that the picture corresponding to the edit point becomes the first picture of the VOB unit. Consequently, the edit point is set accurately when pre-encoding buffer data exists, and also when it does not exist but the time required for re-encoding does not exceed the remaining time available for encoding; when no pre-encoding buffer data exists and the time required for re-encoding exceeds the remaining time, the edit point can be set simply. This yields a semiconductor device that can turn the audio video stream obtained by shooting into a stream in which the portions considered important to the photographer can be edited automatically or by a simple selection operation in response to guidance.

以下、本発明の実施の形態について説明する。
（実施の形態１）
図１及び図２は、本発明の実施の形態１による撮像装置を説明するための図であり、図１は、この実施の形態１の撮像装置の全体構成を示し、図２は、この撮像装置により得られるオーディオビデオストリームを示している。
本実施の形態１の撮像装置１０１は、被写体の撮影により画像信号Ｓｉｍ及び音声信号Ｓａｕを得るとともに、得られた画像信号Ｓｉｍ及び音声信号Ｓａｕに、撮影状況を示す情報に基づいた信号処理を施して、撮影者にとって重要と思われる撮影部分を自動で、あるいはガイダンスに対する簡単な選択操作により編集可能なＭＰＥＧ‐２対応のストリーム（以下オーディオビデオデータともいう。）Ｄを生成するものである。 Embodiments of the present invention will be described below.
(Embodiment 1)
FIGS. 1 and 2 are diagrams for explaining an imaging apparatus according to Embodiment 1 of the present invention. FIG. 1 shows the overall configuration of the imaging apparatus of Embodiment 1, and FIG. 2 shows an audio video stream obtained by this imaging apparatus.
The imaging apparatus 101 of Embodiment 1 obtains an image signal Sim and an audio signal Sau by photographing a subject, applies signal processing based on information indicating the shooting situation to the obtained image signal Sim and audio signal Sau, and generates an MPEG-2 compliant stream D (hereinafter also referred to as audio video data) in which the shot portions considered important to the photographer can be edited automatically or by a simple selection operation in response to guidance.

すなわち、この撮像装置１０１は、被写体を撮影して画像信号Ｓｉｍを出力する撮像部１１と、被写体の撮影により得られた画像信号Ｓｉｍに、フィルタ処理、圧縮符号化処理、及び特徴量抽出処理等の信号処理を施して、画像の変化の特徴を示す画像特徴量を含む画像情報を抽出する画像処理部１１ａとを有している。ここで、画像の変化は、イベントの発生により生じた被写体の画像の変化であり、また、画像特徴量は、画像の変化の大きさや、画像が全くあるいは実質的に変化しない期間の長さなどである。 That is, the imaging apparatus 101 has an imaging unit 11 that photographs a subject and outputs an image signal Sim, and an image processing unit 11a that applies signal processing such as filter processing, compression encoding, and feature amount extraction to the image signal Sim obtained by photographing the subject and extracts image information including an image feature amount indicating a feature of a change in the image. Here, a change in the image is a change in the image of the subject caused by the occurrence of an event, and the image feature amount is, for example, the magnitude of the change in the image or the length of a period during which the image does not change at all or does not change substantially.

上記撮像装置１０１は、音声を取得して音声信号Ｓａｕに出力する音声取得部１２と、該音声信号Ｓａｕに、フィルタ処理、圧縮符号化処理、及び特徴量抽出処理などの信号処理を施して、音声の変化の特徴を示す音声特徴量を含む音声情報を抽出する音声処理部１２ａとを有している。ここで、音声の変化は、イベントの発生により生じた被写体からの音声の変化であり、音声特徴量は、音声の変化の大きさや、音声が全くあるいは実質的に変化しない期間の長さなどである。 The imaging apparatus 101 has an audio acquisition unit 12 that acquires audio and outputs it as an audio signal Sau, and an audio processing unit 12a that applies signal processing such as filter processing, compression encoding, and feature amount extraction to the audio signal Sau and extracts audio information including an audio feature amount indicating a feature of a change in the audio. Here, a change in the audio is a change in the sound from the subject caused by the occurrence of an event, and the audio feature amount is, for example, the magnitude of the change in the audio or the length of a period during which the audio does not change at all or does not change substantially.

上記撮像装置１０１は、撮影者の撮影状態を識別する固有の識別情報Ｄｉｄを取得する固有識別情報取得部１０と、取得した固有識別情報Ｄｉｄにフィルタ処理や特徴量抽出処理などの信号処理を施して、撮影状態の変化の特徴を示す固有特徴量を含む情報を抽出する固有識別情報処理部１０ａとを有している。ここで、撮影状態の変化は、イベントの発生により生じた撮影者の生理変化や撮影者による撮像装置の操作であり、固有特徴量は、撮影者の生理変化の大きさや、撮影者によるズーム調整，フォーカス調整の大きさなどである。 The imaging apparatus 101 has a unique identification information acquisition unit 10 that acquires unique identification information Did identifying the photographer's shooting state, and a unique identification information processing unit 10a that applies signal processing such as filter processing and feature amount extraction to the acquired unique identification information Did and extracts information including a unique feature amount indicating a feature of a change in the shooting state. Here, a change in the shooting state is a physiological change of the photographer caused by the occurrence of an event or an operation of the imaging apparatus by the photographer, and the unique feature amount is, for example, the magnitude of the photographer's physiological change or the magnitude of a zoom or focus adjustment made by the photographer.

上記撮像装置１０１は、上記画像処理部１１ａ、音声処理部１２ａ、及び固有識別情報処理部１０ａでの特徴量抽出処理により得られた特徴量に基づいて、撮影状況が変化した撮影タイミングが編集点として妥当かどうかを判定する特徴量判定部２１と、該編集点として妥当であると判定された撮影タイミングを示す編集点情報を生成する編集点情報生成部２２ａとを有している。ここで、撮影状況が変化した撮影タイミングは、撮影中に被写体の画像が変化したタイミング、撮影中に被写体からの音声が変化したタイミング、及び、撮影状態が変化したタイミングを含むものである。また、撮影状態の変化は、撮影中に生じた撮影者の生理変化や撮影者の操作によるズーム、フォーカスなどの変化を含むものである。 The imaging apparatus 101 has a feature amount determination unit 21 that determines, based on the feature amounts obtained by the feature amount extraction in the image processing unit 11a, the audio processing unit 12a, and the unique identification information processing unit 10a, whether a shooting timing at which the shooting situation changed is appropriate as an edit point, and an edit point information generation unit 22a that generates edit point information indicating the shooting timing determined to be appropriate as the edit point. Here, the shooting timings at which the shooting situation changed include the timing at which the image of the subject changed during shooting, the timing at which the sound from the subject changed during shooting, and the timing at which the shooting state changed. Changes in the shooting state include physiological changes of the photographer that occur during shooting and changes in zoom, focus, and the like caused by the photographer's operation.

上記撮像装置１０１は、画像処理部１１ａ、音声処理部１２ａ、及び固有識別情報処理部１０ａからの情報に基づいて、画像処理部１１ａでの画像信号Ｓｉｍの圧縮符号化処理により得られた画像ストリーム、音声処理部１２ａでの音声信号の圧縮符号化処理により得られた音声ストリーム、及び編集点情報生成部２２ａにて生成された編集点情報を含むオーディオビデオストリームを作成するシステム処理部１３と、該オーディオビデオストリームを格納する記録媒体３０ａと、該記録媒体３０ａとデータバスＤｂｕｓとの間に接続された記録媒体インターフェース部３０と、ユーザの操作により発生したユーザ操作信号に基づいて、一連の記録再生処理が行われるよう上記各部を制御する制御部２０ａとを有している。 The imaging apparatus 101 has: a system processing unit 13 that, based on information from the image processing unit 11a, the audio processing unit 12a, and the unique identification information processing unit 10a, creates an audio video stream including the image stream obtained by compression encoding of the image signal Sim in the image processing unit 11a, the audio stream obtained by compression encoding of the audio signal in the audio processing unit 12a, and the edit point information generated by the edit point information generation unit 22a; a recording medium 30a that stores the audio video stream; a recording medium interface unit 30 connected between the recording medium 30a and the data bus Dbus; and a control unit 20a that controls the above units, based on user operation signals generated by user operations, so that a series of recording and playback processes is carried out.

以下、上記各部で行われる信号処理について詳しく説明する。
上記画像処理部１１ａで行われる画像信号Ｓｉｍに対するフィルタ処理は、特定の周波数帯域の信号のみを抽出する処理である。画像処理部１１ａで行われる画像信号Ｓｉｍに対する圧縮符号化処理は、ＭＰＥＧ‐２に対応した画面内及び画面間予測符号化処理である。なお、この予測符号化処理は、ＭＰＥＧ‐２に対応したものに限らず、ＭＰＥＧ‐４あるいはＭＰＥＧ‐４ＡＶＣに対応したものであってもよい。また、ここでは、画像信号Ｓｉｍに対する特徴量抽出処理は、撮影された画像が急に変化した急変部分での変化の大きさや、画像が全くあるいは実質的に変化しない状態の継続時間などを、上記画面間予測符号化処理で用いる、画像の動きを示す動きベクトルに基づいて特徴量として抽出する処理である。画像の急変部分は、例えば、撮影者が、ハッとして、特定の被写体にカメラを向けたときの撮影部分などであり、また、映像の非変部分は、例えば、撮影者の視点が特定の方向に定まって動かないときの撮影部分などである。 Hereinafter, signal processing performed in each of the above-described units will be described in detail.
The filter processing applied to the image signal Sim in the image processing unit 11a extracts only signals in a specific frequency band. The compression encoding applied to the image signal Sim in the image processing unit 11a is intra-picture and inter-picture predictive coding conforming to MPEG-2. This predictive coding is not limited to MPEG-2 and may instead conform to MPEG-4 or MPEG-4 AVC. The feature amount extraction applied to the image signal Sim here extracts, as feature amounts, the magnitude of the change in a sudden-change portion where the captured image changes abruptly, the duration of a state in which the image does not change at all or substantially, and the like, based on the motion vectors used in the inter-picture predictive coding, which indicate the motion of the image. A sudden-change portion of the image is, for example, the portion shot when the photographer, startled, turns the camera toward a specific subject. An unchanging portion of the video is, for example, the portion shot while the photographer's viewpoint is fixed in a specific direction and does not move.

上記音声取得部１２で行われる音声信号Ｓａｕに対するフィルタ処理は、特定の周波数帯域の信号のみを抽出する処理である。音声取得部１２で行われる音声信号Ｓａｕに対する圧縮符号化処理は、音声信号を圧縮して音声圧縮データを生成する、ＭＰＥＧ‐２，ＭＰＥＧ‐４などの画像信号に対する符号化処理に対応した処理である。また、ここでは、音声信号Ｓａｕに対する特徴量抽出処理は、音声信号の変化の大きさに基づいて、音声が大きく変化した急変部分での変化の大きさや、音声が全くあるいは実質的に変化しない状態の継続時間などを特徴量として抽出する処理である。音声の急変部分は、例えば、撮影されている人が会話をはじめたとき、演奏会などで音楽演奏が始まったとき、あるいは、運動会などでスタートの合図として用いられるピストルやホイッスルの音が発生したときの録音部分などである。また、音声の非変部分は、演劇などの中間幕の一瞬の静かな状態の録音部分などである。 The filter processing applied to the audio signal Sau in the audio acquisition unit 12 extracts only signals in a specific frequency band. The compression encoding applied to the audio signal Sau in the audio acquisition unit 12 compresses the audio signal to generate compressed audio data and corresponds to the encoding applied to the image signal, such as MPEG-2 or MPEG-4. The feature amount extraction applied to the audio signal Sau here extracts, as feature amounts and based on the magnitude of change in the audio signal, the magnitude of the change in a sudden-change portion where the audio changes greatly, the duration of a state in which the audio does not change at all or substantially, and the like. A sudden-change portion of the audio is, for example, the portion recorded when a person being filmed starts talking, when a musical performance begins at a concert, or when a pistol or whistle used as a starting signal sounds at an athletic meet. An unchanging portion of the audio is, for example, the portion recorded during a momentary silence between acts of a play.

上記固有識別情報処理部１０ａで行われる固有識別情報Ｄｉｄに対するフィルタ処理は、該固有識別情報Ｄｉｄである固有識別情報取得部１０の出力信号の特定周波数成分のみ抽出する処理である。固有識別情報処理部１０ａで行われる固有識別情報Ｄｉｄに対する特徴量抽出処理は、固有識別情報Ｄｉｄの値が急激にあるいは大きく変化した急変部分での変化の大きさや、固有識別情報Ｄｉｄの値が全く変化しなくなった状態の継続時間などを固有特徴量として抽出する処理である。固有識別情報の急変部分は、例えば、撮影者の、意識の集中による緊張が始まったときに生ずる生理現象の特徴的な変化などに対応する。ここで、上記生理現象の特徴的な変化は、例えば、撮影中に生じた撮影者の生理現象の大きな変化である。また、検出の対象となる生理現象は、発汗作用、まばたき、瞳孔の変化、及び脈拍であり、固有識別情報処理部１０ａは、発汗作用やまばたき等の各種生理現象の変化を検知する、その種類に応じたセンサを有している。例えば、発汗作用は、撮影者の手の熱伝導率を測定するセンサによりモニタすることができる。なお、上記固有識別情報としての撮影者の生理現象は上記のものに限るものではない。 The filtering process for the unique identification information Did performed by the unique identification information processing unit 10a is a process of extracting only the specific frequency component of the output signal of the unique identification information acquiring unit 10 that is the unique identification information Did. In the feature quantity extraction processing for the unique identification information Did performed by the unique identification information processing unit 10a, the magnitude of the change at the sudden change portion where the value of the unique identification information Did has changed suddenly or greatly, and the value of the unique identification information Did are completely This is a process of extracting the duration of the state that has ceased to change as a characteristic feature. The sudden change portion of the unique identification information corresponds to, for example, a characteristic change of a physiological phenomenon that occurs when a photographer's tension due to concentration of consciousness starts. Here, the characteristic change in the physiological phenomenon is, for example, a large change in the physiological phenomenon of the photographer that occurs during the shooting. 
The physiological phenomena to be detected are perspiration, blinking, changes in the pupils, and pulse, and the unique identification information processing unit 10a has sensors, each suited to its type of phenomenon, that detect changes in physiological phenomena such as perspiration and blinking. For example, perspiration can be monitored by a sensor that measures the thermal conductivity of the photographer's hand. The physiological phenomena of the photographer used as the unique identification information are not limited to those listed above.

また、上記編集点を判定する処理は、特徴量判定部２１が、撮影状況が変化した撮影タイミングが編集点として妥当か否かを判定するものであり、具体的には以下の６つの判定処理である。
第１の判定処理は、画像処理部１１ａからの特徴量である、画面内のすべてのマクロブロックの動きベクトルの大きさが、あるいは画面内の特定のマクロブロックの動きベクトルの大きさが、決められた閾値を超えたか否かを判定し、動きベクトルの大きさが閾値を超えたと判定された撮影タイミングを、編集点として適切と判定するものである。 In the process of determining edit points, the feature amount determination unit 21 determines whether a shooting timing at which the shooting situation changed is appropriate as an edit point. Specifically, it consists of the following six determination processes.
The first determination process determines whether the magnitude of the motion vectors of all macroblocks in the picture, or of a specific macroblock in the picture, which is a feature amount from the image processing unit 11a, has exceeded a predetermined threshold, and judges a shooting timing at which the motion vector magnitude is determined to have exceeded the threshold to be appropriate as an edit point.

画面内のすべてのマクロブロックの動きベクトルの大きさが、決められた閾値を超えた場合は、撮像装置の筐体の揺れの大きさがある閾値を超えたこと、あるいは画面輝度レベルが急に変化したことが考えられる。 When the magnitude of the motion vectors of all macroblocks in the picture exceeds the predetermined threshold, it is likely that the shaking of the imaging apparatus housing has exceeded a certain threshold or that the screen luminance level has changed abruptly.
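
The first determination process can be sketched as a check over per-macroblock motion vectors. The representation is an assumption made for illustration: each motion vector is a `(dx, dy)` pair, the magnitude is taken as the Euclidean length, and `exceeds_motion_threshold` and `block_indices` are hypothetical names. The patent does not specify how the magnitude is computed.

```python
import math


def exceeds_motion_threshold(motion_vectors, threshold, block_indices=None):
    """Return True if every selected macroblock moves by more than the threshold.

    motion_vectors: list of (dx, dy) pairs, one per macroblock in the picture
    threshold:      magnitude that each selected vector must exceed
    block_indices:  indices of the specific macroblocks to test, or None to
                    test all macroblocks in the picture
    """
    if block_indices is None:
        block_indices = range(len(motion_vectors))
    selected = [motion_vectors[i] for i in block_indices]
    if not selected:
        # No vectors to judge (for example, an intra-coded picture).
        return False
    return all(math.hypot(dx, dy) > threshold for dx, dy in selected)
```

Testing all macroblocks corresponds to whole-picture events such as camera shake, while passing `block_indices` restricts the check to a region of interest.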

第２の判定処理は、画像処理部１１ａからの特徴量である、動きベクトルの大きさの変化やフォーカス距離の変化の大きさがある閾値以下である状態が一定時間続いているか否かを判定し、一定時間以上続いていると判定された撮影タイミングを編集点として適切と判定するものである。 In the second determination process, it is determined whether or not a state in which the magnitude of the change in the motion vector or the change in the focus distance, which is the feature amount from the image processing unit 11a, is equal to or less than a certain threshold value continues for a certain period of time. Then, the shooting timing determined to have continued for a certain time or more is determined to be appropriate as the editing point.

動きベクトルの大きさの変化やフォーカス距離の変化の大きさが、ある閾値以下を維持している場合は、撮影者の視点が変化していない状態と考えられる。 If the change in the magnitude of the motion vector or the change in the focus distance remains below a certain threshold, it is considered that the photographer's viewpoint has not changed.
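The second determination process looks for a sustained quiet state rather than a spike. A minimal sketch, assuming the change magnitude is sampled once per frame and the "certain period" is expressed as a frame count (both assumptions, as is the name `stable_since`):

```python
from typing import Optional, Sequence


def stable_since(changes: Sequence[float], threshold: float, min_frames: int) -> Optional[int]:
    """Return the index of the first frame at which the change has stayed at or
    below the threshold for min_frames consecutive frames, or None if it never does."""
    run = 0
    for i, change in enumerate(changes):
        run = run + 1 if change <= threshold else 0
        if run >= min_frames:
            return i
    return None
```

The returned frame index corresponds to the shooting timing that would be judged appropriate as an edit point.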

第３の判定処理は、音声処理部１２ａからの特徴量である、音のダイナミックレンジの変化の大きさが、決められた閾値を超えたか否かを判定し、閾値を超えたと判定された撮影タイミングを、編集点として適切と判定するものである。なお、音の変化は、被写体からの音の変化だけでなく、撮影者が発する音、例えば咳払いなどの音も含まれる。 In the third determination process, it is determined whether or not the magnitude of the change in the dynamic range of the sound, which is the feature amount from the sound processing unit 12a, exceeds a predetermined threshold value. The timing is determined to be appropriate as an editing point. Note that the change in sound includes not only a change in sound from the subject but also a sound emitted by the photographer, for example, a coughing sound.

第４の判定処理は、音のダイナミックレンジの無変化状態が、一定時間続いているか否かを検出し、無変化状態が一定時間以上続いていると判定された撮影タイミングを、編集点として適切と判定するものである。この場合、無変化部分の先頭位置を編集点とすることができる。 The fourth determination process detects whether or not the unchanged state of the sound dynamic range has continued for a certain period of time, and the shooting timing determined that the unchanged state has continued for a certain period of time is appropriate as an editing point. It is determined. In this case, the start position of the unchanged part can be set as the editing point.

第５の判定処理は、固有識別情報処理部１０ａからの特徴量である、撮影者の心拍数の変化の大きさや撮影者の手の熱伝導率の変化の大きさが、ある閾値を超えたか否かを検出し、閾値を超えたと判定された撮影タイミングを、編集点として適切と判定するものである。 The fifth determination process detects whether the magnitude of the change in the photographer's heart rate or in the thermal conductivity of the photographer's hand, which is a feature amount from the unique identification information processing unit 10a, has exceeded a certain threshold, and determines that a shooting timing at which the threshold is found to have been exceeded is appropriate as an edit point.

第６の判定処理は、固有識別情報処理部１０ａからの特徴量である、まばたきの回数変化や瞳孔の変化の大きさなどが、ある閾値を超えたか否かを検出し、閾値を超えたと判定された撮影タイミングを、編集点として適切と判定するものである。 The sixth determination process detects whether or not the change in the number of blinks or the size of the pupil, which is a feature amount from the unique identification information processing unit 10a, exceeds a certain threshold value, and determines that the threshold value has been exceeded. The obtained shooting timing is determined to be appropriate as an editing point.

なお、フォーカスやズームなどの無意識で行われる操作については、画像処理部１１ａからの特徴量ではなく、専用のセンサの出力レベルに基づいて、これらの操作が行われた撮影タイミングが編集点として妥当か否かを判定するようにしてもよい。この場合、具体的には、固有識別情報取得部１０が上記専用センサの出力を、撮影状態を示す固有識別情報として取得し、固有識別情報処理部１０ａが、該固有識別情報に基づいて、撮影状態の変化の大きさである、撮影者の操作によるフォーカスやズームなどの調整の大きさを示す固有特徴量を取得する。そして、特徴量判定部２１が、固有特徴量を判定して、フォーカスやズームなどの撮影状態が変化した撮影タイミングが編集点として妥当か否かを判定する。また、撮影者の脳波、例えばα波を測定するセンサを設け、該センサの出力レベルに基づいて、α波が変化した撮影タイミングが編集点として妥当か否かを判定するようにしてもよい。この場合、具体的には、固有識別情報取得部１０が上記α波測定センサの出力レベルを、撮影状態を示す固有識別情報として取得し、固有識別情報処理部１０ａが、該固有識別情報に基づいて、撮影状態を表す撮影者のα波の変化の大きさを示す固有特徴量を取得する。そして、特徴量判定部２１が、固有特徴量を判定して、撮影状態を表す撮影者のα波が変化した撮影タイミングが編集点として妥当か否かを決定する。また、画像や音の特徴量は、撮影により得られた画像信号や音声信号を信号処理して抽出するのではなく、専用のセンサを用いて検出することも可能である。 For operations that are performed unconsciously such as focus and zoom, the shooting timing at which these operations are performed is appropriate as an edit point based on the output level of the dedicated sensor, not the feature amount from the image processing unit 11a. It may be determined whether or not. In this case, specifically, the unique identification information acquisition unit 10 acquires the output of the dedicated sensor as unique identification information indicating a shooting state, and the unique identification information processing unit 10a performs shooting based on the unique identification information. A unique feature amount indicating the magnitude of adjustment such as focus and zoom by the photographer's operation, which is the magnitude of the state change, is acquired. Then, the feature amount determination unit 21 determines the unique feature amount, and determines whether or not the shooting timing at which the shooting state such as focus or zoom has changed is appropriate as the editing point. In addition, a sensor that measures a photographer's brain wave, for example, an α wave may be provided, and based on the output level of the sensor, it may be determined whether or not the imaging timing at which the α wave has changed is appropriate as an editing point. 
In this case, specifically, the unique identification information acquisition unit 10 acquires the output level of the α wave measurement sensor as unique identification information indicating a shooting state, and the unique identification information processing unit 10a is based on the unique identification information. Thus, the characteristic feature amount indicating the magnitude of the change of the α wave of the photographer representing the photographing state is acquired. Then, the feature amount determination unit 21 determines the unique feature amount, and determines whether or not the shooting timing at which the photographer's α wave representing the shooting state has changed is appropriate as the editing point. In addition, the image and sound feature quantities can be detected by using a dedicated sensor instead of signal processing and extracting image signals and sound signals obtained by photographing.

また、この実施の形態１では、編集点情報生成部２２ａは、編集点として適切と判定された撮影タイミングを示す情報と、この撮影タイミングが、例えば、音の変化や映像の変化，あるいは撮影状態の変化などの特徴量のうちのどのような特徴量に基づいて判定されたものであるかを示す情報とを生成してシステム処理部１３に出力するものである。また、編集点情報生成部２２ａは、編集点として判定された撮影タイミングに最も近い、この撮影タイミング以前のＶＯＢユニットの先頭のＩピクチャを、編集時にアクセスポイントとして用いるピクチャに設定し、このように編集点を上記Ｉピクチャに設定したことを示す情報をシステム処理部１３に出力する。また、システム処理部１３は、編集点情報生成部２２ａからの情報に基づいて、オーディオビデオストリームＤに含まれる管理情報であるプレイリストを更新するものとなっている。 In the first embodiment, the edit point information generation unit 22a generates information indicating the shooting timing determined to be appropriate as an edit point, together with information indicating which feature amount (for example, a change in sound, a change in video, or a change in the shooting state) that determination was based on, and outputs both to the system processing unit 13. The edit point information generation unit 22a also sets, as the picture used as an access point at the time of editing, the I picture at the head of the VOB unit that precedes and is closest to the shooting timing determined as the edit point, and outputs information indicating that the edit point has been set to that I picture to the system processing unit 13. The system processing unit 13 updates the playlist, which is management information included in the audio video stream D, based on the information from the edit point information generation unit 22a.
つまり、システム処理部１３により作成されたオーディオビデオストリームのプレイリストは、編集点として適切と判定された撮影タイミングを示す編集点情報と、編集点として適切と判定された撮影タイミングが、どのような特徴量に基づいて判定されたものであるかを示す情報と、編集時にアクセスポイントとして用いるピクチャにいずれのピクチャを設定したかを示す情報とを含んでいる。 In other words, the playlist of the audio video stream created by the system processing unit 13 includes edit point information indicating the shooting timing determined to be appropriate as an edit point, information indicating which feature amount that determination was based on, and information indicating which picture has been set as the picture used as an access point at the time of editing.

但し、上記アクセスポイントとして用いるピクチャは、編集点として判定された、単に画像や音声などの撮影状況が変化した撮影タイミングに最も近い、この撮影タイミング以前のＩピクチャに限るものではなく、例えば、編集点の設定を行う、画像の変化や音声の変化などの要因に応じて、被写体の画像または音声、あるいは撮影者の撮影状態に変化を与えるイベントが発生したタイミングから、このイベントに起因する特徴量が検出されるまでの遅延時間を考慮して、編集点とするピクチャを決定しても良い。例えば、撮影状況が変化したタイミングから上記遅延時間だけ遡った撮影タイミングに一番近いＩピクチャを編集点として用いるピクチャに設定してもよい。この場合、遅延時間は、フォーカス情報などに応じて決定した時間としても、予めすべの要因に対して一律に、あるいは個々の要因に対して別々に決められた固定の時間としてもよい。 However, the picture used as the access point is not limited to the I picture that precedes and is closest to the shooting timing determined as the edit point, that is, the timing at which the shooting situation such as the image or sound changed. For example, depending on the factor used to set the edit point, such as a change in the image or a change in the sound, the picture to be used as the edit point may be determined by taking into account the delay time from when an event that changes the image or sound of the subject, or the shooting state of the photographer, occurs until the feature amount caused by that event is detected. For example, the I picture closest to the shooting timing obtained by going back from the timing at which the shooting situation changed by that delay time may be set as the picture used as the edit point. In this case, the delay time may be a time determined according to focus information or the like, or a fixed time determined in advance either uniformly for all factors or separately for each factor.
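The delay-compensated variant described above can be sketched as a nearest-neighbour lookup over the I picture times. This is a sketch under stated assumptions: I picture times are held in an ascending list of seconds, and the per-factor delays in `DELAY_BY_FACTOR` are made-up illustrative values, not figures from the specification.

```python
from bisect import bisect_left
from typing import Sequence

# Hypothetical fixed delays in seconds, one per factor (illustrative values only).
DELAY_BY_FACTOR = {"audio": 0.2, "image": 0.1, "physiological": 1.0}


def compensated_access_point(i_picture_times: Sequence[float],
                             detected_at: float,
                             delay: float) -> float:
    """Return the I picture time closest to (detected_at - delay).

    i_picture_times must be sorted in ascending order and non-empty.
    """
    target = detected_at - delay
    i = bisect_left(i_picture_times, target)
    # Only the neighbours on either side of the target can be the closest.
    candidates = i_picture_times[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda t: abs(t - target))
```

A call such as `compensated_access_point(times, t, DELAY_BY_FACTOR["physiological"])` would move the access point back toward the moment the triggering event actually occurred.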

また、本実施の形態１では、オーディオビデオストリームはＭＰＥＧ‐２に対応するものとしているため、上記画像、音声、あるいは撮影状態を変化させるイベントが発生した時刻を、上記画像、音声、あるいは撮影状態の変化の特徴を示す特徴量が検出された時刻として、プレイリストに含めて、該ストリームの管理情報記録領域に書き込んでいるが、該ストリームはＭＰＥＧ‐４ＡＶＣに対応するものでもよく、この場合は、特徴量検出時刻のみを、該ストリームの付加情報記録領域（ＳＥＩ）に記録してもよい。 In the first embodiment, because the audio video stream conforms to MPEG-2, the time at which an event that changes the image, sound, or shooting state occurred is included in the playlist, as the time at which the feature amount characterizing that change was detected, and written in the management information recording area of the stream. The stream may instead conform to MPEG-4 AVC, in which case only the feature amount detection time may be recorded in the additional information recording area (SEI) of the stream.

また、この実施の形態１では、制御部２０ａは、撮影前にユーザにより選択されたシナリオに基づいて特徴量判定部２１に指令信号を出力して、編集点の設定を行う要因となる、例えば、音、映像、撮影者の生理現象などを決定するとともに、編集点設定を行う要因が変化した撮影タイミングを編集点と判定する際の判定強度、つまり特徴量の閾値を決定するものとしている。 Further, in the first embodiment, the control unit 20a outputs a command signal to the feature amount determination unit 21 based on a scenario selected by the user before shooting, and becomes a factor for setting an edit point. In addition to determining sound, video, photographer's physiological phenomenon, and the like, determination intensity, that is, a threshold value of a feature amount, when determining a shooting timing when an editing point setting factor has changed as an editing point is determined.

また、上記制御部２０ａは、ユーザが、本撮像装置によるガイダンスに応じて、運動会、演奏会、結婚式、旅行などの、撮影場所を選択すると、撮影状況の変化のパターン、例えば、音量の変化パターン、明るさの変化パターン、撮影者の生理現象の変化パターンなどに応じて、編集点設定のための各種の要因に対する判定強度が、予め容易された複数の既定値のうちの１つに設定する。但し、各種の要因に対する判定強度として用意されている既定値は、ユーザの好みなどに応じて、独自に調整可能としてもよい。 When the user, following guidance from the imaging apparatus, selects a shooting location such as an athletic meet, a concert, a wedding, or a trip, the control unit 20a sets the determination strength for each of the factors used for edit point setting to one of a plurality of default values prepared in advance, according to the pattern in which the shooting situation changes, for example, the pattern of change in volume, in brightness, or in the photographer's physiological phenomena. The default values prepared as determination strengths for the various factors may, however, be individually adjustable according to the user's preferences and the like.

なお、この撮影装置１０１は、図示していないが、上記制御部２０ａからの制御信号に基づいて、記録媒体に記録されたオーディオビデオストリームを復号化して再生する再生部を有している。 Although not shown, the photographing apparatus 101 includes a playback unit that decodes and plays back an audio / video stream recorded on a recording medium based on a control signal from the control unit 20a.

次に、上記記録媒体に記録されたＡＶデータの構造について簡単に説明する。 Next, the structure of the AV data recorded on the recording medium will be briefly described.
図２は、記録媒体に記録されたＡＶデータの構造を説明する図である。 FIG. 2 is a diagram for explaining the structure of the AV data recorded on the recording medium.
ここで、記録媒体は、ＤＶＤ（ＤｉｇｉｔａｌＶｅｒｓａｔｉｌｅＤｉｓｋ）ディスクなどのディスク状記録媒体としている。ただし、記録媒体は、ＤＶＤなどのディスク状記録媒体に限るものではなく、例えば、ＨＤＤ（ハードディスクドライブ）、メモリーカード、あるいは磁気テープなどでもよい。また、上記記録媒体には、１つのコンテンツに対応する画像信号Ｓｉｍ及び音声信号Ｓａｕを符号化して得られたストリームＤｓと、これらのコンテンツに対応する管理情報Ｄｍとを含むオーディオビデオストリームＤが書き込まれている。この管理情報Ｄｍは、ディスク状記録媒体の中心近傍の内側領域に書き込まれ、上記ストリームＤｓは、この内側領域の外側の領域に書き込まれている。また、ストリームＤｓは、ＶＯＢユニットＶＯＢＵにより区分されている。 Here, the recording medium is a disc-shaped recording medium such as a DVD (Digital Versatile Disc). However, the recording medium is not limited to a disc-shaped recording medium such as a DVD, and may be, for example, an HDD (hard disk drive), a memory card, or a magnetic tape. An audio video stream D, which includes a stream Ds obtained by encoding an image signal Sim and an audio signal Sau corresponding to one content item and management information Dm corresponding to that content, is written on the recording medium. The management information Dm is written in an inner area near the center of the disc-shaped recording medium, and the stream Ds is written in the area outside this inner area. The stream Ds is divided into VOB units VOBU.

また、上記管理情報ＤｍはプレイリストＤｍｐを含んでおり、このプレイリストＤｍｐには、複数の補助情報ｐｌａｙｉｔｅｍ［０］，［１］，［２］，・・・，［ｎ］，・・・が含まれている。 The management information Dm includes a playlist Dmp, and the playlist Dmp contains a plurality of auxiliary information items playitem[0], [1], [2], ..., [n], and so on.

例えば、図２に示す符号化データＤのストリームＤｓには、ＶＯＢユニットＶＯＢＵ（ｍ−ｋ）ＶＯＢユニットＶＯＢＵ（ｍ）、ＶＯＢユニットＶＯＢＵ（ｍ＋ｑ）が含まれており、特定のＶＯＢユニットＶＯＢＵ（ｍ）に対応するプレイリストの補助情報ｐｌａｙｉｔｅｍ［ｎ］には、時間情報Ｄｔｍ、ＡＶ情報Ｄａｖ、操作情報Ｄｏｐ、生理的情報Ｄｐｈ、及び編集済みフラグＤｅｆが含まれている。ここで、時間情報Ｄｔｍは、ＶＯＢユニットＶＯＢＵ（ｍ）の開始時刻を示す情報Ｄｓｔと、ＶＯＢユニットＶＯＢＵ（ｍ）の終了時刻を示す情報Ｄｅｔとを含んでいる。ＡＶ情報Ｄａｖは、画像に関する特徴量を示す情報Ｄｖｉ、及び音声に関する特徴量を示す情報Ｄａｕを含んでいる。操作情報Ｄｏｐは、手ブレの程度を示す情報Ｄｈｍ、フォーカス操作時の操作量を示す情報Ｄｆｏ、及びズーム操作時の操作量を示す情報Ｄｚｍを含んでいる。生理的情報Ｄｐｈは、撮影者の発汗量を示す汗情報Ｄｓｕ、撮影者のα波強度を示すα波情報Ｄαｗ、操作者のまばたきの頻度を示すまばたき情報Ｄｂｋ、操作者の瞳孔変化の程度を示す瞳孔情報Ｄｐｕ、及び操作者の脈拍数を示す脈拍情報Ｄｐｓを含んでいる。このように、上記画像、音声、あるいは撮影状態を変化させるイベントが発生した時刻は、上記画像、音声、あるいは撮影状態の変化の特徴を示す特徴量が検出された時刻として、実質的に、プレイリストに含めて該ストリームの管理情報記録領域に書き込まれている。 For example, the stream Ds of the encoded data D shown in FIG. 2 includes VOB units VOBU(m-k), VOBU(m), and VOBU(m+q), and the auxiliary information playitem[n] of the playlist corresponding to the specific VOB unit VOBU(m) includes time information Dtm, AV information Dav, operation information Dop, physiological information Dph, and an edited flag Def. Here, the time information Dtm includes information Dst indicating the start time of the VOB unit VOBU(m) and information Det indicating its end time. The AV information Dav includes information Dvi indicating a feature amount related to the image and information Dau indicating a feature amount related to the sound. The operation information Dop includes information Dhm indicating the degree of camera shake, information Dfo indicating the operation amount of a focus operation, and information Dzm indicating the operation amount of a zoom operation. The physiological information Dph includes sweat information Dsu indicating the photographer's amount of perspiration, α-wave information Dαw indicating the photographer's α-wave intensity, blink information Dbk indicating the operator's blink frequency, pupil information Dpu indicating the degree of change in the operator's pupils, and pulse information Dps indicating the operator's pulse rate. In this way, the time at which an event that changes the image, sound, or shooting state occurred is, in effect, included in the playlist as the time at which the feature amount characterizing that change was detected, and written in the management information recording area of the stream.
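The layout of one auxiliary information item playitem[n] can be mirrored in a small data model. The sketch below only restates the fields listed above; the class and attribute names, the use of floats, and the zero defaults are assumptions made for illustration, and the specification's symbols are given in the comments.

```python
from dataclasses import dataclass, field


@dataclass
class TimeInfo:                 # Dtm
    start: float                # Dst: start time of the VOB unit
    end: float                  # Det: end time of the VOB unit


@dataclass
class AVInfo:                   # Dav
    image: float = 0.0          # Dvi: image feature amount
    audio: float = 0.0          # Dau: audio feature amount


@dataclass
class OperationInfo:            # Dop
    shake: float = 0.0          # Dhm: degree of camera shake
    focus: float = 0.0          # Dfo: focus operation amount
    zoom: float = 0.0           # Dzm: zoom operation amount


@dataclass
class PhysiologicalInfo:        # Dph
    sweat: float = 0.0          # Dsu: amount of perspiration
    alpha_wave: float = 0.0     # D-alpha-w: alpha-wave intensity
    blink: float = 0.0          # Dbk: blink frequency
    pupil: float = 0.0          # Dpu: degree of pupil change
    pulse: float = 0.0          # Dps: pulse rate


@dataclass
class PlayItem:                 # playitem[n], tied to one VOB unit
    time: TimeInfo
    av: AVInfo = field(default_factory=AVInfo)
    operation: OperationInfo = field(default_factory=OperationInfo)
    physiological: PhysiologicalInfo = field(default_factory=PhysiologicalInfo)
    edited: bool = False        # Def: edited flag
```

A playlist Dmp would then be a list of `PlayItem` objects, one per VOB unit that carries edit point information.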

次に動作について説明する。 Next, the operation will be described.
〔撮影前の設定操作〕 [Setting operation before shooting]
まず、撮影前のマニュアル設定操作について説明する。 First, the manual setting operation before shooting will be described.
撮影者は、運動会や結婚式などの催し物に合わせて、撮影状況が変化した撮影タイミングが編集点として適切であるか否かの判定に用いる判定強度を設定する。 The photographer sets, to suit an event such as an athletic meet or a wedding, the determination strength used to determine whether a shooting timing at which the shooting situation changed is appropriate as an edit point.

この判定強度については、撮像装置に予め設定されている複数のシナリオのうちから、運動会や結婚式に対応するものを選択することにより、編集点設定のための個々の要因に対する判定強度を、選択されたシナリオに応じた値に設定することもできるが、ここでは、操作者がマニュアルで設定する操作について説明する。 The determination strength for each factor used for edit point setting can also be set to values matching a scenario, by selecting the scenario corresponding to an athletic meet or a wedding from among a plurality of scenarios preset in the imaging apparatus; here, however, the operation in which the operator sets the strengths manually is described.

図３は、撮像装置１００の編集点挿入設定を行う画面を示している。 FIG. 3 shows the screen on which edit point insertion is configured for the imaging apparatus 100.
この設定画面１００ａ上には、ＡＶ情報の設定ボタン１１０、操作情報の設定ボタン１２０、生理的情報の設定ボタン１３０が表示されている。また、設定画面１００ａの右下部分には、生理的情報のより詳細な設定を行う詳細設定画面１３０ａが表示されており、該詳細設定画面１３０ａ上には、汗情報の設定ボタン１３１、瞳孔情報の設定ボタン１３２、及び脈拍情報の設定ボタン１３３が表示されている。なお、図３では、示していないが、ＡＶ情報のより詳細な設定を行う詳細設定画面や操作情報のより詳細な設定を行う詳細設定画面も表示可能となっている。 On this setting screen 100a, an AV information setting button 110, an operation information setting button 120, and a physiological information setting button 130 are displayed. In the lower right portion of the setting screen 100a, a detailed setting screen 130a for making more detailed settings for physiological information is displayed, and on it a sweat information setting button 131, a pupil information setting button 132, and a pulse information setting button 133 are displayed. Although not shown in FIG. 3, a detailed setting screen for AV information and a detailed setting screen for operation information can also be displayed.

それぞれのボタンは、各要素に対する判定強度を、“−”表示が示す最小レベルと、“＋”表示が示す最大レベルとの間で、任意のレベルに設定可能となっている。なお、“０”表示は、これらの中間のレベルを示している。 Each button can set the determination strength for each element to an arbitrary level between the minimum level indicated by “−” display and the maximum level indicated by “+” display. The “0” display indicates an intermediate level between these.

ここで、例えば、汗情報に関する判定強度のレベルが高いということは、発汗量の変化が比較的小さくても、この発汗量の変化が生じた撮影タイミングを、編集点として適切であると判定するということである。一方、汗情報に関する判定強度のレベルが小さいということは、発汗量の変化が比較的大きくても、この発汗量の変化が生じた撮影タイミングは、編集点として適切でないと判定するということである。 Here, for example, a high determination strength level for sweat information means that a shooting timing at which the amount of perspiration changed is determined to be appropriate as an edit point even if the change is relatively small. Conversely, a low determination strength level for sweat information means that a shooting timing at which the amount of perspiration changed is determined not to be appropriate as an edit point even if the change is relatively large.
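The inverse relation between strength level and threshold can be made concrete with a small sketch. The specification does not say how a level maps to a threshold, so the linear mapping, the five-step scale from -2 to +2, and the names `threshold_for` and `is_edit_point` are all assumptions chosen only to show the direction of the effect.

```python
def threshold_for(level: int, base: float, step: float = 0.25, max_level: int = 2) -> float:
    """Map a determination strength level to a threshold.

    level 0 gives the base threshold; a higher level gives a LOWER threshold,
    so that smaller changes already qualify as edit points.
    """
    if not -max_level <= level <= max_level:
        raise ValueError("level out of range")
    return base * (1.0 - step * level)


def is_edit_point(change: float, level: int, base: float) -> bool:
    """True when the observed change reaches the threshold for this level."""
    return change >= threshold_for(level, base)
```

With a base threshold of 10.0, a change of 6.0 qualifies at the "+" end of the scale (threshold 5.0) but not at the "-" end (threshold 15.0).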

例えば、運動会など競技大会で撮影を行う場合には、演技や競技の開始時にはその合図などの音声の大きな変化が発生すると考えられるため、ＡＶ情報の音声要素に対する判定強度を平均的なレベルより強く設定し、また、生理的情報の脈拍要素に対する判定強度なども、競技中は撮影者がハラハラする場合も考えられることから、強めに設定するのがよいと考えられる。 For example, when shooting at a competition such as an athletic meet, a large change in sound, such as a starting signal, is likely to occur at the start of a performance or race, so the determination strength for the audio element of the AV information should be set higher than the average level. Because the photographer may also be on edge during the competition, the determination strength for the pulse element of the physiological information and the like should likewise be set on the high side.

旅行などで風景を撮影する場合には、撮影者は、ＡＶ情報の画像要素に対する判定強度を平均的なレベルより強く設定し、また、遠くの景色などを撮影する場合も考えられるので、フォーカスやズームの操作量に対する判定強度を高くする場合があると考えられる。 When shooting a landscape during a trip or the like, the photographer may set the determination strength for the image element of the AV information to be higher than the average level, and may shoot a distant landscape. It is considered that the determination strength for the zoom operation amount may be increased.

また、結婚式では、撮影者は、ＡＶ情報の画像特徴量の判定強度及び音声特徴量の判定強度をともに平均的なレベルより強く設定し、生理的情報の各要素の特徴量についても比較的判定強度を高く設定する場合が考えられる。 In weddings, the photographer sets both the image feature amount determination strength and the sound feature amount determination strength of AV information to be stronger than the average level, and the feature amount of each element of physiological information is relatively high. A case where the determination intensity is set high can be considered.
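The three occasions above can be expressed as presets that raise selected factors above the middle level. This is a sketch only: the level scale (0 as the middle level, 1 as "stronger than average"), the factor names, and the function `strengths_for` are hypothetical, and the specification gives no numeric values.

```python
from typing import Dict, Optional

FACTORS = ("image", "audio", "focus", "zoom", "sweat", "pupil", "pulse")

# Factors raised above the middle level for each occasion (illustrative values).
PRESETS: Dict[str, Dict[str, int]] = {
    "athletic_meet": {"audio": 1, "pulse": 1},
    "travel": {"image": 1, "focus": 1, "zoom": 1},
    "wedding": {"image": 1, "audio": 1, "sweat": 1, "pupil": 1, "pulse": 1},
}


def strengths_for(scenario: str, overrides: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Return the strength level of every factor for a scenario,
    with optional per-factor user adjustments applied last."""
    levels = {factor: 0 for factor in FACTORS}
    levels.update(PRESETS[scenario])
    levels.update(overrides or {})
    return levels
```

The `overrides` argument reflects the statement that prepared default values may be adjusted to the user's preferences.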

このような判定強度の設定は、ユーザ操作、つまり撮影者のマニュアル操作に応じて上記制御部２０ａにて行われ、制御部２０ａは、ユーザ操作に応じて設定された各要素に対する判定強度を示す制御信号を上記特徴量判定部２１に供給する。その後、撮影者が撮影を行うと、上記特徴量判定部２１は、上記各特徴量を、対応する、上記制御部２０ａで設定された判定強度（閾値レベル）に基づいて判定して、上記画像、音声、あるいは撮影状態が変化した撮影タイミングが編集点として妥当であるか否かを決定する。 The determination strengths are set by the control unit 20a in response to a user operation, that is, a manual operation by the photographer, and the control unit 20a supplies a control signal indicating the determination strength set for each element to the feature amount determination unit 21. When the photographer then shoots, the feature amount determination unit 21 judges each feature amount against the corresponding determination strength (threshold level) set by the control unit 20a and decides whether the shooting timing at which the image, sound, or shooting state changed is appropriate as an edit point.

〔撮影時の動作〕 [Operation during shooting]
続いて、撮影時の撮像装置の動作について具体的に説明する。 Next, the operation of the imaging apparatus during shooting will be described in detail.
図４は、実施の形態１の撮像装置の動作フローを説明する図である。 FIG. 4 is a diagram illustrating the operation flow of the imaging apparatus according to the first embodiment.
撮影が開始されると（ステップＳ１）、撮像装置１０１は、画像情報、音声情報、及び撮影状態に関する情報を取得する（ステップＳ２）。 When shooting starts (step S1), the imaging apparatus 101 acquires image information, audio information, and information regarding the shooting state (step S2).

具体的には、上記ステップＳ２では、撮像部１１が、被写体の撮像により画像信号Ｄｉｍを出力する処理、音声取得部１２が音声を取得して音声信号Ｄａｕを出力する処理、及び、固有識別情報取得部１０が撮影者による撮像装置の操作及び撮影者の生理的変化を検知して、操作量及び生理的な変化に関する固有識別情報Ｄｉｄを出力する処理が並行して行われる。 Specifically, in step S2, three processes are performed in parallel: the imaging unit 11 images the subject and outputs an image signal Dim; the audio acquisition unit 12 acquires sound and outputs an audio signal Dau; and the unique identification information acquisition unit 10 detects the photographer's operation of the imaging apparatus and the photographer's physiological changes and outputs unique identification information Did regarding the operation amount and the physiological changes.

すると、固有識別情報処理部１０ａは、固有識別情報取得部１０からの固有識別情報Ｄｉｄ及び制御部２０ａからの制御信号に基づいて、フォーカスやズーム操作における操作量、及び撮影者の生理的な変化の大きさを示す、発汗量、まばたきの頻度、脈拍数の変動量など特徴量を検出する（ステップＳ２ａ）。また、画像処理部１１ａでは、撮影部１１からの画像信号Ｄｉｍ及び制御部２０ａからの制御信号に基づいて、画像信号に対してＭＰＥＧ‐２対応の予測符号化処理を施して画像ストリームを生成するとともに、該予測符号化処理で用いる動きベクトルに基づいて、画像が急変した部分での画像変化の大きさなどである画像の特徴量を含む画像情報を取得する（ステップＳ２ｂ）。また、音声処理部１２ａでは、音声取得部１２からの音声信号Ｄｉｍ及び制御部２０ａからの制御信号に基づいて、音声信号に対して符号化処理を施して音声ストリームを生成するとともに、該音声信号に基づいて、音声が急変した部分での音声変化の大きさなどである音声の特徴量を含む音声情報を取得する（ステップＳ２ｃ）。 Then, the unique identification information processing unit 10a, based on the unique identification information Did from the unique identification information acquisition unit 10 and the control signal from the control unit 20a, the operation amount in the focus or zoom operation, and the physiological change of the photographer. Features such as the amount of sweating, the frequency of blinking, and the amount of fluctuation of the pulse rate are detected (step S2a). Further, the image processing unit 11a generates an image stream by performing an MPEG-2 compatible predictive coding process on the image signal based on the image signal Dim from the photographing unit 11 and the control signal from the control unit 20a. At the same time, based on the motion vector used in the predictive encoding process, image information including the image feature amount such as the magnitude of the image change at the portion where the image suddenly changed is acquired (step S2b). The audio processing unit 12a generates an audio stream by performing an encoding process on the audio signal based on the audio signal Dim from the audio acquisition unit 12 and the control signal from the control unit 20a. On the basis of the voice information, the voice information including the voice feature amount such as the magnitude of the voice change at the portion where the voice is suddenly changed is acquired (step S2c).

次に、特徴量判定部２１は、ユーザ操作や撮影者の生理的変化に関する特徴量、画像に関する特徴量、及び音声に関する特徴量と、各特徴量に対して設定されている判定強度とに基づいて、特徴量が検出された撮影タイミングが編集点として妥当であるかを判定する（ステップＳ３）。 Next, the feature amount determination unit 21 is based on a feature amount related to a user operation or a physiological change of a photographer, a feature amount related to an image, a feature amount related to sound, and a determination intensity set for each feature amount. Then, it is determined whether the shooting timing at which the feature amount is detected is appropriate as an editing point (step S3).

続いて、編集点情報生成部２２ａは、編集点として妥当であると判定された撮影タイミングを示す編集点情報を生成するとともに、編集点として用いるピクチャを、該撮影タイミング以前であってこれに最も近いＶＯＢユニットの先頭のＩピクチャに設定したことを示す編集点ピクチャ情報を生成する（ステップＳ４）。 Subsequently, the edit point information generation unit 22a generates edit point information indicating the shooting timing determined to be appropriate as an edit point, and also generates edit point picture information indicating that the picture used as the edit point has been set to the I picture at the head of the VOB unit that precedes and is closest to that shooting timing (step S4).

その後、システム処理部１３は、制御部２０ａからの制御信号に基づいて、上記画像ストリーム、音声ストリーム、編集点情報、及び編集点ピクチャ情報を含むオーディオビデオストリームを作成して記録媒体インターフェース３０に出力する。すると、記録媒体インターフェースは、入力されたオーディオビデオストリームを記録媒体に記録する（ステップＳ５）。 Thereafter, the system processing unit 13 creates an audio video stream including the image stream, the audio stream, the editing point information, and the editing point picture information based on the control signal from the control unit 20a and outputs the audio video stream to the recording medium interface 30. To do. Then, the recording medium interface records the input audio video stream on the recording medium (step S5).

以下、編集点を判定するステップＳ３の処理について説明する。 The process of step S3 for determining an edit point is described below.
具体的には、特徴量判定部２１は、制御部２０ａからの判定強度を示す制御信号に基づいて、固有識別情報処理部１０ａで検出された固有特徴量、画像処理部１１ａで検出された画像特徴量、音声処理部１２ａで検出された音声特徴量のそれぞれについて、それぞれの特徴量が検出された撮影タイミングが編集点として妥当であるか否かを判定する。 Specifically, based on the control signal from the control unit 20a indicating the determination strengths, the feature amount determination unit 21 determines, for each of the unique feature amount detected by the unique identification information processing unit 10a, the image feature amount detected by the image processing unit 11a, and the audio feature amount detected by the audio processing unit 12a, whether the shooting timing at which that feature amount was detected is appropriate as an edit point.

例えば、固有識別情報処理部１０ａで検出された、手ブレに関する特徴量は、撮影者の手ブレの大きさである。この検出された手ブレの大きさが、予め撮影前に設定されている判定強度、つまり手ブレの大きさの閾値以上であれば、この手ブレに関する特徴量が検出された撮影タイミングが編集点として妥当であると判定され、手ブレの大きさが上記判定強度より小さければ、該撮影タイミングは編集点として妥当でないと判定される（ステップＳ３ａ）。また、固有識別情報処理部１０ａで検出された、フォーカスに関する特徴量、及びズームに関する特徴量は、それぞれ、フォーカス操作により変化したフォーカス変動量、及びズーム操作により変化したズーム変動量である。そして、これらの特徴量についても、手ブレに関する特徴量と同様に、その大きさが撮影前に設定されている判定強度以上であるか否かに応じて、特徴量が検出された撮影タイミングが編集点として妥当であるか否かが判定される（ステップＳ３ａ）。 For example, the feature amount related to camera shake detected by the unique identification information processing unit 10a is the size of the camera shake of the photographer. If the detected camera shake size is equal to or greater than the judgment intensity set in advance before shooting, that is, the camera shake threshold value, the shooting timing at which the feature amount related to the camera shake is detected is the editing point. If the magnitude of camera shake is smaller than the above-described determination strength, it is determined that the photographing timing is not appropriate as an editing point (step S3a). Further, the feature quantity related to focus and the feature quantity related to zoom detected by the unique identification information processing unit 10a are a focus fluctuation amount changed by the focus operation and a zoom fluctuation amount changed by the zoom operation, respectively. And, for these feature amounts as well as the feature amounts related to camera shake, the shooting timing at which the feature amounts are detected depends on whether the magnitude is equal to or greater than the determination intensity set before shooting. It is determined whether or not the edit point is valid (step S3a).

さらに、固有識別情報処理部１０ａで検出された、発汗に関する特徴量は、撮影者の発汗量である。この検出された発汗量が、予め撮影前に設定されている判定強度、つまり発汗量の閾値以上であれば、この発汗に関する特徴量が検出された撮影タイミングが編集点として妥当であると判定され、上記発汗量が上記検出強度より小さければ、該撮影タイミングは編集点として妥当でないと判定される。また、固有識別情報処理部１０ａで検出された、α波に関する特徴量、まばたきに関する特徴量、瞳孔に関する特徴量、及び脈拍に関する特徴量は、α波の変化の大きさ、まばたきの頻度、瞳孔の変化の大きさ、及び脈拍数の変化の大きさである。そして、これらの撮影者の生理変化に関する特徴量についても、発汗に関する特徴量と同様、その値が予め撮影前に設定されている判定強度以上であるか否かに応じて、それぞれの特徴量が検出された撮影タイミングが編集点として妥当であるか否かが判定される（ステップＳ３ａ）。 Further, the feature amount related to sweating detected by the unique identification information processing unit 10a is the sweating amount of the photographer. If the detected perspiration amount is equal to or higher than the determination intensity set in advance before photographing, that is, the perspiration amount threshold, it is determined that the photographing timing at which the feature amount related to perspiration is detected is appropriate as an editing point. If the perspiration amount is smaller than the detected intensity, it is determined that the photographing timing is not appropriate as an editing point. Further, the feature quantity related to α wave, the feature quantity related to blink, the feature quantity related to pupil, and the feature quantity related to pulse detected by the unique identification information processing unit 10a are the magnitude of change of α wave, the frequency of blink, The magnitude of the change and the magnitude of the change in the pulse rate. As for the feature quantities related to the physiological changes of these photographers, each feature quantity depends on whether or not the value is equal to or higher than the determination strength set in advance before shooting, as is the feature quantity related to sweating. It is determined whether or not the detected shooting timing is valid as an editing point (step S3a).

画像処理部１１ａで検出された、画像に関する特徴量は、画像が急に変化した部分での変化の大きさ、あるいは画像がまったくあるいは実質的に変化しない部分が継続した時間である。そして、この検出された画像急変部分での変化の大きさ、あるいは画像無変化状態の継続時間が、予め撮影前に設定されている判定強度、つまり変化の大きさの閾値、あるいは状態継続時間の閾値以上であれば、これらの特徴量が検出された撮影タイミングが編集点として妥当であると判定され、そうでなければ、撮影タイミングは編集点として妥当でないと判定される（ステップＳ３ｂ）。 The feature amount related to the image detected by the image processing unit 11a is the magnitude of the change at a portion where the image changes suddenly, or the length of time for which the image continues not to change at all or substantially. If the detected magnitude of change at the sudden-change portion, or the duration of the unchanged state, is equal to or greater than the determination strength set before shooting, that is, the threshold for the magnitude of change or the threshold for the state duration, the shooting timing at which the feature amount was detected is determined to be appropriate as an edit point; otherwise, it is determined not to be appropriate (step S3b).

音声処理部１２ａで検出された、音声に関する特徴量は、音声が大きく変化した部分での変化の大きさ、あるいは音声がまったくあるいは実質的に変化しない状態が継続した時間である。そして、この検出された音声急変部分での変化の大きさ、あるいは音声無変化状態の継続時間が、予め撮影前に設定されている判定強度、つまり変化の大きさの閾値、あるいは状態継続時間の閾値以上であれば、これらの特徴量が検出された撮影タイミングが編集点として妥当であると判定され、そうでなければ、該撮影タイミングは編集点として妥当でないと判定される（ステップＳ３ｃ）。 The feature amount related to the sound detected by the sound processing unit 12a is the magnitude of the change in a portion where the sound changes greatly, or the length of time for which the sound remains completely or substantially unchanged. If the detected magnitude of the abrupt sound change, or the duration of the unchanged state, is equal to or greater than the determination intensity set in advance before shooting, that is, the threshold for the magnitude of change or for the state duration, the shooting timing at which the feature amount was detected is determined to be valid as an edit point; otherwise, the shooting timing is determined not to be valid as an edit point (step S3c).
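
The threshold comparison of steps S3a to S3c can be sketched as follows. This is only an illustrative sketch: the `Feature` class, the factor labels, and the numeric values are assumptions introduced here and do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    kind: str      # hypothetical factor label, e.g. "sweat" or "image_change"
    value: float   # measured feature amount
    timing: float  # shooting timing (seconds) at which it was detected


def is_valid_edit_point(feature, thresholds):
    """Valid when the feature amount is at or above its determination intensity."""
    return feature.value >= thresholds[feature.kind]


# Determination intensities set before shooting (illustrative values only).
thresholds = {"sweat": 0.6, "image_change": 0.8, "audio_silence": 5.0}

features = [
    Feature("sweat", 0.7, 12.0),
    Feature("image_change", 0.5, 20.0),
    Feature("audio_silence", 5.0, 31.5),
]

# Shooting timings judged valid as edit point candidates.
candidates = [f.timing for f in features if is_valid_edit_point(f, thresholds)]
print(candidates)
```

The comparison uses "equal to or greater than", so a feature amount that exactly matches its threshold is accepted.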

その後、編集点情報生成部２２ａは、特徴量判定部２１で、各処理部１０ａ、１１ａ、１２ａから供給されたそれぞれの特徴量に基づいて、該特徴量が検出された撮影タイミングが編集点として妥当であると判定される度に、該撮影タイミングを示す編集点情報を生成するとともに、編集点として用いるピクチャを、該撮影タイミング以前であってこれに最も近いＶＯＢユニットの先頭のＩピクチャに設定したことを示す編集点ピクチャ情報を生成する（ステップＳ４）。 Thereafter, each time the feature amount determination unit 21 determines, based on the feature amounts supplied from the processing units 10a, 11a, and 12a, that the shooting timing at which a feature amount was detected is valid as an edit point, the edit point information generation unit 22a generates edit point information indicating that shooting timing. It also generates edit point picture information indicating that the picture used as the edit point has been set to the leading I picture of the VOB unit that is at or before the shooting timing and closest to it (step S4).

図５は、編集点の設定処理を具体的に説明する図であり、図５（ａ）は、処理フローを示し、図５（ｂ）は、撮影タイミングと、画像ストリームにおけるＶＯＢユニットの切れ目との関係を示している。 FIG. 5 is a diagram specifically explaining the edit point setting process. FIG. 5(a) shows the processing flow, and FIG. 5(b) shows the relationship between the shooting timing and the VOB unit boundaries in the image stream.

この実施の形態１では、編集点情報生成部２２ａは、特徴量による遅延時間、つまり特徴量によって異なる、イベントの発生時点から該イベントの発生により撮影状況が変化するまでの時間を算出する（ステップＳ１１）。 In the first embodiment, the edit point information generation unit 22a calculates the delay time attributable to the feature amount, that is, the time, which differs from one feature amount to another, from the occurrence of an event until the shooting situation changes as a result of that event (step S11).

次に、編集点情報生成部２２ａは、上記特徴量が検出された撮影タイミングＴｃｐから、上記算出された遅延時間Δｔだけ遡った撮影タイミングＴｅｐより前で最も近いＶＯＢユニット（ｉ）の切れ目を編集点に設定する（ステップＳ１２ａ）。 Next, the edit point information generation unit 22a sets, as the edit point, the nearest boundary of a VOB unit (i) preceding the shooting timing Tep, which is obtained by going back from the shooting timing Tcp at which the feature amount was detected by the calculated delay time Δt (step S12a).

その後、編集点情報生成部２２ａは、編集点として妥当であると判定された撮影タイミングＴｅｐを示す編集点情報を生成するとともに、図５（ｂ）に示すように、編集点として用いるピクチャを、該撮影タイミング以前であってこれに最も近いＶＯＢユニットＶＯＢＵ（ｆ）の先頭のＩピクチャＦ１に設定したことを示す編集点設定情報を生成する。そして、オーディオビデオストリームのプレイアイテムを、撮影タイミングＴｅｐが編集点に設定されたことが示されるよう変更する（ステップＳ１３）。 Thereafter, the edit point information generation unit 22a generates edit point information indicating the shooting timing Tep determined to be valid as an edit point and, as shown in FIG. 5(b), generates edit point setting information indicating that the picture used as the edit point has been set to the leading I picture F1 of the VOB unit VOBU(f) that is at or before that shooting timing and closest to it. The play item of the audio video stream is then changed to indicate that the shooting timing Tep has been set as an edit point (step S13).
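
Steps S11 to S12a amount to computing Tep = Tcp − Δt and snapping it back to the nearest VOB unit boundary at or before Tep. A minimal sketch follows, assuming the VOB unit start times are available as a sorted list; the function name and the sample times are hypothetical.

```python
from bisect import bisect_right


def snap_edit_point(t_cp, delay, vobu_starts):
    """Return (t_ep, index, start) for the VOB unit boundary at or before t_ep.

    t_cp        : shooting timing at which the feature amount was detected
    delay       : delay time for that feature amount (step S11)
    vobu_starts : sorted start times of the VOB units (assumed available)
    """
    t_ep = t_cp - delay                      # go back by the delay time
    i = bisect_right(vobu_starts, t_ep) - 1  # last boundary not after t_ep
    i = max(i, 0)                            # clamp to the first VOB unit
    return t_ep, i, vobu_starts[i]


vobu_starts = [0.0, 0.5, 1.0, 1.5, 2.0]
t_ep, index, start = snap_edit_point(1.9, 0.6, vobu_starts)
print(index, start)
```

Because only the boundary index changes, the encoded pictures themselves are left untouched, which is the point of this approach in Embodiment 1.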

なお、図５（ｂ）では、ピクチャＦ１は、その符号化あるいは復号化の際に他のピクチャを参照しないＩピクチャであり、ピクチャＦ４、Ｆ７、Ｆ１０は、符号化あるいは復号化の際に、前方のＩピクチャあるいはＰピクチャを参照するＰピクチャであり、ピクチャＦ２、Ｆ３、Ｆ５、Ｆ６、Ｆ８、Ｆ９は、符号化あるいは復号化の際に、前方のＩピクチャあるいはＰピクチャと後方のＰピクチャとを参照するＢピクチャである。 In FIG. 5(b), picture F1 is an I picture that does not refer to other pictures when encoded or decoded; pictures F4, F7, and F10 are P pictures that refer to a preceding I picture or P picture when encoded or decoded; and pictures F2, F3, F5, F6, F8, and F9 are B pictures that refer to a preceding I picture or P picture and a following P picture when encoded or decoded.

そして、各ＶＯＢユニットは、複数のピクチャからなり、その先頭にはＩピクチャが位置し、隣接するＩピクチャとＰピクチャの間、あるいは隣接する２つのＰピクチャの間には２つのＢピクチャが配置されている。また、ＶＯＢユニットＶＯＢＵ（ｆ−１）及びＶＯＢＵ（ｆ＋１）は、ＶＯＢユニットＶＯＢＵ（ｆ）の前後に位置するＶＯＢユニットである。 Each VOB unit consists of a plurality of pictures with an I picture at its head, and two B pictures are placed between an adjacent I picture and P picture or between two adjacent P pictures. VOB units VOBU(f−1) and VOBU(f+1) are the VOB units located immediately before and after VOB unit VOBU(f).

〔再生時の動作〕 [Operation during playback]
そして、再生時には、記録媒体に記録されたオーディオビデオストリームは、埋め込まれている編集点情報、つまり先頭ピクチャが編集点に設定されているＶＯＢユニットに対応するプレイリストの開始時刻と終了時刻に基づいて自動編集して再生される。 During playback, the audio video stream recorded on the recording medium is automatically edited and played back based on the embedded edit point information, that is, the start time and end time of the playlist corresponding to the VOB unit whose leading picture is set as an edit point.

なお、上記記録媒体に記録されたオーディオビデオストリームの再生は、編集点をピックアップして自動編集して行うものに限らず、ユーザが設定した編集条件に基づいて、記録されたオーディオビデオストリームにおける、設定された編集条件を満たす部分のみを編集して行うものであってもよい。 Note that the playback of the audio video stream recorded on the recording medium is not limited to the one that is automatically edited by picking up the editing point, but in the recorded audio video stream based on the editing conditions set by the user, It may be performed by editing only the portion satisfying the set editing conditions.

図６は、例えば、設定条件に基づいて、記録されたオーディオビデオストリームを自動編集して再生する処理を説明する図である。 FIG. 6 is a diagram explaining, as an example, processing for automatically editing and playing back a recorded audio video stream based on set conditions.
実施の形態１では、撮像装置１０１の再生部（図示せず）は、記録媒体に記録されたオーディオビデオストリームの再生が開始されると、該オーディオビデオストリームに含まれるプレイリストの各アイテムに基づいた処理が完了しているか否かを判定する（ステップＳ２１）。処理が終了している場合は、再生を終了する。 In Embodiment 1, when playback of an audio video stream recorded on the recording medium starts, the playback unit (not shown) of the imaging apparatus 101 determines whether processing based on each item of the playlist included in the audio video stream has been completed (step S21). If the processing has been completed, playback ends.

一方、上記再生部は、ステップＳ２１での判定の結果、処理が終了していない場合は、編集点が編集条件を満たしているか否かを判定し（ステップＳ２２）、特徴量に関する設定条件を満たしているＶＯＢユニットＶＯＢＵを再生する（ステップＳ２３）。 On the other hand, if the determination in step S21 shows that the processing has not been completed, the playback unit determines whether the edit point satisfies the editing conditions (step S22) and plays back the VOB units VOBU that satisfy the set conditions on the feature amounts (step S23).
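
The loop of steps S21 to S23 can be sketched as below. The play item layout (a dictionary with `vobu`, `kind`, and `value` keys) and the condition are assumptions made for illustration; the specification does not define a data format.

```python
def auto_play(play_items, meets_condition):
    """Collect the VOB units whose edit points satisfy the editing condition."""
    played = []
    for item in play_items:              # S21: continue until every item is processed
        if meets_condition(item):        # S22: does the edit point meet the condition?
            played.append(item["vobu"])  # S23: play back the matching VOB unit
    return played


# Hypothetical play items carrying the feature amount recorded at each edit point.
items = [
    {"vobu": "VOBU(3)", "kind": "audio", "value": 0.9},
    {"vobu": "VOBU(7)", "kind": "image", "value": 0.4},
    {"vobu": "VOBU(12)", "kind": "image", "value": 0.8},
]

selected = auto_play(items, lambda it: it["value"] >= 0.7)
print(selected)
```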

なお、オーディオビデオストリームに含まれている編集点に関する情報は、自動編集に利用できるだけでなく、ユーザによるオーディオビデオストリームの編集作業に利用することもできる。 Note that the information related to the editing points included in the audio video stream can be used not only for automatic editing but also for editing audio video streams by the user.

図７は、このような編集点の利用方法を説明する図であり、記録媒体に記録されているオーディオビデオストリームを編集するための表示画面を示している。 FIG. 7 is a diagram explaining how such edit points are used, and shows a display screen for editing an audio video stream recorded on a recording medium.
ここでは、表示装置２００は、テレビジョンセットやパーソナルコンピュータの表示部であり、その表示画面２１０には、記録媒体に記録されている１つのコンテンツに対応するオーディオビデオストリームの全体を示す帯状インジケータ２１１、該オーディオビデオストリームにおける特定のＶＯＢユニット２１１ａを拡大して示す帯状インジケータ２１２、該ＶＯＢユニット２１１ａにおける、編集点となっているピクチャ２１２ａ、２１２ｂ、２１２ｃ、２１２ｄのサムネイル画面２１３ａ、２１３ｂ、２１３ｃ、２１３ｄが示されている。 Here, the display device 200 is the display unit of a television set or personal computer. Its display screen 210 shows a band-shaped indicator 211 representing the whole audio video stream corresponding to one piece of content recorded on the recording medium, a band-shaped indicator 212 showing an enlarged view of a specific VOB unit 211a in that audio video stream, and thumbnail screens 213a, 213b, 213c, and 213d of the pictures 212a, 212b, 212c, and 212d that serve as edit points in the VOB unit 211a.

また、表示画面２１０には、処理用サムネイル表示領域２２０があり、この領域２２０には、ユーザが編集条件を調整する対象となっている編集点のピクチャが表示されている。表示画面２１０では、処理用サムネイル表示領域２２０と隣接して、編集点のピクチャが満たすべき編集条件である特徴量の判定強度を調整するための、各要素に対応した操作領域２３０及び２４０が表示されている。 The display screen 210 also has a processing thumbnail display area 220, which displays the picture of the edit point whose editing conditions the user is adjusting. Adjacent to the processing thumbnail display area 220, the display screen 210 shows operation areas 230 and 240, one for each factor, for adjusting the determination intensity of the feature amounts, which is the editing condition that the picture at an edit point must satisfy.

ユーザは、このように表示画面２１０上で、各編集点に設定されているピクチャが満たすべき編集条件、つまり特徴量の判定強度を調整することができる。 In this way, the user can adjust on the display screen 210 the editing condition that the picture set at each edit point must satisfy, that is, the determination intensity of the feature amount.

なお、上記編集サポートのための表示は、Ｉピクチャをすべてサムネイル画面で表示し、編集点となるピクチャのサムネイル画面を、他のＩピクチャのサムネイル画面よりも大きくすることも可能である。 Note that the display for editing support can display all the I pictures on the thumbnail screen, and the thumbnail screen of the picture to be edited can be made larger than the thumbnail screens of the other I pictures.

また、上記編集サポートのための表示は、編集点となるピクチャをサムネイル表示する順序は、特徴量の発生要因の種別に応じた順序としても、あるいは、すべての要因に対して正規化した特徴量の大きさ順としもよい。 In addition, the display for the above-mentioned editing support is such that the order in which the pictures to be edited are displayed as thumbnails may be the order corresponding to the type of the cause of the feature quantity, or the feature quantity normalized with respect to all the factors. It may be in the order of size.
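
One way to order thumbnails by a feature amount normalized across all factors is sketched below. The specification does not say how the normalization is done; dividing each feature amount by its own determination intensity is an assumption made here purely for illustration, as are the field names and values.

```python
def order_by_normalized_strength(edit_points, thresholds):
    """Sort edit points by feature amount relative to its threshold, strongest first."""
    def strength(point):
        # Assumed normalization: ratio of the feature amount to its threshold.
        return point["value"] / thresholds[point["kind"]]
    return sorted(edit_points, key=strength, reverse=True)


thresholds = {"image": 0.8, "audio": 0.5, "sweat": 0.6}
points = [
    {"id": "212a", "kind": "image", "value": 0.88},
    {"id": "212b", "kind": "audio", "value": 1.0},
    {"id": "212c", "kind": "sweat", "value": 0.9},
]

ordered_ids = [p["id"] for p in order_by_normalized_strength(points, thresholds)]
print(ordered_ids)
```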

さらに、上記編集サポートの表示は、編集点に設定されているピクチャをスライドショー形式で順次表示するものでもよく、この場合、必要な編集点を要否選択することで一次編集を行い、細かな２次編集のためのサポートを行うことも可能である。 Further, the editing support may be displayed by sequentially displaying pictures set as editing points in a slide show format. In this case, primary editing is performed by selecting whether or not the necessary editing points are necessary. It is also possible to provide support for subsequent editing.

また、上記編集サポートのための表示は、編集点から数秒ずつを自動的につなぎ合わせて、好みのＢＧＭの音程やテンポに合せて編集点を切り替えてダイジェストで表示するものであってもよい。この場合、記録されているオーディオビデオストリームをこのようなダイジェスト版になるよう編集しなおしても、特に編集しないでもこのような表示を行うだけでもよい。 The display for editing support may be one in which several seconds from the editing point are automatically connected, and the editing point is switched according to the favorite BGM pitch and tempo and displayed as a digest. In this case, the recorded audio-video stream may be re-edited so as to have such a digest version, or such display may be performed without any particular editing.

またさらに、制御部２０ａあるいは編集点情報生成部２２ａは、編集が終了したかどうかを認識するフラグも管理するものとし、記録されたオーディオビデオストリームは、編集されたものか否かの情報を有するものとしてもよい。 Furthermore, the control unit 20a or the edit point information generation unit 22a may also manage a flag indicating whether editing has been completed, so that the recorded audio video stream carries information on whether it has been edited.
また、編集されたオーディオビデオストリームは、実データ部分は変更しないで、プレイリストのみ変更したものであってもよい。 The edited audio video stream may be one in which only the playlist is changed, with the actual data portion left unchanged.

このように本実施の形態１の撮像装置１０１では、被写体の撮影により得られた画像信号Ｄｉｍから、画像の変化の特徴を表す画像特徴量を抽出する画像処理部１１ａと、被写体の撮影により得られた音声信号Ｄａｕから、音声の変化の特徴を表す音声特徴量を抽出する音声処理部１２ａと、撮影者の生理変化を示す情報Ｄｉｄに基づいて、撮影状態の変化の特徴を表す固有特徴量を抽出する固有識別情報処理部１０ａとを備え、抽出された特徴量を予め設定されている判定強度と比較して、上記画像や音声が変化した撮影タイミングが編集点として妥当であるか否かを決定するので、撮影者にとって重要と思われる撮影部分を自動で編集可能なオーディオビデオストリームを生成することができる。 As described above, in the imaging apparatus 101 according to the first embodiment, the image processing unit 11a that extracts the image feature amount representing the feature of the image change from the image signal Dim obtained by photographing the subject, and obtained by photographing the subject. An audio processing unit 12a that extracts a voice feature amount representing a feature of voice change from the obtained voice signal Dau, and a unique feature amount representing a feature of a shooting state change based on information Did indicating a photographer's physiological change Whether or not the shooting timing at which the image or sound changes is appropriate as an editing point by comparing the extracted feature quantity with a preset determination intensity. Therefore, it is possible to generate an audio video stream that can automatically edit a shooting portion that is considered important for the photographer.

また、この実施の形態１では、編集点は、編集点として妥当であると判定された撮影タイミングに近い、ＡＶ符号化データにおけるＶＯＢユニットの切れ目に設定しているので、撮影により得られた画像信号が符号化されている状態でも、符号化された画像信号を処理することなく、編集点の設定が可能である。 In the first embodiment, the edit point is set at the break of the VOB unit in the AV encoded data close to the shooting timing determined to be valid as the edit point. Even when the signal is encoded, the edit point can be set without processing the encoded image signal.

また、この実施の形態１では、編集点を、イベントの発生時点からイベント発生により撮影状況が変化するまでの遅延時間だけ、撮影状況が変化した撮影タイミングから遡った撮影タイミングに設定するので、編集点を、ほぼイベントが実際に発生したタイミングに設定することができる。 In the first embodiment, the editing point is set to the shooting timing retroactive from the shooting timing at which the shooting situation has changed by the delay time from the event occurrence time until the shooting situation changes due to the event occurrence. The point can be set at the timing when the event actually occurs.

また、この実施の形態１では、撮影状況が変化した撮影タイミングを編集点として適切であると判定する際の判定強度を、操作者がマニュアルで設定する場合について説明したが、編集点設定のための個々の要因に対する判定強度は、撮像装置に予め設定されている複数のシナリオのうちから、運動会や結婚式に対応するものを選択することにより、設定するようにしてもよい。 In the first embodiment, the case where the operator manually sets the determination strength when determining that the shooting timing at which the shooting situation has changed is appropriate as the edit point has been described. The determination strength for each of the factors may be set by selecting one corresponding to an athletic meet or wedding from a plurality of scenarios set in advance in the imaging apparatus.

このようにシナリオの選択により個々の要因に対する判定強度を決定する撮像装置は、例えば、実施の形態１の撮像装置において、上記制御部を、複数のシナリオのそれぞれと、上記画像特徴量、音声特徴量、及び固有特徴量の各々に対する閾値レベルの組合せとの対応関係を示すテーブル情報を保持し、ユーザのマニュアル操作により指定されたシナリオと、上記テーブル情報とに基づいて、上記各要因に対応する特徴量の閾値レベルを設定するものとし、さらに上記特徴量判定部を、上記画像特徴量、音声特徴量、及び固有特徴量を、それぞれに対応する、上記制御部で設定された閾値レベルに基づいて、上記画像、音声、及び撮影状態が変化した撮影タイミングが編集点として妥当であるか否かを判定するものとすることにより、実現することが可能である。 An imaging device that determines the determination intensity for each factor by scenario selection in this way can be realized, for example, by modifying the imaging device of Embodiment 1 as follows. The control unit holds table information indicating the correspondence between each of a plurality of scenarios and a combination of threshold levels for the image feature amount, the audio feature amount, and the unique feature amount, and sets the threshold level of the feature amount for each factor based on the scenario specified by the user's manual operation and the table information. The feature amount determination unit then compares the image feature amount, the audio feature amount, and the unique feature amount with the corresponding threshold levels set by the control unit to determine whether the shooting timing at which the image, sound, or shooting state changed is valid as an edit point.

この場合、上記複数のシナリオのそれぞれと、上記画像特徴量、音声特徴量、及び固有特徴量の各々に対する閾値レベルの組合せとの対応関係を示すテーブル情報には、ネットワーク上の情報端末からダウンロードして取得したものを利用することも可能である。 In this case, table information indicating a correspondence relationship between each of the plurality of scenarios and a combination of threshold levels for each of the image feature amount, the sound feature amount, and the unique feature amount is downloaded from an information terminal on the network. It is also possible to use what has been acquired.

なお、上記テーブル情報に含まれる各特徴量の閾値レベルの組み合わせは、画像特徴量、音声特徴量、及び固有特徴量のうちの２つでもよく、また、上記テーブル情報は、複数のシナリオのそれぞれと、上記画像特徴量、音声特徴量、及び固有特徴量のいずれか１つに対する閾値レベルとの対応関係を示すものでもよい。 The combination of threshold levels in the table information may cover only two of the image feature amount, the audio feature amount, and the unique feature amount. Alternatively, the table information may indicate the correspondence between each of the plurality of scenarios and a threshold level for just one of the image feature amount, the audio feature amount, and the unique feature amount.
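
The scenario table described above can be pictured as a simple lookup. In the sketch below the scenario names follow the athletic meet and wedding examples in the text, while the threshold values, the key names, and the use of manually set defaults for factors a scenario does not cover are assumptions.

```python
# Hypothetical table: scenario -> threshold levels per factor.
# A row may cover all three factors or only some of them.
SCENARIO_TABLE = {
    "athletic_meet": {"image": 0.8, "audio": 0.6, "unique": 0.5},
    "wedding": {"image": 0.5, "audio": 0.7},
}


def thresholds_for(scenario, defaults):
    """Return the threshold levels for a scenario, falling back to defaults."""
    merged = dict(defaults)                   # start from the manually set levels
    merged.update(SCENARIO_TABLE[scenario])   # override with the scenario's levels
    return merged


manual_defaults = {"image": 0.7, "audio": 0.7, "unique": 0.7}
wedding_levels = thresholds_for("wedding", manual_defaults)
print(wedding_levels)
```

A table downloaded from an information terminal on the network, as the text allows, would simply replace `SCENARIO_TABLE` in this picture.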

（実施の形態２） (Embodiment 2)
図８は、本発明の実施の形態２による撮像装置を説明するための図である。 FIG. 8 is a diagram for explaining an imaging apparatus according to Embodiment 2 of the present invention.
本実施の形態２の撮像装置１０２は、実施の形態１の撮像装置１０１における編集点情報生成部２２ａに代えて、編集点となるピクチャがＩピクチャでない場合は、編集点となるピクチャとその近傍のピクチャのピクチャタイプを変更するよう画像処理部１１ａに再符号化を指令する編集点情報生成部２２ｂを備えたものである。また、制御部２０ｂは、再符号化時に画像処理部１１ａを制御する点のみ、実施の形態１の制御部２０ａと異なっている。そして、本実施の形態２の撮影装置１０２のその他の構成は、実施の形態１の撮像装置１０１と同一である。 The imaging apparatus 102 according to the second embodiment includes, in place of the edit point information generation unit 22a of the imaging apparatus 101 according to the first embodiment, an edit point information generation unit 22b that, when the picture serving as an edit point is not an I picture, instructs the image processing unit 11a to perform re-encoding so as to change the picture types of that picture and its neighboring pictures. The control unit 20b differs from the control unit 20a of the first embodiment only in that it controls the image processing unit 11a at the time of re-encoding. The rest of the configuration of the imaging apparatus 102 according to the second embodiment is the same as that of the imaging apparatus 101 according to the first embodiment.

次に動作について説明する。 Next, the operation will be described.
この実施の形態２の撮像装置１０２では、撮影前のマニュアル設定操作は、実施の形態１と同様に行われる。 In the imaging apparatus 102 according to the second embodiment, the manual setting operation before shooting is performed in the same manner as in the first embodiment.

撮影が開始されると、撮像装置１０２の特徴量判定部２１は、実施の形態１の撮像装置１０１と同様、画像情報、音声情報、及び撮影状態に関する情報を取得し、該取得した情報から得られた、ユーザ操作や撮影者の生理的変化の特徴量、画像の特徴量、及び音声の特徴量に基づいて、撮影状況が変化した撮影タイミングが編集点として妥当であるか否かを判定する。 When shooting is started, the feature amount determination unit 21 of the imaging apparatus 102 acquires image information, audio information, and information regarding the shooting state, and obtains the acquired information from the acquired information, as in the imaging apparatus 101 of the first embodiment. It is determined whether or not the shooting timing at which the shooting situation has changed is appropriate as an edit point based on the user operation and the feature amount of the photographer's physiological change, the image feature amount, and the audio feature amount. .

そして、この実施の形態２では、編集点情報生成部２２ｂは、特徴量判定部２１で、各処理部１０ａ、１１ａ、１２ａから供給されたそれぞれの特徴量に基づいて、撮影状況が変化した撮影タイミングが編集点として妥当であると判定される度に、編集点を示す編集点情報を生成し、編集点に対応するピクチャがＩピクチャ以外である場合には再符号化の指令を画像処理部１１ａに対して行う。 In the second embodiment, each time the feature amount determination unit 21 determines, based on the feature amounts supplied from the processing units 10a, 11a, and 12a, that the shooting timing at which the shooting situation changed is valid as an edit point, the edit point information generation unit 22b generates edit point information indicating the edit point and, if the picture corresponding to the edit point is not an I picture, issues a re-encoding command to the image processing unit 11a.

図９は、編集点情報の生成処理、及び再符号化処理のフローを示す。 FIG. 9 shows the flow of the edit point information generation processing and the re-encoding processing.
この実施の形態２では、制御部２０ｂは、特徴量の種類に応じた遅延時間、つまり特徴量によって異なる、イベントの発生時点から該イベント発生により撮影状況が変化するまでの時間を算出する（ステップＳ１１）。 In the second embodiment, the control unit 20b calculates the delay time corresponding to the type of feature amount, that is, the time, which differs from one feature amount to another, from the occurrence of an event until the shooting situation changes as a result of that event (step S11).

次に、編集点情報生成部２２ｂは、上記特徴量が検出された撮影タイミングＴｃｐから上記遅延時間だけ遡った撮影タイミングＴｅｐに対応するピクチャを先頭するＶＯＢユニットを強制的に作成するよう画像処理部１１ａに指令する。すると、画像処理部１１ａは、強制的にＶＯＢユニットＶＯＢＵを作成しなおす再符号化処理を行う（ステップＳ１２ｂ）。 Next, the edit point information generation unit 22b instructs the image processing unit 11a to forcibly create a VOB unit whose leading picture is the picture corresponding to the shooting timing Tep, which is obtained by going back from the shooting timing Tcp at which the feature amount was detected by the delay time. The image processing unit 11a then performs re-encoding processing that forcibly re-creates the VOB unit VOBU (step S12b).

その後、編集点情報生成部２２ｂは、編集点として妥当であると判定された撮影タイミングＴｅｐを示す編集点情報を生成するとともに、図１０（ｂ）〜（ｄ）に示すように、編集点を、強制的に作成したＶＯＢユニットＶＯＢＵの先頭のＩピクチャに設定したことを示す編集点ピクチャ情報を生成する。そして、オーディオビデオストリームのプレイアイテムを、撮影タイミングＴｅｐが編集点に設定されたことが示されるよう変更する（ステップＳ１３）。 Thereafter, the editing point information generation unit 22b generates editing point information indicating the photographing timing Tep determined to be valid as the editing point, and also displays the editing point as shown in FIGS. 10 (b) to 10 (d). Then, edit point picture information indicating that the first I picture of the forcibly created VOB unit VOBU is set is generated. Then, the play item of the audio video stream is changed so as to indicate that the shooting timing Tep is set as the edit point (step S13).

以下、強制的にＶＯＢユニットＶＯＢＵを作成しなおす再符号化処理を説明する図である。 The re-encoding process for forcibly re-creating a VOB unit VOBU is described below with reference to the drawings.
図１０（ａ）は、複数のピクチャＦ１、Ｆ２、Ｆ３、Ｆ４、Ｆ５、Ｆ６、Ｆ７、Ｆ８、Ｆ９、Ｆ１０、・・・からなる１つのＶＯＢユニットＶＯＢＵ（ｆ）を示している。 FIG. 10(a) shows one VOB unit VOBU(f) composed of a plurality of pictures F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, and so on.

ここで、ピクチャＦ１は、その符号化及び復号化の際に他のピクチャを参照しないＩピクチャであり、ピクチャＦ４、Ｆ７、Ｆ１０は、符号化及び復号化の際に、前方のＩピクチャあるいはＰピクチャを参照するＰピクチャであり、ピクチャＦ２、Ｆ３、Ｆ５、Ｆ６、Ｆ８、Ｆ９は、符号化及び復号化の際に、前方のＩピクチャあるいはＰピクチャと後方のＰピクチャとを参照するＢピクチャであり、図１０（ａ）に示す各ピクチャは、ＭＰＥＧ‐２で規定されている本来の参照関係となっている。 Here, the picture F1 is an I picture that does not refer to other pictures at the time of encoding and decoding, and the pictures F4, F7, and F10 are the front I picture or P at the time of encoding and decoding. P pictures that refer to pictures, and pictures F2, F3, F5, F6, F8, and F9 are B pictures that refer to forward I pictures or P pictures and backward P pictures at the time of encoding and decoding. Each picture shown in FIG. 10A has an original reference relationship defined by MPEG-2.

図１０（ｂ）は、編集点となるピクチャが、ＶＯＢユニットＶＯＢＵ（ｆ）の４番目のピクチャＦ４となり、このピクチャＦ４のピクチャタイプを変更し、かつその前の２つのＢピクチャＦ２及びＦ３の参照関係を変更する場合を示している。 In FIG. 10B, the picture to be edited is the fourth picture F4 of the VOB unit VOBU (f), the picture type of this picture F4 is changed, and the two previous B pictures F2 and F3 are changed. The case where the reference relationship is changed is shown.

この場合は、ピクチャＦ４は、ＰピクチャからＩピクチャに変更され、ＢピクチャＦ２及びＦ３は、前方のＩピクチャＦ１のみを参照するよう再符号化される。また、ピクチャＦ４を先頭とする新たなＶＯＢユニットＶＯＢＵ（ｆｂ１）が作成され、ピクチャＦ４以降のピクチャのインデックスの付け替えなどの処理が行われる。なお、ＶＯＢユニットＶＯＢＵ（ｆａ１）は、ＢピクチャＦ２及びＦ３の参照関係を変更した、ＶＯＢユニットＶＯＢＵ（ｆｂ１）直前の新たなＶＯＢユニットである。 In this case, the picture F4 is changed from the P picture to the I picture, and the B pictures F2 and F3 are re-encoded to refer only to the front I picture F1. Also, a new VOB unit VOBU (fb1) starting from the picture F4 is created, and processing such as changing the index of pictures after the picture F4 is performed. The VOB unit VOBU (fa1) is a new VOB unit immediately before the VOB unit VOBU (fb1), in which the reference relationship between the B pictures F2 and F3 is changed.

図１０（ｃ）は、編集点となるピクチャが、ＶＯＢユニットＶＯＢＵ（ｆ）の５番目のピクチャＦ５となり、このピクチャＦ５及びその後のＢピクチャＦ６の参照関係を変更し、ＰピクチャＦ７のピクチャタイプを変更する場合を示している。 In FIG. 10C, the picture to be edited is the fifth picture F5 of the VOB unit VOBU (f), the reference relationship between this picture F5 and the subsequent B picture F6 is changed, and the picture type of the P picture F7 Shows the case of changing.

この場合は、ピクチャＦ７は、ＰピクチャからＩピクチャに変更され、ピクチャＦ５及びＦ６は、ピクチャタイプが変更された後方のＩピクチャＦ７のみを参照するよう再符号化される。また、ピクチャＦ５を先頭とする新たなＶＯＢユニットＶＯＢＵ（ｆｂ２）が作成され、ピクチャＦ８以降のピクチャのインデックスの付け替えなどの処理が行われる。なお、ＶＯＢユニットＶＯＢＵ（ｆａ２）は、ＰピクチャＦ４を最終ピクチャとする、ＶＯＢユニットＶＯＢＵ（ｆｂ２）直前の新たなＶＯＢユニットである。 In this case, the picture F7 is changed from the P picture to the I picture, and the pictures F5 and F6 are re-encoded to refer only to the rear I picture F7 whose picture type is changed. Also, a new VOB unit VOBU (fb2) starting from the picture F5 is created, and processing such as changing the index of pictures after the picture F8 is performed. The VOB unit VOBU (fa2) is a new VOB unit immediately before the VOB unit VOBU (fb2) with the P picture F4 as the last picture.

図１０（ｄ）は、編集点となるピクチャが、ＶＯＢユニットＶＯＢＵ（ｆ）の６番目のピクチャＦ６となり、このピクチャＦ６の参照関係と、その前後のＢピクチャＦ５及びＦ７の参照関係を変更する場合を示している。 In FIG. 10D, the picture that is the editing point is the sixth picture F6 of the VOB unit VOBU (f), and the reference relationship of this picture F6 and the reference relationship of the B pictures F5 and F7 before and after that are changed. Shows the case.

この場合は、ピクチャＦ７は、ＰピクチャからＩピクチャに変更され、ピクチャＦ５は、その前方のＰピクチャＦ４のみを参照し、ピクチャＦ６は、その後方のＰピクチャＦ７のみを参照するよう再符号化される。また、ピクチャＦ６を先頭とする新たなＶＯＢユニットＶＯＢＵ（ｆｂ３）が作成され、ピクチャＦ８以降のピクチャのインデックスの付け替えなどの処理が行われる。なお、ＶＯＢユニットＶＯＢＵ（ｆａ３）は、ＰピクチャＦ５を最終ピクチャとする、ＶＯＢユニットＶＯＢＵ（ｆｂ３）直前の新たなＶＯＢユニットである。 In this case, the picture F7 is changed from the P picture to the I picture, the picture F5 is re-encoded to refer only to the front P picture F4, and the picture F6 is referred to only the rear P picture F7. Is done. Also, a new VOB unit VOBU (fb3) starting from the picture F6 is created, and processing such as changing the index of pictures after the picture F8 is performed. The VOB unit VOBU (fa3) is a new VOB unit immediately before the VOB unit VOBU (fb3) with the P picture F5 as the final picture.
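
The three cases of FIG. 10(b) to (d) follow one pattern: the first I or P picture at or after the edit point becomes an I picture, and the B pictures around the new boundary are restricted to references on their own side of it. The sketch below models only the picture types and reference directions, not actual MPEG-2 re-encoding. The function name is hypothetical, indices are 0-based (F1 is index 0), and the unit is assumed to end with an I or P picture.

```python
def split_vobu(types, k):
    """Plan the re-encoding needed to start a new VOB unit at picture index k.

    types : picture types in display order, e.g. list("IBBPBBPBBP")
    k     : 0-based index of the picture chosen as the edit point
    Returns (new_types, refs); refs maps each affected B picture index to
    "forward" (refers only to the preceding I/P picture) or
    "backward" (refers only to the following I picture).
    """
    types = list(types)
    refs = {}
    if types[k] == "I":
        return types, refs          # already a VOB unit head candidate
    nxt = k
    while types[nxt] == "B":        # first I/P picture at or after k
        nxt += 1
    types[nxt] = "I"                # P picture is changed to an I picture
    prev = k - 1
    while types[prev] == "B":       # last I/P picture before k
        prev -= 1
    for i in range(prev + 1, nxt):  # B pictures between the two
        refs[i] = "forward" if i < k else "backward"
    return types, refs


gop = list("IBBPBBPBBP")
case_b = split_vobu(gop, 3)  # edit point F4, FIG. 10(b)
case_c = split_vobu(gop, 4)  # edit point F5, FIG. 10(c)
case_d = split_vobu(gop, 5)  # edit point F6, FIG. 10(d)
print(case_b[1], case_c[1], case_d[1])
```

In each case no picture in the new VOB unit refers to a picture before index k, so playback can start from the edit point.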

このような構成の実施の形態２では、被写体の撮影により得られた画像信号から、画像の変化の特徴を表す画像特徴量を抽出する画像処理部１１ａと、被写体の撮影により得られた音声信号から、音声の変化の特徴を表す音声特徴量を抽出する音声処理部１２ａと、撮影者の生理変化を示す情報に基づいて、撮影状態の変化の特徴を表す固有特徴量を抽出する固有識別情報処理部１０ａとを備え、抽出された特徴量を予め設定されている判定強度と比較して、画像や音声などが変化した撮影タイミングが編集点として妥当であるか否かを判定するので、実施の形態１と同様、撮影者にとって重要と思われる撮影部分を自動で編集可能なオーディオビデオストリームを生成することができる。 In the second embodiment having such a configuration, an image processing unit 11a that extracts an image feature amount representing a feature of an image change from an image signal obtained by photographing a subject, and an audio signal obtained by photographing the subject. From the voice processing unit 12a that extracts a voice feature amount that represents the feature of a change in voice, and unique identification information that extracts a unique feature amount that represents a feature of a change in shooting state based on information indicating a physiological change of the photographer The processing unit 10a is provided, and the extracted feature value is compared with a predetermined determination strength, and it is determined whether or not the shooting timing at which the image or the sound has changed is appropriate as the editing point. As in the first embodiment, it is possible to generate an audio video stream that can automatically edit a shooting portion that is considered important for the photographer.

また、この実施の形態２では、編集点は、編集点として妥当であると判定された撮影タイミングに対応するピクチャが、ＶＯＢユニットの切れ目となるよう、そのピクチャタイプ及びその周辺のピクチャの参照関係が変更されるよう、これらのピクチャを再符号化するので、撮影により得られた画像信号が符号化されている状態でも、編集点の設定を正確に行うことができる。 In the second embodiment, the edit point is a reference relationship between the picture type and surrounding pictures so that the picture corresponding to the shooting timing determined to be valid as the edit point becomes a break in the VOB unit. Since these pictures are re-encoded so as to be changed, edit points can be set accurately even when the image signal obtained by shooting is encoded.

また、この実施の形態２では、編集点は、イベントの発生から該イベント発生により撮影状態が変化するまでの遅延時間だけ、画像や音声などの撮影状況が変化した撮影タイミングから遡った撮影タイミングに設定するので、編集点を、ほぼイベントが実際に発生したタイミングに設定することができる。 Further, in the second embodiment, the editing point is set to the shooting timing retroactive from the shooting timing when the shooting situation such as the image and the sound changes by the delay time from the occurrence of the event until the shooting state changes due to the event occurrence. Since it is set, the edit point can be set at the timing when the event actually occurs.

なお、上記実施の形態２では、編集点に設定されたピクチャが画面間予測ピクチャである場合は、このピクチャが面内予測ピクチャとなるようトランスコードして記録するようにしているが、トランスコードにより得られた面内予測ピクチャは、上記編集点に設定された画面間予測ピクチャとは別に、そのサブピクチャとして記録するようにしてもよい。 In the second embodiment, when the picture set as the edit point is an inter-picture prediction picture, it is transcoded into an intra-picture prediction picture and recorded. However, the intra-picture prediction picture obtained by transcoding may instead be recorded as a sub-picture, separately from the inter-picture prediction picture set as the edit point.
この場合、編集時には、編集点に設定されている画面間予測ピクチャをそのサブピクチャとして記録されている画面内予測ピクチャと置き換え、該置き換えた画面内予測ピクチャを、編集点であるＶＯＢユニットの先頭ピクチャとして再生に利用することができる。 In this case, at the time of editing, the inter-picture prediction picture set as the edit point is replaced with the intra-picture prediction picture recorded as its sub-picture, and the replaced intra-picture prediction picture can be used for playback as the leading picture of the VOB unit that is the edit point.

（実施の形態３） (Embodiment 3)
図１１は、本発明の実施の形態３による撮像装置を説明するための図である。 FIG. 11 is a diagram for explaining an imaging apparatus according to Embodiment 3 of the present invention.
本実施の形態３の撮像装置１０３は、実施の形態１の撮像装置１０１における編集点情報生成部２２ａに代えて、編集点を挿入する際、符号化前のバッファデータがあるか否かによって、先頭ピクチャが編集点に対応した新たなＶＯＢユニットＶＯＢＵを生成する処理と、編集点をこの編集点に最も近いＶＯＢユニットＶＯＢＵの切れ目に設定する処理とを切り替える編集点情報生成部２２ｃを備えたものである。また、制御部２０ｃは、編集点の設定処理の切り替えに応じて画像処理部１１ａを制御する点のみ、実施の形態１の制御部２０ａと異なっている。そして、本実施の形態３の撮影装置１０３のその他の構成は、実施の形態１の撮像装置１０１と同一である。 The imaging apparatus 103 according to the third embodiment includes, in place of the edit point information generation unit 22a of the imaging apparatus 101 according to the first embodiment, an edit point information generation unit 22c that, when inserting an edit point, switches between two processes depending on whether there is buffer data that has not yet been encoded: generating a new VOB unit VOBU whose leading picture corresponds to the edit point, or setting the edit point at the boundary of the VOB unit VOBU closest to the edit point. The control unit 20c differs from the control unit 20a of the first embodiment only in that it controls the image processing unit 11a in accordance with the switching of the edit point setting process. The rest of the configuration of the imaging apparatus 103 according to the third embodiment is the same as that of the imaging apparatus 101 according to the first embodiment.

次に動作について説明する。 Next, the operation will be described.
この実施の形態３の撮像装置では、撮影前のマニュアル設定操作は、実施の形態１と同様に行われる。 In the imaging apparatus according to the third embodiment, the manual setting operation before shooting is performed in the same manner as in the first embodiment.

撮影が開始されると、撮像装置１０３は、実施の形態１の撮像装置１０１と同様、画像情報、音声情報、及び撮影状態を示す情報を取得し、該取得した情報から得られた、ユーザ操作や撮影者の生理的変化の特徴量、画像の特徴量、及び音声の特徴量に基づいて、画像や音声などの撮影状況が変化した撮影タイミングが編集点として妥当であるか否かを判定する。 When shooting starts, the imaging device 103 acquires image information, audio information, and information indicating the shooting state, as with the imaging device 101 of the first embodiment, and a user operation obtained from the acquired information. And whether or not the shooting timing at which the shooting situation such as the image or sound changes is appropriate as the editing point based on the feature amount of the photographer's physiological change, the feature amount of the image, and the feature amount of the sound .

そして、この実施の形態３では、編集点情報生成部２２ｃは、特徴量判定部２１で、各処理部１０ａ、１１ａ、１２ａから供給されたそれぞれの特徴量に基づいて、撮影状況が変化した撮影タイミングが編集点として妥当であると判定される度に、編集点を設定した撮影タイミングを示す編集点情報を生成し、編集点の設定処理を行う。 In the third embodiment, each time the feature amount determination unit 21 determines, based on the feature amounts supplied from the processing units 10a, 11a, and 12a, that the shooting timing at which the shooting situation changed is valid as an edit point, the edit point information generation unit 22c generates edit point information indicating the shooting timing at which the edit point is set, and performs the edit point setting process.

FIG. 12 shows the flow of the edit point setting process.
In the third embodiment, the control unit 20c calculates a delay time according to the type of feature amount, that is, the time from the occurrence of an event until the shooting situation changes as a result of that event, which differs from one feature amount to another (step S11).

Next, the edit point information generation unit 22c determines whether, at the time the delay time is calculated, there is buffer data, that is, an image signal that has not yet been encoded (step S11a). If it is determined that buffer data before encoding exists, the VOB unit VOBU being created is closed and a new VOB unit VOBU is created (step S12c). If, on the other hand, it is determined in step S11a that there is no buffer data before encoding, the edit point is set at the VOBU boundary that lies at or before the shooting timing Tep, obtained by going back from the timing Tcp at which the shooting situation changed by the calculated delay time, and that is closest to Tep (step S12a). The process of step S12a is the same as that of step S12a in the first embodiment.
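The branch between steps S12c and S12a can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the function name, the use of seconds as the time unit, and the representation of VOBU boundaries as a sorted list of start times are all assumptions made for the example.

```python
from bisect import bisect_right

def set_edit_point(t_cp, delay, has_buffer_data, vobu_boundaries):
    """Return (step, edit_time) for one detected change (hypothetical helper).

    t_cp: timing Tcp at which the shooting situation changed.
    delay: delay time calculated in step S11 for this feature type.
    has_buffer_data: result of the check in step S11a.
    vobu_boundaries: sorted start times of already-encoded VOBUs (assumed layout).
    """
    t_ep = t_cp - delay  # shooting timing Tep, traced back from Tcp
    if has_buffer_data:
        # Step S12c: close the VOBU in progress and start a new one at Tep.
        return "S12c", t_ep
    # Step S12a: pick the latest VOBU boundary at or before Tep.
    i = bisect_right(vobu_boundaries, t_ep)
    if i == 0:
        raise ValueError("no VOBU boundary at or before Tep")
    return "S12a", vobu_boundaries[i - 1]
```

With boundaries at 0.0, 4.0, and 8.0, a change detected at Tcp = 10.0 with a 1.5 delay gives Tep = 8.5: the buffered case starts a new VOBU at 8.5, while the already-encoded case falls back to the boundary at 8.0.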

After that, the edit point information generation unit 22c generates edit point information indicating the shooting timing Tep determined to be appropriate as an edit point, and changes the play item of the system stream so that it indicates which of steps S12a and S12c was used to set the edit point (step S13).

FIG. 13 illustrates the process in step S12c of creating a new VOB unit VOBU whose first picture is set as the edit point.
FIG. 13(a) shows one VOB unit VOBU(j) composed of a plurality of pictures J1, J2, J3, J4, J5, J6, J7, J8, J9, J10, and so on.

Here, picture J1 is an I picture, which does not refer to any other picture when encoded and decoded. Pictures J4, J7, and J10 are P pictures, which refer to the preceding I picture or P picture. Pictures J2, J3, J5, J6, J8, and J9 are B pictures, which refer to the preceding I picture or P picture and the following P picture. The pictures of VOBU(j) thus have the standard reference relationships defined by MPEG-2.

FIG. 13(b) shows the case where a new VOB unit VOBU is generated with the fourth picture J4 of VOBU(j) as the edit point.
In this case, picture J4, which would have been encoded as a P picture in VOBU(j), is encoded as the first I picture Ja1 of the new VOB unit VOBU(ja). The second picture J2 and the third picture J3 of VOBU(j), both B pictures, are encoded as B pictures that refer only to the preceding I picture J1. Pictures Ja4 and Ja7 in VOBU(ja) are P pictures that refer to the preceding I picture or P picture, and pictures Ja2, Ja3, Ja5, and Ja6 in VOBU(ja) are B pictures that refer to the preceding I picture or P picture and the following P picture.

FIG. 13(c) shows the case where a new VOB unit VOBU is generated with the fifth picture J5 of VOBU(j) as the edit point.
In this case, picture J5, which would have been encoded as a B picture in VOBU(j), is encoded as the first I picture of the new VOB unit VOBU(jb). Picture J8 in VOBU(jb) is a P picture that refers to the preceding I picture, and pictures J6, J7, J9, and J10 in VOBU(jb) are B pictures that refer to the preceding I picture or P picture and the following P picture.

FIG. 13(d) shows the case where a new VOB unit VOBU is generated with the sixth picture J6 of VOBU(j) as the edit point.
In this case, picture J6, which would have been encoded as a B picture in VOBU(j), is encoded as the first I picture of the new VOB unit VOBU(jc). The fifth picture J5 of VOBU(j), a B picture, is encoded as a B picture that refers only to the preceding P picture J4. Picture J9 in VOBU(jc) is a P picture that refers to the preceding I picture, and pictures J7, J8, and J10 in VOBU(jc) are B pictures that refer to the preceding I picture or P picture and the following P picture.
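The three cases of FIG. 13(b) to 13(d) follow one pattern, which the sketch below reproduces in display order. It is a simplified illustration under stated assumptions: the function names are hypothetical, pictures are identified by 0-based index (J1 is index 0), and the anchor spacing of 3 (I B B P B B P ...) is taken from the picture layout of VOBU(j) described above.

```python
def picture_types(count, anchor_spacing=3):
    """Display-order picture types for one VOBU: I, B, B, P, B, B, P, ..."""
    return ["I" if i == 0 else ("P" if i % anchor_spacing == 0 else "B")
            for i in range(count)]

def split_at_edit_point(count, edit_index, anchor_spacing=3):
    """Split a VOBU of `count` pictures so that picture `edit_index`
    (0-based) becomes the first I picture of a new VOBU.

    Returns (head_types, forward_only, tail_types):
      head_types   - types of the pictures left in the original VOBU
      forward_only - indices of trailing B pictures in the head that lose
                     their backward reference and refer only forward
      tail_types   - types of the pictures in the new VOBU
    """
    if not 1 <= edit_index < count:
        raise ValueError("edit_index must be inside the VOBU, after picture 0")
    head = picture_types(edit_index, anchor_spacing)
    # The last I or P picture remaining in the head.
    last_anchor = max(i for i, t in enumerate(head) if t != "B")
    forward_only = list(range(last_anchor + 1, edit_index))
    tail = picture_types(count - edit_index, anchor_spacing)
    return head, forward_only, tail
```

For ten pictures, `split_at_edit_point(10, 3)` corresponds to FIG. 13(b): J2 and J3 (indices 1 and 2) refer only to J1, and the new VOBU has P pictures at its fourth and seventh positions.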

The third embodiment configured in this way includes the image processing unit 11a, which extracts from the image signal obtained by shooting the subject an image feature amount representing the characteristics of a change in the image; the audio processing unit 12a, which extracts from the audio signal obtained by shooting the subject an audio feature amount representing the characteristics of a change in the sound; and the unique identification information processing unit 10a, which extracts, based on information indicating a physiological change of the photographer, a unique feature amount representing the characteristics of a change in the shooting state. Each extracted feature amount is compared with a preset determination strength to determine whether the shooting timing at which the feature amount occurred is appropriate as an edit point. As in the first embodiment, this makes it possible to generate an audio video stream in which the portions that seem important to the photographer can be edited automatically.

Also, in the third embodiment, when an edit point is inserted, the device switches, depending on whether buffer data before encoding exists, between the process of generating a new VOB unit VOBU whose first picture is the edit point and the process of setting the edit point at the VOBU boundary closest to the event occurrence timing. Therefore, when the image signal obtained by shooting has not yet been encoded, the edit point can be set at an accurate position by generating a VOBU based on the edit point; when the image signal has already been encoded, the edit point can be set easily without processing the audio video stream.

Furthermore, in the third embodiment, the edit point is set at a shooting timing that precedes the timing at which the shooting situation changed by the delay time from the occurrence of the event to its detection, that is, until the shooting situation changes as a result of the event. The edit point can therefore be set at approximately the shooting timing at which the event actually occurred.

In the third embodiment, the shooting timing at which the edit point is set is determined according to the delay time from the occurrence of an event until the image, sound, or shooting state actually changes. However, an event may also occur after the image, sound, or shooting state changes; in such a case, the shooting timing at which the edit point is set may be determined according to the time from the change in the image, sound, or shooting state to the occurrence of the event.

Also, in the third embodiment, when the image signal obtained by shooting has already been encoded, the VOBU boundary closest to the timing at which the event occurred is used as the edit point. In this case, however, the pictures may instead be re-encoded so that the picture corresponding to the shooting timing determined to be appropriate as an edit point becomes a VOB unit boundary, with its picture type and the reference relationships of the surrounding pictures changed accordingly.

In this case, as shown in FIG. 14, when an edit point is set, the device switches, according to the result of determining whether buffer data before encoding exists (step S11a), between the process of generating a new VOB unit VOBU whose first picture is the edit point (step S12c) and the process of re-encoding the pictures so that the picture corresponding to the shooting timing determined to be appropriate as an edit point becomes a VOB unit boundary and its picture type and the reference relationships of the surrounding pictures are changed (step S12b).

Furthermore, when the image signal obtained by shooting has already been encoded, the device may switch between the process of using the VOBU boundary closest to the event occurrence timing as the edit point and the process of forcibly generating a VOB unit VOBU by re-encoding, depending on whether the time required for re-encoding exceeds the remaining time available for encoding in the image processing unit.

FIG. 15 shows a flow in which, when there is no buffer data before encoding, the edit point setting process is switched according to the remaining time available for encoding.
In this case, when an edit point is inserted, it is determined whether buffer data before encoding exists (step S12). If it is determined that buffer data exists, a process of forcibly generating a new VOB unit VOBU whose first picture is the edit point is performed (step S13a), as in the third embodiment.

If, on the other hand, it is determined that there is no buffer data before encoding, it is determined whether the time required for re-encoding exceeds the remaining time available at that point for encoding in the image processing unit 11a (step S12a). If the time required for re-encoding is determined to exceed the remaining time, the edit point is set at a VOB unit boundary close to the event occurrence timing (step S13c). If it is determined in step S12a that the time required for re-encoding does not exceed the remaining time, a re-encoding process is performed that forcibly creates a VOB unit whose first picture corresponds to the event occurrence timing (step S13b). In the process flow of FIG. 15, when there is no buffer data before encoding, the processes of steps S13b and S13c are switched according to the result of comparing the time required for re-encoding with the remaining time available for encoding in the image processing unit. Alternatively, when there is no buffer data before encoding, the flow may perform whichever of steps S13b and S13c the photographer has selected in advance.
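The three-way branch of FIG. 15 can be summarized in a few lines. The sketch below is illustrative only; the function name and parameters are hypothetical, and how the re-encoding time and the remaining time are estimated is outside what the description specifies.

```python
def choose_edit_point_step(has_buffer_data, reencode_time, remaining_time):
    """Return the step label of FIG. 15 that applies (hypothetical helper).

    has_buffer_data: True if unencoded pictures are still buffered (step S12).
    reencode_time:   estimated time needed to re-encode around the edit point.
    remaining_time:  time still available for encoding in the image
                     processing unit at this moment.
    """
    if has_buffer_data:
        return "S13a"  # force a new VOBU starting at the edit point
    if reencode_time > remaining_time:  # check of step S12a
        return "S13c"  # fall back to a nearby existing VOBU boundary
    return "S13b"      # re-encode so the edit point starts a VOBU
```

Re-encoding is chosen only when it fits in the time budget, so the fallback to an existing boundary keeps the recording path from stalling.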

Furthermore, in each of the above embodiments, the audio video stream is assumed to be a system stream conforming to MPEG-2, but it may instead be a system stream conforming to MPEG-4 or MPEG-4 AVC.

However, in a system stream conforming to MPEG-4 AVC, I pictures include those that cannot be randomly accessed and those that can (IDR pictures). The I picture set as the edit point is therefore the randomly accessible I picture (IDR) closest to the event occurrence timing.
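Selecting the IDR picture closest to the event can be sketched as below. The data layout is an assumption made for illustration: each picture is a `(time, kind)` pair, and the kind labels `"IDR"` and `"I"` are hypothetical names for randomly accessible and non-randomly-accessible I pictures.

```python
def nearest_idr(pictures, event_time):
    """Return the time of the IDR picture closest to event_time.

    pictures: list of (time, kind) pairs, kind being e.g. "IDR", "I", "P", "B".
    Only IDR pictures are candidates; ordinary I pictures are skipped.
    """
    idr_times = [t for t, kind in pictures if kind == "IDR"]
    if not idr_times:
        raise ValueError("stream contains no IDR picture")
    return min(idr_times, key=lambda t: abs(t - event_time))
```

An ordinary I picture that happens to lie nearer the event is ignored, because decoding cannot be guaranteed to start cleanly from it.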

In addition, a system stream conforming to MPEG-4 AVC provides an area for writing supplemental information (SEI), so information indicating what caused the feature amount to occur can also be embedded in this area.

In each of the above embodiments, the audio video stream contains picture data corresponding to one sequence. However, in addition to that picture data, the stream may also embed data of sub-pictures outside the sequence for selecting edits by thumbnail. In this case, pictures suitable as edit points can be checked at a glance in a thumbnail display at the time of editing.

In each of the above embodiments, all pictures set as edit points are used for editing. However, too many edit points can make editing difficult, so after the edit points are set they may be thinned out by setting factor, that is, according to whether they were set because of a change in the image, a change in the sound, and so on. For example, deleting the edit points set because of a change in sound from the set of edit points reduces the edit point information used at the time of editing.
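Thinning by setting factor amounts to a filter over the recorded edit points. In the sketch below, the `(time, cause)` pair layout and the cause labels such as `"audio"` and `"image"` are hypothetical, chosen only for illustration.

```python
def thin_edit_points(edit_points, drop_causes):
    """Remove edit points whose setting factor is in drop_causes.

    edit_points: list of (time, cause) pairs in stream order.
    drop_causes: set of cause labels to discard, e.g. {"audio"}.
    """
    return [(t, cause) for t, cause in edit_points if cause not in drop_causes]
```

Calling `thin_edit_points(points, {"audio"})` keeps only the edit points that were set for reasons other than a change in sound, in their original order.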

In an MPEG-4 AVC system stream, randomly accessible I pictures (IDR) are placed at wider intervals than I pictures that cannot be randomly accessed, so setting such IDR pictures as edit points can reduce the number of edit points.

Furthermore, in each of the above embodiments, an edit point is set by determining whether the feature amount at the time of an event is equal to or greater than a certain determination strength and then setting the event occurrence timing as the edit point. Alternatively, all event occurrence timings may be set as edit points, and whether each edit point is actually used may be decided at the time of editing.

Specifically, an imaging device with this configuration can be realized by having the information generation unit of the imaging device of any of the first to third embodiments generate edit point information that indicates, as edit points, the shooting timings at which the image, sound, or shooting state changed, and by having the feature amount determination unit evaluate the image feature amount, audio feature amount, or unique feature amount when the audio video stream is edited, to decide whether the shooting timing indicated as an edit point by the edit point information is used for editing.

In this case, specifically, the occurrence times of all events that change the image, sound, or shooting state are embedded in the audio video stream as edit point information indicating shooting timings as edit points. This eliminates the need to decide in real time, when setting an event occurrence timing as an edit point, whether that timing will be used as an edit point.
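Deferring the decision to editing time can be sketched as follows. The sketch assumes, beyond what the description states, that each recorded event carries its feature amount alongside its time and cause; the dictionary keys and the per-cause threshold table are hypothetical.

```python
def select_edit_points(events, thresholds):
    """Decide at editing time which recorded events serve as edit points.

    events: list of dicts with keys "time", "cause", and "strength",
            one per event recorded during shooting (assumed layout).
    thresholds: determination strength per cause, chosen when editing.
    An event is kept when its strength is at or above its cause's threshold.
    """
    return [e["time"] for e in events
            if e["strength"] >= thresholds[e["cause"]]]
```

Because every event is stored, the same recording can be re-edited later with different thresholds without shooting again.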

In each of the above embodiments, the imaging device sets edit points not only at shooting timings at which the image or sound of the subject changed, but also at shooting timings at which the photographer's physiological state changed or the photographer operated the imaging device. However, edit points may be set only at shooting timings at which the image or sound of the subject changed. In that case, the imaging device does not include the unique identification information acquisition unit 10 or the unique identification information processing unit 10a of the above embodiments.

Although not specifically mentioned in the description of the above embodiments, the functional units other than the imaging unit 11, the recording medium 30a, and the recording medium interface 30 in the imaging device 101 of the first embodiment shown in FIG. 1, the imaging device 102 of the second embodiment shown in FIG. 8, and the imaging device 103 of the third embodiment shown in FIG. 11 are typically realized as an LSI, which is an integrated circuit. These functional units may each be made into an individual chip, or some or all of them may be integrated into a single chip.

For example, among the functional units of the imaging device of each of the above embodiments, those other than the memory corresponding to the recording medium 30a and the recording medium interface 30 may be integrated into a single chip.

Although the integrated circuit is referred to here as an LSI, it may also be called an IC, system LSI, super LSI, or ultra LSI depending on the degree of integration.

The method of circuit integration is not limited to realizing one or more functional units as an LSI; the functional units may be realized with a dedicated circuit or a general-purpose processor. An FPGA (Field Programmable Gate Array), which can be programmed after manufacture, or a reconfigurable processor, in which the connections and settings of circuit cells inside the LSI can be reconfigured, may also be used.

Furthermore, if circuit integration technology that replaces LSI emerges from advances in semiconductor technology or from another derived technology, that technology may of course be used to integrate the functional units. For example, future circuit integration technology might apply biotechnology.
In recent years, digital cameras and mobile terminals capable of shooting a subject and recording moving images have been developed. Mounting the functional units that constitute the imaging devices of the first to third embodiments on such a device makes it possible to realize a digital camera or mobile terminal that records an audio video stream in which the portions that seem important to the photographer can be edited automatically or by a simple selection operation in response to guidance, and that automatically edits and plays back the required portions of the stream.

The imaging device of the present invention generates an audio video stream in which the portions that seem important to the photographer can be edited automatically or by a simple selection operation in response to guidance, and is particularly useful in home digital video cameras, as well as in digital cameras, mobile terminals, and the like.

FIG. 1 is a block diagram illustrating the imaging device 101 according to the first embodiment of the present invention.
FIG. 2 illustrates the audio video stream stored on the recording medium in the imaging device 101 of the first embodiment.
FIG. 3 illustrates the operation of setting conditions for automatic edit point insertion in the imaging device 101 of the first embodiment.
FIG. 4 illustrates the operation of the imaging device 101 of the first embodiment.
FIG. 5 illustrates the edit point setting process in the imaging device 101 of the first embodiment in detail, showing the process flow (FIG. 5(a)) and the relationship between shooting timing and VOB unit boundaries in the audio video stream (FIG. 5(b)).
FIG. 6 shows the flow of the process of automatically editing and playing back the audio video stream obtained by the imaging device 101 of the first embodiment.
FIG. 7 illustrates how edit points are used in the imaging device 101 of the first embodiment.
FIG. 8 illustrates the imaging device 102 according to the second embodiment of the present invention.
FIG. 9 shows the flow of the edit point setting process in the imaging device 102 of the second embodiment.
FIG. 10 illustrates the re-encoding process that forcibly recreates a VOB unit VOBU in the imaging device 102 of the second embodiment, showing the normal reference relationships at encoding time (FIG. 10(a)) and three examples of picture types and reference relationships changed by re-encoding (FIGS. 10(b) to 10(d)).
FIG. 11 illustrates the imaging device 103 according to the third embodiment of the present invention.
FIG. 12 shows the flow of the edit point insertion process in the imaging device 103 of the third embodiment.
FIG. 13 illustrates the process of creating a new VOB unit VOBU in the imaging device 103 of the third embodiment, showing the normal reference relationships at encoding time (FIG. 13(a)) and three examples of newly created VOB units (FIGS. 13(b) to 13(d)).
FIG. 14 shows a modification of the edit point insertion process flow in the imaging device 103 of the third embodiment.
FIG. 15 shows another modification of the edit point insertion process flow in the imaging device 103 of the third embodiment.

Explanation of symbols

10 Unique identification information acquisition unit
10a Unique identification information processing unit
11 Imaging unit
11a Image processing unit
12 Audio acquisition unit
12a Audio processing unit
20a, 20b, 20c Control unit
21 Feature amount determination unit
22a, 22b, 22c Edit point information generation unit
30 Recording medium interface unit
30a Recording medium
101, 102, 103 Imaging device

Claims

An imaging device that acquires image information and audio information by photographing a subject and records an audio video stream including the image information and audio information,
An imaging unit for imaging a subject and outputting an image signal;
An image processing unit that performs signal processing on an image signal obtained by imaging the subject and extracts image information including an image feature amount indicating a feature of an image change;
An audio acquisition unit that acquires audio and outputs an audio signal;
A voice processing unit that performs signal processing on the voice signal obtained by the voice acquisition and extracts voice information including a voice feature amount indicating a feature of voice change;
A feature amount determination unit that determines that a shooting timing at which the image or sound has changed is appropriate as an editing point when the image feature amount or the sound feature amount is larger than a predetermined threshold;
An information generation unit that generates edit point information indicating a shooting timing determined to be appropriate as the edit point;
An audio video stream including the image information, audio information, and editing point information is stored in a recording medium;
The information generation unit
It is determined whether or not buffer data that is image information before encoding is held in the image processing unit,
When the buffer data before encoding is held, the editing point is set to a picture corresponding to the shooting timing at which the image, sound, or shooting state has changed,
When the buffer data before encoding is not held, the edit point is set to the first picture of the VOB unit, which is a random access unit, that is nearest, in the stream obtained by encoding the image signal in the image processing unit, to the shooting timing at which the image or sound changed,
The image processing unit
When the buffer data before encoding is held in the image processing unit, the VOB unit is formed so that the picture corresponding to the edit point becomes the first picture of the VOB unit.
An imaging apparatus characterized by that.

The imaging device according to claim 1,
A unique identification information acquisition unit for acquiring unique identification information indicating a shooting state;
A unique identification information processing unit that performs signal processing on the acquired unique identification information and extracts a unique feature amount indicating a characteristic of a change in a shooting state;
The feature amount determination unit determines that the shooting timing at which the image, sound, or shooting state has changed is appropriate as an edit point when the image feature amount, the audio feature amount, or the unique feature amount is larger than a predetermined threshold,
An imaging apparatus characterized by that.

The imaging apparatus according to claim 2, wherein
The unique feature amount indicates the magnitude of the photographer's physiological change that occurred during shooting, or the magnitude of adjustment by the photographer's operation.
An imaging apparatus characterized by that.

The imaging device according to claim 3.
The physiological change of the photographer that occurred during the photographing is at least one of a change in the amount of sweat of the photographer, a change in α wave, a change in the number of blinks, a change in the pupil, and a change in the pulse,
The unique identification information acquisition unit has a sensor that measures the photographer's physiological change and that corresponds to the type of the physiological change.
An imaging apparatus characterized by that.

The imaging device according to claim 1,
The image processing unit performs, on an image signal obtained by imaging a subject, an inter-screen predictive encoding process that predictively encodes a picture to be encoded with reference to an already encoded picture, and extracts the image feature amount based on a motion vector that is used in the inter-screen predictive encoding process and indicates the magnitude of motion of the image,
The audio processing unit performs an encoding process corresponding to the encoding process for the image signal on the audio signal obtained by acquiring the audio,
The information generation unit sets a specific picture in an image stream obtained by encoding an image signal as the edit point, based on a shooting timing determined to be appropriate as the edit point.
An imaging apparatus characterized by that.
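
The motion-vector-based feature amount in this claim can be illustrated with a minimal sketch. The claim does not say how the vectors are aggregated, so the mean vector magnitude used here is an assumption, and the names `MotionVector` and `image_feature_amount` are hypothetical.

```python
from math import hypot
from typing import Sequence, Tuple

# Hypothetical representation: one (dx, dy) displacement per macroblock.
MotionVector = Tuple[float, float]


def image_feature_amount(vectors: Sequence[MotionVector]) -> float:
    """Return the mean motion-vector magnitude of one picture.

    Assumption: the mean magnitude stands in for "the magnitude of motion
    of the image"; the claim does not fix the aggregation rule.
    """
    if not vectors:
        # A picture without motion vectors (for example an intra-coded one)
        # contributes no motion in this sketch.
        return 0.0
    return sum(hypot(dx, dy) for dx, dy in vectors) / len(vectors)
```

Because the vectors are already produced by the encoder, a feature of this kind would need no extra motion search beyond the encoding itself.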

The imaging device according to claim 1,
The audio processing unit extracts the audio feature amount based on a magnitude of a change in the audio signal;
An imaging apparatus characterized by that.
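
As a rough illustration of an audio feature amount based on the magnitude of a change in the audio signal, the sketch below compares the level of two consecutive audio frames. The frame-based comparison, the mean absolute level, and the name `audio_feature_amount` are assumptions, not details given in the claim.

```python
from typing import Sequence


def audio_feature_amount(prev_frame: Sequence[float],
                         cur_frame: Sequence[float]) -> float:
    """Return the absolute change in level between two audio frames.

    Assumption: "level" is the mean absolute sample value of a frame.
    """
    def level(frame: Sequence[float]) -> float:
        # An empty frame is treated as silence.
        return sum(abs(s) for s in frame) / len(frame) if frame else 0.0

    return abs(level(cur_frame) - level(prev_frame))
```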

The imaging apparatus according to claim 2, wherein
A control unit sets, based on a user's manual operation signal, a threshold level for each of the image feature amount or the sound feature amount, and the unique feature amount,
The feature amount determination unit evaluates each feature amount against the corresponding threshold level set by the control unit to determine whether or not the shooting timing at which the image, sound, or shooting state has changed is appropriate as an edit point,
An imaging apparatus characterized by that.

The imaging apparatus according to claim 2, wherein
A control unit holds table information indicating a correspondence between each of a plurality of scenarios and a combination of threshold levels corresponding to each of the image feature amount, the sound feature amount, and the unique feature amount, and sets the threshold levels of the various feature amounts based on a scenario designated by a user's manual operation and on the table information,
The feature amount determination unit evaluates each feature amount against the corresponding threshold level set by the control unit to determine whether or not the shooting timing at which the image, sound, or shooting state has changed is appropriate as an edit point,
An imaging apparatus characterized by that.
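
The scenario table and the threshold determination in this claim can be sketched as follows. The scenario names and threshold values are purely illustrative, and `Thresholds`, `SCENARIO_TABLE`, and `is_edit_point` are hypothetical names; the claims only require that each scenario maps to a combination of threshold levels.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Thresholds:
    """One combination of threshold levels (hypothetical structure)."""
    image: float
    audio: float
    unique: float


# Illustrative table only: scenario names and values are assumptions.
SCENARIO_TABLE: Mapping[str, Thresholds] = {
    "athletic_meet": Thresholds(image=8.0, audio=0.6, unique=0.5),
    "wedding": Thresholds(image=4.0, audio=0.3, unique=0.4),
}


def is_edit_point(scenario: str, image_feat: float, audio_feat: float,
                  unique_feat: float,
                  table: Mapping[str, Thresholds] = SCENARIO_TABLE) -> bool:
    """Return True if any feature amount exceeds its scenario threshold."""
    t = table[scenario]
    return (image_feat > t.image
            or audio_feat > t.audio
            or unique_feat > t.unique)
```

With these illustrative values, the same image feature amount of 5.0 marks an edit point under "wedding" but not under "athletic_meet", which is the effect of switching scenarios.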

The imaging apparatus according to claim 8,
The table information is obtained by downloading from an information terminal on the network.
An imaging apparatus characterized by that.

The imaging device according to claim 1,
The information generation unit
Sets the edit point at a shooting timing that takes into account the delay time from when an event that changes the image or sound occurs until the image or sound actually changes.
An imaging apparatus characterized by that.
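
One way to read the delay handling in this claim is that the edit point is moved back from the detected change to the moment of the causing event. The sketch below assumes that direction of adjustment and a known, fixed delay; `compensated_edit_time` is a hypothetical name.

```python
def compensated_edit_time(change_time_s: float, delay_s: float) -> float:
    """Shift a detected change time back by the event-to-change delay.

    Assumption: the edit point is placed earlier than the detected change,
    and never before the start of the recording (time 0).
    """
    return max(0.0, change_time_s - delay_s)
```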

An imaging device that acquires image information and audio information by photographing a subject and records an audio video stream including the image information and audio information,
An imaging unit for imaging a subject and outputting an image signal;
An image processing unit that performs signal processing on an image signal obtained by imaging the subject and extracts image information including an image feature amount indicating a feature of an image change;
An audio acquisition unit that acquires audio and outputs an audio signal;
A voice processing unit that performs signal processing on the voice signal obtained by the voice acquisition and extracts voice information including a voice feature amount indicating a feature of voice change;
A feature amount determination unit that determines that the shooting timing at which the image or sound has changed is appropriate as an edit point when the image feature amount or the sound feature amount is greater than a predetermined threshold;
An information generation unit that generates edit point information indicating a shooting timing determined to be appropriate as the edit point;
An audio video stream including the image information, audio information, and editing point information is stored in a recording medium;
The information generation unit
It is determined whether or not buffer data that is image information before encoding is held in the image processing unit,
If the buffer data before encoding is held, the editing point is set to the picture corresponding to the shooting timing at which the image or sound has changed,
If the buffer data before encoding is not held, the remaining time available for the encoding process in the image processing unit is compared with the time required for re-encoding,
If the time required for the re-encoding exceeds the remaining time available for the encoding process in the image processing unit, the edit point is set to the first picture of the VOB unit, which is the unit of random access, closest to the shooting timing at which the image or sound changed in the image stream obtained by encoding the image signal by the image processing unit,
If the time required for the re-encoding does not exceed the remaining time available for the encoding process in the image processing unit, the image processing unit is instructed to re-encode the image stream,
The image processing unit
When the buffer data before encoding is held, a VOB unit is formed so that the picture corresponding to the edit point becomes the first picture of the VOB unit,
When the buffer data before encoding is not held and the time required for the re-encoding does not exceed the remaining time available for the encoding process in the image processing unit, the image stream is re-encoded so that the picture corresponding to the edit point becomes the I picture located at the head of the VOB unit.
An imaging apparatus characterized by that.
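
The three-way decision recited in this claim can be sketched as below. Pictures are identified by integer index and VOB units by the sorted indices of their first pictures; these representations, and the names `Action`, `nearest_head`, and `place_edit_point`, are assumptions made for illustration only.

```python
from bisect import bisect_left
from enum import Enum
from typing import Sequence, Tuple


class Action(Enum):
    FORM_VOBU_AT_PICTURE = "form_vobu"   # buffer held: start a VOB unit here
    SNAP_TO_NEAREST_VOBU = "snap"        # no buffer and no time to re-encode
    REENCODE = "reencode"                # no buffer but time to re-encode


def nearest_head(heads: Sequence[int], pic: int) -> int:
    """Return the VOB-unit head picture closest to pic.

    heads must be sorted and non-empty; a tie resolves to the earlier head.
    """
    i = bisect_left(heads, pic)
    candidates = heads[max(0, i - 1):i + 1]
    return min(candidates, key=lambda h: abs(h - pic))


def place_edit_point(change_pic: int, vobu_heads: Sequence[int],
                     buffer_held: bool, reencode_time: float,
                     remaining_time: float) -> Tuple[Action, int]:
    """Choose how to place an edit point for a detected change."""
    if buffer_held:
        # Pre-encoding data is still available, so the encoder can form a
        # VOB unit whose first picture is the changed picture.
        return Action.FORM_VOBU_AT_PICTURE, change_pic
    if reencode_time > remaining_time:
        # Re-encoding would not finish in time: fall back to the nearest
        # existing random access point.
        return Action.SNAP_TO_NEAREST_VOBU, nearest_head(vobu_heads, change_pic)
    # Enough time remains: re-encode so the changed picture becomes the
    # I picture at the head of a VOB unit.
    return Action.REENCODE, change_pic
```

For VOB units starting at pictures 0, 15, and 30 and a change detected at picture 20, the fallback branch returns picture 15, while the other two branches keep picture 20 as the edit point.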

The imaging device according to claim 1 or 11,
Recording the time at which an event giving a change to the image or sound occurs as an edit point in an audio video stream;
An imaging apparatus characterized by that.

The imaging device according to claim 12, wherein
Recording the time of occurrence of the event in the audio-video stream as a playlist indicating playback conditions;
An imaging apparatus characterized by that.

The imaging device according to claim 12, wherein
Embedding in the audio video stream information indicating whether the edit point is due to image or audio factors;
An imaging apparatus characterized by that.

The imaging device according to claim 1 or 11,
The information generation unit
Embed a picture corresponding to the time at which an event giving a change to the image or sound occurs as an out-of-sequence picture used for thumbnail display at the time of editing in the audio video stream.
An imaging apparatus characterized by that.

An imaging method for acquiring image information and audio information by photographing a subject and recording an audio video stream including the image information and audio information,
An imaging step of imaging a subject and outputting an image signal;
An image processing step of performing signal processing on an image signal obtained by imaging the subject and extracting image information including an image feature amount indicating a feature of image change;
An audio acquisition step of acquiring audio and outputting an audio signal;
A voice processing step of performing voice signal processing on the voice signal obtained by the voice acquisition to extract voice information including a voice feature amount indicating a feature of voice change;
A feature amount determination step for determining that a shooting timing at which the image or sound has changed is appropriate as an edit point when the image feature amount or the sound feature amount is larger than a predetermined threshold;
An information generation step for generating edit point information indicating a photographing timing determined to be appropriate as the edit point;
Storing an audio video stream including the image information, audio information, and editing point information in a recording medium, and
The information generation step includes
It is determined whether or not buffer data that is image information before encoding is held in the image processing unit that executes the image processing step,
When the buffer data before encoding is held, the editing point is set to a picture corresponding to the shooting timing at which the image, sound, or shooting state has changed,
When the buffer data before encoding is not held, the edit point is set to the first picture of the VOB unit, which is the unit of random access, nearest to the shooting timing at which the image or sound changed in the stream obtained by encoding the image signal by the image processing unit,
The image processing step includes
When the buffer data before encoding is held in the image processing unit, the VOB unit is formed so that the picture corresponding to the edit point becomes the first picture of the VOB unit.
An imaging method characterized by the above.

A semiconductor device that acquires image information and audio information by photographing a subject and records an audio video stream including the image information and audio information,
An image processing unit that performs signal processing on an image signal obtained by imaging the subject and extracts image information including an image feature amount indicating a feature of an image change;
An audio acquisition unit that acquires audio and outputs an audio signal;
A voice processing unit that performs signal processing on the voice signal obtained by the voice acquisition and extracts voice information including a voice feature amount indicating a feature of voice change;
A feature amount determination unit that determines that a shooting timing at which the image or sound has changed is appropriate as an editing point when the image feature amount or the sound feature amount is larger than a predetermined threshold;
An information generation unit that generates edit point information indicating a shooting timing determined to be appropriate as the edit point;
An audio video stream including the image information, audio information, and editing point information is stored in a recording medium;
The information generation unit
It is determined whether or not buffer data that is image information before encoding is held in the image processing unit,
When the buffer data before encoding is held, the editing point is set to a picture corresponding to the shooting timing at which the image, sound, or shooting state has changed,
When the buffer data before encoding is not held, the edit point is set to the first picture of the VOB unit, which is the unit of random access, nearest to the shooting timing at which the image or sound changed in the stream obtained by encoding the image signal by the image processing unit,
The image processing unit
When the buffer data before encoding is held in the image processing unit, the VOB unit is formed so that the picture corresponding to the edit point becomes the first picture of the VOB unit.
A semiconductor device.