JP2015159405A

JP2015159405A - image processing apparatus, imaging device, control method, program, and storage medium

Info

Publication number: JP2015159405A
Application number: JP2014032726A
Authority: JP
Inventors: 保彦岩本; Yasuhiko Iwamoto; 一好清澤; Kazuyoshi Kiyosawa
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2014-02-24
Filing date: 2014-02-24
Publication date: 2015-09-03

Abstract

PROBLEM TO BE SOLVED: To provide an image processing apparatus that can automatically select a sub-video suitable for the user from a plurality of sub-videos associated with a main video, and to provide an imaging device, a control method, and a program. SOLUTION: An imaging device includes a face detection unit 125, an expression-specific evaluation value calculation unit 126, and a system control unit 150. The face detection unit 125 detects the face of a person in the sub-videos recorded on a recording medium 104. The expression-specific evaluation value calculation unit 126 calculates, for each expression, an evaluation value representing which expression the person's face corresponds to, based on the detected face area. The system control unit 150 discriminates the expression of each sub-video based on the calculation result from the expression-specific evaluation value calculation unit 126, selects a predetermined number of sub-videos from the plurality of sub-videos recorded on the recording medium 104, and displays the main video recorded on the recording medium 104 together with the selected sub-videos on a display unit 128.

Description

The present invention relates to an image processing apparatus, an imaging apparatus, a control method, a program, and a storage medium that automatically select and display a predetermined sub-video from a plurality of sub-videos recorded in association with a captured main video.

Conventionally, there are imaging apparatuses that include two camera units, a first camera unit and a second camera unit (see, for example, Patent Document 1). The imaging apparatus described in Patent Document 1 can shoot with the first camera unit and the second camera unit simultaneously in the shooting mode. In this way, an image of the subject is obtained by shooting with the first camera unit, and an image of the photographer is obtained by shooting with the second camera unit. Further, by combining and displaying the images obtained by the first camera unit and the second camera unit, an image in which the subject and the photographer appear together can be displayed.

Patent Document 1: Japanese Patent Laid-Open No. 2005-94741

In an imaging apparatus including the first camera unit and the second camera unit described in Patent Document 1, it is conceivable to shoot the viewer of the main video as a sub-video with the second camera unit each time the main video captured by the first camera unit is played back and displayed. In this case, sub-videos of a plurality of viewers are recorded by the imaging apparatus for one main video, and a video that associates the subject with the plurality of viewers who viewed it is obtained.

However, Patent Document 1 does not mention a method of selecting and displaying a predetermined sub-video from a plurality of sub-videos. In addition, because the video playback area in which an imaging apparatus displays captured video is limited, the sub-videos to be displayed in that area must be selected when many sub-videos have been recorded. In this case, it is desirable for the control unit of the imaging apparatus to automatically select and display sub-videos suitable for the user.

An object of the present invention is to provide an image processing apparatus, an imaging apparatus, a control method, a program, and a storage medium that can automatically select a sub-video suitable for the user from a plurality of sub-videos associated with a main video.

To achieve the above object, the present invention comprises: recording means for recording a main video and a plurality of sub-videos associated with the main video; detection means for detecting a person's face from the sub-videos recorded in the recording means; expression-specific evaluation value calculation means for calculating, for each expression, an evaluation value representing which expression the person's face corresponds to, based on the face area detected by the detection means; selection means for selecting a sub-video from the plurality of sub-videos recorded in the recording means, based on the evaluation values calculated by the expression-specific evaluation value calculation means; and control means for displaying the sub-video selected by the selection means together with the main video recorded in the recording means.

According to the present invention, an evaluation value representing which expression a person's face corresponds to is calculated for each expression, based on the area of the person's face detected from a sub-video recorded in the recording means. Further, the sub-video selected on the basis of the calculated evaluation values is displayed together with the main video recorded in the recording means. This makes it possible to automatically select a sub-video suitable for the user from a plurality of sub-videos associated with the main video.

FIG. 1 is a block diagram showing the configuration of an imaging apparatus according to a first embodiment of the present invention.
FIG. 2 is a flowchart showing processing in the playback mode of the imaging apparatus according to the first embodiment.
FIG. 3 is a flowchart showing processing in the playback mode of the imaging apparatus according to a second embodiment.
FIG. 4 is a flowchart showing processing in the playback mode of the imaging apparatus according to a third embodiment.
FIG. 5 is a diagram showing viewers' faces, expression-specific evaluation values, and expression discrimination results for a main video.
FIG. 6 is a diagram showing a display example of a main video and a plurality of sub-videos associated with the main video.
FIG. 7 is a diagram explaining processing for selecting a viewer from sub-videos in the imaging apparatus according to a fourth embodiment.
FIG. 8 is a diagram explaining processing for selecting a viewer from sub-videos in the imaging apparatus according to a fifth embodiment.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

[First Embodiment]
FIG. 1 is a block diagram showing the configuration of the imaging apparatus according to the first embodiment of the present invention. In FIG. 1, an imaging apparatus 100 is one example of an implementation of the image processing apparatus of the present invention, and is configured as a digital camera including two imaging units, a main imaging unit 190 and a sub imaging unit 191.

The main imaging unit 190 and the sub imaging unit 191 each include a shutter 101, a barrier 102, a photographing lens 103, an imaging unit 122, and an A/D converter 123. The main imaging unit 190 and the sub imaging unit 191 are structured so that they can be pointed in different directions. The main imaging unit 190 is used to shoot a subject (main video). The sub imaging unit 191 is used, for example, to shoot a viewer (sub-video) who is viewing the subject (main video) captured by the main imaging unit 190.

The photographing lens 103 is a lens group including a zoom lens and a focus lens. The shutter 101 has a diaphragm function. The imaging unit 122 is composed of a CCD, a CMOS element, or the like that converts an optical image of the subject into an electrical signal. The A/D converter 123 converts the analog signal output from the imaging unit 122 into a digital signal. The barrier 102 covers the photographing lens 103 and related parts, thereby protecting the imaging system, which includes the photographing lens 103, the shutter 101, and the imaging unit 122, from dirt and damage.

The image processing unit 124 performs various kinds of processing, including generating still image data from an image signal that has undergone image quality adjustment. Specifically, it performs resize processing, such as predetermined pixel interpolation and reduction, and color conversion processing on the data output from the A/D converter 123 or the data output from the memory control unit 115. The image processing unit 124 also performs predetermined calculation processing using the captured image data. Based on the calculation result, the system control unit 150 performs exposure control and distance measurement control. In this way, TTL (through-the-lens) AF (autofocus) processing, AE (automatic exposure) processing, and EF (flash pre-emission) processing are performed.

Further, the image processing unit 124 performs predetermined calculation processing using the captured image data, and also performs TTL AWB (auto white balance) processing based on the calculation result. Data output from the A/D converter 123 is written into the memory 132 either via the image processing unit 124 and the memory control unit 115, or via the memory control unit 115 alone.

The face detection unit 125 performs predetermined face detection processing, which detects the face of a person photographed by the imaging apparatus 100, on the data output from the A/D converter 123 or the data output from the memory control unit 115. The face detection result is written into the memory 132 via the memory control unit 115. The face detection unit 125 functions as the detection means of the present invention.

The expression-specific evaluation value calculation unit 126 performs expression-specific evaluation value calculation processing, which calculates an evaluation value for each expression of a person photographed by the imaging apparatus 100, on the data output from the A/D converter 123 or the data output from the memory control unit 115. As shown in FIGS. 5(a) and 5(b), described later, an expression-specific evaluation value is a numerical value representing which expression (smile, crying face, angry face) the person's face corresponds to. The expression-specific evaluation value calculation unit 126 functions as the expression-specific evaluation value calculation means of the present invention.

The memory 132 stores image data obtained by the imaging unit 122 and converted into digital data by the A/D converter 123, image data for display on the display unit 128, the results of face area detection by the face detection unit 125, and the like. The memory 132 has sufficient capacity to store a predetermined number of still images and a predetermined duration of moving images and audio. The memory 132 also serves as the image display memory (video memory). The D/A converter 113 converts the display image data stored in the memory 132 into an analog signal and supplies it to the display unit 128. The image is thereby displayed on the display unit 128.

The display unit 128 presents still images and moving images on a display device such as an LCD in accordance with the analog signal from the D/A converter 113. A digital signal (image data) that has been A/D converted by the A/D converter 123 and stored in the memory 132 is converted into an analog signal by the D/A converter 113 and sequentially transferred to the display unit 128 for display. The display unit 128 thereby functions as an electronic viewfinder and can display a through image (live view).

The nonvolatile memory 156 is an electrically erasable and recordable memory; for example, a FROM (flash ROM) is used. The nonvolatile memory 156 stores constants for the operation of the system control unit 150, expression-specific statistical data used to calculate the expression-specific evaluation values of a person photographed by the imaging apparatus 100, programs, and the like. The programs referred to here are the programs for executing the processing shown in the flowcharts described later. The expression-specific statistical data is data recording in advance how facial feature amounts tend to change for each expression of a person, such as a smile, an angry face, or a crying face.

The system control unit 150 controls the imaging apparatus as a whole and implements the processing of each embodiment described later by executing a program recorded in the nonvolatile memory 156. The system control unit 150 also performs display control by controlling the memory 132, the D/A converter 113, the display unit 128, and the like. Based on the program stored in the nonvolatile memory 156, the system control unit 150 executes the processing shown in the flowcharts described later.

The system control unit 150 functions as the discrimination means, selection means, control means, expression-specific ratio calculation means, and frame selection means of the present invention. The system memory 152 is a RAM into which constants and variables for the operation of the system control unit 150, programs read from the nonvolatile memory 156, and the like are loaded.

The mode changeover switch 160, the shutter button 161, the first shutter switch 162, the second shutter switch 163, and the operation unit 170 are operation means for inputting various operation instructions to the system control unit 150. The mode changeover switch 160 switches the operation mode of the system control unit 150 among a still image recording mode, in which still images are recorded on the recording medium 104, a moving image recording mode, in which moving images are recorded on the recording medium 104, a playback mode, in which still images and moving images are displayed on the display unit 128, and so on.

The first shutter switch 162 turns ON partway through operation of the shutter button 161, that is, at a half press (shooting preparation instruction), and generates a first shutter switch signal SW1. In response to the signal SW1, the system control unit 150 starts operations such as AF processing, AE processing, AWB processing, and EF processing. The second shutter switch 163 turns ON when operation of the shutter button 161 is completed, that is, at a full press (shooting instruction), and generates a second shutter switch signal SW2. In response to the signal SW2, the system control unit 150 starts a series of shooting operations, from reading the signal from the imaging unit 122 to writing the image data to the recording medium 104.

Each operation member of the operation unit 170 is assigned a function appropriate to the current scene, for example when the user selects one of the function icons displayed on the display unit 128, and acts as one of various function buttons. The function buttons include, for example, an end button, a return button, an image advance button, a jump button, a narrow-down button, and an attribute change button. For example, when the menu button is pressed, a menu screen on which various settings can be made is displayed on the display unit 128. The user can make these settings intuitively using the menu screen displayed on the display unit 128 together with the four-way button and the SET button.

The power supply control unit 180 is composed of a battery detection circuit, a DC-DC converter, a switch circuit that switches the blocks to be energized, and the like, and detects whether a battery is installed, the battery type, and the remaining battery level. The power supply control unit 180 also controls the DC-DC converter based on these detection results and instructions from the system control unit 150, and supplies the necessary voltage to each unit, including the recording medium 104, for the necessary period. The power supply unit 130 is composed of a primary battery such as an alkaline or lithium battery, a secondary battery such as a NiCd, NiMH, or Li battery, an AC adapter, or the like. The recording medium I/F 118 serves as the interface with the recording medium 104.

The recording medium 104 records the video captured by the main imaging unit 190 and the sub imaging unit 191 (a main video, a plurality of sub-videos associated with the main video, and so on), and is composed of a semiconductor memory, a magnetic disk, or the like. Here, a main video is a video obtained by shooting a subject with the main imaging unit 190. A sub-video is a video obtained by shooting the viewer of a main video with the sub imaging unit 191 while that main video is being played back and displayed on the display unit 128.
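The association between one main video and its sub-videos can be pictured as a simple one-to-many data structure. The following is a minimal sketch under stated assumptions: the class names, the `path` field, and the file names are hypothetical and are not taken from the publication, which does not specify a storage format.

```python
from dataclasses import dataclass, field


@dataclass
class SubVideo:
    """A viewer recording captured by the sub imaging unit during playback."""
    path: str  # hypothetical file location on the recording medium


@dataclass
class MainVideo:
    """A subject recording captured by the main imaging unit."""
    path: str  # hypothetical file location on the recording medium
    sub_videos: list[SubVideo] = field(default_factory=list)

    def add_viewing(self, sub_video: SubVideo) -> None:
        # Each playback of the main video can add one more viewer recording.
        self.sub_videos.append(sub_video)


# Hypothetical usage: one main video that has been viewed four times.
main = MainVideo("main_0001.mp4")
for i in range(4):
    main.add_viewing(SubVideo(f"main_0001_viewer_{i}.mp4"))
```

Because the list grows with every playback, the number of sub-videos can exceed what the display can show, which is what makes the selection step necessary.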

Note that the imaging apparatus shown in FIG. 1 is a configuration example for the case where the present invention is implemented in a digital camera; the invention is not limited to the configuration of FIG. 1 as long as the operations described below can be executed. That is, the present invention can be implemented in the same way on a smartphone, a personal computer, or the like, provided that it has a recording medium on which a main video and a plurality of sub-videos associated with the main video are recorded.

Next, processing in the playback mode of the imaging apparatus according to the present embodiment will be described in detail with reference to the flowchart of FIG. 2.

In view of the object of the present invention (enabling automatic selection of a suitable sub-video from a plurality of sub-videos associated with a main video), the description takes as an example the case where a main video has been shot by the imaging apparatus of FIG. 1, and playback of the main video and shooting of a sub-video have been performed a plurality of times. Accordingly, it is assumed that the main video and a plurality of sub-videos associated with it are already recorded on the recording medium 104 (recording means).

FIG. 2 is a flowchart showing processing in the playback mode of the imaging apparatus according to the present embodiment. In FIG. 2, when the playback mode starts, the system control unit 150 selects one main video from the plurality of main videos recorded on the recording medium 104, based on user input from the operation unit 170 (step S201). Next, the system control unit 150 displays the selected main video on the display unit 128 (step S202). Next, for one of the plurality of sub-videos recorded on the recording medium 104, the system control unit 150 causes the face detection unit 125 to perform face detection processing that detects the face of the person in that sub-video (step S203).

Next, for one of the expressions, the system control unit 150 causes the expression-specific evaluation value calculation unit 126 to calculate the evaluation value for the person in the sub-video on which face detection was performed (step S204). One example of a method for calculating the expression-specific evaluation value is the technique disclosed in Japanese Patent Laid-Open No. 2005-31566, in which feature points of a person's face in image data are detected and the person's degree of smiling is estimated from the detected feature points.

In the present embodiment, the expression-specific evaluation values are calculated based on the detection result for the person's face area written in the memory 132 and the expression-specific statistical data stored in advance in the nonvolatile memory 156. Here, the expression-specific statistical data is data representing how facial feature points change for each expression of a person, and is obtained by a statistical method.

Next, the system control unit 150 determines whether the expression-specific evaluation values have been calculated for all expressions for that sub-video (step S205). If they have not been calculated for all expressions, the system control unit 150 returns to step S204 and repeats the processing. If they have been calculated for all expressions, the system control unit 150 determines, as the expression of that sub-video, the expression whose evaluation value is the largest among all the expression-specific evaluation values calculated for it (step S206).
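Steps S204 to S206 amount to computing one score per expression and taking the expression with the largest score. The following is a minimal sketch of that logic, assuming the evaluation values are plain numbers; the expression names, the `evaluator` callable, and the sample scores are hypothetical stand-ins for the calculation unit 126 and are not taken from the publication.

```python
from typing import Callable

# Hypothetical labels for the three expressions named in the description.
EXPRESSIONS = ("smile", "crying", "angry")


def evaluate_all(face: object, evaluator: Callable[[object, str], float]) -> dict[str, float]:
    # Steps S204-S205: calculate one evaluation value per expression
    # until every expression has been covered.
    return {expr: evaluator(face, expr) for expr in EXPRESSIONS}


def discriminate_expression(scores: dict[str, float]) -> str:
    # Step S206: the expression with the largest evaluation value is
    # taken as the expression of the sub-video.
    if not scores:
        raise ValueError("no evaluation values were calculated")
    return max(scores, key=lambda expr: scores[expr])


# Hypothetical evaluator returning fixed scores for illustration only.
sample_scores = {"smile": 80.0, "crying": 10.0, "angry": 5.0}
scores = evaluate_all(face=None, evaluator=lambda face, expr: sample_scores[expr])
label = discriminate_expression(scores)
```

If two expressions tie for the largest value, `max` returns the first one in iteration order; the publication does not say how ties are resolved, so this is an arbitrary choice.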

Next, the system control unit 150 determines whether expression-specific evaluation value calculation and expression discrimination have been performed for all sub-videos (step S207). If they have not been performed for all sub-videos, the system control unit 150 returns to step S203 and repeats the series of steps. If they have been performed for all sub-videos, the system control unit 150 preferentially selects, from among the sub-videos whose expression-specific evaluation value is higher than a predetermined threshold, a predetermined number of sub-videos in descending order of evaluation value (step S208).
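The selection in step S208 can be sketched as a filter followed by a top-N sort. This is a minimal sketch, assuming each sub-video is represented by the evaluation value of its discriminated expression; the function name, the threshold of 50, and the sample values are hypothetical and are not taken from the publication.

```python
def select_sub_videos(candidates: list[tuple[str, float]],
                      threshold: float,
                      count: int) -> list[tuple[str, float]]:
    """Pick up to `count` sub-videos whose evaluation value exceeds `threshold`.

    candidates: (name, evaluation value) pairs, where the value is assumed to be
    the evaluation value of the expression discriminated for that sub-video.
    """
    # Keep only sub-videos strictly above the threshold.
    eligible = [c for c in candidates if c[1] > threshold]
    # Highest evaluation value first; sort is stable, so ties keep input order.
    eligible.sort(key=lambda c: c[1], reverse=True)
    return eligible[:count]


# Hypothetical evaluation values for six viewers.
viewers = [("A", 90.0), ("B", 60.0), ("C", 85.0),
           ("D", 80.0), ("E", 55.0), ("F", 40.0)]
selected = select_sub_videos(viewers, threshold=50.0, count=3)
selected_names = [name for name, _ in selected]
```

With these sample values, viewers A, C, and D are selected. If fewer sub-videos pass the threshold than the requested count, the sketch simply returns the shorter list.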

Next, the system control unit 150 displays on the display unit 128 the sub-videos selected in descending order of expression-specific evaluation value (step S209). Next, the system control unit 150 shoots video of the person operating the imaging apparatus with the sub imaging unit 191, records it on the recording medium 104 as a sub-video (step S210), and ends the processing.

By executing the procedure shown in FIG. 2 under the control of the system control unit 150 of the imaging apparatus as described above, persons (viewers of the main video) with high expression-specific evaluation values can be preferentially displayed on the display unit 128.

FIG. 5(a) shows an example of processing performed according to the above procedure, and is a diagram showing viewers' faces, expression-specific evaluation values, and expression discrimination results for a main video. For six viewers of the main video, person A through person F, FIG. 5(a) shows the evaluation values calculated for each of the smile, crying face, and angry face expressions, together with the expression discrimination results: three persons were determined to be smiling, two to be crying, and one to be angry. In this case, if three sub-videos are displayed on the display unit 128, person A, person C, and person D, who have the highest expression-specific evaluation values, are selected and displayed.

FIG. 6 is a diagram showing a display example of a main video and a plurality of sub-videos associated with the main video. In FIG. 6, a video display area 600 of the display unit 128 is an area for displaying the main video and some of the sub-videos associated with it. The video display area 600 is composed of a main video display area 601, a first sub-video display area 602, a second sub-video display area 603, and a third sub-video display area 604. Note that more than three sub-videos associated with the main video are recorded on the recording medium 104, and the three displayed sub-videos have been selected automatically from those on the recording medium 104.

As described above, according to the present embodiment, expression-specific evaluation values are calculated for a plurality of sub-videos, the expression of each sub-video is discriminated based on its evaluation values, and the corresponding sub-videos are selected and displayed based on those expressions. This makes it possible to automatically select sub-videos suitable for the user from a plurality of sub-videos associated with a main video.

[Second Embodiment]
The second embodiment of the present invention differs from the first embodiment in the points described below. The other elements of the present embodiment are the same as the corresponding elements of the first embodiment (FIG. 1), and their description is omitted.

Next, processing in the playback mode of the imaging apparatus according to this embodiment will be described in detail with reference to the flowchart of FIG. 3.

FIG. 3 is a flowchart showing processing in the playback mode of the imaging apparatus according to this embodiment. In FIG. 3, the processing in steps S301 to S307 is the same as that in steps S201 to S207 of FIG. 2, so its description is omitted. After steps S301 to S307, the system control unit 150 calculates the expression-specific ratios over all the sub-videos based on the expression determination results for all the sub-videos (step S308). Here, the expression-specific ratio is the proportion of all the sub-videos that were determined to show a particular expression.

For example, suppose six sub-videos are recorded on the recording medium 104 for a given main video, and the expression determination results are three smiling faces, two crying faces, and one angry face. The expression-specific ratios are then calculated as follows: smiling 50% (= 3/6), crying 33% (≈ 2/6), and angry 16% (≈ 1/6).
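The ratio calculation in this example can be reproduced with a short Python sketch. The function name `expression_ratios` and the string labels are assumptions for illustration; the counts (three, two, one out of six) are the ones given in the text.

```python
from collections import Counter
from typing import Dict, List


def expression_ratios(determined: List[str]) -> Dict[str, float]:
    """Return, for each expression, its share of all sub-videos."""
    total = len(determined)
    if total == 0:
        return {}
    counts = Counter(determined)
    return {expression: n / total for expression, n in counts.items()}


# One determined expression per sub-video: 3 smiling, 2 crying, 1 angry.
results = ["smile"] * 3 + ["cry"] * 2 + ["anger"]
ratios = expression_ratios(results)

for expression, ratio in ratios.items():
    # Truncate to whole percent, which matches the figures quoted in the text.
    print(expression, f"{int(ratio * 100)}%")
```

Truncating rather than rounding is a choice made here only so that the printed percentages agree with the figures in the example.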

Next, the system control unit 150 selects a predetermined number of sub-videos to be displayed on the display unit 128, giving priority to the expression with the highest expression-specific ratio (step S309). Within the same expression, sub-videos are selected in descending order of the expression-specific evaluation value for that expression. The subsequent processing in steps S310 and S311 is the same as that in steps S209 and S210 of FIG. 2, so its description is omitted.
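Step S309 might be sketched as below. The `SubVideo` tuple, the function name `select_by_ratio`, and all evaluation values are hypothetical; the sketch assumes each sub-video already carries its determined expression and the evaluation value for that expression. Expressions are ordered by how many sub-videos show them, and sub-videos within one expression by evaluation value.

```python
from collections import Counter
from typing import List, NamedTuple


class SubVideo(NamedTuple):
    name: str
    expression: str  # determined expression
    score: int       # evaluation value for that expression


def select_by_ratio(sub_videos: List[SubVideo], count: int) -> List[str]:
    """Prefer the most common expression, then higher evaluation values."""
    counts = Counter(v.expression for v in sub_videos)
    # Rank of each expression: 0 for the most common, 1 for the next, ...
    rank = {expr: i for i, (expr, _) in enumerate(counts.most_common())}
    ordered = sorted(sub_videos, key=lambda v: (rank[v.expression], -v.score))
    return [v.name for v in ordered[:count]]


# Five smiling viewers and one angry viewer; the values are made up.
viewers = [
    SubVideo("A", "smile", 95),
    SubVideo("B", "smile", 90),
    SubVideo("C", "smile", 85),
    SubVideo("D", "anger", 99),
    SubVideo("E", "smile", 60),
    SubVideo("F", "smile", 50),
]

print(select_by_ratio(viewers, 3))
```

Note that person D is not selected even though D's evaluation value is the highest in this made-up data, because the smiling expression has the larger ratio.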

By executing the procedure shown in FIG. 3 under the control of the system control unit 150 of the imaging apparatus as described above, persons (viewers) whose expressions are related to the main video can be displayed on the display unit 128.

FIG. 5B shows an example of processing performed according to the above procedure: the viewers' faces, expression-specific evaluation values, and expression determination results for a main video. For six viewers of the main video, person A through person F, FIG. 5B shows the expression-specific evaluation values calculated for each of the smiling, crying, and angry expressions, together with determination results in which five persons are determined to be smiling and one angry. In this case, if three sub-videos are to be displayed on the display unit 128, the sub-videos of person A, person B, and person C, who have high expression-specific evaluation values, are selected from among the five smiling persons (smiling being the expression with the largest expression-specific ratio) and displayed.

As described above, according to this embodiment, it is possible to automatically select sub-videos suitable for the user from a plurality of sub-videos associated with the main video.

[Third Embodiment]
The third embodiment of the present invention differs from the first embodiment in the points described below. The other elements of this embodiment are the same as the corresponding elements of the first embodiment (FIG. 1), so their description is omitted.

Next, processing in the playback mode of the imaging apparatus according to this embodiment will be described in detail with reference to the flowchart of FIG. 4.

FIG. 4 is a flowchart showing processing in the playback mode of the imaging apparatus according to this embodiment. In FIG. 4, the processing in steps S401 to S408 is the same as that in steps S301 to S308 of FIG. 3, so its description is omitted. After steps S401 to S408, the system control unit 150 determines, for the expression with the largest expression-specific ratio, whether that ratio exceeds a predetermined threshold (step S409). If the expression-specific ratio exceeds the predetermined threshold, the system control unit 150 preferentially selects a predetermined number of sub-videos, in descending order of expression-specific evaluation value, from the plurality of sub-videos whose expression-specific evaluation values are higher than the threshold (step S410).

If the expression-specific ratio does not exceed the predetermined threshold, the system control unit 150 preferentially selects the predetermined number of sub-videos to be displayed from among a plurality of expressions whose expression-specific evaluation values are higher than the threshold (step S411). In this case, sub-videos are selected across the plurality of expressions in descending order of expression-specific evaluation value. The subsequent processing in steps S412 and S413 is the same as that in steps S310 and S311 of FIG. 3, so its description is omitted.

Step S410 is considered to correspond to the case where, judging from the determination result in step S409, the viewers' expression toward the main video is uniquely determined, and step S411 to the case where it is not. The sub-video selection method is therefore switched as described above. That is, the apparatus switches between preferentially selecting sub-videos with high expression-specific evaluation values and preferentially selecting sub-videos with a high expression-specific ratio from the plurality of sub-videos associated with the main video. This makes it possible to select and display more suitable viewers according to the main video.
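One possible reading of steps S409 to S411 is sketched below; it is an interpretation, not a statement of how the apparatus is implemented. The `SubVideo` tuple, the function name `select_sub_videos`, the parameter `ratio_threshold`, and all values are hypothetical. The sketch assumes that when the dominant expression's ratio exceeds the threshold, candidates are limited to that expression, and that otherwise all sub-videos compete on evaluation value. The text also mentions comparing evaluation values against the threshold, which this sketch does not model.

```python
from collections import Counter
from typing import List, NamedTuple


class SubVideo(NamedTuple):
    name: str
    expression: str  # determined expression
    score: int       # evaluation value for that expression


def select_sub_videos(sub_videos: List[SubVideo], count: int,
                      ratio_threshold: float) -> List[str]:
    """Switch between ratio-first and evaluation-value-first selection."""
    if not sub_videos:
        return []
    counts = Counter(v.expression for v in sub_videos)
    dominant, dominant_count = counts.most_common(1)[0]
    if dominant_count / len(sub_videos) > ratio_threshold:
        # The viewers' expression is treated as uniquely determined.
        pool = [v for v in sub_videos if v.expression == dominant]
    else:
        # No single expression dominates, so every sub-video is a candidate.
        pool = list(sub_videos)
    ranked = sorted(pool, key=lambda v: v.score, reverse=True)
    return [v.name for v in ranked[:count]]


mostly_smiling = [
    SubVideo("A", "smile", 95), SubVideo("B", "smile", 90),
    SubVideo("C", "smile", 85), SubVideo("D", "anger", 99),
    SubVideo("E", "smile", 60), SubVideo("F", "smile", 50),
]
mixed = [
    SubVideo("A", "smile", 90), SubVideo("B", "cry", 60),
    SubVideo("C", "smile", 100), SubVideo("D", "anger", 80),
    SubVideo("E", "smile", 55), SubVideo("F", "cry", 50),
]

print(select_sub_videos(mostly_smiling, 3, 0.6))
print(select_sub_videos(mixed, 3, 0.6))
```

With a threshold of 0.6, the first data set (smiling ratio 5/6) takes the dominant-expression branch, while the second (smiling ratio 3/6) falls through to the evaluation-value branch.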

[Fourth Embodiment]
The fourth embodiment of the present invention differs from the first embodiment in the points described below. The other elements of this embodiment are the same as the corresponding elements of the first embodiment (FIG. 1), so their description is omitted.

In addition to the processing, described in the first embodiment, of generating still image data from an image signal that has undergone image quality adjustment, the image processing unit 124 of the imaging apparatus of this embodiment also generates moving image data from the image signals of a plurality of frames that have undergone image quality adjustment. Here, the image processing unit 124 may generate compression-coded moving image data by intra-frame coding each frame of the moving image data. It may also generate compression-coded moving image data using differences between frames of the moving image data, motion prediction, and the like. For example, it can generate moving image data in various known compression coding formats such as Motion JPEG, MPEG, and H.264 (MPEG-4 Part 10 AVC).

In general, intra-frame coded frame image data is called an I picture. Image data that is inter-frame coded using the difference from a preceding frame is called a P picture, and image data that is inter-frame coded using differences from both preceding and following frames is called a B picture. These are known compression methods and are unrelated to the features of the present invention, so their description is omitted.

The system control unit 150 forms a data stream by combining this moving image data with audio data (not shown) and writes the data stream to the recording medium 104 as a single moving image file. In the playback mode, on the other hand, the system control unit 150 reads into the memory 132 a still image file consisting of a compressed image signal, or a moving image file consisting of a compressed image signal and a compressed audio signal, recorded on the recording medium 104.

The system control unit 150 sends the read compressed image signal and compressed audio signal to the image processing unit 124 and an audio processing unit (not shown). The image processing unit 124 decodes the compressed image signal, which is temporarily stored in the memory 132, by a known predetermined procedure. The memory control unit 115 then sends the decoded image signal to the face detection unit 125 and the expression-specific evaluation value calculation unit 126, whereby the evaluation value calculation processing is executed for each expression.

Next, processing in the playback mode of the imaging apparatus according to this embodiment will be described in detail with reference to FIG. 7.

The first to third embodiments described the case where the sub-videos are still image data. This embodiment, by contrast, describes the case where the main video and the sub-videos are moving image data. Specifically, it describes processing in which at least one frame (more specifically, a plurality of frames at predetermined time intervals) is selected from the moving image data of each sub-video recorded on the recording medium 104, and in which the expression-specific evaluation values are calculated, the expressions are determined, and viewers are selected.

FIG. 7A and FIG. 7B are diagrams illustrating the processing by which the imaging apparatus according to this embodiment selects viewers from the sub-videos. FIG. 7A shows the moving image data of a viewer (person A) corresponding to the main video, and FIG. 7B shows the moving image data of a viewer (person B) corresponding to the main video. In FIG. 7A and FIG. 7B, the horizontal axis represents the elapsed time of the moving image data, and the vertical axis represents the expression-specific evaluation value at a given time. t1 to t5 indicate the times at which expression-specific evaluation values are acquired; for example, the expression-specific evaluation values of the moving image data are acquired every minute. This time interval may be fixed or may be varied according to, for example, the recording time of the moving image.

The expression-specific evaluation value for a smile, obtained from the image signal produced by decoding one frame of person A's moving image data at time t1 (= 0), is e1a. Likewise, the expression-specific evaluation value for a smile obtained for person B at time t1 (= 0) is e1b. Expression-specific evaluation values are obtained for the other expressions as well. The acquisition of expression-specific evaluation values at time t1 (= 0) is executed for all of the sub-videos recorded in advance for the main video.

Based on these expression-specific evaluation values, the sub-videos to be displayed on the display unit 128 are selected using the methods described in the first to third embodiments. The selected sub-videos are then displayed on the display unit 128 as in the display example shown in FIG. 6. In this case, the image displayed on the display unit 128 may be the moving image of the viewer corresponding to the main video, or may be the still image from which the expression-specific evaluation values were acquired.

The selection of sub-videos to be displayed on the display unit 128 is maintained until the next time t2; that is, the expression-specific evaluation values remain valid until the next time t2. At time t2, as at time t1, expression-specific evaluation values are acquired for all of the sub-videos for the main video, the sub-videos to be displayed are selected, and the selected sub-videos are displayed on the display unit 128. The same processing is repeated from time t3 onward.
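The sample-and-hold behavior described above might look like the following sketch. The names `selection_schedule` and `selection_at`, the 60-second spacing, and the smile evaluation values are assumptions for illustration; the sketch also ranks by a single expression only, whereas the text allows any of the selection methods of the first to third embodiments.

```python
from bisect import bisect_right
from typing import Dict, List


def selection_schedule(scores_at: Dict[int, Dict[str, int]],
                       count: int) -> Dict[int, List[str]]:
    """For each sampling time, pick the `count` sub-videos with the highest value."""
    schedule = {}
    for t, scores in scores_at.items():
        ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
        schedule[t] = ranked[:count]
    return schedule


def selection_at(schedule: Dict[int, List[str]], t: float) -> List[str]:
    """Return the selection made at the latest sampling time not after `t`."""
    times = sorted(schedule)
    index = bisect_right(times, t) - 1
    if index < 0:
        raise ValueError("t precedes the first sampling time")
    return schedule[times[index]]


# Hypothetical smile evaluation values sampled at t = 0 s, 60 s, and 120 s.
smile_scores = {
    0: {"A": 40, "B": 70, "C": 55},
    60: {"A": 90, "B": 30, "C": 60},
    120: {"A": 20, "B": 80, "C": 85},
}
schedule = selection_schedule(smile_scores, 2)

# At 90 s the selection made at the 60 s sample is still in effect.
print(selection_at(schedule, 90))
```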

By switching the sub-video selection method as described above, more suitable viewers can be selected and displayed according to the main video even when the main video and the sub-videos are moving image data.

[Fifth Embodiment]
The fifth embodiment of the present invention differs from the first embodiment in the points described below. The other elements of this embodiment are the same as the corresponding elements of the first embodiment (FIG. 1), so their description is omitted.

Next, processing in the playback mode of the imaging apparatus according to this embodiment will be described in detail with reference to FIG. 8. In this embodiment, the processing for selecting viewers differs from that of the fourth embodiment, so that processing is described here.

FIG. 8A and FIG. 8B are diagrams illustrating the processing by which the imaging apparatus according to this embodiment selects viewers from the sub-videos. FIG. 8A shows the moving image data of a viewer (person A) corresponding to the main video, and FIG. 8B shows the moving image data of a viewer (person B) corresponding to the main video. In FIG. 8A and FIG. 8B, t1 to t5 indicate the times at which expression-specific evaluation values are acquired, and the expression-specific evaluation values at all of the set times are acquired in advance. All of the acquired expression-specific evaluation values are then compared to find the maximum value.

In FIG. 8A, the maximum of the expression-specific evaluation values for a smile, obtained from image signals produced by decoding individual frames of person A's moving image data, is eamax (at time t2). In FIG. 8B, the maximum of the expression-specific evaluation values for a smile, obtained in the same way from person B's moving image data, is ebmax (at time t4).

Based on the maximum expression-specific evaluation values of the plurality of sub-videos, the sub-videos to be displayed on the display unit 128 are selected using the methods described in the first to third embodiments. That is, from the plurality of sub-videos recorded on the recording medium 104, the sub-videos are selected for which, among the plurality of expression-specific evaluation values calculated by the expression-specific evaluation value calculation unit 126, the expression-specific evaluation value is the maximum.

For example, if, for a viewer of the main video, the evaluation value for "smile" is the largest among the expression-specific evaluation values for "smile", "crying face", and "angry face", "smile" is selected as the sub-video. Specifically, for person C in FIG. 5A, the expression-specific evaluation values are 100 for "smile", 25 for "crying face", and 0 for "angry face", so "smile", which has the maximum expression-specific evaluation value, becomes the expression determination result (and is selected as the sub-video). The rest is the same as in the above embodiments, so its description is omitted.
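Finding the per-viewer maximum over the sampling times, as in FIG. 8A and FIG. 8B, can be sketched as follows. The function name `peak` and the sample values are hypothetical; they were chosen only so that person A peaks at the second sample (t2) and person B at the fourth (t4), as in the figures.

```python
from typing import Dict, List, Tuple


def peak(samples: List[int]) -> Tuple[int, int]:
    """Return (sample index, value) of the largest evaluation value."""
    index = max(range(len(samples)), key=lambda i: samples[i])
    return index, samples[index]


# Hypothetical smile evaluation values at t1..t5 for two viewers.
smile_samples: Dict[str, List[int]] = {
    "A": [40, 95, 60, 30, 20],
    "B": [10, 20, 30, 85, 40],
}

peaks = {name: peak(values) for name, values in smile_samples.items()}

for name, (index, value) in peaks.items():
    # index is zero-based, so index + 1 gives the label t1..t5.
    print(f"person {name}: max {value} at t{index + 1}")
```

The resulting per-viewer maxima could then be passed to any of the selection methods of the first to third embodiments in place of a single still-image evaluation value.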

[Sixth Embodiment]
The sixth embodiment of the present invention differs from the first embodiment in the points described below. The other elements of this embodiment are the same as the corresponding elements of the first embodiment (FIG. 1), so their description is omitted.

Next, processing in the playback mode of the imaging apparatus according to this embodiment will be described in detail. In this embodiment, the processing for selecting viewers differs from that of the fourth and fifth embodiments, so that processing is described here.

In the fourth and fifth embodiments, expression-specific evaluation values for the plurality of sub-videos must be acquired at every set time, so the processing is complex and the actual processing load, such as decoding the moving image data, is large. This embodiment describes how to select and display more suitable viewers according to the main video while reducing the processing load. Specifically, it describes an example in which one frame is selected from each of the plurality of sub-videos at the time corresponding to one frame of the main video recorded on the recording medium 104.

As a technique relating to video content, a technique has been proposed that detects a specific image included in the video content, extracts highlight scenes (scenes of high importance), and adds highlight scene information to the video content (Japanese Patent Laid-Open No. 2006-014085).

In this embodiment, a technique such as the one described in the above publication is used to acquire the time information of a highlight scene from the highlight scene information of the main video data. Let this time be tm. For each of the plurality of sub-videos associated with the main video, one frame at the time tm corresponding to the highlight scene time tm is decoded, and the expression-specific evaluation values are acquired.

Based on the acquired expression-specific evaluation values for the plurality of sub-videos, the sub-videos to be displayed on the display unit 128 are selected using the methods described in the first to third embodiments. The rest is the same as in the above embodiments, so its description is omitted. By selecting sub-videos in this way, more suitable viewers can be selected and displayed according to the main video while reducing the processing load, even when the main video and the sub-videos are moving image data.
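Locating the single frame to decode in each sub-video might be sketched as below. This is illustrative only: the function name `frame_index_at` and the example numbers are hypothetical, and the sketch assumes that each sub-video has a constant frame rate and shares the main video's time origin, neither of which the description states.

```python
def frame_index_at(time_s: float, fps: float, frame_count: int) -> int:
    """Map a highlight time (seconds) to a frame index, clamped to the clip."""
    if frame_count <= 0:
        raise ValueError("the sub-video has no frames")
    index = int(time_s * fps)
    return max(0, min(index, frame_count - 1))


# Hypothetical highlight time tm = 12.5 s in a 30 fps sub-video of 1000 frames.
tm = 12.5
print(frame_index_at(tm, 30.0, 1000))
```

Only this one frame per sub-video needs to be decoded and evaluated, which is where the reduction in processing load relative to the fourth and fifth embodiments would come from.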

[Other Embodiments]
Preferred embodiments of the present invention have been described as the first to sixth embodiments. However, the present invention is not limited to these embodiments, and various modifications and changes are possible within the scope of its gist.

The present invention can also be realized by executing the following processing: software (a program) that implements the functions of the above-described embodiments is supplied to a system or apparatus via a network or various storage media, and a computer (or a CPU, MPU, or the like) of that system or apparatus reads and executes the program. The program of the present invention has computer-readable program code for causing a computer to execute the control method for the image processing apparatus of the present invention, and is stored in a storage medium.

Description of Symbols
100 Imaging apparatus
104 Recording medium
125 Face detection unit
126 Expression-specific evaluation value calculation unit
128 Display unit
150 System control unit

Claims

An image processing apparatus comprising:
recording means for recording a main video and a plurality of sub-videos associated with the main video;
detection means for detecting a person's face from the sub-videos recorded in the recording means;
expression-specific evaluation value calculation means for calculating, based on the face area of the person detected by the detection means, an evaluation value for each expression representing which expression the person's face corresponds to;
selection means for selecting a sub-video from the plurality of sub-videos recorded in the recording means based on the evaluation values calculated by the expression-specific evaluation value calculation means; and
control means for displaying the sub-video selected by the selection means together with the main video recorded in the recording means.

The image processing apparatus according to claim 1, wherein the selection means selects a predetermined number of sub-videos.

The image processing apparatus according to claim 1, further comprising determination means for determining the expression of each sub-video based on the result calculated by the expression-specific evaluation value calculation means,
wherein the selection means selects a sub-video from the plurality of sub-videos recorded in the recording means based on the evaluation values calculated by the expression-specific evaluation value calculation means and the expression determined by the determination means.

The image processing apparatus, wherein the selection means preferentially selects, from the plurality of sub-videos recorded in the recording means, a sub-video with a high expression-specific evaluation value for the determined expression.

The image processing apparatus according to claim 3, wherein the selection means preferentially selects, among sub-videos determined by the determination means to have the same expression, a sub-video with a high expression-specific evaluation value for that expression.

The image processing apparatus according to claim 1, further comprising expression-specific ratio calculation means for calculating an expression-specific ratio indicating the proportion of all the sub-videos that were determined by the determination means to show a specific expression,
wherein the selection means preferentially selects, from the plurality of sub-videos recorded in the recording means, a sub-video of the determined expression with a high expression-specific ratio, based on the result calculated by the expression-specific ratio calculation means.

The image processing apparatus according to claim 6, wherein the selection means switches, based on the result calculated by the expression-specific ratio calculation means, between preferentially selecting a sub-video with a high expression-specific evaluation value and preferentially selecting a sub-video with a high expression-specific ratio from the plurality of sub-videos recorded in the recording means.

The image processing apparatus according to claim 1, further comprising frame selection means for selecting at least one frame from the sub-video recorded in the recording means,
wherein the detection means detects a person's face from the frame selected by the frame selection means.

The image processing apparatus according to claim 8, wherein the frame selection means selects a plurality of frames at predetermined time intervals from the sub-video recorded in the recording means.

The image processing apparatus according to claim 8, wherein the frame selection means selects one frame from each of the plurality of sub-videos at a time corresponding to one frame of the main video recorded in the recording means.

The image processing apparatus according to any one of claims 1 to 10, wherein the main video is a video obtained by photographing a subject, and each sub-video is a video obtained by photographing a viewer of the main video while the main video is displayed.

An imaging apparatus comprising the image processing apparatus according to claim 1.

A control method for an image processing apparatus, comprising:
a detection step of detecting a person's face from sub-videos recorded in recording means that records a main video and a plurality of sub-videos associated with the main video;
an expression-specific evaluation value calculation step of calculating, based on the face area of the person detected in the detection step, an evaluation value for each expression representing which expression the person's face corresponds to;
a selection step of selecting a sub-video from the plurality of sub-videos recorded in the recording means based on the evaluation values calculated in the expression-specific evaluation value calculation step; and
a control step of displaying the sub-video selected in the selection step together with the main video recorded in the recording means.

A computer-readable program having program code for causing a computer to execute the control method for an image processing apparatus according to claim 13, the control method comprising:
a detection step of detecting a person's face from sub-videos recorded in recording means that records a main video and a plurality of sub-videos associated with the main video;
an expression-specific evaluation value calculation step of calculating, based on the face area of the person detected in the detection step, an evaluation value for each expression representing which expression the person's face corresponds to;
a selection step of selecting a sub-video from the plurality of sub-videos recorded in the recording means based on the evaluation values calculated in the expression-specific evaluation value calculation step; and
a control step of displaying the sub-video selected in the selection step together with the main video recorded in the recording means.

A computer-readable storage medium storing the program according to claim 14.