JP2012253692A - Imaging apparatus, reproducer, data structure, control method of imaging apparatus and imaging apparatus program

Info

Publication number: JP2012253692A
Application number: JP2011126727A
Authority: JP
Inventors: Atsushi Ishihara; 厚石原
Original assignee: Olympus Imaging Corp
Current assignee: Olympus Imaging Corp
Priority date: 2011-06-06
Filing date: 2011-06-06
Publication date: 2012-12-20

Abstract

PROBLEM TO BE SOLVED: To provide an imaging apparatus that, when an image is reproduced together with sound, reproduces the atmosphere at the time of photographing without causing discomfort to the viewer, and to provide a reproducer, a data structure, a control method of the imaging apparatus and an imaging apparatus program.

SOLUTION: The imaging apparatus includes: an imaging section for capturing an image and generating electronic data of the image; a sound input section for inputting sound and converting it into an electric signal; a sound processing section for generating sound data by processing the electric signal converted by the sound input section, and separating the sound data into a plurality of pieces of partial sound data on the basis of a characteristic of the sound; and a control section for selecting, in accordance with the display mode used when the image corresponding to the image data generated by the imaging section is reproduced, the sound data or the partial sound data corresponding to the sound to be output in synchronization with the image, and associating the selected data with the image.

Description

The present invention relates to an imaging apparatus that acquires an image and generates electronic image data, a playback apparatus that reproduces an image, a data structure including image data, a control method for the imaging apparatus, and a program for the imaging apparatus.

Conventionally, for a digital camera having a moving image shooting function, a technique has been disclosed in which audio data recorded before and after a still image captured during moving image shooting is added to the still image and reproduced with it (see, for example, Patent Document 1).

Also known is an imaging and playback apparatus having a function of shooting and playing back moving images, in which a snap movie shooting mode, where shooting ends automatically after a predetermined time has elapsed, can be set as a mode other than the normal moving image shooting mode. This imaging and playback apparatus can switch between videos and play them back at a brisk tempo while outputting audio prepared separately from the audio recorded during shooting in the snap movie shooting mode (see, for example, Patent Document 2).

Patent Document 1: JP 2008-22246 A
Patent Document 2: JP 2010-141414 A

However, in Patent Document 1 described above, when the audio being reproduced contains the subject's voice, that voice may be cut off partway through. When this happens, the viewer of the reproduced image is left wondering what was being said and has to continue viewing with a sense of discomfort.

In Patent Document 2 described above, the audio output during playback has no relationship to the video, so in some cases the sound and the video do not match and the viewer may feel uncomfortable.

The present invention has been made in view of the above, and an object thereof is to provide an imaging apparatus, a playback apparatus, a data structure, a control method for the imaging apparatus, and a program for the imaging apparatus that, when an image is reproduced together with audio, can reproduce the atmosphere at the time of shooting without making the viewer feel uncomfortable.

To solve the problems described above and achieve the object, an imaging apparatus according to the present invention includes: an imaging unit that captures an image and generates electronic data of the image; an audio input unit that receives sound and converts it into an electrical signal; an audio processing unit that generates audio data by processing the electrical signal converted by the audio input unit and separates the audio data into a plurality of pieces of partial audio data based on a characteristic of the sound; and a control unit that, according to the display mode used when an image corresponding to the image data generated by the imaging unit is reproduced, selects the audio data or partial audio data corresponding to the audio to be output in synchronization with the image and associates it with the image.

In the imaging apparatus according to the present invention, in the above invention, the characteristic of the sound includes sound intensity.

In the imaging apparatus according to the present invention, in the above invention, the characteristic of the sound includes frequency, and the audio processing unit calculates a frequency spectrum of the audio data and generates the plurality of pieces of partial audio data from partial frequency spectra obtained by separating the frequency spectrum based on the characteristic of the sound.

In the imaging apparatus according to the present invention, in the above invention, the audio processing unit is capable of generating partial audio data that includes a human voice.

In the imaging apparatus according to the present invention, in the above invention, when images are reproduced and displayed in a slide show mode in which a plurality of images are displayed one after another for a predetermined time each, the control unit selects, with the highest priority, partial audio data that does not include a human voice as the audio to be output in synchronization with the images.

In the above invention, the imaging apparatus according to the present invention further includes: a display unit capable of displaying an image corresponding to the image data; and an audio output unit capable of outputting audio corresponding to the audio data or partial audio corresponding to the partial audio data, wherein the control unit selects the audio or partial audio according to the display mode of the image on the display unit and causes the audio output unit to output it in synchronization with the display of the image.

In the above invention, the imaging apparatus according to the present invention further includes: a communication unit that transmits and receives information via a network to and from a playback apparatus capable of displaying images and outputting audio; and a transmission file creation unit that creates a transmission file to be transmitted to the playback apparatus via the communication unit, the transmission file containing, as information to be output by the playback apparatus, image data and audio data that is determined according to the display mode of the image corresponding to the image data and is output in synchronization with the image.

A playback apparatus according to the present invention includes: a display unit capable of displaying an image; an audio output unit capable of outputting audio suited to the image displayed by the display unit; an interpretation unit that interprets the contents of a data structure having image data and audio data, the audio data containing audio that is determined according to the display mode of the image corresponding to the image data and is output in synchronization with the image; and a playback control unit that, based on the result of interpretation by the interpretation unit, causes the display unit to display the image and causes the audio output unit to output the audio.

In the above invention, the playback apparatus according to the present invention is capable of communicating with the imaging apparatus via the network, and further includes a communication unit that receives the transmission file, as the data structure, from the imaging apparatus.

A data structure according to the present invention includes: image data; mode information that defines the display mode of an image corresponding to the image data; and audio data containing audio to be output in synchronization with the image when the image is displayed according to the display mode.

A control method for an imaging apparatus according to the present invention includes: an imaging step of capturing an image and generating electronic data of the image; an audio input step of receiving audio during a period that includes the time at which the image is captured in the imaging step and converting it into an electrical signal; an audio processing step of generating audio data by processing the electrical signal converted in the audio input step and separating the audio data into a plurality of pieces of partial audio data based on a characteristic of the sound; and an association step of selecting, according to the display mode used when an image corresponding to the image data generated in the imaging step is reproduced, the audio data or partial audio data corresponding to the audio to be output in synchronization with the image, and associating it with the image.

A program for an imaging apparatus according to the present invention causes an imaging apparatus, which captures an image to generate electronic data of the image and receives audio and converts it into an electrical signal, to execute: an audio processing step of generating audio data by processing the electrical signal and separating the audio data into a plurality of pieces of partial audio data based on a characteristic of the sound; and an association step of selecting, according to the display mode used when the image is reproduced, the audio data or partial audio data corresponding to the audio to be output in synchronization with the image, and associating it with the image.

According to the present invention, audio data is separated into a plurality of pieces of partial audio data, and the audio data or partial audio data suited to the display mode of the image to be displayed is selected and associated with that image. As a result, when the image is reproduced, the atmosphere at the time of shooting is reproduced and the viewer can enjoy the reproduced image in comfort.

FIG. 1 is a diagram illustrating the configuration of a communication system including an imaging apparatus and a playback apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating an overview of the processing performed by the audio processing unit of the imaging apparatus according to the embodiment of the present invention.
FIG. 3 is a diagram illustrating an example of the change over time in the maximum sound pressure level of audio data.
FIG. 4 is a diagram illustrating an overview of the audio separation processing performed by the audio processing unit of the imaging apparatus according to the embodiment of the present invention.
FIG. 5 is a diagram schematically illustrating the data structure of the transmission file generated by the transmission file creation unit of the imaging apparatus according to the embodiment of the present invention.
FIG. 6 is a schematic diagram illustrating an overview of the characteristic processing performed by the imaging apparatus according to the embodiment of the present invention.
FIG. 7 is a flowchart illustrating an overview of the processing performed by the imaging apparatus according to the embodiment of the present invention.
FIG. 8 is a flowchart illustrating an overview of the audio data recording processing performed by the imaging apparatus according to the embodiment of the present invention.
FIG. 9 is a flowchart illustrating an overview of the moving image shooting processing performed by the imaging apparatus according to the embodiment of the present invention.
FIG. 10 is a flowchart illustrating an overview of the playback processing performed by the imaging apparatus according to the embodiment of the present invention.
FIG. 11 is a flowchart illustrating an overview of the slide show display processing performed by the imaging apparatus according to the embodiment of the present invention.
FIG. 12 is a diagram schematically illustrating a situation in which, in the slide show display, the second partial audio is used during moving image playback.
FIG. 13 is a flowchart illustrating an overview of the display processing performed by the playback apparatus according to the embodiment of the present invention.

Hereinafter, modes for carrying out the present invention (hereinafter referred to as “embodiments”) will be described with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating the configurations of an imaging apparatus and a playback apparatus according to an embodiment of the present invention, and the configuration of a communication system made up of the imaging apparatus and the playback apparatus. The communication system 100 shown in the figure includes an imaging apparatus 1 that captures an image of a predetermined field of view and generates image data, and a playback apparatus 3 that displays images. The imaging apparatus 1 and the playback apparatus 3 each have a function of communicating information, including image data, with the other.

First, the configuration of the imaging apparatus 1 will be described. The imaging apparatus 1 includes an imaging unit 11, an operation input unit 12, an audio input unit 13, a display unit 14, an audio output unit 15, a communication unit 16, a storage unit 17, and a control unit 18.

The imaging unit 11 captures an image of a predetermined field of view and generates digital image data. The imaging unit 11 has: an optical system that is composed of one or more lenses and collects light from a subject within the predetermined field of view; a diaphragm that adjusts the amount of incident light collected by the optical system; an image sensor, such as a CCD (Charge Coupled Device), that receives the light passing through the diaphragm and converts it into an electrical signal; and an A/D converter that converts the analog signal output from the image sensor into image data consisting of a digital signal.

The operation input unit 12 accepts input of various kinds of information, including operation instructions for the imaging apparatus 1. The operation input unit 12 has a power button for the imaging apparatus 1, a still image release button that gives an instruction to shoot a still image, a moving image release button that gives an instruction to shoot a moving image, a mode switching button that switches among the various operation modes that can be set on the imaging apparatus 1, and control buttons for giving instructions such as playback and editing of image data. The operation input unit 12 also has a touch panel layered on the display unit 14.

The audio input unit 13 is configured using a microphone and has a function of receiving sound and converting it into an electrical signal.

The display unit 14 is configured using a liquid crystal panel, an organic EL panel, or the like, and has a function of displaying, in addition to image data, operation information for the imaging apparatus 1 and information related to shooting as appropriate.

The audio output unit 15 is configured using a speaker and has a function of outputting audio to the outside.

The communication unit 16 transmits and receives information to and from the playback apparatus 3 via a network. The network may be a wireless communication network, such as a wireless LAN (Local Area Network) or infrared communication, or a network that includes wired communication.

The storage unit 17 has an image data storage unit 171 that stores image data, an audio data storage unit 172 that stores audio data, and a program storage unit 173 that stores the various programs executed by the imaging apparatus 1. The storage unit 17 is configured using semiconductor memory, such as flash memory or RAM (Random Access Memory), fixed inside the imaging apparatus 1. The storage unit 17 may also function as a recording medium interface that records information on a recording medium, such as a memory card, mounted from outside and reads information recorded on that recording medium.

The image data storage unit 171 stores moving image files, in which moving image data encoded according to a standard such as MPEG (Moving Picture Experts Group) is stored in a container together with audio data, and also stores still image data encoded according to a format such as JPEG (Joint Photographic Experts Group).

The control unit 18 is configured using a CPU (Central Processing Unit) or the like and is connected via a bus line to each component of the imaging apparatus 1 that it controls. The control unit 18 has an image processing unit 181 that performs predetermined image processing on the image data generated by the imaging unit 11, an audio processing unit 182 that performs predetermined signal processing on the electrical signal output from the audio input unit 13, a transmission file creation unit 183 that creates a transmission file to be transmitted to the playback apparatus 3, and a timer 184 having a timekeeping function.

The audio processing unit 182 generates audio data by sampling and quantizing the audio signal (A/D conversion), separates the sound at a predetermined frequency by applying an FFT (Fast Fourier Transform) and filtering to the generated audio data, determines the sound pressure level of each piece of separated audio data, and generates a plurality of pieces of partial audio data by applying an inverse FFT to the separated audio data that satisfies a predetermined criterion.

FIG. 2 is a diagram illustrating an overview of the processing performed by the audio processing unit 182. In FIG. 2, the horizontal axis f represents frequency and the vertical axis L represents sound pressure level (SPL). Curve 200 in FIG. 2 is the frequency spectrum of the audio data (hereinafter referred to as “frequency spectrum 200”). In the present embodiment, the audio processing unit 182 separates the frequency spectrum 200 into two at the reference frequency f₀. The audio processing unit 182 then performs level determination by comparing the maximum sound pressure level L_max in each of the two separated partial frequency spectra with the reference sound pressure level L₀. Here, the sound pressure level L is a quantity proportional to the common logarithm of the ratio of the sound pressure P to the reference sound pressure P₀ = 2 × 10⁻⁵ (Pa), and is defined as L = 20 × log₁₀(P/P₀) (dB).
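
As a quick check of the definition above, the following minimal Python sketch evaluates L = 20 × log₁₀(P/P₀) for a given sound pressure. The function name `sound_pressure_level` and the constant name `P0` are illustrative choices, not names from the patent.

```python
import math

P0 = 2e-5  # reference sound pressure in Pa, as stated in the text


def sound_pressure_level(p: float) -> float:
    """Return L = 20 * log10(P / P0) in dB for a sound pressure p given in Pa."""
    if p <= 0:
        raise ValueError("sound pressure must be positive")
    return 20.0 * math.log10(p / P0)
```

Under this definition a sound pressure equal to P₀ corresponds to 0 dB, and every tenfold increase in sound pressure adds 20 dB, so a pressure of 1000 × P₀ (0.02 Pa) corresponds to about 60 dB.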

In the case shown in FIG. 2, the audio processing unit 182 separates the frequency spectrum 200 into two partial frequency spectra 201 and 202. As a result of the level determination by the audio processing unit 182, the maximum sound pressure levels L₁ and L₂ of the partial frequency spectra 201 and 202 are related to the reference sound pressure level L₀ by L₁ > L₀ and L₂ < L₀.

For example, if the reference frequency is f₀ = 600 (Hz) and the reference sound pressure level is L₀ = 50 (dB) in FIG. 2, the partial frequency spectrum 201 is likely to contain a human voice. This is because the frequency of a person's voice in ordinary conversation is 150 to 600 Hz and its sound pressure level is 60 to 90 dB. The partial frequency spectrum 202, in contrast, contains sounds with a relatively high frequency and a low sound pressure level. The sounds given by the partial frequency spectrum 202 are therefore likely to include sounds regarded as having a strong soothing effect, such as the murmur of a stream, birdsong, or the rustling of leaves. By setting the reference frequency f₀ and the reference sound pressure level L₀ appropriately in this way, partial audio containing a human voice can be clearly separated from partial audio containing natural background sounds.
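
The separation at the reference frequency can be sketched as follows. This is only an illustration under stated assumptions: a naive discrete Fourier transform stands in for the FFT named in the text, the function names `dft`, `split_spectrum`, and `peak` are hypothetical, and the magnitudes are uncalibrated amplitudes rather than sound pressure levels in dB.

```python
import cmath
import math


def dft(samples):
    """Naive O(n^2) discrete Fourier transform (a stand-in for an FFT)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def split_spectrum(samples, sample_rate, f0):
    """Split the one-sided spectrum at f0 into (low, high) lists of (freq, magnitude)."""
    n = len(samples)
    spectrum = dft(samples)
    low, high = [], []
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        magnitude = abs(spectrum[k]) / n
        # f < f0 goes to the low band, f >= f0 to the high band.
        (low if freq < f0 else high).append((freq, magnitude))
    return low, high


def peak(band):
    """Return the (frequency, magnitude) pair with the largest magnitude in a band."""
    return max(band, key=lambda fm: fm[1])


# Synthetic test signal: a strong 300 Hz tone plus a weak 2000 Hz tone.
sample_rate = 8000
n = 160
signal = [
    math.sin(2 * math.pi * 300 * t / sample_rate)
    + 0.1 * math.sin(2 * math.pi * 2000 * t / sample_rate)
    for t in range(n)
]
low, high = split_spectrum(signal, sample_rate, f0=600.0)
```

With f₀ = 600 Hz, the strong low tone lands in the low band and the weak high tone in the high band, mirroring the roles of the partial frequency spectra 201 and 202. A real implementation would convert each band's peak to a calibrated sound pressure level before comparing it with L₀.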

FIG. 3 is a diagram illustrating an example of the change over time in the maximum sound pressure level L_max of audio data. In FIG. 3, the horizontal axis t represents time and the vertical axis L_max represents the maximum sound pressure level. Curve 300 in FIG. 3 shows the change over time in the maximum sound pressure level of the audio data corresponding to one piece of moving image data, and time T indicates the moving image shooting time. Time T₀ in FIG. 3 indicates the playback time used when, in a slide show display that mixes moving images and still images, the playback time of a moving image is set to be approximately equal to the playback time of a still image. Setting this playback time T₀ makes it possible to play back a slide show that mixes moving images and still images at a good tempo.

On the other hand, in the case of the moving image shown in FIG. 3, the maximum sound pressure level L_max exceeds the reference sound pressure level L₀ immediately after shooting starts and still exceeds L₀ at time T₀. Consequently, when the reference sound pressure level is L₀ = 50 (dB), switching images at this point means switching while a human voice is present, and the image may be cut off while someone is in the middle of speaking. In the present embodiment, to avoid this situation and realize a slide show with little sense of incongruity, when the audio to be output in synchronization with the image contains a human voice, the human voice is not output and another partial audio is output instead.
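
The selection rule described here can be sketched as a small function. This is a hedged illustration, not the patent's implementation: the mode string `"slideshow"`, the function name `select_audio`, and the fallback to the full audio when no voice-free partial audio exists or when another display mode is used are all assumptions made for the example.

```python
from typing import Optional


def select_audio(mode: str, full: bytes, second: Optional[bytes] = None) -> bytes:
    """Pick the audio to output in synchronization with an image.

    In slide show mode, the voice-free second partial audio takes top priority.
    Falling back to the full audio in every other case is an assumption.
    """
    if mode == "slideshow" and second is not None:
        return second
    return full
```

For example, `select_audio("slideshow", full_audio, second=background_audio)` returns the background audio, so an image switch at time T₀ never cuts off a spoken sentence.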

FIG. 4 is a diagram illustrating an overview of the audio separation processing performed by the audio processing unit 182. The first partial audio corresponds to audio data whose frequency f is lower than the reference frequency f₀ (f < f₀) and whose maximum sound pressure level L_max is equal to or higher than the reference sound pressure level L₀ (L_max ≥ L₀). The partial frequency spectrum 201 shown in FIG. 2 therefore corresponds to the first partial audio.

The second partial audio, in contrast, corresponds to audio data whose frequency f is equal to or higher than the reference frequency f₀ (f ≥ f₀) and whose maximum sound pressure level L_max is lower than the reference sound pressure level L₀ (L_max < L₀). The partial frequency spectrum 202 shown in FIG. 2 therefore corresponds to the second partial audio.

When the frequency f and the maximum sound pressure level L_max do not satisfy either of the conditions described above, the audio processing unit 182 does not record the separated audio data as partial audio data.
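
The three outcomes described for FIG. 4 (first partial audio, second partial audio, or not recorded) can be summarized in a short sketch. The function name `classify_band` and the string labels are hypothetical; the conditions themselves follow the text.

```python
from typing import Optional


def classify_band(is_low_band: bool, l_max_db: float, l0_db: float) -> Optional[str]:
    """Classify a separated band according to the conditions of FIG. 4.

    is_low_band is True for f < f0 and False for f >= f0.
    Returns "first", "second", or None when the band is not recorded.
    """
    if is_low_band and l_max_db >= l0_db:
        return "first"   # low frequency and loud: likely contains a human voice
    if not is_low_band and l_max_db < l0_db:
        return "second"  # high frequency and quiet: likely background sound
    return None          # neither condition met: not recorded as partial audio
```

With L₀ = 50 dB, a low band peaking at 70 dB is classified as the first partial audio, a high band peaking at 40 dB as the second, and a low band peaking at 40 dB is discarded.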

Incidentally, so that the audio processing unit 182 can clearly recognize sounds such as the voice of a person who is the subject, the imaging apparatus 1 may be provided with a face detection unit capable of detecting a face image included in the image data, and the audio data may be limited to a band suited to the subject's voice or the like according to features determined when the face detection unit detects the face image. In this case, the audio processing unit 182 can limit the band of the audio data by including, for example, a band-limiting filter circuit (BPF: band-pass filter).

The audio processing unit 182 may also record the audio data and generate the audio to be output by extracting, from the recorded audio data, only the portion belonging to the required band.

Alternatively, the audio processing unit 182 may limit the band of the audio data by amplifying the audio data in a certain band and attenuating the audio data in the other bands.

The audio processing unit 182 may also be configured so that its filter characteristics can be switched. For example, the audio processing unit 182 may be able to switch between filter characteristics optimized for the band of adult voices and filter characteristics optimized for the band of children's voices. Here, the frequency band of adult voices is 100 to 8000 Hz, whereas the frequency band of children's voices is 150 to 10000 Hz. Accordingly, to record an adult's voice clearly, only the audio data in the range of 100 to 8000 Hz is extracted from the input audio data and output, and to record a child's voice clearly, only the audio data in the range of 150 to 10000 Hz is extracted from the input audio data and output. Audio data may also be extracted according to gender by switching to filter characteristics that depend on gender.
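
Switchable filter characteristics can be illustrated with a lookup table of pass bands. The band limits below come from the text; the names `VOICE_BANDS_HZ` and `band_limit`, and the representation of audio as a list of (frequency, amplitude) components, are assumptions made for this sketch.

```python
# Pass bands in Hz for each filter characteristic, as given in the text.
VOICE_BANDS_HZ = {
    "adult": (100.0, 8000.0),
    "child": (150.0, 10000.0),
}


def band_limit(components, profile):
    """Keep only the (frequency, amplitude) components inside the selected pass band."""
    low_hz, high_hz = VOICE_BANDS_HZ[profile]
    return [(f, a) for f, a in components if low_hz <= f <= high_hz]
```

For instance, a 120 Hz component passes the adult profile but not the child profile, while a 9000 Hz component passes only the child profile. An ideal brick-wall band is used here for simplicity; a real band-pass filter would have gradual roll-off at the band edges.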

The audio processing unit 182 may also extract and output audio data in a frequency band different from that of the human voice. Specifically, the audio processing unit 182 may extract and output animal calls, the engine sounds of cars and airplanes, and the like.

The transmission file creation unit 183 creates a transmission file to be transmitted to the playback device 3. FIG. 5 schematically shows the data structure of the transmission file. The transmission file 400 shown in FIG. 5 has text information 401, image data 402, and audio data 403. The text information 401 contains, as information to be interpreted by the playback device 3, information on the type of the transmission file, mode information that specifies the display mode of the image, character information to be displayed together with the image, and the like. The information on the type of the transmission file includes information indicating that the file is data for a slide show, information indicating that the file is the last in a series of slide show data, and the like. The audio data 403 is one of the audio data, the first partial audio data, and the second partial audio data.
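As a minimal sketch, the data structure of the transmission file 400 could be modeled in Python as follows. The field names, the string values used for the file type, and the helper is_last_of_slideshow are hypothetical; the description only states which kinds of information the file contains, not how they are encoded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextInfo:
    """Text information 401 (field names and values are assumptions)."""
    file_type: str        # e.g. "slideshow", "slideshow_last", "single"
    display_mode: str     # mode information specifying the display mode
    caption: str = ""     # characters displayed together with the image


@dataclass
class TransmissionFile:
    """Transmission file 400."""
    text_info: TextInfo
    image_data: bytes             # image data 402
    audio_data: Optional[bytes]   # audio data 403: full audio or one partial audio


def is_last_of_slideshow(f: TransmissionFile) -> bool:
    """True when the file is marked as the last of a series of slide show data."""
    return f.text_info.file_type == "slideshow_last"


example = TransmissionFile(
    text_info=TextInfo(file_type="slideshow_last", display_mode="slideshow"),
    image_data=b"\xff\xd8",       # placeholder bytes, not a real image
    audio_data=b"background",     # placeholder for second partial audio data
)
```

The audio field is optional here because the description later allows a transmission file to be created without audio data.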

FIG. 6 is a schematic diagram outlining the characteristic processing performed by the imaging apparatus 1 configured as above. Specifically, FIG. 6 schematically shows the difference between the situation at the time of shooting with the imaging apparatus 1 and the audio output at the time of playback. The characters drawn in FIG. 6 represent features of the sound corresponding to the image and are not actually displayed on the screen. As shown in FIG. 6(a), when the subject 700 utters the words "kitsui yo" ("this is tough") around the time of shooting, the audio data storage unit 172 stores the words of the subject 700 as the first partial audio data and stores the background sound "zaa zaa" (the sound of a flowing river) as the second partial audio data. Having shot the image and stored the audio data in this way, the imaging apparatus 1 has a function of reproducing only the second partial audio, without reproducing the words uttered by the subject 700, when the image is played back, as shown in FIG. 6(b).

Next, the configuration of the playback device 3 will be described. The playback device 3 includes a communication unit 31, an operation input unit 32, a display unit 33, an audio output unit 34, a storage unit 35, and a control unit 36. The communication unit 31 is a communication interface that transmits and receives information including image data to and from the imaging apparatus 1. The operation input unit 32 receives input of various kinds of information, including operation instructions for the playback device 3. The display unit 33 is configured using liquid crystal, organic EL, or the like, and can display various kinds of information. The audio output unit 34 is configured using a speaker or the like and can output audio. The storage unit 35 is configured using a semiconductor memory or the like and stores various kinds of information, including the operation program of the playback device 3. The control unit 36 is configured using a CPU or the like and comprehensively controls the operation of the playback device 3.

The control unit 36 includes an interpretation unit 361 that interprets the information written in the transmission file sent from the imaging apparatus 1, a reproduction control unit 362 that causes the display unit 33 to display an image and the audio output unit 34 to output audio based on the result of interpretation by the interpretation unit 361, and a timer 363 having a timekeeping function.

The playback device 3 configured as above is realized by, for example, a television, a projector, a personal computer, or a digital photo frame. When the playback device 3 is realized by such an electronic device, it suffices to provide the playback device 3 with display switching means that switches between display of images based on the image data sent from the imaging apparatus 1 and display of images based on its other functions.

図７は、撮像装置１が行う処理の概要を示すフローチャートである。まず、撮像装置１の電源がオンになっていれば（ステップＳ１０１：Ｙｅｓ）、撮像装置１はステップＳ１０２へ移行する。一方、撮像装置１の電源がオンになっていなければ（ステップＳ１０１：Ｎｏ）、撮像装置１は一連の処理を終了する。 FIG. 7 is a flowchart illustrating an outline of processing performed by the imaging apparatus 1. First, if the power supply of the imaging device 1 is on (step S101: Yes), the imaging device 1 proceeds to step S102. On the other hand, if the power supply of the imaging device 1 is not turned on (step S101: No), the imaging device 1 ends a series of processes.

In step S102, when the imaging apparatus 1 is set to the shooting mode (step S102: Yes), the control unit 18 controls the imaging unit 11 and the audio input unit 13 to start acquiring images and audio, respectively (step S103).

この後、制御部１８は、撮像部１１が生成し、画像処理部１８１によって画像処理が施された画像データに対応する画像をスルー画として表示部１４に表示させる（ステップＳ１０４）。 Thereafter, the control unit 18 causes the display unit 14 to display an image corresponding to the image data generated by the imaging unit 11 and subjected to image processing by the image processing unit 181 as a through image (step S104).

続いて、静止画レリーズボタンが操作されて静止画レリーズ信号が入力された場合（ステップＳ１０５：Ｙｅｓ）、撮像装置１は静止画撮影を行う（ステップＳ１０６）。この後、制御部１８は、撮影した画像を圧縮して静止画データとして画像データ記憶部１７１へ記録する（ステップＳ１０７）。また、制御部１８は、音声データに分類等の処理を施して音声データ記憶部１７２へ記録する（ステップＳ１０８）。ステップＳ１０８の後、撮像装置１はステップＳ１０１へ戻る。 Subsequently, when the still image release button is operated and a still image release signal is input (step S105: Yes), the imaging apparatus 1 performs still image shooting (step S106). Thereafter, the control unit 18 compresses the captured image and records it as still image data in the image data storage unit 171 (step S107). In addition, the control unit 18 performs processing such as classification on the audio data and records it in the audio data storage unit 172 (step S108). After step S108, the imaging apparatus 1 returns to step S101.

図８は、ステップＳ１０８の音声データ記録処理の概要を示すフローチャートである。撮像装置１は、画像撮影を行う場合にその前後の所定時間帯に取得した音声データに対して、以下に説明するステップＳ２０１〜Ｓ２０４の処理を繰り返し行う。 FIG. 8 is a flowchart showing an overview of the audio data recording process in step S108. The imaging device 1 repeatedly performs the processing of steps S201 to S204 described below for audio data acquired in a predetermined time zone before and after image shooting.

First, the control unit 18 records the audio data in the audio data storage unit 172 (step S201). The audio processing unit 182 then separates the audio data into two pieces of partial audio data according to whether the frequency is above or below a reference frequency f₀ (step S202). Subsequently, the audio processing unit 182 determines the sound pressure level of each piece of partial audio data by comparing its maximum sound pressure level L_max with a reference sound pressure level L₀ (step S203).

Thereafter, the audio processing unit 182 records the partial audio data in the audio data storage unit 172 according to the level determination result (step S204). For example, for audio data having the frequency spectrum 200 shown in FIG. 2, the first partial audio data corresponding to the partial frequency spectrum 201 and the second partial audio data corresponding to the partial frequency spectrum 202 are recorded in the audio data storage unit 172. If the level determination result for a piece of audio data does not satisfy the conditions shown in FIG. 4, the audio processing unit 182 deletes that audio data without recording it in the audio data storage unit 172.
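Steps S202 to S204 could be sketched in Python roughly as follows. The conditions of FIG. 4 are not reproduced in this passage, so the sketch assumes, purely as a stand-in, that a piece of partial audio data is recorded when its maximum level is at least the reference level; the spectrum representation, the function name, and the numeric values are likewise hypothetical.

```python
def split_and_select(spectrum, f0, l0):
    """Split a spectrum at the reference frequency and keep the parts that pass.

    spectrum: list of (frequency_hz, level_db) pairs (assumed representation).
    f0: reference frequency; l0: reference sound pressure level.
    """
    # Step S202: the lower band is treated as the first partial audio data
    # (containing the human voice), the higher band as the second.
    first = [(f, lv) for f, lv in spectrum if f < f0]
    second = [(f, lv) for f, lv in spectrum if f >= f0]

    recorded = {}
    for name, part in (("first", first), ("second", second)):
        # Step S203: compare the maximum level of the part with the reference.
        l_max = max((lv for _, lv in part), default=float("-inf"))
        # Step S204: assumed stand-in for the FIG. 4 conditions.
        if l_max >= l0:
            recorded[name] = part
    return recorded


# Hypothetical spectrum and thresholds, for illustration only.
demo = [(200, 60), (1000, 55), (9000, 20), (12000, 25)]
strict = split_and_select(demo, f0=8000, l0=30)
lenient = split_and_select(demo, f0=8000, l0=20)
```

With these assumed values, only the lower band survives the stricter threshold, while both bands are recorded under the more lenient one.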

When all iterations of the loop have finished, the imaging apparatus 1 returns to the main routine shown in FIG. 7.

次に、図７のステップＳ１０５で静止画レリーズ信号が入力されなかった場合（ステップＳ１０５：Ｎｏ）を説明する。この場合において、動画レリーズボタンが押されて動画レリーズ信号が入力されたとき（ステップＳ１０９：Ｙｅｓ）、撮像装置１は動画撮影を行う（ステップＳ１１０）。一方、ステップＳ１０９において、動画レリーズ信号が入力されないとき（ステップＳ１０９：Ｎｏ）、撮像装置１はステップＳ１０１へ戻る。 Next, a case where the still image release signal is not input in step S105 of FIG. 7 (step S105: No) will be described. In this case, when the moving image release button is pressed and a moving image release signal is input (step S109: Yes), the imaging apparatus 1 performs moving image shooting (step S110). On the other hand, when the moving image release signal is not input in step S109 (step S109: No), the imaging device 1 returns to step S101.

図９は、ステップＳ１１０の動画撮影処理の概要を示すフローチャートである。まず、制御部１８は、動画データおよび音声データの記録を開始する（ステップＳ３０１）。 FIG. 9 is a flowchart showing an overview of the moving image shooting process in step S110. First, the control unit 18 starts recording moving image data and audio data (step S301).

この後、制御部１８は、動画データを画像データ記憶部１７１へ記録する（ステップＳ３０２）。その後、静止画レリーズ信号が入力された場合（ステップＳ３０３：Ｙｅｓ）、制御部１８は静止画データを記録する（ステップＳ３０４）。一方、静止画レリーズ信号が入力されない場合（ステップＳ３０３：Ｎｏ）、撮像装置１は後述するステップＳ３０９へ移行する。 Thereafter, the control unit 18 records the moving image data in the image data storage unit 171 (step S302). Thereafter, when a still image release signal is input (step S303: Yes), the control unit 18 records still image data (step S304). On the other hand, when the still image release signal is not input (step S303: No), the imaging apparatus 1 proceeds to step S309 described later.

撮像装置１は、ステップＳ３０２〜Ｓ３０４の処理と並行して、音声記録処理も行う。音声記録処理に対応するステップＳ３０５〜Ｓ３０８の処理は、上述したステップＳ２０１〜Ｓ２０４の処理に順次対応している。 The imaging device 1 also performs audio recording processing in parallel with the processing in steps S302 to S304. The processes in steps S305 to S308 corresponding to the voice recording process sequentially correspond to the processes in steps S201 to S204 described above.

After steps S304 and S308, when an instruction to end moving image shooting is input through the operation input unit 12 (step S309: Yes), the control unit 18 ends the recording of the moving image data and the audio data (step S310). The imaging apparatus 1 then returns to the main routine shown in FIG. 7.

ステップＳ３０９において、操作入力部１２によって動画撮影の終了指示が入力されない場合（ステップＳ３０９：Ｎｏ）、撮像装置１はステップＳ３０１へ戻る。 In step S309, when the operation input unit 12 does not input a moving image shooting end instruction (step S309: No), the imaging apparatus 1 returns to step S301.

次に、図７のステップＳ１０２において、撮像装置１が撮影モードに設定されていない場合（ステップＳ１０２：Ｎｏ）を説明する。この場合、撮像装置１は再生処理を行い（ステップＳ１１１）、ステップＳ１０１へ戻る。 Next, the case where the imaging device 1 is not set to the shooting mode in step S102 of FIG. 7 (step S102: No) will be described. In this case, the imaging apparatus 1 performs a reproduction process (step S111) and returns to step S101.
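The branching of the main routine of FIG. 7 described so far can be summarized by the following Python sketch. The function name, argument names, and returned labels are hypothetical; only the order of the decisions follows the description.

```python
def main_routine_branch(power_on, shooting_mode, still_release, movie_release):
    """Return which branch of FIG. 7 is taken for one pass through the routine."""
    if not power_on:            # step S101: No
        return "end"
    if not shooting_mode:       # step S102: No
        return "playback"       # step S111
    # Steps S103 and S104 (start acquisition, show the through image) occur here.
    if still_release:           # step S105: Yes
        return "still_capture"  # steps S106 to S108
    if movie_release:           # step S109: Yes
        return "movie_capture"  # step S110
    return "return_to_S101"     # step S109: No
```

The still release is checked before the moving image release, matching the order of steps S105 and S109.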

FIG. 10 is a flowchart outlining the playback processing performed by the imaging apparatus 1. In FIG. 10, the control unit 18 causes the display unit 14 to display a list of a predetermined number of images to be played back (step S401). Subsequently, when the imaging apparatus 1 is set to the slide show mode (step S402: Yes), the control unit 18 causes the display unit 14 to perform the slide show display (step S403). Thereafter, when an instruction to end the playback processing is input through the operation input unit 12 (step S404: Yes), the imaging apparatus 1 returns to the main routine shown in FIG. 7. When no such instruction is input (step S404: No), the imaging apparatus 1 returns to step S401.

図１１は、ステップＳ４０３におけるスライドショー表示の概要を示すフローチャートである。図１１において、撮像装置１は、全ての対象画像に対し、以下に説明するステップＳ５０１〜Ｓ５１３の処理を繰り返し行う。以下に説明するスライドショー表示においては、動画と静止画の再生時間を揃えて再生するものとする。ただし、動画が静止画の再生時間より早く終了する場合は、その時点で画像を切り換えるものとする。このようなスライドショー表示を行うことにより、動画と静止画が混じっていてもテンポ感のある画像の再生を行うことができる。 FIG. 11 is a flowchart showing an outline of the slide show display in step S403. In FIG. 11, the imaging apparatus 1 repeatedly performs the processes of steps S501 to S513 described below for all target images. In the slide show display described below, it is assumed that the playback time is the same for moving images and still images. However, if the moving image ends earlier than the playback time of the still image, the image is switched at that time. By performing such a slide show display, it is possible to reproduce an image with a sense of tempo even when moving images and still images are mixed.
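The timing rule stated above, under which moving images and still images share one playback time but a shorter moving image ends early, could be expressed as the following Python sketch. The item representation and the name slot_seconds are assumptions made for illustration.

```python
def display_durations(items, slot_seconds):
    """Return how long each slide show item is shown.

    items: list of ("still", None) or ("movie", length_in_seconds) tuples.
    slot_seconds: the common playback time per item (hypothetical name).
    """
    durations = []
    for kind, length in items:
        if kind == "movie":
            # A moving image shorter than the slot ends early; a longer one is cut.
            durations.append(min(slot_seconds, length))
        else:
            durations.append(slot_seconds)
    return durations


demo_durations = display_durations(
    [("still", None), ("movie", 12.0), ("movie", 3.5), ("still", None)],
    slot_seconds=5.0,
)
```

For the assumed input, the 12-second moving image is shown for 5 seconds and the 3.5-second one for its full length.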

First, the case where the imaging apparatus 1 has not established communication with the playback device 3 (step S501: No) will be described. In this case, when there is second partial audio data corresponding to the image data to be played back (step S502: Yes), the control unit 18 causes the display unit 14 to display the image for a predetermined time and causes the audio output unit 15 to reproduce the second partial audio for the same length of time as the image is displayed (step S503), and ends one iteration of the loop. When there is no second partial audio data corresponding to the image data (step S502: No), the control unit 18 causes the display unit 14 to display the image for the predetermined time and reproduces the audio corresponding to the image for the predetermined time (step S504), and ends one iteration of the loop. When there is no second partial audio data, only the image may be reproduced, without audio.

Next, the case where the imaging apparatus 1 has established communication with the playback device 3 (step S501: Yes) will be described. In this case, if there is second partial audio data corresponding to the image data to be played back (step S505: Yes), the transmission file creation unit 183 creates a transmission file containing the mutually corresponding image data and second partial audio data (step S506). The transmission file created here has the structure of the transmission file 400 shown in FIG. 5, with the second partial audio data used as the audio data 403.

この後、通信部１６は、制御部１８の制御のもと、再生装置３へ送信ファイルをストリーミング送信する（ステップＳ５０７）。 Thereafter, the communication unit 16 performs streaming transmission of the transmission file to the playback device 3 under the control of the control unit 18 (step S507).

通信部１６による送信ファイルの送信が完了した場合（ステップＳ５０８：Ｙｅｓ）、撮像装置１は後述するステップＳ５１１へ移行する。一方、通信部１６による送信ファイルの送信が完了していない場合（ステップＳ５０８：Ｎｏ）、通信部１６が再生装置３から画像の切換要求を受信したとき（ステップＳ５０９：Ｙｅｓ）、撮像装置１は、送信中の画像データおよび第２部分音声データの送信を中止し（ステップＳ５１０）、後述するステップＳ５１１へ移行する。ステップＳ５０９において、再生装置３から画像の切換要求を受信しないとき（ステップＳ５０９：Ｎｏ）、撮像装置１はステップＳ５０７へ戻る。 When the transmission of the transmission file by the communication unit 16 is completed (step S508: Yes), the imaging device 1 proceeds to step S511 described later. On the other hand, when transmission of the transmission file by the communication unit 16 is not completed (step S508: No), when the communication unit 16 receives an image switching request from the playback device 3 (step S509: Yes), the imaging device 1 The transmission of the image data being transmitted and the second partial audio data is stopped (step S510), and the process proceeds to step S511 described later. In step S509, when an image switching request is not received from the playback device 3 (step S509: No), the imaging device 1 returns to step S507.

In step S511, when the transmission file transmitted immediately before is the last data to be transmitted in the series of the slide show (step S511: Yes), the communication unit 16 transmits to the playback device 3 an end signal indicating the end of the slide show (step S512). The imaging apparatus 1 then ends one iteration of the loop. When the data transmitted immediately before is not the last data of the slide show (step S511: No), the imaging apparatus 1 ends one iteration of the loop.

In step S505, when there is no second partial audio data corresponding to the image data to be played back (step S505: No), the transmission file creation unit 183 creates a transmission file containing the mutually corresponding image data and audio data (step S513). The imaging apparatus 1 then proceeds to step S507.

When the loop processing has finished for all target images, the imaging apparatus 1 returns to the playback processing shown in FIG. 10.
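The transmitting side of the slide show loop (steps S505 to S513) could be sketched in Python as follows. This is a simplified simulation under stated assumptions: transmission is modeled as counting bytes in fixed-size chunks, the switching request from the playback device 3 is modeled by a caller-supplied function, and all names are hypothetical.

```python
def choose_audio(full_audio, second_partial):
    """Step S505: prefer the second partial audio data when it exists."""
    return second_partial if second_partial is not None else full_audio


def run_slideshow_sender(items, chunk_size, switch_requested):
    """Simulate the loop and return a log of events.

    items: list of (image_bytes, full_audio_bytes, second_partial_or_None).
    chunk_size: positive number of bytes sent per pass through step S507.
    switch_requested: callable(index, bytes_sent) -> bool, standing in for S509.
    """
    log = []
    for i, (image, full_audio, second_partial) in enumerate(items):
        payload = image + choose_audio(full_audio, second_partial)  # S506 / S513
        sent = 0
        while True:
            sent = min(sent + chunk_size, len(payload))   # step S507
            if sent >= len(payload):                      # step S508: Yes
                log.append(("done", i, sent))
                break
            if switch_requested(i, sent):                 # step S509: Yes
                log.append(("abort", i, sent))            # step S510
                break
        if i == len(items) - 1:                           # step S511: Yes
            log.append(("end_signal", i))                 # step S512
    return log


demo_log = run_slideshow_sender(
    [(b"aaaa", b"FULL", b"BG"), (b"bb", b"FULLAUDIO", None)],
    chunk_size=3,
    switch_requested=lambda i, sent: i == 0 and sent >= 3,
)
```

In this assumed run, the first file is aborted after a switching request arrives mid-transmission, the second file (which falls back to the full audio data) completes, and the end signal follows the last file.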

FIG. 12 schematically shows a situation in which, in the slide show display described above, the second partial audio is used during moving image playback. More specifically, FIG. 12 shows a situation in which the display unit 14 displays a still image 501 for a time T₀, then displays a moving image, represented by moving image frames 502 to 505, together with the second partial audio for the time T₀, and then displays a still image 506 for the time T₀. In this case, the voice of the subject 600 is not output during moving image playback (shown schematically by the cross mark in the speech balloon), so even if the subject 600 is saying something, the speech is not cut off partway at the transition from the moving image to the still image. The viewer of the slide show can therefore view the images without a sense of incongruity.

図１２に示す場合、静止画５０１、５０６についても、対応する第２部分音声データが存在すれば第２部分音声を出力することになる。なお、動画と静止画が混在するスライドショー表示において静止画を表示する際には、音声を出力しないようにしてもよい。 In the case shown in FIG. 12, the second partial sound is output for the still images 501 and 506 if corresponding second partial sound data exists. Note that when displaying a still image in a slide show display in which a moving image and a still image are mixed, no sound may be output.

Next, the case where the imaging apparatus 1 is not set to the slide show mode in step S402 of FIG. 10 (step S402: No) will be described. In this case, when playback of a moving image is selected through the operation input unit 12 (step S405: Yes), the control unit 18 determines whether communication with the playback device 3 has been established (step S406). When the control unit 18 determines that communication with the playback device 3 has been established (step S406: Yes), the communication unit 16, under the control of the control unit 18, streams the moving image file of the selected moving image to the playback device 3 (step S407). When the control unit 18 determines in step S406 that communication with the playback device 3 has not been established (step S406: No), the control unit 18 causes the display unit 14 to play back and display the selected moving image (step S408). After step S407 or S408, the imaging apparatus 1 proceeds to step S404.

The case where playback of a moving image is not selected in step S405 (step S405: No) will now be described. In this case, when playback of a still image is selected (step S409: Yes), the imaging apparatus 1 determines whether communication with the playback device 3 has been established (step S410). When the control unit 18 determines that communication with the playback device 3 has been established (step S410: Yes) and there is second partial audio data corresponding to the image data to be played back (step S411: Yes), the transmission file creation unit 183 creates a transmission file containing the corresponding image data and second partial audio data (step S412).

ステップＳ４１２の後、通信部１６は、制御部１８の制御のもと、再生装置３へ送信ファイルをストリーミング送信する（ステップＳ４１３）。 After step S412, the communication unit 16 performs streaming transmission of the transmission file to the playback device 3 under the control of the control unit 18 (step S413).

Thereafter, when transmission of the transmission file by the communication unit 16 is complete (step S414: Yes), the imaging apparatus 1 proceeds to step S404. When transmission of the transmission file is not complete (step S414: No) and the communication unit 16 receives an end instruction from the playback device 3 (step S415: Yes), the imaging apparatus 1 stops transmitting the still image data and the second partial audio data (step S416) and proceeds to step S404. When transmission of the transmission file is not complete (step S414: No) and the communication unit 16 does not receive an end instruction from the playback device 3 (step S415: No), the imaging apparatus 1 returns to step S413.

In step S411, when there is no second partial audio data corresponding to the still image data to be played back (step S411: No), the transmission file creation unit 183 creates a transmission file containing the corresponding still image data and audio data (step S417). The imaging apparatus 1 then proceeds to step S413. When there is no second partial audio data corresponding to the still image data to be played back, the transmission file creation unit 183 may instead create the transmission file from the still image data alone, without audio data.

次に、ステップＳ４１０において、制御部１８が再生装置３との通信が確立していないと判定した場合（ステップＳ４１０：Ｎｏ）を説明する。この場合、再生対象の静止画データに対応する第１部分音声データがあるとき（ステップＳ４１８：Ｙｅｓ）、制御部１８は、静止画および第１部分音声を再生する（ステップＳ４１９）。すなわち、制御部１８は、静止画を表示部１４に表示させるとともに、その静止画に対応する第１部分音声を音声出力部１５に出力させる。その後、撮像装置１は、ステップＳ４０４へ移行する。 Next, the case where the control unit 18 determines in step S410 that communication with the playback device 3 has not been established (step S410: No) will be described. In this case, when there is the first partial audio data corresponding to the still image data to be reproduced (step S418: Yes), the control unit 18 reproduces the still image and the first partial audio (step S419). That is, the control unit 18 displays a still image on the display unit 14 and causes the audio output unit 15 to output the first partial sound corresponding to the still image. Thereafter, the imaging apparatus 1 proceeds to step S404.

一方、ステップＳ４１８において、再生対象の静止画データに対応する第１部分音声データがないとき（ステップＳ４１８：Ｎｏ）、制御部１８は、静止画および音声を再生する（ステップＳ４２０）。すなわち、制御部１８は、静止画を表示部１４に表示させるとともに、その静止画に対応する音声を音声出力部１５に出力させる。その後、撮像装置１は、ステップＳ４０４へ移行する。なお、ステップＳ４２０において、表示部１４が静止画を表示するのみとし、対応する音声を音声出力部１５が出力しないようにしてもよい。 On the other hand, when there is no first partial audio data corresponding to the still image data to be reproduced in step S418 (step S418: No), the control unit 18 reproduces the still image and the audio (step S420). That is, the control unit 18 displays a still image on the display unit 14 and causes the audio output unit 15 to output sound corresponding to the still image. Thereafter, the imaging apparatus 1 proceeds to step S404. In step S420, the display unit 14 may only display a still image, and the audio output unit 15 may not output the corresponding audio.
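The choice of audio for single still image playback (steps S410 to S420) differs from the slide show case and can be condensed into the following Python sketch. The function and argument names are hypothetical, and the optional variants in which no audio is output are omitted.

```python
def audio_for_still(connected, full_audio, first_partial, second_partial):
    """Pick the audio that accompanies a single still image.

    connected: True when communication with the playback device 3 is established.
    Partial audio arguments are None when no such data was recorded.
    """
    # Step S411 (connected) looks for second partial audio data, while
    # step S418 (not connected) looks for first partial audio data.
    preferred = second_partial if connected else first_partial
    # Steps S417 and S420: fall back to the full audio data.
    return preferred if preferred is not None else full_audio
```

As described, the transmitted file favors the background sound, whereas playback on the imaging apparatus itself favors the partial audio containing the voice.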

ステップＳ４０５で動画の再生が選択されず（ステップＳ４０５：Ｎｏ）、かつステップＳ４０９で静止画の再生が選択されない場合（ステップＳ４０９：Ｎｏ）、撮像装置１はステップＳ４０４へ移行する。 If reproduction of a moving image is not selected in step S405 (step S405: No) and reproduction of a still image is not selected in step S409 (step S409: No), the imaging apparatus 1 proceeds to step S404.

Next, an outline of the processing performed by the playback device 3 will be described. FIG. 13 is a flowchart outlining the processing performed by the playback device 3. First, when the power of the playback device 3 is on (step S601: Yes), the playback device 3 proceeds to step S602. When the power of the playback device 3 is not on (step S601: No), the playback device 3 ends the series of processes.

Subsequently, when the communication unit 31 receives a transmission file from the imaging apparatus 1 (step S602: Yes), the interpretation unit 361 refers to the text information of the transmission file and determines whether the data to be displayed is a transmission file forming part of a slide show (step S603). When the interpretation unit 361 determines that the data is slide show data (step S603: Yes), the reproduction control unit 362 starts streaming playback by causing the display unit 33 to display the image corresponding to the received image data and causing the audio output unit 34 to output the audio corresponding to the received audio data (step S604). When the communication unit 31 does not receive a transmission file from the imaging apparatus 1 in step S602 (step S602: No), the playback device 3 returns to step S601.

When a predetermined time has elapsed since the playback device 3 started streaming playback (step S605: Yes), the communication unit 31, under the control of the control unit 36, transmits a switching request to the imaging apparatus 1 (step S606). Thereafter, when the communication unit 31 receives a new transmission file from the imaging apparatus 1 (step S607: Yes), the playback device 3 returns to step S604. When the predetermined time has not elapsed since the playback device 3 started streaming playback (step S605: No), the playback device 3 repeats step S605.

When a new transmission file is not received within a predetermined time in step S607 (step S607: No) but a slide show end signal is received within that time (step S608: Yes), the playback device 3 returns to step S601. When the slide show end signal is not received within that time (step S608: No), the control unit 36 causes the display unit 33 to display an error indication (step S609), and the playback device 3 returns to step S601. After the error indication, the playback device 3 may display a prompt asking the user to select whether to continue the slide show. In this case, the playback device 3 performs processing according to the input made through the operation input unit 32.

ステップＳ６０３において、送られてきた画像データがスライドショーのデータでない場合（ステップＳ６０３：Ｎｏ）、再生装置３は通常の再生処理を行い（ステップＳ６１０）、ステップＳ６０１へ戻る。 In step S603, if the transmitted image data is not slide show data (step S603: No), the playback device 3 performs normal playback processing (step S610), and the process returns to step S601.
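The slide show handling on the playback device 3 (steps S604 to S609) could be simulated in Python along the following lines. The event encoding and the function name are hypothetical; each entry of the input stands for what arrives within the predetermined time after one switching request.

```python
def playback_events(incoming):
    """Return a log of playback-side actions for a slide show.

    incoming: list whose entries are "file" (new transmission file received),
    "end" (slide show end signal received), or None (nothing received in time).
    """
    log = ["play"]                        # step S604 for the first file
    for event in incoming:
        log.append("switch_request")      # steps S605 and S606
        if event == "file":               # step S607: Yes
            log.append("play")            # back to step S604
        elif event == "end":              # step S608: Yes
            log.append("finished")        # back to step S601
            break
        else:                             # step S608: No
            log.append("error")           # step S609
            break
    return log
```

A normal run ends with "finished", while a missing file and missing end signal lead to the error indication.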

以上説明した本発明の一実施の形態によれば、音声データを複数の部分音声データに分離し、表示する画像データの表示態様に適合する音声データまたは部分音声データを選択して該画像と対応付けているため、画像を再生する際に、その画像の撮影時の雰囲気を再現するとともに、鑑賞者に対して快適な状態で再生画像を鑑賞させることができる。 According to the embodiment of the present invention described above, the audio data is separated into a plurality of partial audio data, and the audio data or the partial audio data suitable for the display mode of the image data to be displayed is selected to correspond to the image. Therefore, when the image is reproduced, the atmosphere at the time of shooting the image can be reproduced and the viewer can view the reproduced image in a comfortable state.

Further, according to the present embodiment, when the partial audio data is generated, the frequency spectrum of the audio data is separated based on sound characteristics including frequency, so that, for example, a human voice and background sound can be separated. It is therefore possible to output audio suited to the situation at the time of reproduction.
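The frequency-based separation can be illustrated with a minimal sketch. It assumes a single hard cutoff frequency that splits the spectrum into a lower and a higher partial spectrum; the cutoff, the function names `split_by_frequency` and `tone`, and the use of a plain discrete Fourier transform are assumptions made for illustration, not details given in the description.

```python
import cmath
import math
from typing import List, Tuple


def dft(x: List[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform (adequate for a short sketch)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: List[complex]) -> List[float]:
    """Inverse DFT; returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def split_by_frequency(samples: List[float], sample_rate: float,
                       cutoff_hz: float) -> Tuple[List[float], List[float]]:
    """Split audio samples into (low band, high band) at a hypothetical cutoff."""
    n = len(samples)
    spectrum = dft(samples)
    low = [0j] * n
    high = [0j] * n
    for k, value in enumerate(spectrum):
        # Bins k and n - k mirror each other and carry the same physical frequency.
        freq = min(k, n - k) * sample_rate / n
        if freq <= cutoff_hz:
            low[k] = value
        else:
            high[k] = value
    return idft(low), idft(high)


def tone(freq_hz: float, sample_rate: float, n: int) -> List[float]:
    """Generate n samples of a unit-amplitude sine wave (test signal only)."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]
```

Because every spectral bin goes to exactly one band, the two returned signals add back up to the original samples. A real separation of voice from background sound would need more than one fixed cutoff; the sketch only shows the principle of deriving partial audio data from partial frequency spectra.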

Further, according to the present embodiment, partial audio data containing a human voice (first partial audio data) is generated, so that the sound contained in the partial audio data of higher frequency (second partial audio data) is a sound with a high healing effect. Therefore, by reproducing an image using such partial audio data, reproduction can be performed with a background sound that has a high healing effect and excludes human voices and other noise.

Further, according to the present embodiment, when images are reproduced and displayed in the slide show mode, in which a plurality of images are displayed successively for a predetermined time each, partial audio data that does not contain a human voice is selected with the highest priority as the audio to be output in synchronization with the image. Therefore, even when a moving image is switched partway through, output of a human voice that is cut off midway can be avoided as far as possible.
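The priority rule for the slide show mode can be sketched as follows. The `AudioTrack` type, the mode string `"slideshow"`, and the fallback to the unseparated audio data in other modes are assumptions made for this illustration; the description only states that voice-free partial audio data has the highest priority in the slide show mode.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AudioTrack:
    name: str
    contains_voice: bool   # True if a human voice is present in this track
    is_partial: bool       # True for partial audio data, False for the full audio data


def select_audio(display_mode: str, tracks: List[AudioTrack]) -> Optional[AudioTrack]:
    """Pick the track to associate with an image for the given display mode."""
    if not tracks:
        return None
    if display_mode == "slideshow":
        # Highest priority: partial audio data that does not include a human voice.
        for track in tracks:
            if track.is_partial and not track.contains_voice:
                return track
    # Assumed fallback: use the unseparated audio data when it is available.
    for track in tracks:
        if not track.is_partial:
            return track
    return tracks[0]
```

With a full track, a voice track, and a background track available, `select_audio("slideshow", tracks)` returns the background track, so a voice is never cut off when the slide show advances to the next image.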

One mode for carrying out the present invention has been described so far, but the present invention should not be limited only to the embodiment described above.

For example, in the present invention, the transmission file creation unit may create a transmission file that includes image data together with the audio data and partial audio data corresponding to that image data. In this case, the playback device may be provided with a function of determining and outputting the audio or partial audio suited to the image.
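A transmission file of this kind could be laid out as in the following sketch. The JSON and Base64 encoding, the class name `TransmissionFile`, and its field names are hypothetical choices for illustration; the description does not specify a file format.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import Dict


def _encode(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


def _decode(text: str) -> bytes:
    return base64.b64decode(text.encode("ascii"))


@dataclass(frozen=True)
class TransmissionFile:
    image_data: bytes
    audio_data: bytes
    # Partial audio data keyed by a hypothetical label such as "voice" or "background".
    partial_audio: Dict[str, bytes] = field(default_factory=dict)

    def to_bytes(self) -> bytes:
        """Serialize the file so it can be sent or written to a recording medium."""
        payload = {
            "image": _encode(self.image_data),
            "audio": _encode(self.audio_data),
            "partial_audio": {k: _encode(v) for k, v in self.partial_audio.items()},
        }
        return json.dumps(payload).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "TransmissionFile":
        """Parse a serialized file on the playback side."""
        payload = json.loads(raw.decode("utf-8"))
        return cls(
            image_data=_decode(payload["image"]),
            audio_data=_decode(payload["audio"]),
            partial_audio={k: _decode(v) for k, v in payload["partial_audio"].items()},
        )
```

Carrying both the full audio data and every piece of partial audio data lets the playback side decide which one suits the current display mode, at the cost of a larger file.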

In the present invention, the transmission file created by the transmission file creation unit may also be recorded on a recording medium, and the recording medium may be inserted into the playback device and read on the playback device side. In this case, the playback device may be provided with a recording medium interface that reads the information recorded on the recording medium, and with a function of determining and outputting, based on the data read, the audio or partial audio suited to the image to be reproduced. The imaging device and the playback device then need not be connected for communication.

In the present invention, when the audio data is separated, it may be separated according to the pattern of temporal change in sound intensity, or partial audio data containing a human voice may be separated from the audio data using the characteristics of a human voiceprint.

DESCRIPTION OF SYMBOLS
1 Imaging device
3 Playback device
11 Imaging unit
12, 32 Operation input unit
13 Audio input unit
14, 33 Display unit
15, 34 Audio output unit
16, 31 Communication unit
17, 35 Storage unit
18, 36 Control unit
100 Communication system
171 Image data storage unit
172 Audio data storage unit
173 Program storage unit
181 Image processing unit
182 Audio processing unit
183 Transmission file creation unit
184, 363 Timer
200 Frequency spectrum
201, 202 Partial frequency spectrum
361 Interpretation unit
362 Playback control unit

Claims

An imaging unit that captures an image and generates electronic data of the image;
An audio input unit that inputs audio and converts it into an electrical signal;
An audio processing unit that generates audio data by performing processing on the electrical signal converted by the audio input unit, and that separates the audio data into a plurality of partial audio data based on the characteristics of the sound;
A control unit that, according to a display mode used when reproducing an image corresponding to the image data generated by the imaging unit, selects audio data or partial audio data corresponding to audio to be output in synchronization with the image and associates it with the image;
An imaging apparatus comprising:

The imaging apparatus according to claim 1, wherein the sound characteristics include sound intensity.

The imaging apparatus, wherein the sound characteristics include frequency, and the audio processing unit calculates a frequency spectrum of the audio data and generates the partial audio data from partial frequency spectra obtained by separating that frequency spectrum based on the sound characteristics.

The imaging apparatus according to claim 3, wherein the audio processing unit is capable of generating partial audio data including a human voice.

The imaging apparatus according to claim 4, wherein, when images are reproduced and displayed in a slide show mode in which a plurality of images are displayed successively for a predetermined time each, the control unit selects, with the highest priority, partial audio data that does not include a human voice as the audio to be output in synchronization with the images.

The imaging apparatus, further comprising:
A display unit capable of displaying an image corresponding to the image data; and
An audio output unit capable of outputting audio corresponding to the audio data or partial audio corresponding to the partial audio data,
wherein the control unit selects the audio or the partial audio according to the display mode of the image on the display unit and causes the audio output unit to output it in synchronization with the display of the image.

A communication unit that transmits and receives information via a network with a playback device capable of displaying images and outputting sound;
A transmission file creation unit that creates, as information to be output to the playback device, a transmission file that includes image data and audio data containing audio that is determined according to the display mode of the image corresponding to the image data and is output in synchronization with the image, the transmission file being transmitted to the playback device via the communication unit;
The imaging apparatus according to claim 1, further comprising:

A display unit capable of displaying an image;
An audio output unit capable of outputting audio suitable for an image displayed by the display unit;
An interpretation unit for interpreting the description content of the data structure having image data and sound data including sound that is defined in accordance with a display mode of the image corresponding to the image data and output in synchronization with the image;
A reproduction control unit for displaying an image on the display unit based on a result of interpretation by the interpretation unit and outputting sound to the audio output unit;
A playback apparatus comprising:

The playback apparatus according to claim 8, further comprising a communication unit capable of communicating via the network with the imaging apparatus according to claim 7 and of receiving the transmission file as the data structure from the imaging apparatus.

Image data,
Mode information that defines a display mode of an image corresponding to the image data;
Audio data including audio output in synchronization with the image when the image is displayed according to the display mode;
A data structure characterized by comprising:

An imaging step of capturing an image and generating electronic data of the image;
An audio input step of inputting an audio of a period including the time of imaging of the image in the imaging step and converting it into an electrical signal;
An audio processing step of generating audio data by performing processing on the electrical signal converted in the audio input step, and separating the audio data into a plurality of partial audio data based on a characteristic of the sound;
An association step of selecting, according to the display mode used when reproducing the image corresponding to the image data generated in the imaging step, audio data or partial audio data corresponding to audio to be output in synchronization with the image, and associating it with the image;
A method for controlling an imaging apparatus, comprising:

A program for an imaging apparatus, the program causing an imaging apparatus that captures an image and generates electronic data of the image, and that inputs audio and converts it into an electrical signal, to execute:
An audio processing step of generating audio data by performing processing on the electrical signal, and separating the audio data into a plurality of partial audio data based on a sound characteristic; and
An association step of selecting, according to a display mode used when reproducing the image, audio data or partial audio data corresponding to audio to be output in synchronization with the image, and associating it with the image.