JP2011035708A

JP2011035708A - Acoustic signal processor, and imaging apparatus

Info

Publication number: JP2011035708A
Application number: JP2009180616A
Authority: JP
Inventors: Masahiro Yoshida; 昌弘吉田; Makoto Yamanaka; 誠山中; Tomoki Oku; 智岐奥
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 2009-08-03
Filing date: 2009-08-03
Publication date: 2011-02-17

Abstract

PROBLEM TO BE SOLVED: To provide an acoustic signal processing apparatus that can record or reproduce speaker sound (sound emitted from a loudspeaker) with its sound image direction appropriately controlled, or an imaging apparatus that records speaker sound with its sound image direction appropriately controlled. This applies when an acoustic signal of sound emitted from a loudspeaker is detected in the acoustic signal while an image signal and the acoustic signal are being recorded in association with each other, or while an image signal and an acoustic signal recorded in association with each other are being reproduced.

SOLUTION: A speaker sound control section 10 separates the sound collected by microphones 5L and 5R from a plurality of sound sources into per-source sounds and determines whether the sound from each source is speaker sound. When a sound is determined to be speaker sound, its sound image direction is controlled so that the sound image from that source substantially coincides with the shooting direction of the imaging apparatus 1.

COPYRIGHT: (C)2011,JPO&INPIT

Description

The present invention relates to an acoustic signal processing apparatus that processes acoustic signals and, more particularly, to an acoustic signal processing apparatus or an imaging apparatus that controls the sound image direction of an acoustic signal emitted from a loudspeaker.

At lectures, events, and other places where many people gather, a talker often speaks into a microphone. The talker's voice picked up by the microphone is amplified by an amplifier connected to the microphone and output at high volume from a loudspeaker connected to the amplifier. Consequently, most of the talker's voice that the audience hears comes from the loudspeaker.

If several loudspeakers are connected to the amplifier and arranged asymmetrically about the talker's position, or if there is only one loudspeaker and it is placed somewhere quite different from the talker's position, the talker's voice is heard from a position different from where the talker actually is.

Consider, for example, recording the images and sound of such a scene with a video camera equipped with two stereo microphones, framing the talker near the center of the shooting area, and then playing back and viewing the recorded image and acoustic signals. If the playback device reproduces stereo through two loudspeakers for the L and R channels, the talker is shown speaking near the center of the playback monitor, yet the talker's voice is heard from only one of the loudspeakers, or is heard biased toward one side. Such playback is a problem because it feels very unnatural to the viewer.

Similarly, at sports days and similar events, background music (BGM) is sometimes played from only one loudspeaker. When the images and sound of such a scene are recorded and viewed, the BGM is heard from only one of the playback loudspeakers, or biased toward one side, so the playback lacks impact, which is also a problem.

Patent Document 1 below is a conventional technique that controls the entire sound image in accordance with the image. When an image signal and an acoustic signal acquired during shooting are reproduced, it controls the directivity of the audio signal reproduced together with the image, in accordance with the angle of view of the image signal, to correct the reproduced sound field. It does not solve the problem described above.

Patent Document 1: JP2006-287544A

The present invention has been made in view of the above problems. Its object is to provide an acoustic signal processing apparatus that, when an acoustic signal of sound emitted from a loudspeaker is detected while an image signal and an acoustic signal are being recorded in association with each other, or while an image signal and an acoustic signal recorded in association with each other are being reproduced, can record or reproduce that speaker sound with its sound image direction appropriately controlled; or an imaging apparatus that records the speaker sound with its sound image direction appropriately controlled.

A first acoustic signal processing apparatus according to the present invention comprises: sound collecting means that collects sound arriving during shooting to acquire an acoustic signal of the sound; speaker sound detecting means that detects a speaker sound signal in the acoustic signal; sound image direction control means that, when the speaker sound signal is detected, applies acoustic signal processing to the acoustic signal so that the sound image direction of the speaker sound signal coincides with the shooting direction; and acoustic signal recording means that records the acoustic signal to which the acoustic signal processing has been applied.

A second acoustic signal processing apparatus according to the present invention comprises: sound collecting means that collects sound arriving during shooting to acquire an acoustic signal of the sound; speaker sound detecting means that detects a speaker sound signal in the acoustic signal; sound image direction control means that, when the speaker sound signal is detected, applies acoustic signal processing to the acoustic signal so that the sound image direction of the acoustic signal as a whole coincides with the shooting direction; and acoustic signal recording means that records the acoustic signal to which the acoustic signal processing has been applied.
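The first and second apparatuses differ only in what is moved to the shooting direction: the first re-centers just the detected speaker sound signal, the second re-centers the whole acoustic signal. The sketch below contrasts the two on one stereo frame. It is an illustration under stated assumptions, not the patent's processing: "centered" is modeled as identical L and R samples (straight ahead for a symmetric microphone pair), the separated speaker component (sl, sr) is assumed to come from some upstream step, and all function names are hypothetical.

```python
def recenter_speaker_only(l, r, sl, sr):
    """First apparatus (sketch): move only the speaker sound component to the center.

    l, r   -- stereo samples of one frame
    sl, sr -- the speaker sound component of l and r (assumed already separated)
    """
    out_l, out_r = [], []
    for x_l, x_r, s_l, s_r in zip(l, r, sl, sr):
        centre = 0.5 * (s_l + s_r)        # speaker sound re-imaged midway between channels
        out_l.append(x_l - s_l + centre)  # keep the residual, swap in the centered part
        out_r.append(x_r - s_r + centre)
    return out_l, out_r


def recenter_whole(l, r):
    """Second apparatus (sketch): move the whole acoustic signal to the center."""
    mid = [0.5 * (x_l + x_r) for x_l, x_r in zip(l, r)]
    return mid, list(mid)


def control_sound_image(l, r, speaker_detected, mode, sl=None, sr=None):
    """Apply sound image direction control only when speaker sound was detected."""
    if not speaker_detected:
        return list(l), list(r)           # pass the frame through unchanged
    if mode == "speaker_only":
        return recenter_speaker_only(l, r, sl, sr)
    if mode == "whole":
        return recenter_whole(l, r)
    raise ValueError(f"unknown mode: {mode!r}")
```

Plain averaging keeps the sketch short; it lowers the level of a component that was panned hard to one side and can comb-filter a component whose two channels differ by a time delay, so a practical version would need level and delay handling.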

An imaging apparatus according to the present invention comprises the first or second acoustic signal processing apparatus described above, and further comprises: imaging means that shoots a subject to acquire an image signal of the subject; and image signal recording means that records the image signal acquired by the imaging means in association with the acoustic signal processed by the sound image direction control means of the acoustic signal processing apparatus.

The imaging apparatus according to the present invention further comprises face detecting means that detects a human face image signal in the image signal, and microphone detecting means that detects an image signal of a microphone in the image signal. The speaker sound detecting means of the acoustic signal processing apparatus detects a speaker sound signal in the acoustic signal acquired by the sound collecting means of the acoustic signal processing apparatus when the face detecting means detects a human face image signal and the microphone detecting means detects a microphone image signal.
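The face-and-microphone condition acts as a gate in front of speaker sound detection. A minimal sketch, assuming the face detecting means and microphone detecting means each report one boolean per video frame; the function names are hypothetical:

```python
from typing import Iterable, List, Tuple


def speaker_detection_enabled(face_found: bool, microphone_found: bool) -> bool:
    """Run speaker sound detection only if both a face and a microphone are in view."""
    return face_found and microphone_found


def gate_frames(video_results: Iterable[Tuple[bool, bool]]) -> List[bool]:
    """Map per-video-frame (face_found, microphone_found) results to run/skip flags."""
    return [speaker_detection_enabled(face, mic) for face, mic in video_results]
```

Gating this way means the audio-side detector is never invoked for scenes without a visible talker holding a microphone.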

Another imaging apparatus according to the present invention comprises: imaging means that shoots a subject to acquire an image signal of the subject; sound collecting means that collects sound arriving during the shooting to acquire a first acoustic signal of the sound; first recording means that records the first acoustic signal in association with the image signal acquired by the imaging means; second recording means that generates a second acoustic signal based on the first acoustic signal and records the second acoustic signal in association with the image signal acquired by the imaging means; and switching means that switches between the first recording means and the second recording means. The second recording means comprises speaker sound detecting means that detects a speaker sound signal in the acoustic signal, and sound image direction control means that, when the speaker sound signal is detected, generates the second acoustic signal by applying acoustic signal processing to the speaker sound signal so that the sound image direction of the speaker sound signal coincides with the shooting direction.

Yet another imaging apparatus according to the present invention comprises: imaging means that shoots a subject to acquire an image signal of the subject; sound collecting means that collects sound arriving during the shooting to acquire a first acoustic signal of the sound; first recording means that records the first acoustic signal in association with the image signal acquired by the imaging means; second recording means that generates a second acoustic signal based on the first acoustic signal and records the second acoustic signal in association with the image signal acquired by the imaging means; and switching means that switches between the first recording means and the second recording means. The second recording means comprises speaker sound detecting means that detects a speaker sound signal in the acoustic signal, and sound image direction control means that, when the speaker sound signal is detected, generates the second acoustic signal by applying acoustic signal processing to the acoustic signal so that the sound image direction of the acoustic signal coincides with the shooting direction.
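The switching means decides which acoustic signal is stored with the image signal: the first acoustic signal as collected, or the second acoustic signal produced by speaker sound detection and sound image direction control. A minimal sketch, in which the switch setting and the processing callable are hypothetical stand-ins rather than anything specified in the patent:

```python
from typing import Callable, List, Tuple

Stereo = Tuple[List[float], List[float]]  # (L channel, R channel)


def record_audio(first_signal: Stereo,
                 use_second_recording: bool,
                 make_second_signal: Callable[[Stereo], Stereo]) -> Stereo:
    """Return the acoustic signal to store alongside the image signal.

    first_signal         -- the first acoustic signal, as collected
    use_second_recording -- switch position (hypothetical setting)
    make_second_signal   -- hypothetical stand-in for speaker sound detection
                            plus sound image direction control
    """
    if use_second_recording:
        return make_second_signal(first_signal)
    return first_signal
```

Keeping the processing behind a callable lets the same switch serve either claimed variant (re-centering only the speaker sound signal, or the whole acoustic signal).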

A third acoustic signal processing apparatus according to the present invention comprises: acquiring means that acquires the acoustic signal from recording means in which an image signal acquired by shooting with shooting means and an acoustic signal, acquired by sound collecting means, of sound arriving during the shooting are recorded in association with each other; speaker sound detecting means that detects a speaker sound signal in the acoustic signal acquired by the acquiring means; sound image direction control means that, when the speaker sound signal is detected, applies acoustic signal processing to the acoustic signal so that the sound image direction of the speaker sound signal coincides with the shooting direction of the shooting means; and reproducing means that reproduces the acoustic signal to which the acoustic signal processing has been applied.

A fourth acoustic signal processing apparatus according to the present invention comprises: acquiring means that acquires the acoustic signal from recording means in which an image signal acquired by shooting with shooting means and an acoustic signal, acquired by sound collecting means, of sound arriving during the shooting are recorded in association with each other; speaker sound detecting means that detects a speaker sound signal in the acoustic signal acquired by the acquiring means; sound image direction control means that, when the speaker sound signal is detected, applies acoustic signal processing to the acoustic signal so that the sound image direction of the acoustic signal coincides with the shooting direction of the shooting means; and reproducing means that reproduces the acoustic signal to which the acoustic signal processing has been applied.

According to the present invention, it is possible to provide an acoustic signal processing apparatus that, when an acoustic signal of sound emitted from a loudspeaker is detected while an image signal and an acoustic signal are being recorded in association with each other, or while an image signal and an acoustic signal recorded in association with each other are being reproduced, records or reproduces that speaker sound with its sound image direction appropriately controlled; or an imaging apparatus that records or reproduces the speaker sound with its sound image direction appropriately controlled.

The significance and effects of the present invention will become clearer from the following description of embodiments. However, the following embodiment is merely one embodiment of the present invention, and the meanings of the terms used for the present invention and its constituent elements are not limited to those described in the following embodiment.

FIG. 1 is an overall configuration diagram of an imaging apparatus according to an embodiment of the present invention.

FIG. 2 is a diagram outlining the processing performed by a speaker sound control unit 100.

FIG. 3 is a block diagram showing an outline of the internal configuration of the speaker sound control unit 100.

FIG. 4 is a diagram for explaining how a direction determination unit 102 calculates the direction from which an acoustic signal arrives.

FIG. 5 is a block diagram showing an outline of the internal configuration of a speaker sound control unit 200 included in the sound processing unit 7.

FIG. 6 is a diagram for explaining a method of determining whether the acoustic signal of the n-th frame has periodicity.

FIG. 7 is a diagram showing the relationship between the autocorrelation value S(P) of the acoustic signal of the n-th frame and the variable P.

An embodiment in which the acoustic signal processing apparatus according to the present invention is applied to an imaging apparatus is described below with reference to the drawings.

FIG. 1 is a block diagram showing an outline of the internal configuration of an imaging apparatus according to an embodiment of the present invention. As shown in FIG. 1, the imaging apparatus 1 includes an image sensor 2, which is a solid-state image sensing device such as a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor that converts an incident optical image into an electrical signal, and a lens unit 3 that forms an optical image of the subject on the image sensor 2 and adjusts the amount of light. The lens unit 3 and the image sensor 2 constitute an imaging unit, which generates an analog image signal. The lens unit 3 includes various lenses (not shown) such as a zoom lens and a focus lens, and an aperture (not shown) that adjusts the amount of light input to the image sensor 2.

The imaging apparatus 1 further includes: an AFE (Analog Front End) 4 that converts the analog image signal output from the image sensor 2 into a digital image signal (hereinafter, a digital image signal may simply be called an image signal) and adjusts its gain; microphones 5L and 5R that pick up incoming sound and convert it into electrical signals; ADCs (Analog to Digital Converters) 6L and 6R that convert the analog acoustic signals output from the microphones 5L and 5R into digital acoustic signals (hereinafter, a digital acoustic signal may simply be called an acoustic signal); a sound processing unit 7 that applies various kinds of acoustic signal processing to the acoustic signals output from the ADCs 6L and 6R and outputs the result; and an image processing unit 8 that applies various kinds of image signal processing to the image signal output from the AFE 4 and outputs the result.

The sound processing unit 7 includes a speaker sound control unit that detects whether the sound collected by the microphones 5L and 5R contains sound emitted from a loudspeaker (hereinafter, sound emitted from a loudspeaker is called speaker sound, and the acoustic signal of speaker sound is called a speaker sound signal). When speaker sound is detected, the speaker sound control unit controls the sound image direction of that speaker sound, or controls the sound image direction of the entire sound collected by the microphones 5L and 5R. Details of the speaker sound control unit are described later.

The imaging apparatus 1 also includes: a compression processing unit 9 that applies compression coding, such as MPEG (Moving Picture Experts Group) compression, to the image signal output from the image processing unit 8 and the acoustic signal output from the sound processing unit 7; an external memory 11 that records the compression-coded signal produced by the compression processing unit 9; a driver unit 10 that writes the compression-coded signal to, and reads it from, the external memory 11; and a decompression processing unit 12 that decompresses and decodes the compression-coded signal read from the external memory 11 by the driver unit 10.

The imaging apparatus 1 also includes an image signal output unit 13 that converts the image signal decoded by the decompression processing unit 12 into a signal in a format that a display unit 21, such as a monitor, can display, and an acoustic signal output unit 14 that converts the acoustic signal decoded by the decompression processing unit 12 into a signal in a format that a speaker unit 22 can output.

The imaging apparatus 1 also includes: a CPU (Central Processing Unit) 15 that controls the overall operation of the imaging apparatus 1; a memory 16 that stores the program for each process and temporarily holds signals during program execution; an operation unit 17 through which the photographer enters instructions, for example with a button for starting shooting and buttons for confirming various settings; a timing generator (TG) unit 18 that outputs timing control signals for synchronizing the operation timing of the units; a bus 19 for exchanging signals between the CPU 15 and each unit; and a bus 20 for exchanging signals between the memory 16 and each unit.

The external memory 11 may be any medium that can record image signals and acoustic signals. For example, a semiconductor memory such as an SD (Secure Digital) card, an optical disc such as a DVD, or a magnetic disk such as a hard disk can be used as the external memory 11. The external memory 11 may also be removable from the imaging apparatus 1.

Next, the basic operation of the imaging apparatus 1 is described with reference to FIG. 1. First, the image sensor 2 photoelectrically converts the light entering through the lens unit 3, generating an analog image signal, which is an electrical signal. In synchronization with the timing control signal input from the TG unit 18, the image sensor 2 sequentially outputs the generated analog image signals to the AFE 4 at a predetermined frame period (for example, 1/30 second). The image signal converted from analog to digital by the AFE 4 is input to the image processing unit 8, which converts it into a YUV signal and applies various kinds of image signal processing such as gradation correction and edge enhancement. The memory 16 operates as a frame memory and temporarily holds the image signal while the image processing unit 8 processes it.

The microphones 5L and 5R pick up sound, convert it into analog acoustic signals (electrical signals), and output them. These analog acoustic signals are input to the ADCs 6L and 6R and converted into digital acoustic signals. The acoustic signals from the ADCs 6L and 6R are then input to the sound processing unit 7, where they undergo various kinds of acoustic signal processing such as noise removal and speaker sound control by the speaker sound control unit.

The image signal output from the image processing unit 8 and the acoustic signal output from the sound processing unit 7 are both input to the compression processing unit 9 and compressed with a predetermined compression method. At this time, the image signal and the acoustic signal are associated (paired) in time so that the image and sound do not drift out of synchronization during playback. The compressed image and acoustic signals are recorded in the external memory 11 via the driver unit 10.
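With the example figures given in this description (a 1/30-second video frame period and a 48 kHz sampling frequency for the ADCs 6L and 6R), the temporal association can be pictured as a mapping from each video frame to the audio samples captured during it. This is only a sketch of the arithmetic under those example values; the patent does not specify how the compression processing unit 9 stores the association.

```python
from fractions import Fraction
from typing import Tuple

SAMPLE_RATE_HZ = 48_000                 # example ADC sampling frequency from the text
VIDEO_FRAME_PERIOD_S = Fraction(1, 30)  # example frame period of the image sensor 2


def audio_span_for_video_frame(k: int) -> Tuple[int, int]:
    """Half-open range [start, end) of audio sample indices captured during video frame k.

    Exact rational arithmetic avoids rounding drift over long recordings.
    """
    samples_per_video_frame = SAMPLE_RATE_HZ * VIDEO_FRAME_PERIOD_S  # exactly 1600
    start = int(k * samples_per_video_frame)
    end = int((k + 1) * samples_per_video_frame)
    return start, end
```

Each video frame spans exactly 1600 audio samples here, which is not a multiple of the 1024-sample frames used for acoustic signal processing, so processing frames and video frames do not share boundaries; the association has to be kept by time rather than by frame count.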

The compressed image and acoustic signals recorded in the external memory 11 are read out to the decompression processing unit 12 in response to a playback instruction entered by the photographer via the operation unit 17. The decompression processing unit 12 decompresses the compressed image and acoustic signals read out for playback, and outputs the image signal for playback to the image signal output unit 13 and the acoustic signal for playback to the acoustic signal output unit 14. The image signal output unit 13 converts the image signal for playback into a signal in a format that the display unit 21 can display, and the acoustic signal output unit 14 converts the acoustic signal for playback into a signal in a format that the speaker unit 22 can output. As a result, the playback image is displayed on the display unit 21 and the playback sound is output from the speaker unit 22.

In addition, the imaging apparatus 1 of this embodiment displays the captured image on the display unit 21 before recording starts and while a moving image is being recorded. In this case, the image processing unit 8 generates an image signal for display and outputs it to the image signal output unit 13 via the bus 20, and the image signal output unit 13 converts it into a signal in a format that the display unit 21 can display and outputs it. By checking the image on the display unit 21, the photographer can see the angle of view of the image about to be recorded or currently being recorded.

The display unit 21 and the speaker unit 22 may be integrated with the imaging apparatus 1, or they may be separate units connected by a cable or the like to terminals provided on the imaging apparatus 1. Also, the microphones 5L and 5R may be digital microphones that output digital acoustic signals, in which case the ADCs 6L and 6R can be omitted.

<< First Example >>
A first example of the speaker sound control unit included in the sound processing unit 7 of the imaging apparatus 1 is described below. In the description of the first example, the speaker sound control unit is denoted by reference numeral 100.

FIG. 2 is a diagram outlining the processing performed by the speaker sound control unit 100. In FIG. 2, the photographer is using the imaging apparatus 1 to shoot a talker holding a microphone. When the talker speaks into the microphone, the voice is emitted from a loudspeaker placed at a position different from the talker's. That is, in FIG. 2, the loudspeaker emitting the talker's voice is sound source P. It is also assumed that there is a sound source Q that emits sound other than speaker sound.

The speaker sound control unit 100 separates the sounds collected by the microphones 5L and 5R from a plurality of sound sources (in FIG. 2, the two sound sources P and Q) into per-source sounds, and determines whether the sound from each separated source is speaker sound. When a sound is determined to be speaker sound, the unit controls its sound image direction so that the sound image from that source substantially coincides with the shooting direction of the imaging apparatus 1. Here, the shooting direction is the direction in which the lens unit 3 of the imaging apparatus 1 faces during shooting. In FIG. 2, the talker holding the microphone is in the shooting direction, but neither sound source P (the loudspeaker) nor sound source Q is.
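This section does not say how the sounds are separated per source (FIG. 4 concerns how the direction determination unit 102 calculates the arrival direction, which is not described here). One common basis for such separation is the direction of arrival of each frequency bin, estimated from the phase difference between the two microphones. The sketch below assumes a far-field source, a microphone spacing d, and a speed of sound of 343 m/s; it is a textbook two-microphone model, not necessarily the method of the direction determination unit 102.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 degrees C)


def arrival_angle_deg(l_bin: complex, r_bin: complex,
                      freq_hz: float, mic_spacing_m: float) -> float:
    """Estimate the arrival angle of one frequency bin from the L/R phase difference.

    Far-field model (an assumption for this sketch): if the L channel lags the
    R channel by tau seconds, then L(f) = R(f) * exp(-j*2*pi*f*tau) and
    tau = d * sin(theta) / c. theta = 0 is straight ahead (the shooting
    direction); positive theta points toward the microphone 5R side.
    Only meaningful below the spatial-aliasing limit f < c / (2 * d).
    """
    phase_diff = cmath.phase(l_bin * r_bin.conjugate())  # equals -2*pi*f*tau
    tau = -phase_diff / (2.0 * math.pi * freq_hz)
    s = SPEED_OF_SOUND * tau / mic_spacing_m
    s = max(-1.0, min(1.0, s))                            # clamp so asin stays defined
    return math.degrees(math.asin(s))
```

Bins whose estimated angles cluster together could then be grouped as one source; a source whose bins sit near 0 degrees is already in the shooting direction.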

When the speaker sound control unit 100 determines that, of the sounds from sources P and Q, the sound from source P is speaker sound, it applies acoustic signal processing to the acoustic signal of that speaker sound so that the speaker sound appears to arrive from the shooting direction of the imaging apparatus 1. As a result, when the image and acoustic signals acquired by shooting are played back, the viewer hears the talker's voice from the shooting direction, that is, from where the talker is, and no longer experiences the unnatural mismatch described above.
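One way to make a separated speaker sound appear to arrive from the shooting direction, for a symmetric stereo microphone pair, is to give its frequency bins identical values in both channels, removing any inter-channel level or phase difference. The sketch below does this per bin while preserving the combined power |L|^2 + |R|^2. The bin selection (speaker_bins) is assumed to come from the detection step, and borrowing the L-channel phase is an arbitrary choice for illustration; the patent does not specify this processing.

```python
import cmath
import math


def recenter_bins(L, R, speaker_bins):
    """Center the selected bins of one frame's stereo spectrum.

    L, R         -- complex spectra of one frame (one-sided, bins 0..N/2)
    speaker_bins -- indices of bins judged to belong to the speaker sound
    Bins not listed are copied unchanged.
    """
    out_l, out_r = list(L), list(R)
    for k in speaker_bins:
        # Equal-power magnitude, so |L'|^2 + |R'|^2 == |L|^2 + |R|^2.
        mag = math.sqrt((abs(L[k]) ** 2 + abs(R[k]) ** 2) / 2.0)
        # Same value in both channels: no level or phase difference, so the
        # image sits midway between the channels (straight ahead).
        centered = cmath.rect(mag, cmath.phase(L[k]))
        out_l[k] = centered
        out_r[k] = centered
    return out_l, out_r
```

With a full-length (two-sided) spectrum, bin N - k would also have to be set to the complex conjugate of the new value so that the inverse transform stays real.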

FIG. 3 is a block diagram outlining the internal configuration of the speaker sound control unit 100. In FIG. 3, the acoustic signals output from the ADCs 6L and 6R are time-domain signals. If t (an integer) denotes the time elapsed from a reference time, each signal can be expressed as a function of t. Hereinafter, the acoustic signals output from the ADCs 6L and 6R are called the original signals Li(t) and Ri(t), respectively.

The FFT (Fast Fourier Transform) units 101L and 101R apply a discrete Fourier transform to the original signals Li(t) and Ri(t), respectively, to generate frequency spectra. These spectra are the frequency-domain representations of the time-domain acoustic signals output from the ADCs 6L and 6R. Each can therefore be expressed as a function of the frequency f (f is a positive integer). Hereinafter, the frequency spectra output from the FFT units 101L and 101R are denoted L(f) and R(f), respectively.

In the imaging apparatus 1 according to the present embodiment, the ADCs 6L and 6R each convert an analog acoustic signal into a digital acoustic signal at a sampling frequency of, for example, 48 kHz. The imaging apparatus 1 treats 1024 samples of the resulting signal as one frame, which is about 21.3 ms (1024 × 1/48 kHz). It performs acoustic signal processing frame by frame.
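The framing arithmetic can be checked with a short Python sketch. It makes two assumptions the text does not state: frames do not overlap, and an incomplete trailing frame is dropped. All names are illustrative.

```python
SAMPLE_RATE_HZ = 48_000  # example sampling frequency of ADCs 6L/6R
FRAME_LEN = 1024         # samples per processing frame


def frame_duration_ms(frame_len: int = FRAME_LEN, fs: int = SAMPLE_RATE_HZ) -> float:
    """Duration of one frame in milliseconds: frame_len / fs * 1000."""
    return 1000.0 * frame_len / fs


def split_into_frames(samples, frame_len: int = FRAME_LEN):
    """Split samples into consecutive, non-overlapping frames.

    Assumption: an incomplete trailing frame is dropped.
    """
    n_full = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_full)]


print(round(frame_duration_ms(), 1))  # 21.3
```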

The FFT units 101L and 101R apply the discrete Fourier transform to the acoustic signal one frame at a time. In doing so, they divide the frequency band of the acoustic signal into M bands (M is an integer of 2 or more) at a sample interval of Δf and compute a frequency spectrum for each band. Hereinafter, each of these narrow bands is called a subdivided band. For example, if the entire frequency band of the acoustic signal is ΔF, the number of subdivided bands is M = ΔF/Δf. Ideally, a sufficiently small Δf makes each subdivided band contain components from only one sound source. The acoustic signal in each subdivided band can then be regarded as a component of the sound emitted by exactly one of the sound sources.
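The per-frame transform can be sketched in a few lines. This sketch assumes the subdivided bands are the bins of an N-point DFT, so that Δf = fs/N; with fs = 48 kHz and N = 1024, that gives Δf = 46.875 Hz. It also takes ΔF to be the band up to the Nyquist frequency fs/2. A naive O(N²) DFT stands in for the FFT units 101L and 101R so that only the standard library is needed; an FFT returns the same values.

```python
import cmath
import math


def dft(frame):
    """Naive DFT, X[k] = sum_t x[t] * exp(-2j*pi*k*t/N).

    Stand-in for FFT units 101L/101R; an FFT gives the same result in O(N log N).
    """
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
            for k in range(n)]


def bin_frequency(k: int, n: int, fs: float) -> float:
    """Frequency in Hz represented by DFT bin k of an n-point transform."""
    return k * fs / n


fs, n = 48_000, 1024
delta_f = fs / n               # assumed band spacing: 46.875 Hz
M = int((fs / 2) / delta_f)    # number of bands up to Nyquist (ΔF = fs/2): 512
```

With ΔF = 24 kHz, this gives M = 512 bands per channel per frame.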

Denote the subdivided bands f0, f1, f2, ..., f(M-1). The frequency spectra L(f) and R(f) then consist of the spectra of these subdivided bands. Hereinafter, the per-band spectra that make up L(f) and R(f) are denoted L(f0), L(f1), L(f2), ..., L(f(M-1)) and R(f0), R(f1), R(f2), ..., R(f(M-1)), respectively.

The direction determination unit 102 works from the per-band spectra output by the FFT units 101L and 101R. For each subdivided band, it computes the phase difference between the arrivals of that band's acoustic signal at the microphones 5L and 5R. From this phase difference, it determines the arrival direction of that band's acoustic signal.

FIG. 4 illustrates how the direction determination unit 102 calculates the direction from which an acoustic signal arrives. Consider a two-dimensional coordinate plane whose X and Y axes are orthogonal and intersect at the origin O. Relative to O, the positive X direction is right, the negative X direction is left, the positive Y direction is front, and the negative Y direction is rear. The microphones 5L and 5R are placed at different positions on the X axis, symmetric about the Y axis, with a spacing D between them. D is, for example, on the order of several millimeters.

Suppose, for example, that the acoustic signal in the subdivided band f0 Hz was emitted by sound source P. Let its incident angle at the origin O be θ (rad), measured about the origin with counterclockwise as positive. The incident angles at the microphones 5L and 5R can then also be approximated as θ (rad). Let Δφ (rad) be the phase difference between the arrivals of this signal at the microphones 5L and 5R. Δφ can be computed from the spectra L(f0) and R(f0) output by the FFT units 101L and 101R, respectively.

Specifically, let L_r(f0) and L_i(f0) be the real and imaginary parts of the spectrum L(f0) obtained by the discrete Fourier transform. The phase φl of L(f0) can then be calculated as

$$\varphi_l = \tan^{-1}\frac{L_i(f_0)}{L_r(f_0)}$$

Similarly, let R_r(f0) and R_i(f0) be the real and imaginary parts of the spectrum R(f0). The phase φr of R(f0) can then be calculated as

$$\varphi_r = \tan^{-1}\frac{R_i(f_0)}{R_r(f_0)}$$

Since the phase difference is Δφ = φr − φl, it can be calculated by the following equation (1):

$$\Delta\varphi = \varphi_r - \varphi_l = \tan^{-1}\frac{R_i(f_0)}{R_r(f_0)} - \tan^{-1}\frac{L_i(f_0)}{L_r(f_0)} \qquad (1)$$
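Equation (1) can be computed for a single subdivided band as shown below. Two implementation choices here are not stated in the text. First, the sketch uses `math.atan2` rather than a plain arctangent of the ratio, so that the phase falls in the correct quadrant and a zero real part causes no division by zero. Second, it wraps Δφ back into [−π, π].

```python
import math


def bin_phase(value: complex) -> float:
    """Phase of one DFT bin, i.e. arctan(imag/real) with the quadrant resolved."""
    return math.atan2(value.imag, value.real)


def phase_difference(l_bin: complex, r_bin: complex) -> float:
    """Equation (1): Δφ = φr − φl for one subdivided band, wrapped to [−π, π]."""
    d = bin_phase(r_bin) - bin_phase(l_bin)
    return math.atan2(math.sin(d), math.cos(d))  # wrap into [−π, π]
```

For example, `phase_difference(1+0j, 1j)` is π/2: the R bin leads the L bin by a quarter cycle.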

Further, let the speed of sound be C (mm/s) and the spacing between the microphones 5L and 5R be D (mm). Δφ can then also be expressed by the following equation (2).

Therefore, from equations (1) and (2), the incident angle θ can be calculated by the following equation (3).

In this way, the unit can calculate the incident angle θ, that is, the direction from which the acoustic signal in the subdivided band f0 Hz arrives. The direction determination unit 102 calculates the incident angles of the acoustic signals in all subdivided bands in the same manner. Hereinafter, the incident angle of the acoustic signal in a subdivided band is simply called the incident angle of that subdivided band.
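The exact forms of equations (2) and (3) are not reproduced in this text, including the axis from which θ is measured. The sketch below therefore rests on assumptions:

- The source is far away, so the sound arrives as a plane wave.
- θ is measured counterclockwise from the positive X axis, toward microphone 5R.
- The DFT uses the sign convention e^{-j2πft}, as in the phase sketch above.

Under these assumptions, the path difference D cos θ gives Δφ = 2πf·D·cos θ / C, and hence θ = arccos(C·Δφ / (2πf·D)). The speed of sound of 343,000 mm/s (about 343 m/s, air at roughly 20 °C) is also an assumed value.

```python
import math

SPEED_OF_SOUND_MM_S = 343_000.0  # assumed: ~343 m/s in air at about 20 °C


def incidence_angle(delta_phi: float, freq_hz: float, mic_spacing_mm: float,
                    c_mm_s: float = SPEED_OF_SOUND_MM_S) -> float:
    """Incidence angle θ (rad) of one subdivided band from its phase difference.

    Assumes a far-field plane wave and θ measured counterclockwise from the +X
    axis, so that Δφ = 2π f D cos θ / C. Returns θ in [0, π].
    """
    ratio = c_mm_s * delta_phi / (2.0 * math.pi * freq_hz * mic_spacing_mm)
    ratio = max(-1.0, min(1.0, ratio))  # noise can push |ratio| slightly above 1
    return math.acos(ratio)
```

Because arccos returns values in [0, π], a two-microphone array of this kind cannot distinguish front from rear. The phase difference is unambiguous only while D < C/(2f). For D = 5 mm, that bound is about 34 kHz, which is above the 24 kHz Nyquist limit of 48 kHz sampling.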

In the present example, the direction determination unit 102 pairs the spectrum of each subdivided band of L(f) with that band's incident angle and outputs all of these pairs to the speaker sound determination unit 103. Alternatively, the per-band spectra of R(f) may be paired with the incident angles and output instead.

For example, suppose the incident angles of the subdivided bands f0, f1, f2, f3, f4, f5, f6, f7, f8, and f9 are θ0, θ1, θ0, θ0, θ1, θ1, θ1, θ2, θ2, and θ2 (rad), respectively. The direction determination unit 102 then outputs (L(f0), θ0), (L(f1), θ1), (L(f2), θ0), (L(f3), θ0), (L(f4), θ1), (L(f5), θ1), (L(f6), θ1), (L(f7), θ2), (L(f8), θ2), and (L(f9), θ2) to the speaker sound determination unit 103.

The speaker sound determination unit 103 receives the pairs of per-band spectra and incident angles from the direction determination unit 102. For each incident angle, it extracts the per-band spectra and combines them into a synthesized frequency spectrum, so it generates one synthesized spectrum per arrival direction. Hereinafter, the synthesized spectrum obtained by combining the per-band spectra arriving from incident angle θ is denoted L(θ).

For example, suppose the speaker sound determination unit 103 receives (L(f0), θ0), (L(f1), θ1), (L(f2), θ0), (L(f3), θ0), (L(f4), θ1), (L(f5), θ1), (L(f6), θ1), (L(f7), θ2), (L(f8), θ2), and (L(f9), θ2) from the direction determination unit 102. It extracts the spectra L(f0), L(f2), and L(f3) of the subdivided bands whose incident angle is θ0. It then combines them to generate the synthesized spectrum L(θ0).

Similarly, it extracts and combines L(f1), L(f4), L(f5), and L(f6) to obtain the synthesized spectrum L(θ1) for incident angle θ1. It also extracts and combines L(f7), L(f8), and L(f9) to obtain the synthesized spectrum L(θ2) for incident angle θ2.
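The grouping step can be sketched as a dictionary keyed by arrival direction. The text treats θ0, θ1, θ2 as exact labels. Measured angles are continuous, however, so the hypothetical helper `quantize_angle` snaps them to a grid; the step size is an assumption. "Combining" is taken here to mean that each band keeps its value in its direction's spectrum and all other bands are treated as zero.

```python
from collections import defaultdict


def quantize_angle(theta_rad: float, step_rad: float) -> int:
    """Hypothetical helper: snap a measured angle to the nearest multiple of step_rad,
    so that bands from the same source share one direction key."""
    return round(theta_rad / step_rad)


def group_by_direction(triples):
    """Group (band_index, spectrum_value, direction_key) triples by direction.

    Returns {direction_key: {band_index: spectrum_value}}.
    """
    groups = defaultdict(dict)
    for k, value, direction in triples:
        groups[direction][k] = value
    return dict(groups)


def synthesized_spectrum(group, n_bands: int):
    """Dense L(θ): bands assigned to this direction keep their value, others are 0."""
    return [group.get(k, 0) for k in range(n_bands)]


# Worked example from the text: f0..f9 arrive from θ0, θ1, θ0, θ0, θ1, θ1, θ1, θ2, θ2, θ2.
angles = [0, 1, 0, 0, 1, 1, 1, 2, 2, 2]
groups = group_by_direction((k, f"L(f{k})", a) for k, a in enumerate(angles))
print(sorted(groups[0]))  # [0, 2, 3]
```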

For each arrival direction, the speaker sound determination unit 103 examines the characteristics of the synthesized spectrum calculated in this way. From them, it determines whether the digital acoustic signal from that direction is a loudspeaker sound signal.

The loudspeakers used at events such as seminars and sports days can typically reproduce acoustic signals only in a band of roughly 300 Hz to 6 kHz. Therefore, if the spectrum of the acoustic signal from an arrival direction falls roughly within 300 Hz to 6 kHz, that signal is judged likely to be loudspeaker sound.
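One way to make "falls roughly within 300 Hz to 6 kHz" concrete is to measure the fraction of spectral energy inside that band. Both the energy-ratio criterion and the 90 % threshold are assumptions for illustration. The text does not specify how the band test is performed.

```python
def in_band_energy_ratio(spectrum, fs: float, lo_hz: float = 300.0, hi_hz: float = 6000.0) -> float:
    """Fraction of spectral energy between lo_hz and hi_hz.

    spectrum: complex values of an N-point DFT (len(spectrum) == N); only bins
    0..N/2 (DC up to Nyquist) are used.
    """
    n = len(spectrum)
    total = in_band = 0.0
    for k in range(n // 2 + 1):
        p = abs(spectrum[k]) ** 2
        total += p
        if lo_hz <= k * fs / n <= hi_hz:
            in_band += p
    return in_band / total if total > 0 else 0.0


def looks_band_limited(spectrum, fs: float, min_ratio: float = 0.9) -> bool:
    """Hypothetical criterion: at least min_ratio of the energy lies in 300 Hz-6 kHz."""
    return in_band_energy_ratio(spectrum, fs) >= min_ratio
```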

The spectrum of the human voice, by contrast, is concentrated mainly in the range of about 100 Hz to 4 kHz. Voiced sounds have a pitch frequency in a relatively low band, together with a harmonic structure formed by its overtones. The pitch frequency is the fundamental frequency of the voice produced by vocal-fold vibration. It usually lies between about 100 Hz and 300 Hz. Thus, if the pitch frequency is fp, the spectrum of a human voice has local maxima at fp, 2fp, 3fp, ..., nfp Hz (n is a positive integer).

As noted above, loudspeakers used at events such as seminars and sports days reproduce only about 300 Hz to 6 kHz. Loudspeaker sound that contains a human voice therefore has a spectrum that lacks the pitch-frequency component but still shows a harmonic structure.

The speaker sound determination unit 103 therefore checks whether a synthesized spectrum contains a human voice reproduced through a loudspeaker. For example, it applies autocorrelation to the spectrum and checks whether the pitch-frequency component is present and whether the spectrum has a harmonic structure.

Specifically, the speaker sound determination unit 103 first applies autocorrelation to the synthesized spectrum L(θ) and detects several local maxima. Suppose L(θ) contains no component in the range of about 100 Hz to 300 Hz. Suppose also that the local maxima occur at, for example, fm1 = 300 Hz, fm2 = 450 Hz, fm3 = 600 Hz, and so on.

Assume that the first maximum fm1 is twice the pitch frequency, that is, 300 Hz = 2fp; the pitch frequency is then fp = 150 Hz. If L(θ) contains the spectrum of a voice with pitch fp = 150 Hz, it should have local maxima at 2fp = 300 Hz, 3fp = 450 Hz, and 4fp = 600 Hz.

Because fm1 = 2fp, fm2 = 3fp, and fm3 = 4fp all hold here, the speaker sound determination unit 103 concludes that L(θ) contains the spectrum of a loudspeaker-reproduced voice whose pitch frequency is 150 Hz.
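The harmonic check in this worked example can be expressed as a small function. It assumes the local maxima (fm1, fm2, ...) have already been found, for instance by the autocorrelation step described above. It follows the example's hypothesis that fm1 = 2fp. The relative tolerance `rel_tol` is hypothetical.

```python
def estimate_missing_fundamental(peaks_hz, rel_tol=0.03, pitch_range=(100.0, 300.0)):
    """Return the pitch fp if the peaks look like harmonics 2fp, 3fp, 4fp, ... of a
    voice whose fundamental was removed by the loudspeaker; otherwise return None.

    Follows the worked example: the lowest peak fm1 is assumed to equal 2*fp.
    rel_tol is a hypothetical matching tolerance relative to each expected harmonic.
    """
    if len(peaks_hz) < 2:
        return None
    peaks = sorted(peaks_hz)
    fp = peaks[0] / 2.0
    if not (pitch_range[0] <= fp <= pitch_range[1]):
        return None  # implied pitch is outside the usual 100-300 Hz range
    for i, fm in enumerate(peaks):
        expected = (i + 2) * fp  # fm1 = 2fp, fm2 = 3fp, fm3 = 4fp, ...
        if abs(fm - expected) > rel_tol * expected:
            return None
    return fp
```

With the peaks from the example, `estimate_missing_fundamental([300, 450, 600])` returns 150.0.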

In this way, the speaker sound determination unit 103 checks every synthesized spectrum L(θ) to determine whether it contains the spectrum of a loudspeaker-reproduced voice signal.

In the present example, the acoustic signal from an arrival direction is judged to be a loudspeaker sound signal only if two conditions hold. Its frequency band must lie within the reproducible band of loudspeakers used at events such as seminars and sports days (about 300 Hz to 6 kHz), and it must contain a voice signal. Otherwise it is not judged to be loudspeaker sound, even if its band is 300 Hz to 6 kHz. This allows loudspeaker sound containing a human voice to be detected accurately and its sound-image direction to be controlled.

The speaker sound determination unit 103 notifies the gain adjustment unit 104 of the frequencies of the subdivided bands that contain the loudspeaker sound signal.

For each subdivided band reported by the speaker sound determination unit 103, the gain adjustment unit 104 adjusts the corresponding spectral components of L(f) and R(f) to the same level.

Specifically, suppose the unit is notified that the subdivided band f0 Hz contains a loudspeaker sound signal, and that L(f0) = VL and R(f0) = VR with VL > VR. The gain adjustment unit 104 then adjusts the gain so that L(f0) = VL and R(f0) = VL. In other words, it raises the level of the weaker of L(f0) and R(f0) to match the stronger one.
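The gain adjustment can be sketched on complex DFT bins. The text specifies only levels, so this version scales magnitudes and leaves each bin's phase untouched. A bin whose magnitude is zero has no phase to keep and is left as is. All names are illustrative.

```python
def equalize_band_levels(l_bins, r_bins, speaker_bands):
    """Raise the weaker channel in each flagged band to the stronger channel's level.

    l_bins, r_bins: lists of complex DFT values; modified copies are returned.
    speaker_bands: indices of subdivided bands flagged as loudspeaker sound.
    Only magnitudes change; each bin keeps its own phase (assumption).
    """
    l_out, r_out = list(l_bins), list(r_bins)
    for k in speaker_bands:
        vl, vr = abs(l_out[k]), abs(r_out[k])
        target = max(vl, vr)
        if vl > 0:
            l_out[k] = l_out[k] * (target / vl)
        if vr > 0:
            r_out[k] = r_out[k] * (target / vr)
    return l_out, r_out
```

The full N-point spectrum may be processed rather than only the bins up to Nyquist. In that case, the mirrored bin N−k must receive the same gain as bin k so that the IFFT output of units 105L and 105R stays real-valued.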

The IFFT (Inverse Fast Fourier Transform) units 105L and 105R apply an inverse Fourier transform to the gain-adjusted spectra L(f) and R(f), respectively. This converts them back into time-domain signals, which are output as Lo(t) and Ro(t).

As described above, the speaker sound control unit 100 according to the present example detects whether the sounds arriving from multiple sound sources include loudspeaker sound carrying a human voice. When such sound is detected, the unit acts only on that loudspeaker sound, placing its sound image so that it substantially coincides with the shooting direction of the imaging apparatus 1.

<< Second Example >>
A second example of the speaker sound control unit included in the acoustic processing unit 7 of the imaging apparatus 1 is described below. In the following description of the second example, the speaker sound control unit is denoted by reference numeral 200.

In general, a human voice that is picked up by a microphone, amplified, and emitted from a loudspeaker is louder than the same voice heard directly. In some cases the direct voice is louder than the loudspeaker voice. Even then, the two differ in echo, even for the same talker: the voice emitted from a loudspeaker usually has a stronger echo than the direct voice. This is because, when a person speaks into a microphone, the microphone picks up the loudspeaker output of that voice in addition to the direct voice. Two copies of the same voice are therefore emitted from the loudspeaker.

Consequently, a voice that is both loud and strongly echoing can be regarded as loudspeaker sound of a human voice. Here, a strong echo means that the signal has a certain periodicity.

Music signals are generally wideband and also have a certain periodicity. As noted above, many loudspeakers used at events such as seminars and sports days reproduce only about 300 Hz to 6 kHz. Music emitted from such a loudspeaker therefore has a narrower band than the same music heard directly. Because it has been amplified, however, it is loud, and it retains a certain periodicity.

Accordingly, an acoustic signal that satisfies all of the following requirements can be judged to be a loudspeaker sound signal: (A) its frequency band lies within roughly 300 Hz to 6 kHz, (B) it is loud, and (C) it has a certain periodicity. Conversely, if any one of requirements (A) to (C) is not satisfied, the signal can be judged not to be a loudspeaker sound signal.

FIG. 5 is a block diagram outlining the internal configuration of the speaker sound control unit 200 included in the acoustic processing unit 7. The speaker sound determination unit 201 determines whether the signal Ri(t) output from the ADC 6R contains a loudspeaker sound signal. If it does, the unit outputs a switching signal to the switching unit 202 described below.

When the speaker sound determination unit 201 outputs a switching signal, the switching unit 202 applies monauralization to Li(t) and Ri(t), output from the ADCs 6L and 6R respectively, and outputs the results as Lo(t) and Ro(t). Here, monauralization means making Lo(t) = Ro(t). When no switching signal is output, Li(t) and Ri(t) are output unchanged as Lo(t) and Ro(t), respectively. The speaker sound determination unit 201 examines Ri(t) in this example, but it may examine Li(t) instead.

The processing performed by the speaker sound determination unit 201 is described in detail below.
<Step 1: Whether the frequency band lies within roughly 300 Hz to 6 kHz>
First, the speaker sound determination unit 201 applies an FFT to each frame (1024 samples) of the acoustic signal Ri(t) output from the ADC 6R to obtain its frequency spectrum. It then determines whether that spectrum lies within the band of roughly 300 Hz to 6 kHz.
<Step 2: Whether the volume is high>
Next, suppose the spectrum of Ri(t) is judged to lie within roughly 300 Hz to 6 kHz. The speaker sound determination unit 201 then determines whether the level (power value) of Ri(t) is equal to or greater than a predetermined threshold.

Specifically, the speaker sound determination unit 201 calculates the average power PRi(n) of each frame of Ri(t) by the following equation (4). It then determines whether PRi(n) is equal to or greater than a predetermined threshold. Here, consecutive frames in the time domain are called the first, second, third, ..., n-th frame in order from the earliest, where n is a positive integer frame number.
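Equation (4) is not reproduced here. This sketch assumes a common definition of a frame's average power: the mean of the squared samples over the 1024-sample frame. The threshold is left to the caller because the text does not give a value.

```python
def frame_mean_power(frame):
    """Average power of one frame, (1/N) * sum of x(t)^2 (assumed form of equation (4))."""
    return sum(x * x for x in frame) / len(frame)


def is_loud(frame, threshold: float) -> bool:
    """Step 2: True when the frame's average power is at least the threshold."""
    return frame_mean_power(frame) >= threshold
```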

<Step 3: Whether the signal has a certain periodicity>
Next, suppose the average power PRi(n) of the n-th frame (1024 samples) is judged to be equal to or greater than the predetermined threshold. The speaker sound determination unit 201 then determines whether the acoustic signal of the n-th frame is periodic.

FIG. 6 illustrates how to determine whether the acoustic signal of the n-th frame is periodic. In FIG. 6, the samples Ri(t) for t = 1 to t0 of the n-th frame (t0 is an integer of 2 or more) are used, for example, as a reference block for computing the autocorrelation. An evaluation block of t0 consecutive samples Ri(t) is defined over the samples after the t0-th. The correlation between the reference block and the evaluation block is computed while the evaluation block is shifted step by step in time. In FIG. 6, P is the shift amount, in other words the variable that indicates the position of the evaluation block, with P > t0. Specifically, the autocorrelation value S(P) is calculated by the following equation (5). S(P) is thus a function of the variable P that determines the position of the evaluation block.

FIG. 7 shows the relationship between the calculated autocorrelation value S(P) and the variable P. The horizontal axis is P and the vertical axis is S(P). In FIG. 7, as P changes, S(P) periodically reaches local maxima at or above a predetermined threshold. In this case, the speaker sound determination unit 201 determines that the acoustic signal Ri(t) of the n-th frame is periodic and contains a loudspeaker sound signal, and outputs a switching signal to the switching unit 202. The order in which Steps 1 to 3 are executed may be changed.
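The periodicity test can be sketched as follows. The exact form of equation (5) is not reproduced here, so the sketch makes these assumptions:

- S(P) is the correlation between the reference block x[0:t0] and the evaluation block x[P:P+t0], using 0-based indexing.
- S(P) is normalized to [−1, 1] so that a single threshold works at any signal level.

The peak threshold, the minimum number of peaks, and the spacing tolerance are hypothetical parameters.

```python
import math


def block_autocorr(x, t0: int, p: int) -> float:
    """Assumed form of S(P): normalized correlation of x[0:t0] with x[p:p+t0]."""
    ref, ev = x[:t0], x[p:p + t0]
    num = sum(a * b for a, b in zip(ref, ev))
    den = math.sqrt(sum(a * a for a in ref) * sum(b * b for b in ev))
    return num / den if den > 0 else 0.0


def is_periodic(x, t0: int, threshold: float = 0.8, min_peaks: int = 2,
                gap_tol: float = 0.1) -> bool:
    """Step 3: True if S(P) has at least min_peaks local maxima >= threshold that
    are roughly evenly spaced. Evaluation blocks start at P = t0 (just after the
    reference block) and stop where a full block still fits in the frame."""
    s = [block_autocorr(x, t0, p) for p in range(t0, len(x) - t0 + 1)]
    peaks = [i for i in range(1, len(s) - 1)
             if s[i] >= threshold and s[i] >= s[i - 1] and s[i] > s[i + 1]]
    if len(peaks) < min_peaks:
        return False
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    # "Periodically": successive peaks should be roughly evenly spaced.
    return all(abs(g - gaps[0]) <= max(1, gap_tol * gaps[0]) for g in gaps)
```

For a sine with a 32-sample period and t0 = 64, S(P) follows cos(2πP/32), so peaks recur every 32 samples and the frame is classed as periodic. An all-zero frame is not.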

When the speaker sound determination unit 201 outputs a switching signal, the switching unit 202 switches from stereo to monaural output. That is, the switching unit 202 outputs one of the acoustic signals, Li(t) or Ri(t), as both Lo(t) and Ro(t). As a result, when the collected sound contains loudspeaker sound, it is recorded in monaural.
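The switching unit's behaviour reduces to a simple selection, sketched below. As described, it copies one channel to both outputs. Averaging the two channels would also satisfy Lo(t) = Ro(t), but that is not what the text describes. Function and parameter names are illustrative.

```python
def switch_output(li, ri, speaker_detected: bool, use_right: bool = True):
    """Switching unit 202 (sketch): stereo pass-through, or one channel copied to
    both outputs (Lo = Ro) when loudspeaker sound was detected."""
    if not speaker_detected:
        return list(li), list(ri)
    mono = list(ri if use_right else li)
    return mono, list(mono)
```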

As described above, the speaker sound control unit 200 according to the present example detects whether the sound collected by the microphones 5L and 5R contains loudspeaker sound. When it does, the unit controls the sound image of the entire collected sound so that it substantially coincides with the shooting direction of the imaging apparatus 1.

<< Modification 1 >>
The speaker sound control unit 100 of Example 1 and the speaker sound control unit 200 of Example 2 can also be combined.

For example, when the sound collected by the microphones 5L and 5R contains loudspeaker sound of a human voice, the speaker sound control unit 100 controls the sound-image direction. When the sound contains no loudspeaker voice but does contain loudspeaker music, the speaker sound control unit 200 performs the sound-image control.

<< Modification 2 >>
The imaging apparatus 1 may also be provided with a switch for selecting between a normal recording mode and a speaker sound control recording mode.

If the user selects the normal recording mode when shooting with the imaging apparatus 1, the imaging apparatus 1 records sound without sound-image direction control by the speaker sound control unit 100 or 200. If the user selects the speaker sound control recording mode, the imaging apparatus 1 applies sound-image direction control by the speaker sound control unit 100 or 200 to the sound collected by the microphones 5L and 5R.

With such an imaging apparatus 1, the user can freely decide whether sound-image direction control by the speaker sound control unit 100 or 200 is applied.

<< Modification 3 >>
In Examples 1 and 2 above, the speaker sound control unit 100 or 200 controls the sound-image direction while the imaging apparatus 1 records images and sound. Alternatively, the imaging apparatus 1 or a playback device may apply this control when reproducing image and acoustic signals recorded on an external recording medium (the external memory 11 in the case of the imaging apparatus 1). With such an imaging apparatus 1 or playback device, so-called raw image and acoustic signals are acquired at the time of shooting, with no acoustic signal processing for sound-image direction control. This avoids recordings in which the sound-image direction has been altered in ways the photographer did not intend.

<< Modification 4 >>
A known technique for detecting a human face in an image signal acquired during shooting (face image detection) can also be used to detect a microphone in that image signal (microphone image detection). One known face detection technique is described, for example, in Japanese Patent Application Laid-Open No. 2007-257358. By replacing the weight table for human faces, which that technique refers to when detecting a face, with a weight table for microphones, the same technique can detect a microphone in the image signal.
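The text does not describe how the detector of JP 2007-257358 uses its weight table, so the following is only a generic illustration of the idea that the detection target is defined by the table rather than by the code. It assumes a hypothetical sliding-window detector that quantizes each pixel into intensity bins and sums per-position weights looked up from a table of shape (window height, window width, bins); passing a microphone table instead of a face table changes what is detected without changing the function.

```python
import numpy as np

def detect(image: np.ndarray, weight_table: np.ndarray, threshold: float) -> list:
    """Return (top, left) of every window whose lookup-table score reaches threshold.

    image        : 2-D grayscale array with values 0..255
    weight_table : shape (win_h, win_w, n_bins); entry [dy, dx, b] is the weight
                   added when the pixel at (dy, dx) in the window falls in bin b
    """
    win_h, win_w, n_bins = weight_table.shape
    # Quantize every pixel into one of n_bins intensity bins (0 .. n_bins-1).
    bins = image.astype(int) * n_bins // 256
    rows, cols = np.indices((win_h, win_w))
    hits = []
    for top in range(image.shape[0] - win_h + 1):
        for left in range(image.shape[1] - win_w + 1):
            patch = bins[top:top + win_h, left:left + win_w]
            # Look up one weight per pixel position and sum them.
            score = weight_table[rows, cols, patch].sum()
            if score >= threshold:
                hits.append((top, left))
    return hits

# Same function, different tables (both hypothetical):
# face_hits = detect(frame, face_table, face_threshold)
# mic_hits = detect(frame, mic_table, mic_threshold)
```

A practical detector would use richer features and multiple scales; this sketch only shows why swapping the table is enough to retarget it.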

In the imaging apparatus 1 of FIG. 1, the image processing unit 8 may be provided with a face image detection unit that detects a human face in an image signal and a microphone image detection unit that detects a microphone in an image signal. The CPU 15 has the face image detection unit perform face detection on the image signal output from the AFE 4 and, when a human face is detected, has the microphone image detection unit perform microphone image detection. To determine whether a microphone is near the face, the microphone image detection unit searches for a microphone in a predetermined region of the image signal output from the AFE 4 that contains the region where the face was detected and is larger than that region.
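One way to form the enlarged search region, assuming axis-aligned boxes and a hypothetical enlargement factor of 3 in each direction (the text only requires that the region contain the face region and be larger than it), is to scale the face box about its center and clip the result to the frame. `Box` and `mic_search_region` are illustrative names.

```python
from typing import NamedTuple

class Box(NamedTuple):
    left: int
    top: int
    width: int
    height: int

def mic_search_region(face: Box, img_w: int, img_h: int,
                      scale_x: float = 3.0, scale_y: float = 3.0) -> Box:
    """Enlarge the face box about its center (scale >= 1) and clip it to the frame."""
    cx = face.left + face.width / 2
    cy = face.top + face.height / 2
    half_w = face.width * scale_x / 2
    half_h = face.height * scale_y / 2
    left = max(0, int(cx - half_w))
    top = max(0, int(cy - half_h))
    right = min(img_w, int(cx + half_w))
    bottom = min(img_h, int(cy + half_h))
    return Box(left, top, right - left, bottom - top)
```

For a 40 x 40 face at (100, 100) in a 640 x 480 frame this yields `Box(60, 60, 120, 120)`; a face touching the frame corner yields a region clipped at 0. An asymmetric region extending further below the face may suit handheld microphones better; the symmetric choice here is just the simplest.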

When face image detection and microphone image detection find both a human face and a microphone in the image signal output from the AFE 4, it can be determined that a person is in the shooting area and that the scene shows this person speaking into the microphone.

In this case, the CPU 15 judges that the sound collected by the microphones 5L and 5R may include speaker sound, and has the speaker sound control unit 100 or 200 control the sound image direction. Conversely, when no face is detected in the image signal output from the AFE 4, or when a face is detected but no microphone is, the CPU 15 does not have the speaker sound control unit 100 or 200 control the sound image direction.
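The decision rule described here reduces to a conjunction: control the sound image only if a face is found and a microphone is found near it. In this sketch the detectors are passed in as callables; `detect_faces` and `detect_mic_near` are hypothetical stand-ins for the face image detection unit and the microphone image detection unit, and allowing several faces per frame is an assumption.

```python
from typing import Any, Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height)

def should_control_sound_image(
        frame: Any,
        detect_faces: Callable[[Any], List[Box]],
        detect_mic_near: Callable[[Any, Box], bool]) -> bool:
    """True only if a face is found and a microphone is found near at least one face."""
    faces = detect_faces(frame)
    if not faces:
        return False  # no face: leave the sound image alone
    # Face found: enable control only if a microphone appears near some face.
    return any(detect_mic_near(frame, face) for face in faces)
```

Running microphone detection only after a face is found mirrors the order in which the CPU 15 invokes the two units and avoids searching the whole frame for microphones.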

The face image detection, the microphone image detection, and the sound image direction control by the speaker sound control unit 100 or 200 when both a face and a microphone are detected may also be executed when reproducing image and acoustic signals recorded in the external memory 11. In this way, scenes that require sound image direction control are identified appropriately, and the speaker sound control unit 100 or 200 applies the control to them.

<< Modification 5 >>
Embodiments 1 and 2 describe an imaging apparatus 1 having two microphones (L and R channels) for stereo recording, and an imaging apparatus 1 or playback apparatus having two speakers (L and R channels) for stereo reproduction. However, the number of microphones and speakers and the recording and reproduction formats of the acoustic signal are not limited to these. For example, the present invention can also be implemented with a 5.1-channel recording and reproduction format using six microphones and six speakers.
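The reference signs list for the speaker sound control unit 100 names per-channel FFT units (101L, 101R), a direction determination unit 102, a speaker sound determination unit 103, a gain adjustment unit 104, and IFFT units (105L, 105R). The stereo sketch below covers only the FFT, gain adjustment, and IFFT steps for a single frame, under stated assumptions: the per-bin speaker sound decision arrives as a ready-made boolean mask (hypothetical; units 102 and 103 are not modelled), the shooting direction is taken to be the center of the stereo image, and matching it is approximated by giving both channels the average spectrum in flagged bins. There is no windowing or overlap-add, so it is not a streaming implementation.

```python
import numpy as np

def center_speaker_bins(left: np.ndarray, right: np.ndarray,
                        speaker_mask: np.ndarray) -> tuple:
    """Move flagged frequency bins to the center of the stereo image.

    left, right  : one frame of time-domain samples per channel (same length n)
    speaker_mask : boolean array of length n // 2 + 1, True where a bin was
                   judged to be speaker sound (decision made elsewhere)
    """
    n = len(left)
    spec_l = np.fft.rfft(left)   # FFT of each channel
    spec_r = np.fft.rfft(right)
    mid = (spec_l + spec_r) / 2  # identical L/R content -> image straight ahead
    # Gain adjustment: replace flagged bins by the shared center component.
    spec_l = np.where(speaker_mask, mid, spec_l)
    spec_r = np.where(speaker_mask, mid, spec_r)
    return np.fft.irfft(spec_l, n), np.fft.irfft(spec_r, n)  # IFFT per channel
```

Averaging complex spectra assumes the two microphone signals are roughly in phase in the flagged bins. With more channels, as in the 5.1-channel case, the same idea would route flagged bins toward whichever channels represent the shooting direction.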

DESCRIPTION OF SYMBOLS
5L Microphone
5R Microphone
6L ADC
6R ADC
100 Speaker sound control unit
101L FFT unit
101R FFT unit
102 Direction determination unit
103 Speaker sound determination unit
104 Gain adjustment unit
105L IFFT unit
105R IFFT unit
200 Speaker sound control unit
201 Speaker sound determination unit
202 Switching unit

Claims

1. An acoustic signal processing device comprising:
sound collecting means for collecting sound arriving at the time of shooting to acquire an acoustic signal of the sound;
speaker sound detection means for detecting a speaker sound signal in the acoustic signal;
sound image direction control means for, when the speaker sound signal is detected, performing acoustic signal processing on the acoustic signal so that the sound image direction of the speaker sound signal matches the shooting direction; and
acoustic signal recording means for recording the acoustic signal subjected to the acoustic signal processing.

2. An acoustic signal processing device comprising:
sound collecting means for collecting sound arriving at the time of shooting to acquire an acoustic signal of the sound;
speaker sound detection means for detecting a speaker sound signal in the acoustic signal;
sound image direction control means for, when the speaker sound signal is detected, performing acoustic signal processing on the acoustic signal so that the sound image direction of the acoustic signal matches the shooting direction; and
acoustic signal recording means for recording the acoustic signal subjected to the acoustic signal processing.

3. An imaging apparatus comprising the acoustic signal processing device according to claim 1 or 2, and further comprising:
imaging means for acquiring an image signal of a shooting target by shooting the shooting target; and
image signal recording means for recording the image signal acquired by the imaging means in association with the acoustic signal subjected to acoustic signal processing by the sound image direction control means of the acoustic signal processing device.

4. The imaging apparatus according to claim 3, further comprising:
face detection means for detecting a human face image signal in the image signal; and
microphone detection means for detecting an image signal of a microphone in the image signal,
wherein the speaker sound detection means of the acoustic signal processing device detects a speaker sound signal in the acoustic signal acquired by the sound collecting means of the acoustic signal processing device when a human face image signal is detected by the face detection means and a microphone image signal is detected by the microphone detection means.

5. An imaging apparatus comprising:
imaging means for acquiring an image signal of a shooting target by shooting the shooting target;
sound collecting means for acquiring a first acoustic signal of sound arriving at the time of shooting by collecting that sound;
first recording means for recording the first acoustic signal in association with the image signal acquired by the imaging means;
second recording means for generating a second acoustic signal based on the first acoustic signal and recording the second acoustic signal in association with the image signal acquired by the imaging means; and
switching means for switching between the first recording means and the second recording means,
wherein the second recording means includes:
speaker sound detection means for detecting a speaker sound signal in the acoustic signal; and
sound image direction control means for, when the speaker sound signal is detected, generating the second acoustic signal by performing acoustic signal processing on the speaker sound signal so that the sound image direction of the speaker sound signal matches the shooting direction.

6. An imaging apparatus comprising:
imaging means for acquiring an image signal of a shooting target by shooting the shooting target;
sound collecting means for acquiring a first acoustic signal of sound arriving at the time of shooting by collecting that sound;
first recording means for recording the first acoustic signal in association with the image signal acquired by the imaging means;
second recording means for generating a second acoustic signal based on the first acoustic signal and recording the second acoustic signal in association with the image signal acquired by the imaging means; and
switching means for switching between the first recording means and the second recording means,
wherein the second recording means includes:
speaker sound detection means for detecting a speaker sound signal in the acoustic signal; and
sound image direction control means for, when the speaker sound signal is detected, generating the second acoustic signal by performing acoustic signal processing on the acoustic signal so that the sound image direction of the acoustic signal matches the shooting direction.

7. An acoustic signal processing device comprising:
acquisition means for acquiring an acoustic signal from recording means in which an image signal acquired by imaging with imaging means and an acoustic signal of sound arriving at the time of imaging, acquired by sound collecting means, are recorded in association with each other;
speaker sound detection means for detecting a speaker sound signal in the acoustic signal acquired by the acquisition means;
sound image direction control means for, when the speaker sound signal is detected, performing acoustic signal processing on the acoustic signal so that the sound image direction of the speaker sound signal matches the shooting direction of the imaging means; and
reproduction means for reproducing the acoustic signal subjected to the acoustic signal processing.

8. An acoustic signal processing device comprising:
acquisition means for acquiring an acoustic signal from recording means in which an image signal acquired by imaging with imaging means and an acoustic signal of sound arriving at the time of imaging, acquired by sound collecting means, are recorded in association with each other;
speaker sound detection means for detecting a speaker sound signal in the acoustic signal acquired by the acquisition means;
sound image direction control means for, when the speaker sound signal is detected, performing acoustic signal processing on the acoustic signal so that the sound image direction of the acoustic signal matches the shooting direction of the imaging means; and
reproduction means for reproducing the acoustic signal subjected to the acoustic signal processing.