
WO2015172630A1 - Camera device and focusing method thereof - Google Patents

Camera device and focusing method thereof

Info

Publication number
WO2015172630A1
WO2015172630A1 PCT/CN2015/077480 CN2015077480W
Authority
WO
WIPO (PCT)
Prior art keywords
sound
target
stored
focus
sounds
Prior art date
Application number
PCT/CN2015/077480
Other languages
English (en)
French (fr)
Inventor
孙丽
Original Assignee
努比亚技术有限公司 (Nubia Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 努比亚技术有限公司 (Nubia Technology Co., Ltd.)
Publication of WO2015172630A1 publication Critical patent/WO2015172630A1/zh

Classifications

    • G — PHYSICS
    • G03 — PHOTOGRAPHY; CINEMATOGRAPHY; ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ELECTROGRAPHY; HOLOGRAPHY
    • G03B — APPARATUS OR ARRANGEMENTS FOR TAKING PHOTOGRAPHS OR FOR PROJECTING OR VIEWING THEM; APPARATUS OR ARRANGEMENTS EMPLOYING ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ACCESSORIES THEREFOR
    • G03B 13/00 — Viewfinders; Focusing aids for cameras; Means for focusing for cameras; Autofocus systems for cameras
    • G03B 13/32 — Means for focusing
    • G03B 13/34 — Power focusing
    • G03B 13/36 — Autofocus systems
    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 — Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60 — Control of cameras or camera modules
    • H04N 23/67 — Focus control based on electronic image sensor signals

Definitions

  • The present invention relates to the field of imaging technology, and in particular to a camera device and a focusing method thereof.
  • The traditional focusing method is usually manual focus: the user selects a focus point, and the camera then focuses on that point.
  • Manual focus is cumbersome and inefficient; in particular, when the target keeps changing or keeps moving, manual focus cannot achieve real-time tracking shooting.
  • Existing autofocus cannot achieve tracking shooting of a specific target.
  • The main object of the present invention is to provide a camera device and a focusing method thereof, which aim to perform autofocus by sound so as to achieve tracking shooting of a tracking target.
  • The present invention provides a focusing method of a camera device, including the steps of:
  • collecting sound during shooting; determining whether the collected sound contains a target sound that matches a pre-stored sound, and if so, focusing on the target sound source emitting the target sound.
  • Determining whether the collected sound contains a target sound matching the pre-stored sound includes: if the collected sound is emitted by at least two sound sources, separating the sound of each source; extracting the acoustic features of each source's sound and comparing them with the acoustic features of the pre-stored sound; and if the acoustic features of one source's sound match those of the pre-stored sound, determining that this source's sound is the target sound and this source is the target sound source.
  • Focusing on the target sound source emitting the target sound includes: locating the target sound source; and focusing in the direction in which the target sound source is located.
  • The method further includes: if there are at least two pre-stored sounds and at least two matched target sounds, focusing on the target sound source corresponding to the target sound specified by the user.
  • The method further includes: if the collected sound contains no target sound matching the pre-stored sound, focusing on any sounding source.
  • After the step of determining whether the collected sound contains a target sound matching the pre-stored sound, the method further includes: if the collected sound contains no target sound matching the pre-stored sound, selecting a sounding source for focusing according to a preset rule.
  • The preset rule is the nearest-first principle or the maximum-volume principle.
  • The pre-stored sound is a sound collected and stored during shooting, and the collecting and storing steps are: determining the tracking target selected by the user on the shooting screen during shooting; collecting the sound of the tracking target; and extracting and storing the acoustic features of the tracking target's sound.
  • The invention also provides a camera device, including a sound collection module, a processing module, and a focusing module. The sound collection module is configured to collect sound; the processing module is configured to determine whether the collected sound contains a target sound matching a pre-stored sound, and if so, to send a first focus signal to the focusing module; the focusing module is configured to focus, according to the first focus signal, on the target sound source emitting the target sound.
  • The processing module is configured to: if the collected sound is detected to be emitted by at least two sound sources, separate the sound of each source; extract the acoustic features of each source's sound and compare them with the acoustic features of the pre-stored sound; and if the acoustic features of one source's sound match those of the pre-stored sound, determine that this source's sound is the target sound and this source is the target sound source.
  • The focusing module is configured to: locate the target sound source emitting the target sound, and steer the camera toward the direction of the target sound source to focus.
  • The focusing module is configured to: if there are at least two pre-stored sounds and at least two matched target sounds, focus on the target sound source corresponding to the target sound specified by the user.
  • The processing module is configured to: if it is determined that the collected sound contains no target sound matching the pre-stored sound, send a second focus signal to the focusing module; the focusing module is configured to focus on any sounding source according to the second focus signal.
  • The processing module is configured to: if it is determined that there is no target sound matching the pre-stored sound, send a second focus signal to the focusing module; the focusing module is configured to: after receiving the second focus signal, select a sounding source for focusing according to a preset rule.
  • The preset rule is the nearest-first principle or the maximum-volume principle.
  • The pre-stored sound is a sound collected and stored during shooting.
  • The sound collection module is further configured to: determine the tracking target selected by the user on the shooting screen during shooting, and collect the sound of the tracking target.
  • The processing module is further configured to: extract and store the acoustic features of the tracking target's sound.
  • The invention also provides a focusing method of a photographing device, including: selecting a target in a first shooting picture and collecting the target's sound data; establishing a correspondence table between targets and sound data; collecting surrounding sound data for a second shooting picture; determining whether the surrounding sound data is present in the correspondence table; and, when it is, selecting the corresponding target for focusing according to the table.
  • The method further includes: if at least two targets are selected in the first shooting picture and at least two target sounds are matched, focusing according to the target sound source corresponding to the target sound specified by the user.
  • The method further includes: if the surrounding sound data is not present in the correspondence table, selecting a sounding source for focusing according to a preset rule.
  • The preset rule is the nearest-first principle or the maximum-volume principle.
  • With the focusing method of the camera device, by collecting, separating, and matching sounds, the target sound and its corresponding target sound source are identified, and the target sound source is focused on automatically, finally achieving tracking shooting of the tracking target by sound; even if the tracking target changes or moves continuously, real-time tracking shooting can be achieved.
  • FIG. 1 is a flow chart of a first embodiment of the focusing method of the camera device of the present invention.
  • FIG. 2 is a detailed flow chart of sound matching in the present invention.
  • FIG. 3 is a flow chart of a second embodiment of the focusing method of the camera device of the present invention.
  • FIG. 4 is a structural block diagram of an embodiment of the camera device of the present invention.
  • The camera device of the present invention includes all devices having a camera function, such as mobile phones, tablet computers, video cameras, surveillance cameras, and the like.
  • the focusing method includes the following steps:
  • Step S101: start imaging.
  • Step S102: collect sound during shooting.
  • The camera device collects sound with at least two microphones, preferably with a microphone array consisting of multiple microphones.
  • Step S103: determine whether the collected sound contains a target sound matching a pre-stored sound.
  • A sound segment of a person is pre-recorded or acquired in the camera device, analyzed, and its acoustic features are extracted and stored.
  • The camera device samples the collected sound in real time or periodically and analyzes whether it contains a target sound matching the pre-stored sound. If there is a target sound, the process proceeds to step S104; if not, the current focus state is maintained.
  • The sound matching process, shown in FIG. 2, includes the following steps:
  • Step S110: determine whether the collected sound is emitted by a single sound source.
  • If the sound is emitted by at least two sources, proceed to step S120; if by a single source, proceed to step S160.
  • Step S120: separate the sound of each source.
  • A conventional source separation method can be used, such as a method based on independent component analysis, which exploits the fact that the source signals of different sound sources are mutually independent.
  • In independent component analysis, a linear filter whose dimension equals the number of microphones is used according to the number of sources; when the number of sources is smaller than the number of microphones, the source signals can be fully recovered.
  • When the number of sources exceeds the number of microphones, an L1-norm minimization method can be used, which exploits the fact that the probability distribution of the speech power spectrum is close to a Laplacian distribution rather than a Gaussian distribution.
  • Preferably, source separation is performed as follows: convert the analog sound input from at least two sources into a digital sound input; convert the digital input from the time domain to the frequency domain; generate a first solution set that minimizes the estimation error of the sounds active in sources 1 through N; estimate the number of active sources from the first solution set to produce an optimal separation solution set closest to each source of the received analog input; and convert the optimal separation solution set back to the time domain.
  • In this way, the sound of each source can be separated.
  • Step S130: extract the acoustic features of each source's sound and compare them with the acoustic features of the pre-stored sound.
  • Feature extraction methods commonly used in current sound matching include linear predictive cepstral coefficients (LPCC) and Mel-frequency cepstral coefficients (MFCC).
  • The camera device extracts acoustic features from each source's sound to form feature vector sequences to be identified; each sequence is scored against the feature vector sequence formed by the acoustic features of the pre-stored sound (the matching score is also called a log-likelihood score, a likelihood score, or simply a score), and a decision is made.
  • Depending on the type of voiceprint recognition method (closed-set voiceprint identification, open-set voiceprint identification, or voiceprint verification), a rejection decision is made when needed, and the result is obtained.
  • Step S140: if the acoustic features of one source's sound match the acoustic features of the pre-stored sound, determine that this source's sound is the target sound and this source is the target sound source.
  • If one source's sound has the highest matching score and the score exceeds a preset threshold, that sound is determined to be the target sound and its source the target sound source.
  • Step S150: extract the acoustic features of the collected sound.
  • If the collected sound is emitted by a single source, the feature vector sequence to be identified is extracted directly from that sound.
  • Step S160: determine whether the acoustic features of the collected sound match the acoustic features of the pre-stored sound.
  • The feature vector sequence to be identified is scored against the feature vector sequence formed by the acoustic features of the pre-stored sound, and a decision is made; depending on the type of voiceprint recognition method (closed-set identification, open-set identification, or verification), a rejection decision is made when needed. If they match, proceed to step S170; otherwise, it is determined that the collected sound contains no target sound.
  • Step S170: determine that the collected sound is the target sound and the sounding source is the target sound source.
  • If the matching score exceeds the preset threshold, the collected sound is determined to be the target sound and the sounding source the target sound source.
  • After the target sound is matched, the process proceeds to step S104.
  • Step S104: focus on the target sound source emitting the target sound.
  • After detecting the target sound, the camera device locates the target sound source using a conventional localization method, then steers the camera toward the direction of the target sound source and focuses on it. With this focusing method, the camera can track and shoot the tracking target in real time.
  • In some embodiments, the sounds of at least two people are pre-stored in the camera device and the pre-stored sounds are prioritized.
  • When at least two target sounds are matched, the camera device focuses on the target sound source corresponding to the higher-priority target sound according to the preset priority order. That is, the camera device stores the sound features of multiple tracking targets, and when several tracking targets sound simultaneously, it focuses on the tracking target with the higher priority.
  • Alternatively, the user may designate one tracking target from the pre-stored tracking targets for tracking shooting.
  • The focusing method of the second embodiment includes the following steps:
  • Step S201: start imaging.
  • Step S202: collect sound during shooting.
  • Step S203: determine whether the collected sound contains a target sound matching a pre-stored sound.
  • If there is a target sound, proceed to step S204; if not, proceed to step S205.
  • Step S204: focus on the target sound source emitting the target sound.
  • Step S205: focus on any sounding source.
  • When the sounding source is the only source, it is located and the camera is steered toward its direction to focus; when there are multiple sounding sources, the sources are separated, any one of them is selected and located, and the camera is steered toward its direction to focus.
  • This embodiment is especially suitable for conference scenes: when an important person in the conference speaks, the focus is on that person; when the important person is not speaking and others speak, the focus is on the others.
  • The camera device may also select the focus target according to a preset rule, such as the nearest-first principle or the maximum-volume principle.
  • The camera device may pre-store the tracking target's sound before shooting starts and then track the target during shooting.
  • It may also select a tracking target during shooting and then track it. For example, during shooting, the user selects a tracking target on the shooting screen; the camera device converts the target's planar position on the shooting screen into a spatial position according to an existing conversion method, acquires the target's sound, and extracts and stores its acoustic features by analysis. Thereafter, no matter how the tracking target moves within the imaging range, the camera device can track and shoot it.
  • With the focusing method of the camera device of the present invention, by collecting, separating, and matching sounds, the target sound and its corresponding target sound source are identified, and the target sound source is focused on automatically, finally achieving tracking shooting of the tracking target by sound; even if the tracking target changes or moves continuously, real-time tracking shooting can be achieved.
  • The camera device includes a sound collection module, a processing module, and a focusing module.
  • Sound collection module: configured to collect sound.
  • The sound collection module collects sound through at least two microphones, preferably through a microphone array consisting of multiple microphones.
  • Processing module: configured to determine whether the collected sound contains a target sound matching a pre-stored sound, and if so, to send a first focus signal to the focusing module.
  • A sound segment of a person is pre-recorded or acquired in the camera device, analyzed, and its acoustic features are extracted and stored.
  • The processing module samples the collected sound in real time or periodically and analyzes whether it contains a target sound matching the pre-stored sound; if so, it sends the first focus signal to the focusing module.
  • Specifically, the processing module first determines whether the collected sound is emitted by a single source.
  • A conventional source separation method can be used, such as a method based on independent component analysis, which exploits the fact that the source signals of different sound sources are mutually independent.
  • In independent component analysis, a linear filter whose dimension equals the number of microphones is used according to the number of sources; when the number of sources is smaller than the number of microphones, the source signals can be fully recovered.
  • When the number of sources exceeds the number of microphones, an L1-norm minimization method can be used, which exploits the fact that the probability distribution of the speech power spectrum is close to a Laplacian distribution rather than a Gaussian distribution.
  • Preferably, source separation is performed as follows: convert the analog sound input from at least two sources into a digital sound input; convert the digital input from the time domain to the frequency domain; generate a first solution set that minimizes the estimation error of the sounds active in sources 1 through N; estimate the number of active sources from the first solution set to produce an optimal separation solution set closest to each source of the received analog input; and convert the optimal separation solution set back to the time domain.
  • After separation, the acoustic features of each source's sound are extracted and compared with the acoustic features of the pre-stored sound.
  • Feature extraction methods commonly used in current sound matching include linear predictive cepstral coefficients (LPCC), Mel-frequency cepstral coefficients (MFCC), and the like.
  • Specifically, the processing module extracts acoustic features from each source's sound to form feature vector sequences to be identified; each sequence is scored against the feature vector sequence formed by the acoustic features of the pre-stored sound (the matching score is also called a log-likelihood score, a likelihood score, or simply a score), and a decision is made.
  • Depending on the type of voiceprint recognition method (closed-set identification, open-set identification, or verification), a rejection decision is made when needed, and the result is obtained.
  • If the acoustic features of one source's sound match those of the pre-stored sound (for example, its matching score is the highest and exceeds a preset threshold), that source's sound is determined to be the target sound and that source the target sound source; otherwise, it is determined that the collected sound contains no target sound.
  • If the sound is emitted by a single source, its acoustic features are extracted directly and compared with those of the pre-stored sound; if they match, the collected sound is determined to be the target sound and the sounding source the target sound source; otherwise, it is determined that the collected sound contains no target sound.
  • If the processing module determines that the collected sound contains no target sound, it sends a second focus signal to the focusing module.
  • Focusing module: configured to focus, according to the first focus signal, on the target sound source emitting the target sound.
  • After receiving the first focus signal, the focusing module locates the target sound source using a conventional localization method, then steers the camera toward the direction of the target sound source and focuses on it. With this focusing method, the camera can track and shoot the tracking target in real time.
  • When at least two target sounds are matched, the focusing module focuses on the target sound source corresponding to the higher-priority target sound according to the preset priority order. That is, the camera device stores the sound features of multiple tracking targets, and when several tracking targets sound simultaneously, it focuses on the tracking target with the higher priority. Alternatively, the user may designate one tracking target from the pre-stored tracking targets for tracking shooting.
  • When the processing module sends the second focus signal, the focusing module focuses on any sounding source according to the second focus signal.
  • When the sounding source is the only source, it is located and the camera is steered toward its direction to focus; when there are multiple sounding sources, the processing module separates them.
  • The focusing module then selects any one source for localization and steers the camera toward its direction to focus. This embodiment is especially suitable for conference scenes: when an important person in the conference speaks, the focus is on that person; when the important person is not speaking and others speak, the focus is on the others.
  • The focusing module may also select the focus target according to a preset rule, such as the nearest-first principle or the maximum-volume principle.
  • The camera device may pre-store the tracking target's sound before shooting starts and then track the target during shooting.
  • It may also select a tracking target during shooting and then track it. For example, during shooting, the user selects a tracking target on the shooting screen; the camera device converts the target's planar position on the shooting screen into a spatial position according to an existing conversion method, acquires the target's sound, and extracts and stores its acoustic features by analysis. Thereafter, no matter how the tracking target moves within the imaging range, the direction of the target sound source can be determined by sound feature matching, and the camera device can focus on and track it.
  • With the camera device and the focusing method thereof of the present invention, by collecting, separating, and matching sounds, the target sound and its corresponding target sound source are identified, and the target sound source is focused on automatically, finally achieving tracking shooting of the tracking target by sound; even if the tracking target changes or moves continuously, real-time tracking shooting can be achieved.
  • The technical solution of the present invention is particularly suitable for tracking shooting scenes such as conferences and surveillance, and has industrial applicability.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Studio Devices (AREA)

Abstract

The present invention discloses a camera device and a focusing method thereof. The focusing method includes the steps of: collecting sound during shooting; determining whether the collected sound contains a target sound that matches a pre-stored sound, and if so, focusing on the target sound source emitting the target sound. By collecting, separating, and matching sounds, the target sound and its corresponding target sound source are identified, and the target sound source is focused on automatically, finally achieving tracking shooting of a tracking target by sound; even if the tracking target changes or moves continuously, real-time tracking shooting can be achieved.

Description

Camera device and focusing method thereof
TECHNICAL FIELD
The present invention relates to the field of imaging technology, and in particular to a camera device and a focusing method thereof.
BACKGROUND
The traditional focusing method of a camera device is usually manual focus: the user selects a focus point, and the camera then focuses on that point. Manual focus is cumbersome and inefficient; in particular, when the target keeps changing or keeps moving, manual focus cannot achieve real-time tracking shooting. Although autofocus methods exist in the prior art, existing autofocus cannot achieve tracking shooting of a specific target.
SUMMARY
The main object of the present invention is to provide a camera device and a focusing method thereof, which aim to perform autofocus by sound so as to achieve tracking shooting of a tracking target.
To achieve the above object, the present invention provides a focusing method of a camera device, including the steps of:
collecting sound during shooting; determining whether the collected sound contains a target sound that matches a pre-stored sound, and if so, focusing on the target sound source emitting the target sound.
Preferably, determining whether the collected sound contains a target sound matching the pre-stored sound includes: if the collected sound is emitted by at least two sound sources, separating the sound of each source; extracting the acoustic features of each source's sound and comparing them with the acoustic features of the pre-stored sound; and if the acoustic features of one source's sound match those of the pre-stored sound, determining that this source's sound is the target sound and this source is the target sound source.
Preferably, focusing on the target sound source emitting the target sound includes: locating the target sound source; and focusing in the direction in which the target sound source is located.
Preferably, the method further includes: if there are at least two pre-stored sounds and at least two matched target sounds, focusing on the target sound source corresponding to the target sound specified by the user.
Preferably, after the step of determining whether the collected sound contains a target sound matching the pre-stored sound, the method further includes: if the collected sound contains no target sound matching the pre-stored sound, focusing on any sounding source.
Preferably, after the step of determining whether the collected sound contains a target sound matching the pre-stored sound, the method further includes: if the collected sound contains no target sound matching the pre-stored sound, selecting a sounding source for focusing according to a preset rule.
Preferably, the preset rule is the nearest-first principle or the maximum-volume principle.
Preferably, the pre-stored sound is a sound collected and stored during shooting, and the collecting and storing steps are:
during shooting, determining the tracking target selected by the user on the shooting screen;
collecting the sound of the tracking target;
extracting and storing the acoustic features of the tracking target's sound.
The present invention also provides a camera device, including a sound collection module, a processing module, and a focusing module. The sound collection module is configured to collect sound; the processing module is configured to determine whether the collected sound contains a target sound matching a pre-stored sound, and if so, to send a first focus signal to the focusing module; the focusing module is configured to focus, according to the first focus signal, on the target sound source emitting the target sound.
Preferably, the processing module is configured to: if the collected sound is detected to be emitted by at least two sound sources, separate the sound of each source; extract the acoustic features of each source's sound and compare them with the acoustic features of the pre-stored sound; and if the acoustic features of one source's sound match those of the pre-stored sound, determine that this source's sound is the target sound and this source is the target sound source.
Preferably, the focusing module is configured to: locate the target sound source emitting the target sound, and steer the camera toward the direction of the target sound source to focus.
Preferably, the focusing module is configured to: if there are at least two pre-stored sounds and at least two matched target sounds, focus on the target sound source corresponding to the target sound specified by the user.
Preferably, the processing module is configured to: if it is determined that the collected sound contains no target sound matching the pre-stored sound, send a second focus signal to the focusing module; and the focusing module is configured to focus on any sounding source according to the second focus signal.
Preferably, the processing module is configured to: if it is determined that the collected sound contains no target sound matching the pre-stored sound, send a second focus signal to the focusing module; and the focusing module is configured to: after receiving the second focus signal, select a sounding source for focusing according to a preset rule.
Preferably, the preset rule is the nearest-first principle or the maximum-volume principle.
Preferably, the pre-stored sound is a sound collected and stored during shooting. The sound collection module is further configured to determine, during shooting, the tracking target selected by the user on the shooting screen and to collect the tracking target's sound; the processing module is further configured to extract and store the acoustic features of the tracking target's sound.
The present invention also provides a focusing method of a photographing device, including:
selecting a target in a first shooting picture and collecting the target's sound data; establishing a correspondence table between targets and sound data; collecting surrounding sound data for a second shooting picture; determining whether the surrounding sound data is present in the correspondence table; and, when the surrounding sound data is present in the correspondence table, selecting the corresponding target for focusing according to the table.
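The correspondence-table method above can be sketched as a simple lookup keyed on voiceprint features. This is a minimal illustration, not the patented implementation: the feature vectors, the Euclidean-distance score, and the `max_dist` threshold are all hypothetical stand-ins for whatever voiceprint matching the device actually uses.

```python
import numpy as np

class FocusTable:
    """Hypothetical correspondence table mapping targets to voiceprint features."""

    def __init__(self, max_dist=1.0):
        self.table = {}        # target id -> stored feature vector
        self.max_dist = max_dist

    def register(self, target, features):
        # First shooting picture: record the selected target's sound features.
        self.table[target] = np.asarray(features, dtype=float)

    def lookup(self, features):
        # Second shooting picture: find the closest stored target, if close enough.
        if not self.table:
            return None
        features = np.asarray(features, dtype=float)
        target, stored = min(self.table.items(),
                             key=lambda kv: np.linalg.norm(features - kv[1]))
        return target if np.linalg.norm(features - stored) <= self.max_dist else None
```

A `lookup` returning `None` corresponds to the fallback branch of the method: the surrounding sound data is not in the table, so focusing falls back to the preset rule.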
Preferably, the method further includes: if at least two targets are selected in the first shooting picture and at least two target sounds are matched, focusing according to the target sound source corresponding to the target sound specified by the user.
Preferably, after the step of determining whether the surrounding sound data is present in the correspondence table, the method further includes: if the surrounding sound data is not present in the correspondence table, selecting a sounding source for focusing according to a preset rule.
Preferably, the preset rule is the nearest-first principle or the maximum-volume principle.
With the focusing method of a camera device provided by the present invention, by collecting, separating, and matching sounds, the target sound and its corresponding target sound source are identified, and the target sound source is focused on automatically, finally achieving tracking shooting of a tracking target by sound; even if the tracking target changes or moves continuously, real-time tracking shooting can be achieved.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a flow chart of a first embodiment of the focusing method of the camera device of the present invention;
FIG. 2 is a detailed flow chart of sound matching in the present invention;
FIG. 3 is a flow chart of a second embodiment of the focusing method of the camera device of the present invention;
FIG. 4 is a structural block diagram of an embodiment of the camera device of the present invention.
The realization of the object, the functional features, and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
DETAILED DESCRIPTION
It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
The camera device of the present invention includes all devices having a camera function, such as mobile phones, tablet computers, video cameras, surveillance cameras, and the like.
Referring to FIG. 1, a first embodiment of the focusing method of the camera device of the present invention is proposed. The focusing method includes the following steps:
Step S101: start imaging.
Step S102: collect sound during shooting.
The camera device collects sound with at least two microphones, preferably with a microphone array consisting of multiple microphones.
Step S103: determine whether the collected sound contains a target sound matching a pre-stored sound.
A sound segment of a person is pre-recorded or acquired in the camera device, analyzed, and its acoustic features are extracted and stored. The camera device samples the collected sound in real time or periodically and analyzes whether it contains a target sound matching the pre-stored sound. If there is a target sound, the process proceeds to step S104; if not, the current focus state is maintained.
The sound matching process, shown in FIG. 2, includes the following steps:
Step S110: determine whether the collected sound is emitted by a single sound source.
If the sound is emitted by at least two sources, proceed to step S120; if by a single source, proceed to step S160.
Step S120: separate the sound of each source.
A conventional source separation method can be used, such as a method based on independent component analysis (ICA), which separates the sound of each of multiple sources by exploiting the fact that the source signals of different sound sources are mutually independent. In independent component analysis, a linear filter whose dimension equals the number of microphones is used according to the number of sources; when the number of sources is smaller than the number of microphones, the source signals can be fully recovered. When the number of sources exceeds the number of microphones, an L1-norm minimization method can be used, which exploits the fact that the probability distribution of the speech power spectrum is close to a Laplacian distribution rather than a Gaussian distribution.
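As one concrete illustration of the ICA idea described above, the following is a minimal two-source FastICA sketch in NumPy. It assumes an instantaneous (non-reverberant) two-by-two mixture, which is far simpler than the real acoustic setting, and the tanh nonlinearity, iteration count, and symmetric decorrelation step follow the standard FastICA recipe rather than anything specified in this patent.

```python
import numpy as np

def fastica_two_sources(X, n_iter=200, seed=0):
    """Separate a 2 x samples instantaneous mixture into 2 independent sources."""
    # Center and whiten the mixtures.
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(X))
    Z = (E @ np.diag(d ** -0.5) @ E.T) @ X
    # Symmetric FastICA fixed-point iteration with tanh nonlinearity.
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(2, 2))
    for _ in range(n_iter):
        WZ = np.tanh(W @ Z)
        W = (WZ @ Z.T) / Z.shape[1] - np.diag((1 - WZ ** 2).mean(axis=1)) @ W
        U, _, Vt = np.linalg.svd(W)
        W = U @ Vt          # decorrelate: W <- (W W^T)^(-1/2) W
    return W @ Z            # estimated sources (order and sign are arbitrary)
```

ICA recovers the sources only up to permutation and scale, which is why a real device must still match each separated stream against the stored voiceprint to decide which one is the target.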
Preferably, source separation is performed as follows: convert the analog sound input from at least two sources into a digital sound input; convert the digital sound input from the time domain to the frequency domain; generate a first solution set that minimizes the estimation error of the sounds active in sources 1 through N; estimate the number of active sources from the first solution set to produce an optimal separation solution set that is closest to each source of the received analog sound input; and convert the optimal separation solution set back to the time domain. In this way, the sound of each source can be separated even when the number of sources exceeds the number of microphones and in environments with background noise, echo, and reverberation at a high S/N ratio.
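The time-domain/frequency-domain round trip described above can be sketched as an STFT analysis-synthesis pair. Only the domain conversions are shown; the per-frequency separation step (the "solution set" machinery) is abstracted into a placeholder callback, since the patent does not pin down a concrete algorithm, and the window, hop, and normalization choices here are illustrative.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Time domain -> frequency domain: windowed FFT frames."""
    win = np.hanning(n_fft)
    starts = range(0, len(x) - n_fft + 1, hop)
    return np.array([np.fft.rfft(x[s:s + n_fft] * win) for s in starts])

def istft(frames, n_fft=512, hop=128):
    """Frequency domain -> time domain: least-squares overlap-add."""
    win = np.hanning(n_fft)
    out = np.zeros(hop * (len(frames) - 1) + n_fft)
    norm = np.zeros_like(out)
    for i, F in enumerate(frames):
        out[i * hop:i * hop + n_fft] += np.fft.irfft(F, n_fft) * win
        norm[i * hop:i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

def separate_in_frequency(x, per_bin_separator, n_fft=512, hop=128):
    """Skeleton of the pipeline: to frequency domain, separate, back to time."""
    frames = stft(x, n_fft, hop)
    separated = per_bin_separator(frames)   # placeholder for the solution-set step
    return istft(separated, n_fft, hop)
```

With an identity separator the pipeline round-trips the signal (away from the edges), which is a useful sanity check before plugging in any actual per-bin separation.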
Step S130: extract the acoustic features of each source's sound and compare them with the acoustic features of the pre-stored sound.
Feature extraction methods commonly used in current sound matching include linear predictive cepstral coefficients (LPCC), Mel-frequency cepstral coefficients (MFCC), and the like.
Specifically, the camera device extracts acoustic features from each source's sound to form a feature vector sequence to be identified, such as:
[Formula image: PCTCN2015077480-appb-000001]
Each feature vector sequence to be identified is scored against the feature vector sequence formed by the acoustic features of the pre-stored sound (the matching score is also called a log-likelihood score, a likelihood score, or simply a score), and a decision is made. Depending on the type of voiceprint recognition method (closed-set voiceprint identification, open-set voiceprint identification, or voiceprint verification), a rejection decision is made when needed, and the result is obtained.
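A minimal MFCC front-end for forming such feature vector sequences might look like the following. It is a textbook sketch (framing, power spectrum, mel filterbank, log, DCT) with sample rate and filterbank sizes assumed for illustration, since the patent does not specify them; a production voiceprint system would add pre-emphasis, delta features, and normalization.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Compute a basic MFCC matrix (frames x coefficients)."""
    # Frame the signal and apply a Hamming window.
    frames = np.array([signal[s:s + n_fft] * np.hamming(n_fft)
                       for s in range(0, len(signal) - n_fft + 1, hop)])
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / n_fft
    # Triangular mel filterbank.
    def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
    def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):
            fbank[i, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[i, k] = (r - k) / max(r - c, 1)
    feats = np.log(power @ fbank.T + 1e-10)
    # DCT-II to decorrelate the log filterbank energies; keep n_ceps coefficients.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return feats @ dct.T
```

Each row of the returned matrix is one feature vector; the sequence of rows is the "feature vector sequence to be identified" that gets scored against the stored one.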
Step S140: if the acoustic features of one source's sound match the acoustic features of the pre-stored sound, determine that this source's sound is the target sound and this source is the target sound source.
If one source's sound has the highest matching score and the score exceeds a preset threshold, that sound is determined to be the target sound and its source the target sound source.
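The highest-score-above-threshold decision of step S140 can be sketched as below. The scoring function here is a hypothetical stand-in (negative distance between frame-averaged feature vectors); a real system would score a log-likelihood under a speaker model, as the text notes, but the decision logic is the same.

```python
import numpy as np

def pick_target(source_features, stored_features, threshold):
    """Return the index of the target sound source, or None if nothing matches.

    source_features: list of (frames x dims) feature matrices, one per source.
    stored_features: (frames x dims) features of the pre-stored sound.
    """
    stored_mean = np.asarray(stored_features).mean(axis=0)
    scores = [-np.linalg.norm(np.asarray(f).mean(axis=0) - stored_mean)
              for f in source_features]
    best = int(np.argmax(scores))
    # Highest score AND above the preset threshold -> target sound source.
    return best if scores[best] > threshold else None
```

Returning `None` corresponds to the "no target sound" branch, after which the current focus state is kept or a fallback rule is applied.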
Step S150: extract the acoustic features of the collected sound.
If the collected sound is emitted by a single source, the feature vector sequence to be identified is extracted directly from that sound.
Step S160: determine whether the acoustic features of the collected sound match the acoustic features of the pre-stored sound.
The feature vector sequence to be identified is scored against the feature vector sequence formed by the acoustic features of the pre-stored sound, and a decision is made; depending on the type of voiceprint recognition method (closed-set identification, open-set identification, or verification), a rejection decision is made when needed, and the result is obtained. If they match, proceed to step S170; otherwise, it is determined that the collected sound contains no target sound.
Step S170: determine that the collected sound is the target sound and the sounding source is the target sound source.
If the matching score exceeds the preset threshold, the collected sound is determined to be the target sound and the sounding source the target sound source.
After the target sound is matched, the process proceeds to step S104.
Step S104: focus on the target sound source emitting the target sound.
After detecting the target sound, the camera device locates the target sound source using a conventional localization method, then steers the camera toward the direction of the target sound source and focuses on it. With this focusing method, the camera can track and shoot the tracking target in real time.
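The "conventional localization method" is not specified; a common two-microphone approach is to estimate the time difference of arrival (TDOA) from the cross-correlation peak and convert it to a bearing under a far-field assumption. The sketch below is illustrative only; real arrays use more microphones and robust variants such as GCC-PHAT.

```python
import numpy as np

def tdoa_bearing(sig_l, sig_r, mic_distance, sr, c=343.0):
    """Estimate the direction of arrival (degrees) from two microphone signals.

    Positive angles mean the source is closer to the right microphone
    (the sound reaches the left microphone later).
    """
    n = len(sig_l) + len(sig_r)
    # Cross-correlation via FFT; the peak index gives the sample delay.
    corr = np.fft.irfft(np.fft.rfft(sig_l, n) * np.conj(np.fft.rfft(sig_r, n)), n)
    max_shift = int(sr * mic_distance / c)       # physically possible delays
    corr = np.concatenate((corr[-max_shift:], corr[:max_shift + 1]))
    delay = (np.argmax(corr) - max_shift) / sr
    # Far-field model: delay = mic_distance * sin(theta) / c
    return np.degrees(np.arcsin(np.clip(c * delay / mic_distance, -1.0, 1.0)))
```

The resulting bearing is what the focusing step would hand to the camera's pan/focus control to aim at the target sound source.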
In some embodiments, the sounds of at least two people are pre-stored in the camera device and the pre-stored sounds are prioritized. When at least two target sounds are matched, the camera device focuses on the target sound source corresponding to the target sound with the higher priority according to the preset priority order. That is, the camera device stores the sound features of multiple tracking targets, and when several tracking targets sound simultaneously, it focuses on the tracking target with the higher priority. Alternatively, the user may designate one tracking target from the pre-stored tracking targets for tracking shooting.
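The priority rule above reduces to a small selection function. The data shapes (a list of matched target ids, a priority rank per id, an optional user override) are assumptions for illustration; the patent only requires that the higher-priority target wins unless the user designates one.

```python
def choose_focus_target(matched, priority, user_choice=None):
    """Pick the target sound source to focus on.

    matched: ids of pre-stored targets whose sounds matched.
    priority: id -> rank (lower rank = higher priority).
    user_choice: optional id designated by the user, taking precedence.
    """
    if not matched:
        return None
    if user_choice is not None and user_choice in matched:
        return user_choice
    return min(matched, key=lambda t: priority[t])
```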
Referring to FIG. 3, a second embodiment of the focusing method of the camera device of the present invention is proposed. The focusing method includes the following steps:
Step S201: start imaging.
Step S202: collect sound during shooting.
Step S203: determine whether the collected sound contains a target sound matching a pre-stored sound.
If there is a target sound, proceed to step S204; if not, proceed to step S205.
Step S204: focus on the target sound source emitting the target sound.
Step S205: focus on any sounding source.
When the sounding source is the only source, it is located and the camera is steered toward its direction to focus. When there are multiple sounding sources, the sources are separated, any one of them is selected and located, and the camera is steered toward its direction to focus. This embodiment is especially suitable for conference scenes: when an important person in the conference speaks, the focus is on that person; when the important person is not speaking and others speak, the focus is on the others.
In addition, the camera device may also select the focus target according to a preset rule, such as the nearest-first principle or the maximum-volume principle.
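The two preset rules can be sketched as a fallback selector. Each candidate source is assumed to carry an estimated distance and a volume (e.g., an RMS level); both fields and the rule names are illustrative, since the patent only names the principles.

```python
def fallback_source(sources, rule="loudest"):
    """Pick a sounding source when no target sound matched.

    sources: list of dicts with hypothetical 'distance' (m) and 'volume' keys.
    rule: 'nearest' (nearest-first principle) or 'loudest' (maximum-volume).
    """
    if not sources:
        return None
    if rule == "nearest":
        return min(sources, key=lambda s: s["distance"])
    return max(sources, key=lambda s: s["volume"])
```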
The camera device may pre-store the tracking target's sound before shooting starts and then track the target during shooting. It may also select a tracking target during shooting and then track it. For example, during shooting, the user selects a tracking target on the shooting screen; the camera device converts the target's planar position on the shooting screen into a spatial position according to an existing conversion method, acquires the target's sound, and extracts and stores its acoustic features by analysis. Thereafter, no matter how the tracking target moves within the imaging range, the camera device can track and shoot it.
Thus, with the focusing method of the camera device of the present invention, by collecting, separating, and matching sounds, the target sound and its corresponding target sound source are identified, and the target sound source is focused on automatically, finally achieving tracking shooting of the tracking target by sound; even if the tracking target changes or moves continuously, real-time tracking shooting can be achieved.
Referring to FIG. 4, an embodiment of the camera device of the present invention is proposed. The camera device includes a sound collection module, a processing module, and a focusing module.
Sound collection module: configured to collect sound.
The sound collection module collects sound through at least two microphones, preferably through a microphone array consisting of multiple microphones.
Processing module: configured to determine whether the collected sound contains a target sound matching a pre-stored sound, and if so, to send a first focus signal to the focusing module.
A sound segment of a person is pre-recorded or acquired in the camera device, analyzed, and its acoustic features are extracted and stored.
The processing module samples the collected sound in real time or periodically and analyzes whether it contains a target sound matching the pre-stored sound; if so, it sends the first focus signal to the focusing module.
具体的,处理模块首先判断采集到的声音是否为单一声源发出的声音。
如果是至少两声源发出的声音,则分离出各声源的声音。可以利用传统的声源分离方法,如基于独立分量分析的声源分析方法分离出多个声源中每一个声源的声音,其充分利用在声源之间声源的源信号是独立的这一事实。在独立分量分析中,根据声源数量使用维数等于麦克风数量的线性滤波器,当声源的数量小于麦克风的数量时,能够完全恢复源信号。当声源数量超过麦克风数量时,可以使用L1范最小化方法,该方法利用了语音功率谱的概率分布接近拉普拉斯分布而不是高斯分布这一事实。优选利用一下方法进行声源分离:将来自至少两个声源的模拟声音输入转换为数字声音输入;将数字声音输入从时域转换到频域;产生第一解集,且该解集使得来自声源1到N中活动的那些声音的估计的误差最小;根据第一解集估计活动声源的数量, 以生产最优分离解集,该最优分离解集最接近收到的模拟声音输入的每个声源;将最优分离解集转换到时域。从而,即使在声源数量超过麦克风数量,并且出现一些具有高S/N的背景噪声、回声和混响的环境里,也能够分离出每个声源的声音。
After source separation, the acoustic features of each source's sound are extracted and compared respectively with the acoustic features of the pre-stored sound. Feature extraction methods commonly used in current sound matching include linear prediction cepstral coefficients (LPCC) and mel-frequency cepstral coefficients (MFCC). Specifically, the processing module extracts acoustic features from each source's sound to form a feature vector sequence to be recognized, such as
Figure PCTCN2015077480-appb-000002
Each feature vector sequence to be recognized is scored (the score is also called a log-likelihood score, or likelihood score) against the feature vector sequence formed from the acoustic features of the pre-stored sound, and a decision is made; depending on the type of voiceprint recognition method (closed-set voiceprint identification, open-set voiceprint identification, or voiceprint verification), a rejection decision is made when necessary to reach a result. If the acoustic features of one source's sound match the acoustic features of the pre-stored sound (e.g., that source's matching score is the highest and exceeds a preset threshold), that source's sound is determined to be the target sound and that source to be the target sound source; otherwise, it is determined that the collected sound contains no target sound.
If the sound is emitted by a single source, the acoustic features of the collected sound are extracted directly and compared with the acoustic features of the pre-stored sound. If they match, the collected sound is determined to be the target sound and the sounding source to be the target sound source; otherwise, it is determined that the collected sound contains no target sound.
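In both branches the decision reduces to: score each candidate against the enrolled voiceprint, take the best score, and reject if even that falls below the preset threshold. A minimal sketch follows; the source names, scores and threshold are illustrative log-likelihood-style values, not from the patent.

```python
def identify_target(scores, threshold):
    """Open-set decision over candidate sources: pick the source whose
    matching score against the enrolled voiceprint is highest, and
    reject (return None) if even that score is below the threshold."""
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

# Scores per separated source (illustrative values).
print(identify_target({"src_a": -3.2, "src_b": -0.4}, threshold=-1.0))  # -> src_b
print(identify_target({"src_a": -3.2, "src_b": -2.9}, threshold=-1.0))  # -> None
```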
In some embodiments, when the processing module determines that the collected sound contains no target sound, it sends a second focus signal to the focusing module.
Focusing module: configured to focus, according to the first focus signal, on the target sound source emitting the target sound.
Specifically, after receiving the first focus signal, the focusing module locates the target sound source emitting the target sound using a conventional localization method, then steers the camera toward the located direction of the target sound source and focuses on it. With this focusing method, the camera can track and shoot the tracking target in real time.
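The text leaves the localization step to "conventional" methods; one such method is time-difference-of-arrival (TDOA) estimation over a microphone pair, sketched below. The sample rate, microphone spacing and speed of sound are illustrative parameters, not values from the patent.

```python
import math

def estimate_delay(sig_a, sig_b, max_lag):
    """Lag (in samples) of sig_b relative to sig_a that maximizes the
    cross-correlation -- the core of TDOA localization with two mics."""
    best_lag, best_corr = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        corr = 0.0
        for i, a in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                corr += a * sig_b[j]
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag

def bearing_deg(delay_samples, fs=48000, mic_spacing=0.1, c=343.0):
    """Convert the inter-mic delay into a direction of arrival."""
    tdoa = delay_samples / fs
    return math.degrees(math.asin(max(-1.0, min(1.0, c * tdoa / mic_spacing))))

pulse = [0.0] * 20
pulse[5] = 1.0
mic_a = pulse
mic_b = [0.0] * 20
mic_b[8] = 1.0  # same pulse arriving 3 samples later at mic B
lag = estimate_delay(mic_a, mic_b, max_lag=10)
print(lag)                        # -> 3
print(round(bearing_deg(lag), 1)) # -> 12.4
```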
When the imaging device pre-stores the voices of at least two people and has ranked the pre-stored voices by priority, and the processing module accordingly matches at least two target sounds, the focusing module focuses on the target sound source corresponding to the higher-priority target sound according to the preset priority order. That is, the imaging device stores the voice features of multiple tracking targets; when several tracking targets speak at the same time, the camera focuses on the higher-priority tracking target. Alternatively, the user may designate one of the pre-stored tracking targets to be tracked and shot.
In some embodiments, when the processing module sends the second focus signal, the focusing module focuses on any sounding source according to the second focus signal. When there is only one sounding source, that source is located and the camera is steered toward its located direction to focus; when there are multiple sounding sources, the processing module separates them, and the focusing module selects any one, locates it, and steers the camera toward its located direction to focus. This embodiment is particularly suited to conference scenarios: when an important person in the meeting speaks, the camera focuses on that person; when the important person is not speaking and someone else is, the camera focuses on the other speaker. In addition, the focusing module may at this point also select the focus target according to a preset rule, such as the nearest-source rule or the loudest-source rule.
The imaging device may pre-store the tracking target's voice before shooting begins and then track and shoot the tracking target during shooting. It may also select a tracking target during shooting and then track and shoot it. For example, during shooting the user selects a tracking target on the shooting frame; the imaging device converts the target's planar position on the frame into a spatial position using an existing conversion method, captures the target's sound, and extracts and stores the acoustic features of that sound through analysis. Thereafter, no matter how the target moves within the imaging range, the direction of the target sound source can be determined by matching the sound's features, so the imaging device can focus on and track-shoot the target.
Accordingly, with the imaging device of the present invention, by collecting, separating and matching sounds, the target sound and its corresponding target sound source are identified and automatically focused on, ultimately achieving sound-based tracking and shooting of the tracking target; real-time tracking and shooting is possible even if the tracking target keeps changing or keeps moving.
It should be noted that the technical features of the above method embodiments apply correspondingly to the present device.
A person of ordinary skill in the art will understand that all or part of the steps of the above embodiment methods may be implemented by a program controlling the relevant hardware; the program may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc.
The preferred embodiments of the present invention have been described above with reference to the accompanying drawings, without thereby limiting the scope of the invention. Those skilled in the art may implement the invention in various variants without departing from its scope and spirit; for example, a feature of one embodiment may be used in another embodiment to obtain a further embodiment. Any modification, equivalent replacement or improvement made within the technical concept of the present invention shall fall within the scope of the present invention.
Industrial Applicability
With the imaging device and its focusing method of the present invention, by collecting, separating and matching sounds, the target sound and its corresponding target sound source are identified and automatically focused on, ultimately achieving sound-based tracking and shooting of the tracking target; real-time tracking and shooting is possible even if the tracking target keeps changing or keeps moving. The technical solution of the present invention is particularly suited to tracking-shooting scenarios such as conferences and surveillance, and thus has industrial applicability.

Claims (20)

  1. A focusing method of an imaging device, comprising the steps of:
    collecting sound during shooting;
    determining whether the collected sound contains a target sound matching a pre-stored sound, and if so, focusing on the target sound source emitting said target sound.
  2. The focusing method of the imaging device according to claim 1, wherein said determining whether the collected sound contains a target sound matching a pre-stored sound comprises:
    if the collected sound is emitted by at least two sound sources, separating the sound of each source;
    extracting the acoustic features of each source's sound and comparing them respectively with the acoustic features of the pre-stored sound;
    if the acoustic features of one source's sound match the acoustic features of the pre-stored sound, determining that source's sound to be the target sound and that source to be the target sound source.
  3. The focusing method of the imaging device according to claim 1, wherein said focusing on the target sound source emitting said target sound comprises:
    locating the target sound source emitting said target sound;
    focusing in the located direction of said target sound source.
  4. The focusing method of the imaging device according to claim 1, wherein the method further comprises: if there are at least two pre-stored sounds and at least two matched target sounds, focusing on the target sound source corresponding to the target sound designated by the user.
  5. The focusing method of the imaging device according to claim 1, wherein after the step of determining whether the collected sound contains a target sound matching a pre-stored sound, the method further comprises: if the collected sound contains no target sound matching a pre-stored sound, focusing on any sounding source.
  6. The focusing method of the imaging device according to claim 1, wherein after the step of determining whether the collected sound contains a target sound matching a pre-stored sound, the method further comprises: if the collected sound contains no target sound matching a pre-stored sound, selecting a sounding source to focus on according to a preset rule.
  7. The focusing method of the imaging device according to claim 6, wherein the preset rule is the nearest-source rule or the loudest-source rule.
  8. The focusing method of the imaging device according to claim 1, wherein the pre-stored sound is a sound collected and stored during shooting, the collecting and storing steps being:
    during shooting, determining a tracking target selected by the user on the shooting frame;
    collecting the sound of said tracking target;
    extracting and storing the acoustic features of the sound of said tracking target.
  9. An imaging device, comprising a sound collection module, a processing module and a focusing module, wherein:
    the sound collection module is configured to collect sound;
    the processing module is configured to determine whether the collected sound contains a target sound matching a pre-stored sound and, if so, to send a first focus signal to the focusing module;
    the focusing module is configured to focus, according to said first focus signal, on the target sound source emitting said target sound.
  10. The imaging device according to claim 9, wherein the processing module is configured to:
    if it detects that the collected sound is emitted by at least two sound sources, separate the sound of each source; extract the acoustic features of each source's sound and compare them respectively with the acoustic features of the pre-stored sound; and, if the acoustic features of one source's sound match the acoustic features of the pre-stored sound, determine that source's sound to be the target sound and that source to be the target sound source.
  11. The imaging device according to claim 9, wherein the focusing module is configured to: locate the target sound source emitting said target sound, and steer the camera to focus in the located direction of said target sound source.
  12. The imaging device according to claim 9, wherein the focusing module is configured to: if there are at least two pre-stored sounds and at least two matched target sounds, focus on the target sound source corresponding to the target sound designated by the user.
  13. The imaging device according to claim 9, wherein the processing module is configured to: if it determines that the collected sound contains no target sound matching a pre-stored sound, send a second focus signal to the focusing module;
    the focusing module is configured to: focus on any sounding source according to said second focus signal.
  14. The imaging device according to claim 9, wherein the processing module is configured to: if it determines that the collected sound contains no target sound matching a pre-stored sound, send a second focus signal to the focusing module;
    the focusing module is configured to: upon receiving said second focus signal, select a sounding source to focus on according to a preset rule.
  15. The imaging device according to claim 14, wherein the preset rule is the nearest-source rule or the loudest-source rule.
  16. The imaging device according to claim 9, wherein the pre-stored sound is a sound collected and stored during shooting; the sound collection module is further configured to: determine, during shooting, a tracking target selected by the user on the shooting frame, and collect the sound of said tracking target; and the processing module is further configured to: extract and store the acoustic features of the sound of said tracking target.
  17. A focusing method of an imaging device, comprising:
    selecting a target in a first shooting frame, and collecting sound data of said target;
    establishing a correspondence table between said target and said sound data;
    collecting surrounding sound data for a second shooting frame;
    determining whether said surrounding sound data exists in said correspondence table; and
    when said surrounding sound data exists in said correspondence table, selecting the corresponding target for focusing according to said correspondence table.
  18. The focusing method of the imaging device according to claim 17, wherein the method further comprises: if at least two targets are selected in the first shooting frame and at least two target sounds are matched, focusing on the target sound source corresponding to the target sound designated by the user.
  19. The focusing method of the imaging device according to claim 17, wherein after the step of determining whether said surrounding sound data exists in said correspondence table, the method further comprises: if said surrounding sound data does not exist in said correspondence table, selecting a sounding source to focus on according to a preset rule.
  20. The focusing method of the imaging device according to claim 19, wherein the preset rule is the nearest-source rule or the loudest-source rule.
PCT/CN2015/077480 2014-05-15 2015-04-27 Imaging device and focusing method thereof WO2015172630A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410205508.5 2014-05-15
CN201410205508.5A CN103957359B (zh) 2014-05-15 Imaging device and focusing method thereof

Publications (1)

Publication Number Publication Date
WO2015172630A1 true WO2015172630A1 (zh) 2015-11-19

Family

ID=51334574

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/077480 WO2015172630A1 (zh) 2014-05-15 2015-04-27 Imaging device and focusing method thereof

Country Status (2)

Country Link
CN (1) CN103957359B (zh)
WO (1) WO2015172630A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110876036A (zh) * 2018-08-31 2020-03-10 腾讯数码(天津)有限公司 Video generation method and related apparatus
CN111953894A (zh) * 2016-11-22 2020-11-17 谷歌有限责任公司 Device, method, system and computer-readable storage medium for capturing images
CN113284490A (zh) * 2021-04-23 2021-08-20 歌尔股份有限公司 Control method, apparatus and device for electronic equipment, and readable storage medium
CN113573096A (zh) * 2021-07-05 2021-10-29 维沃移动通信(杭州)有限公司 Video processing method and apparatus, electronic device and medium
US11388333B2 (en) * 2017-11-30 2022-07-12 SZ DJI Technology Co., Ltd. Audio guided image capture method and device

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103957359B (zh) * 2014-05-15 2016-08-24 努比亚技术有限公司 Imaging device and focusing method thereof
CN104092936B (zh) * 2014-06-12 2017-01-04 小米科技有限责任公司 Autofocus method and device
CN106303195A (zh) * 2015-05-28 2017-01-04 中兴通讯股份有限公司 Shooting device and tracking-shooting method and system
CN105208283A (zh) * 2015-10-13 2015-12-30 广东欧珀移动通信有限公司 Voice-controlled photographing method and device
CN105554443B (zh) * 2015-12-04 2018-11-13 浙江宇视科技有限公司 Method and device for locating the source of an abnormal sound in video images
CN105657253B (zh) * 2015-12-28 2019-03-29 联想(北京)有限公司 Focusing method and electronic device
CN105872366B (zh) * 2016-03-30 2018-08-24 南昌大学 Focus control system using blind source separation based on the FastICA algorithm
CN107347145A (zh) * 2016-05-06 2017-11-14 杭州萤石网络有限公司 Video surveillance method and pan-tilt network camera
CN105979442B (zh) * 2016-07-22 2019-12-03 北京地平线机器人技术研发有限公司 Noise suppression method, device and movable apparatus
CN106341601A (zh) * 2016-09-23 2017-01-18 努比亚技术有限公司 Mobile terminal and photographing method
CN106341665A (zh) * 2016-09-30 2017-01-18 浙江宇视科技有限公司 Tracking and monitoring method and device
CN106603919A (zh) * 2016-12-21 2017-04-26 捷开通讯(深圳)有限公司 Method and terminal for adjusting shooting focus
CN106803886A (zh) * 2017-02-28 2017-06-06 深圳天珑无线科技有限公司 Photographing method and device
JP6766086B2 (ja) 2017-09-28 2020-10-07 キヤノン株式会社 Imaging apparatus and control method thereof
JP7292853B2 (ja) 2017-12-26 2023-06-19 キヤノン株式会社 Imaging apparatus, control method thereof, and program
WO2019130909A1 (ja) * 2017-12-26 2019-07-04 キヤノン株式会社 Imaging apparatus, control method thereof, and recording medium
CN108091091A (zh) * 2017-12-28 2018-05-29 中国电子科技集团公司第五十四研究所 Low-power-consumption combined seismic, acoustic and image detection system
CN110875053A (zh) * 2018-08-29 2020-03-10 阿里巴巴集团控股有限公司 Speech processing method, apparatus, system, device and medium
CN109194918B (zh) * 2018-09-17 2022-04-19 东莞市丰展电子科技有限公司 Shooting system based on a mobile carrier
CN109639961B (zh) * 2018-11-08 2021-05-18 联想(北京)有限公司 Acquisition method and electronic device
WO2020118503A1 (zh) * 2018-12-11 2020-06-18 华为技术有限公司 Method and device for determining the focus area of an image
CN111050063A (zh) * 2019-03-29 2020-04-21 苏州浩哥文化传播有限公司 Automated videography method and system based on sound source recognition
CN113411487B (zh) * 2020-03-17 2023-08-01 中国电信股份有限公司 Device control method, apparatus and system, and computer-readable storage medium
CN111783628B (zh) * 2020-06-29 2024-07-12 珠海格力电器股份有限公司 Position tracking method and apparatus, electronic device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20040001707A (ko) * 2002-06-28 2004-01-07 엘지전자 주식회사 Method and device for adjusting the shooting direction of a portable terminal having a video communication function
CN1713717A (zh) * 2004-06-25 2005-12-28 北京中星微电子有限公司 Digital sound-controlled method for orienting the shooting direction of a camera
CN101068308A (zh) * 2007-05-10 2007-11-07 华为技术有限公司 System and method for controlling an image acquisition device to perform target positioning
CN101770139A (zh) * 2008-12-29 2010-07-07 鸿富锦精密工业(深圳)有限公司 Focus control system and method
CN102413276A (zh) * 2010-09-21 2012-04-11 天津三星光电子有限公司 Digital video camera with sound-controlled focusing function
CN103957359A (zh) 2014-05-15 2014-07-30 深圳市中兴移动通信有限公司 Imaging device and focusing method thereof

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2680616A1 (en) * 2012-06-25 2014-01-01 LG Electronics Inc. Mobile terminal and audio zooming method thereof
CN103685905B (zh) * 2012-09-17 2016-12-28 联想(北京)有限公司 Photographing method and electronic device


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111953894A (zh) * 2016-11-22 2020-11-17 谷歌有限责任公司 Device, method, system and computer-readable storage medium for capturing images
US11317018B2 (en) 2016-11-22 2022-04-26 Google Llc Camera operable using natural language commands
US11388333B2 (en) * 2017-11-30 2022-07-12 SZ DJI Technology Co., Ltd. Audio guided image capture method and device
CN110876036A (zh) * 2018-08-31 2020-03-10 腾讯数码(天津)有限公司 Video generation method and related apparatus
CN110876036B (zh) * 2018-08-31 2022-08-02 腾讯数码(天津)有限公司 Video generation method and related apparatus
CN113284490A (zh) * 2021-04-23 2021-08-20 歌尔股份有限公司 Control method, apparatus and device for electronic equipment, and readable storage medium
CN113573096A (zh) * 2021-07-05 2021-10-29 维沃移动通信(杭州)有限公司 Video processing method and apparatus, electronic device and medium

Also Published As

Publication number Publication date
CN103957359B (zh) 2016-08-24
CN103957359A (zh) 2014-07-30

Similar Documents

Publication Publication Date Title
WO2015172630A1 (zh) Imaging device and focusing method thereof
JP7536789B2 (ja) Customized output to optimize for user preferences in a distributed system
EP3963576B1 (en) Speaker attributed transcript generation
CN112074901B (zh) Speech recognition login
US9330673B2 (en) Method and apparatus for performing microphone beamforming
US9595259B2 (en) Sound source-separating device and sound source-separating method
US12051422B2 (en) Processing overlapping speech from distributed devices
CN102843540B (zh) Automatic camera selection for video conferencing
KR102230667B1 (ko) Speaker separation method and device based on audio-visual data
WO2020222930A1 (en) Audio-visual diarization to identify meeting attendees
KR101508092B1 (ko) Method and system for supporting video conferencing
EP3963575A1 (en) Distributed device meeting initiation
JP2022062874A (ja) Speaker prediction method, speaker prediction device, and communication system
CN116866509B (zh) Conference-scene image tracking method, device and storage medium
JP2021197658A (ja) Sound pickup device, sound pickup system and sound pickup method
RU2821283C2 (ru) Customized output optimized for user preferences in a distributed system
CN113411487B (zh) Device control method, apparatus and system, and computer-readable storage medium
CN112788278B (zh) Video stream generation method, apparatus, device and storage medium
JP2022135674A (ja) Electronic device, information processing apparatus, control method, learning method, and program
CN113691753A (zh) Processing method, apparatus and electronic device
CN118470592A (zh) Method and apparatus for identifying and tracking a speaker, electronic device, and medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15792926

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 26/04/2017)

122 Ep: pct application non-entry in european phase

Ref document number: 15792926

Country of ref document: EP

Kind code of ref document: A1