KR20100043801A

KR20100043801A - Apparatus and method for sound source localization

Info

Publication number: KR20100043801A
Application number: KR1020080102997A
Authority: KR
Inventors: 박영진; 권병호
Original assignee: 한국과학기술원 (Korea Advanced Institute of Science and Technology, KAIST)
Priority date: 2008-10-21
Filing date: 2008-10-21
Publication date: 2010-04-29
Anticipated expiration: 2028-10-21
Also published as: KR101038437B1

Abstract

An apparatus and a method for sound source localization are disclosed. The apparatus includes three units:

- an acquisition unit that acquires the n-th (n = 1 to 3) cross-correlation value sequence for the n-th microphone pair, where the pairs are formed from a combination of three microphones;
- a mapping function application unit that, according to a predetermined n-th mapping function, assigns each cross-correlation value of the n-th sequence to the corresponding coordinate of the n-th coordinate system, which is set with respect to the n-th microphone pair and includes an azimuth angle and an elevation angle;
- an estimation unit that estimates the direction of a sound source based on the assignment results of the first to third cross-correlation value sequences.

Here, the n-th mapping function represents the correspondence between the time delay of arrival (TDOA) for the n-th microphone pair and the coordinates of the n-th coordinate system. As a result, the three-dimensional position of a sound source can be estimated even with a small number of microphones.

Description

Apparatus and method for sound source localization

The present invention relates to sound source localization. More particularly, but not exclusively, it relates to an apparatus and a method that can estimate the three-dimensional position of a sound source using a small number of microphones.

Sound source localization identifies the positions of sound sources and speakers using acoustic sensors such as a microphone array. It is used in a variety of applications:

- robot-related systems, e.g., a system with a humanoid or mobile robot that localizes a user as a sound source and approaches the user;
- closed-circuit surveillance systems, e.g., a system that treats a sound source as a target, localizes it, and films it;
- stereophonic sound.

Estimating the location of a sound source in three-dimensional space has been approached in various ways, including microphone arrays and room-oriented microphone sets. Among these, the microphone array approach has attracted particular attention because it suits robots and allows accurate localization.

It works in two stages. First, it estimates the time delay of arrival (TDOA) between the two signals received by each microphone pair. Then it estimates the sound source position from the estimated TDOA values and the geometric relationships between the microphone pairs. Locating a sound source in three-dimensional space this way requires at least four microphones.

The number of microphones directly affects both the size of the localization hardware and the amount of computation. A technique that can estimate the three-dimensional position of a sound source with fewer microphones is therefore needed.
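The sketch below makes the conventional TDOA idea concrete. It computes the delay that one microphone pair would observe for a point source in free field. It assumes straight-line propagation and a speed of sound of 343 m/s; both are assumptions, not values from this document. The function name `tdoa` is hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly room temperature


def tdoa(source, mic_a, mic_b, c=SPEED_OF_SOUND):
    """Arrival time at mic_b minus arrival time at mic_a, in seconds.

    Free-field, direct-path model (an assumption): the delay is the
    difference in straight-line path length divided by the speed of sound.
    A positive value means the sound reaches mic_a first.
    """
    return (math.dist(source, mic_b) - math.dist(source, mic_a)) / c
```

Usage: take microphones at (0.1, 0, 0) and (-0.1, 0, 0).

- A source at (5, 0, 0), on the pair axis, gives 0.2/343 s.
- The sources (3, 4, 0) and (3, 0, 4) give identical delays.

A single pair therefore constrains the source only to a surface of equal path difference (one sheet of a hyperboloid). This is why the conventional approach combines several pairs.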

An object of the present invention is to provide an apparatus and a method that can estimate the three-dimensional position of a sound source using a small number of microphones.

To achieve this object, one aspect of the present invention provides a sound source localization apparatus that includes:

- an acquisition unit that acquires the n-th (n = 1 to 3) cross-correlation value sequence for the n-th microphone pair obtained from a combination of three microphones;
- a mapping function application unit that, according to a predetermined n-th mapping function, assigns each cross-correlation value of the n-th sequence to the corresponding coordinate of the n-th coordinate system, which is set with respect to the n-th microphone pair and includes an azimuth angle and an elevation angle;
- an estimation unit that estimates the direction of a sound source based on the assignment results of the first to third cross-correlation value sequences.

Here, the n-th mapping function represents the correspondence between the time delay of arrival for the n-th microphone pair and the coordinates of the n-th coordinate system.

Preferably, the platform on which the microphones are mounted is vertically asymmetric with respect to the plane formed by the microphones.

Preferably, the estimation unit includes:

- a coordinate transformation unit that transforms the assignment result of the n-th cross-correlation value sequence based on the relationship between a reference coordinate system and the n-th coordinate system, so that each coordinate of the reference coordinate system is assigned the corresponding cross-correlation value of the n-th (n = 1 to 3) sequence;
- a determination unit that finds the coordinate of the reference coordinate system at which the sum of the assigned cross-correlation values is largest, and takes the azimuth and elevation angles of that coordinate as the direction of the sound source.
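The determination unit's rule fits in a few lines: add the three maps coordinate by coordinate and take the coordinate with the largest total. This minimal sketch assumes each map is a dictionary from (azimuth, elevation) in degrees to a cross-correlation value, already expressed in the reference coordinate system. The function name is hypothetical.

```python
def estimate_direction(reference_grids):
    """Sum the per-pair maps coordinate by coordinate.

    reference_grids: iterable of dicts mapping (azimuth_deg, elevation_deg)
    to a cross-correlation value, all in the reference coordinate system.
    Returns the coordinate with the largest summed value, and that sum.
    """
    total = {}
    for grid in reference_grids:
        for coord, value in grid.items():
            total[coord] = total.get(coord, 0.0) + value
    best = max(total, key=total.get)
    return best, total[best]
```

Usage: consider a toy case in which the third pair alone peaks at (240°, -30°), but all three pairs give moderate values at (120°, 30°). The summed map selects (120°, 30°), because a coordinate supported by every pair outweighs one favored by a single pair.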

To achieve this object, another aspect of the present invention provides a sound source localization apparatus that includes:

- an acquisition unit that acquires the n-th (n = 1 to 3) cross-correlation value sequence for the n-th microphone pair obtained from a combination of three microphones;
- a mapping function application unit that, according to a predetermined n-th mapping function, assigns each cross-correlation value of the n-th sequence to the corresponding coordinate of a reference coordinate system that includes an azimuth angle and an elevation angle;
- an estimation unit that estimates the direction of a sound source based on the assignment results of the first to third cross-correlation value sequences.

Here, the n-th mapping function represents the correspondence between the time delay of arrival for the n-th microphone pair and the coordinates of the reference coordinate system.

Preferably, the estimation unit finds the coordinate of the reference coordinate system at which the sum of the assigned cross-correlation values is largest, and takes the azimuth and elevation angles of that coordinate as the direction of the sound source.
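In this second aspect, the rotation between coordinate systems is folded into the mapping function itself. Suppose the offset between a pair's coordinate system and the reference coordinate system is a pure azimuth shift. Then a reference coordinate (az, el) corresponds to the pair coordinate ((az - offset) mod 360°, el), and the composed mapping can be sketched as below. The free-field pair mapping and all names are hypothetical stand-ins.

```python
import math


def pair_mapping(az_deg, el_deg, spacing=0.2, c=343.0):
    """Hypothetical pair-frame mapping function (free field, no platform)."""
    return spacing * math.cos(math.radians(el_deg)) * math.cos(math.radians(az_deg)) / c


def reference_mapping(pair_map, azimuth_offset_deg):
    """Express a pair's mapping function directly in reference coordinates.

    Assumes the pair frame differs from the reference frame only by an
    azimuth offset: reference (az, el) == pair ((az - offset) mod 360, el).
    """
    def mapping(az_deg, el_deg):
        return pair_map((az_deg - azimuth_offset_deg) % 360, el_deg)
    return mapping
```

Usage: with an offset of -120°, the composed mapping at reference coordinate (240°, 10°) returns the same TDOA as the pair mapping at (0°, 10°). Values therefore land directly in the reference coordinate system, and no separate coordinate transformation step is needed.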

To achieve this object, yet another aspect of the present invention provides a sound source localization method that includes:

- acquiring the n-th (n = 1 to 3) cross-correlation value sequence for the n-th microphone pair obtained from a combination of three microphones;
- assigning, according to a predetermined n-th mapping function, each cross-correlation value of the n-th sequence to the corresponding coordinate of the n-th coordinate system, which is set with respect to the n-th microphone pair and includes an azimuth angle and an elevation angle;
- estimating the direction of a sound source based on the assignment results of the first to third cross-correlation value sequences.

Here, the n-th mapping function represents the correspondence between the time delay of arrival for the n-th microphone pair and the coordinates of the n-th coordinate system.

Preferably, the estimating includes:

- transforming the assignment result of the n-th cross-correlation value sequence based on the relationship between a reference coordinate system and the n-th coordinate system, so that each coordinate of the reference coordinate system is assigned the corresponding cross-correlation value of the n-th (n = 1 to 3) sequence;
- finding the coordinate of the reference coordinate system at which the sum of the assigned cross-correlation values is largest, and taking the azimuth and elevation angles of that coordinate as the direction of the sound source.

To achieve this object, yet another aspect of the present invention provides a sound source localization method that includes:

- acquiring the n-th (n = 1 to 3) cross-correlation value sequence for the n-th microphone pair obtained from a combination of three microphones;
- assigning, according to a predetermined n-th mapping function, each cross-correlation value of the n-th sequence to the corresponding coordinate of a reference coordinate system that includes an azimuth angle and an elevation angle;
- estimating the direction of a sound source based on the assignment results of the first to third cross-correlation value sequences.

Here, the n-th mapping function represents the correspondence between the time delay of arrival for the n-th microphone pair and the coordinates of the reference coordinate system.

Preferably, the estimating includes finding the coordinate of the reference coordinate system at which the sum of the assigned cross-correlation values is largest, and taking the azimuth and elevation angles of that coordinate as the direction of the sound source.

To achieve this object, yet another aspect of the present invention provides a computer-readable recording medium storing a program that causes a computer to execute the sound source localization method described above.

The embodiments of the present invention presented above may provide effects that include the following advantages. This does not mean that every embodiment must include all of them, so the scope of the present invention should not be understood as limited by them.

The three-dimensional position of a sound source can be estimated even with a small number of microphones. This reduces the hardware size and the amount of computation needed for sound source localization.

The embodiments are described only to illustrate the structure and function of the present invention, so the scope of the invention should not be construed as limited by them. Because the embodiments can be modified in various ways and take various forms, the invention should be understood to include equivalents that realize its technical idea.

The terms used in the present invention should be understood as follows.

Terms such as "first" and "second" distinguish one component from another, and the scope of the present invention should not be limited by these terms. For example, a first component may be called a second component, and likewise a second component may be called a first component.

Singular expressions in the present invention include plural expressions unless the context clearly indicates otherwise. Terms such as "include" or "have" specify that the stated features, numbers, steps, operations, components, parts, or combinations thereof are present. They do not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless the context clearly requires a particular order, the steps described in the present invention may occur in a different order from the one stated. They may occur in the stated order, substantially simultaneously, or in reverse order.

Unless otherwise defined, all terms used herein have the meaning commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should be read consistently with their meaning in the related art. They should not be given an idealized or overly formal meaning unless this application expressly defines them so.

FIG. 1 is a block diagram of a sound source localization apparatus according to an embodiment of the present invention.

Referring to FIG. 1, the apparatus according to this embodiment includes an acquisition unit 100, a mapping function application unit 110, and an estimation unit 120.

The acquisition unit 100 acquires the n-th (n = 1 to 3) cross-correlation value sequence for the n-th microphone pair obtained from a combination of three microphones. It may receive the sequences from a separate device that estimates them, or it may estimate them itself. The latter method is described later with reference to FIG. 8.

FIG. 2 shows an example microphone array and platform used to describe this embodiment. In this specification, the platform 200 is the device, object, or structure on which the microphone array is mounted. In a humanoid robot the array is usually mounted on the head, so the head corresponds to the platform 200, although the platform is not limited to this.

FIG. 2 shows a microphone array made up of a first microphone m1, a second microphone m2, and a third microphone m3, mounted on the platform 200.

For convenience, this specification names the microphone pairs as follows:

- first microphone pair: m1 and m2;
- second microphone pair: m2 and m3;
- third microphone pair: m3 and m1.

The cross-correlation value sequence for the first microphone pair consists of the cross-correlation values R₁₂(τ), as given by Equation 1.

In Equation 1:

- s₁[n] and s₂[n] are the n-th digital samples of the signals received at the first microphone m1 and the second microphone m2, respectively;
- T_s is the time interval between samples, set by the sampling frequency used when the received analog signals are converted to digital signals;
- N is the correlation window size.

m is the correlation offset (correlation lag), and M specifies the range of correlation offsets.
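Equation 1 itself does not survive in this text, so the form below is an assumption consistent with the symbols just defined: R₁₂(m·T_s) = Σ_{n=0}^{N-1} s₁[n]·s₂[n+m] for m = -M, ..., M, with samples outside s₂ treated as zero. This is a minimal, unnormalized time-domain sketch. The function name is hypothetical, and the parameter names mirror the symbols above.

```python
def cross_correlation(s1, s2, N, M, Ts):
    """Return [(tau, R)] for lags m = -M..M, with tau = m * Ts.

    Assumed form: R(m * Ts) = sum_{n=0}^{N-1} s1[n] * s2[n + m].
    s1 must hold at least N samples; indices outside s2 contribute zero.
    """
    seq = []
    for m in range(-M, M + 1):
        acc = 0.0
        for n in range(N):
            k = n + m
            if 0 <= k < len(s2):
                acc += s1[n] * s2[k]
        seq.append((m * Ts, acc))
    return seq
```

Usage: if s₂ is s₁ delayed by three samples, the largest value occurs at m = 3, i.e. τ = 3·T_s. The conventional method would take this lag as the pair's TDOA.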

In the related art, the τ at which the cross-correlation values from Equation 1 reach their maximum is taken as the TDOA of that microphone pair. The sound source position is then found from the TDOAs of several pairs and their geometric relationships. The cross-correlation values thus serve only to determine the TDOA, and most of them play no part in the subsequent localization processing.

The embodiments described below instead use every cross-correlation value, not just the maximum. Each value is treated as a weight: either the likelihood that a sound source lies at the corresponding coordinate, or an input to deciding whether one does.
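The difference between the two uses of the sequence can be sketched as follows. This is a minimal illustration with hypothetical function names, where the sequence is a list of (lag, value) pairs. The example values are made up to show two comparable peaks, such as a direct path and a reflection might produce.

```python
def conventional_tdoa(seq):
    """Prior-art use: keep only the lag with the largest cross-correlation value."""
    tau, _ = max(seq, key=lambda item: item[1])
    return tau


def lag_weights(seq):
    """Use described here: every lag keeps its cross-correlation value,
    to serve later as a weight for the coordinates that lag maps to."""
    return {tau: r for tau, r in seq}
```

Usage: for the sequence [(-2, 0.1), (-1, 0.9), (0, 0.2), (1, 0.95), (2, 0.0)], peak picking returns lag 1 and discards the nearly equal value at lag -1. The weight map keeps both, so each can still contribute to the coordinates it maps to.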

Equation 1 is only a discrete-time example that explains the concept of the cross-correlation value. A person skilled in the art will understand that cross-correlation values can also be estimated in the analog signal domain or in the frequency domain.

Many estimation methods are known, such as the one described in C. H. Knapp and G. C. Carter, "The generalized correlation method for estimation of time delay," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-24, no. 4, 1976. The present invention is not restricted to any particular cross-correlation estimation algorithm.
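As an example of a frequency-domain estimator, the sketch below implements generalized cross-correlation with the phase transform (PHAT) weighting, one of the weightings discussed in the generalized correlation framework cited above. It is a minimal sketch with these assumptions:

- it uses a naive O(N²) DFT instead of an FFT, to stay dependency-free;
- it treats the delay as circular;
- the names `dft` and `gcc_phat` are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; the inverse includes 1/N."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat(x1, x2, eps=1e-12):
    """Circular GCC with PHAT weighting.

    Returns (lags, values), with lags in samples from -(n//2 - 1) to n//2.
    A peak at lag d means x2 is (circularly) x1 delayed by d samples.
    """
    n = len(x1)
    X1, X2 = dft(x1), dft(x2)
    cross = [a.conjugate() * b for a, b in zip(X1, X2)]
    weighted = [g / (abs(g) + eps) for g in cross]  # PHAT: keep phase only
    r = [v.real for v in dft(weighted, inverse=True)]
    lags = [k if k <= n // 2 else k - n for k in range(n)]
    return lags, r
```

PHAT discards spectral magnitude and keeps only phase, which sharpens the correlation peak. In practice a zero-padded FFT would replace the naive DFT, which also avoids circular wrap-around.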

The mapping function application unit 110 assigns each cross-correlation value of the n-th (n = 1 to 3) cross-correlation value sequence to the corresponding coordinate of the n-th coordinate system, according to a predetermined n-th mapping function.

Here, the n-th coordinate system is set with respect to the n-th microphone pair and includes an azimuth angle and an elevation angle. The n-th mapping function represents the correspondence between the time delay of arrival for the n-th microphone pair and the coordinates of the n-th coordinate system.

FIG. 3 illustrates the mapping function used in an embodiment of the present invention.

The second and third microphone pairs and their mapping functions work the same way, so the following description uses the first microphone pair.

In FIG. 3, the coordinate system of the first microphone pair (the first coordinate system) measures azimuth in the plane formed by the first to third microphones m1, m2, and m3. Its origin is the midpoint between m1 and m2.

In open space, as shown in FIG. 3, the positions of sound sources that produce a given TDOA at the first microphone pair can be approximated by the rightmost purple circle.

The point on the purple circle marked with a green star is expressed in the first coordinate system by an azimuth angle and an elevation angle. The azimuth angle ranges from 0° to 360°, and the elevation angle ranges from -90° to 90°.
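A minimal analytic stand-in for the first mapping function can be written for the idealized case with no platform. It makes these assumptions:

- far field and free field: plane waves, no diffraction around the platform;
- origin at the pair midpoint, with azimuth measured from the pair axis in the microphone plane;
- m1 on the positive side of that axis.

Under these assumptions the delay depends only on cos(elevation)·cos(azimuth). The geometry, the 343 m/s speed of sound, and the function name are all assumptions for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def free_field_tdoa(azimuth_deg, elevation_deg, spacing, c=SPEED_OF_SOUND):
    """Far-field TDOA for one pair, in that pair's own coordinate system.

    Hypothetical geometry: pair axis along azimuth 0 in the array plane,
    first microphone on the positive side, no platform. Returns the arrival
    time at the second microphone minus that at the first, in seconds.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return spacing * math.cos(el) * math.cos(az) / c
```

Because only cos(el)·cos(az) matters, every direction on a cone around the pair axis yields the same delay; this is the circle of equal TDOA described above. The model also maps elevations e and -e to the same delay, which is the symmetry that a vertically asymmetric platform breaks.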

In other words, the first mapping function gives, for each TDOA value, the positions (azimuth and elevation angles) of the sound sources that produce it. This correspondence may be measured experimentally on the platform 200, or the mapping function may be derived by modeling the platform 200.

The first mapping function holds this correspondence for every TDOA value in the range of interest, and the correlation lag of each cross-correlation value corresponds to a TDOA value. The mapping function application unit 110 can therefore assign each cross-correlation value of the first sequence to coordinates (azimuth angle, elevation angle) of the first coordinate system.
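One way to realize this correspondence in software is to tabulate it once. Evaluate the forward mapping from (azimuth, elevation) to TDOA on a grid, then file each coordinate under the nearest correlation lag. The sketch assumes a 5° grid and lags rounded to the nearest sample. It uses a free-field forward function as a hypothetical stand-in for the modelled or measured mapping function, and all names are illustrative.

```python
import math
from collections import defaultdict


def free_field_tdoa(az_deg, el_deg, spacing=0.2, c=343.0):
    """Hypothetical stand-in for a modelled or measured mapping function."""
    return spacing * math.cos(math.radians(el_deg)) * math.cos(math.radians(az_deg)) / c


def build_lag_table(forward, ts, az_step=5, el_step=5):
    """Invert a forward mapping (az, el) -> TDOA into lag -> [(az, el), ...].

    Each grid coordinate is filed under the lag index nearest its TDOA,
    so a single lag generally collects a whole ring of coordinates.
    """
    table = defaultdict(list)
    for az in range(0, 360, az_step):
        for el in range(-90, 91, el_step):
            lag = round(forward(az, el) / ts)
            table[lag].append((az, el))
    return dict(table)
```

Usage: with a 0.2 m spacing and 48 kHz sampling (both assumed values), lags span -28 to 28. Lag 0 collects the whole ring of coordinates broadside to the pair.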

Suppose the platform carrying m1, m2, and m3 is vertically asymmetric with respect to the microphone plane, as in FIG. 2. Then the positions of sound sources that produce the same TDOA may not form a circle, unlike the purple circle of FIG. 3.

In FIG. 2, consider a sound source below the plane that extends the circle denoted by reference numeral 210, i.e., in a direction with a negative elevation angle. Its sound travels a different path from that of a source above the plane. Its TDOA may therefore differ from that of a source above the plane at an elevation angle of the same magnitude.

FIG. 4 illustrates the correspondence given by the mapping function for a platform that is vertically asymmetric with respect to the microphone array plane.

In FIG. 4, the y-axis is the TDOA value, the x-axis is the azimuth angle, and the color of each curve encodes the elevation angle. The figure thus shows the correspondence between TDOA values and (azimuth, elevation) coordinates.

FIG. 4 shows that elevation angles of equal magnitude and opposite sign mostly map to different TDOAs. This property comes from the platform's asymmetric structure. An asymmetric structure here is one in which the platform's signal reception characteristics above the microphone array plane differ from those below it, owing to differences in surface shape, material, and so on.

The mapping function illustrated in FIG. 4 can likewise be obtained either by modeling that accounts for the platform's surface structure and material, or experimentally.
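When the mapping function is obtained experimentally, the lag table can be built directly from calibration data without assuming any symmetry. A platform that receives sound differently above and below the array plane is then represented exactly as measured. The sketch assumes the measurements are available as a dictionary from (azimuth, elevation) in degrees to TDOA in seconds. The function name and data layout are hypothetical.

```python
from collections import defaultdict


def lag_table_from_measurements(measured_tdoa, ts):
    """Build lag -> [(az, el), ...] from calibration data.

    measured_tdoa: dict mapping (azimuth_deg, elevation_deg) to a TDOA in
    seconds, e.g. measured by playing a test sound from each grid
    direction around the platform. No symmetry is assumed, so +el and -el
    may fall on different lags when the platform is vertically asymmetric.
    """
    table = defaultdict(list)
    for coord, tau in sorted(measured_tdoa.items()):
        table[round(tau / ts)].append(coord)
    return dict(table)
```

For example, suppose calibration (with hypothetical numbers) put (30°, 20°) at 10 samples and (30°, -20°) at 13 samples. The two elevations would then land on different lags, which is the property FIG. 4 is described as showing.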

FIG. 5 illustrates how the mapping function is applied according to an embodiment of the present invention.

In FIG. 5, the left side shows the first coordinate system and the right side shows the first cross-correlation value sequence.

Each cross-correlation value in the first sequence has a correlation offset τ. Through the first mapping function, this offset corresponds to coordinates (azimuth angle, elevation angle) of the first coordinate system. Therefore, as shown in FIG. 5, each cross-correlation value in the first sequence can be assigned to its corresponding coordinates in the first coordinate system.
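The assignment shown in FIG. 5 can be sketched as follows: each grid coordinate looks up the lag that the mapping function assigns to it and takes that lag's cross-correlation value. These details are assumptions for illustration, not details given in the text:

- a free-field mapping function;
- a 5° grid with nearest-sample rounding;
- a dictionary layout for the sequence, from lag index to value.

```python
import math


def free_field_tdoa(az_deg, el_deg, spacing=0.2, c=343.0):
    """Hypothetical mapping function (free field, no platform)."""
    return spacing * math.cos(math.radians(el_deg)) * math.cos(math.radians(az_deg)) / c


def assign_to_grid(seq, mapping, ts, az_step=5, el_step=5):
    """Give each (az, el) grid coordinate the cross-correlation value of
    the lag that the mapping function associates with it.

    seq: dict mapping lag index -> cross-correlation value.
    Coordinates whose lag is missing from seq receive 0.0.
    """
    grid = {}
    for az in range(0, 360, az_step):
        for el in range(-90, 91, el_step):
            lag = round(mapping(az, el) / ts)
            grid[(az, el)] = seq.get(lag, 0.0)
    return grid
```

Because one lag maps to a ring of coordinates, a single sharp correlation peak lights up that entire ring in the first coordinate system. The direction becomes unambiguous only once the maps of all three pairs are combined.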

The estimation unit 120 estimates the direction of the sound source based on the assignment results of the first to third cross-correlation value sequences.

Referring to FIG. 1, the estimation unit 120 includes a coordinate transformation unit 122 and a determination unit 124.

The coordinate transformation unit 122 transforms the assignment result of the n-th cross-correlation value sequence based on the relationship between the reference coordinate system and the n-th coordinate system. The reference coordinate system is the apparatus's common coordinate system; for a robot it is usually set relative to the robot's front. The transformation assigns the corresponding cross-correlation value of the n-th (n = 1 to 3) sequence to each coordinate of the reference coordinate system.

FIG. 6 is a diagram illustrating the coordinate transformation process of the coordinate transformation unit 122.

For convenience, as illustrated in FIG. 6, consider the case in which the first to third microphones m1, m2, and m3 form an equilateral triangle and the reference coordinate system is the coordinate system of the third microphone pair, that is, the third coordinate system.

In this case, the assignment result of the third cross-correlation value sequence needs no coordinate transformation, while the assignment results of the first and second cross-correlation value sequences must be transformed. Relative to the third coordinate system, the first and second coordinate systems have horizontal-angle offsets of -120° and 120°, respectively. Therefore, writing φ for the horizontal angle and θ for the elevation angle, the cross-correlation value assigned to coordinate (φ, θ) of the first coordinate system is assigned to coordinate (φ - 120°, θ) of the reference coordinate system, that is, the third coordinate system. The second coordinate system is handled on the same principle.
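In the equilateral case, this transformation amounts to shifting each pair's map along the horizontal-angle axis. The following is a minimal sketch, assuming maps sampled on a uniform horizontal-angle grid that covers exactly 360° (axis 0) by elevation angle (axis 1); the function name, the grid layout, and the `OFFSETS_DEG` table are hypothetical, with the offsets taken from the FIG. 6 example (the second offset as 120°, consistent with the equilateral geometry). Offsets that are not multiples of the grid step, as may occur for non-equilateral arrays, are simply rounded to the nearest step here.

```python
import numpy as np


def to_reference(pair_map, az_offset_deg, az_step_deg=1.0):
    """Move a map from a pair's coordinate system into the reference coordinate
    system, assuming the two differ only by a horizontal-angle offset.
    Axis 0 is the horizontal angle (full 360 degrees), axis 1 the elevation angle."""
    n_az = pair_map.shape[0]
    if not np.isclose(n_az * az_step_deg, 360.0):
        raise ValueError("horizontal-angle grid must cover exactly 360 degrees")
    shift = int(round(az_offset_deg / az_step_deg))
    # np.roll gives out[(i + shift) % n_az] = pair_map[i], so the value stored
    # at (phi, theta) lands at (phi + offset, theta), wrapping around 360 degrees.
    return np.roll(pair_map, shift, axis=0)


# Illustrative offsets for the FIG. 6 example (third pair = reference).
OFFSETS_DEG = {1: -120.0, 2: 120.0, 3: 0.0}
```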

Although FIG. 6 illustrates a simple equilateral-triangle arrangement for ease of understanding, those skilled in the art will understand that the present invention can also be applied to microphone arrays that do not form an equilateral triangle.

FIG. 7 is a diagram illustrating the results of the coordinate transformation unit 122, that is, the cross-correlation value assignment results.

In FIG. 7, the upper-left plot shows the result of the coordinate transformation unit 122 for the first microphone pair, the upper-right plot shows the result for the second microphone pair, and the lower-left plot shows the result for the third microphone pair.

In the four plots of FIG. 7, the x-axis and y-axis represent the horizontal angle and the elevation angle in the reference coordinate system, respectively, and the color represents the magnitude of the cross-correlation value: the closer to red, the larger the value, and the closer to blue, the smaller the value. In other words, the four plots of FIG. 7 have the same structure as the left plot of FIG. 5 and differ only in the coordinate system and in how the cross-correlation values are represented.

The determination unit 124 detects, among the coordinates of the reference coordinate system, the coordinate at which the sum of the assigned cross-correlation values is maximum, and determines the horizontal angle and elevation angle of that coordinate as the direction of the sound source. Referring to FIG. 7, for each coordinate of the reference coordinate system, the determination unit 124 adds the cross-correlation values at the corresponding position in the upper-left, upper-right, and lower-left plots to produce the result shown in the lower-right plot, and then determines the sound source direction from the coordinate having the largest value in that result. In the lower-right plot of FIG. 7, the coordinate whose color is closest to red has a horizontal angle of 120° and an elevation angle of 50°, so the determination unit 124 determines that the sound source lies at a horizontal angle of 120° and an elevation angle of 50°.
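The summation and maximum search performed by the determination unit 124 can be sketched in a few lines. This is an illustrative sketch only, assuming the per-pair maps have already been brought into the reference coordinate system and stacked into one array; `estimate_direction` is a hypothetical name.

```python
import numpy as np


def estimate_direction(ref_maps, az_grid, el_grid):
    """Sum per-pair maps that are already in the reference coordinate system and
    return the (horizontal angle, elevation angle) of the largest sum, plus the
    summed map itself. ref_maps has shape (n_pairs, len(az_grid), len(el_grid))."""
    total = np.sum(np.asarray(ref_maps, dtype=float), axis=0)
    i, j = np.unravel_index(np.argmax(total), total.shape)
    return az_grid[i], el_grid[j], total
```

A coordinate wins only if it scores well for all pairs together, so a strong value from a single pair elsewhere in the grid (for example, a spurious peak) is outweighed by a coordinate on which all three pairs agree.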

FIG. 8 shows a more specific embodiment in which the embodiment of FIG. 1 is applied to a robot.

Referring to FIG. 8, three microphones m1, m2, and m3 are mounted on a platform 800 corresponding to the robot's head, and the platform 800 is connected to the robot body 810.

The microphones m1, m2, and m3 convert the sensed sound into electrical signals and provide them to the acquisition unit 820. The module with reference numeral 820 corresponds to the acquisition unit 100 of FIG. 1; in particular, in this embodiment the cross-correlation values are estimated in the robot itself.

Since the signals from the microphones m1, m2, and m3 are generally weak, they are amplified in the module 820. The amplified signals are converted into digital samples by an analog-to-digital converter (ADC). The first to third cross-correlation value estimators then obtain the first to third cross-correlation value sequences using any of various known estimation schemes.
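The estimation scheme is left open here. One widely used choice is the generalized cross-correlation with phase transform (GCC-PHAT); the sketch below shows it purely as an example of such a known scheme, not as the method used in the patent. The function name is hypothetical, and the output is laid out as lags -max_lag..max_lag so that index k holds lag k - max_lag.

```python
import numpy as np


def gcc_phat(x, y, max_lag):
    """GCC-PHAT cross-correlation of x relative to y for lags -max_lag..max_lag.
    A peak at positive lag d means x lags y by d samples."""
    n = len(x) + len(y)
    nfft = 1 << (n - 1).bit_length()      # next power of two, avoids wrap-around
    X = np.fft.rfft(x, nfft)
    Y = np.fft.rfft(y, nfft)
    cross = X * np.conj(Y)
    cross /= np.maximum(np.abs(cross), 1e-12)  # PHAT weighting: keep phase only
    r = np.fft.irfft(cross, nfft)
    # Circular result: negative lags sit at the end of r; move them to the front.
    return np.concatenate((r[-max_lag:], r[:max_lag + 1]))
```

In a setup like FIG. 8, `max_lag` would typically be chosen from the microphone spacing, the speed of sound, and the ADC sampling rate, since no physical delay can exceed spacing divided by the speed of sound.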

The modules with reference numerals 830, 840, and 850 in FIG. 8 correspond to the mapping function application unit 110, the coordinate transformation unit 122, and the determination unit 124 of FIG. 1, respectively.

The combining unit in module 850 combines the three cross-correlation values assigned to each coordinate of the reference coordinate system, and the sound source direction determination unit determines the horizontal angle and elevation angle of the coordinate with the largest combined value as the direction of the sound source. For convenience, the foregoing description used simple summation as the combining scheme; however, those skilled in the art will understand that when a priori information is available (for example, information that the signal received at a particular microphone is strong, or that the approximate range of the sound source is known), a weighted sum, in which each cross-correlation value is weighted before summation, may be used instead of a simple sum.
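Both combining variants can be expressed with one helper. This is a minimal sketch under assumed conventions: maps are stacked as (pairs, horizontal angle, elevation angle), `combine` is a hypothetical name, and how the weights are derived from a priori information is left to the caller.

```python
import numpy as np


def combine(ref_maps, weights=None):
    """Combine per-pair maps in the reference coordinate system.
    weights=None gives the simple sum; a length-n_pairs vector weights each pair
    (e.g. favour a pair whose microphones receive a strong signal); an array
    shaped like ref_maps weights every cross-correlation value individually
    (e.g. zero out directions outside a known approximate source range)."""
    maps = np.asarray(ref_maps, dtype=float)  # (n_pairs, n_az, n_el)
    if weights is None:
        return maps.sum(axis=0)
    w = np.asarray(weights, dtype=float)
    if w.ndim == 1:
        w = w[:, None, None]  # one weight per microphone pair, broadcast over grid
    return np.sum(w * maps, axis=0)
```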

FIG. 9 is a block diagram illustrating a sound source localization apparatus according to another embodiment of the present invention.

Referring to FIG. 9, the apparatus according to this embodiment includes an acquisition unit 900, a mapping function application unit 910, and an estimator 920.

The acquisition unit 900 is the same as the acquisition unit 100 of FIG. 1, so its description is omitted.

The mapping function application unit 910 assigns each cross-correlation value of the n-th (n = 1 to 3) cross-correlation value sequence to the corresponding coordinate of the reference coordinate system according to a predetermined n-th mapping function. Here, the n-th mapping function represents the correspondence between the arrival time delay for the n-th microphone pair and the coordinates of the reference coordinate system. That is, the first to third mapping functions used by the mapping function application unit 910 of this embodiment already incorporate the coordinate transformation performed by the coordinate transformation unit 122 of FIG. 1.
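Folding the coordinate transformation into the mapping functions means each pair can be described by a precomputed table of lag indices over the reference grid, so the runtime work reduces to a table lookup followed by the sum and maximum search of the estimator 920. A minimal sketch, again assuming a free-field, far-field delay model as a stand-in for the real mapping function and assuming each pair's axis lies at a known horizontal angle `pair_az_deg` in the reference coordinate system; all names are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value


def build_reference_lut(pair_az_deg, spacing_m, fs_hz, max_lag, az_grid, el_grid):
    """Precompute, for one microphone pair, the index into its cross-correlation
    sequence (lags -max_lag..max_lag) for every reference-grid coordinate."""
    az, el = np.meshgrid(np.radians(az_grid), np.radians(el_grid), indexing="ij")
    tau_s = spacing_m * np.cos(el) * np.cos(az - np.radians(pair_az_deg)) / SPEED_OF_SOUND
    lag = np.clip(np.rint(tau_s * fs_hz).astype(int), -max_lag, max_lag)
    return lag + max_lag


def localize(xcorrs, luts, az_grid, el_grid):
    """Look up each pair's values through its table, sum them, and return the
    (horizontal angle, elevation angle) with the largest sum."""
    total = sum(r[lut] for r, lut in zip(xcorrs, luts))
    i, j = np.unravel_index(np.argmax(total), total.shape)
    return az_grid[i], el_grid[j]
```

With a purely planar free-field model, a source above the microphone plane and its mirror image below it produce identical delays; this is plausibly one reason the claims recite a platform that is vertically asymmetric with respect to the plane of the microphones, whose measured mapping functions would not share that symmetry.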

The estimator 920 estimates the direction of the sound source based on the assignment results of the first to third cross-correlation value sequences. According to an embodiment, the estimator 920 detects, among the coordinates of the reference coordinate system, the coordinate at which the sum of the assigned cross-correlation values is maximum, and determines the horizontal angle and elevation angle of that coordinate as the direction of the sound source.

FIG. 10 is a flowchart illustrating a sound source localization method according to an embodiment of the present invention.

The embodiment of FIG. 10 is described below with reference to FIG. 1. A step-by-step (time-series) implementation of the embodiment of FIG. 1 corresponds to this embodiment, so the description given for FIG. 1 applies here as well.

In operation S1000, the acquisition unit 100 obtains the n-th (n = 1 to 3) cross-correlation value sequence for the n-th microphone pair obtained by combining the three microphones.

In operation S1010, the mapping function application unit 110 assigns, according to a predetermined n-th (n = 1 to 3) mapping function, each cross-correlation value of the n-th cross-correlation value sequence to the corresponding coordinate of the n-th coordinate system, which is set with respect to the n-th microphone pair and includes a horizontal angle and an elevation angle. The n-th mapping function represents the correspondence between the arrival time delay for the n-th microphone pair and the coordinates of the n-th coordinate system.

In operation S1020, the estimator 120 estimates the direction of the sound source based on the assignment results of the first to third cross-correlation value sequences. According to an embodiment, the coordinate transformation unit 122 coordinate-transforms the assignment result of the n-th cross-correlation value sequence based on the relationship between the reference coordinate system and the n-th coordinate system, so that each coordinate of the reference coordinate system is assigned the corresponding cross-correlation value of the n-th (n = 1 to 3) sequence. The determination unit 124 then detects, among the coordinates of the reference coordinate system, the coordinate at which the sum of the assigned cross-correlation values is maximum, and determines the horizontal angle and elevation angle of that coordinate as the direction of the sound source.

FIG. 11 is a flowchart illustrating a sound source localization method according to another embodiment of the present invention.

The embodiment of FIG. 11 is described below with reference to FIG. 9. A step-by-step (time-series) implementation of the embodiment of FIG. 9 corresponds to this embodiment, so the description given for FIG. 9 applies here as well.

In operation S1100, the acquisition unit 900 obtains the n-th (n = 1 to 3) cross-correlation value sequence for the n-th microphone pair obtained by combining the three microphones.

In operation S1110, the mapping function application unit 910 assigns, according to a predetermined n-th (n = 1 to 3) mapping function, each cross-correlation value of the n-th cross-correlation value sequence to the corresponding coordinate of the reference coordinate system. The n-th mapping function represents the correspondence between the arrival time delay for the n-th microphone pair and the coordinates of the reference coordinate system.

In operation S1120, the estimator 920 estimates the direction of the sound source based on the assignment results of the first to third cross-correlation value sequences. According to an embodiment, the estimator 920 detects, among the coordinates of the reference coordinate system, the coordinate at which the sum of the assigned cross-correlation values is maximum, and determines the horizontal angle and elevation angle of that coordinate as the direction of the sound source.

The present invention can also be embodied as computer-readable code on a computer-readable recording medium. A computer-readable recording medium includes any kind of recording device that stores data readable by a computer system. Examples include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, as well as implementations in the form of carrier waves (for example, transmission over the Internet). The computer-readable recording medium may also be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion. Functional programs, code, and code segments for implementing the present invention can be easily inferred by programmers skilled in the art to which the present invention pertains.

Although the apparatus of the present invention has been described with reference to the embodiments illustrated in the drawings for ease of understanding, these are merely exemplary, and those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible. Therefore, the true technical scope of protection of the present invention should be defined by the appended claims.

FIG. 2 shows a microphone array and a platform used to illustrate the present embodiment.

FIG. 3 is a diagram for explaining a mapping function used in an embodiment of the present invention.

FIG. 5 is a diagram for explaining a process of applying a mapping function according to an embodiment of the present invention.

FIG. 6 is a diagram illustrating the coordinate transformation process of the coordinate transformation unit.

FIG. 7 is a diagram illustrating the results of the coordinate transformation unit, that is, the cross-correlation value assignment results.

FIG. 9 is a block diagram illustrating a sound source localization apparatus according to another embodiment of the present invention.

Claims

An acquisition unit that obtains an n-th (n = 1 to 3) cross-correlation value sequence for an n-th microphone pair obtained by combining three microphones;

a mapping function application unit that assigns, according to a predetermined n-th (n = 1 to 3) mapping function, each cross-correlation value of the n-th cross-correlation value sequence to the corresponding coordinate of an n-th coordinate system, which is set with respect to the n-th microphone pair and includes a horizontal angle and an elevation angle; and

an estimator that estimates a direction of a sound source based on the assignment results of the first to third cross-correlation value sequences,

wherein, in the sound source localization apparatus, the n-th (n = 1 to 3) mapping function is a function representing the correspondence between the arrival time delay for the n-th microphone pair and the coordinates of the n-th coordinate system.

The apparatus of claim 1, wherein

the platform on which the microphones are installed has a vertically asymmetric structure with respect to the plane formed by the microphones.

The apparatus of claim 1, wherein the estimator comprises:

a coordinate transformation unit that coordinate-transforms the assignment result of the n-th cross-correlation value sequence based on the relationship between a reference coordinate system and the n-th coordinate system, so that each coordinate of the reference coordinate system is assigned the corresponding cross-correlation value of the n-th (n = 1 to 3) cross-correlation value sequence; and

a determination unit that detects, among the coordinates of the reference coordinate system, the coordinate at which the sum of the assigned cross-correlation values is maximum, and determines the horizontal angle and elevation angle of that coordinate as the direction of the sound source.

A mapping function application unit that assigns, according to a predetermined n-th (n = 1 to 3) mapping function, each cross-correlation value of the n-th cross-correlation value sequence to the corresponding coordinate of a reference coordinate system including a horizontal angle and an elevation angle; and

wherein the n-th (n = 1 to 3) mapping function is a function representing the correspondence between the arrival time delay for the n-th microphone pair and the coordinates of the reference coordinate system.

The apparatus of claim 4, wherein

The apparatus of claim 4, wherein the estimator

detects, among the coordinates of the reference coordinate system, the coordinate at which the sum of the assigned cross-correlation values is maximum, and determines the horizontal angle and elevation angle of that coordinate as the direction of the sound source.

Obtaining an n-th (n = 1 to 3) cross-correlation value sequence for an n-th microphone pair obtained by combining three microphones;

assigning, according to a predetermined n-th (n = 1 to 3) mapping function, each cross-correlation value of the n-th cross-correlation value sequence to the corresponding coordinate of an n-th coordinate system, which is set with respect to the n-th microphone pair and includes a horizontal angle and an elevation angle; and

estimating a direction of a sound source based on the assignment results of the first to third cross-correlation value sequences,

wherein the n-th (n = 1 to 3) mapping function is a function representing the correspondence between the arrival time delay for the n-th microphone pair and the coordinates of the n-th coordinate system.

The method of claim 7, wherein

the platform on which the microphones are installed has a vertically asymmetric structure with respect to the plane formed by the microphones.

The method of claim 7, wherein the estimating comprises:

coordinate-transforming the assignment result of the n-th cross-correlation value sequence based on the relationship between a reference coordinate system and the n-th coordinate system, so that each coordinate of the reference coordinate system is assigned the corresponding cross-correlation value of the n-th (n = 1 to 3) cross-correlation value sequence; and

detecting, among the coordinates of the reference coordinate system, the coordinate at which the sum of the assigned cross-correlation values is maximum, and determining the horizontal angle and elevation angle of that coordinate as the direction of the sound source.

Assigning, according to a predetermined n-th (n = 1 to 3) mapping function, each cross-correlation value of the n-th cross-correlation value sequence to the corresponding coordinate of a reference coordinate system including a horizontal angle and an elevation angle; and

The method of claim 10,

The method of claim 10, wherein the estimating comprises:

A computer-readable recording medium storing a program for executing the method of claim 7 on a computer.

A computer-readable recording medium storing a program for executing the method of claim 10 on a computer.