KR101189765B1 - Method and apparatus for classification sex-gender based on voice and video

Info

Publication number: KR101189765B1
Application number: KR1020080132626A
Authority: KR
Inventors: 김혜진; 윤호섭; 황대환
Original assignee: 한국전자통신연구원
Priority date: 2008-12-23
Filing date: 2008-12-23
Publication date: 2012-10-15
Also published as: KR20100073845A; JP4881980B2; JP2010152866A

Abstract

The present invention relates to a method and apparatus for identifying the gender and age of a specific person from input image information and voice information. More specifically, it relates to an identification apparatus and method that can accurately compute gender and age by combining voice recognition and face recognition while taking into account the correlation between gender information and age information.

A gender-age identification method according to the present invention includes: collecting image information and voice information; a voice-based gender and age determination step of extracting one or more feature values from the collected voice information and determining gender and age using the extracted feature values; a face-based gender and age determination step of extracting one or more feature values from the collected image information and determining gender and age using the extracted feature values; and finally determining the gender and age by combining the gender and age determined from the voice information with the gender and age determined from the face information.

Face recognition, voice recognition, gender identification, age identification, intelligent DB

Description

Gender-age discrimination method and device based on voice and video {METHOD AND APPARATUS FOR CLASSIFICATION SEX-GENDER BASED ON VOICE AND VIDEO}

The present invention is derived from research conducted as part of the IT source technology development program of the Ministry of Knowledge Economy and the Institute for Information Technology Advancement [Project management number: 2008-F-037-01, Project title: Development of u-robot HRI solutions and core component technology].

Conventional techniques for determining a user's gender and age include methods that use a personal identification means such as an electronic resident ID card, methods that use face recognition, and methods that use voice recognition.

Among the methods that use a personal identification means, the age recognition method using an electronic resident ID card (Korean Patent Laid-Open Publication No. 1999-0008679) is inconvenient in that each individual must always carry the identification means. In addition, identification means such as an electronic resident ID card can easily be lost, damaged, or forged.

In conventional face recognition methods used for gender-age determination, gender and age are determined from face image information alone, so it is difficult to reflect the characteristics of each individual and recognition accuracy is low. Likewise, in recognition methods based on voice recognition, gender and age are determined from voice information alone, so recognition accuracy drops when vocal characteristics are similar, as with women and children.

Moreover, conventional determination schemes based on face recognition or voice recognition do not determine age and gender in a way that reflects specificities such as the distribution of features differing by gender, or the distribution of gender features differing by age. As a result, their accuracy is low and their computational load is high.

The present invention is intended to solve the problems described above. Its object is to provide a gender-age identification method and apparatus that can improve recognition accuracy by exploiting the correlation between gender information and age information and by combining voice recognition and face recognition.

A gender-age identification method according to one aspect of the present invention includes: collecting image information and voice information; a voice-based gender and age determination step of extracting one or more feature values from the collected voice information and determining gender and age using the extracted feature values; a face-based gender and age determination step of extracting one or more feature values from the collected image information and determining gender and age using the extracted feature values; and finally determining the gender and age by combining the gender and age determined from the voice information with the gender and age determined from the face information.

A gender-age identification apparatus according to another aspect of the present invention includes: an input unit that collects image information and voice information; a voice processing unit that extracts feature values from the collected voice information and determines gender and age from the voice information using the extracted feature values; an image processing unit that extracts feature values from the collected image information and determines gender and age from the image information using the extracted feature values; and a final determination unit that finally determines the gender and age of the specific person by combining the gender and age determined by the image processing unit with the gender and age determined by the voice processing unit.

Because the present invention combines voice recognition and face recognition, recognition accuracy is improved over conventional methods that use voice recognition alone or face recognition alone.

In addition, the present invention recognizes age and gender in a way that reflects the correlation between gender information and age information, for example the specificity that the distribution of features used for age determination differs by gender, or that the distribution of gender features differs by age. It can therefore ensure higher accuracy than conventional recognition methods.

In addition, in feature extraction the present invention first groups the voice information according to a feature by which each input is easy to distinguish, and then extracts feature values for each resulting group in a way that reflects that group's characteristics. This secures identification accuracy and, by eliminating redundant computation, allows fast identification.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of an embodiment of a gender-age identification apparatus according to the present invention.

As shown in FIG. 1, the gender-age identification apparatus according to the present invention may include an input unit 10, an age-gender calculation unit 20, and an output unit 30.

The input unit 10 collects image information and voice information of a specific person.

The input unit 10 may include an image information acquisition means, such as a camera, capable of acquiring image information, and a sound information acquisition means, such as a speaker, capable of acquiring sound information.

The input unit 10 may also be configured to include a face extraction means that separately extracts only the face information of the specific person from the image information acquired by the image information acquisition means, and a voice extraction means that separately extracts only the voice information of the specific person from the sound information acquired by the sound information acquisition means. In this case, the feature extraction means of the age-gender calculation unit 20 do not need to extract face information and voice information from the image information and sound information each time, so fast computation is possible.

Such face extraction means and voice extraction means may be implemented using conventional face detection techniques. For example, the face extraction means may be implemented using knowledge-based methods, feature-based methods, template-matching methods, appearance-based methods, thermal infrared (IR) methods, three-dimensional face recognition methods, multi-modal methods, and the like.

The age-gender calculation unit 20 includes a voice processing unit 100 that determines age and gender based on voice information, an image processing unit 200 that determines age and gender based on image information, and a final determination unit 300 that determines age and gender by combining the calculation results of the voice processing unit 100 and the image processing unit 200.

The output unit 30 outputs the age and gender received from the age-gender calculation unit 20.
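The data flow through units 10, 20, and 30 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Estimate` type, its `confidence` field, and the rule of keeping the more confident estimate are all assumptions made for the sketch, since this passage does not specify how the final determination unit 300 combines the two results.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Estimate:
    gender: str        # e.g. "male" or "female" (illustrative labels)
    age_group: str     # e.g. "child" or "adult" (illustrative labels)
    confidence: float  # hypothetical score in [0, 1], not defined by the source


def final_determination(voice: Estimate, face: Estimate) -> Estimate:
    """Stand-in for final determination unit 300: keep the more confident estimate.

    The real combination rule is an assumption here.
    """
    return voice if voice.confidence >= face.confidence else face


def identify(audio, image,
             voice_unit: Callable[[object], Estimate],
             image_unit: Callable[[object], Estimate]) -> Estimate:
    """Stand-in for age-gender calculation unit 20: run both modalities, then combine."""
    return final_determination(voice_unit(audio), image_unit(image))
```

Passing the two processing units in as callables keeps the sketch independent of any particular classifier.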

The age-gender calculation unit 20 is described in detail below with reference to FIGS. 2 and 3.

FIG. 2 is a detailed block diagram of the voice processing unit 100 of FIG. 1.

As shown in FIG. 2, the voice processing unit 100 may include a voice feature extraction unit 110 that extracts feature values from the voice information and a voice calculation unit 120 that determines gender and age from the extracted feature values.

More specifically, the voice feature extraction unit 110 extracts one or more feature values or feature vectors (hereinafter collectively referred to as "feature values") from the voice information. The voice feature extraction unit 110 may extract feature values using a linear predictive coefficient (LPC) method, a cepstrum method, a mel-frequency cepstral coefficient (MFCC) method, a filter bank energy method (energy per frequency band), or a combination of these.
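Of the methods listed, filter bank energy is the simplest to illustrate. The sketch below computes a naive DFT power spectrum of one frame and sums it over equal-width bands. The equal-width band layout, the frame length, and the function names are assumptions for illustration; a practical system would typically use an FFT and a perceptual (for example mel-spaced) band layout.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def filter_bank_energy(frame, num_bands=4):
    """Sum the power spectrum over equal-width bands (illustrative band layout)."""
    spectrum = power_spectrum(frame)
    width = len(spectrum) / num_bands
    energies = [0.0] * num_bands
    for k, power in enumerate(spectrum):
        band = min(int(k / width), num_bands - 1)
        energies[band] += power
    return energies


# A low-frequency tone puts its energy in the first band.
low_tone = [math.sin(2 * math.pi * 2 * t / 32) for t in range(32)]
low_energies = filter_bank_energy(low_tone)
```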

The voice feature extraction unit 110 may extract a plurality of feature values from the same voice information by applying several of the feature extraction methods described above, or may obtain a plurality of feature values by applying a single method to a plurality of samples. When feature values are obtained for M voice samples using N feature extraction methods, they can be represented as an N × M matrix.
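The N × M layout can be sketched as a nested list in which row i holds the output of method i for each of the M samples. The two toy extractors below are hypothetical stand-ins for methods such as LPC or MFCC and reduce each sample to a single number only to keep the example short.

```python
def feature_matrix(extractors, samples):
    """Return an N x M nested list: row i is extractor i applied to each sample."""
    return [[extract(sample) for sample in samples] for extract in extractors]


# Toy extractors (hypothetical; real ones would return LPC, MFCC, etc.).
def mean_abs(signal):
    return sum(abs(v) for v in signal) / len(signal)


def peak(signal):
    return max(abs(v) for v in signal)


samples = [[1, -1], [2, -4], [0, 3]]               # M = 3 voice samples
matrix = feature_matrix([mean_abs, peak], samples)  # N = 2 methods
```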

In this embodiment, to extract voice features accurately and quickly, the voice feature extraction unit 110 includes a gender feature extraction unit 111, an age-specific feature extraction unit-M 112, an age-specific feature extraction unit-FC 113, an age-specific feature extraction unit-F 114, and a gender feature extraction unit-C 115.

The gender feature extraction unit 111 extracts feature values from the input voice information in a way that reflects the differences between male and female voices, that is, gender characteristics, and classifies the voice information into a male group (M) or a women-and-children group (FC) based on the extracted feature values.

The age-specific feature extraction unit-M 112 extracts feature values from the voice information that the gender feature extraction unit 111 has classified into the male group (M). Because the input at this stage has already been judged to be a male voice, the feature values can be extracted in a way that reflects the age-specific characteristics of men.

The age-specific feature extraction unit-FC 113 extracts feature values from the voice information that the gender feature extraction unit 111 has classified into the women-and-children group (FC), reflecting the age-specific characteristics of women and children, and as a result further divides the input voice information into a female group (F) and a children group (C). Here, the children group (C) covers people who have not yet gone through the voice change of puberty, whose male and female characteristics are hard to tell apart.

The age-specific feature extraction unit-F 114 extracts feature values from the voice information that the age-specific feature extraction unit-FC 113 has classified into the female group (F), reflecting the age-specific characteristics of women.

The gender feature extraction unit-C 115 extracts feature values from the voice information that the age-specific feature extraction unit-FC 113 has classified into the children group (C), reflecting the gender characteristics of children.
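The grouping performed by units 111 to 115 forms a small decision tree: split into M or FC first, then split FC into F or C, running only the extractors on the chosen path. The sketch below follows that tree. The pitch feature and both thresholds are placeholders invented for the example, not values from the patent, and the dictionary keys are hypothetical names that merely echo the reference numerals.

```python
def route_voice(sample, extractors, is_male, is_child):
    """Follow the voice grouping tree; return (group, features gathered on the path)."""
    features = {"gender_111": extractors["gender_111"](sample)}
    if is_male(features["gender_111"]):
        features["age_M_112"] = extractors["age_M_112"](sample)
        return "M", features
    features["age_FC_113"] = extractors["age_FC_113"](sample)
    if is_child(features["age_FC_113"]):
        features["gender_C_115"] = extractors["gender_C_115"](sample)
        return "C", features
    features["age_F_114"] = extractors["age_F_114"](sample)
    return "F", features


# Toy setup: every extractor just reads a precomputed pitch (hypothetical feature).
NAMES = ("gender_111", "age_M_112", "age_FC_113", "age_F_114", "gender_C_115")
extractors = {name: (lambda s: s["pitch_hz"]) for name in NAMES}
is_male = lambda pitch: pitch < 165.0   # placeholder threshold, not from the patent
is_child = lambda pitch: pitch > 260.0  # placeholder threshold, not from the patent

group, features = route_voice({"pitch_hz": 210.0}, extractors, is_male, is_child)
```

Because a male sample never reaches the FC, F, or C extractors, the tree avoids computing features that cannot affect the result, which is the redundancy saving the description refers to.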

The voice calculation unit 120 receives the feature values extracted by the voice feature extraction unit 110 as described above and determines the gender and age of the input voice.

To this end, the voice calculation unit 120 includes a combination calculation unit that determines a representative feature value by applying weights to the feature values extracted by the voice feature extraction unit 110, and a determination unit that determines gender and age from the representative feature value by consulting a reference DB, which stores reference feature values for each gender and age or reference voice and image samples.
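One simple way to picture the determination unit's use of the reference DB is a nearest-reference lookup: the representative feature value is compared with a stored reference for each gender and age label, and the closest label wins. The sketch below assumes the representative value is a small numeric vector and uses Euclidean distance. Both choices, and every number in `REFERENCE_DB`, are illustrative assumptions rather than details from the source.

```python
import math


def determine(representative, reference_db):
    """Return the (gender, age_group) label whose reference vector is nearest."""
    return min(reference_db,
               key=lambda label: math.dist(reference_db[label], representative))


# Hypothetical reference DB for the male group: label -> reference feature vector.
REFERENCE_DB = {
    ("male", "adult"): (120.0, 0.30),
    ("male", "senior"): (110.0, 0.55),
}

label = determine((118.0, 0.35), REFERENCE_DB)
```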

As shown in FIG. 2, the voice calculation unit 120 is preferably configured with a separate combination calculation unit and determination unit optimized for each of the male group (M), the female group (F), and the children group (C) produced by the voice feature extraction unit 110.

The following description is based on the embodiment shown in FIG. 2, in which each group has its own combination calculation unit and determination unit.

The voice calculation unit 120 may consist of a voice calculation unit-M 121 that receives the feature values extracted from voice information classified into the male group (M) by the voice feature extraction unit 110 and calculates gender and age, a voice calculation unit-F 122 that receives the feature values extracted from voice information classified into the female group (F) and calculates gender and age, and a voice calculation unit-C 123 that receives the feature values extracted from voice information classified into the children group (C) and calculates gender and age.

More specifically, the voice calculation unit-M 121 includes a combination calculation unit-M 121A and a determination unit-M 121B. The combination calculation unit-M 121A determines a representative feature value by weighting the one or more feature values extracted from voice information classified into the male group (M), and the determination unit-M 121B determines gender and age from that representative feature value by consulting the reference DB. Because the combination calculation unit-M 121A operates on voice information classified into the male group, it receives, as described above, the feature values extracted by the gender feature extraction unit 111 and the age-specific feature extraction unit-M 112.

Likewise, the voice calculation unit-F 122 includes a combination calculation unit-F 122A that determines a representative feature value by weighting the one or more feature values extracted from voice information classified into the female group (F), and a determination unit-F 122B that determines gender and age from that representative feature value by consulting the reference DB. As described above, the combination calculation unit-F 122A receives the feature values extracted by the gender feature extraction unit 111, the age-specific feature extraction unit-FC 113, and the age-specific feature extraction unit-F 114.

The voice calculation unit-C 123 includes a combination calculation unit-C 123A that determines a representative feature value by weighting the one or more feature values extracted from voice information classified into the children group (C), and a determination unit-C 123B that determines gender and age from that representative feature value by consulting the reference DB. As described above, the combination calculation unit-C 123A receives the feature values extracted by the gender feature extraction unit 111, the age-specific feature extraction unit-FC 113, and the gender feature extraction unit-C 115.

Age and gender may be determined using algorithms such as a Gaussian mixture model (GMM), a neural network (NN), or a support vector machine (SVM). These algorithms are only examples; age and gender can of course be determined from the feature values using various other algorithms as well.

For example, when a GMM algorithm is used, each combination calculation unit (121A, 122A, 123A) computes N likelihood values, corresponding either to the N feature extraction methods or to the N samples, and determines a representative value from those N likelihood values. To determine the representative value, the combination calculation unit (121A, 122A, 123A) may take the mean, the maximum, the minimum, or the sum of the N likelihood values.
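The four aggregation options named above (mean, maximum, minimum, sum) can be sketched as a small dispatch table. The likelihood values are made-up numbers standing in for GMM outputs, and the function and table names are hypothetical.

```python
import statistics

# The four ways of collapsing N likelihoods into one representative value.
AGGREGATORS = {
    "mean": statistics.fmean,
    "max": max,
    "min": min,
    "sum": sum,
}


def representative_likelihood(likelihoods, method="mean"):
    """Collapse N likelihood values into a single representative value."""
    if not likelihoods:
        raise ValueError("at least one likelihood value is required")
    return AGGREGATORS[method](likelihoods)


likelihoods = [0.2, 0.4, 0.9]  # illustrative values, not real GMM output
by_method = {name: representative_likelihood(likelihoods, name)
             for name in AGGREGATORS}
```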

The combination calculation units (121A, 122A, 123A) may also apply weights when determining the representative feature value. The weights may be set according to the situation or from empirically accumulated information. For example, in a noisy environment, the portion of the feature values corresponding to the noise band may be given a low weight, and the portion corresponding to the middle of the normal voice band may be given a high weight. Each combination calculation unit (121A, 122A, 123A) may also assign different weights for each of the groups described above (male, female, children), reflecting that group's vocal characteristics, when determining the representative feature value.
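The noise example can be sketched with per-band weights: bands known to be noisy are down-weighted, the middle band is boosted, and the representative value is the weighted average. The specific weight values (0.2 and 2.0), the choice of a single middle band, and the function names are assumptions made for illustration.

```python
def band_weights(num_bands, noisy_bands=(), low=0.2, mid_boost=2.0):
    """Build per-band weights: boost the middle band, damp the noisy ones."""
    weights = [1.0] * num_bands
    weights[num_bands // 2] = mid_boost  # middle of the voice band
    for band in noisy_bands:
        weights[band] = low              # a noisy band overrides any boost
    return weights


def weighted_representative(values, weights):
    """Weighted average of per-band feature values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


band_values = [10.0, 1.0, 1.0, 1.0, 1.0]      # band 0 is dominated by noise
weights = band_weights(5, noisy_bands=[0])
weighted = weighted_representative(band_values, weights)
unweighted = sum(band_values) / len(band_values)
```

With these numbers the noisy band's influence on the representative value drops from 2.8 (plain average) to roughly 1.35.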

The description above divides voice information into a male group, a female group, and a children group. However, when the extracted feature values make it ambiguous which group a piece of voice information belongs to, it is preferable to apply that voice information to each candidate group. That is, for voice information whose group is ambiguous, the calculation is performed for each applicable group, and the final determination unit 300 then makes the final decision on age and gender, taking into account the similarity between the results of the determination units, the probability of correct determination, reliability, and the like.
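The handling of ambiguous inputs can be sketched as follows: every group whose score is close to the best one is treated as a candidate, each candidate's determination unit is run, and the most confident result is kept. The group scores, the `margin` parameter, and the use of a single confidence number are assumptions for the sketch. The source says the final determination unit 300 weighs similarity, probability of correct determination, and reliability, without giving a formula.

```python
def candidate_groups(scores, margin=0.1):
    """Groups whose membership score is within `margin` of the best score."""
    best = max(scores.values())
    return [group for group, score in scores.items() if best - score <= margin]


def resolve(sample, scores, determiners, margin=0.1):
    """Run every candidate group's determiner; keep the most confident result."""
    results = [determiners[group](sample)
               for group in candidate_groups(scores, margin)]
    return max(results, key=lambda result: result[1])  # result = (label, confidence)


# Hypothetical per-group determiners returning (label, confidence).
determiners = {
    "M": lambda s: (("male", "adult"), 0.30),
    "F": lambda s: (("female", "adult"), 0.55),
    "C": lambda s: (("female", "child"), 0.70),
}
scores = {"M": 0.10, "F": 0.48, "C": 0.42}  # F and C are hard to tell apart
decision = resolve(None, scores, determiners)
```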

FIG. 3 is a detailed block diagram of the image processing unit of FIG. 1.

As shown in FIG. 3, the image processing unit 200 may include an image feature extraction unit 210 that extracts feature values from the image information and an image calculation unit 220 that calculates gender and age from the extracted feature values.

The image feature extraction unit 210 receives image information and extracts feature values from it. The image feature extraction unit 210 may in turn include an age-specific feature extraction unit 211, an age-specific feature extraction unit-C 212, a gender feature extraction unit-C 213, a gender feature extraction unit-A 214, an age-specific feature extraction unit-M 215, and an age-specific feature extraction unit-F 216.

The age-specific feature extraction unit 211 extracts feature values from the input face information in a way that reflects age-specific characteristics, and classifies the input face information into an adult group (A) or a children group (C) based on those feature values. With face information, for example, adults and children are easy to tell apart based on the size of the eyes relative to the face, the presence of wrinkles around the eyes, and the like, and the age-specific feature extraction unit 211 extracts feature values from the input face information in a way that reflects such age-specific characteristics.

For face information that the age-specific feature extraction unit 211 has classified into the children group (C), the age-specific feature extraction unit-C 212 extracts feature values reflecting the age-specific characteristics of children, and the gender feature extraction unit-C 213 extracts feature values reflecting the gender characteristics of children.

The gender feature extraction unit-A 214 extracts feature values from the face information that the age-specific feature extraction unit 211 has classified into the adult group (A), reflecting the gender characteristics of adults, and divides the input face information into a male group (M) and a female group (F) based on those feature values.

The age-specific feature extraction unit-M 215 extracts feature values from the face information that the gender feature extraction unit-A 214 has classified into the male group (M), reflecting the age-specific characteristics of men. Likewise, the age-specific feature extraction unit-F 216 extracts feature values from the face information that the gender feature extraction unit-A 214 has classified into the female group (F), reflecting the age-specific characteristics of women.
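The image-side grouping differs from the voice side in that the first split is by age (adult or child) and only adults are then split by gender. One way to sketch this is a table-driven tree, where each node lists the extractors it runs and, if it is not a leaf, the extractor whose result selects the next node. The node names, the table layout, and the `classify` callback are all hypothetical; only the extractor numbering follows the description.

```python
# Each node: extractors to run, plus either a final group or a split rule.
IMAGE_TREE = {
    "root":   {"extract": ["age_211"], "split": "age_211",
               "children": {"C": "child", "A": "adult"}},
    "child":  {"extract": ["age_C_212", "gender_C_213"], "group": "C"},
    "adult":  {"extract": ["gender_A_214"], "split": "gender_A_214",
               "children": {"M": "male", "F": "female"}},
    "male":   {"extract": ["age_M_215"], "group": "M"},
    "female": {"extract": ["age_F_216"], "group": "F"},
}


def route_face(face, classify, tree=IMAGE_TREE):
    """Walk the tree; return (final group, names of extractors used on the path)."""
    node, used = tree["root"], []
    while True:
        used.extend(node["extract"])
        if "group" in node:
            return node["group"], used
        branch = classify(node["split"], face)  # hypothetical classifier callback
        node = tree[node["children"][branch]]


# Toy classifier: the "face" already carries the branch decisions.
classify = lambda extractor_name, face: face[extractor_name]
group, used = route_face({"age_211": "A", "gender_A_214": "F"}, classify)
```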

The image calculation unit 220 determines gender and age from the image information using the feature values extracted in this way by the image feature extraction unit 210.

That is, the image calculation unit 220 may consist of a combination calculation unit that determines a representative feature value by applying weights to the one or more feature values extracted by the image feature extraction unit 210, and a determination unit that determines gender and age from the representative feature value by consulting the reference DB.

As shown in FIG. 3, the image calculation unit 220 may be configured with a combination calculation unit and a determination unit optimized for each of the male group (M), the female group (F), and the children group (C) produced by the image feature extraction unit 210 as described above. That is, the image calculation unit 220 may consist of an image calculation unit-M 221 that receives the feature values extracted from image information classified into the male group and calculates age and gender, an image calculation unit-F 222 that receives the feature values extracted from image information classified into the female group and calculates age and gender, and an image calculation unit-C 223 that receives the feature values extracted from image information classified into the children group and calculates age and gender.

The image calculation unit-M 221 may include a combination calculation unit-M 222A that receives the one or more feature values extracted from face information classified into the male group (M) and determines a representative feature value, and a determination unit-M 222B that determines gender and age from that representative feature value by consulting the reference DB. Because the combination calculation unit-M 222A operates on feature values extracted from face information classified into the male group (M), it receives the feature values extracted by the age-specific feature extraction unit 211, the gender feature extraction unit-A 214, and the age-specific feature extraction unit-M 215 and determines the representative feature value from them.

The image calculation unit-F 222 may include a combination calculation unit-F 223A, which receives the feature values extracted from face information classified into the female group (F) and determines a representative feature value, and a determination unit-F 223B, which determines gender and age using that representative feature value and the reference DB.

In this case, because the combination calculation unit-F 223A operates on face information classified into the female group (F), it may determine the representative feature value from the feature values extracted by the age feature extraction unit 211, the gender feature extraction unit-A 214, and the age feature extraction unit-F 216.

The image calculation unit-C 223 may include a combination calculation unit-C 221A, which receives the feature values extracted from face information classified into the child group (C) and determines a representative feature value, and a determination unit-C 221B, which determines gender and age using that representative feature value and the reference DB. In this case, because the combination calculation unit-C 221A receives feature values extracted from face information classified into the child group (C), it may determine the representative feature value from the feature values extracted by the age feature extraction unit 211, the age feature extraction unit-C 212, and the gender feature extraction unit-C 213.

Each determination unit 221B, 222B, 223B may receive a representative feature value from the corresponding combination calculation unit 221A, 222A, 223A described above and calculate gender and age by referring to the reference DB. The details are similar to those given above for the voice calculation unit 120, so further description is omitted.

Also, when age and gender are calculated using the image processing unit 200, as in the case described above, if it is ambiguous which one of the male group (M), the female group (F), and the child group (C) the face information belongs to, the face information may be applied to each group in duplicate.
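By way of a non-limiting illustration, the duplicate assignment of ambiguous face information might be sketched as follows. The per-group scores, the `margin` parameter, and the function name `candidate_groups` are assumptions introduced for this example only; the specification does not say how ambiguity is measured.

```python
from typing import Dict, List


def candidate_groups(scores: Dict[str, float], margin: float = 0.1) -> List[str]:
    """Return every group whose score is within `margin` of the best score.

    `scores` maps a group label ("M", "F", "C") to a hypothetical
    group-membership score. When the grouping is unambiguous, one label
    is returned; when it is ambiguous, the input is assigned to several
    groups so that each group's calculation unit can process it.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    best = max(scores.values())
    return sorted(group for group, score in scores.items() if best - score <= margin)


# Ambiguous between male and female: both group pipelines receive the input.
print(candidate_groups({"M": 0.48, "F": 0.45, "C": 0.07}))
# Clear-cut case: only the male pipeline receives the input.
print(candidate_groups({"M": 0.90, "F": 0.05, "C": 0.05}))
```

A margin-based rule is only one possible reading of "ambiguous"; a threshold on the top score alone would serve equally well.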

The final determination unit 300 is described in detail below.

The final determination unit 300 may receive the genders and ages output by some or all of the determination units 121B, 122B, 123B, 221B, 222B, and 223B, and combine them to determine the final gender and age.

That is, it may calculate the mutual similarity of each of the received genders and ages, and determine the gender and age with the highest mutual similarity as the final gender and age. Alternatively, the probability of correct determination or a reliability index for the received genders and ages may be tracked and stored at every determination, and then used to determine the final gender and age.
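A minimal sketch of selection by mutual similarity is given below for illustration. The specification does not define the similarity measure, so the `similarity` function here (zero when genders differ, otherwise decaying with the age gap) and the `age_scale` parameter are assumptions, as are all names.

```python
from typing import List, Tuple

Estimate = Tuple[str, float]  # (gender label, age in years)


def similarity(a: Estimate, b: Estimate, age_scale: float = 10.0) -> float:
    """Hypothetical similarity in [0, 1] between two gender-age estimates."""
    if a[0] != b[0]:
        return 0.0
    return 1.0 / (1.0 + abs(a[1] - b[1]) / age_scale)


def pick_by_mutual_similarity(estimates: List[Estimate]) -> Estimate:
    """Return the estimate that agrees most with all the other estimates."""
    if not estimates:
        raise ValueError("at least one estimate is required")

    def score(i: int) -> float:
        # Sum of similarities between estimate i and every other estimate.
        return sum(
            similarity(estimates[i], other)
            for j, other in enumerate(estimates)
            if j != i
        )

    best_index = max(range(len(estimates)), key=score)
    return estimates[best_index]


# Outputs of several determination units (voice and image) for one person.
outputs = [("F", 30.0), ("F", 34.0), ("M", 50.0), ("F", 31.0)]
print(pick_by_mutual_similarity(outputs))
```

The estimate that lies closest to the consensus wins, so a single outlier such as the male estimate above has no influence on the result.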

The final determination unit 300 may be implemented to determine a gender and age from the genders and ages output by the voice processing unit 100 using their mutual similarity, likewise determine a gender and age from the genders and ages output by the image processing unit 200 using their mutual similarity, and then determine and output the final gender and age from the two results.

Alternatively, the final determination unit 300 may be implemented to determine and output the final gender and age by applying mutual similarity to all of the gender and age determination results output by the voice processing unit 100 and the image processing unit 200 together.

The reference DB is described in detail below.

The reference DB stores reference feature values by gender and age, or reference voice and image samples. It may be configured as a relationship model between feature values extracted from face information or voice information and the gender and age corresponding to those feature values.

Using the correspondence between feature values and gender and age stored in the reference DB, the voice calculation unit 120 or the image calculation unit 220 may obtain gender and age by referring to the reference DB on the basis of the representative feature value described above. For example, the determination unit may determine gender and age using the distance between the representative feature value and the relationship model of the reference DB.
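As one possible illustration of distance-based determination, the sketch below treats the reference DB as a list of labeled feature vectors and returns the labels of the nearest entry. The Euclidean metric, the nearest-entry rule, the record layout, and the sample values are all assumptions; the specification states only that a distance value is used.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical record layout: (reference feature vector, gender, age).
ReferenceEntry = Tuple[Sequence[float], str, int]


def classify_by_distance(
    representative: Sequence[float], reference_db: List[ReferenceEntry]
) -> Tuple[str, int]:
    """Return the gender and age of the reference entry nearest to the input."""
    if not reference_db:
        raise ValueError("reference DB must not be empty")
    nearest = min(reference_db, key=lambda entry: math.dist(representative, entry[0]))
    return nearest[1], nearest[2]


# Made-up two-dimensional feature vectors, for illustration only.
reference_db = [
    ([0.2, 0.1], "M", 30),
    ([0.8, 0.9], "F", 20),
    ([0.5, 0.5], "F", 8),
]
print(classify_by_distance([0.7, 0.8], reference_db))
```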

The reference DB may also include image data and voice data together with the corresponding gender and age, so that gender and age can be determined directly from the image or voice information, for example when feature values are difficult to extract reliably.

The image data in the reference DB may be acquired, for example, with the camera and the person separated by distances of 0.5 m, 1 m, and 3 m. At the 3 m distance, the shot is framed so that the person's whole body is included. Such image data may be captured as 100 frames over 10 seconds. From the captured images, a face detector, a height detector, an eye detector, and the like may be used to obtain each subject's face, hairstyle, mustache, eyebrow shape, and so on, and thereby build a detailed DB. The present invention may be practiced so that feature values are determined using the detailed DB built in this way.

The voice data in the reference DB may be obtained, for example, by uttering each of 50 sentences prepared in advance three times. Such voice data may take various forms, such as 16 kHz, 16-bit, mono.

To make the reference DB representative, its data may be collected from, for example, 120 people, with an overall male-to-female ratio of 1:1 and a 1:1 ratio across age groups.
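The example figures above imply the following collection sizes, worked through here as a short calculation. It assumes that every participant records every sentence and is filmed once at each distance; the specification does not state this, and the function and key names are hypothetical.

```python
from typing import Dict


def reference_db_plan(
    people: int = 120,
    sentences: int = 50,
    repetitions: int = 3,
    frames_per_clip: int = 100,
    clip_seconds: int = 10,
    distances: int = 3,
) -> Dict[str, float]:
    """Derive collection sizes from the example figures in the description."""
    utterances_per_person = sentences * repetitions
    return {
        "males": people // 2,  # 1:1 male-to-female ratio
        "females": people // 2,
        "utterances_per_person": utterances_per_person,
        # Assumes every participant records every sentence.
        "total_utterances": people * utterances_per_person,
        "frames_per_second": frames_per_clip / clip_seconds,
        # Assumes one clip per person at each camera distance.
        "frames_per_person": frames_per_clip * distances,
    }


for key, value in reference_db_plan().items():
    print(f"{key}: {value}")
```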

The reference DB preferably has a learning capability: when a gender-age calculation is performed according to an embodiment of the present invention, the result of that calculation (its representative feature value and the final gender and age) is reflected in the existing data and the DB is reconstructed (updated), so that reliability improves continuously. Naturally, only results whose reliability has been confirmed should be used to update the DB.
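The reliability-gated update described above could be sketched as follows. The numeric `reliability` score, the `threshold` value, and the record layout are assumptions for illustration; the specification does not say how reliability is confirmed.

```python
from typing import List, Sequence, Tuple

# Hypothetical record layout: (representative feature vector, gender, age).
Entry = Tuple[List[float], str, int]


def update_reference_db(
    db: List[Entry],
    representative: Sequence[float],
    gender: str,
    age: int,
    reliability: float,
    threshold: float = 0.9,
) -> bool:
    """Append a result to the DB only if its reliability meets the threshold.

    Returns True when the DB was updated and False when the result was rejected.
    """
    if reliability < threshold:
        return False
    db.append((list(representative), gender, age))
    return True


db: List[Entry] = []
print(update_reference_db(db, [0.7, 0.8], "F", 20, reliability=0.95))
print(update_reference_db(db, [0.4, 0.5], "M", 35, reliability=0.50))
print(len(db))
```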

FIG. 4 is a flowchart of the gender-age identification method according to the present invention.

The input unit 10 collects face information and voice information of the specific person whose gender and age are to be identified (S100).

From the collected voice information, the voice processing unit 100 extracts feature values that reflect age-specific and gender features, and determines a representative feature value from the one or more extracted feature values. It then queries the reference DB with the representative feature value to determine gender and age (S200).

Likewise, the image processing unit 200 extracts feature values from the face information that reflect age-specific and gender features, and determines a representative feature value from the one or more extracted feature values. It then queries the reference DB with the representative feature value to determine gender and age (S300).

The final determination unit 300 finally determines the gender and age by considering the mutual similarity or probability of the at least one gender and age determined in steps S200 and S300 (S400).

Step S200 of FIG. 4, in which gender and age are determined from voice, is described in detail below with reference to FIG. 5.

In general, a woman's voice and a child's voice are similar and hard to tell apart, whereas the voices of women and children are easy to tell apart from a man's voice. Based on this observation, feature values are first extracted from the voice signal with priority on gender features, and the input is classified into a male group or a female-and-child group (S210).

Giving priority to gender features for voice information exploits the fact that gender differences are pronounced in voice, which allows the calculation to be performed quickly and efficiently.

According to the classification result, the input voice information is assigned to the male group or to the female-and-child group. For voice information classified into the male group, one or more age feature values reflecting the age-specific features of men are extracted (S220).

For voice information classified into the female-and-child group, age feature values reflecting the age-specific features of women and children are extracted so that it can be determined whether the voice belongs to the female group or the child group, and women are distinguished from children (S230).

Then, for voice information assigned to the female group, age feature extraction reflecting the age-specific features of women is performed (S240).

For voice information assigned to the child group, gender and age feature extraction for children is performed (S250).

From the feature values extracted in this way, a representative feature value for the voice information is determined, and the subject's gender and age are determined.

For example, the voice calculation unit 120 may determine a representative feature value from the one or more feature values extracted by the voice feature extraction unit 110, and determine gender and age using the reference DB on the basis of that representative feature value. As described above, the determination of the representative feature value and the determination of gender and age are preferably performed separately for the male group, the female group, and the child group.

That is, gender and age may be determined by performing voice calculation-M on the feature values of voice information classified into the male group (S225), voice calculation-F on the feature values of voice information classified into the female group (S245), or voice calculation-C on the feature values of voice information classified into the child group (S255).
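The branching of steps S210 through S255 can be traced with a small sketch. The two predicates stand in for the gender-first and woman-versus-child classifiers, whose internals the specification does not give; they, the dictionary-based input, and the function name are hypothetical placeholders.

```python
from typing import Callable, List, Tuple, TypeVar

V = TypeVar("V")


def voice_flow(
    voice: V,
    is_male: Callable[[V], bool],
    is_child: Callable[[V], bool],
) -> Tuple[str, List[str]]:
    """Return the assigned group and the sequence of steps taken for one voice."""
    steps = ["S210"]  # gender-first split: male vs. female-and-child
    if is_male(voice):
        return "M", steps + ["S220", "S225"]
    steps.append("S230")  # split the female-and-child group
    if is_child(voice):
        return "C", steps + ["S250", "S255"]
    return "F", steps + ["S240", "S245"]


# Stand-in classifiers that read precomputed flags from a dict.
sample = {"male": False, "child": True}
print(voice_flow(sample, lambda v: v["male"], lambda v: v["child"]))
```

Note that the woman-versus-child classifier runs only when the first split rules out the male group, which is where the saving in computation comes from.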

As described above, one major feature of the present invention is that the voice information is first grouped by an easily distinguishable feature (for voice information, gender), and feature values are then extracted for each resulting group in a way that reflects that group's characteristics. This staged extraction secures identification accuracy and, by eliminating redundant calculation, allows the subject's age and gender to be determined quickly.

Step S300 of FIG. 4, in which gender and age are determined from images, is described in detail below with reference to FIG. 6.

For image information, it is generally easy to distinguish adults from children. For example, adults and children can be readily distinguished using biometric information such as height, or using the ratio of the size of the facial features to the size of the face.

Using this, the image similarity determination step of the present invention first performs feature value extraction that considers these age-specific features on the input image information (face information, or image information containing face information; hereinafter "face information") (S310). Through this step the input face information can be readily divided into a child group and an adult group.

Then, for face information classified into the child group, age features are extracted in consideration of the age-specific features of children (S320), and gender feature extraction is performed in consideration of the gender features of children (S330).

For face information classified into the adult group, gender feature extraction is performed in consideration of the gender features of adults, and the face information of the adult group is assigned to the male group or the female group (S340).

Then, for face information classified into the male group, one or more feature values are extracted using a feature extraction method that considers the age-specific features of men (S350), and for face information classified into the female group, one or more feature values are extracted in consideration of the age-specific features of women (S360).
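For comparison with the voice case, the image-side extraction order of steps S310 through S360 can be traced in the same way. Here the first split is by age (adult versus child) rather than by gender. The predicates and the dictionary-based input are hypothetical placeholders for classifiers the specification leaves unspecified.

```python
from typing import Callable, List, Tuple, TypeVar

F = TypeVar("F")


def image_extraction_flow(
    face: F,
    is_child: Callable[[F], bool],
    is_male: Callable[[F], bool],
) -> Tuple[str, List[str]]:
    """Return the assigned group and the feature-extraction steps for one face."""
    steps = ["S310"]  # age-first split: child vs. adult
    if is_child(face):
        return "C", steps + ["S320", "S330"]
    steps.append("S340")  # split the adult group by gender
    if is_male(face):
        return "M", steps + ["S350"]
    return "F", steps + ["S360"]


# Stand-in classifiers that read precomputed flags from a dict.
sample = {"child": False, "male": True}
print(image_extraction_flow(sample, lambda f: f["child"], lambda f: f["male"]))
```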

As described above, the image calculation unit 220 determines a representative feature value by applying weights to the feature values extracted by the image feature extraction unit 210, and determines gender and age using that representative feature value and the reference DB. As shown in FIG. 6, this image-based determination of gender and age is preferably performed separately for the child group, the male group, and the female group (S325, S355, S365).
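A minimal sketch of the weighted representative feature value follows. The four reduction modes mirror the average, maximum, minimum, and sum recited in the claims; the scalar feature values, the weights, and the function name are assumptions for illustration, since the specification does not state how weights are chosen.

```python
from typing import Sequence


def representative_feature_value(
    values: Sequence[float],
    weights: Sequence[float],
    mode: str = "average",
) -> float:
    """Weight each feature value, then reduce the weighted values to one number."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and of equal length")
    weighted = [v * w for v, w in zip(values, weights)]
    if mode == "average":
        return sum(weighted) / len(weighted)
    if mode == "max":
        return max(weighted)
    if mode == "min":
        return min(weighted)
    if mode == "sum":
        return sum(weighted)
    raise ValueError(f"unknown mode: {mode}")


values = [2.0, 4.0, 6.0]
weights = [0.5, 0.25, 0.5]
for mode in ("average", "max", "min", "sum"):
    print(mode, representative_feature_value(values, weights, mode))
```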

The present invention has been described in detail above with reference to the accompanying drawings, but this is only an example, and it is evident that various modifications and changes are possible within the scope of the technical idea of the present invention. Accordingly, the scope of protection of the present invention should not be limited to the embodiments described above, but should be determined by the scope of the following claims and their equivalents.

FIG. 1 is a block diagram of an embodiment of a gender-age identification device according to the present invention.

FIG. 2 is a detailed block diagram of the voice processing unit of FIG. 1.

FIG. 3 is a detailed block diagram of the image processing unit of FIG. 1.

FIG. 4 is a flowchart of the gender-age identification method according to the present invention.

FIG. 5 is a detailed flowchart of the voice similarity determination step of FIG. 4.

FIG. 6 is a detailed flowchart of the image similarity determination step of FIG. 4.

Claims

A gender-age identification method, comprising:

collecting image information and voice information;

a voice-based gender and age determination step of extracting one or more feature values from the collected voice information and determining gender and age using the extracted feature values;

a face-based gender and age determination step of extracting one or more feature values from the collected image information and determining gender and age using the extracted feature values; and

finally determining the gender and age by combining the gender and age determined using the voice information with the gender and age determined using the face information,

wherein finally determining the gender and age comprises:

calculating a mutual similarity for each of the gender and age determined using at least one item of voice information and the gender and age determined using at least one item of image information; and

determining the gender and age having the highest mutual similarity as the final gender and age.

The method of claim 1, wherein determining gender and age using the voice information comprises:

a first gender feature extraction step of extracting feature values from the collected voice information by reflecting gender features of voice;

a first age feature extraction step of extracting feature values, by reflecting age-specific features of men, from voice information classified into the male group by the first gender feature extraction step; and

a second age feature extraction step of extracting feature values, by reflecting age-specific features of women and children, from voice information classified into the female-and-child group by the first gender feature extraction step.

The method of claim 2, wherein determining gender and age using the voice information further comprises:

a third age feature extraction step of extracting feature values, by reflecting age-specific features of women, from voice information classified into the female group by the second age feature extraction step; and

a second gender feature extraction step of extracting feature values, by reflecting gender features of children, from voice information classified into the child group by the second age feature extraction step.

The method of claim 2 or 3, wherein the feature values are M samples extracted by applying N different feature value determination methods.

A representative feature value determination step of determining a representative feature value by applying weights to the extracted one or more feature values; and

a determination step of determining the gender and age by referring to the reference DB on the basis of the representative feature value;

the gender-age identification method comprising the above steps.

The method of claim 5, wherein the representative feature value determination step and the determination step are carried out separately for each of the male group, the female group, and the child group.

The method of claim 5, wherein the representative feature value determination step comprises determining, as the representative feature value, any one of the average, the maximum, the minimum, and the sum of the one or more weighted feature values.

The method of claim 1, wherein determining gender and age using the image information comprises:

a first feature extraction step of extracting feature values from the collected image information by reflecting age-specific features; and

a second feature extraction step of distinguishing adults from children according to the result of the first feature extraction step, then classifying the input into the male group, the female group, or the child group and extracting one or more feature values for each group.

The method of claim 8, wherein determining gender and age using the image information comprises:

a calculation step of determining the gender and age using a reference DB on the basis of the representative feature value,

wherein the representative feature value determination step and the calculation step are performed separately for each of the male group, the female group, and the child group.

delete

The method of claim 5 or 9, wherein the reference DB includes feature values by gender and age, and is continuously reconstructed to reflect, by gender and age, feature values whose reliability has been verified.

A gender-age identification device, comprising:

an input unit that collects image information and voice information;

a voice processing unit that extracts feature values from the collected voice information and determines gender and age from the voice information using the extracted feature values;

an image processing unit that extracts feature values from the collected image information and determines gender and age from the image information using the extracted feature values; and

a final determination unit that finally determines the gender and age of a specific person by combining the gender and age determined by the image processing unit with the gender and age determined by the voice processing unit,

wherein the final determination unit calculates a mutual similarity for each of the at least one age and gender determined by the voice processing unit or the image processing unit, and determines the gender and age having the highest mutual similarity as the final gender and age.

The device of claim 12, wherein the voice processing unit comprises:

a voice feature extraction unit that extracts feature values from the collected voice information by reflecting gender features or age-specific features of voice; and

a voice calculation unit that determines a representative feature value from the feature values extracted by the voice feature extraction unit and determines the gender and age using the representative feature value.

The device of claim 13, wherein the voice feature extraction unit first determines whether the collected voice is a male voice, and then extracts one or more feature values for each of the male group, the female group, and the child group.

The device of claim 14, wherein the voice calculation unit determines a representative feature value for each of the male group, the female group, and the child group, and determines gender and age using the representative feature value.

The device of claim 12, wherein the image processing unit comprises:

an image feature extraction unit that extracts feature values from the collected image information by reflecting gender features or age-specific features of the image; and

an image calculation unit that determines a representative feature value from the feature values extracted by the image feature extraction unit and determines the gender and age using the representative feature value.

The device of claim 16, wherein the image feature extraction unit first determines whether the collected image shows an adult or a child, and then extracts one or more feature values for each of the male group, the female group, and the child group.

The device of claim 17, wherein the image calculation unit,

Gender-age identification device.

delete