KR100229538B1

KR100229538B1 - Apparatus and method for encoding a facial movement

Info

Publication number: KR100229538B1
Application number: KR1019960072671A
Authority: KR
Inventors: 이민섭
Original assignee: 전주범; 대우전자주식회사
Priority date: 1996-12-27
Filing date: 1996-12-27
Publication date: 1999-11-15
Also published as: GB2320839A; KR19980053565A; JPH10215452A; GB9726058D0

Abstract

The present invention relates to a method for encoding the facial movement of a new face image based on a speech signal and a two-dimensional (2D) image signal in a three-dimensional (3D) model-based coding system. An adaptive 3D model is generated from initial data of the new face based on a basic 3D model of a common human face, and a basic pattern for the 2D image signal is generated from the rotational correlation between the 2D image signal and the adaptive 3D model. One or more feature regions of the new face, taken from the 2D image signal, are compared with the basic pattern to detect a number of transformation parameters. The transformation parameters are then modified based on the speech signal.

Description

Method and apparatus for encoding facial movements {APPARATUS AND METHOD FOR ENCODING A FACIAL MOVEMENT}

The present invention relates to a method and apparatus for encoding a moving object and, more particularly, to a method and apparatus for encoding and decoding facial movement using a three-dimensional face model.

In digital broadcasting systems such as video telephony, teleconferencing, and high-definition television systems, a large amount of digital data is needed to define each video frame signal, because each video line signal within the frame comprises a sequence of digital data referred to as pixel values. The usable frequency bandwidth of a conventional transmission channel is limited, however, so transmitting that much digital data requires compressing or reducing the data volume with various data compression techniques, particularly in the low-bit-rate video signal encoders used in video telephony and teleconferencing systems, which are employed to transmit human figures.

Typically, a video coding system transmits images made up of continuously changing pixels. A 3D model-based coding system, by contrast, extracts specific movement parameters from the images and transmits the extracted movement parameters to the receiving end. At the receiving end, the received movement parameters are combined with data previously transmitted to the receiving end, such as the basic facial shape of the person and a generic three-dimensional model of the head, in order to reconstruct the images, for example face images.

In video telephony and teleconferencing systems, the video images consist mainly of head-and-shoulder shots, that is, the upper body of a person. Moreover, the object of greatest interest to the viewer is the person's face, so the viewer pays attention not to the background or other details but to the moving parts, namely the mouth area of the person including the lips, chin, and head, especially when the person on the screen is speaking. Therefore, if only general information about the shape of the face is transmitted, the amount of digital data can be reduced considerably.

It is therefore an object of the present invention to provide a method and apparatus for reducing the amount of transmitted data by encoding and decoding facial movement using a three-dimensional face model.

According to the present invention for achieving the above object, there is provided a method for encoding the facial movement of a new face image based on a speech signal and a two-dimensional (2D) image signal in a three-dimensional (3D) model-based coding system. The method comprises the steps of: (a) generating an adaptive 3D model from initial data representing one or more 2D face images of the new face image, based on a basic 3D model representing a 3D model of a common human face; (b) generating, based on the adaptive 3D model, a basic pattern for the 2D image signal, the basic pattern representing a 2D picture obtained from the rotational correlation between the 2D image signal and the adaptive 3D model; (c) extracting from the 2D image signal one or more feature regions of the new face image in which many transformations occur; (d) comparing the feature regions with the basic pattern to detect a number of transformation parameters representing the comparison result; (e) modifying the transformation parameters based on the speech signal to generate modified transformation parameters; and (f) encoding the initial data and the modified transformation parameters.

FIG. 1 is a block diagram of an apparatus for encoding facial movement according to the present invention;

FIG. 2 is a block diagram of an apparatus for decoding facial movement according to the present invention;

FIG. 3A shows a number of eye and eyebrow parameters according to the present invention;

FIG. 3B shows a number of mouth parameters according to the present invention;

FIG. 3C shows three chin parameters according to the present invention;

FIG. 3D shows three head parameters according to the present invention.

<Explanation of reference numerals for main parts of the drawings>

10: adaptive 3D model block
12: encoder
14: basic 3D model block
16: head parameter block
18: basic pattern generation block
20: feature extraction block
30: speech analyzer
36: formatter
38: buffer

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. In these embodiments, the subject of the input image is assumed to be a human face, and the predetermined feature parts of the face image to be encoded are assumed to be the head, mouth, chin, and eyes.

Referring to FIG. 1, there is shown a block diagram of an apparatus 100 for encoding facial movement that has been classified according to a method described in detail below, in accordance with an embodiment of the present invention.

For convenience of explanation, assume that the human body is divided into an upper body, including the waist, and a lower body below the waist. The upper body is further divided into the head, torso, arms, and so on, and the head is further divided into the eyes, nose, mouth, ears, and so on. If the eyes, nose, mouth, and ears are regarded as basic patterns, an organizational system for the human body can be built from those basic patterns and from transformation parameters that represent the transformations extracted for the basic patterns. The transformation parameters and data structure for the head are described below.

The basic patterns of the head fall into two categories. The first category corresponds to regions where the basic patterns are transformed frequently, and the second to regions where they are transformed very rarely. The first category covers the eyes, eyebrows, mouth, chin, cheeks, and forehead; the second covers the hair, nose, ears, and the like. The basic patterns used to extract the transformation parameters correspond to the actively moving regions. The selected transformation parameters therefore comprise parameters for the eyes, eyebrows, mouth, chin, and head, each described in detail below. Wrinkles on the forehead and cheeks move passively, following the movement of the eyebrows and chin, respectively.

1) Eyebrows: As shown in FIG. 3A, the eyebrows are divided into the left and right eyebrows. The left and right eyebrow transformation parameters comprise the inner eyebrow up-down movement parameters (EB1, EB2), the eyebrow left-right movement parameters (EB3, EB4), and the outer eyebrow up-down movement parameters (EB5, EB6), respectively.

2) Eyes: As shown in FIG. 3A, the left and right eye transformation parameters comprise the eyelid up-down movement parameters (EL1, EL2), the pupil up-down movement parameters (E1, E2), and the pupil left-right movement parameters (E3, E4), respectively.

3) Mouth: As shown in FIG. 3B, mouth movement depends on lip movement. The mouth transformation parameters comprise the left-right movement parameters (L1, L2) of the two end points of the lips, the up-down movement parameters (L3, L4) at the top and bottom of the central region of the lips, the forward-backward movement parameters (L5, L6) at the top and bottom of the central region of the lips, and the up-down movement parameters (L7, L8) of the end points of the lips.

4) Chin: As shown in FIG. 3C, the chin transformation parameters comprise an up-down movement parameter (C1), a left-right movement parameter (C2), and a forward-backward movement parameter (C3).

5) Head: As shown in FIG. 3D, the three-dimensional coordinate system has its origin near the first cervical vertebra and is defined by a face plane, which is an imaginary plane perpendicular to the x axis and parallel to the face; a z axis passing through the center of the crown of the head; and a y axis perpendicular to both the x and z axes. In other words, the x, y, and z axes are parallel to the movement directions of the parameters L5, L1, and L3 shown in FIG. 3B, respectively. The head transformation parameters comprise a yawing parameter (H1) representing left-right rotation about the z axis, a pitching parameter (H2) representing up-down rotation about the y axis, and a rolling parameter (H3) representing left-right tilt, that is, rotation about the x axis.
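
The three head parameters can be read as elementary rotations about the z, y, and x axes. The following is a minimal sketch of that reading, not the patent's own procedure: it assumes a right-handed coordinate system, angles in degrees, and the composition order Rz(H1)·Ry(H2)·Rx(H3), none of which the text specifies. All function names are hypothetical.

```python
import math

def _cs(deg):
    """Cosine and sine of an angle given in degrees."""
    rad = math.radians(deg)
    return math.cos(rad), math.sin(rad)

def rot_z(deg):
    """Yawing (H1): rotation about the z axis."""
    c, s = _cs(deg)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(deg):
    """Pitching (H2): rotation about the y axis."""
    c, s = _cs(deg)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_x(deg):
    """Rolling (H3): rotation about the x axis."""
    c, s = _cs(deg)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def head_rotation(h1, h2, h3):
    """Combined rotation; the order Rz * Ry * Rx is an assumption."""
    return matmul(rot_z(h1), matmul(rot_y(h2), rot_x(h3)))

def rotate(point, m):
    """Apply a 3x3 rotation matrix to an (x, y, z) point."""
    return [sum(m[i][k] * point[k] for k in range(3)) for i in range(3)]
```

With all three parameters at 0 the matrix is the identity, which matches the statement that every parameter is 0 at its basic position.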

Each of these parameters has the value 0 at its basic position on the basic pattern.

The transformation parameters of each basic pattern, expressed as independent items, are stored and transmitted in the data format given below.

Table 1

Item (TITLE)                    Code                      Number of bits
START CODE                      head                      3
HEAD ORIENTATION BIT            head_orientation_bit      1
HEAD ORIENTATION ITEMS          head_orientation_items    3
head_orientation_items[0]       H1                        8
head_orientation_items[1]       H2                        7
head_orientation_items[2]       H3                        5
EYEBROW TRANSFORMATION BIT      eyebrow_bit               1
LEFT-RIGHT EYEBROW ITEMS        eyebrows                  2
LEFT EYEBROW ITEMS              lefteyebrow_items         3
lefteyebrow_items[0]            EB1                       3
lefteyebrow_items[1]            EB3                       3
lefteyebrow_items[2]            EB5                       3
RIGHT EYEBROW ITEMS             righteyebrow_items        3
righteyebrow_items[0]           EB2                       3
righteyebrow_items[1]           EB4                       3
righteyebrow_items[2]           EB6                       3
EYE TRANSFORMATION BIT          eye_bit                   1
-                               E1                        3
-                               E2                        3
-                               E3                        3
-                               E4                        3
-                               EL1                       3
-                               EL2                       3
MOUTH TRANSFORMATION BIT        mouth_bit                 1
TRANSFORMATION SELECTION BIT    speech_bit                1
-                               L1                        3
-                               L2                        3
-                               L3                        3
-                               L4                        3
-                               L5                        3
-                               L6                        3
-                               L7                        3
-                               L8                        3
-                               sound                     8
-                               pace                      4
-                               accent                    3
CHIN TRANSFORMATION BIT         chin_bit                  1
-                               C1                        4
-                               C2                        3
-                               C3                        3
FACE TEXTURE BIT                face_texture_bit          1
-                               face_data                 VLB

Each item of Table 1 is described below.

1. START CODE (head): A 3-bit code indicating the start of head data, set for example to "001". If the start code is not "001", no head data follows.

2. HEAD ORIENTATION BIT (head_orientation_bit): A 1-bit code indicating whether the head is rotated. A bit value of 1 indicates that the head is rotated and that head rotation parameters follow; a bit value of 0 indicates that the head is not rotated.

1) HEAD ORIENTATION ITEMS (head_orientation_items): A 3-bit code indicating the directions in which the head is rotated. Each of the three bits indicates the presence of the corresponding orientation item; a bit value of 1 indicates that the head is rotated in the corresponding direction. For example, the value "110" indicates that yawing (left-right rotation) and pitching (up-down rotation) of the head occur.

a) head_orientation_items[0]: The 8-bit head yawing parameter (H1) represents an integer value in 181 steps from -90 degrees to 90 degrees.

b) head_orientation_items[1]: The 7-bit head pitching parameter (H2) represents an integer value in 121 steps from -60 degrees to 60 degrees.

c) head_orientation_items[2]: The 5-bit head rolling parameter (H3) represents an integer value in 31 steps from -15 degrees to 15 degrees.
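
The head portion of the format (start code, orientation bit, 3-bit item mask, then only the parameters that are present) can be sketched as a small bit-string packer. This is an illustration under stated assumptions: the text does not say how a signed integer angle maps onto the 8-, 7-, or 5-bit field, so the sketch uses offset binary (angle plus the range limit), and it treats an angle of 0 as "not rotated". The names `HEAD_FIELDS` and `pack_head` are hypothetical.

```python
# (code, limit in degrees, field width in bits), taken from Table 1 and items a) to c)
HEAD_FIELDS = (("H1", 90, 8), ("H2", 60, 7), ("H3", 15, 5))

def pack_head(h1=0, h2=0, h3=0):
    """Return the head section as a string of '0' and '1' characters."""
    values = (h1, h2, h3)
    for value, (name, limit, _) in zip(values, HEAD_FIELDS):
        if not -limit <= value <= limit:
            raise ValueError(f"{name} out of range: {value}")
    bits = "001"                      # START CODE
    if not any(values):
        return bits + "0"             # head_orientation_bit = 0, nothing follows
    bits += "1"                       # head_orientation_bit = 1
    # head_orientation_items: one presence bit each for H1, H2, H3
    bits += "".join("1" if v else "0" for v in values)
    for value, (_, limit, width) in zip(values, HEAD_FIELDS):
        if value:
            # Offset-binary mapping is an assumption, not stated in the text.
            bits += format(value + limit, f"0{width}b")
    return bits
```

For example, `pack_head(10, -5)` sets the item mask to "110", the yaw-and-pitch case used as the example in the text, and appends an 8-bit and a 7-bit field.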

3. EYEBROW TRANSFORMATION BIT (eyebrow_bit): A 1-bit code indicating whether the eyebrows move. A bit value of 1 indicates that the eyebrows move, and a bit value of 0 indicates that they do not.

1) LEFT-RIGHT EYEBROW ITEMS (eyebrows): A 2-bit code indicating which eyebrows move.

a) 00: Neither eyebrow moves.

b) 01: The left eyebrow moves.

c) 10: The right eyebrow moves.

d) 11: Both eyebrows move.

2) LEFT EYEBROW ITEMS (lefteyebrow_items): A 3-bit code indicating the directions in which the left eyebrow moves. The three bits indicate the presence of the three movement parameters described below.

a) lefteyebrow_items[0]: A bit value of 1 indicates that the inner part of the left eyebrow moves up and down. The 3-bit inner left eyebrow up-down movement parameter (EB1) is given in seven steps from -1.0 to 1.0.

Table 2

Step          1      2      3      4     5     6     7
Weight (W)    -1.0   -0.6   -0.3   0.0   0.3   0.6   1.0

The 4th step, which normally corresponds to almost no movement, and the two end steps, the 1st and 7th, are given as preset three-dimensional absolute coordinates; the positions of the remaining steps are computed by applying the preset weight factors shown in Table 2 above. The coordinates of the 2nd, 3rd, 5th, and 6th steps can be calculated according to the following equations.

First, the 2nd and 3rd steps are calculated according to Equation 1 below.

The 5th and 6th steps are calculated according to Equation 2 below.

In the above equations, x(j), y(j), and z(j) denote the x, y, and z coordinates of the j-th step, w(j) is the preset weight factor for the j-th step, and x(step i), y(step i), and z(step i) are the x, y, and z coordinates at the i-th step.

b) lefteyebrow_items[1]: A bit value of 1 indicates that the left eyebrow moves to the left or right. The 3-bit left eyebrow left-right movement parameter (EB3) is given in seven steps from -1.0 to 1.0. The step positions are determined in the same manner as for EB1.

c) lefteyebrow_items[2]: A bit value of 1 indicates that the outer part of the left eyebrow moves up and down. The 3-bit outer left eyebrow up-down movement parameter (EB5) is given in seven steps from -1.0 to 1.0. The weights used for EB1 also apply to EB5, and the step positions are determined in the same manner as for EB1.

3) RIGHT EYEBROW ITEMS (righteyebrow_items): A 3-bit code indicating the directions in which the right eyebrow moves. The right eyebrow transformation parameters (EB2, EB4, EB6) function in the same way as the left eyebrow transformation parameters (EB1, EB3, EB5).
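
The seven-step scheme described for EB1 (three preset absolute positions plus the weight factors of Table 2) can be sketched as follows. Equations 1 and 2 are not reproduced in this text, so the sketch assumes plain linear interpolation from the 4th step toward the 1st step for negative weights and toward the 7th step for positive weights, scaled by the absolute weight. That interpolation rule and the name `step_position` are assumptions, not the patent's stated formulas.

```python
# Weight factors per step, as listed in Table 2.
WEIGHTS = {1: -1.0, 2: -0.6, 3: -0.3, 4: 0.0, 5: 0.3, 6: 0.6, 7: 1.0}

def step_position(step, p1, p4, p7):
    """Return the (x, y, z) position for a step number from 1 to 7.

    p1, p4, p7 are the preset absolute coordinates of steps 1, 4 and 7.
    Assumed rule: p4 + |w| * (end - p4), with end = p1 if w < 0 else p7.
    """
    w = WEIGHTS[step]
    end = p1 if w < 0 else p7
    return tuple(c4 + abs(w) * (ce - c4) for c4, ce in zip(p4, end))
```

Under this assumption, steps 1, 4, and 7 reproduce their preset coordinates, and steps 2, 3, 5, and 6 fall on the straight segments between them.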

4. EYE TRANSFORMATION BIT (eye_bit): A 1-bit code indicating whether the eyes move. A bit value of 1 indicates that the eyes move, and a bit value of 0 indicates that they do not.

1) PUPIL UP-DOWN MOVEMENT PARAMETERS (E1, E2): E1 and E2 represent the up-down movement of the left and right pupils, respectively. E1 and E2 each have seven steps. The 4th step, which normally corresponds to almost no movement, and the two end steps, the 1st and 7th, are given as preset three-dimensional absolute coordinates, and the positions of the remaining steps are computed as for EB1.

2) PUPIL LEFT-RIGHT MOVEMENT PARAMETERS (E3, E4): E3 and E4 represent the left-right movement of the left and right pupils, respectively. E3 and E4 each have seven steps, and the step positions are computed as for E1 and E2.

3) OUTER EYELID UP-DOWN MOVEMENT PARAMETERS (EL1, EL2): EL1 and EL2 represent the up-down movement of the left and right eyelids, respectively. EL1 and EL2 each have seven steps, and the step positions are computed as for E1 and E2.

5. MOUTH TRANSFORMATION BIT (mouth_bit): A 1-bit code indicating whether the mouth shape changes. A bit value of 1 indicates that the mouth shape changes, and a bit value of 0 indicates that it does not.

1) TRANSFORMATION SELECTION BIT (speech_bit): A 1-bit code indicating which set of mouth transformation parameters is selected. Lip shapes fall into two cases: the person is speaking, or the person is expressing emotion. When a person is speaking, the lip shape depends heavily on the pronounced sound, so it can be constructed from the pronounced sound, pace, and accent of the speech. When a person is expressing emotion, however, the lip shape has no such characteristic, so it must be constructed using all of the transformation parameters shown in FIG. 3B: the left-right movement parameters (L1, L2) of the two end points of the lips, the up-down movement parameters (L3, L4) at the top and bottom of the central region of the lips, the forward-backward movement parameters (L5, L6) at the top and bottom of the central region of the lips, and the up-down movement parameters (L7, L8) of the end points of the lips. A bit value of 1 indicates that the person is speaking, and the sound, pace, and accent codes follow. If the bit value is 0, the L1 to L8 codes follow. Each of the L1 to L8 movement parameters is 3 bits and has seven steps; the step positions are computed by the scheme described above for EB1. The pronounced sound, pace, and accent of the speech are represented by 8 bits, 4 bits, and 3 bits, respectively.
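
The choice that speech_bit makes between the two mouth encodings can be sketched as a bit-string packer. The field widths come from Table 1; everything else is an assumption for illustration: the name `pack_mouth`, the tuple layout of its arguments, and writing each lip step number (1 to 7) directly as a 3-bit value, since the text does not define how steps map to codes.

```python
def pack_mouth(speech=None, lip_steps=None):
    """Return the mouth section as a string of '0' and '1' characters.

    speech:    (sound, pace, accent) integers, used when the person is speaking.
    lip_steps: eight step numbers (1..7) for L1..L8, used otherwise.
    """
    if speech is None and lip_steps is None:
        return "0"                                   # mouth_bit = 0
    if speech is not None:
        sound, pace, accent = speech
        if not (0 <= sound < 256 and 0 <= pace < 16 and 0 <= accent < 8):
            raise ValueError("speech code out of range")
        # mouth_bit = 1, speech_bit = 1, then 8 + 4 + 3 bits
        return ("11" + format(sound, "08b")
                + format(pace, "04b") + format(accent, "03b"))
    if len(lip_steps) != 8 or any(not 1 <= s <= 7 for s in lip_steps):
        raise ValueError("expected eight step numbers in 1..7")
    # mouth_bit = 1, speech_bit = 0, then L1..L8 at 3 bits each
    return "10" + "".join(format(s, "03b") for s in lip_steps)
```

In this sketch the speaking case costs 17 bits and the full lip description costs 26 bits, which shows why the speech-driven form is the cheaper one to transmit.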

6. CHIN TRANSFORMATION BIT (chin_bit): A 1-bit code indicating whether the chin moves. A bit value of 1 indicates that the chin moves, and a bit value of 0 indicates that it does not.

1) CHIN UP-DOWN MOVEMENT PARAMETER (C1): The 4-bit chin up-down movement parameter (C1) represents the displacement of the chin from the closed-mouth position and is given in 16 steps: step 0 represents a closed mouth, and step 15 represents the most widely opened mouth. The chin positions for the 16 steps are computed in the same manner as for EB1.

2) CHIN LEFT-RIGHT MOVEMENT PARAMETER (C2): The 3-bit chin left-right movement parameter (C2) represents the left-right movement of the chin and is given in three steps to the left and three steps to the right of a reference at the center. The step positions are computed as for EB1.

3) CHIN FORWARD-BACKWARD MOVEMENT PARAMETER (C3): The 3-bit chin forward-backward movement parameter (C3) represents the forward-backward movement of the chin and is given in three steps forward and three steps backward from a reference at the center. The step positions are computed as for EB1.

7. FACE TEXTURE BIT (face_texture_bit): This code is set to 1 when a new face image is transmitted.

1) FACE DATA (face_data): This represents the compressed basic face image of the new face image; its length is variable.

Referring again to FIG. 1, initial data is applied to the adaptive 3D model block 10 and the encoder 12. The initial data represents one or more face images of the new face shown on the screen that are expressionless and silent, that is, one or more still pictures of the new face image. The encoder 12 encodes the initial data of the 2D face images using a conventional encoding technique and provides the encoded face image to the formatter 36 as face_data.

Meanwhile, the basic 3D model stored in the basic 3D model block 14, which represents a 3D model of a common human face, is applied to the adaptive 3D model block 10. The adaptive 3D model block 10 generates an adaptive 3D model resembling the new face image by modifying the 2D initial data based on the basic 3D model, and provides this adaptive 3D model to the head parameter block 16 and the basic pattern generation block 18.

The image signal of the new face is provided from a camera (not shown) to the head parameter block 16 and the feature extraction block 20, and the speech signals of the new face image are input continuously from a microphone (not shown) to the speech analyzer 30. The input image and speech signals of the new face are input continuously on either a frame basis or a field basis.

First, the head parameter block 16 applies a conventional affine transform technique to the adaptive 3D model of the new face image and detects the head yawing, pitching, and rolling parameters H1 to H3 from the image signals of the new face. The parameters H1 to H3 are provided to the basic pattern generation block 18 and the formatter 36. The basic pattern generation block 18 generates a basic pattern of the new face image, where the basic pattern represents a 2D adaptive image of the new face projected onto the screen, obtained by rotating the adaptive 3D model according to the yawing, pitching, and rolling parameters. It then indexes the left and right eyebrows and the left and right eyes in the basic pattern to generate basic eyebrow, eye, mouth, and chin patterns. The basic pattern generation block 18 provides the indexed eyebrows, eyes, mouth, and chin to the eyebrow extraction block 22, the eye extraction block 24, the mouth 1 extraction block 26, and the chin 1 extraction block 28, respectively.

Meanwhile, the feature extraction block 20 detects the edges of predetermined feature regions from the image signals of the new face using a conventional edge detector such as a Sobel operator. The feature regions comprise the left and right eyebrows, the left and right eyes, the mouth, and the chin of the new face. The block provides contour information indicating the shape and position of each feature region, that is, of the left and right eyebrows, the left and right eyes, the mouth, and the chin, to the eyebrow extraction block 22, the eye extraction block 24, the mouth 1 extraction block 26, and the chin 1 extraction block 28, respectively.
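
The text names the Sobel operator as one example of a conventional edge detector. A minimal plain-Python sketch of that operator on a grayscale image, stored as a list of rows, is given below. The thresholding helper, the zero-valued border, and the function names are assumptions for illustration; the patent does not describe how the detected edges are turned into contour information.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))

def sobel_magnitude(img):
    """Gradient magnitude per pixel; border pixels are left at 0.0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    pixel = img[y - 1 + j][x - 1 + i]
                    gx += SOBEL_X[j][i] * pixel
                    gy += SOBEL_Y[j][i] * pixel
            out[y][x] = math.hypot(gx, gy)
    return out

def edge_mask(img, threshold):
    """Mark pixels whose gradient magnitude reaches the threshold (assumed step)."""
    return [[m >= threshold for m in row] for row in sobel_magnitude(img)]
```

A block such as the eyebrow extraction block would then work on the masked pixels that fall inside its own feature region.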

The eyebrow extraction block 22 detects movement of the left and right eyebrows based on the basic eyebrow pattern input from the basic pattern generation block 18. If the left and right eyebrows move, the left and right eyebrow transformation parameters EB1 to EB6 are each computed with 3 bits. The eyebrow extraction block 22 generates a 3-bit lefteyebrow_items signal indicating which of the left eyebrow transformation parameters EB1, EB3 and EB5 will be encoded, and a 3-bit righteyebrow_items signal indicating which of the right eyebrow transformation parameters EB2, EB4 and EB6 will be encoded. A 2-bit eyebrow signal indicating that the eyebrows are moving is generated based on the lefteyebrow_items signal and the righteyebrow_items signal. The eyebrow data, if any, is provided sequentially to the formatter 36 in the data format given above, and comprises the eyebrow signal, the lefteyebrow_items signal, the left eyebrow transformation parameters EB1, EB3 and EB5, the righteyebrow_items signal, and the right eyebrow transformation parameters EB2, EB4 and EB6.
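One way to serialize the eyebrow data in the order listed above is sketched below. Several details are assumptions made for this sketch and are not stated in this passage: the 2-bit eyebrow signal is taken to carry one bit per side (left, then right), each items signal is treated as a per-parameter presence mask, a parameter value of 0 means "no movement", parameters are unsigned 3-bit codes, and an unflagged side is omitted entirely. The function name is hypothetical.

```python
def pack_eyebrow_data(left, right):
    """Serialize eyebrow data as a string of '0'/'1' characters.

    left  = (EB1, EB3, EB5), right = (EB2, EB4, EB6); each value is 0..7.
    Assumed layout: eyebrow signal, lefteyebrow_items, left parameters,
    righteyebrow_items, right parameters.
    """
    for p in tuple(left) + tuple(right):
        if not 0 <= p <= 7:
            raise ValueError("each eyebrow parameter must fit in 3 bits")

    def side_bits(params):
        items = "".join("1" if p else "0" for p in params)         # presence mask
        values = "".join(format(p, "03b") for p in params if p)    # flagged only
        return items, values

    l_items, l_vals = side_bits(left)
    r_items, r_vals = side_bits(right)
    bits = ("1" if l_vals else "0") + ("1" if r_vals else "0")     # eyebrow signal
    if l_vals:
        bits += l_items + l_vals
    if r_vals:
        bits += r_items + r_vals
    return bits
```

Under these assumptions a frame with no eyebrow movement costs only the two bits of the eyebrow signal.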

The eye extraction block 24 detects movement of the left and right eyes based on the basic eye pattern input from the basic pattern generation block 18 and, based on that movement, generates pupil up-down movement parameters E1 and E2, pupil left-right movement parameters E3 and E4, and outer eyelid up-down movement parameters EL1 and EL2. The eye extraction block 24 provides the eye data, if any, comprising the parameters E1 to E4, EL1 and EL2, to the formatter 36.

The mouth 1 extraction block 26 detects movement of the mouth during emotional expression of the new face based on the basic mouth pattern input from the basic pattern generation block 18, and generates mouth transformation parameters L1 to L8. The mouth transformation parameters L1 to L8 are provided to the formatter 36.

The chin 1 extraction block 28 detects movement of the chin during emotional expression of the new face based on the basic chin pattern input from the basic pattern generation block 18, and generates chin transformation parameters C1 to C3. The chin transformation parameters C1 to C3 are provided to the formatter 36.

Meanwhile, the speech analyzer 30 compares the speech signal with a preset threshold to determine whether the new face image is speaking or expressing emotion. A speech_bit signal indicating whether or not the new face image is speaking for communication is provided to the formatter 36.
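The threshold comparison can be illustrated with a minimal sketch. The passage does not say which property of the speech signal is compared with the threshold; the code below assumes the mean absolute amplitude of a block of samples, and the function name `speech_bit` is borrowed from the signal name purely for illustration.

```python
def speech_bit(samples, threshold):
    """Return 1 if the block of samples is judged to be speech, else 0.

    Assumption: the mean absolute amplitude is the quantity compared
    with the preset threshold.
    """
    if not samples:
        return 0
    level = sum(abs(s) for s in samples) / len(samples)
    return 1 if level >= threshold else 0
```

A value of 0 would correspond to the case in which the face is only expressing emotion, so that the image-based mouth and chin parameters are used.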

If the new face image is speaking for communication, the pronounced sound, the speed (pace) and the accent are extracted from the speech signals and provided to the mouth 2 extraction block 32 and the chin 2 extraction block 34.

To determine the shape of the mouth, the mouth 2 extraction block 32 generates an 8-bit sound parameter, a 4-bit speed parameter and a 3-bit accent parameter for the pronounced sound, the speed and the accent, respectively, and provides the sound, speed and accent parameters to the formatter 36. If necessary, three chin transformation parameters C1 to C3 are generated in the chin 2 extraction block 34 and provided to the formatter 36.
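The three speech-derived parameters occupy 8 + 4 + 3 = 15 bits in total. A sketch of packing them into a single 15-bit word is given below; the field order (sound in the most significant bits, then speed, then accent) and the function names are assumptions made for this sketch, since the passage fixes only the field widths.

```python
def pack_speech_mouth(sound, speed, accent):
    """Pack an 8-bit sound, 4-bit speed and 3-bit accent parameter into 15 bits."""
    if not (0 <= sound < 256 and 0 <= speed < 16 and 0 <= accent < 8):
        raise ValueError("parameter out of range for its field width")
    return (sound << 7) | (speed << 3) | accent

def unpack_speech_mouth(word):
    """Recover (sound, speed, accent) from a 15-bit word."""
    return (word >> 7) & 0xFF, (word >> 3) & 0xF, word & 0x7
```

Unpacking is the exact inverse of packing, so a decoder using the same assumed layout recovers the three parameters without loss.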

Whenever a new face image is transmitted, the formatter 36 generates a 1-bit face_texture_bit indicating that face_data of the new face follows. The formatter 36 also generates a 3-bit start code signal, a 1-bit head_orientation_bit signal, a 1-bit eyebrow_bit signal, a 1-bit eye_bit signal, a 1-bit mouth_bit signal and a 1-bit chin_bit signal. The 1-bit eyebrow_bit signal, generated based on the eyebrow signal, indicates whether or not there is any movement in the left and right eyebrows. The 1-bit eye_bit signal, generated based on the parameters E1 to E4, EL1 and EL2, indicates whether or not there is any movement in the left or right eye. The mouth_bit signal, generated based on either the mouth transformation parameters L1 to L8 or the sound, speed and accent parameters, indicates whether or not there is any movement in the mouth. The 1-bit chin_bit signal, generated based on the chin transformation parameters C1 to C3, indicates whether or not there is any movement in the chin. The formatter 36 multiplexes all of the signals and parameters according to the data format given above and provides the multiplexed signals and data to a buffer 38, which stores them. The data stored in the buffer 38 is transmitted through a transmitter (not shown).
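The derivation of the four movement flags can be sketched as follows. The rule used here, that a region has moved when any of its parameters is non-zero, is an assumption made for illustration; the passage states only which parameters each flag is based on. The head_orientation_bit is omitted because its meaning is not described in this passage. The function name and argument layout are hypothetical.

```python
def movement_flags(eyebrow, eye, mouth, speech_mouth, chin):
    """Derive the 1-bit movement flags from the parameter groups.

    eyebrow      : EB1..EB6
    eye          : E1..E4, EL1, EL2
    mouth        : L1..L8 (image-based)
    speech_mouth : sound, speed, accent parameters (speech-based)
    chin         : C1..C3
    Assumption: an empty or all-zero group means "no movement".
    """
    def any_nonzero(values):
        return int(any(v != 0 for v in values))

    return {
        "eyebrow_bit": any_nonzero(eyebrow),
        "eye_bit": any_nonzero(eye),
        "mouth_bit": int(any_nonzero(mouth) or any_nonzero(speech_mouth)),
        "chin_bit": any_nonzero(chin),
    }
```

A formatter built on this sketch would then emit a parameter group only when its flag is set.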

Referring to FIG. 2, there is shown a block diagram of an apparatus 200 for decoding facial movement in accordance with the present invention, wherein the transmitted data is temporarily stored in a buffer 50 and provided to an initial data decoder 52 and a parameter decoder 54.

The initial data decoder 52 decodes the face_data in the transmitted data and provides the 2D initial data of the new face image to the adaptive 3D model block 58 in the adaptive 3D model generation block 57.

The adaptive 3D model block 58 generates an adaptive 3D model similar to the new face image by modifying the 2D initial data based on the basic 3D model input from the basic 3D model block 60 in the adaptive 3D model generation block 57, where the basic 3D model is identical to the basic 3D model of the encoding apparatus 100. The adaptive 3D model generated in the adaptive 3D model block 58 is provided to the pattern generation block 62.

Meanwhile, the parameter decoder 54 decodes all of the transmitted data except the face_data to generate all of the transformation parameters. These comprise the head transformation parameters H1 to H3; the left and right eyebrow transformation parameters EB1 to EB6; the left and right eye transformation parameters E1 to E4, EL1 and EL2; either the mouth transformation parameters L1 to L8 or the sound, speed and accent parameters; and the chin transformation parameters C1 to C3. The transformation parameters generated in the parameter decoder 54 are input to a reconstruction block 63 consisting of an eyebrow reconstruction block 64, an eye reconstruction block 66, a mouth reconstruction block 68 and a chin reconstruction block 70.

More specifically, the head transformation parameters H1 to H3 are provided to the pattern generation block 62 via line L62; the left and right eyebrow transformation parameters EB1 to EB6 are provided to the eyebrow reconstruction block 64; the left and right eye transformation parameters E1 to E4, EL1 and EL2 are provided to the eye reconstruction block 66; either the mouth transformation parameters L1 to L8 or the sound, speed and accent parameters are provided to the mouth reconstruction block 68; and the chin transformation parameters C1 to C3 are provided to the chin reconstruction block 70.
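The routing of decoded parameters described above can be pictured as a simple dispatch on the parameter name. The sketch below is illustrative only: representing the decoded parameters as a dictionary keyed by names such as "EB2" or "sound", and the function name `route_parameters`, are assumptions, not part of the described apparatus.

```python
import re

# Destination block for each parameter-name prefix, following the text above.
ROUTES = {
    "H": "pattern generation block 62",
    "EB": "eyebrow reconstruction block 64",
    "E": "eye reconstruction block 66",
    "EL": "eye reconstruction block 66",
    "L": "mouth reconstruction block 68",
    "C": "chin reconstruction block 70",
}
SPEECH_NAMES = {"sound", "speed", "accent"}

def route_parameters(params):
    """Group decoded parameters by the block that consumes them."""
    routed = {}
    for name, value in params.items():
        if name in SPEECH_NAMES:
            block = "mouth reconstruction block 68"
        else:
            match = re.fullmatch(r"([A-Z]+)(\d+)", name)
            if match is None or match.group(1) not in ROUTES:
                raise KeyError(f"unknown parameter name: {name}")
            block = ROUTES[match.group(1)]
        routed.setdefault(block, {})[name] = value
    return routed
```

Matching the whole alphabetic prefix keeps EB, E and EL parameters apart even though their names share a leading letter.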

First, the pattern generation block 62 provides a basic pattern of the new face to the image reconstruction block 56, where the basic pattern is a 2D adaptive image obtained by rotating the adaptive 3D model according to the head transformation parameters, i.e., the yawing, pitching and rolling parameters H1 to H3, and projecting it onto a screen. The left and right eyebrows, the left and right eyes, the mouth and the chin in the basic pattern are indexed to generate basic eyebrow, eye, mouth and chin patterns. The pattern generation block 62 provides the indexed eyebrows, eyes, mouth and chin to the eyebrow reconstruction block 64, the eye reconstruction block 66, the mouth reconstruction block 68 and the chin reconstruction block 70, respectively.

The eyebrow reconstruction block 64 reconstructs the left and right eyebrows from the indexed eyebrows based on the left and right eyebrow transformation parameters EB1 to EB6, and provides the reconstructed left and right eyebrows to the image reconstruction block 56.

The eye reconstruction block 66 reconstructs the left and right eyes from the indexed eyes based on the left and right eye transformation parameters E1 to E4, EL1 and EL2, and provides the reconstructed left and right eyes to the image reconstruction block 56.

The mouth reconstruction block 68 reconstructs the mouth from the indexed mouth based on either the mouth transformation parameters L1 to L8 or the sound, speed and accent parameters, and provides the reconstructed mouth to the image reconstruction block 56.

The chin reconstruction block 70 reconstructs the chin from the indexed chin based on the chin transformation parameters C1 to C3, and provides the reconstructed chin to the image reconstruction block 56.

The image reconstruction block 56 replaces the eyebrows, eyes, mouth and chin in the basic pattern input from the pattern generation block 62 with the eyebrows, eyes, mouth and chin input from the reconstruction block 63, thereby reconstructing a new image of the new face on either a frame-by-frame or a field-by-field basis.
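The replacement step can be sketched as pasting reconstructed patches over the basic pattern. Representing the basic pattern as a list of pixel rows and each reconstructed feature as a rectangular patch with a known top-left position is an assumption made for this sketch; the passage does not describe how the indexed regions are stored. The function name `compose_face` is hypothetical.

```python
def compose_face(basic_pattern, patches):
    """Overwrite regions of the basic pattern with reconstructed patches.

    basic_pattern : list of rows of pixel values
    patches       : iterable of (top, left, pixels), where pixels is a
                    list of rows for one reconstructed feature region
    The input pattern is left unmodified; a new image is returned.
    """
    image = [row[:] for row in basic_pattern]
    for top, left, pixels in patches:
        for dy, patch_row in enumerate(pixels):
            for dx, value in enumerate(patch_row):
                image[top + dy][left + dx] = value
    return image
```

Calling this once per frame or field with the four reconstructed feature patches yields one output image per call.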

While the present invention has been described with respect to particular embodiments, it will be apparent to those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the following claims.

Therefore, according to the present invention, general information about the face shape of the sender, including the mouth region, is transmitted using a 3D face model in video telephone and teleconference systems, which reduces the amount of transmitted data.

Claims

1. An apparatus for encoding facial movement of a new face image, in a 3D model-based coding system, based on a speech signal and a two-dimensional (2D) image signal of the new face image selectively provided in units of frames or fields, the apparatus comprising:

an adaptive 3D model generator for generating an adaptive 3D model from initial data of the new face based on a basic 3D model, wherein the initial data represents one or more 2D face images of the new face and the basic 3D model represents a 3D model of a common human face;

a basic pattern generator for generating, for the 2D image signal based on the adaptive 3D model, a basic pattern representing a 2D picture obtained by a rotational correlation between the 2D image signal and the adaptive 3D model;

a feature extractor for extracting, from the 2D image signal, one or more feature regions of the new face, the feature regions being regions where many transformations occur;

a parameter generator for comparing the feature regions with the basic pattern to detect a plurality of transformation parameters representing the comparison result;

a speech analyzer for generating, based on the speech signal, a sound_bit signal indicating whether the new face is speaking or expressing emotion;

a parameter modulator for modulating, in response to the sound_bit signal, the transformation parameters based on the speech signal to generate modified transformation parameters; and

a formatter for encoding the initial data and either the transformation parameters or the modified transformation parameters.

2. The apparatus of claim 1, wherein the basic pattern generator comprises:

means for determining, for the 2D image signal based on the adaptive 3D model, head parameters indicating a rotation condition under which a projected image of the adaptive 3D model is similar to the 2D image signal; and

means for replacing the basic pattern with the projected image corresponding to the head parameters.

3. The apparatus of claim 1, wherein the parameter generator comprises:

means for matching the feature regions with the basic pattern to calculate an amount of motion for each of the feature regions; and

means for storing the respective motion amounts in the corresponding transformation parameters.

4. The apparatus of claim 3, wherein the feature regions comprise left and right eyebrows, left and right eyes, a mouth and a chin, and the transformation parameters comprise eyebrow transformation parameters, eye transformation parameters, mouth transformation parameters and chin transformation parameters.

5. The apparatus of claim 4, wherein the speech analyzer comprises:

means for obtaining a pronounced sound, a speed and an accent from the speech signal; and

means for comparing each of the pronounced sound, the speed and the accent with a predetermined threshold to generate the sound_bit signal.

6. The apparatus of claim 5, wherein, in response to the sound_bit signal, the mouth and chin transformation parameters are modified based on the speech signal to generate modified mouth transformation parameters and modified chin transformation parameters, respectively.

7. A method for encoding facial movement of a new face image, in a 3D model-based coding system, based on a speech signal and a two-dimensional (2D) image signal of the new face image selectively provided on a frame or field basis, the method comprising the steps of:

(a) generating an adaptive 3D model from initial data representing one or more 2D face images of the new face image, based on a basic 3D model representing a 3D model of a common human face;

(b) generating, for the 2D image signal based on the adaptive 3D model, a basic pattern representing a 2D picture obtained by a rotational correlation between the 2D image signal and the adaptive 3D model;

(c) extracting, from the 2D image signal, one or more feature regions of the new face image where many transformations occur;

(d) comparing the feature regions with the basic pattern to detect a plurality of transformation parameters representing the comparison result;

(e) modifying the transformation parameters based on the speech signal to generate modified transformation parameters; and

(f) encoding the initial data and the modified transformation parameters.

8. The method of claim 7, wherein step (b) comprises:

(b1) determining, for the 2D image signal based on the adaptive 3D model, head parameters indicating a rotation condition under which a projected image of the adaptive 3D model is similar to the 2D image signal; and

(b2) replacing the basic pattern with the projected image corresponding to the head parameters.

9. The method of claim 7, wherein step (d) comprises:

(d1) matching the feature regions with the basic pattern to calculate an amount of motion for each of the feature regions; and

(d2) storing the respective motion amounts in the corresponding transformation parameters.

10. The method of claim 7, wherein step (e) comprises:

(e1) obtaining a pronounced sound, a speed and an accent from the speech signal;

(e2) comparing the pronounced sound, the speed and the accent with a preset threshold to generate a sound_bit signal indicating whether the new face image is speaking or expressing emotion; and

(e3) if the new face image is determined to be speaking, modulating the mouth transformation parameters based on the pronounced sound, the speed and the accent to generate modified mouth transformation parameters.