WO2024101769A1 - Method and system for generating facial movement of a three-dimensional model to which the user's facial expression and emotional state are applied
- Publication number
- WO2024101769A1 (PCT application PCT/KR2023/017327)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- user
- facial
- control information
- character image
- generating
- Prior art date
Classifications
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/174—Facial expression recognition
- G06V40/176—Dynamic expression
- G10L15/25—Speech recognition using non-acoustical features, using position of the lips, movement of the lips or face analysis
- G10L15/26—Speech to text systems
- G10L25/63—Speech or voice analysis specially adapted for estimating an emotional state
- G06T2207/30201—Subject of image: Face
Definitions
- The present invention relates to facial motion capture and, more specifically, to a method of automatically generating a virtual character image that tracks and imitates a user's face in real time.
- A facial motion capture system captures a user's image with a camera, identifies the user's facial motion, and has a virtual character imitate it.
- The present invention was conceived to solve the problems of such existing systems.
- The purpose of the present invention is to create a virtual character image that imitates the user, applying the user's facial expression characteristics and emotional state to produce a more natural and richer facial expression.
- A method for generating a user replica character image according to an embodiment of the present invention includes: receiving a user face image; extracting facial movement from the input image; generating facial movement control information from the extracted facial movement; extracting the user's emotional state; applying the extracted emotional state to the facial movement control information; and generating a character image whose face moves based on the facial movement control information.
- The facial movement control information generation step may identify the user's facial expression from the user's facial movement and apply the identified facial expression to the facial movement control information.
- The facial movement control information generation step may identify the user's expression from the user's facial movement by utilizing feature information about the user's facial expressions collected in advance.
- The emotional state extraction step may extract the user's emotional state from the extracted facial movements.
- The application step may apply, to the facial movement control information, information that quantifies the characteristics of facial movement according to the extracted emotional state.
- The method for generating a user replica character image may further include receiving a user voice synchronized with the user face image and converting the input voice into text, in which case the facial movement control information generation step may refer to the text in addition to the extracted facial movement when generating the facial movement control information.
- The text may be referenced to generate control information for mouth movements.
- The emotional state extraction step may extract the user's emotional state from the extracted facial movements and the input user voice.
- The character image generation step may generate the character image by analyzing the facial movement control information and using an artificial intelligence model trained to generate a character image whose face moves according to the control information.
- According to another embodiment, a system is provided that includes: an input unit that receives a user's face image; a motion extraction unit that extracts facial movement from the input image; a control information generator that generates facial movement control information from the extracted facial movement; an emotion extraction unit that extracts the user's emotional state; an application unit that applies the extracted emotional state to the facial movement control information; and a character image generator that generates a character image whose face moves based on the facial movement control information.
- According to yet another embodiment, a method for generating a user simulated character image is provided, comprising: extracting facial movement from a user's face image; generating facial movement control information from the extracted facial movement; applying the user's emotional state to the facial movement control information; and generating a character image whose face moves based on the facial movement control information.
- According to yet another embodiment, a system is provided that includes: an extraction unit that extracts facial movement from a user's face image; a control information generator that generates facial movement control information from the extracted facial movement; an application unit that applies the user's emotional state to the facial movement control information; and a character image generator that generates a character image whose face moves based on the facial movement control information.
- Accordingly, a foundation is laid for using one's own digital twin character in digital worlds such as non-face-to-face environments, and by creating natural movements for the digital character, a foundation is also laid for using it in digital content such as movies, games, and animations.
- FIG. 2 is a diagram showing the configuration of a user simulation character image generation system according to an embodiment of the present invention.
- FIG. 3 is a flowchart provided to explain a method for generating a user simulation character image according to another embodiment of the present invention.
- FIG. 4 is a diagram showing the configuration of a character creation system according to another embodiment of the present invention.
- FIG. 5 is a flowchart provided to explain a method for generating a user simulation character image according to another embodiment of the present invention.
- An embodiment of the present invention presents a method and system for generating 3D model facial movements that apply an individual's facial expression characteristics and emotional state.
- Specifically, a facial movement generation system is implemented in which the digital twin character moves more like the user, and arbitrary emotional states can also be applied, a technique that allows digital characters to have rich facial expressions.
- FIG. 2 is a diagram illustrating the configuration of a user simulation character image generation system according to an embodiment of the present invention.
- The 'user simulation character image generation system' (hereinafter abbreviated as 'character creation system') according to an embodiment of the present invention includes an image input unit 110, a facial motion extraction unit 120, a control information generator 130, an emotion extraction unit 140, an emotion application unit 150, and a character image generator 160.
- The image input unit 110 receives a user's face image captured through a camera in units of consecutive frames.
- The facial motion extraction unit 120 extracts feature points from the user's face image received through the image input unit 110 and extracts the user's facial movement by tracking the movement of the extracted feature points.
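The patent does not name a particular feature-point detector, so the following is a minimal sketch of the extraction step in unit 120, assuming the MediaPipe Face Mesh library and treating per-frame landmark displacements as the extracted facial movement; both choices are assumptions for illustration only.

```python
# Sketch of the feature-point extraction in unit 120, assuming MediaPipe
# Face Mesh as the landmark detector (the patent does not name one).
# Landmark displacement between consecutive frames stands in for the
# extracted "facial movement".
import cv2
import mediapipe as mp
import numpy as np

face_mesh = mp.solutions.face_mesh.FaceMesh(static_image_mode=False)

def extract_landmarks(frame_bgr):
    """Return an (N, 3) array of face landmarks, or None if no face is found."""
    result = face_mesh.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not result.multi_face_landmarks:
        return None
    points = result.multi_face_landmarks[0].landmark
    return np.array([[p.x, p.y, p.z] for p in points])

def facial_motion(prev_landmarks, curr_landmarks):
    """Per-landmark displacement between two consecutive frames."""
    return curr_landmarks - prev_landmarks
```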
- The control information generator 130 generates facial movement control information using the facial movement extracted by the facial motion extraction unit 120.
- To this end, the control information generator 130 utilizes feature information about the user's facial expressions collected in advance.
- That is, the control information generator 130 identifies the user's facial expression from the user's facial movement and applies the identified facial expression to the facial movement control information.
- Since facial expressions differ from person to person, facial movements are not applied directly to generate control information; instead, the facial expression is interpreted and the interpreted expression is reflected in the facial movement control information.
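As one hedged illustration of this interpretation step, the sketch below classifies motion features against per-user reference expressions collected in advance and maps the identified expression to blendshape-style control weights; the reference vectors, control names, and nearest-reference rule are assumptions, not taken from the patent.

```python
# Sketch of expression interpretation in the control information
# generator 130: motion features are matched against per-user reference
# expressions collected in advance, and the identified expression is
# mapped to blendshape-style control weights.
import numpy as np

USER_EXPRESSION_REFS = {                     # pre-collected user features
    "neutral": np.zeros(8),
    "smile":   np.array([0.9, 0.1, 0.0, 0.0, 0.4, 0.0, 0.0, 0.0]),
    "frown":   np.array([0.0, 0.0, 0.8, 0.5, 0.0, 0.2, 0.0, 0.0]),
}

EXPRESSION_TO_CONTROLS = {                   # expression -> control weights
    "neutral": {"mouth_smile": 0.0, "brow_down": 0.0},
    "smile":   {"mouth_smile": 0.8, "brow_down": 0.0},
    "frown":   {"mouth_smile": 0.0, "brow_down": 0.7},
}

def identify_expression(motion_features):
    """Nearest-reference match against the user's own expression features."""
    return min(USER_EXPRESSION_REFS,
               key=lambda k: np.linalg.norm(motion_features - USER_EXPRESSION_REFS[k]))

def make_control_info(motion_features):
    """Interpret the expression first, then emit control information."""
    return dict(EXPRESSION_TO_CONTROLS[identify_expression(motion_features)])
```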
- The emotion extraction unit 140 extracts the user's emotional state from the user's facial movement extracted by the facial motion extraction unit 120. To achieve this, a known algorithm that estimates emotional states from facial movements is utilized.
- The emotion application unit 150 reflects the emotional state extracted by the emotion extraction unit 140 in the facial movement control information generated by the control information generator 130. For this purpose, information that quantifies the characteristics of facial movements according to emotional state is prepared in advance and used.
- Although emotional states appear in facial expressions, they cannot be fully defined by facial expressions alone. Accordingly, the user's emotional state is interpreted and the interpreted emotional state is also reflected in the facial movement control information, allowing the character to show subtle changes according to the corresponding emotional state and thereby enriching its facial expressions.
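The quantified per-emotion movement characteristics could take many forms; a minimal sketch follows, assuming they are stored as gain/offset tables over the control weights. The table contents and emotion labels are illustrative assumptions.

```python
# Sketch of the emotion application unit 150: per-emotion gain/offset
# tables model the "information that quantifies the characteristics of
# facial movements according to emotional state" prepared in advance.
EMOTION_MODIFIERS = {
    "joy":     {"gain": {"mouth_smile": 1.2}, "offset": {"cheek_raise": 0.3}},
    "sadness": {"gain": {"mouth_smile": 0.5}, "offset": {"brow_inner_up": 0.4}},
    "neutral": {"gain": {}, "offset": {}},
}

def apply_emotion(control_info, emotion):
    """Return control info adjusted by the quantified traits of `emotion`."""
    mods = EMOTION_MODIFIERS.get(emotion, EMOTION_MODIFIERS["neutral"])
    adjusted = dict(control_info)
    for key, gain in mods["gain"].items():
        adjusted[key] = min(1.0, adjusted.get(key, 0.0) * gain)
    for key, offset in mods["offset"].items():
        adjusted[key] = min(1.0, adjusted.get(key, 0.0) + offset)
    return adjusted
```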
- The character image generator 160 generates and outputs a character image with a moving face, based on the facial movement control information generated by the control information generator 130 and updated with the emotional state by the emotion application unit 150.
- The character image generator 160 may generate the character image by analyzing the facial movement control information and using an artificial intelligence model trained to generate a character image whose face moves according to the control information.
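The patent does not disclose the model architecture; the PyTorch sketch below shows one plausible shape for such a model, a small decoder conditioned on a control vector, with the layer sizes and the 52-dimensional control vector chosen purely for illustration.

```python
# Sketch of a learned character-image generator for unit 160. The patent
# only says an AI model is trained to render a face that moves according
# to the control information; sizes and architecture are assumptions.
import torch
import torch.nn as nn

class ControlConditionedGenerator(nn.Module):
    def __init__(self, n_controls=52, img_size=64):
        super().__init__()
        self.img_size = img_size
        self.net = nn.Sequential(
            nn.Linear(n_controls, 256), nn.ReLU(),
            nn.Linear(256, 1024), nn.ReLU(),
            nn.Linear(1024, 3 * img_size * img_size), nn.Sigmoid(),
        )

    def forward(self, controls):          # (B, n_controls) -> (B, 3, H, W)
        flat = self.net(controls)
        return flat.view(-1, 3, self.img_size, self.img_size)

generator = ControlConditionedGenerator()
frame = generator(torch.rand(1, 52))      # one rendered character frame
```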
- FIG. 3 is a flowchart provided to explain a method for generating a user simulation character image according to another embodiment of the present invention.
- As shown, the image input unit 110 first receives the user's face image captured through a camera (S210), and the facial motion extraction unit 120 extracts the user's facial movement from the face image received in step S210 (S220).
- Next, the control information generator 130 generates facial movement control information using the facial movement extracted in step S220, applying the user's facial expression identified from the facial movement (S230).
- The emotion extraction unit 140 then extracts the user's emotional state from the facial movement extracted in step S220 (S240), and the emotion application unit 150 reflects the emotional state extracted in step S240 in the facial movement control information generated in step S230 (S250).
- Finally, the character image generator 160 generates and outputs a character image with a moving face, based on the facial movement control information generated in step S230 and updated in step S250 (S260).
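Wired together from the illustrative helpers sketched earlier, the S210 to S260 flow might read as the per-frame loop below; extract_emotion() and render_character() are assumed stand-ins for the unnamed emotion estimator and the trained generator of unit 160.

```python
# Sketch of the S210-S260 flow as a per-frame loop, composed from the
# illustrative helpers above; not the patent's actual implementation.
def process_frame(frame_bgr, prev_landmarks):
    curr = extract_landmarks(frame_bgr)                  # S210: face image in
    if curr is None or prev_landmarks is None:
        return None, curr                                # need two frames
    motion = facial_motion(prev_landmarks, curr)         # S220: extract motion
    controls = make_control_info(motion.flatten()[:8])   # S230: control info
    emotion = extract_emotion(motion)                    # S240: emotional state (assumed helper)
    controls = apply_emotion(controls, emotion)          # S250: apply emotion
    return render_character(controls), curr              # S260: character frame (assumed renderer)
```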
- FIG. 4 is a diagram showing the configuration of a character creation system according to another embodiment of the present invention.
- The character creation system according to this embodiment is the system shown in FIG. 2 with a voice input unit 170 and a text converter 180 added.
- The voice input unit 170 receives the user's voice, synchronized with the user's face image received by the image input unit 110.
- The text converter 180 is an STT (Speech-To-Text) module that converts the user's voice received through the voice input unit 170 into text.
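Any speech-to-text module would satisfy this step; a minimal sketch, assuming the Python SpeechRecognition package and its Google Web Speech backend (both assumptions, neither named in the patent), follows.

```python
# Sketch of the STT step in the text converter 180, assuming the
# SpeechRecognition package; the patent only requires some STT module.
import speech_recognition as sr

def voice_to_text(wav_path):
    """Transcribe a short synchronized voice clip to text."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)
    return recognizer.recognize_google(audio)  # any STT backend would do
```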
- The text generated by the text converter 180 is transmitted to the control information generator 130 and used to generate facial movement control information. Specifically, when generating facial movement control information, the control information generator 130 reflects the text pronounced by the user in the control information for mouth movement.
- Since the mouth shape is influenced by the pronounced text, creating the mouth-shape movement from the pronounced text in addition to the user's facial movement can make the mouth shape more accurate and natural.
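A simple grapheme-to-viseme lookup is one illustrative way to realize this; in the sketch below, the viseme table, per-character lookup, and blend weight are assumptions rather than details from the patent.

```python
# Sketch of text-informed mouth control in the control information
# generator 130: landmark-derived mouth controls are blended with
# viseme targets looked up from the recognized text.
VISEMES = {
    "a": {"jaw_open": 0.7, "lips_round": 0.1},
    "o": {"jaw_open": 0.5, "lips_round": 0.8},
    "m": {"jaw_open": 0.0, "lips_close": 1.0},
}

def mouth_controls_from_text(text, mouth_controls, text_weight=0.5):
    """Blend landmark-derived mouth controls with text-derived visemes."""
    blended = dict(mouth_controls)
    for ch in text.lower():
        for key, value in VISEMES.get(ch, {}).items():
            blended[key] = ((1 - text_weight) * blended.get(key, 0.0)
                            + text_weight * value)
    return blended
```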
- Furthermore, the user's voice received by the voice input unit 170 is also referenced when extracting the user's emotional state.
- Specifically, the emotion extraction unit 140 extracts the user's emotional state by further referring to the user's voice received by the voice input unit 170, in addition to the user's facial movement extracted by the facial motion extraction unit 120.
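One hedged way to combine the two modalities is a weighted average of per-modality emotion probabilities, as sketched below; the scoring functions, emotion labels, and fusion weight are assumptions, since the patent only states that the voice is further referred to.

```python
# Sketch of bimodal emotion extraction in unit 140 (FIG. 4 variant):
# per-modality emotion probabilities from face motion and voice are
# fused by weighted averaging.
EMOTIONS = ("neutral", "joy", "sadness", "anger")

def fuse_emotion(face_scores, voice_scores, voice_weight=0.4):
    """face_scores / voice_scores: dicts mapping emotion -> probability."""
    fused = {e: (1 - voice_weight) * face_scores.get(e, 0.0)
                + voice_weight * voice_scores.get(e, 0.0)
             for e in EMOTIONS}
    return max(fused, key=fused.get)
```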
- FIG. 5 is a flowchart provided to explain a method for generating a user simulation character image according to another embodiment of the present invention.
- As shown, the image input unit 110 first receives the user's face image captured through a camera (S310), and the facial motion extraction unit 120 extracts the user's facial movement from the face image received in step S310 (S320).
- The voice input unit 170 receives the user's voice synchronized with the face image received in step S310 (S330), and the text converter 180 converts the voice received in step S330 into text (S340).
- Next, the control information generator 130 generates facial movement control information using the facial movement extracted in step S320 and the text converted in step S340 (S350).
- The emotion extraction unit 140 extracts the user's emotional state from the facial movement extracted in step S320 and the voice received in step S330 (S360), and the emotion application unit 150 reflects the emotional state extracted in step S360 in the facial movement control information generated in step S350 (S370).
- Finally, the character image generator 160 generates and outputs a character image with a moving face, based on the facial movement control information generated in step S350 and updated in step S370 (S380).
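Composing the illustrative helpers above, the voice-augmented S310 to S380 flow might look like the loop below; face_emotion_scores() and voice_emotion_scores() are assumed stand-ins for the per-modality emotion estimators the patent does not name.

```python
# Sketch of the S310-S380 flow, composed from the illustrative helpers
# above; not the patent's actual implementation.
def process_frame_with_voice(frame_bgr, wav_path, prev_landmarks):
    curr = extract_landmarks(frame_bgr)                       # S310
    if curr is None or prev_landmarks is None:
        return None, curr
    motion = facial_motion(prev_landmarks, curr)              # S320
    text = voice_to_text(wav_path)                            # S330-S340
    controls = make_control_info(motion.flatten()[:8])        # S350
    controls = mouth_controls_from_text(text, controls)       # S350 (mouth)
    emotion = fuse_emotion(face_emotion_scores(motion),       # S360
                           voice_emotion_scores(wav_path))
    controls = apply_emotion(controls, emotion)               # S370
    return render_character(controls), curr                   # S380
```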
- By creating a virtual character that sufficiently reflects the characteristics of an actual user, this lays the foundation for using one's own digital twin character in digital worlds such as non-face-to-face environments, and by creating natural movements for the digital character, it can be used in digital content such as movies, games, and animations.
- The computer-readable recording medium can be any data storage device that can be read by a computer and can store data.
- For example, the computer-readable recording medium can be ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical disk, hard disk drive, etc.
- In addition, computer-readable codes or programs stored on a computer-readable recording medium may be transmitted through a network connecting computers.
Abstract
Disclosed are a method and system for generating facial movement of a 3D model to which a user's facial expression and emotional state are applied. A method for generating a character video imitating a user according to an embodiment of the present invention comprises extracting facial movement from a video of the user's face, generating facial movement control information from the extracted facial movement, applying the user's emotional state to the facial movement control information, and generating a character video in which a face moves based on the facial movement control information. Accordingly, a character video with more natural and richer facial expressions can be generated by applying a user's facial characteristics and emotional states, and natural movement of a digital character can be created, thereby laying a foundation for using the digital character in digital content such as movies, games, and animation.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2022-0150085 | 2022-11-11 | ||
KR1020220150085A KR20240068992A (ko) | 2022-11-11 | 2022-11-11 | Method and system for generating 3D model facial movement applying the user's facial expression and emotional state
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024101769A1 | 2024-05-16 |
Family
ID=91032769
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2023/017327 WO2024101769A1 (fr) | Method and system for generating facial movement of a three-dimensional model to which the user's facial expression and emotional state are applied
Country Status (2)
Country | Link |
---|---|
KR (1) | KR20240068992A (fr) |
WO (1) | WO2024101769A1 (fr) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20060091435A (ko) * | 2005-02-15 | 2006-08-21 | SK Telecom Co., Ltd. | Method and system for providing news information using a 3D character in a mobile communication network |
KR20170062089A (ko) * | 2015-11-27 | 2017-06-07 | Maniamind Co., Ltd. | Method and program for implementing facial expressions of a 3D avatar |
KR20190000087A (ko) * | 2017-06-22 | 2019-01-02 | Korea Electronics Technology Institute | Multimedia processing method and system utilizing facial expression recognition |
KR20200053163A (ko) * | 2018-11-08 | 2020-05-18 | Baek Eu-tteum | Apparatus and method for providing glasses-free virtual reality content |
KR20220034396A (ko) * | 2020-09-11 | 2022-03-18 | KT Corporation | Apparatus, method, and computer program for generating facial images |
- 2022-11-11: KR application KR1020220150085A filed, published as KR20240068992A (not active: IP right cessation)
- 2023-11-02: PCT application PCT/KR2023/017327 filed, published as WO2024101769A1 (status unknown)
Also Published As
Publication number | Publication date |
---|---|
KR20240068992A (ko) | 2024-05-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115205949B | Image generation method and related device | |
CN111429885B | Method for mapping audio segments to key points of a human-face mouth shape | |
JP2021192222A | Video interaction method and apparatus, electronic device, computer-readable storage medium, and computer program | |
WO2013141522A1 | Karaoke and dance game | |
CN110488975A | Artificial-intelligence-based data processing method and related apparatus | |
JP2014519082A5 | | |
WO2023080266A1 | Face conversion method and apparatus using a deep learning network | |
WO2024101769A1 | Method and system for generating facial movement of a three-dimensional model to which the user's facial expression and emotional state are applied | |
CN117135331A | Method and system for generating 3D digital human videos | |
WO2021025279A1 | System, method, and computer-readable storage medium for optimizing a virtual character's expression through artificial intelligence (AI)-based expression classification and retargeting | |
WO2023096275A1 | Method and system for generating a text-based avatar | |
WO2022108275A1 | Method and device for generating a virtual face using artificial intelligence | |
WO2021261687A1 | Device and method for reconstructing a three-dimensional human shape and pose model from an image | |
WO2023239041A1 | Creating images, meshes, and talking animations from mouth shape data | |
CN117787956A | Metaverse-based electric power inspection method, system, device, and medium | |
CN112002005A | Cloud-based method for remote virtual collaborative hosting | |
WO2023277421A1 | Method for segmenting sign language into morphemes, method for predicting morpheme positions, and method for data augmentation | |
WO2022260385A1 | Method and device for synthesizing a background and a face by taking the face shape into account, using a deep learning network | |
CN116129860A | Automatic book broadcasting method for metaverse virtual humans based on AI technology | |
KR100445846B1 | Virtual speech simulator for treating social phobia | |
CN117119123A | Method and system for generating digital human videos based on video material | |
CN115690280A | Method for simulating the pronunciation mouth shapes of a three-dimensional avatar | |
WO2024117616A1 | System and method for providing a metaverse service using a digital human capable of real-time synchronization and interaction through camera and motion-capture recognition | |
WO2022131390A1 | Three-dimensional human pose estimation method based on self-supervised learning using multi-view images | |
WO2024101485A1 | Method and system for producing a moving-image hologram | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23889031; Country of ref document: EP; Kind code of ref document: A1 |