
WO2024101769A1 - Method and system for generating a facial movement of a three-dimensional model to which the user's facial expression and emotional state are applied - Google Patents


Info

Publication number: WO2024101769A1
Authority: WO (WIPO - PCT)
Prior art keywords: user, facial, control information, character image, generating
Prior art date: 2022-11-11
Application number: PCT/KR2023/017327
Other languages: English (en), Korean (ko)
Inventors: 김용화, 윤상필, 홍성희, 김영민, 홍지수, 정진수, 이병효, 오현찬
Original Assignee: 한국전자기술연구원
Priority date: 2022-11-11 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2023-11-02
Publication date: 2024-05-16
Application filed by: 한국전자기술연구원
Publication of WO2024101769A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 - Animation
    • G06T 13/20 - 3D [Three Dimensional] animation
    • G06T 13/40 - 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 - Feature extraction; Face representation
    • G06V 40/174 - Facial expression recognition
    • G06V 40/176 - Dynamic expression
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/24 - Speech recognition using non-acoustical features
    • G10L 15/25 - Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
    • G10L 15/26 - Speech to text systems
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L 25/63 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30196 - Human being; Person
    • G06T 2207/30201 - Face

Definitions

  • The present invention relates to facial motion capture, and more specifically to a method of automatically generating a virtual character image that tracks and imitates a user's face in real time.
  • A facial motion capture system captures a user's image with a camera, identifies the user's facial motion, and makes a virtual character imitate it.
  • The present invention was created to address the limitations of such systems.
  • The purpose of the present invention is to create a virtual character image that imitates the user with a more natural and rich facial expression, by applying the user's facial expression characteristics and emotional state.
  • A method for generating a user simulation character image according to an embodiment of the present invention includes the steps of: receiving a user face image; extracting facial movement from the input image; generating facial movement control information from the extracted facial movement; extracting the user's emotional state; applying the extracted emotional state to the facial movement control information; and generating a character image with a moving face based on the facial movement control information.
  • The facial movement control information generation step may identify the user's facial expression from the user's facial movement and apply the identified expression to the facial movement control information.
  • The facial movement control information generation step may identify the user's expression from the user's facial movement by utilizing feature information about the user's facial expressions collected in advance.
  • The emotional state extraction step may extract the user's emotional state from the extracted facial movements.
  • The application step may apply, to the facial movement control information, information that quantifies the characteristics of facial movement according to the extracted emotional state.
  • The method for generating a user simulation character image may further include receiving a user voice synchronized with the user face image and converting the input voice into text, in which case the facial movement control information generation step may refer to the text in addition to the extracted facial movement when generating the facial movement control information.
  • Text can be referenced to generate control information for mouth movements.
  • The emotional state extraction step may extract the user's emotional state from the extracted facial movements and the input user voice.
  • The character image generation step may generate the character image by analyzing the facial movement control information with an artificial intelligence model trained to generate a character image whose face moves according to the control information.
  • According to another embodiment of the present invention, a user simulation character image generation system is provided, including: an input unit that receives a user's face image; a motion extraction unit that extracts facial movement from the input image; a control information generation unit that generates facial movement control information from the extracted facial movement; an emotion extraction unit that extracts the user's emotional state; an application unit that applies the extracted emotional state to the facial movement control information; and a character image generation unit that generates a character image with a moving face based on the facial movement control information.
  • According to another embodiment of the present invention, a method for generating a user simulation character image is provided, including: extracting facial movement from a user's face image; generating facial movement control information from the extracted facial movement; applying the user's emotional state to the facial movement control information; and generating a character image with a moving face based on the facial movement control information.
  • According to another embodiment of the present invention, a user simulation character image generation system is provided, including: an extraction unit that extracts facial movement from a user's face image; a control information generation unit that generates facial movement control information from the extracted facial movement; an application unit that applies the user's emotional state to the facial movement control information; and a character image generation unit that generates a character image with a moving face based on the facial movement control information.
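  • For illustration only, the units recited above could be organized as in the following skeleton; the class names, method signatures, and data shapes are assumptions introduced here and are not part of the disclosure.

```python
# Illustrative skeleton of the recited units (names, signatures, and shapes are
# assumptions for exposition; the disclosure does not prescribe an implementation).
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class FacialMotion:
    landmarks: np.ndarray       # (num_frames, num_points, 2) tracked feature points
    displacements: np.ndarray   # frame-to-frame movement of those points


class ImageInputUnit:           # image input unit (110): receives the user's face image
    def read_frames(self, source) -> List[np.ndarray]:
        raise NotImplementedError


class FacialMotionExtractor:    # facial motion extraction unit (120)
    def extract(self, frames: List[np.ndarray]) -> FacialMotion:
        raise NotImplementedError


class ControlInfoGenerator:     # control information generation unit (130)
    def generate(self, motion: FacialMotion) -> np.ndarray:
        raise NotImplementedError


class EmotionExtractor:         # emotion extraction unit (140)
    def extract(self, motion: FacialMotion) -> str:
        raise NotImplementedError


class EmotionApplier:           # emotion application unit (150)
    def apply(self, controls: np.ndarray, emotion: str) -> np.ndarray:
        raise NotImplementedError


class CharacterImageGenerator:  # character image generation unit (160)
    def render(self, controls: np.ndarray) -> np.ndarray:
        raise NotImplementedError
```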
  • Accordingly, a foundation is laid for using one's own digital twin character in digital worlds such as non-face-to-face environments, and by creating natural movements of the digital character, a foundation is also laid for using it in digital content such as movies, games, and animations.
  • Figure 2 is a diagram showing the configuration of a user simulation character image generation system according to an embodiment of the present invention.
  • Figure 3 is a flowchart provided to explain a method for generating a user simulation character image according to another embodiment of the present invention.
  • Figure 4 is a diagram showing the configuration of a character creation system according to another embodiment of the present invention.
  • Figure 5 is a flowchart provided to explain a method for generating a user simulation character image according to another embodiment of the present invention.
  • An embodiment of the present invention presents a method and system for generating 3D model facial movements applying an individual's facial expression characteristics and emotional state.
  • In an embodiment, a facial movement generation system is implemented in which a digital twin character moves more like the user, and arbitrary emotional states can also be applied, allowing the digital character to have rich facial expressions.
  • Figure 2 is a diagram illustrating the configuration of a user simulation character image generation system according to an embodiment of the present invention.
  • As shown, the 'user simulation character image generation system' (hereinafter abbreviated as 'character creation system') according to an embodiment of the present invention includes an image input unit 110, a facial motion extraction unit 120, a control information generation unit 130, an emotion extraction unit 140, an emotion application unit 150, and a character image creation unit 160.
  • The image input unit 110 receives a user's face image captured through a camera in units of consecutive frames.
  • The facial motion extraction unit 120 extracts feature points from the user's face image input through the image input unit 110 and extracts the user's facial movement by identifying the movement of the extracted feature points.
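  • A minimal sketch of this feature-point-based motion extraction is shown below; it assumes a hypothetical `detect_landmarks` helper standing in for any 2D facial landmark detector and uses OpenCV only to read frames.

```python
# Minimal sketch of feature-point-based facial motion extraction.
# `detect_landmarks` is a placeholder for any 2D facial landmark detector.
import cv2
import numpy as np


def detect_landmarks(frame: np.ndarray) -> np.ndarray:
    """Placeholder: replace with a real landmark detector returning an (N, 2) array."""
    raise NotImplementedError


def extract_facial_motion(video_path: str):
    cap = cv2.VideoCapture(video_path)
    landmark_seq = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        landmark_seq.append(detect_landmarks(frame))  # (N, 2) feature points per frame
    cap.release()

    landmarks = np.stack(landmark_seq)                # (T, N, 2)
    # Facial movement is identified as the frame-to-frame displacement of the points.
    displacements = np.diff(landmarks, axis=0)        # (T-1, N, 2)
    return landmarks, displacements
```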
  • The control information generation unit 130 generates facial movement control information using the facial movement extracted by the facial motion extraction unit 120.
  • In doing so, the control information generation unit 130 utilizes feature information about the user's facial expressions collected in advance.
  • Specifically, the control information generation unit 130 identifies the user's facial expression from the user's facial movement and applies the identified facial expression to the facial movement control information.
  • Because facial expressions differ from person to person, facial movements are not applied directly to generate the control information; instead, the facial expression is interpreted and the interpreted expression is reflected in the facial movement control information.
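  • The following sketch illustrates one way such interpretation could work, assuming the pre-collected feature information takes the form of per-user neutral/maximum calibration values; the feature names, the linear normalization, and the control-channel ordering are illustrative assumptions.

```python
# Sketch: interpreting the user's expression before producing control information.
# The per-user "feature information collected in advance" is modeled as calibrated
# neutral/maximum values per expression feature; names and scaling are assumptions.
import numpy as np


def interpret_expression(features: dict, calibration: dict) -> dict:
    """Map raw per-user feature values to normalized expression weights in [0, 1]."""
    weights = {}
    for name, value in features.items():
        lo = calibration[name]["neutral"]   # feature value with a neutral face
        hi = calibration[name]["max"]       # feature value at the user's strongest expression
        weights[name] = float(np.clip((value - lo) / (hi - lo + 1e-6), 0.0, 1.0))
    return weights


def to_control_info(weights: dict) -> np.ndarray:
    # Fixed ordering of control channels (blendshape-like targets, chosen for illustration).
    channels = ["mouth_open", "smile", "brow_raise", "eye_close"]
    return np.array([weights.get(c, 0.0) for c in channels], dtype=np.float32)
```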
  • The emotion extraction unit 140 extracts the user's emotional state from the user's facial movement extracted by the facial motion extraction unit 120. To achieve this, a known algorithm that estimates emotional states from facial movements is used.
  • The emotion application unit 150 reflects the emotional state extracted by the emotion extraction unit 140 in the facial movement control information generated by the control information generation unit 130. For this purpose, information that quantifies the characteristics of facial movements according to emotional state is prepared in advance and used.
  • Although emotional states appear in facial expressions, they cannot be fully defined by facial expressions alone. Accordingly, the user's emotional state is interpreted, and the interpreted emotional state is also reflected in the facial movement control information, allowing the character to show subtle changes according to the corresponding emotional state and thereby enriching its facial expressions.
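  • One possible form of this quantified, per-emotion information is a table of gains and offsets applied to the control channels, as in the sketch below; the specific numbers and emotion labels are illustrative assumptions, not values from the disclosure.

```python
# Sketch of applying the interpreted emotional state to the control information.
# The gains/offsets stand in for the pre-prepared "information that quantifies the
# characteristics of facial movements according to emotional state"; values are illustrative.
import numpy as np

EMOTION_PROFILES = {
    # per-channel (gain, offset) for [mouth_open, smile, brow_raise, eye_close]
    "neutral": (np.array([1.0, 1.0, 1.0, 1.0]), np.zeros(4)),
    "happy":   (np.array([1.1, 1.3, 1.1, 0.9]), np.array([0.0, 0.10, 0.05, 0.0])),
    "sad":     (np.array([0.8, 0.6, 0.9, 1.1]), np.array([0.0, -0.05, 0.0, 0.05])),
}


def apply_emotion(controls: np.ndarray, emotion: str) -> np.ndarray:
    gain, offset = EMOTION_PROFILES.get(emotion, EMOTION_PROFILES["neutral"])
    # Subtle, emotion-dependent modulation of the expression controls.
    return np.clip(controls * gain + offset, 0.0, 1.0)
```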
  • The character image generation unit 160 generates and outputs a character image with a moving face based on the facial movement control information generated by the control information generation unit 130 and adjusted for the emotional state by the emotion application unit 150.
  • Specifically, the character image generation unit 160 may generate the character image by analyzing the facial movement control information with an artificial intelligence model trained to generate a character image whose face moves according to the control information.
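  • The disclosure does not specify the model, but the interface of such a learned generator might look like the following stand-in: a control vector in, a character frame out. The architecture and sizes are arbitrary assumptions.

```python
# Stand-in for the learned generator (architecture and sizes are arbitrary assumptions).
import torch
import torch.nn as nn


class ControlToFaceDecoder(nn.Module):
    def __init__(self, num_controls: int = 4, image_size: int = 64):
        super().__init__()
        self.image_size = image_size
        self.net = nn.Sequential(
            nn.Linear(num_controls, 256),
            nn.ReLU(),
            nn.Linear(256, 3 * image_size * image_size),
            nn.Sigmoid(),  # pixel values in [0, 1]
        )

    def forward(self, controls: torch.Tensor) -> torch.Tensor:
        x = self.net(controls)
        return x.view(-1, 3, self.image_size, self.image_size)


# Usage: one character frame per facial-movement control vector.
decoder = ControlToFaceDecoder()
frame = decoder(torch.rand(1, 4))  # (1, 3, 64, 64)
```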
  • Figure 3 is a flowchart provided to explain a method for generating a user simulation character image according to another embodiment of the present invention.
  • As shown, the image input unit 110 first receives the user's face image captured through a camera (S210), and the facial motion extraction unit 120 extracts the user's facial movement from the face image input in step S210 (S220).
  • The control information generation unit 130 generates facial movement control information using the facial movement extracted in step S220, applying the user's facial expression identified from that facial movement (S230).
  • The emotion extraction unit 140 extracts the user's emotional state from the facial movement extracted in step S220 (S240), and the emotion application unit 150 reflects the emotional state extracted in step S240 in the facial movement control information generated in step S230 (S250).
  • The character image generation unit 160 generates and outputs a character image with a moving face based on the facial movement control information generated in step S230 and adjusted in step S250 (S260).
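  • Tying steps S210 through S260 together, an end-to-end orchestration could look like the sketch below, which reuses the illustrative helpers above; `compute_expression_features`, `estimate_emotion_from_motion`, and `render_character` are placeholders for components the disclosure leaves unspecified.

```python
# End-to-end sketch of the Figure 3 flow (S210-S260), reusing the illustrative helpers above.
# compute_expression_features, estimate_emotion_from_motion, and render_character are
# placeholders for components the disclosure leaves unspecified.
def generate_character_video(video_path: str, calibration: dict) -> list:
    landmarks, motion = extract_facial_motion(video_path)                         # S210-S220
    frames_out = []
    for t in range(motion.shape[0]):
        features = compute_expression_features(landmarks[t + 1])                  # placeholder
        controls = to_control_info(interpret_expression(features, calibration))   # S230
        emotion = estimate_emotion_from_motion(motion[t])                         # S240 (placeholder)
        controls = apply_emotion(controls, emotion)                               # S250
        frames_out.append(render_character(controls))                             # S260 (placeholder)
    return frames_out
```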
  • Figure 4 is a diagram showing the configuration of a character creation system according to another embodiment of the present invention.
  • The character creation system according to this embodiment is the system shown in Figure 2 with a voice input unit 170 and a text conversion unit 180 added.
  • The voice input unit 170 receives the user's voice synchronized with the user's face image input to the image input unit 110.
  • The text conversion unit 180 is an STT (Speech To Text) module that converts the user's voice input through the voice input unit 170 into text.
  • The text generated by the text conversion unit 180 is transmitted to the control information generation unit 130 and used to generate facial movement control information. Specifically, when generating the facial movement control information, the control information generation unit 130 reflects the text pronounced by the user in the control information for mouth movement.
  • Because the mouth shape is influenced by the pronounced text, the mouth movement is created by referring to the text pronounced by the user in addition to the user's facial movement, which makes the mouth shape more accurate and natural.
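  • A hedged sketch of using the recognized text for the mouth-movement control information follows; the character-to-viseme mapping and the blending weight are illustrative assumptions.

```python
# Sketch of reflecting the recognized text in the mouth-movement control information.
# The character-to-viseme table and blending weight are illustrative assumptions.
VISEME_OPENNESS = {"a": 0.9, "o": 0.8, "u": 0.6, "e": 0.5, "i": 0.4, "m": 0.0, "b": 0.0, "p": 0.0}


def mouth_open_from_text(text: str) -> list:
    # One coarse mouth-openness value per pronounced letter (default for unlisted letters).
    return [VISEME_OPENNESS.get(ch, 0.3) for ch in text.lower() if ch.isalpha()]


def blend_mouth_controls(tracked_open: float, text_open: float, text_weight: float = 0.4) -> float:
    # Mouth control = tracked facial movement, corrected toward the pronounced text.
    return (1.0 - text_weight) * tracked_open + text_weight * text_open
```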
  • Furthermore, the user's voice input to the voice input unit 170 is referenced to extract the user's emotional state.
  • That is, the emotion extraction unit 140 extracts the user's emotional state by referring to the user's voice input through the voice input unit 170 in addition to the user's facial movement extracted by the facial motion extraction unit 120.
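  • One simple way to combine the two cues is a weighted fusion of per-modality emotion probabilities, as sketched below; both estimators and the fusion weight are assumptions for illustration.

```python
# Sketch of multimodal emotion extraction: per-emotion probabilities estimated from the
# facial movement and from the voice are fused by weighted averaging.
import numpy as np

EMOTIONS = ["neutral", "happy", "sad", "angry"]


def fuse_emotion(face_probs: np.ndarray, voice_probs: np.ndarray, face_weight: float = 0.6) -> str:
    probs = face_weight * face_probs + (1.0 - face_weight) * voice_probs
    return EMOTIONS[int(np.argmax(probs))]
```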
  • Figure 5 is a flowchart provided to explain a method for generating a user simulation character image according to another embodiment of the present invention.
  • As shown, the image input unit 110 first receives the user's face image captured through a camera (S310), and the facial motion extraction unit 120 extracts the user's facial movement from the face image input in step S310 (S320).
  • The voice input unit 170 receives the user's voice synchronized with the user's face image input in step S310 (S330), and the text conversion unit 180 converts the user's voice input in step S330 into text (S340).
  • The control information generation unit 130 generates facial movement control information using the facial movement extracted in step S320 and the text converted in step S340 (S350).
  • The emotion extraction unit 140 extracts the user's emotional state from the facial movement extracted in step S320 and the user's voice input in step S330 (S360), and the emotion application unit 150 reflects the emotional state extracted in step S360 in the facial movement control information generated in step S350 (S370).
  • The character image generation unit 160 generates and outputs a character image with a moving face based on the facial movement control information generated in step S350 and adjusted in step S370 (S380).
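  • A compact sketch of this extended flow (S310 through S380), reusing the illustrative helpers above, might look like the following; `speech_to_text` and `estimate_emotion_from_voice` are additional placeholders.

```python
# Compact sketch of the extended Figure 5 flow (S310-S380): the Figure 3 pipeline plus the
# synchronized voice. speech_to_text and estimate_emotion_from_voice are placeholders.
def generate_character_video_with_voice(video_path: str, audio, calibration: dict) -> list:
    landmarks, motion = extract_facial_motion(video_path)                         # S310-S320
    text = speech_to_text(audio)                                                  # S330-S340 (placeholder)
    text_mouth = mouth_open_from_text(text)
    frames_out = []
    for t in range(motion.shape[0]):
        features = compute_expression_features(landmarks[t + 1])                  # placeholder
        controls = to_control_info(interpret_expression(features, calibration))
        if text_mouth:
            # channel 0 is "mouth_open" in the illustrative channel ordering above
            controls[0] = blend_mouth_controls(controls[0],
                                               text_mouth[min(t, len(text_mouth) - 1)])  # S350
        emotion = fuse_emotion(estimate_emotion_from_motion(motion[t]),
                               estimate_emotion_from_voice(audio))                # S360 (placeholders)
        controls = apply_emotion(controls, emotion)                               # S370
        frames_out.append(render_character(controls))                             # S380 (placeholder)
    return frames_out
```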
  • By creating a virtual character that sufficiently reflects the characteristics of the actual user, this lays the foundation for using one's own digital twin character in digital worlds such as non-face-to-face environments, and by creating natural movements of the digital character, it can be used in digital content such as movies, games, and animations.
  • A computer-readable recording medium can be any data storage device that can be read by a computer and can store data.
  • For example, the computer-readable recording medium can be a ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical disk, or hard disk drive.
  • Computer-readable code or programs stored on a computer-readable recording medium may also be transmitted through a network connecting computers.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Child & Adolescent Psychology (AREA)
  • Signal Processing (AREA)
  • Psychiatry (AREA)
  • Hospice & Palliative Care (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Disclosed are a method and system for generating a facial movement of a 3D model to which a user's facial expression and emotional state are applied. A user-imitating character video generation method according to an embodiment of the present invention comprises extracting a facial movement from a user face video, generating facial movement control information from the extracted facial movement, applying the user's emotional state to the facial movement control information, and generating a character video in which the face moves on the basis of the facial movement control information. Accordingly, a character video with a more natural and richer facial expression can be generated by applying the user's facial characteristics and emotional states, and natural movement of a digital character can be created, which lays a foundation for using the digital character in digital content such as movies, games, and animations.
PCT/KR2023/017327 2022-11-11 2023-11-02 Method and system for generating a facial movement of a three-dimensional model to which the user's facial expression and emotional state are applied WO2024101769A1 (fr)

Applications Claiming Priority (2)

Application Number: KR10-2022-0150085 | Priority Date: 2022-11-11
Application Number: KR1020220150085A (published as KR20240068992A, ko) | Priority Date: 2022-11-11 | Filing Date: 2022-11-11 | Title: 사용자의 표정과 감정 상태를 적용한 3d 모델 얼굴 움직임 생성 방법 및 시스템 (Method and system for generating 3D-model facial movements to which the user's facial expression and emotional state are applied)

Publications (1)

Publication Number: WO2024101769A1 (fr) | Publication Date: 2024-05-16

Family

ID=91032769

Family Applications (1)

Application Number: PCT/KR2023/017327 (published as WO2024101769A1, fr)
Priority Date: 2022-11-11 | Filing Date: 2023-11-02
Title: Method and system for generating a facial movement of a three-dimensional model to which the user's facial expression and emotional state are applied

Country Status (2)

Country Link
KR (1) KR20240068992A (fr)
WO (1) WO2024101769A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20060091435A * 2005-02-15 2006-08-21 에스케이 텔레콤주식회사 Method and system for providing news information using a 3D character in a mobile communication network
KR20170062089A * 2015-11-27 2017-06-07 주식회사 매니아마인드 Method and program for implementing facial expressions of a 3D avatar
KR20190000087A * 2017-06-22 2019-01-02 전자부품연구원 Multimedia processing method and system utilizing facial expression recognition
KR20200053163A * 2018-11-08 2020-05-18 백으뜸 Apparatus and method for providing glasses-free virtual reality content
KR20220034396A * 2020-09-11 2022-03-18 주식회사 케이티 Apparatus, method, and computer program for generating a face image

Also Published As

Publication number Publication date
KR20240068992A (ko) 2024-05-20

Similar Documents

Publication Publication Date Title
CN115205949B Image generation method and related device
CN111429885B Method for mapping audio clips to key points of a human face's mouth shape
JP2021192222A Video interaction method and apparatus, electronic device, computer-readable storage medium, and computer program
WO2013141522A1 Karaoke and dance game
CN110488975A Artificial-intelligence-based data processing method and related apparatus
JP2014519082A5
WO2023080266A1 Face conversion method and apparatus using a deep learning network
WO2024101769A1 Method and system for generating a facial movement of a three-dimensional model to which the user's facial expression and emotional state are applied
CN117135331A Method and system for generating 3D digital human video
WO2021025279A1 System, method, and computer-readable storage medium for optimizing the expression of a virtual character through AI-based expression classification and retargeting
WO2023096275A1 Method and system for generating a text-based avatar
WO2022108275A1 Method and device for generating a virtual face using artificial intelligence
WO2021261687A1 Device and method for reconstructing a three-dimensional human shape and pose model from an image
WO2023239041A1 Creating images, meshes, and talking animations from mouth-shape data
CN117787956A Metaverse-based power inspection method, system, device, and medium
CN112002005A Cloud-based remote virtual collaborative hosting method
WO2023277421A1 Method for segmenting sign language into morphemes, method for predicting morpheme positions, and method for data augmentation
WO2022260385A1 Method and device for synthesizing a background and a face by considering the face shape and using a deep learning network
CN116129860A Automatic book narration method for metaverse virtual humans based on AI technology
KR100445846B1 Virtual speech simulator for treating social phobia
CN117119123A Method and system for generating digital human video from video material
CN115690280A Method for simulating pronunciation mouth shapes of a three-dimensional avatar
WO2024117616A1 System and method for providing a metaverse service using a digital human capable of real-time synchronization and interaction through camera and motion-capture recognition
WO2022131390A1 Three-dimensional human pose estimation method based on self-supervised learning using multi-view images
WO2024101485A1 Method and system for producing a moving-image hologram

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23889031

Country of ref document: EP

Kind code of ref document: A1