CN117726728A

CN117726728A - Avatar generation method, device, electronic equipment and storage medium

Info

Publication number: CN117726728A
Application number: CN202410033947.6A
Authority: CN
Inventors: 吴昊潜; 李林橙; 吴佳阳; 陈伟杰; 武蕴杰; 范长杰; 胡志鹏
Original assignee: Netease Hangzhou Network Co Ltd
Current assignee: Netease Hangzhou Network Co Ltd
Priority date: 2024-01-09
Filing date: 2024-01-09
Publication date: 2024-03-19

Abstract

The application provides an avatar generation method, an avatar generation device, electronic equipment and a storage medium, wherein the method comprises the following steps: acquiring a description text for creating an avatar; tracking an editing intention for the avatar based on the acquired descriptive text, and generating a reply text; and displaying a reply text and an avatar generated according to the editing intention, wherein the reply text is used for indicating the generated avatar to carry out editing response on the descriptive text. According to the interactive virtual image generation scheme, the operation of a user can be simplified, and interactive response can be timely made.

Description

Virtual image generation method, device, electronic equipment and storage medium

技术领域Technical field

本申请涉及计算机应用技术领域，尤其涉及一种虚拟形象生成方法、装置、电子设备及存储介质。The present application relates to the field of computer application technology, and in particular to a virtual image generation method, device, electronic equipment and storage medium.

背景技术Background technique

“捏脸”提供了虚拟人物形象定制功能，是角色扮演游戏、增强现实、元宇宙中的一项重要的模块。其允许用户调整虚拟角色的骨骼位置、妆容属性等角色参数，以满足用户的个性化定制需求，极大地增强用户的沉浸式体验。"Face pinching" provides the function of customizing virtual characters and is an important module in role-playing games, augmented reality, and the metaverse. It allows users to adjust the virtual character's skeletal position, makeup attributes and other character parameters to meet the user's personalized customization needs and greatly enhance the user's immersive experience.

然而，数以百计的角色参数虽然提供了极大的角色创建自由度，但同时也增大了用户的操作负担。目前，虚拟角色自动生成技术快速发展，允许用户可以通过简单的输入来生成虚拟角色，例如，输入图像、文本等，帮助用户节约时间，进一步提升可玩性。However, although hundreds of character parameters provide great freedom in character creation, they also increase the user's operational burden. Currently, virtual character automatic generation technology is developing rapidly, allowing users to generate virtual characters through simple input, such as inputting images, text, etc., helping users save time and further improve playability.

相关技术中的捏脸方案通常是非交互式的，需要用户对期望的角色形象进行一次性的完整描述，然后直接生成角色定制结果，但是完整描述角色形象，对用户的负担较高，并且当文字描述内容较长、较复杂时，容易出现生成结果与描述偏差过大的情况。Face pinching solutions in related technologies are usually non-interactive, requiring the user to fully describe the desired character image at one time, and then directly generate character customization results. However, fully describing the character image places a high burden on the user, and when text When the description content is long and complex, it is easy for the generated results to deviate too much from the description.

发明内容Contents of the invention

有鉴于此，本申请实施例至少提供一种虚拟形象生成方法、装置、电子设备及存储介质，以克服上述至少一种缺陷。In view of this, embodiments of the present application provide at least a virtual image generation method, device, electronic device, and storage medium to overcome at least one of the above-mentioned disadvantages.

第一方面，本申请示例性实施例提供一种虚拟形象生成方法，所述方法包括：获取用于创建虚拟形象的描述文本；基于所获取的描述文本，追踪针对虚拟形象的编辑意图，并生成回复文本；展示所述回复文本以及根据所述编辑意图所生成的虚拟形象，所述回复文本用于指示所生成的虚拟形象针对所述描述文本所进行的编辑响应。In a first aspect, exemplary embodiments of the present application provide a method for generating an avatar. The method includes: obtaining a description text for creating an avatar; based on the obtained description text, tracking the editing intention for the avatar, and generating Reply text; display the reply text and the avatar generated according to the editing intention, and the reply text is used to indicate the generated avatar's editing response to the description text.

第二方面，本申请实施例还提供一种虚拟形象生成装置，所述装置包括：获取模块，获取用于创建虚拟形象的描述文本；识别模块，基于所获取的描述文本，追踪针对虚拟形象的编辑意图，并生成回复文本；展示模块，展示所述回复文本以及根据所述编辑意图所生成的虚拟形象，所述回复文本用于指示所生成的虚拟形象针对所述描述文本所进行的编辑响应。In a second aspect, embodiments of the present application also provide a device for generating a virtual image. The device includes: an acquisition module to acquire description text for creating a virtual image; and an identification module to track the virtual image based on the acquired description text. Edit the intention and generate a reply text; a display module displays the reply text and the avatar generated according to the editing intention, and the reply text is used to indicate the generated avatar's editing response to the description text. .

第三方面，本申请实施例还提供一种电子设备，处理器、存储介质和总线，所述存储介质存储有所述处理器可执行的机器可读指令，当电子设备运行时，所述处理器与所述存储介质之间通过总线通信，所述处理器执行所述机器可读指令，以执行上述虚拟形象生成方法的步骤。In a third aspect, embodiments of the present application also provide an electronic device, a processor, a storage medium, and a bus. The storage medium stores machine-readable instructions executable by the processor. When the electronic device is running, the processing The processor communicates with the storage medium through a bus, and the processor executes the machine-readable instructions to perform the steps of the above virtual image generation method.

第四方面，本申请实施例还提供了一种计算机可读存储介质，所述计算机可读存储介质上存储有计算机程序，所述计算机程序被处理器运行时执行上述虚拟形象生成方法的步骤。In a fourth aspect, embodiments of the present application further provide a computer-readable storage medium. A computer program is stored on the computer-readable storage medium, and the computer program executes the steps of the above virtual image generation method when run by a processor.

本申请实施例提供的虚拟形象生成方法、装置、电子设备及存储介质，能够简化玩家的操作，以减轻用户的输入负担，并且通过及时的交互响应能够提高角色创建的处理效率，从而有助于为用户提供一更自然、更便捷、更准确的角色创建交互方案。The virtual image generation method, device, electronic device and storage medium provided by the embodiments of the present application can simplify the player's operation to reduce the user's input burden, and can improve the processing efficiency of character creation through timely interactive response, thus contributing to Provide users with a more natural, convenient, and accurate interaction solution for character creation.

为使本申请的上述目的、特征和优点能更明显易懂，下文特举较佳实施例，并配合所附附图，作详细说明如下。In order to make the above-mentioned objects, features and advantages of the present application more obvious and understandable, preferred embodiments are given below and described in detail with reference to the attached drawings.

附图说明Description of the drawings

为了更清楚地说明本申请实施例的技术方案，下面将对实施例中所需要使用的附图作简单地介绍，应当理解，以下附图仅示出了本申请的某些实施例，因此不应被看作是对范围的限定，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他相关的附图。In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required to be used in the embodiments will be briefly introduced below. It should be understood that the following drawings only show some embodiments of the present application and therefore do not It should be regarded as a limitation of the scope. For those of ordinary skill in the art, other relevant drawings can be obtained based on these drawings without exerting creative efforts.

图1示出本申请示例性实施例提供的虚拟形象生成方法的流程图；Figure 1 shows a flow chart of a virtual image generation method provided by an exemplary embodiment of the present application;

图2示出本申请示例性实施例提供的针对描述文本的语言识别步骤的流程图；Figure 2 shows a flowchart of language identification steps for description text provided by an exemplary embodiment of the present application;

图3示出本申请示例性实施例提供的多轮交互的流程图；Figure 3 shows a flow chart of multiple rounds of interactions provided by an exemplary embodiment of the present application;

图4示出本申请示例性实施例提供的训练相关性预测模型的步骤的流程图；Figure 4 shows a flow chart of the steps of training a correlation prediction model provided by an exemplary embodiment of the present application;

图5示出本申请示例性实施例提供的训练可微渲染器的步骤的流程图；Figure 5 shows a flow chart of the steps of training a differentiable renderer provided by an exemplary embodiment of the present application;

图6示出本申请示例性实施例提供的确定先验损失的步骤的流程图；Figure 6 shows a flow chart of the steps of determining a priori loss provided by an exemplary embodiment of the present application;

图7示出本申请示例性实施例提供的虚拟形象生成装置的结构示意图；Figure 7 shows a schematic structural diagram of a virtual image generation device provided by an exemplary embodiment of the present application;

图8示出本申请示例性实施例提供的电子设备的结构示意图。FIG. 8 shows a schematic structural diagram of an electronic device provided by an exemplary embodiment of the present application.

具体实施方式Detailed ways

为使本申请实施例的目的、技术方案和优点更加清楚，下面将结合本申请实施例中的附图，对本申请实施例中的技术方案进行清楚、完整地描述，应当理解，本申请中的附图仅起到说明和描述的目的，并不用于限定本申请的保护范围。另外，应当理解，示意性的附图并未按实物比例绘制。本申请中使用的流程图示出了根据本申请的一些实施例实现的操作。应当理解，流程图的操作可以不按顺序实现，没有逻辑的上下文关系的步骤可以反转顺序或者同时实施。此外，本领域技术人员在本申请内容的指引下，可以向流程图添加一个或多个其他操作，也可以从流程图中移除一个或多个操作。In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the drawings in the embodiments of the present application. It should be understood that the technical solutions in the embodiments of the present application The drawings are for illustration and description purposes only and are not intended to limit the scope of the present application. Additionally, it should be understood that the schematic drawings are not drawn to scale. The flowcharts used in this application illustrate operations implemented in accordance with some embodiments of the application. It should be understood that the operations of the flowchart may be implemented out of sequence, and steps without logical context may be implemented in reverse order or simultaneously. In addition, those skilled in the art can add one or more other operations to the flow chart, or remove one or more operations from the flow chart under the guidance of the content of this application.

本说明书中使用用语“一个”、“一”、“该”和“所述”用以表示存在一个或多个要素/组成部分/等；用语“包括”和“具有”用以表示开放式的包括在内的意思并且是指除了列出的要素/组成部分/等之外还可存在另外的要素/组成部分/等；用语“第一”和“第二”等仅作为标记使用，不是对其对象的数量限制。The terms "a", "an", "the" and "said" are used in this specification to indicate the existence of one or more elements/components/etc.; the terms "include" and "have" are used to indicate an open-ended Inclusive is intended and means that there may be additional elements/components/etc. in addition to the listed elements/components/etc.; the terms "first" and "second" etc. are used as labels only and do not refer to The number of its objects is limited.

应当理解，在本申请实施例中，“至少一个”是指一个或者多个，“多个”是指两个或者两个以上。“和/或”仅仅是一种描述关联对象的关联关系，表示可以存在三种关系，例如，A和/或B，可以表示：单独存在A，同时存在A和B，单独存在B这三种情况。字符“/”一般表示前后关联对象是一种“或”的关系。“包含A、B和/或C”是指包含A、B、C三者中任1个或任2个或3个。It should be understood that in the embodiments of this application, "at least one" refers to one or more, and "multiple" refers to two or more. "And/or" is just an association relationship that describes related objects. It means that there can be three kinds of relationships. For example, A and/or B can mean: A exists alone, A and B exist simultaneously, and B exists alone. Condition. The character "/" generally indicates that the related objects are in an "or" relationship. "Including A, B and/or C" means including any one, any two or three of A, B and C.

应当理解，在本申请实施例中，“与A对应的B”、“与A相对应的B”、“A与B相对应”或者“B与A相对应”，表示B与A相关联，根据A可以确定B。根据A确定B并不意味着仅仅根据A确定B，还可以根据A和/或其他信息确定B。It should be understood that in the embodiment of the present application, "B corresponding to A", "B corresponding to A", "A corresponding to B" or "B corresponding to A" means that B is associated with A, B can be determined based on A. Determining B based on A does not mean determining B only based on A, but can also determine B based on A and/or other information.

另外，所描述的实施例仅仅是本申请一部分实施例，而不是全部的实施例。通常在此处附图中描述和示出的本申请实施例的组件可以以各种不同的配置来布置和设计。因此，以下对在附图中提供的本申请的实施例的详细描述并非旨在限制要求保护的本申请的范围，而是仅仅表示本申请的选定实施例。基于本申请的实施例，本领域技术人员在没有做出创造性劳动的前提下所获得的全部其他实施例，都属于本申请保护的范围。In addition, the described embodiments are only some of the embodiments of the present application, rather than all of the embodiments. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a variety of different configurations. Accordingly, the following detailed description of the embodiments of the application provided in the appended drawings is not intended to limit the scope of the claimed application, but rather to represent selected embodiments of the application. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without any creative work fall within the scope of protection of this application.

然而，数以百计的角色参数虽然提供了极大的角色创建自由度，但同时也增大了用户的操作负担。目前，虚拟角色自动生成技术快速发展，允许用户可以通过简单的输入来生成虚拟角色，例如，基于输入的图像或者文本来生成虚拟角色，帮助用户节约时间，进一步提升可玩性。However, although hundreds of character parameters provide great freedom in character creation, they also increase the user's operational burden. Currently, virtual character automatic generation technology is developing rapidly, allowing users to generate virtual characters through simple input, for example, generating virtual characters based on input images or text, helping users save time and further improve playability.

相关技术中的捏脸方案通常是单轮的、非交互式的。例如，需要用户对期望的角色形象进行一次性的完整描述，然后直接生成角色定制结果，但是上述捏脸存在以下缺陷：Face pinching solutions in related technologies are usually single-round and non-interactive. For example, the user is required to give a complete description of the desired character image at one time, and then directly generate the character customization results. However, the above-mentioned face pinching has the following defects:

(1)一次性完整描述出角色形象，为用户带来较大的输入负担。对于用户对捏脸方案不熟悉的情况，用户无法一次性输入准确的描述内容，其可能需要用户反复尝试，以获得理想的生成结果。此外，若用户输入的不够完整、准确，可能会造成生成结果的偏差。(1) Completely describing the character image at one time brings a greater input burden to the user. For situations where the user is not familiar with the face pinching scheme, the user cannot enter an accurate description at one time, and may require the user to try repeatedly to obtain the ideal generation result. In addition, if the user's input is not complete and accurate enough, it may cause deviation in the generated results.

(2)在算法生成的角色形象同描述或者用户的期待存在偏差时，无法进一步调整，只能重新输入一长段完整的描述内容。(2) When the character image generated by the algorithm deviates from the description or the user's expectations, further adjustments cannot be made and only a long and complete description can be re-entered.

(3)当文字描述内容较长、较复杂时，也对算法带来极大的挑战，容易出现生成结果与描述偏差过大的情况。(3) When the text description content is long and complex, it also brings great challenges to the algorithm, and it is easy for the generated results to deviate too much from the description.

除此之外，相关技术的捏脸方案均存在运算时间过长，运算效果不鲁棒的问题，容易生成不符合人脸自然分布的异常结果。上述问题的出现，主要是由于相关方案中采取了低效的参数搜索方式，因而使得运算效率较低。此外，相关方案是在无约束空间下进行的参数生成，缺乏先验分布的约束，因此容易产生不符合人脸自然分布的异常结果。In addition, face pinching solutions in related technologies have problems such as too long computation time and non-robust computation effects, and are prone to generate abnormal results that do not conform to the natural distribution of faces. The above problems arise mainly due to the inefficient parameter search method adopted in the relevant scheme, which results in low computational efficiency. In addition, the related scheme generates parameters in an unconstrained space and lacks the constraints of a priori distribution, so it is easy to produce abnormal results that are not consistent with the natural distribution of faces.

针对上述至少一个方面的问题，本申请提出一种交互式的虚拟形象生成方案，以减轻用户的输入负担，有助于使得角色创建过程更自然、更便捷、更准确。In response to at least one of the above problems, this application proposes an interactive virtual image generation solution to reduce the user's input burden and help make the character creation process more natural, convenient, and accurate.

首先，对本申请实施例中涉及的名称进行介绍。First, the names involved in the embodiments of this application are introduced.

终端设备：Terminal Equipment:

本申请实施例中涉及的终端设备主要是指能够提供用户界面(User Interface)以实现人机交互的电子设备，在一示例性的应用场景中，终端设备可以用于提供游戏画面(如，游戏中的相关设置/配置界面、呈现游戏场景的界面)，并能够对虚拟角色进行控制操作的智能设备，终端设备可以包括但不限于以下设备中的任意一种：智能手机、平板电脑、便携式计算机、台式计算机、游戏机、个人数字助理(PDA)、电子书阅读器、MP4(MovingPicture Experts Group Audio Layer IV，动态影像专家压缩标准音频层面4)播放器等。该终端设备中安装和运行有支持游戏场景的应用程序，比如支持三维游戏场景的应用程序。该应用程序可以包括但不限于虚拟现实应用程序、三维地图程序、军事仿真程序、MOBA游戏(Multiplayer Online Battle Arena，多人在线战术竞技游戏)、多人枪战类生存游戏、第三人称射击游戏(TPS，Third-Personal Shooting Game)中的任意一种。可选地，该应用程序可以是单机版的应用程序，比如单机版的3D(Three Dimensions，三维立体图形)游戏程序，也可以是网络联机版的应用程序。The terminal device involved in the embodiments of this application mainly refers to an electronic device that can provide a user interface (User Interface) to realize human-computer interaction. In an exemplary application scenario, the terminal device can be used to provide game screens (such as games). Related settings/configuration interfaces, interfaces that present game scenes), and smart devices that can control and operate virtual characters. Terminal devices can include but are not limited to any of the following devices: smartphones, tablets, portable computers , desktop computers, game consoles, personal digital assistants (PDAs), e-book readers, MP4 (Moving Picture Experts Group Audio Layer IV, Moving Picture Experts Group Audio Layer IV) players, etc. Applications supporting game scenes, such as applications supporting three-dimensional game scenes, are installed and run on the terminal device. The application may include but is not limited to virtual reality applications, three-dimensional map programs, military simulation programs, MOBA games (Multiplayer Online Battle Arena, multiplayer online tactical competitive games), multiplayer gun battle survival games, third-person shooters (TPS) , Third-Personal Shooting Game). Optionally, the application may be a stand-alone version of the application, such as a stand-alone version of a 3D (Three Dimensions, three-dimensional graphics) game program, or may be a network online version of the application.

图形用户界面：GUI:

是一种人与计算机通信的界面显示格式，允许用户使用鼠标、键盘和/或游戏手柄等输入设备操纵屏幕上的图标、标识或菜单选项，也允许用户通过在触控终端的触控屏上执行触摸操作来操纵屏幕上的图标或菜单选项，以选择命令、启动程序或执行其它一些任务等。在游戏场景下，图形用户界面中可以显示游戏场景界面以及游戏配置界面。It is an interface display format for human-computer communication that allows users to use input devices such as mouse, keyboard and/or game controller to manipulate icons, logos or menu options on the screen. It also allows users to control icons, logos or menu options on the touch screen of a touch terminal. Touch to manipulate on-screen icons or menu options to select commands, launch programs, or perform other tasks. In the game scene, the game scene interface and the game configuration interface can be displayed in the graphical user interface.

虚拟场景：Virtual scene:

是应用程序在终端设备或服务器上运行时显示(或提供)的虚拟环境。可选地，该虚拟场景是对真实世界的仿真环境，或者是半仿真半虚构的虚拟环境，或者是纯虚构的虚拟环境。虚拟场景可以是二维虚拟环境、2.5维虚拟环境和三维虚拟环境中的任意一种，虚拟环境可以为天空、陆地、海洋等。其中，虚拟场景为用户控制虚拟角色完整游戏逻辑的场景，可选地，该虚拟场景还用于至少两个虚拟对象之间的虚拟环境对战，在该虚拟场景中具有可供至少两个虚拟角色使用的虚拟资源。It is the virtual environment displayed (or provided) when the application is running on the terminal device or server. Optionally, the virtual scene is a simulation environment of the real world, or a semi-simulation and semi-fictional virtual environment, or a purely fictitious virtual environment. The virtual scene can be any one of a two-dimensional virtual environment, a 2.5-dimensional virtual environment, and a three-dimensional virtual environment. The virtual environment can be the sky, land, ocean, etc. The virtual scene is a scene in which the user controls the complete game logic of the virtual character. Optionally, the virtual scene is also used for a virtual environment battle between at least two virtual objects. In the virtual scene, there is a space for at least two virtual characters to play. The virtual resource used.

虚拟角色：Virtual character:

是指在虚拟环境(如，游戏场景)中的虚拟角色，该虚拟角色可以是由玩家操控的虚拟角色，包括但不限于虚拟人物、虚拟动物、动漫人物中的至少一种，可以是非玩家操控的虚拟角色(NPC)，还可以是虚拟物体，如虚拟场景中的静态对象，例如，虚拟场景中的虚拟道具、虚拟任务、虚拟环境中的一个位置、地形、房屋、桥梁、植被等。静态对象往往不受玩家直接控制，但可以响应场景中虚拟角色的交互行为(如，攻击、拆除等)，而作出相应的表现，例如：虚拟角色可以对建筑物进行拆除、拾取、拖拽、建造等。可选地，虚拟物体也可以无法响应虚拟角色的交互行为，例如，虚拟物体同样可以为游戏场景中的建筑物、门、窗、植等，但是虚拟角色并不能与其进行交互，比如，虚拟角色不能对窗进行破坏或是拆除等。可选地，当虚拟环境为三维虚拟环境时，虚拟角色可以是三维虚拟模型，每个虚拟角色在三维虚拟环境中具有自身的形状和体积，占据三维虚拟环境中的一部分空间。可选地，虚拟角色是基于三维人体骨骼技术构建的三维角色，该虚拟角色通过穿戴不同的皮肤来实现不同的外在形象。在一些实现方式中，虚拟角色也可以采用2.5维或2维模型来实现，本申请实施例对此不加以限定。Refers to a virtual character in a virtual environment (such as a game scene). The virtual character can be a virtual character controlled by a player, including but not limited to at least one of a virtual character, a virtual animal, and an animation character. It can be a non-player controlled virtual character. A virtual character (NPC) can also be a virtual object, such as a static object in a virtual scene, for example, virtual props in the virtual scene, virtual tasks, a location in the virtual environment, terrain, houses, bridges, vegetation, etc. Static objects are often not directly controlled by the player, but can respond to the interactive behavior of the virtual character in the scene (such as attacking, demolishing, etc.) and make corresponding performances. For example, the virtual character can demolish, pick up, drag, and drop buildings. Construction etc. Optionally, virtual objects may also be unable to respond to the interactive behavior of virtual characters. For example, virtual objects may also be buildings, doors, windows, plants, etc. in the game scene, but virtual characters cannot interact with them. For example, virtual characters Windows cannot be damaged or dismantled. Optionally, when the virtual environment is a three-dimensional virtual environment, the virtual characters may be three-dimensional virtual models. Each virtual character has its own shape and volume in the three-dimensional virtual environment and occupies a part of the space in the three-dimensional virtual environment. Optionally, the virtual character is a three-dimensional character constructed based on three-dimensional human skeleton technology, and the virtual character achieves different external images by wearing different skins. In some implementations, the virtual character can also be implemented using a 2.5-dimensional or 2-dimensional model, which is not limited in the embodiments of the present application.

在虚拟场景中可以存在有多个虚拟角色，该虚拟角色是玩家操控的虚拟角色(即，玩家通过输入设备、触控屏进行控制的角色)、或者是通过训练设置在虚拟环境对战中的人工智能(Artificial Intelligence，AI)。可选地，该虚拟角色是在游戏场景中进行竞技的虚拟人物。可选地，该游戏场景对战中的虚拟角色的数量是预设设置的，或者是根据加入虚拟对战的终端设备的数量动态确定的，本申请实施例对此不作限定。在一种可能实现方式中，用户能够控制虚拟角色在该虚拟场景中进行移动，例如，控制拟角色跑动、跳动、爬行等，也能够控制虚拟角色使用应用程序所提供的技能、虚拟道具等与其他虚拟角色进行战斗。There may be multiple virtual characters in the virtual scene. The virtual characters are virtual characters controlled by players (that is, characters controlled by players through input devices and touch screens), or artificial characters set up in virtual environment battles through training. Artificial Intelligence (AI). Optionally, the virtual character is a virtual character competing in a game scene. Optionally, the number of virtual characters in the game scene battle is preset, or is dynamically determined based on the number of terminal devices participating in the virtual battle, which is not limited in the embodiments of the present application. In one possible implementation, the user can control the virtual character to move in the virtual scene, for example, control the virtual character to run, jump, crawl, etc., and can also control the virtual character to use skills, virtual props, etc. provided by the application. Fight with other virtual characters.

在一可选的实施方式中，终端设备可以为本地终端设备。以游戏为例，本地终端设备存储有游戏程序并用于呈现游戏画面。本地终端设备用于通过图形用户界面与玩家进行交互，即，常规的通过电子设备下载安装游戏程序并运行。该本地终端设备将图形用户界面提供给玩家的方式可以包括多种，例如，可以渲染显示在终端设备的显示屏上，或者，通过全息投影提供给玩家。举例而言，本地终端设备可以包括显示屏和处理器，该显示屏用于呈现图形用户界面，该图形用户界面包括游戏场景画面和游戏配置界面，该处理器用于运行该游戏、生成图形用户界面以及控制图形用户界面在显示屏上的显示。In an optional implementation, the terminal device may be a local terminal device. Taking games as an example, the local terminal device stores game programs and is used to present game screens. The local terminal device is used to interact with players through a graphical user interface, that is, conventionally downloading, installing and running game programs through electronic devices. The local terminal device may provide the graphical user interface to the player in a variety of ways. For example, it may be rendered and displayed on the display screen of the terminal device, or provided to the player through holographic projection. For example, the local terminal device may include a display screen and a processor. The display screen is used to present a graphical user interface. The graphical user interface includes a game scene screen and a game configuration interface. The processor is used to run the game and generate a graphical user interface. and controlling the display of the graphical user interface on the display screen.

对本申请可适用的应用场景进行介绍。本申请可应用于游戏技术领域，在游戏中，由参与游戏的多个玩家共同加入同一局虚拟对局中。Introduce the applicable application scenarios of this application. This application can be applied to the field of game technology. In the game, multiple players participating in the game join the same virtual game.

在进入虚拟对战之前，玩家可以为自己在虚拟对战中的虚拟角色选择不同的角色属性，例如，身份属性，通过分配不同的角色属性以确定不同阵营，使得玩家在本次虚拟对战的不同对局阶段通过执行游戏分配的任务，以赢得游戏比赛，例如，具有A角色属性的多个虚拟角色通过在对局阶段将具有B角色属性的虚拟角色进行“淘汰”，以获得游戏比赛的胜利。这里，还可以是在进入虚拟对战时，为参与本次虚拟对战的每个虚拟角色随机分配角色属性。Before entering the virtual battle, players can choose different character attributes for their virtual characters in the virtual battle, such as identity attributes. By assigning different character attributes to determine different camps, players can compete in different matches of this virtual battle. In the stage, the task assigned by the game is performed to win the game competition. For example, multiple virtual characters with the attributes of character A "eliminate" the virtual characters with the attributes of character B in the game stage to win the game competition. Here, when entering the virtual battle, character attributes can also be randomly assigned to each virtual character participating in this virtual battle.

本申请一个实施例提供的实施环境可以包括：第一终端设备、服务器和第二终端设备，第一终端设备和第二终端设备分别与服务器进行通信，以实现数据通信。在本实施方式中，第一终端设备和第二终端设备分别安装有执行本申请所提供的虚拟形象生成方法的应用程序，服务器为执行本申请所提供的虚拟形象生成方法的服务器端。通过应用程序，使得第一终端设备和第二终端设备分别可以与服务器进行通信。An implementation environment provided by an embodiment of the present application may include: a first terminal device, a server, and a second terminal device. The first terminal device and the second terminal device communicate with the server respectively to implement data communication. In this embodiment, the first terminal device and the second terminal device are each installed with an application program that executes the virtual image generation method provided by this application, and the server is a server that executes the virtual image generation method provided by this application. Through the application program, the first terminal device and the second terminal device can respectively communicate with the server.

以第一终端设备为例，第一终端设备通过运行应用程序与服务器建立通信。在一可选的实施方式中，服务器根据应用程序的游戏请求建立虚拟对战。其中，虚拟对战的参数可以根据接收到的游戏请求中的参数进行确定，例如，虚拟对战的参数可包括参与虚拟对战的人数，参与虚拟对战的角色级别等。当第一终端设备接收到游戏服务器的响应时，通过第一终端设备的图形用户界面显示虚拟对战所对应的游戏场景，第一终端设备为第一用户控制的设备，第一终端设备的图形用户界面中所显示的虚拟角色为该第一用户控制的玩家角色(即第一虚拟角色)，第一用户通过图形用户界面输入角色操作指令，以控制玩家角色在游戏场景中执行相应的操作。Taking the first terminal device as an example, the first terminal device establishes communication with the server by running an application program. In an optional implementation, the server establishes a virtual battle based on the application's game request. The parameters of the virtual battle can be determined according to the parameters in the received game request. For example, the parameters of the virtual battle can include the number of people participating in the virtual battle, the level of the characters participating in the virtual battle, etc. When the first terminal device receives a response from the game server, the game scene corresponding to the virtual battle is displayed through the graphical user interface of the first terminal device. The first terminal device is a device controlled by the first user. The graphical user of the first terminal device The virtual character displayed in the interface is the player character (ie, the first virtual character) controlled by the first user. The first user inputs character operation instructions through the graphical user interface to control the player character to perform corresponding operations in the game scene.

以第二终端设备为例，第二终端设备通过运行应用程序与服务器建立通信。在一可选的实施方式中，服务器根据应用程序的游戏请求建立虚拟对战。其中，虚拟对战的参数可以根据接收到的游戏请求中的参数进行确定，例如，虚拟对战的参数可包括参与虚拟对战的人数，参与虚拟对战的角色级别等。当第二终端设备接收到服务器的响应时，通过第二终端设备的图形用户界面显示虚拟对战所对应的游戏场景。第二终端设备为第二用户控制的设备，第二终端设备的图形用户界面中所显示的虚拟角色为该第二用户控制的玩家角色(即第二虚拟角色)，第二用户通过图形用户界面输入角色操作指令，以控制玩家角色在虚拟场景中执行相应的操作。Taking the second terminal device as an example, the second terminal device establishes communication with the server by running an application program. In an optional implementation, the server establishes a virtual battle based on the application's game request. The parameters of the virtual battle can be determined according to the parameters in the received game request. For example, the parameters of the virtual battle can include the number of people participating in the virtual battle, the level of the characters participating in the virtual battle, etc. When the second terminal device receives the response from the server, the game scene corresponding to the virtual battle is displayed through the graphical user interface of the second terminal device. The second terminal device is a device controlled by the second user, and the virtual character displayed in the graphical user interface of the second terminal device is the player character (ie, the second virtual character) controlled by the second user. The second user uses the graphical user interface to Input character operation instructions to control the player character to perform corresponding operations in the virtual scene.

服务器根据通过接收第一终端设备和第二终端设备上报的游戏数据，进行数据计算，并将计算后的游戏数据同步至第一终端设备和第二终端设备，使得第一终端设备和第二终端设备根据游戏服务器下发的同步数据控制图形用户界面渲染出对应的游戏场景和/或虚拟角色。The server performs data calculation based on the game data reported by receiving the first terminal device and the second terminal device, and synchronizes the calculated game data to the first terminal device and the second terminal device, so that the first terminal device and the second terminal device The device controls the graphical user interface to render corresponding game scenes and/or virtual characters based on the synchronized data sent by the game server.

在本实施方式中，第一终端设备控制的第一虚拟角色和第二终端设备控制的第二虚拟角色为处于同一局虚拟对战中的虚拟角色。其中，第一终端设备控制的第一虚拟角色和第二终端设备控制的第二虚拟角色可以具有相同的角色属性，也可以为具有不同的角色属性，第一终端设备控制的第一虚拟角色和第二终端设备控制的第二虚拟角色可以属于相同的阵营，也可以属于不同的阵营。In this embodiment, the first virtual character controlled by the first terminal device and the second virtual character controlled by the second terminal device are virtual characters in the same virtual battle. Wherein, the first virtual character controlled by the first terminal device and the second virtual character controlled by the second terminal device may have the same character attributes, or may have different character attributes. The first virtual character controlled by the first terminal device and the second virtual character controlled by the second terminal device may have the same character attributes. The second virtual character controlled by the second terminal device may belong to the same camp or different camps.

需要说明的是，虚拟对战中，可以包括两个或是两个以上的虚拟角色，不同的虚拟角色可以分别对应不同的终端设备，也就是说，在虚拟对战中，存在两个以上的终端设备分别与游戏服务器进行游戏数据的发送和同步。It should be noted that a virtual battle can include two or more virtual characters, and different virtual characters can correspond to different terminal devices. That is to say, in a virtual battle, there are more than two terminal devices. Send and synchronize game data with the game server respectively.

本申请实施例提供的虚拟形象生成方法可以应用于虚拟现实应用程序、三维地图程序、军事仿真程序、多人在线战术竞技游戏(MOBA)、多人枪战类生存游戏、第三人称对战游戏、第一人称对战游戏中的任一种。The virtual image generation method provided by the embodiment of the present application can be applied to virtual reality applications, three-dimensional map programs, military simulation programs, multiplayer online tactical competitive games (MOBA), multiplayer gunfight survival games, third-person battle games, first-person Any kind of battle game.

在本申请其中一种实施例中的虚拟形象生成方法可以运行于本地终端设备或者是服务器。当该方法运行于服务器时，该方法则可以基于云交互系统来实现与执行，其中，云交互系统包括服务器和客户端设备。The virtual image generation method in one embodiment of the present application can be run on a local terminal device or a server. When the method is run on the server, the method can be implemented and executed based on a cloud interaction system, where the cloud interaction system includes a server and a client device.

在一可选的实施方式中，云交互系统下可以运行各种云应用，例如：云游戏。以云游戏为例，云游戏是指以云计算为基础的游戏方式。在云游戏的运行模式下，游戏程序的运行主体和游戏画面呈现主体是分离的，该信息展示方法的储存与运行是在云游戏服务器上完成的，客户端设备的作用用于数据的接收、发送以及游戏画面的呈现，举例而言，客户端设备可以是靠近用户侧的具有数据传输功能的显示设备，如，移动终端、电视机、计算机、掌上电脑等；但是进行信息处理的为云端的云游戏服务器。在进行游戏时，玩家操作客户端设备向云游戏服务器发送操作指令，云游戏服务器根据操作指令运行游戏，将游戏画面等数据进行编码压缩，通过网络返回客户端设备，最后，通过客户端设备进行解码并输出游戏画面。In an optional implementation, various cloud applications, such as cloud games, can be run under the cloud interactive system. Take cloud gaming as an example. Cloud gaming refers to a gaming method based on cloud computing. In the running mode of cloud games, the running body of the game program and the game screen presentation body are separated. The storage and operation of the information display method are completed on the cloud game server. The role of the client device is used to receive data, transmission and presentation of the game screen. For example, the client device can be a display device with data transmission function close to the user side, such as a mobile terminal, a television, a computer, a handheld computer, etc.; however, the information processing is done in the cloud. Cloud gaming server. When playing a game, the player operates the client device to send operating instructions to the cloud game server. The cloud game server runs the game according to the operating instructions, encodes and compresses the game screen and other data, and returns it to the client device through the network. Finally, the cloud game server performs operations through the client device. Decode and output game screen.

在一可选的实施方式中，以游戏为例，本地终端设备存储有游戏程序并用于呈现游戏画面。本地终端设备用于通过图形用户界面与玩家进行交互，即，常规的通过电子设备下载安装游戏程序并运行。该本地终端设备将图形用户界面提供给玩家的方式可以包括多种，例如，可以渲染显示在终端的显示屏上，或者，通过全息投影提供给玩家。举例而言，本地终端设备可以包括显示屏和处理器，该显示屏用于呈现图形用户界面，该图形用户界面包括游戏画面，该处理器用于运行该游戏、生成图形用户界面以及控制图形用户界面在显示屏上的显示。In an optional implementation, taking a game as an example, the local terminal device stores the game program and is used to present the game screen. The local terminal device is used to interact with players through a graphical user interface, that is, conventionally downloading, installing and running game programs through electronic devices. The local terminal device may provide the graphical user interface to the player in a variety of ways. For example, it may be rendered and displayed on the display screen of the terminal, or provided to the player through holographic projection. For example, the local terminal device may include a display screen and a processor. The display screen is used to present a graphical user interface. The graphical user interface includes a game screen. The processor is used to run the game, generate the graphical user interface, and control the graphical user interface. displayed on the display screen.

在一种可能的实施方式中，本发明实施例提供了一种虚拟形象生成方法，通过终端设备提供图形用户界面，其中，终端设备可以是前述提到的本地终端设备，也可以是前述提到的云交互系统中的客户端设备。In a possible implementation, an embodiment of the present invention provides a virtual image generation method that provides a graphical user interface through a terminal device, where the terminal device may be the aforementioned local terminal device or the aforementioned Client devices in cloud interactive systems.

为便于对本申请进行理解，下面对本申请实施例提供的虚拟形象生成方法、装置、电子设备及存储介质进行详细介绍。In order to facilitate understanding of the present application, the virtual image generation method, device, electronic device and storage medium provided by the embodiments of the present application are introduced in detail below.

请参阅图1，为本申请示例性实施例提供的虚拟形象生成方法的流程图，其通常被应用于游戏服务器中，例如，前述描述的云游戏服务器，但本申请不限于此。Please refer to FIG. 1 , which is a flow chart of an avatar generation method provided in an exemplary embodiment of the present application. It is usually applied in a game server, such as the cloud game server described above, but the present application is not limited thereto.

在本申请示例性实施例的虚拟形象创建过程中，可以包括至少一轮交互，示例性的，每轮交互可以包括但不限于：获取本轮交互的用于创建虚拟形象的描述文本，然后展示本轮交互的结果，如，展示针对本轮交互的描述文本所产生的回复文本和虚拟形象。用户可以基于本轮交互所展示的虚拟形象和回复文本，选择结束交互，或者执行下一轮交互的描述文本的输入，再针对下一轮输入的描述文本进行虚拟形象和回复文本的展示，如此重复，直至交互结束，形成最终创建的虚拟形象。通过上述交互式的虚拟形象创建方案，允许用户以多轮自然语言对话的方式完成角色的创建和修改，为用户提供更好的角色形象创建体验。In the process of creating a virtual image in the exemplary embodiment of the present application, at least one round of interaction may be included. Exemplarily, each round of interaction may include but is not limited to: obtaining the description text used to create the virtual image in this round of interaction, and then displaying The results of this round of interaction, such as displaying the reply text and avatar generated by the description text of this round of interaction. The user can choose to end the interaction based on the avatar and reply text displayed in this round of interaction, or input the description text of the next round of interaction, and then display the avatar and reply text based on the description text input in the next round, so Repeat until the interaction is complete, resulting in the final created avatar. Through the above interactive virtual image creation solution, users are allowed to complete the creation and modification of characters through multiple rounds of natural language dialogue, providing users with a better character image creation experience.

下面参照图1来对用于创建虚拟形象的多轮交互中的一轮交互流程进行阐述，具体如下：The following describes one of the multiple rounds of interactions used to create an avatar with reference to Figure 1. The details are as follows:

步骤S101：获取用于创建虚拟形象的描述文本。Step S101: Obtain the description text used to create the avatar.

在本申请实施例中，可以通过多种途径来获取到上述描述文本，本申请对此不做限制。In the embodiment of the present application, the above description text can be obtained through multiple ways, and the present application does not limit this.

在本申请一优选示例中，可以通过在虚拟形象编辑界面上执行的输入来获取到描述文本。In a preferred example of the present application, the description text can be obtained through input performed on the avatar editing interface.

例如，可以通过终端设备来提供图形用户界面，在图形用户界面上展示游戏的虚拟形象编辑界面，该虚拟形象编辑界面可以用于创建和/或调整虚拟形象。For example, a graphical user interface can be provided through a terminal device, and an avatar editing interface of the game is displayed on the graphical user interface. The avatar editing interface can be used to create and/or adjust the avatar.

示例性的，上述虚拟形象编辑界面可以包括文本输入区域，此时，基于用户在文本输入区域内的输入信息，获取用于创建虚拟形象的描述文本。For example, the above-mentioned avatar editing interface may include a text input area. At this time, based on the user's input information in the text input area, the description text for creating the avatar is obtained.

应理解，可以通过在虚拟形象编辑界面的各种输入方式来获取到上述描述文本，本申请对此不做限制。例如，还可以基于虚拟形象编辑界面上的语音输入控件来采集用户输入的语音数据，通过对所采集的语音数据进行识别，以获得与语音数据对应的文本内容，将该文本内容确定为描述文本。It should be understood that the above description text can be obtained through various input methods on the avatar editing interface, and this application does not limit this. For example, the voice data input by the user can also be collected based on the voice input control on the avatar editing interface, and the text content corresponding to the voice data can be obtained by identifying the collected voice data, and the text content can be determined as the description text. .

除上述描述文本的获取方式之外，还可以从其他终端设备接收描述文本，或者从网络上获取描述文本。In addition to the above methods of obtaining description text, description text can also be received from other terminal devices, or description text can be obtained from the network.

在本申请实施例中，描述文本可以是用于描述虚拟形象的整体特征或者局部特征的文本，也可以是针对虚拟形象的模糊或者精确的描述。示例性的，上述描述文本可以是用于描述虚拟形象的外观特征的一段短文本，例如，一个自然语句，以减轻用户的输入负担。In this embodiment of the present application, the description text may be text used to describe the overall features or local features of the avatar, or it may be a vague or precise description of the avatar. For example, the above description text may be a short text used to describe the appearance characteristics of the avatar, for example, a natural sentence to reduce the user's input burden.

这里，虚拟形象可以指被呈现在虚拟场景中的形像。从虚拟形象的结构来说，虚拟形象可以是三维模型的形象，也可以是平面图像的形象。从虚拟形象的类型来说，虚拟形象可以是模拟人物形象来形成的虚拟形象，也可以是模拟动物形象来形成的虚拟形象，还可以是基于卡通、漫画中的形象来形成的虚拟形象。Here, the virtual image may refer to the image presented in the virtual scene. From the perspective of the structure of the virtual image, the virtual image can be the image of a three-dimensional model or a flat image. In terms of the type of avatar, an avatar can be a virtual image formed by simulating a human image, a virtual image formed by simulating an animal image, or a virtual image formed based on images in cartoons and comics.

示例性的，虚拟形象一般为虚拟角色形象，虚拟角色可以为人物、动物等。所创建的虚拟形象可以是虚拟角色的整体形象，也可以是虚拟角色的局部形象，例如，该虚拟形象可以指虚拟角色的面部图像。For example, the virtual image is generally a virtual character image, and the virtual character can be a person, an animal, etc. The created virtual image can be the overall image of the virtual character or a partial image of the virtual character. For example, the virtual image can refer to the facial image of the virtual character.

步骤S102：基于所获取的描述文本，追踪针对虚拟形象的编辑意图，并生成回复文本。Step S102: Based on the obtained description text, track the editing intention for the avatar and generate a reply text.

这里，可以通过对描述文本的语言识别，来确定其对应的编辑意图以及回复文本。示例性的，可以预先训练一大型语言模型(LLM，Large Language Model)，将描述文本作为LLM的输入，以获得与该描述文本对应的编辑意图和回复文本。Here, the corresponding editing intention and reply text can be determined through language recognition of the description text. For example, a large language model (LLM, Large Language Model) can be pre-trained, and the description text is used as the input of the LLM to obtain the editing intention and reply text corresponding to the description text.

或者，也可以预先训练另一大型语言模型，将描述文本作为该大型语言模型的输入，以获得与该描述文本对应的编辑意图，然后再基于所确定的编辑意图生成对应的回复文本。Alternatively, another large-scale language model can be pre-trained, and the description text can be used as the input of the large-scale language model to obtain the editing intention corresponding to the description text, and then the corresponding reply text can be generated based on the determined editing intention.

上述回复文本与所识别出的编辑意图存在关联，或者说，通过回复文本能够反映出虚拟形象生成算法针对描述文本的意图理解。The above reply text is related to the identified editing intention, or in other words, the reply text can reflect the avatar generation algorithm's understanding of the intention of the description text.

在本申请一优选实施例中，在虚拟形象创建系统中引入了大语言模型，并为它设计了一种记忆力机制，即，形成带记忆力机制的大语言模型，上述带记忆力机制的大语言模型能够对用户的编辑过程进行追踪，从而对复杂的对话过程实现更准确的理解和解析。In a preferred embodiment of the present application, a large language model is introduced into the virtual image creation system, and a memory mechanism is designed for it, that is, a large language model with a memory mechanism is formed. The above large language model with a memory mechanism is It can track the user's editing process to achieve a more accurate understanding and analysis of complex dialogue processes.

下面结合图2来介绍基于带记忆力机制的大语言模型来进行意图理解的过程。The following is an introduction to the process of intent understanding based on a large language model with a memory mechanism in conjunction with Figure 2.

图2示出本申请示例性实施例提供的针对描述文本的语言识别步骤的流程图。FIG. 2 shows a flowchart of language identification steps for description text provided by an exemplary embodiment of the present application.

如图2所示，在步骤S201中，提取用于创建虚拟形象的交互编辑记忆。As shown in Figure 2, in step S201, the interactive editing memory used to create the avatar is extracted.

这里，上述交互编辑记忆用于指示针对虚拟形象的创建过程的交互历史，即，在本轮交互之前的历史交互轮次中对虚拟形象的编辑历史。针对首轮交互的情况，此时步骤S201中所提取的交互编辑记忆为空，在此情况下，仅基于所获取的描述文本来确定编辑意图和回复文本。Here, the above-mentioned interactive editing memory is used to indicate the interaction history for the creation process of the avatar, that is, the editing history of the avatar in the historical interaction rounds before the current round of interaction. For the first round of interaction, the interactive editing memory extracted in step S201 is empty. In this case, the editing intention and reply text are determined based only on the obtained description text.

在第一实施例中，上述交互编辑记忆可以包括历史对话。In a first embodiment, the above-mentioned interactive editing memory may include historical conversations.

示例性的，历史对话包括至少一个对话组，这里，一轮交互可以对应产生一个对话组，每个对话组可以包括但不限于一轮交互中的描述文本及其对应的回复文本。For example, the historical dialogue includes at least one dialogue group. Here, one round of interaction may correspond to one dialogue group. Each dialogue group may include, but is not limited to, the description text in the round of interaction and its corresponding reply text.

在第二实施例中，上述交互编辑记忆可以包括编辑属性以及编辑属性对应的相关编辑强度。In a second embodiment, the interactive editing memory may include editing attributes and related editing strengths corresponding to the editing attributes.

示例性的，编辑属性包括多个角色形象参数中处于编辑状态的预设角色形象参数，这里，可以针对虚拟形象预先定义用于表征其外在形象的多个角色形象参数，例如，可以包括但不限于用于表征脸型、体型、服饰、发色、发型、肤色等类型的角色形象参数，在上述任一类型参数下，还可以包括多个角色形象参数，作为示例，对于脸型而言，可以包括但不限于圆脸、方脸、长脸等角色形象参数。For example, the editing attributes include the preset character image parameters in the editing state among the multiple character image parameters. Here, multiple character image parameters for characterizing the external image of the virtual image can be predefined for the virtual image. For example, it can include but It is not limited to character image parameters used to characterize face shape, body shape, clothing, hair color, hairstyle, skin color, etc. Under any of the above types of parameters, multiple character image parameters can also be included. As an example, for face shape, it can Including but not limited to character image parameters such as round face, square face, long face, etc.

这里，处于编辑状态可以指在创建虚拟形象的过程中编辑过的角色形象参数，例如，在本轮交互之前的历史交互轮次中涉及到的角色形象参数，即，在历史交互轮次中与编辑意图相关联的角色形象参数。这里，每轮交互中所识别出的编辑意图用于表征该轮交互的编辑对象，该编辑对象可以指在本轮交互中需调整的角色形象参数。Here, being in an editing state may refer to the character image parameters that have been edited in the process of creating the avatar. For example, the character image parameters involved in the historical interaction rounds before this round of interaction, that is, in the historical interaction rounds with Edit the character image parameters associated with the intent. Here, the editing intention identified in each round of interaction is used to characterize the editing object of this round of interaction, and the editing object may refer to the character image parameters that need to be adjusted in this round of interaction.

示例性的，相关编辑强度用于表征预设角色形象参数与描述文本之间的关联程度，或者说，相关编辑强度用于表征预设角色形象参数与所识别的编辑意图之间的关联程度。For example, the correlation editing strength is used to characterize the degree of correlation between the preset character image parameters and the description text, or in other words, the correlation editing strength is used to characterize the correlation degree between the preset character image parameters and the identified editing intention.

在第三实施例中，上述交互编辑记忆可以同时包括历史对话、编辑属性以及编辑属性对应的相关编辑强度。In a third embodiment, the above-mentioned interactive editing memory may simultaneously include historical conversations, editing attributes, and related editing intensity corresponding to the editing attributes.

在步骤S202中，基于描述文本和交互编辑记忆，获得针对虚拟形象的编辑意图和回复文本。In step S202, based on the description text and interactive editing memory, the editing intention and reply text for the avatar are obtained.

示例性的，可以将用于创建虚拟形象的交互编辑记忆和描述文本输入到预先训练的大型语言模型，以获得编辑意图和回复文本。For example, the interactive editing memory and description text used to create the avatar can be input into a pre-trained large language model to obtain the editing intention and reply text.

针对上述交互编辑记忆的第一实施例，可以从对话记忆结构中提取历史对话。Regarding the above first embodiment of interactive editing memory, historical dialogues can be extracted from the dialogue memory structure.

例如，对话记忆结构中存储有在本轮交互之前的历史交互轮次中的对话组，对话记忆结构的初始状态为空，随着交互轮次的进行，对对话记忆结构中所存储的内容进行更新。For example, the dialogue memory structure stores dialogue groups in historical interaction rounds before this round of interaction. The initial state of the dialogue memory structure is empty. As the interaction rounds proceed, the content stored in the dialogue memory structure is processed. renew.

具体的，可以通过以下方式更新对话记忆结构：每当一轮交互结束，将该轮交互对应的描述文本及其对应的回复文本添加到对话记忆结构中，以进行存储。Specifically, the dialogue memory structure can be updated in the following way: whenever a round of interaction ends, the description text corresponding to the round of interaction and its corresponding reply text are added to the dialogue memory structure for storage.

在本申请一优选实施例中，可以在获得针对虚拟形象的编辑意图和回复文本的同时，还获得相关编辑强度，这里所获得的相关编辑强度用于表征目标角色形象参数与描述文本之间的关联程度，该目标角色形象参数是根据本轮交互中所识别的编辑意图来确定的，即，与本轮交互的编辑意图相关联的角色形象参数。In a preferred embodiment of the present application, while obtaining the editing intention and reply text for the avatar, the related editing strength can also be obtained. The related editing strength obtained here is used to characterize the relationship between the target character image parameters and the description text. The degree of association, the target character image parameter is determined according to the editing intention recognized in this round of interaction, that is, the character image parameter associated with the editing intention of this round of interaction.

示例性的，可以将用于创建虚拟形象的交互编辑记忆和描述文本输入到预先训练的大型语言模型，以获得编辑意图、相关编辑强度和回复文本。Illustratively, the interactive editing memory and description text used to create the avatar can be input into a pre-trained large language model to obtain editing intent, related editing intensity, and reply text.

这里，可以通过各种方式来对上述各实施例中所涉及的大语言模型进行训练，本申请对此部分内容不再赘述。Here, the large language model involved in the above embodiments can be trained in various ways, and this part will not be described again in this application.

针对上述交互编辑记忆的第二实施例，可以从编辑状态记忆结构中提取编辑属性以及编辑属性对应的相关编辑强度。Regarding the second embodiment of the above-mentioned interactive editing memory, the editing attributes and the related editing intensity corresponding to the editing attributes can be extracted from the editing state memory structure.

例如，编辑状态记忆结构中存储有在本轮交互之前的历史交互轮次中涉及到的编辑属性及其对应的相关编辑强度，即，历史交互轮次中编辑过的角色形象参数及其对应的相关编辑强度，编辑状态记忆结构的初始状态为空，随着交互轮次的进行，对编辑状态记忆结构中所存储的内容进行更新。For example, the editing state memory structure stores the editing attributes involved in the historical interaction rounds before this round of interaction and their corresponding related editing strengths, that is, the character image parameters edited in the historical interaction rounds and their corresponding Relevant to the editing intensity, the initial state of the editing state memory structure is empty. As the interaction round proceeds, the content stored in the editing state memory structure is updated.

具体的，可以通过以下方式更新编辑状态记忆结构：每当一轮交互结束，搜索编辑状态记忆结构中是否存在与该轮交互对应的编辑意图，即，搜索是否存在与本轮交互的编辑意图对应的编辑属性。若存在，则对搜索到的编辑意图对应的相关编辑强度进行更新，示例性的，上述更新可指增加相关编辑强度的数值，以体现关联程度变高，若不存在，则将该轮交互对应的编辑意图以及相关编辑强度添加到编辑状态记忆结构中。Specifically, the editing state memory structure can be updated in the following way: whenever a round of interaction ends, search the editing state memory structure to see whether there is an editing intention corresponding to the round of interaction, that is, search whether there is an editing intention corresponding to the round of interaction. Editing properties. If it exists, the relevant editing intensity corresponding to the searched editing intention is updated. For example, the above update can refer to increasing the value of the relevant editing intensity to reflect the higher degree of correlation. If it does not exist, then the corresponding round of interaction is The editing intention and related editing strength are added to the editing state memory structure.

返回图1，步骤S103：展示回复文本以及根据编辑意图所生成的虚拟形象。Returning to Figure 1, step S103: display the reply text and the avatar generated according to the editing intention.

这里，回复文本用于指示所生成的虚拟形象针对描述文本所进行的编辑响应，示例性的，该编辑响应用以指示在本轮交互中针对虚拟形象所进行的创建或者调整的描述。换言之，本轮交互中针对虚拟形象的创建或者调整，与回复文本中所记载的内容是一致的，以使期望创建虚拟形象的用户能够及时获知算法的创建、调整过程。此外，也可以有助于在一定程度上引导用户在下一轮交互中针对描述文本的输入，从而提高虚拟形象的处理效率。Here, the reply text is used to indicate the editing response of the generated avatar to the description text. For example, the editing response is used to indicate the description of the creation or adjustment of the avatar in this round of interaction. In other words, the creation or adjustment of the avatar in this round of interaction is consistent with the content recorded in the reply text, so that users who want to create an avatar can be informed of the creation and adjustment process of the algorithm in a timely manner. In addition, it can also help guide the user to input description text in the next round of interaction to a certain extent, thereby improving the processing efficiency of the avatar.

针对上述通过终端设备显示虚拟形象编辑界面的情况，上述虚拟形象编辑界面除包括文本输入区域之外，可以还包括形象展示区域和系统响应区域。Regarding the above situation of displaying the virtual image editing interface through the terminal device, the above virtual image editing interface may include an image display area and a system response area in addition to a text input area.

示例性的，可以在形象展示区域内呈现根据编辑意图生成的虚拟形象，在系统响应区域内展示回复文本。在一优选示例中，在首轮交互输入描述文本时，在该虚拟形象编辑界面上可以仅提供文本输入区域，用于接收描述文本，在虚拟形象编辑界面上不显示出形象展示区域和系统响应区域，或者，也可以在首轮交互提供文本输入区域的同时，显示出形象展示区域和系统响应区域，但在上述两个区域内不显示内容。For example, the virtual image generated according to the editing intention can be presented in the image display area, and the reply text can be displayed in the system response area. In a preferred example, during the first round of interactive input of description text, only a text input area can be provided on the avatar editing interface for receiving the description text, and the image display area and system response are not displayed on the avatar editing interface. area, or, while providing a text input area for the first round of interaction, the image display area and the system response area can be displayed, but no content will be displayed in the above two areas.

这里，在文本输入区域、形象展示区域和系统响应区域内可以展示本轮交互相关的描述文本、虚拟形象和回复文本。优选地，在虚拟形象编辑界面上还可以显示有历史输入查看控件、历史形象查看控件、历史编辑查看控件，示例性的，历史输入查看控件可以显示在与文本输入区域关联的位置处，响应于针对历史输入查看控件的操作，在文本输入区域内展示在历史轮次中获取到的描述文本，也可以在文本输入区域内同时展示本轮交互以及历史轮次中获取到的描述文本。Here, the description text, avatar and reply text related to this round of interaction can be displayed in the text input area, image display area and system response area. Preferably, a historical input viewing control, a historical image viewing control, and a historical editing viewing control can also be displayed on the avatar editing interface. For example, the historical input viewing control can be displayed at a position associated with the text input area, in response to For the operation of the historical input viewing control, the description text obtained in the historical rounds is displayed in the text input area, and the description text obtained in the current round of interaction and the historical rounds can also be displayed in the text input area at the same time.

相应地，历史形象查看控件可以显示在与形象展示区域关联的位置处，历史编辑查看控件可以显示在与系统响应区域关联的位置处，这里，针对历史形象查看控件、历史编辑查看控件的操作与针对历史输入查看控件的操作相似，本申请对此部分内容不再赘述。Correspondingly, the historical image viewing control can be displayed at a location associated with the image display area, and the historical editing viewing control can be displayed at a location associated with the system response area. Here, the operations for the historical image viewing control and the historical editing viewing control are as follows: The operation of the historical input viewing control is similar, and this part will not be described again in this application.

下面结合图3来介绍创建虚拟形象的多轮交互过程。The multi-round interaction process of creating a virtual image is introduced below with reference to Figure 3.

图3示出本申请示例性实施例提供的多轮交互的流程图。Figure 3 shows a flow chart of multiple rounds of interaction provided by an exemplary embodiment of the present application.

如图3所示，在步骤S301中，获取用于创建虚拟形象的描述文本。As shown in Figure 3, in step S301, description text for creating an avatar is obtained.

下面通过一非限制性示例来阐述多轮交互过程。示例性的，用户在虚拟形象编辑界面输入的本轮交互的描述文本为“The girl doesn’t looks cute enough.You shouldmake some modifications.”这里，本申请实施例对于描述文本的语种不做限制。The multi-round interaction process is explained below through a non-limiting example. For example, the description text for this round of interaction input by the user on the avatar editing interface is “The girl doesn’t looks cute enough. You should make some modifications.” Here, the embodiment of the present application does not limit the language of the description text.

在步骤S302中，基于描述文本和交互编辑记忆，获得编辑意图、相关编辑强度和回复文本。In step S302, based on the description text and interactive editing memory, the editing intention, related editing intensity and reply text are obtained.

针对本轮交互为非首轮交互的情况，示例性的，对话记忆结构中可以存储有如下内容：For the situation where this round of interaction is not the first round of interaction, for example, the following content can be stored in the dialogue memory structure:

‘user’：Make the skin fair.‘user’: Make the skin fair.

‘system’：……‘system’:…

………

上述列举的为对话记忆结构中存储的历史对话，其中‘user’指对应交互轮次中的描述文本，‘system’指对应交互轮次中的回复文本。The above list is the historical dialogue stored in the dialogue memory structure, where 'user' refers to the description text in the corresponding interaction round, and 'system' refers to the reply text in the corresponding interaction round.

示例性的，编辑状态记忆结构中可以存储有如下内容：For example, the editing status memory structure can store the following content:

‘round face’：0.5‘round face’: 0.5

‘fair skin’：0.5‘fair skin’: 0.5

………

上述列举的为编辑状态记忆结构中存储的编辑属性及其对应的相关编辑强度，其中‘round face’和‘fair skin’分别指编辑属性“圆脸”、“白皙皮肤”，编辑属性对应的数值用于表征该编辑属性对应的相关编辑强度。示例性的，数值越大，则表征关联程度越高，数值越小，则表征关联程度越低。The above list is the editing attributes stored in the editing state memory structure and their corresponding related editing strengths. Among them, 'round face' and 'fair skin' refer to the editing attributes "round face" and "fair skin" respectively, and the corresponding values of the editing attributes. Used to characterize the relative editing intensity corresponding to the editing attribute. For example, the larger the value, the higher the degree of correlation, and the smaller the value, the lower the degree of correlation.

承接上述示例，可以从对话记忆结构和编辑状态记忆结构中提取历史对话、处于编辑状态的编辑属性和相关编辑强度，连同用户输入的描述文本一起填充至事先设计的Prompt。示例性的，Prompt可以包括如下项：Following the above example, the historical dialogue, editing attributes and related editing intensity in the editing state can be extracted from the dialogue memory structure and the editing state memory structure, and filled into the pre-designed prompt together with the description text input by the user. For example, Prompt can include the following items:

‘user input’：‘user input’:

‘histery’：'histery':

‘state’：'state':

其中，‘user input’中对应填充描述文本，‘histery’中对应填充历史对话，‘state’中对应填充处于编辑状态的编辑属性和相关编辑强度。Among them, 'user input' is filled with description text, 'histery' is filled with historical conversations, and 'state' is filled with editing attributes and related editing intensity in the editing state.

将填充好的Prompt输入预先训练好的大语言模型，示例性的，可以以json形式得到解析好的指令，具体如下：Input the filled Prompt into the pre-trained large language model. For example, you can get the parsed instructions in json form, as follows:

Target：Big EyesTarget: Big Eyes

Strength：0.5Strength: 0.5

Response：“I increased the size of the eyes a bit.Dose she looksbetter？”Response: "I increased the size of the eyes a bit.Dose she looks better?"

其中，Target的内容对应解析出的编辑意图T_k，即，Big Eyes，表征针对描述文本和交互编辑记忆推断出的本轮交互所需编辑的目标角色形象参数。这里，该编辑意图可以是用于后续针对虚拟形象进行编辑的文本指令，例如，用于指示针对上一轮交互获得的虚拟形象执行的编辑操作。Among them, the content of Target corresponds to the parsed editing intention T _k , that is, Big Eyes, which represents the target character image parameters that need to be edited for this round of interaction inferred from the description text and interactive editing memory. Here, the editing intention may be a text instruction for subsequent editing of the avatar, for example, used to indicate an editing operation performed on the avatar obtained in the previous round of interaction.

Strength的内容对应解析出的相关编辑强度s，即，0.5，表征推断出的本轮交互的编辑意图与描述文本之间的关联强度。The content of Strength corresponds to the parsed relevant editing strength s, that is, 0.5, which represents the association strength between the inferred editing intention of this round of interaction and the description text.

Response的内容对应虚拟形象创建系统向用户呈现的回复文本R_k，即，“Iincreased the size of the eyes a bit.Dose she looks better？”。The content of Response corresponds to the reply text R _k presented to the user by the avatar creation system, that is, "Iincreased the size of the eyes a bit.Dose she looks better?".

在本轮交互结束之后，可以根据描述文本y_k和回复文本R_k更新对话记忆结构，根据编辑意图T_k和相关编辑强度s更新编辑状态记忆结构。这里，下标k表示当前为第k轮交互，其中，1≤k≤M，M为正整数，示例性的，可以是预先设定的最多交互轮次数。After this round of interaction ends, the conversation memory structure can be updated according to the description text y _k and the reply text R _k , and the editing status memory structure can be updated according to the editing intention T _k and the related editing intensity s. Here, the subscript k indicates that the current round of interaction is the kth round, where 1≤k≤M, and M is a positive integer, which may be, for example, the preset maximum number of rounds of interaction.

在步骤S303中，获取初始隐变量。In step S303, the initial latent variable is obtained.

这里，可以通过对初始隐变量的多次迭代，获得优化后的隐变量，进而获得用于生成虚拟形象的角色形象参数。Here, the optimized latent variables can be obtained through multiple iterations of the initial latent variables, and then the character image parameters used to generate the virtual image can be obtained.

在本申请一优选实施例中，上述初始隐变量用以表征角色形象参数在低维空间的投影。In a preferred embodiment of the present application, the above-mentioned initial latent variables are used to represent the projection of character image parameters in a low-dimensional space.

这里，初始隐变量z_k为最近一次迭代对应的优化后的角色形象参数在低维空间的投影。对于非首次迭代的情况，该最近一次迭代指当前迭代的上一轮迭代，对于非首轮交互、但首次迭代的情况，该最近一次迭代指当前交互轮次的上一轮次中的最后一次迭代。Here, the initial latent variables z _k are the optimized character image parameters corresponding to the latest iteration. Projection into low-dimensional space. For the case of non-first iteration, the latest iteration refers to the last iteration of the current iteration. For the case of non-first round of interaction but the first iteration, the latest iteration refers to the last iteration of the previous round of the current interaction round. iterate.

对于首轮交互的首次迭代的情况，上述初始隐变量为基准虚拟形象对应的角色形象参数在低维空间的投影，示例性的，该基准虚拟形象包括与首轮交互的描述文本相匹配的虚拟形象，例如，可以从预先创建的多个候选虚拟形象中，选取与首轮交互的描述文本匹配度最高的一虚拟形象确定为上述基准虚拟形象。For the first iteration of the first round of interaction, the above-mentioned initial latent variable is the projection of the character image parameters corresponding to the baseline avatar in the low-dimensional space. For example, the baseline avatar includes a virtual character that matches the description text of the first round of interaction. For example, the avatar that has the highest matching degree with the description text of the first round of interaction can be selected from multiple pre-created candidate avatars and determined as the above-mentioned reference avatar.

针对虚拟形象为虚拟角色的面部图像的情况，上述角色形象参数可以包括捏脸参数，该捏脸参数可以包括用于表征虚拟角色的头部特征的参数，示例性的，可以包括但不限于以下项中的至少一项：表征面部特征(眼睛形状/尺寸、鼻子形状/尺寸、嘴唇形状/尺寸)的参数、表征头型、发型、发色特征的参数。For the case where the virtual image is a facial image of a virtual character, the above character image parameters may include face pinching parameters, and the face pinching parameters may include parameters used to characterize the head features of the virtual character. Examples may include but are not limited to the following. At least one of the items: parameters characterizing facial features (eye shape/size, nose shape/size, lip shape/size), parameters characterizing head shape, hairstyle, and hair color characteristics.

在步骤S304中，通过对初始隐变量进行解码，获得初始形象参数。In step S304, initial image parameters are obtained by decoding the initial latent variables.

这里，可以将该初始隐变量经过反投影矩阵，解码回原始空间(即，角色形象参数空间)，以获得初始形象参数。上述在低维空间与原始空间之间的投影与反投影过程为本领域的公知常识，本申请对此部分内容不再赘述。Here, the initial latent variable can be decoded back to the original space (ie, character image parameter space) through the back-projection matrix to obtain the initial image parameters. The above-mentioned projection and back-projection processes between the low-dimensional space and the original space are common knowledge in the art, and this part will not be described again in this application.

在一可选实施例中，可以直接将上述解码后获得的初始形象参数作为候选形象参数，以执行后续步骤S307。除此之外，还可以引入相关性向量，以基于初始形象参数和相关性向量，来确定候选形象参数。In an optional embodiment, the initial image parameters obtained after the above decoding can be directly used as candidate image parameters to perform subsequent step S307. In addition, a correlation vector can also be introduced to determine candidate image parameters based on the initial image parameters and the correlation vector.

具体的，在步骤S305中，获得相关性向量。Specifically, in step S305, a correlation vector is obtained.

例如，可以预测多个角色形象参数与本轮交互所确定的编辑意图的相关性，以获得相关性向量，这里，相关性向量中的每个元素用于表征对应的角色形象参数与编辑意图之间的相关强度。For example, the correlation between multiple character image parameters and the editing intention determined by this round of interaction can be predicted to obtain a correlation vector. Here, each element in the correlation vector is used to characterize the relationship between the corresponding character image parameter and the editing intention. the strength of the correlation between them.

示例性的，可以利用预先训练好的相关性预测模型，将解析得到的编辑意图T_k输入到相关性预测模型中，以预测出各个角色形象参数与编辑意图之间的相关性，从而形成一相关性向量r_k。For example, a pre-trained correlation prediction model can be used to input the parsed editing intention T _k into the correlation prediction model to predict the correlation between each character image parameter and the editing intention, thereby forming a Correlation vector r _k .

下面参照图4来介绍针对相关性预测模型的训练过程。The following describes the training process of the correlation prediction model with reference to Figure 4.

图4示出本申请示例性实施例提供的训练相关性预测模型的步骤的流程图。FIG. 4 shows a flowchart of steps of training a correlation prediction model provided by an exemplary embodiment of the present application.

如图4所示，在步骤S401中，获取第一训练样本。As shown in Figure 4, in step S401, a first training sample is obtained.

示例性的，可以通过以下方式来获取第一训练样本，由大语言模型根据示例撰写多条针对虚拟形象的描述文本样本，通过大语言模型粗标注和人工精标注，来得到与各描述文本样本对应的文本相关性标签，以由上述多个描述文本样本以及对应的文本相关性标签形成第一训练样本。For example, the first training sample can be obtained in the following way: a large language model writes multiple description text samples for the avatar according to the example, and the large language model rough annotation and manual fine annotation are used to obtain the description text sample corresponding to each description text sample. Corresponding text correlation labels, so as to form a first training sample from the plurality of description text samples and corresponding text correlation labels.

在步骤S402中，将当前迭代的描述文本样本输入到相关性预测模型，获得预测相关性向量。In step S402, the description text sample of the current iteration is input into the correlation prediction model to obtain a predicted correlation vector.

在步骤S403中，计算损失函数。In step S403, the loss function is calculated.

例如，可以计算预测相关性向量与该描述文本样本对应的文本相关性标签之间的损失函数。这里，可以利用各种方式来计算损失函数，本申请对此不做限制。For example, a loss function can be calculated between the predicted correlation vector and the textual correlation label corresponding to the description text sample. Here, various methods can be used to calculate the loss function, and this application does not limit this.

在步骤S404中，优化网络参数。In step S404, network parameters are optimized.

例如，可以利用损失函数，通过梯度下降法来更新相关性预测模型中的网络参数。For example, the loss function can be used to update the network parameters in the correlation prediction model through gradient descent.

在步骤S405中，判断当前迭代是否达到最大迭代次数。In step S405, it is determined whether the current iteration reaches the maximum number of iterations.

若未达到最大迭代次数，则将迭代次数加1，并返回执行步骤S402，以继续对相关性预测模型进行训练。If the maximum number of iterations is not reached, the number of iterations is increased by 1, and execution returns to step S402 to continue training the correlation prediction model.

若达到最大迭代次数，则执行步骤S406：保存模型参数。此时，获得训练好的相关性预测模型，即，基于上述优化后的网络参数所构建的相关性预测模型。If the maximum number of iterations is reached, step S406 is executed: save the model parameters. At this time, a trained correlation prediction model is obtained, that is, a correlation prediction model constructed based on the above optimized network parameters.

返回图3，在步骤S306中，根据初始形象参数和相关性向量，获得候选形象参数。Returning to Figure 3, in step S306, candidate image parameters are obtained based on the initial image parameters and the correlation vector.

例如，可以通过对相关性向量r_k和初始形象参数(如，)进行加权，示例性的，可以将相关性向量r_k中的一个元素与初始形象参数中对应的一个形象参数相乘，以得到初始隐变量z_k对应的候选形象参数x_k。For example, it can be calculated by comparing the correlation vector r _k and the initial image parameters (e.g., ) is weighted. For example, an element in the correlation vector r _k can be multiplied by a corresponding image parameter in the initial image parameter to obtain the candidate image parameter x _k corresponding to the initial latent variable z _k .

本申请实施例中，通过引入相关性向量r_k，可以实现对虚拟形象的精细调整，避免了不相关区域被改动。示例性的，在上述过程中，根据编辑意图的文字指令，经由相关性预测模型来预测各角色形象参数的相关性，如，预测出相关的五官区域和妆容属性，后续针对预测出的相关的区域和属性进行编辑。In the embodiment of the present application, by introducing the correlation vector r _k , the virtual image can be finely adjusted to avoid modification of irrelevant areas. For example, in the above process, according to the text instructions of the editing intention, the correlation of the image parameters of each character is predicted through the correlation prediction model, for example, the relevant facial features area and makeup attributes are predicted, and then the predicted relevant Regions and properties can be edited.

在本申请实施例中，将编辑意图T_k、相关编辑强度s、相关性向量r_k，输入到FR-T2P模型中，以得到迭代优化后(即，编辑后)的角色形象参数在上述过程中，通过不断迭代角色形象参数在低维空间的投影这一隐变量，最终使得该隐变量所对应的渲染形象图像与编辑意图的文本指令的编码距离尽量接近。In the embodiment of this application, the editing intention T _k , the relevant editing intensity s, and the correlation vector r _k are input into the FR-T2P model to obtain the character image parameters after iterative optimization (that is, after editing) In the above process, by continuously iterating the hidden variable of the projection of the character image parameters in the low-dimensional space, the encoding distance between the rendered image corresponding to the Cain variable and the text instruction of the editing intention is finally made as close as possible.

具体的，在步骤S307中，基于候选形象参数，获得游戏引擎的渲染形象图像。Specifically, in step S307, the rendered image of the game engine is obtained based on the candidate image parameters.

例如，可以将候选形象参数x_k输入预训练好的神经渲染器网络，来获得对应的渲染形象图像，该渲染形象图像可指游戏引擎渲染后的虚拟形象的图像。For example, the candidate image parameters x _k can be input into a pre-trained neural renderer network to obtain the corresponding rendered image. The rendered image may refer to the image of the virtual image rendered by the game engine.

示例性的，神经渲染器网络可以是可微渲染器，例如，根据预先准备好的(角色形象参数以及游戏引擎的渲染形象图像)数据进行训练，以模仿游戏引擎的渲染过程，从而实现渲染过程可微。For example, the neural renderer network can be a differentiable renderer, for example, trained according to pre-prepared data (character image parameters and game engine rendering image) to imitate the rendering process of the game engine, thereby realizing the rendering process. Differentiable.

在基于编辑意图(即，文本指令)生成角色形象参数的过程中，预训练了带有妆容的神经渲染器网络，从而实现了以完全随机梯度下降的方式进行角色形象参数的生成。下面参照图5来介绍训练可微渲染器的过程。In the process of generating character image parameters based on editing intention (i.e., textual instructions), a neural renderer network with makeup is pre-trained, thereby achieving the generation of character image parameters in a completely stochastic gradient descent manner. The process of training a differentiable renderer is introduced below with reference to Figure 5.

图5示出本申请示例性实施例提供的训练可微渲染器的步骤的流程图。FIG. 5 shows a flowchart of steps for training a differentiable renderer provided by an exemplary embodiment of the present application.

如图5所示，在步骤S501中，获取第二训练样本。As shown in Figure 5, in step S501, a second training sample is obtained.

示例性的，第二训练样本可以包括随机采样的样本形象参数以及游戏引擎渲染的虚拟形象图像，将随机采样样本形象参数作为输入，将游戏引擎渲染的虚拟形象图像作为输出，对可微渲染器进行训练。For example, the second training sample may include randomly sampled sample image parameters and avatar images rendered by a game engine. The randomly sampled image parameters are used as input, and the avatar image rendered by the game engine is used as an output. The differentiable renderer Conduct training.

对于虚拟形象指虚拟角色的面部图像的情况，可以随机采样连续面部参数及离散妆容参数与游戏引擎渲染的对应游戏虚拟角色的面部图像作为训练数据。For the case where the virtual image refers to the facial image of the virtual character, continuous facial parameters and discrete makeup parameters can be randomly sampled and the facial image of the corresponding game virtual character rendered by the game engine can be used as training data.

在步骤S502中，将样本形象参数输入到可微渲染器，获得游戏引擎的预测渲染图像。In step S502, the sample image parameters are input to the differentiable renderer to obtain a predicted rendering image of the game engine.

针对上述捏脸的示例，可以将连续面部参数输入可微渲染器，渲染出角色的预测面部图像。For the above example of face pinching, the continuous facial parameters can be input into the differentiable renderer to render the predicted facial image of the character.

在步骤S503中，计算损失函数。In step S503, the loss function is calculated.

例如，可以计算可微渲染器渲染的预测渲染图像与游戏引擎的渲染形象图像之间的损失函数。这里，可以利用各种方式来计算损失函数，本申请对此不做限制。For example, a loss function can be calculated between a predicted rendered image rendered by a differentiable renderer and a rendered image image rendered by a game engine. Here, various methods can be used to calculate the loss function, and this application does not limit this.

在步骤S504中，优化网络参数。In step S504, network parameters are optimized.

例如，可以利用损失函数，通过梯度下降法来更新可微渲染器中的网络参数。For example, a loss function can be used to update network parameters in a differentiable renderer via gradient descent.

在步骤S505中，判断当前迭代是否达到最大迭代次数。In step S505, it is determined whether the current iteration reaches the maximum number of iterations.

若未达到最大迭代次数，则将迭代次数加1，并返回执行步骤S502，以继续对可微渲染器进行训练。If the maximum number of iterations is not reached, the number of iterations is increased by 1, and step S502 is returned to continue training the differentiable renderer.

若达到最大迭代次数，则执行步骤S506：保存模型参数。此时，获得训练好的可微渲染器，即，基于上述优化后的网络参数所构建的可微渲染器。If the maximum number of iterations is reached, step S506 is executed: save the model parameters. At this time, a trained differentiable renderer is obtained, that is, a differentiable renderer built based on the above optimized network parameters.

在本申请实施例中，以缩小编辑意图与渲染形象图像之间的偏差为优化目标，通过多次迭代对候选形象参数进行优化，以获得优化后的角色形象参数。In the embodiment of the present application, the optimization goal is to reduce the deviation between the editing intention and the rendered image, and the candidate image parameters are optimized through multiple iterations to obtain optimized character image parameters.

具体的返回图3，在步骤S308中，将渲染形象图像输入CLIP，获得图像编码。Specifically, returning to Figure 3, in step S308, the rendered image is input into CLIP to obtain image coding.

示例性的，可以预先训练跨模态编码模型CLIP(Contrastive Language-ImagePre-Training)，以基于当前迭代的候选形象参数对应的游戏引擎的渲染形象图像，确定图像编码。For example, the cross-modal coding model CLIP (Contrastive Language-Image Pre-Training) can be pre-trained to determine the image encoding based on the rendered image of the game engine corresponding to the candidate image parameter of the current iteration.

在步骤S309中，基于编辑意图，获得文本编码。In step S309, based on the editing intention, the text encoding is obtained.

示例性的，可以将编辑意图T_k(即，文本指令)输入预先训练的CLIP，得到文本编码。For example, the editing intention T _k (ie, text instruction) can be input into the pre-trained CLIP to obtain text encoding.

在步骤S310中，根据所确定的图像编码和文本编码，获得意图理解损失。In step S310, the intent understanding loss is obtained according to the determined image encoding and text encoding.

例如，可以计算图像编码与文本编码之间的余弦距离，将该余弦距离确定为意图理解损失CLIP Loss。For example, the cosine distance between the image encoding and the text encoding can be calculated, and the cosine distance is determined as the intent understanding loss CLIP Loss.

在本申请实施例中，将基于梯度下降法来优化初始隐变量z_k以最小化上述的余弦距离，即，以意图理解损失为优化目标，对候选形象参数进行优化。In the embodiment of this application, the initial latent variable z _k will be optimized based on the gradient descent method to minimize the above-mentioned cosine distance, that is, the candidate image parameters will be optimized with the intention understanding loss as the optimization goal.

在本申请一优选实施例中，上述基于描述文本所获得的相关编辑强度s也会在上述优化过程中通过影响CLIP loss权重来实现编辑强度的控制。示例性的，可以计算相关编辑强度s与意图理解损失CLIP Loss的乘积，从而以在相关编辑强度影响下的意图理解损失(即，上述该乘积)为优化目标，对候选形象参数进行优化。In a preferred embodiment of the present application, the above-mentioned related editing intensity s obtained based on the description text will also control the editing intensity by affecting the CLIP loss weight during the above-mentioned optimization process. For example, the product of the relevant editing intensity s and the intention understanding loss CLIP Loss can be calculated, so that the candidate image parameters are optimized with the intention understanding loss under the influence of the relevant editing intensity (ie, the above-mentioned product) as the optimization target.

除上述在梯度下降法中提到的最主要的优化目标(CLIP Loss)外，在本申请一优选实施例中，还对初始隐变量z_k施加先验正则。In addition to the most important optimization objective (CLIP Loss) mentioned above in the gradient descent method, in a preferred embodiment of the present application, a priori regularization is also applied to the initial latent variable z _k .

具体的，在步骤S311中，获得先验损失。Specifically, in step S311, a priori loss is obtained.

示例性的，在单轮的文本到角色形象参数的生成过程中，引入先验分布并使用完全的梯度下降框架，做到更鲁棒、更快速的角色形象参数生成。For example, in the single-round generation process of text-to-character image parameters, a prior distribution is introduced and a complete gradient descent framework is used to achieve more robust and faster generation of character image parameters.

图6示出本申请示例性实施例提供的确定先验损失的步骤的流程图。FIG. 6 shows a flowchart of steps for determining a priori loss provided by an exemplary embodiment of the present application.

如图6所示，在步骤S601中，从虚拟形象样本集中获取多组样本。As shown in Figure 6, in step S601, multiple groups of samples are obtained from the avatar sample set.

这里，可以从公开的虚拟角色的形象图像数据集中获取多组样本，每组样本包括多个角色形象参数。Here, multiple sets of samples can be obtained from a public virtual character image image data set, and each set of samples includes multiple character image parameters.

在步骤S602中，统计多组样本对应的先验分布统计值。In step S602, the prior distribution statistical values corresponding to multiple groups of samples are counted.

示例性的，针对多组样本的角色形象参数的先验分布统计值可以包括但不限于均值μ_Z、协方差A_Z。这里，计算均值以及协方差的方法为本领域的公知常识，本申请对此不做限制。For example, the prior distribution statistical values of the character image parameters of multiple groups of samples may include but are not limited to the mean μ _Z and the covariance A _Z . Here, the method of calculating the mean and covariance is common knowledge in the art, and this application does not limit it.

在步骤S603中，根据先验分布统计值，确定先验损失。In step S603, the prior loss is determined based on the prior distribution statistical value.

示例性的，可以通过如下公式来计算先验损失：For example, the prior loss can be calculated by the following formula:

priorLoss＝||A_Z·(z-μ_Z)||² priorLoss＝||A _Z ·(z-μ _Z )|| ²

其中，priorLoss表示先验损失，z表示当前迭代的角色形象参数，示例性的，上述z也可以表示当前迭代的初始隐变量。Among them, priorLoss represents the prior loss, and z represents the character image parameter of the current iteration. For example, the above z can also represent the initial latent variable of the current iteration.

在一优选实施例中，可以在优化过程中施加先验约束，以意图理解损失和先验损失为优化目标，对候选形象参数进行优化。In a preferred embodiment, a priori constraints can be imposed during the optimization process, and the candidate image parameters can be optimized with intent understanding loss and a priori loss as optimization goals.

在这一过程中引入了先验分布，使得参数优化空间被限制在先验分布空间中并额外施加了先验分布约束，从而避免了生成不符合真实人脸分布的结果。In this process, the prior distribution is introduced, so that the parameter optimization space is limited to the prior distribution space and additional prior distribution constraints are imposed, thereby avoiding the generation of results that do not conform to the real face distribution.

返回图3，在步骤S312中，判断是否满足优化条件，即，判断优化过程是否结束。Returning to Figure 3, in step S312, it is determined whether the optimization conditions are met, that is, whether the optimization process is completed.

示例性的，可以通过判断是否达到最大迭代次数，来确定优化过程是否结束。For example, whether the optimization process ends can be determined by determining whether the maximum number of iterations is reached.

若不满足优化条件(如，未达到最大迭代次数)，则确定隐变量调整值，基于隐变量调整值对初始隐变量进行更新，并返回执行步骤S303，以基于更新后的初始隐变量再次进行迭代。If the optimization conditions are not met (for example, the maximum number of iterations is not reached), the hidden variable adjustment value is determined, the initial hidden variable is updated based on the hidden variable adjustment value, and the execution step S303 is returned to proceed again based on the updated initial hidden variable. iterate.

示例性的，可以通过以下方式来获得隐变量调整值：将意图理解损失和先验损失输入到优化器，以通过优化器确定出隐变量调整值，该隐变量调整值表征针对至少一个角色形象参数的调整程度和调整方向，可以以±ΔZ来表示。For example, the latent variable adjustment value can be obtained in the following manner: inputting the intent understanding loss and the prior loss to the optimizer to determine the latent variable adjustment value through the optimizer, and the latent variable adjustment value represents the image for at least one character image The degree and direction of parameter adjustment can be expressed as ±ΔZ.

若满足优化条件(如，达到最大迭代次数)，则执行步骤S313：确定角色形象参数。If the optimization conditions are met (for example, the maximum number of iterations is reached), step S313 is executed: determine the character image parameters.

例如，在上述优化目标下，不断以梯度下降的方式迭代优化初始隐变量z_k，收敛得到最优的隐变量再经由反投影即可得到角色形象参数/> For example, under the above optimization goal, the initial hidden variable z _k is continuously iteratively optimized using gradient descent, and the optimal hidden variable is obtained by convergence. Then through back-projection, the character image parameters can be obtained/>

在步骤S314中，展示虚拟形象和回复文本。In step S314, the avatar and the reply text are displayed.

例如，根据上述所确定的角色形象参数生成与编辑意图对应的虚拟形象。示例性的，可以在输入描述文本的虚拟形象编辑界面同时展示出所确定的虚拟形象和回复文本。For example, based on the character image parameters determined above Generate an avatar corresponding to the editing intention. For example, the determined avatar and the reply text can be displayed simultaneously on the avatar editing interface where the description text is input.

在步骤S315中，判断对话是否结束。In step S315, it is determined whether the conversation ends.

这里，可以预先设置对话结束条件(如，达到最大交互轮次)，在检测到满足所设置的对话结束条件时，确定对话结束，在检测到不满足所设置的对话结束条件时，确定对话未结束。Here, the dialogue end condition can be set in advance (for example, the maximum number of interaction rounds is reached). When it is detected that the set dialogue end condition is met, it is determined that the dialogue has ended. When it is detected that the set dialogue end condition is not met, it is determined that the dialogue has not been completed. Finish.

除上述方式之外，还可以在虚拟形象编辑界面上设置一用于触发对话结束的控制按钮，检测到针对该控制按钮的操作，确定对话结束，若未检测到针对该控制按钮的操作，则确定对话未结束。In addition to the above method, a control button for triggering the end of the dialogue can also be set on the avatar editing interface. When an operation on the control button is detected, the end of the dialogue is determined. If no operation on the control button is detected, then Make sure the conversation is not over.

若确定对话未结束，则返回执行步骤S301，继续接收用户在下一轮交互输入的描述文本。If it is determined that the conversation has not ended, the process returns to step S301 and continues to receive the description text input by the user in the next round of interaction.

若确定对话结束，则将当前轮交互所展示的虚拟形象确定为用户创建的虚拟形象。If it is determined that the dialogue is over, the avatar displayed in the current round of interaction is determined to be the avatar created by the user.

基于上述多轮交互，通过自然语言对话方式即可可能完成对虚拟形象的创建或者调整，而无需手动反复调整参数，也无需知晓每个参数的具体含义，还可以实现精细调整，避免了不相关区域被改动，这样，可以大大降低用户在虚拟形象调整过程中所花费的时间成本。Based on the above multiple rounds of interaction, it is possible to create or adjust the avatar through natural language dialogue, without the need to manually adjust parameters repeatedly, and without knowing the specific meaning of each parameter. Fine adjustments can also be achieved to avoid irrelevant problems. The area is changed, so that the time cost spent by the user in the avatar adjustment process can be greatly reduced.

基于同一申请构思，本申请实施例中还提供了与上述实施例提供的方法对应的虚拟形象生成装置，由于本申请实施例中的装置解决问题的原理与本申请上述实施例的虚拟形象生成方法相似，因此装置的实施可以参见方法的实施，重复之处不再赘述。Based on the same application concept, the embodiments of this application also provide a virtual image generation device corresponding to the method provided in the above embodiments. Since the problem-solving principle of the device in the embodiments of this application is the same as the virtual image generation method in the above embodiments of this application, are similar, so the implementation of the device can refer to the implementation of the method, and repeated details will not be repeated.

图7为本申请示例性实施例提供的虚拟形象生成装置的结构示意图。Figure 7 is a schematic structural diagram of a virtual image generation device provided by an exemplary embodiment of the present application.

如图7所示，该虚拟形象生成装置200包括：As shown in Figure 7, the virtual image generation device 200 includes:

获取模块210，获取用于创建虚拟形象的描述文本；The acquisition module 210 obtains the description text used to create the virtual image;

识别模块220，基于所获取的描述文本，追踪针对虚拟形象的编辑意图，并生成回复文本；The identification module 220 tracks the editing intention for the avatar based on the obtained description text, and generates a reply text;

展示模块230，展示所述回复文本以及根据所述编辑意图所生成的虚拟形象，所述回复文本用于指示所生成的虚拟形象针对所述描述文本所进行的编辑响应。The display module 230 displays the reply text and the avatar generated according to the editing intention, and the reply text is used to indicate the editing response of the generated avatar to the description text.

在本申请的一种可能实施方式中，识别模块220，还用于：提取用于创建虚拟形象的交互编辑记忆，所述交互编辑记忆用于指示针对虚拟形象的创建过程的交互历史；基于所述描述文本和所述交互编辑记忆，获得针对虚拟形象的编辑意图和所述回复文本。In a possible implementation of the present application, the identification module 220 is also used to: extract the interactive editing memory used to create the virtual image, the interactive editing memory is used to indicate the interaction history of the creation process of the virtual image; based on the The description text and the interactive editing memory are used to obtain the editing intention for the avatar and the reply text.

在本申请的一种可能实施方式中，所述创建过程包括至少一轮交互，在每轮交互过程中，基于该轮交互的描述文本来生成对应的回复文本和虚拟形象，其中，所述交互编辑记忆包括历史对话，所述历史对话包括至少一个对话组，每个对话组包括一轮交互中的描述文本及其对应的回复文本，和/或，所述交互编辑记忆包括编辑属性以及编辑属性对应的相关编辑强度，所述编辑属性包括多个角色形象参数中处于编辑状态的预设角色形象参数，所述相关编辑强度用于表征预设角色形象参数与描述文本之间的关联程度。In a possible implementation of the present application, the creation process includes at least one round of interaction. In each round of interaction, the corresponding reply text and avatar are generated based on the description text of the round of interaction, wherein the interaction The editing memory includes historical dialogues, the historical dialogues include at least one dialogue group, each dialogue group includes description text and its corresponding reply text in a round of interaction, and/or, the interactive editing memory includes editing attributes and editing attributes. Corresponding related editing strength, the editing attribute includes the preset character image parameter in the editing state among the plurality of character image parameters, and the related editing strength is used to represent the degree of association between the preset character image parameter and the description text.

在本申请的一种可能实施方式中，识别模块220，还用于：从对话记忆结构中提取历史对话，其中，识别模块220通过以下方式更新所述对话记忆结构：每当一轮交互结束，将该轮交互对应的描述文本及其对应的回复文本添加到所述对话记忆结构中。In a possible implementation of the present application, the identification module 220 is also used to extract historical dialogues from the dialogue memory structure, where the identification module 220 updates the dialogue memory structure in the following manner: whenever a round of interaction ends, Add the description text corresponding to this round of interaction and its corresponding reply text to the dialogue memory structure.

在本申请的一种可能实施方式中，展示模块230通过以下方式来根据所述编辑意图生成虚拟形象：基于候选形象参数，获得游戏引擎的渲染形象图像；以缩小编辑意图与渲染形象图像之间的偏差为优化目标，通过多次迭代对所述候选形象参数进行优化，以获得优化后的角色形象参数；根据所述角色形象参数，生成与所述编辑意图对应的虚拟形象。In a possible implementation of the present application, the display module 230 generates a virtual image according to the editing intention in the following manner: based on the candidate image parameters, obtains the rendering image of the game engine; to narrow the gap between the editing intention and the rendering image. The deviation of is the optimization target, and the candidate image parameters are optimized through multiple iterations to obtain optimized character image parameters; according to the character image parameters, a virtual image corresponding to the editing intention is generated.

在本申请的一种可能实施方式中，展示模块230通过以下方式获得候选形象参数：获取当前迭代的初始隐变量，所述初始隐变量表征角色形象参数在低维空间的投影；通过对所述初始隐变量进行解码，获得当前迭代的候选形象参数；其中，所述初始隐变量为最近一次迭代对应的优化后的角色形象参数在低维空间的投影，在首轮交互的首次迭代时，所述初始隐变量为基准虚拟形象对应的角色形象参数在低维空间的投影，所述基准虚拟形象包括与首轮交互的描述文本相匹配的虚拟形象。In a possible implementation of the present application, the display module 230 obtains the candidate image parameters in the following manner: obtaining the initial latent variable of the current iteration, which represents the projection of the character image parameters in a low-dimensional space; The initial latent variable is decoded to obtain the candidate image parameters of the current iteration; where the initial latent variable is the projection of the optimized character image parameters corresponding to the latest iteration in the low-dimensional space. In the first iteration of the first round of interaction, the The initial latent variable is the projection of the character image parameters corresponding to the base virtual image in the low-dimensional space. The base virtual image includes the virtual image that matches the description text of the first round of interaction.

在本申请的一种可能实施方式中，展示模块230通过以下方式获得候选形象参数：预测多个角色形象参数与所述编辑意图的相关性，以获得相关性向量，所述相关性向量中的每个元素用于表征对应的角色形象参数与编辑意图之间的相关强度；通过对初始隐变量进行解码，获得初始形象参数，所述初始隐变量表征角色形象参数在低维空间的投影；根据所述初始形象参数和所述相关性向量，获得候选形象参数；其中，所述初始隐变量为最近一次迭代对应的优化后的角色形象参数在低维空间的投影，在首轮交互的首次迭代时，所述初始隐变量为基准虚拟形象对应的角色形象参数在低维空间的投影，所述基准虚拟形象包括与首轮交互的描述文本相匹配的虚拟形象。In a possible implementation of the present application, the display module 230 obtains the candidate image parameters in the following manner: predicting the correlation between multiple character image parameters and the editing intention to obtain a correlation vector, in which Each element is used to represent the correlation strength between the corresponding character image parameters and the editing intention; the initial image parameters are obtained by decoding the initial latent variables, which represent the projection of the character image parameters in the low-dimensional space; according to The initial image parameters and the correlation vector are used to obtain candidate image parameters; wherein, the initial latent variable is the projection of the optimized character image parameters corresponding to the latest iteration in the low-dimensional space. In the first iteration of the first round of interaction When , the initial latent variable is the projection of the character image parameters corresponding to the reference virtual image in the low-dimensional space, and the reference virtual image includes the virtual image that matches the description text of the first round of interaction.

在本申请的一种可能实施方式中，展示模块230通过以下方式对所述候选形象参数进行优化：基于当前迭代的候选形象参数对应的游戏引擎的渲染形象图像，确定图像编码；基于所述编辑意图，确定文本编码；根据所确定的图像编码和文本编码，获得意图理解损失；以所述意图理解损失为优化目标，对所述候选形象参数进行优化。In a possible implementation of the present application, the display module 230 optimizes the candidate image parameters in the following manner: determining the image encoding based on the rendered image of the game engine corresponding to the candidate image parameters of the current iteration; based on the editing According to the intention, the text encoding is determined; according to the determined image encoding and text encoding, the intention understanding loss is obtained; with the intention understanding loss as the optimization target, the candidate image parameters are optimized.

在本申请的一种可能实施方式中，展示模块230，还用于：从虚拟形象样本集中获取多组样本，每组样本包括多个角色形象参数；统计所述多组样本对应的先验分布统计值；根据所述先验分布统计值，确定先验损失；其中，所述多次迭代以所述意图理解损失和所述先验损失为优化目标，对所述候选形象参数进行优化。In a possible implementation of the present application, the display module 230 is also used to: obtain multiple groups of samples from the virtual image sample set, each group of samples includes multiple character image parameters; and count the prior distributions corresponding to the multiple groups of samples. Statistical value; determine the prior loss according to the prior distribution statistical value; wherein the multiple iterations use the intention understanding loss and the prior loss as optimization goals to optimize the candidate image parameters.

在本申请的一种可能实施方式中，识别模块220，还用于：在基于所述描述文本获得针对虚拟形象的编辑意图的同时，还获得相关编辑强度，所述相关编辑强度用于表征目标角色形象参数与描述文本之间的关联程度，所述目标角色形象参数根据所述编辑意图来确定；其中，以在所述相关编辑强度影响下的意图理解损失为优化目标，对所述候选形象参数进行优化。In a possible implementation of the present application, the identification module 220 is also configured to: while obtaining the editing intention for the avatar based on the description text, also obtain the relevant editing intensity, which is used to characterize the target. The degree of association between the character image parameters and the description text, the target character image parameters are determined according to the editing intention; wherein, the loss of intention understanding under the influence of the relevant editing intensity is the optimization goal, and the candidate image is Parameters are optimized.

在本申请的一种可能实施方式中，识别模块220通过以下方式获得编辑意图、相关编辑强度和回复文本：将用于创建虚拟形象的交互编辑记忆和所述描述文本输入到预先训练的大型语言模型，以获得所述编辑意图、相关编辑强度和所述回复文本。In a possible implementation of the present application, the recognition module 220 obtains the editing intention, related editing strength and reply text in the following manner: inputting the interactive editing memory used to create the avatar and the description text into a pre-trained large language model to obtain the editing intent, relevant editing intensity, and the reply text.

在本申请的一种可能实施方式中，所述交互编辑记忆包括编辑属性以及编辑属性对应的相关编辑强度，其中，识别模块220通过以下方式提取用于创建虚拟形象的交互编辑记忆：从编辑状态记忆结构中提取编辑属性以及编辑属性对应的相关编辑强度，其中，识别模块220通过以下方式更新所述编辑状态记忆结构：每当一轮交互结束，搜索编辑状态记忆结构中是否存在与该轮交互对应的编辑意图，若存在，则对搜索到的编辑意图对应的相关编辑强度进行更新，若不存在，则将该轮交互对应的编辑意图以及相关编辑强度添加到所述编辑状态记忆结构中。In a possible implementation of the present application, the interactive editing memory includes editing attributes and related editing intensity corresponding to the editing attributes, wherein the identification module 220 extracts the interactive editing memory for creating the avatar in the following manner: from the editing state The editing attributes and the relevant editing intensity corresponding to the editing attributes are extracted from the memory structure. The identification module 220 updates the editing state memory structure in the following manner: whenever a round of interaction ends, search the editing state memory structure to see whether there is any interaction with that round. If the corresponding editing intention exists, the relevant editing intensity corresponding to the searched editing intention is updated. If it does not exist, the editing intention and the relevant editing intensity corresponding to this round of interaction are added to the editing state memory structure.

在本申请的一种可能实施方式中，获取模块210，还用于：展示游戏的虚拟形象编辑界面，所述虚拟形象编辑界面包括文本输入区域、形象展示区域和系统响应区域，基于用户在所述文本输入区域内的输入信息，获取所述描述文本，和/或，展示模块230，还用于：在所述形象展示区域内呈现根据所述编辑意图生成的虚拟形象，在所述系统响应区域内展示所述回复文本。In a possible implementation of the present application, the acquisition module 210 is also used to: display the avatar editing interface of the game. The avatar editing interface includes a text input area, an image display area and a system response area, based on where the user is. The input information in the text input area is obtained to obtain the description text, and/or the display module 230 is also used to: present the virtual image generated according to the editing intention in the image display area, and when the system responds The reply text is displayed in the area.

在本申请的一种可能实施方式中，所述虚拟形象包括虚拟角色的面部图像，所述角色形象参数包括捏脸参数。In a possible implementation manner of the present application, the virtual image includes a facial image of a virtual character, and the character image parameters include face pinching parameters.

基于上述装置，能够通过界面交互来生成虚拟形象，并及时对交互做出响应。Based on the above device, a virtual image can be generated through interface interaction and respond to the interaction in a timely manner.

请参阅图8，图8为本申请示例性实施例提供的电子设备的结构示意图。如图8所示，该电子设备300包括处理器310、存储器320和总线330。Please refer to FIG. 8 , which is a schematic structural diagram of an electronic device provided by an exemplary embodiment of the present application. As shown in FIG. 8 , the electronic device 300 includes a processor 310 , a memory 320 and a bus 330 .

所述存储器320存储有所述处理器310可执行的机器可读指令，当电子设备300运行时，所述处理器310与所述存储器320之间通过总线330通信，所述机器可读指令被所述处理器310执行时，可以执行如上述任一实施例中虚拟形象生成方法的步骤，具体如下：The memory 320 stores machine-readable instructions executable by the processor 310. When the electronic device 300 is running, the processor 310 and the memory 320 communicate through the bus 330, and the machine-readable instructions are When the processor 310 is executed, it may execute the steps of the virtual image generation method in any of the above embodiments, specifically as follows:

获取用于创建虚拟形象的描述文本；基于所获取的描述文本，追踪针对虚拟形象的编辑意图，并生成回复文本；展示所述回复文本以及根据所述编辑意图所生成的虚拟形象，所述回复文本用于指示所生成的虚拟形象针对所述描述文本所进行的编辑响应。Obtain the description text used to create the avatar; based on the obtained description text, track the editing intention for the avatar, and generate a reply text; display the reply text and the avatar generated according to the editing intention, and the reply The text is used to indicate the generated avatar's editing response to the description text.

基于上述电子设备，能够通过界面交互来生成虚拟形象，并及时对交互做出响应。Based on the above electronic devices, a virtual image can be generated through interface interaction and respond to the interaction in a timely manner.

本申请实施例还提供一种计算机可读存储介质，该存储介质上存储有计算机程序，该计算机程序被处理器运行时可以执行如上述任一实施例中虚拟形象生成方法的步骤，具体如下：Embodiments of the present application also provide a computer-readable storage medium. A computer program is stored on the storage medium. When the computer program is run by a processor, the computer program can perform the steps of the virtual image generation method in any of the above embodiments, specifically as follows:

基于上述计算机可读存储介质，能够通过界面交互来生成虚拟形象，并及时对交互做出响应。Based on the above computer-readable storage medium, a virtual image can be generated through interface interaction and respond to the interaction in a timely manner.

所属领域的技术人员可以清楚地了解到，为描述的方便和简洁，上述描述的系统和装置的具体工作过程，可以参考前述方法实施例中的对应过程，在此不再赘述。在本申请所提供的几个实施例中，应所述理解到，所揭露的系统、装置和方法，可以通过其它的方式实现。以上所描述的装置实施例仅仅是示意性的，例如，所述单元的划分，仅仅为一种逻辑功能划分，实际实现时可以有另外的划分方式，又例如，多个单元或组件可以结合或者可以集成到另一个系统，或一些特征可以忽略，或不执行。另一点，所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些通信接口，装置或单元的间接耦合或通信连接，可以是电性，机械或其它的形式。Those skilled in the art can clearly understand that for the convenience and simplicity of description, the specific working processes of the systems and devices described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be described again here. In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods can be implemented in other ways. The device embodiments described above are only illustrative. For example, the division of the units is only a logical function division. In actual implementation, there may be other division methods. For example, multiple units or components may be combined or can be integrated into another system, or some features can be ignored, or not implemented. On the other hand, the coupling or direct coupling or communication connection between each other shown or discussed may be through some communication interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.

所述作为分离部件说明的单元可以是或者也可以不是物理上分开的，作为单元显示的部件可以是或者也可以不是物理单元，即可以位于一个地方，或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

另外，在本申请各个实施例中的各功能单元可以集成在一个处理单元中，也可以是各个单元单独物理存在，也可以两个或两个以上单元集成在一个单元中。In addition, each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.

所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在一个处理器可执行的非易失的计算机可读取存储介质中。基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分或者所述技术方案的部分可以以软件产品的形式体现出来，所述计算机软件产品存储在一个存储介质中，包括若干指令用以使得一台计算机设备(可以是个人计算机，服务器，或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括：U盘、移动硬盘、只读存储器(Read-Only Memory，ROM)、随机存取存储器(Random Access Memory，RAM)、磁碟或者光盘等各种可以存储程序代码的介质。If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a non-volatile computer-readable storage medium that is executable by a processor. Based on this understanding, the technical solution of the present application is essentially or the part that contributes to the existing technology or the part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium , including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in various embodiments of this application. The aforementioned storage media include: U disk, mobile hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk or optical disk and other media that can store program code. .

以上仅为本申请的具体实施方式，但本申请的保护范围并不局限于此，任何熟悉本技术领域的技术人员在本申请揭露的技术范围内，可轻易想到变化或替换，都应涵盖在本申请的保护范围之内。因此，本申请的保护范围应以权利要求的保护范围为准。The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any person familiar with the technical field can easily think of changes or substitutions within the technical scope disclosed in the present application, and they should be covered by within the protection scope of this application. Therefore, the protection scope of this application should be subject to the protection scope of the claims.

Claims

1. A method for generating a virtual image, characterized in that the method includes:

Get the description text used to create the avatar;

Based on the obtained description text, track the editing intention for the avatar and generate a reply text;

The reply text and the avatar generated according to the editing intention are displayed, and the reply text is used to indicate the editing response of the generated avatar to the description text.

2. The method according to claim 1, wherein the step of tracking the editing intention for the avatar based on the obtained description text and generating a reply text includes:

Extracting an interactive editing memory used to create the avatar, the interactive editing memory being used to indicate an interaction history for the creation process of the avatar;

Based on the description text and the interactive editing memory, the editing intention for the avatar and the reply text are obtained.

3. The method according to claim 2, characterized in that the creation process includes at least one round of interaction, and in each round of interaction, the corresponding reply text and avatar are generated based on the description text of the round of interaction,

Wherein, the interactive editing memory includes historical dialogues, and the historical dialogues include at least one dialogue group, and each dialogue group includes the description text and its corresponding reply text in a round of interaction,

And/or, the interactive editing memory includes editing attributes and related editing strengths corresponding to the editing attributes. The editing attributes include preset character image parameters in an editing state among multiple character image parameters, and the related editing strengths are used to represent Preset the degree of correlation between the character image parameters and the description text.

4. The method according to claim 3, wherein the step of extracting the interactive editing memory used to create the avatar includes:

Extract historical dialogues from dialogue memory structures,

Wherein, the dialogue memory structure is updated in the following manner: whenever a round of interaction ends, the description text corresponding to the round of interaction and its corresponding reply text are added to the dialogue memory structure.

5. The method according to claim 1, characterized in that the avatar is generated according to the editing intention in the following manner:

Based on the candidate image parameters, obtain the rendered image of the game engine;

With the optimization goal of reducing the deviation between the editing intention and the rendered image, the candidate image parameters are optimized through multiple iterations to obtain optimized character image parameters;

According to the character image parameters, an avatar corresponding to the editing intention is generated.

6. The method according to claim 5, characterized in that the candidate image parameters are obtained in the following manner:

Obtain the initial latent variable of the current iteration, which represents the projection of the character image parameters in the low-dimensional space;

By decoding the initial latent variable, the candidate image parameters of the current iteration are obtained;

Wherein, the initial latent variable is the projection of the optimized character image parameters corresponding to the latest iteration in the low-dimensional space. In the first iteration of the first round of interaction, the initial latent variable is the character image parameter corresponding to the benchmark virtual image in A projection of the low-dimensional space, the baseline avatar includes an avatar that matches the description text of the first round of interaction.

7. The method according to claim 5, characterized in that the candidate image parameters are obtained in the following manner:

Predict the correlation between multiple character image parameters and the editing intention to obtain a correlation vector, where each element in the correlation vector is used to characterize the correlation strength between the corresponding character image parameter and the editing intention;

By decoding the initial latent variables, the initial image parameters are obtained, and the initial latent variables represent the projection of the character image parameters in the low-dimensional space;

Obtain candidate image parameters according to the initial image parameters and the correlation vector;

8. The method according to any one of claims 5-7, characterized in that the candidate image parameters are optimized in the following manner:

Determine the image encoding based on the rendered image of the game engine corresponding to the candidate image parameter of the current iteration;

Based on the editorial intent, determine text encoding;

Intent understanding loss is obtained based on the determined image encoding and text encoding;

Taking the intention understanding loss as the optimization target, the candidate image parameters are optimized.

9. The method according to claim 8, characterized in that the method further comprises:

Obtain multiple groups of samples from the avatar sample set, each group of samples includes multiple character image parameters;

Statistics of prior distribution statistical values corresponding to the multiple groups of samples;

Determine the prior loss based on the prior distribution statistical value;

Wherein, the multiple iterations use the intention understanding loss and the prior loss as optimization targets to optimize the candidate image parameters.

10. The method according to claim 8, characterized in that the method further comprises:

While obtaining the editing intention for the virtual image based on the description text, the relevant editing intensity is also obtained. The relevant editing intensity is used to represent the degree of association between the target character image parameters and the description text. The target character image parameters are based on Determined by the editorial intent;

Wherein, the candidate image parameters are optimized with the loss of intention understanding under the influence of the relevant editing intensity as the optimization target.

11. The method according to claim 10, characterized in that the editing intention, related editing intensity and reply text are obtained in the following manner:

The interactive editing memory used to create the avatar and the description text are input into a pre-trained large-scale language model to obtain the editing intention, related editing intensity and the reply text.

12. The method according to claim 10, characterized in that the interactive editing memory includes editing attributes and related editing intensity corresponding to the editing attributes,

Among them, the interactive editing memory used to create the avatar is extracted in the following ways:

Extract the editing attributes and the related editing intensity corresponding to the editing attributes from the editing status memory structure,

Wherein, the editing status memory structure is updated in the following manner:

Whenever a round of interaction ends, search the editing state memory structure to see if there is an editing intention corresponding to the round of interaction.

If it exists, update the relevant editing intensity corresponding to the searched editing intention.

If it does not exist, the editing intention and related editing intensity corresponding to this round of interaction are added to the editing state memory structure.

13. The method of claim 1, wherein the step of obtaining description text for creating an avatar includes:

Display the virtual image editing interface of the game, the virtual image editing interface includes a text input area, an image display area and a system response area,

Obtain the description text based on the user's input information in the text input area,

And/or, the step of displaying the reply text and the avatar generated according to the editing intention includes:

Present the virtual image generated according to the editing intention in the image display area,

Display the reply text in the system response area.

14. The method according to claim 5, wherein the virtual image includes a facial image of a virtual character, and the character image parameters include face pinching parameters.

15. A virtual image generating device, characterized in that the device includes:

Get the module to get the description text used to create the avatar;

The identification module, based on the obtained description text, tracks the editing intention for the avatar and generates a reply text;

A display module displays the reply text and the avatar generated according to the editing intention, and the reply text is used to indicate the editing response of the generated avatar to the description text.

16. An electronic device, characterized in that it includes: a processor, a storage medium and a bus, the storage medium stores machine-readable instructions executable by the processor, and when the electronic device is running, the processor and The storage media communicate with each other through a bus, and the processor executes the machine-readable instructions to perform the steps of the method according to any one of claims 1 to 14.

17. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is run by a processor, the steps of the method according to any one of claims 1 to 14 are executed. .