
CN114626335A - Character generation method, network training method, device, equipment and storage medium - Google Patents


Info

Publication number
CN114626335A
CN114626335A (application CN202210144287.XA)
Authority
CN
China
Prior art keywords
font style
sample
network
text
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210144287.XA
Other languages
Chinese (zh)
Other versions
CN114626335B (en)
Inventor
杨奕骁
陈宸
李宇聪
鞠奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210144287.XA priority Critical patent/CN114626335B/en
Publication of CN114626335A publication Critical patent/CN114626335A/en
Application granted granted Critical
Publication of CN114626335B publication Critical patent/CN114626335B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/103Formatting, i.e. changing of presentation of documents
    • G06F40/109Font handling; Temporal or kinetic typography
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/253Grammatical analysis; Style critique
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Document Processing Apparatus (AREA)
  • Controls And Circuits For Display Device (AREA)

Abstract

The application provides a character generation method, a network training method for character generation, an apparatus, a device and a storage medium, which can be applied to scenarios such as cloud technology, artificial intelligence, intelligent transportation and assisted driving. The character generation method includes: acquiring at least two candidate characters with different font style information from a character set to be processed; generating the font style information corresponding to the at least two candidate characters based on a font style coding network; and generating target characters corresponding to the font style information and character content information based on a character generation network. The font style coding network and the character generation network are obtained by performing character generation training on a preset neural network, and the latent space of the sample font style information used during that training is constrained to a normal distribution. According to the embodiments of the application, the generation quality of the target characters can be improved and the cost of generating them can be reduced.

Description

文字生成方法、网络训练方法、装置、设备及存储介质Character generation method, network training method, device, equipment and storage medium

技术领域technical field

本申请属于计算机技术领域,具体涉及一种文字生成方法、网络训练方法、装置、设备及存储介质。The present application belongs to the field of computer technology, and specifically relates to a character generation method, a network training method, an apparatus, a device and a storage medium.

背景技术Background technique

相关技术中,通常从某一种字体的文字子集中,学习到这种字体的风格表示,进而生成新字体对应的整套文字。In the related art, the stylistic representation of a font is usually learned from a subset of characters of a certain font, and then a whole set of characters corresponding to the new font is generated.

然而,相关技术需要预先设置这种新字体下的部分文字,然后将这些文字输入到文字生成模型中学习风格特征,之后从模型中得到其他文字结果,最终得到新字体对应的整套文字,文字生成的过程会消较多的系统资源,导致文字生成的成本较高。且相关技术中的文字生成模型容易出现笔画缺失或笔画粘连问题,导致文字生成的质量较低。However, related technologies need to pre-set some texts under this new font, and then input these texts into a text generation model to learn style features, and then obtain other text results from the model, and finally obtain a complete set of texts corresponding to the new fonts. This process will consume more system resources, resulting in higher cost of text generation. In addition, the text generation model in the related art is prone to the problem of missing strokes or adhesion of strokes, resulting in low quality of text generation.

发明内容SUMMARY OF THE INVENTION

为了解决上述技术问题,本申请提供一种文字生成方法、网络训练方法、装置、设备及存储介质。In order to solve the above technical problems, the present application provides a character generation method, a network training method, an apparatus, a device and a storage medium.

一方面,本申请提出了一种文字生成方法,所述方法包括:On the one hand, the present application proposes a method for generating text, the method comprising:

从待处理文字集中获取字体风格信息不同的至少两种候选文字;Obtain at least two candidate characters with different font style information from the character set to be processed;

基于字体风格编码网络生成所述至少两种候选文字各自对应的字体风格信息;Generate font style information corresponding to each of the at least two candidate characters based on a font style coding network;

基于文字生成网络,生成所述字体风格信息和文字内容信息对应的目标文字;所述文字内容信息表征所述待处理文字集中的文字的内容;Based on the text generation network, the target text corresponding to the font style information and the text content information is generated; the text content information represents the content of the text in the to-be-processed text set;

其中,所述字体风格编码网络和所述文字生成网络为对预设神经网络进行文字生成训练得到,所述文字生成训练过程中所使用到的样本字体风格信息的隐空间被约束为正态分布。The font style encoding network and the text generation network are obtained by performing text generation training on a preset neural network, and the latent space of the sample font style information used in the text generation training process is constrained to be a normal distribution .

另一方面,本申请提供了一种文字生成的网络训练方法,所述方法包括:On the other hand, the present application provides a network training method for text generation, the method comprising:

从样本文字集中提取第一样本文字和第二样本文字;Extract the first sample text and the second sample text from the sample text set;

基于所述第一样本文字的样本字体风格信息和所述第二样本文字的样本文字内容信息,对预设神经网络进行文字生成训练,在所述文字生成训练过程中,将所述样本字体风格信息的隐空间约束为正态分布,得到字体风格编码网络和文字生成网络。Based on the sample font style information of the first sample text and the sample text content information of the second sample text, text generation training is performed on a preset neural network, and during the text generation training process, the sample font The latent space constraint of style information is normal distribution, and the font style encoding network and text generation network are obtained.

另一方面,本申请实施例提供了一种文字生成装置,所述装置包括:On the other hand, an embodiment of the present application provides a character generation device, the device comprising:

文字获取模块,用于从待处理文字集中获取字体风格信息不同的至少两种候选文字;A text acquisition module, used for acquiring at least two candidate texts with different font style information from the to-be-processed text set;

字体风格信息生成模块,用于基于字体风格编码网络生成所述至少两种候选文字各自对应的字体风格信息;a font style information generation module, configured to generate font style information corresponding to each of the at least two candidate characters based on a font style coding network;

目标文字生成模块,用于基于文字生成网络,生成所述字体风格信息和文字内容信息对应的目标文字;所述文字内容信息表征所述待处理文字集中的文字的内容;A target text generation module, configured to generate a target text corresponding to the font style information and text content information based on a text generation network; the text content information represents the content of the text in the to-be-processed text set;

其中,所述字体风格编码网络和所述文字生成网络为对预设神经网络进行文字生成训练得到,所述文字生成训练过程中所使用到的样本字体风格信息的隐空间被约束为正态分布。The font style encoding network and the text generation network are obtained by performing text generation training on a preset neural network, and the latent space of the sample font style information used in the text generation training process is constrained to be a normal distribution .

另一方面,本申请提供了一种文字生成的网络训练装置,所述装置包括:On the other hand, the present application provides a network training device for text generation, the device comprising:

样本文字获取模块,用于从样本文字集中提取第一样本文字和第二样本文字;The sample text acquisition module is used to extract the first sample text and the second sample text from the sample text set;

训练模块,用于基于所述第一样本文字的样本字体风格信息和所述第二样本文字的样本文字内容信息,对预设神经网络进行文字生成训练,在所述文字生成训练过程中,将所述样本字体风格信息的隐空间约束为正态分布,得到字体风格编码网络和文字生成网络。A training module, configured to perform text generation training on a preset neural network based on the sample font style information of the first sample text and the sample text content information of the second sample text, and during the text generation training process, Constraining the latent space of the sample font style information to a normal distribution, a font style encoding network and a character generation network are obtained.

另一方面，本申请提出了一种电子设备，所述电子设备包括处理器和存储器，存储器中存储有至少一条指令或至少一段程序，至少一条指令或至少一段程序由处理器加载并执行以实现如上述所述的文字生成方法或文字生成的网络训练方法。On the other hand, the present application proposes an electronic device, which includes a processor and a memory; the memory stores at least one instruction or at least one program, and the at least one instruction or at least one program is loaded and executed by the processor to implement the character generation method or the network training method for character generation described above.

另一方面，本申请提出了一种计算机可读存储介质，所述计算机可读存储介质中存储有至少一条指令或至少一段程序，所述至少一条指令或所述至少一段程序由处理器加载并执行以实现如上述所述的文字生成方法或文字生成的网络训练方法。On the other hand, the present application proposes a computer-readable storage medium in which at least one instruction or at least one program is stored, and the at least one instruction or the at least one program is loaded and executed by a processor to implement the character generation method or the network training method for character generation described above.

另一方面，本申请提出了一种计算机程序产品，包括计算机程序，所述计算机程序被处理器执行时实现如上述所述的文字生成方法或文字生成的网络训练方法。On the other hand, the present application proposes a computer program product including a computer program; when the computer program is executed by a processor, the character generation method or the network training method for character generation described above is implemented.

本申请实施例提出的文字生成方法、网络训练方法、装置、设备及存储介质，使用训练好的字体风格编码网络生成该至少两种候选文字各自对应的字体风格信息，以及使用训练好的文字生成网络，生成该字体风格信息和文字内容信息对应的目标文字，由于样本字体风格信息的隐空间在训练过程中被约束为正态分布，从而压缩了字体风格信息，拉近了不同字体风格信息之间的间距，使得风格隐空间的变化更加平滑，避免网络遭遇间断点，从而提高了目标文字的生成质量；此外，使用训练好的字体风格编码网络和文字生成网络，还可以减少目标文字生成过程对系统资源的消耗，从而降低目标文字的生成成本。In the character generation method, network training method, apparatus, device and storage medium proposed in the embodiments of the present application, a trained font style coding network is used to generate the font style information corresponding to each of the at least two candidate characters, and a trained character generation network is used to generate the target characters corresponding to that font style information and the character content information. Because the latent space of the sample font style information is constrained to a normal distribution during training, the font style information is compressed and the distance between different pieces of font style information is reduced, so that the style latent space changes more smoothly and the network avoids discontinuities, which improves the generation quality of the target characters. In addition, using the trained font style coding network and character generation network also reduces the system resources consumed by the target character generation process, thereby reducing the cost of generating the target characters.

附图说明Description of drawings

为了更清楚地说明本申请实施例或现有技术中的技术方案和优点,下面将对实施例或现有技术描述中所需要使用的附图作简单的介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其它附图。In order to more clearly illustrate the technical solutions and advantages in the embodiments of the present application or in the prior art, the following briefly introduces the accompanying drawings used in the description of the embodiments or the prior art. Obviously, the accompanying drawings in the following description The drawings are only some embodiments of the present application. For those of ordinary skill in the art, other drawings can also be obtained based on these drawings without any creative effort.

图1是根据一示例性实施例示出的一种文字生成方法的实施环境示意图。FIG. 1 is a schematic diagram of an implementation environment of a text generation method according to an exemplary embodiment.

图2是根据一示例性实施例示出的一种文字生成方法的流程示意图。FIG. 2 is a schematic flowchart of a method for generating characters according to an exemplary embodiment.

图3是根据一示例性实施例示出的一种对风格和内容进行标准化处理的流程图。Fig. 3 is a flow chart showing a process of standardizing style and content according to an exemplary embodiment.

图4是根据一示例性实施例示出的一种文字生成的网络训练方法的流程图。Fig. 4 is a flow chart of a network training method for text generation according to an exemplary embodiment.

图5是根据一示例性实施例示出的一种预设神经网络示意图。Fig. 5 is a schematic diagram of a preset neural network according to an exemplary embodiment.

图6是根据一示例性实施例示出的一种训练得到的字体风格编码网络和文字生成网络的流程图。Fig. 6 is a flow chart of training to obtain a font style encoding network and a character generation network, according to an exemplary embodiment.

图7是根据一示例性实施例示出的一种得到样本字体风格信息的示意图。FIG. 7 is a schematic diagram of obtaining sample font style information according to an exemplary embodiment.

图8是根据一示例性实施例示出的一种对样本字体风格信息和样本文字内容信息进行标准化处理的示意图。FIG. 8 is a schematic diagram illustrating a method of standardizing sample font style information and sample text content information according to an exemplary embodiment.

图9是根据一示例性实施例示出的采用本申请实施例的文字生成方法生成的目标文字的示意图。FIG. 9 is a schematic diagram of a target text generated by using the text generation method according to an embodiment of the present application, according to an exemplary embodiment.

图10是根据一示例性实施例示出的融合效果对比图。FIG. 10 is a comparison diagram of fusion effects according to an exemplary embodiment.

图11是根据一示例性实施例示出的通过字体相似度检测模型对目标文字的字体进行检测,所得到的相似字体的示意图。FIG. 11 is a schematic diagram of a similar font obtained by detecting the font of a target text by a font similarity detection model according to an exemplary embodiment.

图12是根据一示例性实施例示出的一种文字生成装置。Fig. 12 shows a text generating apparatus according to an exemplary embodiment.

图13是根据一示例性实施例示出的一种文字生成的网络训练装置。Fig. 13 shows a network training apparatus for text generation according to an exemplary embodiment.

图14是本申请实施例提供的一种文字生成或文字生成的网络训练的服务器的硬件结构框图。FIG. 14 is a hardware structural block diagram of a server for character generation or character generation network training provided by an embodiment of the present application.

具体实施方式Detailed ways

人工智能(Artificial Intelligence,AI)是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。换句话说,人工智能是计算机科学的一个综合技术,它企图了解智能的实质,并生产出一种新的能以人类智能相似的方式做出反应的智能机器。人工智能也就是研究各种智能机器的设计原理与实现方法,使机器具有感知、推理与决策的功能。Artificial Intelligence (AI) is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can respond in a similar way to human intelligence. Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.

人工智能技术是一门综合学科,涉及领域广泛,既有硬件层面的技术也有软件层面的技术。人工智能基础技术一般包括如传感器、专用人工智能芯片、云计算、分布式存储、大数据处理技术、操作/交互系统、机电一体化等技术。人工智能软件技术主要包括计算机视觉技术、语音处理技术、自然语言处理技术以及机器学习/深度学习等几大方向。Artificial intelligence technology is a comprehensive discipline, involving a wide range of fields, including both hardware-level technology and software-level technology. The basic technologies of artificial intelligence generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.

具体地,本申请实施例涉及深度学习中的人工神经网络技术。Specifically, the embodiments of the present application relate to artificial neural network technology in deep learning.

为了使本技术领域的人员更好地理解本申请方案,下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分的实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都应当属于本申请保护的范围。In order to make those skilled in the art better understand the solutions of the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only The embodiments are part of the present application, but not all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by persons of ordinary skill in the art without creative work shall fall within the protection scope of the present application.

需要说明的是,本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的本申请的实施例能够以除了在这里图示或描述的那些以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、系统、产品或服务器不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。It should be noted that the terms "first", "second", etc. in the description and claims of the present application and the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific sequence or sequence. It is to be understood that data so used may be interchanged under appropriate circumstances so that the embodiments of the application described herein can be practiced in sequences other than those illustrated or described herein. Furthermore, the terms "comprising" and "having" and any variations thereof, are intended to cover non-exclusive inclusion, for example, a process, method, system, product or server comprising a series of steps or units is not necessarily limited to those expressly listed Rather, those steps or units may include other steps or units not expressly listed or inherent to these processes, methods, products or devices.

图1是根据一示例性实施例示出的一种文字生成方法的实施环境示意图。如图1所示,该实施环境至少可以包括终端01和服务器02,该终端01和服务器02之间可以通过有线或无线通信方式进行直接或间接地连接,本申请在此不做限制。FIG. 1 is a schematic diagram of an implementation environment of a text generation method according to an exemplary embodiment. As shown in FIG. 1 , the implementation environment may include at least a terminal 01 and a server 02 , and the terminal 01 and the server 02 may be directly or indirectly connected through wired or wireless communication, which is not limited in this application.

具体地,该终端可以用于采集待处理文字集和样本文字集。可选地,该终端01可以包括但不限于手机、电脑、智能语音交互设备、智能家电、车载终端、飞行器等。本申请实施例可应用于各种场景,包括但不限于云技术、人工智能、智慧交通、辅助驾驶等。Specifically, the terminal can be used to collect the text set to be processed and the sample text set. Optionally, the terminal 01 may include, but is not limited to, a mobile phone, a computer, an intelligent voice interaction device, a smart home appliance, a vehicle-mounted terminal, an aircraft, and the like. The embodiments of the present application may be applied to various scenarios, including but not limited to cloud technology, artificial intelligence, smart transportation, and assisted driving.

具体地,该服务器02可以用于训练字体风格编码网络和文字生成网络,并基于该字体风格编码网络和文字生成网络生成目标文字。可选地,该服务器02可以是独立的物理服务器,也可以是多个物理服务器构成的服务器集群或者分布式系统,还可以是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、CDN、以及大数据和人工智能平台等基础云计算服务的云服务器。Specifically, the server 02 can be used to train a font style encoding network and a character generation network, and generate target characters based on the font style encoding network and the character generation network. Optionally, the server 02 may be an independent physical server, a server cluster or a distributed system composed of multiple physical servers, or a cloud service, cloud database, cloud computing, cloud function, cloud storage, network Cloud servers for basic cloud computing services such as services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.

需要说明的是,图1仅仅是一种示例。在其他场景中,还可以包括其他实施环境,例如,该实施环境可以包括终端,通过终端训练得到字体风格编码网络和文字生成网络,并基于该字体风格编码网络和文字生成网络生成目标文字。It should be noted that FIG. 1 is only an example. In other scenarios, other implementation environments may also be included, for example, the implementation environment may include a terminal, a font style encoding network and a character generation network are obtained through terminal training, and target characters are generated based on the font style encoding network and the character generation network.

图2是根据一示例性实施例示出的一种文字生成方法的流程示意图。该方法可以用于图1中的实施环境中。本说明书提供了如实施例或流程图所述的方法操作步骤,但基于常规或者无创造性的劳动可以包括更多或者更少的操作步骤。实施例中列举的步骤顺序仅仅为众多步骤执行顺序中的一种方式,不代表唯一的执行顺序。在实际中的系统或服务器产品执行时,可以按照实施例或者附图所示的方法顺序执行或者并行执行(例如并行处理器或者多线程处理的环境)。具体的如图2所示,该方法可以包括:FIG. 2 is a schematic flowchart of a method for generating characters according to an exemplary embodiment. This method can be used in the implementation environment of FIG. 1 . This specification provides method operation steps as described in the embodiments or flow charts, but more or less operation steps may be included based on routine or non-creative work. The sequence of steps enumerated in the embodiments is only one of the execution sequences of many steps, and does not represent the only execution sequence. When an actual system or server product is executed, it can be executed sequentially or in parallel (for example, in a parallel processor or multi-threaded processing environment) according to the embodiments or the methods shown in the accompanying drawings. Specifically, as shown in Figure 2, the method may include:

S101.从待处理文字集中获取字体风格信息不同的至少两种候选文字。S101. Acquire at least two candidate characters with different font style information from the character set to be processed.

具体地,该字体风格信息可以包括但不限于:楷体、宋体、方正卡通、黑体、华文行楷、方正桃体等。可选地,该待处理文字集中可以包括各种字体风格信息的所有文字。例如包含了楷体的所有文字、宋体的所有文字、方正卡通的所有文字等。Specifically, the font style information may include, but is not limited to, italics, Songs, Fangzheng cartoon, Hei, Chinese Xingkai, Fangzheng peach, and the like. Optionally, the to-be-processed character set may include all characters of various font style information. For example, it includes all the characters in italics, all the characters in Song style, and all the characters in Fangzheng cartoons, etc.

本申请实施例中,可以采用多种方式从待处理文字集中获取字体风格信息不同的至少两种候选文字,在此不做具体限定。In this embodiment of the present application, at least two candidate characters with different font style information may be obtained from the character set to be processed in various ways, which are not specifically limited herein.

在一种方式中，可以将该待处理文字集中的文字按照字体风格信息进行分类，得到多个字体风格信息类别，每个字体风格信息类别均对应多个文字。可以从多个字体风格信息类别中确定出至少两个候选字体风格信息类别，并从该至少两个候选字体风格信息类别对应的文字中，分别抽取出一个文字，得到至少两种候选文字。例如，可以从多个字体风格信息类别中确定出楷体类别和宋体类别，楷体类别对应多个文字，宋体类别对应多个文字，从楷体类别对应的多个文字中抽取一个文字，并从宋体类别对应的多个文字抽取一个文字，得到至少两种候选文字。In one manner, the characters in the to-be-processed character set may be classified according to their font style information to obtain a plurality of font style information categories, each of which corresponds to a plurality of characters. At least two candidate font style information categories may be determined from the plurality of categories, and one character may be extracted from the characters corresponding to each of the at least two candidate categories, obtaining at least two candidate characters. For example, a Kai category and a Song category may be determined from the plurality of font style information categories, each corresponding to a plurality of characters; one character is extracted from the characters of the Kai category and one character from the characters of the Song category, giving at least two candidate characters.

在另一种方式中，还可以从该至少两个候选字体风格信息类别对应的文字中，分别抽取出预设数量个文字，得到至少两种候选文字。例如，可以从多个字体风格信息类别中确定出楷体类别和宋体类别，楷体类别对应多个文字，宋体类别对应多个文字，从楷体类别对应的多个文字抽取预设数量个文字，并从宋体类别对应的多个文字抽取预设数量个文字，得到至少两种候选文字。In another manner, a preset number of characters may be extracted from the characters corresponding to each of the at least two candidate font style information categories to obtain at least two kinds of candidate characters. For example, a Kai category and a Song category may be determined from the plurality of font style information categories, each corresponding to a plurality of characters; a preset number of characters are extracted from the characters of the Kai category and a preset number from the characters of the Song category, giving at least two kinds of candidate characters.
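
As a hedged illustration of the candidate-selection step described in the two paragraphs above, the following Python sketch draws characters from two style categories. The category names, the in-memory layout of the character set to be processed, and the sampling policy are assumptions made for this example, not details taken from the application.

```python
import random

# Hypothetical layout: the character set to be processed, grouped by font style.
pending_characters = {
    "kai":  ["的", "一", "是", "在", "不"],
    "song": ["的", "一", "是", "在", "不"],
}

def sample_candidates(styles, count=1):
    """Draw `count` candidate characters from each chosen style category."""
    return {style: random.sample(pending_characters[style], count) for style in styles}

candidates = sample_candidates(["kai", "song"], count=1)
print(candidates)  # e.g. {'kai': ['在'], 'song': ['不']}
```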

S103.基于字体风格编码网络生成上述至少两种候选文字各自对应的字体风格信息。S103. Generate font style information corresponding to each of the at least two candidate characters above based on a font style coding network.

S105.基于文字生成网络,生成上述字体风格信息和文字内容信息对应的目标文字;上述文字内容信息表征上述待处理文字集中的文字的内容。S105. Based on the text generation network, generate the target text corresponding to the font style information and the text content information; the text content information represents the content of the text in the to-be-processed text set.

其中,上述字体风格编码网络和上述文字生成网络为对预设神经网络进行文字生成训练得到,上述文字生成训练过程中所使用到的样本字体风格信息的隐空间被约束为正态分布。The font style encoding network and the text generation network are obtained by performing text generation training on a preset neural network, and the latent space of the sample font style information used in the text generation training process is constrained to be a normal distribution.

示例性地,该文字内容信息可以是文字内容的空间信息,例如,文字的笔画结构。Exemplarily, the text content information may be spatial information of the text content, for example, the stroke structure of the text.

示例性地,该隐空间(latent space)可以指的是隐变量(例如,噪声Z)所在的空间,隐空间约束可以理解为对隐空间的隐向量进行约束,通过在训练过程中对样本字体风格信息的隐空间进行约束,使得在文字生成过程中,能够生成质量和效果较好的文字。Exemplarily, the latent space can refer to the space where the latent variables (eg, noise Z) are located, and the latent space constraint can be understood as constraining the latent vector of the latent space, and by adjusting the sample fonts during the training process. The latent space of style information is constrained, so that in the process of text generation, text with better quality and effect can be generated.

在一个具体的实施例中，在上述步骤S103中，可以将至少两种候选文字，输入预先训练好的字体风格编码网络中，通过该字体风格编码网络分别对至少两种候选文字进行特征提取，得到至少两种候选文字各自对应的字体风格信息。例如，通过预先训练好的编码网络，提取出的至少两种候选文字各自对应的字体风格信息分别为楷体和宋体。In a specific embodiment, in the above step S103, the at least two candidate characters may be input into a pre-trained font style coding network, and the font style coding network performs feature extraction on each of the at least two candidate characters to obtain the font style information corresponding to each of them. For example, the font style information extracted for the at least two candidate characters by the pre-trained coding network may be Kai and Song respectively.

可选地,本申请实施例可以采用多种方式生成上述至少两种候选文字各自对应的字体风格信息,在此不做具体限定。Optionally, in the embodiment of the present application, the font style information corresponding to the at least two candidate characters above may be generated in various ways, which are not specifically limited herein.

在一种方式中,在上述步骤S103中,上述基于字体风格编码网络生成上述至少两种候选文字各自对应的字体风格信息,可以包括:基于上述字体风格编码网络将上述至少两种候选文字映射到上述正态分布中,得到上述至少两种候选文字各自对应的字体风格信息。In one way, in the above step S103, the generating the font style information corresponding to the at least two candidate characters based on the font style coding network may include: mapping the at least two candidate characters to the font style coding network based on the font style coding network. In the above normal distribution, font style information corresponding to each of the at least two candidate characters is obtained.

由于样本字体风格信息的隐空间在上述文字生成训练过程中被约束为正态分布，因此，在将至少两种候选文字输入到字体风格编码网络之后，可以由字体风格编码网络将该至少两种候选文字映射到该正态分布中，从该正态分布中提取出该至少两个候选文字各自对应的风格特征向量，实现至少两种候选文字各自对应的风格特征向量的提取，得到该至少两种候选文字各自对应的字体风格信息。由于样本字体风格信息的隐空间被约束为正态分布，从而在字体风格信息提取过程中，压缩了字体风格信息，拉近了至少两种候选文字各自对应的不同字体风格信息之间的间距，使得风格隐空间的变化更加平滑，避免网络遭遇间断点，从而提高了字体风格信息的生成质量，进而提高了目标文字的生成质量。Since the latent space of the sample font style information is constrained to a normal distribution during the above text generation training, after the at least two candidate characters are input into the font style coding network, the network can map them into this normal distribution and extract from it the style feature vector corresponding to each candidate character, thereby obtaining the font style information corresponding to each of the at least two candidate characters. Because the latent space is constrained to a normal distribution, the font style information is compressed during extraction and the distance between the different font style information of the candidate characters is reduced, so that the style latent space changes more smoothly and the network avoids discontinuities, which improves the quality of the generated font style information and, in turn, the quality of the generated target characters.
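
A minimal PyTorch-style sketch of such a font style coding network is given below. The convolutional backbone, the embedding size, and the use of a mean/log-variance head with reparameterized sampling are assumptions chosen to illustrate mapping candidate glyph images into a normal latent space; the patent does not fix these implementation details.

```python
import torch
import torch.nn as nn

class StyleEncoder(nn.Module):
    """Maps a glyph image to the parameters of a normal distribution over the
    style latent space (a simplified stand-in for the font style coding network)."""
    def __init__(self, d_style=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.to_mu = nn.Linear(64, d_style)
        self.to_logvar = nn.Linear(64, d_style)

    def forward(self, glyphs):                 # glyphs: (B, 1, H, W)
        h = self.backbone(glyphs)
        return self.to_mu(h), self.to_logvar(h)

encoder = StyleEncoder()
mu, logvar = encoder(torch.randn(2, 1, 64, 64))            # two candidate glyphs
style = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # sampled style vectors
```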

在一个具体的实施例中,在上述步骤S105中,可以从待处理文字集中提取出文字的内容,作为文字内容信息,并将该文字内容信息和字体风格信息输入到预先训练好的文字生成网络,生成目标文字。其中,文字生成网络可以包含两个分支:风格分支和内容分支,风格分支输入是字体风格编码网络得到的字体风格信息,而内容分支输入的是文字内容信息。In a specific embodiment, in the above step S105, the content of the text can be extracted from the text set to be processed as text content information, and the text content information and font style information can be input into the pre-trained text generation network , to generate the target text. Among them, the text generation network can include two branches: a style branch and a content branch. The input of the style branch is the font style information obtained by the font style encoding network, and the input of the content branch is the text content information.

在一种方式中,可以从待处理文字集中依次提取出每个文字的文字内容信息,并依次遍历每个文字的文字内容信息,得到每个文字的文字内容信息和字体风格信息对应的目标文字。In one method, the text content information of each character can be sequentially extracted from the text set to be processed, and the text content information of each character can be traversed in turn to obtain the target text corresponding to the text content information and font style information of each character. .

在另一种方式中,还可以从待处理文字集中提取出所有文字的文字内容信息,并行遍历所有文字的文字内容信息,得到所有文字的文字内容信息和字体风格信息对应的目标文字。In another way, the text content information of all characters can also be extracted from the to-be-processed character set, and the text content information of all the characters can be traversed in parallel to obtain the target text corresponding to the text content information of all the characters and the font style information.

在一个可选的实施例中,上述方法还可以包括:基于上述字体风格编码网络,计算上述字体风格信息的平均值,得到目标字体风格信息。In an optional embodiment, the above method may further include: calculating an average value of the above font style information based on the above font style coding network to obtain the target font style information.

相应地,在上述步骤S105中,上述基于文字生成网络,生成上述字体风格信息和文字内容信息对应的目标文字,可以包括:基于上述文字生成网络,生成上述目标字体风格信息和上述文字内容信息对应的上述目标文字。Correspondingly, in the above step S105, generating the target text corresponding to the font style information and the text content information based on the text generation network may include: generating the target font style information corresponding to the text content information based on the text generation network. of the above target text.

具体地，由于输入到字体风格编码网络中的是至少两种候选文字，对于每个候选文字，字体风格编码网络均会输出一个字体风格信息，因此，可以求得至少两种候选文字对应的字体风格信息的均值，得到目标字体风格信息，该目标字体风格信息可以理解为新的风格特征，即该目标文字风格特征与待处理文字集中的文字所对应的字体风格信息均不相同。示例性地，由于文字可以当做图像来处理，且图像的风格信息通常由均值和方差表示，上述求至少两种候选文字对应的字体风格信息的均值，可以理解为求至少两种候选文字对应的图像的均值。Specifically, since at least two candidate characters are input into the font style coding network and the network outputs one piece of font style information for each candidate character, the mean of the font style information corresponding to the at least two candidate characters can be computed to obtain the target font style information. The target font style information can be understood as a new style feature, that is, a style that differs from the font style information of every character in the character set to be processed. Illustratively, since characters can be treated as images and the style information of an image is usually represented by its mean and variance, computing the mean of the font style information of the at least two candidate characters can be understood as computing the mean over the corresponding images.

相应地,在得到该目标字体风格信息之后,可以将该目标字体风格信息和该文字内容信息输入到该文字生成网络,得到该文字内容信息和该目标字体风格信息对应的目标文字。本申请实施例中,通过计算至少两种候选文字各自对应的字体风格信息的平均值,可以得到新的风格特征(即目标字体风格信息),目标字体风格信息生成的准确率较高。将该准确率较高的目标字体风格信息和文字内容信息输入到文字生成网络,能够提高目标文字的生成质量;此外,通过文字生成网络即可生成该目标字体风格信息对应的整套新文字(即目标文字),无需预先设置某种新字体下的部分文字,可以降低目标文字生成过程对系统资源的消耗,从而降低目标文字的生成成本。Correspondingly, after obtaining the target font style information, the target font style information and the text content information can be input into the text generation network to obtain the target text corresponding to the text content information and the target font style information. In the embodiment of the present application, by calculating the average value of font style information corresponding to at least two candidate characters, new style features (ie, target font style information) can be obtained, and the accuracy of generating the target font style information is high. Inputting the target font style information and text content information with high accuracy into the text generation network can improve the generation quality of the target text; in addition, a whole set of new text corresponding to the target font style information can be generated through the text generation network (ie target text), there is no need to pre-set part of the text under a new font, which can reduce the consumption of system resources in the process of generating the target text, thereby reducing the cost of generating the target text.
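
Assuming the candidate styles have already been encoded into fixed-length vectors (for example by an encoder like the sketch above), the averaging step can be as simple as the following; the tensor shapes are illustrative assumptions.

```python
import torch

# One style vector per candidate glyph, shape (num_candidates, d_style).
candidate_styles = torch.randn(2, 128)

# The target style is taken here as the plain mean of the candidate embeddings,
# giving a "new" style that differs from each individual candidate style.
target_style = candidate_styles.mean(dim=0, keepdim=True)  # shape (1, d_style)
```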

图3是根据一示例性实施例示出的一种对风格和内容进行标准化处理的流程图。如图3所示,在一个可选的实施例中,上述方法还可以包括:Fig. 3 is a flow chart showing a process of standardizing style and content according to an exemplary embodiment. As shown in FIG. 3, in an optional embodiment, the above method may further include:

S201.解耦上述字体风格信息。S201. Decouple the above font style information.

S203.对解耦后的字体风格信息进行标准化处理,得到标准字体风格信息。S203. Standardize the decoupled font style information to obtain standard font style information.

S205.对上述文字内容信息进行标准化处理,得到标准文字内容信息。S205. Standardize the above text content information to obtain standard text content information.

相应地,在上述步骤S105中,上述基于文字生成网络,生成上述字体风格信息和文字内容信息对应的目标文字,可以包括:Correspondingly, in the above-mentioned step S105, the above-mentioned generation of the target text corresponding to the above-mentioned font style information and text content information based on the text generation network may include:

基于上述文字生成网络,生成上述标准字体风格信息和上述标准文字内容信息对应的上述目标文字。Based on the character generation network, the target character corresponding to the standard font style information and the standard character content information is generated.

在一个可选的实施例中,文字生成网络的结构可以为SPADE+AdaIN结构。其中,SPADE是一种将分割图转换为实景图的生成模型,其作为主干网络,可以更好地保留文字的结构信息。AdaIN为Adaptive Instance Normalization的缩写,中文名称为自适应实例归一化。In an optional embodiment, the structure of the text generation network may be a SPADE+AdaIN structure. Among them, SPADE is a generative model that converts segmentation graphs into reality graphs. As a backbone network, SPADE can better preserve the structural information of text. AdaIN is the abbreviation of Adaptive Instance Normalization, and the Chinese name is Adaptive Instance Normalization.

示例性地，在上述步骤S201-S203中，可以将字体风格信息输入到多层感知器（Multi-layer Perceptron，MLP）中进行解耦，并将解耦后的字体风格信息通过AdaIN的方式进行标准化处理，得到标准化处理后的标准字体风格信息，即将解耦后的风格以AdaIN的方式输入到网络的各层中，通过控制特征统计信息的方式控制生成图像的风格。其中，MLP是一种前向结构的人工神经网络。示例地，解耦可以指的是：对风格本身的解耦，比如说笔画粗细，笔锋等等。Exemplarily, in the above steps S201-S203, the font style information can be input into a multi-layer perceptron (MLP) for decoupling, and the decoupled font style information can be standardized by means of AdaIN to obtain the standard font style information; that is, the decoupled style is fed into each layer of the network in the AdaIN manner, and the style of the generated image is controlled by controlling the feature statistics. Here, an MLP is a feed-forward artificial neural network. For example, decoupling may refer to decoupling the style itself, such as stroke thickness, stroke endings, and so on.

本申请实施例中的字体风格信息在多层全连接层后被解耦,之后被送入网络的各层中,通过这种方式,文字生成网络可以从多种语义层级提取字体风格信息。The font style information in the embodiment of the present application is decoupled after multiple fully connected layers, and then sent to each layer of the network. In this way, the text generation network can extract font style information from various semantic levels.

在文字生成中，内容分支保留了文字的笔画结构。文字的笔画结构（即文字内容信息）是一个关键因素。示例性地，在上述步骤S205中，可以将文字的笔画结构以空间自适应实例归一化（Spatial Adaptive Instance Normalization，SpatialAdaIN）的形式进行标准化处理，得到标准文字内容信息，即将文字的笔画结构以SpatialAdaIN的形式插入到文字生成网络中，通过该方式更好地保留了文字的笔画结构（即保留了原始分割的空间信息），提高了目标文字的生成质量。In character generation, the content branch preserves the stroke structure of the characters, and this stroke structure (i.e., the character content information) is a key factor. Exemplarily, in the above step S205, the stroke structure of the characters can be standardized in the form of spatial adaptive instance normalization (SpatialAdaIN) to obtain the standard character content information; that is, the stroke structure is inserted into the character generation network in the SpatialAdaIN form. In this way the stroke structure of the characters (i.e., the spatial information of the original segmentation) is better preserved, which improves the generation quality of the target characters.
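
The following sketch shows one common way to realize the AdaIN modulation described above in PyTorch: an MLP decouples the style vector into per-channel scale and shift parameters, which then re-style the normalized content features. The layer sizes and the exact normalization are assumptions for illustration and are not taken from the application.

```python
import torch
import torch.nn as nn

def adain(content_feat, gamma, beta, eps=1e-5):
    """Adaptive instance normalization: normalize per-channel statistics of the
    content features, then rescale with style-derived gamma/beta."""
    mean = content_feat.mean(dim=(2, 3), keepdim=True)
    std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    normalized = (content_feat - mean) / std
    return gamma[..., None, None] * normalized + beta[..., None, None]

# A style vector decoupled by an MLP into per-layer (gamma, beta) pairs.
d_style, channels = 128, 64
mlp = nn.Sequential(nn.Linear(d_style, 256), nn.ReLU(),
                    nn.Linear(256, 2 * channels))

style = torch.randn(1, d_style)
gamma, beta = mlp(style).chunk(2, dim=1)
features = torch.randn(1, channels, 32, 32)   # content-branch feature map
out = adain(features, gamma, beta)
```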

图4是根据一示例性实施例示出的一种文字生成的网络训练方法的流程图。如图4所示,该方法可以包括:Fig. 4 is a flow chart of a network training method for text generation according to an exemplary embodiment. As shown in Figure 4, the method may include:

S301.从样本文字集中提取第一样本文字和第二样本文字。S301. Extract the first sample text and the second sample text from the sample text set.

可选地,该第一样本文字可以为样本文字集或单个文字,该第二样本文字也可以为样本文字集或单个文字。Optionally, the first sample text may be a sample text set or a single text, and the second sample text may also be a sample text set or a single text.

示例性地，对于样本文字集中的任意文字xij，其包含风格属性Si∈S和内容属性Cj∈C。为了表示一种风格，可以从样本文字集中风格为Si的字体中，取出K个文字，构成此风格的参考集合RS，即第一样本文字。其中，ds是风格嵌入的维度，K为大于或等于1的正整数。Illustratively, any character xij in the sample character set has a style attribute Si∈S and a content attribute Cj∈C. To represent a style, K characters of the font whose style is Si are taken from the sample character set to form the reference set RS of this style, i.e., the first sample characters, where ds is the dimension of the style embedding and K is a positive integer greater than or equal to 1.

示例性地，为了表示内容属性Cj，可以从样本文字集中取出一些文字，构成内容参考集合，即第二样本文字。Exemplarily, in order to represent the content attribute Cj, some characters can be taken from the sample character set to form the content reference set, i.e., the second sample characters.

如果选取不同风格、同一内容的文字作为内容参考集合，会增大网络学习内容的难度，这是因为不同风格的文字在位置和笔画形态上有很大的差异，网络很难从中抽象出正确的笔画结构。因此，在一个示例性的实施方式中，为了降低网络学习内容的难度，可以从样本文字集中选取字体风格信息为宋体的样本文字作为第二样本文字，由于宋体是一种标准字体，有利于模型学习字体结构。更进一步地，为了进一步降低网络学习内容的难度，还可以从样本文字集中选取单个宋体文字（即内容参考集合中的文字），作为第二样本文字。If characters with different styles but the same content were chosen as the content reference set, it would be harder for the network to learn the content, because characters of different styles differ greatly in position and stroke shape and the network can hardly abstract the correct stroke structure from them. Therefore, in an exemplary embodiment, to make the content easier to learn, sample characters whose font style is Song can be selected from the sample character set as the second sample characters; since Song is a standard typeface, it helps the model learn character structure. Furthermore, to reduce the difficulty even more, a single Song-typeface character (i.e., a character from the content reference set) can be selected from the sample character set as the second sample character.

S303.基于上述第一样本文字的样本字体风格信息和上述第二样本文字的样本文字内容信息,对预设神经网络进行文字生成训练,在上述文字生成训练过程中,将上述样本字体风格信息的隐空间约束为正态分布,得到字体风格编码网络和文字生成网络。S303. Based on the sample font style information of the above-mentioned first sample text and the sample text content information of the above-mentioned second sample text, perform text generation training on a preset neural network, and in the above-mentioned text generation training process, the above-mentioned sample font style information The latent space constraint is normal distribution, and the font style encoding network and the text generation network are obtained.

本申请实施例中,可以将第一样本文字的样本字体风格信息和上述第二样本文字的样本文字内容信息输入预设神经网络进行文字生成训练,并在上述文字生成训练过程中,将上述样本字体风格信息的隐空间约束为正态分布,从而得到字体风格编码网络和文字生成网络。由于在训练过程中,样本字体风格信息的隐空间被约束为正态分布,从而压缩了字体风格信息,拉近了不同字体风格信息之间的间距,使得风格隐空间的变化更加平滑,避免网络遭遇间断点,提高字体风格编码网络和文字生成网络的训练精度;此外,基于第一样本文字的样本字体风格信息和第二样本文字的样本文字内容信息,对预设神经网络进行文字生成训练,即可得到字体风格编码网络和文字生成网络,训练过程中对系统资源的消耗较少,从而降低网络的训练难度和成本。In the embodiment of the present application, the sample font style information of the first sample text and the sample text content information of the second sample text may be input into a preset neural network for text generation training, and in the above text generation training process, the above The latent space constraint of the sample font style information is normal distribution, so as to obtain the font style encoding network and the text generation network. During the training process, the latent space of the sample font style information is constrained to a normal distribution, which compresses the font style information, narrows the distance between different font style information, makes the change of the style latent space smoother, and avoids the need for network When encountering discontinuities, the training accuracy of the font style encoding network and the text generation network is improved; in addition, based on the sample font style information of the first sample text and the sample text content information of the second sample text, the preset neural network is trained for text generation , the font style encoding network and the text generation network can be obtained, and the consumption of system resources during the training process is less, thereby reducing the training difficulty and cost of the network.

图5是根据一示例性实施例示出的一种预设神经网络示意图。如图5所示，该预设神经网络可以包括预设字体风格编码网络、预设文字生成网络和预设判别网络，该预设神经网络整体上是一个生成对抗网络（Generative adversarial nets，GAN）的结构。此外，预设神经网络使用SpatialAdaIN和AdaIN的方式，分别向生成器的各层输入样本文字内容信息的空间信息和不同层次的样本字体风格信息。Fig. 5 is a schematic diagram of a preset neural network according to an exemplary embodiment. As shown in Fig. 5, the preset neural network may include a preset font style coding network, a preset character generation network and a preset discrimination network, and as a whole it has the structure of a generative adversarial network (GAN). In addition, the preset neural network uses SpatialAdaIN and AdaIN to feed, respectively, the spatial information of the sample character content and sample font style information at different levels into the layers of the generator.

以下,对使用图5中的预设神经网络训练字体风格编码网络和文字生成网络的过程进行说明:The following describes the process of using the preset neural network in Figure 5 to train the font style encoding network and the text generation network:

假设，第一样本文字为“方正卡通”的“公，…，们，政”，其用于表示“方正卡通”这种字体风格信息，第二样本文字为宋体“的”，其用于表示“的”这种文字内容，将第一样本文字输入预设风格编码网络，得到样本字体风格信息，将样本字体风格信息以AdaIN的方式输入预设文字生成网络，并将第二样本文字的样本文字内容信息以SpatialAdaIN的形式插入预设文字生成网络，由预设文字生成网络输出“方正卡通”下的文字“的”，通过预设判别器网络判别该训练输出的文字与样本文字集中的“方正卡通”下的文字“的”（即参考文字），得到损失信息，在训练过程中不断调整网络的参数，直至损失信息满足预设条件时停止训练过程，得到训练好的字体风格编码网络和文字生成网络。Suppose the first sample characters are "公, …, 们, 政" in the "Founder Cartoon" font, used to represent the "Founder Cartoon" font style information, and the second sample character is "的" in the Song typeface, used to represent the character content "的". The first sample characters are input into the preset style coding network to obtain the sample font style information; the sample font style information is fed into the preset character generation network in the AdaIN manner, and the sample character content information of the second sample character is inserted into the preset character generation network in the SpatialAdaIN form. The preset character generation network then outputs the character "的" in the "Founder Cartoon" style, which the preset discriminator network compares with the character "的" under "Founder Cartoon" in the sample character set (the reference character) to obtain the loss information. The network parameters are adjusted continuously during training until the loss information satisfies a preset condition, at which point training stops and the trained font style coding network and character generation network are obtained.
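
A condensed, hypothetical training step in PyTorch is sketched below to make this loop concrete: the encoder produces a style latent that is pulled toward N(0, I), the generator combines it with a content image, and a discriminator provides the adversarial signal. The specific loss forms (non-saturating adversarial loss, closed-form KL term) and the weight kl_weight are assumptions; the application only states that a loss is computed and the parameters are adjusted until a preset condition is met.

```python
import torch
import torch.nn.functional as F

def train_step(style_enc, generator, discriminator, opt_g, opt_d,
               style_refs, content_img, real_target, kl_weight=0.01):
    """One simplified adversarial step; opt_g is assumed to cover the encoder
    and generator parameters, opt_d the discriminator parameters."""
    mu, logvar = style_enc(style_refs)                      # style reference glyphs
    z_style = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
    fake = generator(z_style.mean(0, keepdim=True), content_img)

    # Discriminator update: real reference glyph scored up, generated glyph down.
    d_loss = F.softplus(-discriminator(real_target)).mean() + \
             F.softplus(discriminator(fake.detach())).mean()
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator/encoder update with a KL constraint pulling the style latent to N(0, I).
    kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=1).mean()
    g_loss = F.softplus(-discriminator(fake)).mean() + kl_weight * kl
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```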

图6是根据一示例性实施例示出的一种训练得到的字体风格编码网络和文字生成网络的流程图。如图6所示，在上述S303中，上述基于上述第一样本文字的样本字体风格信息和上述第二样本文字的样本文字内容信息，对预设神经网络进行文字生成训练，在上述文字生成训练过程中，将上述样本字体风格信息的隐空间约束为正态分布，得到字体风格编码网络和文字生成网络，可以包括：Fig. 6 is a flow chart of training to obtain a font style encoding network and a character generation network, according to an exemplary embodiment. As shown in Fig. 6, in the above S303, performing character generation training on the preset neural network based on the sample font style information of the first sample characters and the sample character content information of the second sample characters, and constraining the latent space of the sample font style information to a normal distribution during that training to obtain the font style encoding network and the character generation network, may include:

S3031.基于上述预设字体风格编码网络将上述第一样本文字映射到最新正态分布中，得到当前样本字体风格信息。S3031. Map the first sample characters into the latest normal distribution based on the preset font style coding network to obtain the current sample font style information.

可选地，在上述步骤S3031中，该预设字体风格编码网络可以为前向传播网络，即前向传播网络将第一样本文字（即风格参考集合RS）作为输入，输出当前样本字体风格信息（即风格嵌入向量ZS）。Optionally, in the above step S3031, the preset font style coding network may be a forward propagation network, which takes the first sample characters (i.e., the style reference set RS) as input and outputs the current sample font style information (i.e., the style embedding vector ZS).

示例性地，该风格嵌入向量ZS可以是随机采样得到的。该前向传播网络可以将风格参考集合RS映射到最新正态分布中，输出两个向量μs∈R^ds和σs∈R^ds，上述两个向量表示多元正态分布N(μs,σs)的参数μs和σs。Exemplarily, the style embedding vector ZS may be obtained by random sampling. The forward propagation network maps the style reference set RS into the latest normal distribution and outputs two vectors μs∈R^ds and σs∈R^ds, which are the parameters μs and σs of the multivariate normal distribution N(μs, σs).

S3033.基于上述预设字体风格编码网络对标准正态分布和上述当前样本字体风格信息进行处理,得到上述样本字体风格信息。S3033. Process the standard normal distribution and the current sample font style information based on the preset font style coding network to obtain the sample font style information.

本申请实施例中,在上述步骤S3033中,预设字体风格编码网络可以使得当前样本字体风格信息向多元标准正态分布看齐,从而得到该样本字体风格信息。In the embodiment of the present application, in the above step S3033, the preset font style coding network can make the current sample font style information align with the multivariate standard normal distribution, so as to obtain the sample font style information.
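
In code, drawing the style embedding from N(μs, σs) and aligning it with the standard normal typically relies on the reparameterization trick, so that the sampling step stays differentiable. The sketch below is a generic illustration of that step, with the embedding size chosen arbitrarily; it is not the application's exact procedure.

```python
import torch

d_style = 128                                       # illustrative embedding size
mu_s = torch.zeros(d_style, requires_grad=True)     # encoder output: mean
sigma_s = torch.ones(d_style, requires_grad=True)   # encoder output: std deviation

eps = torch.randn(d_style)      # eps ~ N(0, I)
z_s = mu_s + sigma_s * eps      # z_s ~ N(mu_s, sigma_s^2); gradients reach mu_s, sigma_s
```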

图7是根据一示例性实施例示出的一种得到样本字体风格信息的示意图。如图7所示,在上述步骤S3033中,上述基于上述预设字体风格编码网络对标准正态分布和上述当前样本字体风格信息进行处理,得到上述样本字体风格信息,可以包括:FIG. 7 is a schematic diagram of obtaining sample font style information according to an exemplary embodiment. As shown in FIG. 7, in the above step S3033, the above-mentioned standard normal distribution and the above-mentioned current sample font style information are processed based on the above-mentioned preset font style coding network to obtain the above-mentioned sample font style information, which may include:

S30331.基于上述预设字体风格编码网络,从上述标准正态分布中随机获取与上述当前样本字体风格信息对应的特征向量。S30331. Based on the above-mentioned preset font style coding network, randomly obtain a feature vector corresponding to the above-mentioned current sample font style information from the above-mentioned standard normal distribution.

S30333.基于上述预设字体风格编码网络，通过上述特征向量和上述当前样本字体风格信息之间的差异信息，对上述最新正态分布进行更新，将更新后的最新正态分布重新作为上述最新正态分布。S30333. Based on the preset font style coding network, update the latest normal distribution according to the difference information between the feature vector and the current sample font style information, and take the updated distribution as the new latest normal distribution.

S30335.基于上述预设字体风格编码网络，在上述将上述第一样本文字映射到最新正态分布中，得到当前样本字体风格信息，和上述将更新后的最新正态分布重新作为上述最新正态分布之间重复，直至上述差异信息满足预设条件时停止。S30335. Based on the preset font style coding network, repeat the steps from mapping the first sample characters into the latest normal distribution to obtain the current sample font style information, through taking the updated distribution as the new latest normal distribution, until the difference information satisfies the preset condition.

S30337.基于上述预设字体风格编码网络将上述差异信息满足预设条件时的当前样本字体风格信息,作为上述样本字体风格信息。S30337. Based on the above-mentioned preset font style coding network, use the current sample font style information when the above-mentioned difference information satisfies the preset condition as the above-mentioned sample font style information.

示例性地，在上述步骤S30331中，在训练时，为了使得当前样本字体风格信息（即风格嵌入向量ZS）向多元标准正态分布看齐，预设字体风格编码网络可以从标准正态分布中随机确定一个与该当前样本字体风格信息对应的特征向量。在上述步骤S30333-S30337中，该预设编码网络可以计算该特征向量和上述当前样本字体风格信息之间的差异信息，并将该差异信息作为分布损失值，基于该分布损失值可以对预设字体风格编码网络中的最新正态分布进行调整更新，以使最新正态分布被不断调整为能够用于得到质量较高的样本字体风格信息，即最新正态分布被不断调整使得差异信息满足预设条件，并将差异信息满足预设条件时的当前样本字体风格信息，作为该样本字体风格信息。Exemplarily, in the above step S30331, during training, in order to make the current sample font style information (i.e., the style embedding vector ZS) align with the multivariate standard normal distribution, the preset font style coding network may randomly determine, from the standard normal distribution, a feature vector corresponding to the current sample font style information. In the above steps S30333-S30337, the preset coding network can compute the difference information between this feature vector and the current sample font style information and use it as a distribution loss value; based on this loss value, the latest normal distribution in the preset font style coding network is adjusted and updated so that it is continuously tuned to yield higher-quality sample font style information, that is, until the difference information satisfies the preset condition, at which point the current sample font style information is taken as the sample font style information.

在一种方式中，可以对预设风格编码器施加如下约束：In one approach, the following constraint can be imposed on the preset style encoder:

Lkl = KL( N(μs, σs) || N(0, I) )

其中,N表示多元正态分布,KL指KL散度,KL散度是一种衡量两个概率分布的匹配程度的指标,两个分布差异越大,KL散度越大。Among them, N represents multivariate normal distribution, KL refers to KL divergence, and KL divergence is an indicator to measure the degree of matching of two probability distributions. The greater the difference between the two distributions, the greater the KL divergence.

相应地，在上述步骤S30333中，可以计算特征向量和上述当前样本字体风格信息之间的KL散度，并基于该KL散度对预设字体风格编码网络中的最新正态分布进行调整更新，以使最新正态分布被不断调整为能够用于得到质量较高的样本字体风格信息，即最新正态分布被不断调整使得KL散度小于预设散度阈值，并将KL散度小于预设散度阈值时的当前样本字体风格信息，作为该样本字体风格信息。Correspondingly, in the above step S30333, the KL divergence between the feature vector and the current sample font style information can be calculated, and the latest normal distribution in the preset font style coding network is adjusted and updated based on this KL divergence, so that it is continuously tuned to yield higher-quality sample font style information; that is, the distribution is adjusted until the KL divergence is smaller than a preset divergence threshold, and the current sample font style information at that point is taken as the sample font style information.
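
For two diagonal Gaussians the KL divergence mentioned above has a closed form, so the constraint can be computed directly from the encoder outputs. The snippet below shows that standard closed form; the batch shapes are illustrative assumptions.

```python
import torch

def kl_to_standard_normal(mu, sigma, eps=1e-8):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over dimensions."""
    var = sigma.pow(2)
    return 0.5 * (var + mu.pow(2) - 1.0 - (var + eps).log()).sum(dim=-1)

mu = torch.randn(4, 128)            # batch of style means
sigma = torch.rand(4, 128) + 0.1    # batch of style standard deviations
loss_kl = kl_to_standard_normal(mu, sigma).mean()
```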

在另一种方式中,在上述步骤S30333中,还可以利用最大平均差异算法,计算特征向量和上述当前样本字体风格信息之间的差异信息。其中,最大平均差异算法用于衡量两个分布之间的差异。In another way, in the above step S30333, the maximum average difference algorithm may also be used to calculate the difference information between the feature vector and the above-mentioned current sample font style information. Among them, the Maximum Mean Difference algorithm is used to measure the difference between two distributions.

本申请实施例中，通过对预设字体风格编码网络施加约束，使得当前样本字体风格信息（风格嵌入向量ZS）从多元标准正态分布中采样，因此风格嵌入向量ZS的各个维度都在0均值附近，而不会在整个R^ds空间里任意取值，从而压缩了风格嵌入ZS的空间，拉近了不同风格之间的间距，使得风格隐空间的变化更加平滑，避免网络遭遇间断点，从而提高了字体风格编码网络的训练精度。In the embodiment of the present application, by imposing this constraint on the preset font style coding network, the current sample font style information (the style embedding vector ZS) is sampled from the multivariate standard normal distribution, so every dimension of ZS stays near zero mean instead of taking arbitrary values anywhere in the R^ds space. This compresses the space of the style embedding ZS, reduces the distance between different styles, makes the style latent space change more smoothly and keeps the network away from discontinuities, thereby improving the training accuracy of the font style coding network.

S3035.基于上述预设文字生成网络,生成上述样本字体风格信息和上述样本文字内容信息对应的当前文字。S3035. Based on the preset text generation network, generate the current text corresponding to the sample font style information and the sample text content information.

本申请实施例中，给定样本字体风格信息（风格Si）和样本文字内容（内容Cj），预设文字生成网络的目标是生成对应风格和内容的文字xij。预设文字生成网络的输入包括两个分支：内容分支和风格分支。风格分支的输入是预设字体风格编码网络得到的样本字体风格信息（即风格嵌入ZS），而内容分支的输入是内容文字图像，预设文字生成网络据此生成对应的当前文字。In the embodiment of the present application, given sample font style information (style S_i) and sample text content (content C_j), the goal of the preset text generation network is to generate the text x_ij corresponding to that style and content. The input of the preset text generation network includes two branches: a content branch and a style branch. The input of the style branch is the sample font style information (that is, the style embedding Z_S) obtained by the preset font style encoding network, while the input of the content branch is the content text image; from these two inputs the preset text generation network generates the corresponding current text.
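
To make the two-branch input interface concrete, a minimal, purely illustrative generator skeleton is sketched below; the module names, layer counts and channel sizes are assumptions of this illustration, and only the idea that the generator consumes a content text image together with a style embedding is taken from the description above.

```python
import torch
import torch.nn as nn

class TwoBranchGenerator(nn.Module):
    def __init__(self, style_dim: int = 128, base_ch: int = 64):
        super().__init__()
        # Content branch: encodes the content text image (stroke structure).
        self.content_enc = nn.Sequential(
            nn.Conv2d(1, base_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base_ch, base_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Style branch: maps the style embedding Z_S to modulation parameters.
        self.style_mlp = nn.Sequential(
            nn.Linear(style_dim, 2 * base_ch), nn.ReLU(inplace=True),
            nn.Linear(2 * base_ch, 2 * base_ch),
        )
        self.to_img = nn.Conv2d(base_ch, 1, 3, padding=1)

    def forward(self, content_img: torch.Tensor, style_emb: torch.Tensor) -> torch.Tensor:
        h = self.content_enc(content_img)                    # (N, C, H, W)
        gamma, beta = self.style_mlp(style_emb).chunk(2, dim=1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)            # broadcast over H, W
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        h = gamma * h + beta                                 # simple style modulation
        return torch.tanh(self.to_img(h))                    # generated text image
```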

S3037.基于上述预设判别网络,对上述当前文字、参考文字、上述样本字体风格信息和上述样本文字内容信息进行判别处理,得到损失信息。S3037. Based on the preset discrimination network, perform discrimination processing on the current text, the reference text, the sample font style information, and the sample text content information to obtain loss information.

S3039.基于上述损失信息训练上述预设字体风格编码网络和上述预设文字生成网络，得到上述字体风格编码网络和上述文字生成网络；上述参考文字表征上述样本文字内容信息在上述样本字体风格信息下的文字。S3039. Train the preset font style encoding network and the preset text generation network based on the loss information to obtain the font style encoding network and the text generation network; the reference text represents the text of the sample text content information under the sample font style information.

预设判别网络是GAN的一个重要组成部分，它的目标是判别给定的当前文字是不是足够真实，如果是真实的，则预设判别网络要为当前文字打高分，否则打低分。在训练时，预设判别网络通常认为预设文字生成网络生成的当前文字不够真实，而只有样本文字集中的参考字体是真实的。这样，预设文字生成网络为了骗过预设判别网络，只能生成更加真实的图像。预设文字生成网络和预设判别网络在这样的零和博弈中不断提高自己的能力，最终，预设文字生成网络生成的当前文字接近参考文字，这样就可以训练得到一个质量较高的文字生成网络。The preset discrimination network is an important part of the GAN. Its goal is to judge whether a given current text is real enough: if it is, the preset discrimination network should give the current text a high score, otherwise a low score. During training, the preset discrimination network usually considers the current text generated by the preset text generation network not real enough, while only the reference fonts in the sample text set are real. To fool the preset discrimination network, the preset text generation network therefore has to generate more realistic images. The preset text generation network and the preset discrimination network continuously improve in this zero-sum game; in the end, the current text generated by the preset text generation network comes close to the reference text, so that a high-quality text generation network can be trained.

预设文字生成网络的目的是骗过预设判别网络，即预设文字生成网络没有直接的监督信息，而是依靠预设判别网络提供监督信息。在一个可选的实施例中，预设判别网络可以使用2个结构相同的预设判别网络D1,D2，判别不同尺寸的图像，小尺寸的图像可以使得预设判别网络拥有更大的感受野，而大尺寸的图像使得预设判别网络更关注细节，也能在一定程度上避免过拟合的问题。The purpose of the preset text generation network is to fool the preset discrimination network; that is, the preset text generation network has no direct supervision information and relies on the preset discrimination network to provide it. In an optional embodiment, two preset discrimination networks D_1 and D_2 with the same structure can be used to discriminate images of different sizes: small-size images give the preset discrimination network a larger receptive field, while large-size images make it pay more attention to details, which also avoids overfitting to a certain extent.
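
One purely illustrative way such a two-scale discrimination step could look is sketched below; it assumes two discriminators of identical structure, one of which scores a 2x-downsampled copy of the image. The exact scales and the discriminator interface are assumptions of this illustration.

```python
import torch.nn.functional as F

def multiscale_d_outputs(d1, d2, img):
    """Run two same-structure discriminators on full- and half-resolution inputs."""
    out_full = d1(img)                              # full size: focuses on fine details
    img_small = F.avg_pool2d(img, kernel_size=2)    # 2x downsampled copy
    out_small = d2(img_small)                       # relatively larger receptive field
    return [out_full, out_small]
```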

在一个具体的实施例中，对于每一个判别网络，可以使用铰链损失函数（hinge loss）作为GAN的损失函数：In a specific embodiment, for each discrimination network, a hinge loss function (hinge loss) can be used as the loss function of the GAN:

L_{D_k} = E_x[ max(0, 1 - D_k(x)) ] + E[ max(0, 1 + D_k(G(Z_S, C))) ]

L_{G,E} = - Σ_k E[ D_k(G(Z_S, C)) ]

式中，x为参考文字（真实文字），G(Z_S, C)为预设文字生成网络生成的当前文字。Here x is the real reference text and G(Z_S, C) is the current text generated by the preset text generation network.

其中,G、D、E分别指的是预设文字生成网络、预设判别网络和预设字体风格编码网络,k指的是预设判别网络的数量。Among them, G, D, and E refer to the preset text generation network, preset discrimination network and preset font style encoding network respectively, and k refers to the number of preset discrimination networks.
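
A sketch of how this hinge loss could be computed for one discriminator is given below; it assumes the discriminator returns a raw (unbounded) score map or scalar, which is an assumption of the illustration.

```python
import torch
import torch.nn.functional as F

def d_hinge_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    # Discriminator side: push scores of real text above 1 and of fake text below -1.
    return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()

def g_hinge_loss(d_fake: torch.Tensor) -> torch.Tensor:
    # Generator (and style encoder) side: make the fake score as high as possible.
    return -d_fake.mean()
```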

在一个示例性的实施例中，为了让预设判别网络能够区分字体的内容和风格，提高网络质量，在S3037中，除了当前文字和参考文字之外，预设判别网络的输入还可以包括样本字体风格信息和样本文字内容信息。In an exemplary embodiment, in order to enable the preset discrimination network to distinguish the content and style of fonts and to improve the network quality, in S3037, in addition to the current text and the reference text, the input of the preset discrimination network may also include the sample font style information and the sample text content information.

在另一个示例性的实施例中，为了稳定训练过程，还可以使用特征匹配损失（feature matching loss），对齐当前文字和参考文字在预设判别网络各层中的特征（feature）。令D_k^{(t)}(x)表示输入文字x在第k个判别器第t层的feature，则feature matching loss可表示为：In another exemplary embodiment, in order to stabilize the training process, a feature matching loss can also be used to align the features of the current text and the reference text in each layer of the preset discrimination network. Let D_k^{(t)}(x) denote the feature of the input text x at the t-th layer of the k-th discriminator; the feature matching loss can then be expressed as:

L_FM(G, D_k) = E[ Σ_{t=1}^{T} (1/N_t) · || D_k^{(t)}(x) - D_k^{(t)}(G(Z_S, C)) ||_1 ]

其中，x表示参考文字，G(Z_S, C)表示预设文字生成网络生成的当前文字，T表示预设判别网络的卷积层数量，Nt表示预设判别网络第t层feature的元素数量。Here x denotes the reference text, G(Z_S, C) denotes the current text generated by the preset text generation network, T represents the number of convolutional layers of the preset discrimination network, and N_t represents the number of elements of the feature at the t-th layer of the preset discrimination network.
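
A possible, non-limiting implementation of this feature matching term is sketched below; it assumes that each discriminator can return its intermediate per-layer features as a list, which is an assumption of the illustration rather than an interface defined by this application. Note that the mean reduction of the L1 loss already plays the role of the 1/N_t factor.

```python
import torch
import torch.nn.functional as F

def feature_matching_loss(feats_real, feats_fake):
    """L1 distance between discriminator features of reference and generated text.

    feats_real / feats_fake: lists of tensors, one per discriminator layer.
    The reference-text features are detached so that gradients only flow
    into the generator (and style encoder), not into the discriminator.
    """
    loss = torch.zeros((), device=feats_fake[0].device)
    for f_real, f_fake in zip(feats_real, feats_fake):
        loss = loss + F.l1_loss(f_fake, f_real.detach())   # mean over the N_t elements
    return loss
```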

综上所述,在构建好预设神经网络的各个部分后,整个预设神经网络总的优化目标为:To sum up, after constructing each part of the preset neural network, the overall optimization goal of the entire preset neural network is:

min_{G,E} max_{D} [ Σ_k L_GAN(G, E, D_k) + λ_FM · Σ_k L_FM(G, D_k) + λ_VAE · L_VAE ]

式中，L_GAN为上述铰链对抗损失项，L_FM为上述特征匹配损失，L_VAE为上述对风格编码器的KL散度约束项。Here L_GAN is the hinge adversarial term above, L_FM is the feature matching loss above, and L_VAE is the KL-divergence constraint imposed on the style encoder.

其中,超参数λFM=10,λVAE=0.05,G、D、E分别指的是预设文字生成网络、预设判别网络和预设字体风格编码网络。Among them, the hyperparameters λ FM =10, λ VAE =0.05, G, D, and E respectively refer to the preset text generation network, the preset discrimination network and the preset font style encoding network.
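
As a purely illustrative sketch of how these terms might be combined in one generator/encoder update, the code below reuses the helper functions sketched earlier (kl_distribution_loss, sample_style_embedding, feature_matching_loss); the discriminator interface d(img, return_features=True) and all function names are assumptions of the illustration, not an interface defined by this application.

```python
LAMBDA_FM = 10.0    # weight of the feature matching term, as given above
LAMBDA_VAE = 0.05   # weight of the KL (VAE) constraint, as given above

def generator_encoder_loss(gen, enc, discs, content_img, style_img, real_img):
    mu, logvar = enc(style_img)                  # style encoder E
    z_s = sample_style_embedding(mu, logvar)     # style embedding Z_S
    fake = gen(content_img, z_s)                 # generator G

    adv, fm = 0.0, 0.0
    for d in discs:                              # the multi-scale discriminators D_k
        score_fake, feats_fake = d(fake, return_features=True)
        _, feats_real = d(real_img, return_features=True)
        adv = adv + (-score_fake.mean())         # hinge loss, generator side
        fm = fm + feature_matching_loss(feats_real, feats_fake)

    kl = kl_distribution_loss(mu, logvar)
    return adv + LAMBDA_FM * fm + LAMBDA_VAE * kl
```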

本申请实施例的预设神经网络可以包括预设字体风格编码网络、预设文字生成网络和预设判别网络，通过该预设字体风格编码网络将样本字体风格信息的隐空间约束为正态分布，从而压缩了字体风格信息，拉近了不同字体风格信息之间的间距，使得风格隐空间的变化更加平滑，避免网络遭遇间断点，从而提高了字体风格编码网络和文字生成网络的训练精度，降低了训练成本和难度；此外，通过预设文字生成网络，生成样本字体风格信息和样本文字内容信息对应的当前文字，并通过预设判别网络，对当前文字、参考文字、样本字体风格信息和样本文字内容信息进行判别处理，得到损失信息，并以该损失信息为基础，训练得到字体风格编码网络和文字生成网络，进一步提高了字体风格编码网络和文字生成网络的训练精度，降低了训练成本。In the embodiment of the present application, the preset neural network may include a preset font style encoding network, a preset text generation network and a preset discrimination network. The preset font style encoding network constrains the latent space of the sample font style information to a normal distribution, which compresses the font style information, narrows the distance between different font style information, makes changes in the style latent space smoother and prevents the network from encountering discontinuities, thereby improving the training accuracy of the font style encoding network and the text generation network and reducing the training cost and difficulty. In addition, the preset text generation network generates the current text corresponding to the sample font style information and the sample text content information, and the preset discrimination network performs discrimination processing on the current text, the reference text, the sample font style information and the sample text content information to obtain the loss information; based on this loss information, the font style encoding network and the text generation network are trained, which further improves their training accuracy and reduces the training cost.

图8是根据一示例性实施例示出的一种对样本字体风格信息和样本文字内容信息进行标准化处理的示意图。如图8所示,在一个可选的实施例中,上述方法还可以包括:FIG. 8 is a schematic diagram illustrating a method of standardizing sample font style information and sample text content information according to an exemplary embodiment. As shown in FIG. 8, in an optional embodiment, the above method may further include:

S401.解耦上述样本字体风格信息。S401. Decouple the above sample font style information.

S403.对解耦后的样本字体风格信息进行标准化处理,得到标准样本字体风格信息。S403. Standardize the decoupled sample font style information to obtain standard sample font style information.

S405.对上述样本文字内容信息进行标准化处理,得到标准样本文字内容信息。S405. Standardize the above-mentioned sample text content information to obtain standard sample text content information.

相应地,在上述步骤S3035中,上述基于上述预设文字生成网络,生成上述样本字体风格信息和上述样本文字内容信息对应的当前文字,包括:Correspondingly, in the above step S3035, based on the above-mentioned preset text generation network, the above-mentioned sample font style information and the current text corresponding to the above-mentioned sample text content information are generated, including:

基于上述预设文字生成网络,生成上述标准样本文字内容信息和上述标准样本字体风格信息对应的上述当前文字。Based on the preset text generation network, the current text corresponding to the standard sample text content information and the standard sample font style information is generated.

在一个可选的实施例中，继续如图5所示，预设文字生成网络的结构可以为SPADE+AdaIN结构。其中，SPADE是一种将分割图转换为实景图的生成模型，其作为主干网络，可以更好地保留文字的结构信息。In an optional embodiment, continuing as shown in FIG. 5, the structure of the preset text generation network may be a SPADE+AdaIN structure. SPADE is a generative model that converts segmentation maps into realistic images; as a backbone network, it can better preserve the structural information of the text.

示例性地，在上述步骤S401-S403中，可以将样本字体风格信息输入到多层全连接层（MLP）中进行解耦，并将解耦后的样本字体风格信息通过AdaIN的方式进行标准化处理，得到标准化处理后的标准样本字体风格信息，并将解耦后的风格以AdaIN的方式输入到网络的各层中，通过控制特征统计信息的方式控制生成图像的风格。字体风格信息在多层全连接层后被解耦，之后被送入网络的各层中，通过这种方式，预设文字生成网络可以从多种语义层级提取字体风格信息。此外，SPADE使用AdaIN作为标准化方法，可以很好地保留原始分割的空间信息。即通过SPADE+AdaIN的生成器结构，能够提升字体融合的质量。Exemplarily, in the above steps S401-S403, the sample font style information can be fed into a multi-layer fully connected network (MLP) for decoupling, and the decoupled sample font style information is standardized by means of AdaIN to obtain the standard sample font style information. The decoupled style is injected into each layer of the network in the form of AdaIN, so that the style of the generated image is controlled by controlling the feature statistics. The font style information is decoupled after multiple fully connected layers and then fed into each layer of the network; in this way, the preset text generation network can extract font style information at multiple semantic levels. In addition, SPADE uses AdaIN as the normalization method, which can well preserve the spatial information of the original segmentation. That is, the SPADE+AdaIN generator structure improves the quality of font fusion.

在文字生成过程中,内容分支保留了文字的笔画结构。文字的笔画结构(即文字内容信息)是一个关键因素。示例性地,在上述步骤S405中,可以将样本文字内容信息以SpatialAdaIN的形式插入到网络中,以对样本文字内容信息进行标准化处理,得到标准样本文字内容信息,从而保留了文字的笔画结构(即原始分割的空间信息),提高了当前文字的生成质量。During the text generation process, the content branch preserves the stroke structure of the text. The stroke structure of text (ie, text content information) is a key factor. Exemplarily, in the above step S405, the sample text content information can be inserted into the network in the form of SpatialAdaIN, to standardize the sample text content information to obtain standard sample text content information, thereby retaining the stroke structure of the text ( That is, the spatial information of the original segmentation), which improves the generation quality of the current text.

在一个具体的实施例中,通过AdaIN的方式进行标准化处理的具体过程可以如下:In a specific embodiment, the specific process of standardizing processing by means of AdaIN can be as follows:

AdaIN接收一个内容输入x和一个风格输入s,通过将x的通道级(C)均值和标准差对齐匹配到s上以实现标准化。AdaIN无需学习的仿射参数,其能够自适应地从style输入中计算仿射参数,计算公式如下:AdaIN takes a content input x and a style input s and normalizes it by aligning the channel-level (C) mean and standard deviation of x onto s. AdaIN does not need to learn affine parameters, it can adaptively calculate affine parameters from the style input, and the calculation formula is as follows:

AdaIN(x, s) = σ(s) · ( (x - μ(x)) / σ(x) ) + μ(s)

其中,AdaIN中的IN指的是:对每一个样本(N)的每一个特征通道(C),在空间(H,W)计算均值和标准差。Among them, IN in AdaIN refers to calculating the mean and standard deviation in space (H, W) for each feature channel (C) of each sample (N).
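
A compact, illustrative implementation of the channel-wise AdaIN operation described by the formula above is given below; the eps term for numerical stability and the function name are assumptions of the illustration.

```python
import torch

def adain(x: torch.Tensor, s: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Adaptive instance normalization: match the channel-wise statistics of x to s.

    x, s: (N, C, H, W) content and style feature maps.
    """
    # Per-sample, per-channel mean and standard deviation over space (H, W).
    mu_x = x.mean(dim=(2, 3), keepdim=True)
    std_x = x.std(dim=(2, 3), keepdim=True) + eps
    mu_s = s.mean(dim=(2, 3), keepdim=True)
    std_s = s.std(dim=(2, 3), keepdim=True)
    return std_s * (x - mu_x) / std_x + mu_s
```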

在一个具体的实施例中,通过Spatial AdaIN的方式进行标准化处理的具体过程可以如下:In a specific embodiment, the specific process of standardization processing by means of Spatial AdaIN may be as follows:

SpatialAdaIN与AdaIN类似，均没有需要学习的仿射参数，是一种自适应的标准化方法。特别地，SpatialAdaIN统计的均值和标准差是像素级别的，而不是通道级别的。像素级别的统计量可以提高模型的复杂度，也可以更好地保留图像的空间信息，从而更好地保持文字的笔画结构。Similar to AdaIN, SpatialAdaIN has no affine parameters that need to be learned and is an adaptive normalization method. In particular, the mean and standard deviation computed by SpatialAdaIN are pixel-level rather than channel-level. Pixel-level statistics increase the complexity of the model and better preserve the spatial information of the image, thereby better preserving the stroke structure of the text.
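
This application does not spell out the exact formulation of SpatialAdaIN; the sketch below is one plausible reading in which the statistics are taken per spatial location (over the channel dimension) instead of per channel, so that the spatial layout of the strokes is preserved. The interpretation, function name and eps value are assumptions of the illustration.

```python
import torch

def spatial_adain(x: torch.Tensor, s: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Pixel-level variant of AdaIN: statistics are computed per spatial position.

    x, s: (N, C, H, W) feature maps; s is assumed to carry the text content
    information, and C is assumed to be greater than 1.
    """
    mu_x = x.mean(dim=1, keepdim=True)            # (N, 1, H, W)
    std_x = x.std(dim=1, keepdim=True) + eps
    mu_s = s.mean(dim=1, keepdim=True)
    std_s = s.std(dim=1, keepdim=True)
    return std_s * (x - mu_x) / std_x + mu_s
```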

在一个可行的实施例中,还可以将低分辨率的样本文字内容信息作为预设文字生成网络的输入。从低分辨率的文字出发,预设文字生成网络只需要根据风格略微调整各个笔画的位置、粗细等细节,就可以生成对应风格的文字,大大降低了文字生成的难度。In a feasible embodiment, low-resolution sample text content information may also be used as the input of the preset text generation network. Starting from low-resolution text, the preset text generation network only needs to slightly adjust the position, thickness and other details of each stroke according to the style, and then the text of the corresponding style can be generated, which greatly reduces the difficulty of text generation.

在一个可行的实施例中,本申请实施例还提供了一种字体质量检测模型和字体相似度检测模型,用来衡量生成的目标文字的字体质量和创新程度。In a feasible embodiment, the embodiment of the present application further provides a font quality detection model and a font similarity detection model, which are used to measure the font quality and innovation degree of the generated target text.

示例性地，字体质量检测模型主要用于评价字体是否完整、是否出现难以辨识等问题，该字体质量检测模型可以为一个二分类的文字质量评价模型，数据集的正样本是模型生成的高质量文字和真实文字，负样本是模型生成的低质量文字，测试显示该模型的分类正确率为93%，具有较强的质量分辨能力。进一步地，为了避免生成文字与训练文字过于相似，还可以训练一个字体相似度检测模型，用于评价生成文字的字体与训练文字的字体的相似度，评价指标可以为余弦相似度（Cosine Similarity）。Exemplarily, the font quality detection model is mainly used to evaluate whether a font is complete and whether it is hard to recognize. It can be a binary text quality evaluation model whose positive samples are high-quality text generated by the model together with real text, and whose negative samples are low-quality text generated by the model; tests show that this model reaches a classification accuracy of 93% and therefore has strong quality discrimination ability. Further, in order to prevent the generated text from being too similar to the training text, a font similarity detection model can also be trained to evaluate the similarity between the font of the generated text and the fonts of the training text, and the evaluation metric can be Cosine Similarity.
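
As an illustrative sketch of the similarity screening step, the code below assumes that some feature extractor (for example, the trained similarity model) maps each glyph image to an embedding vector; the threshold value and function names are assumptions of the illustration rather than values defined by this application.

```python
import torch
import torch.nn.functional as F

def is_too_similar(gen_feat: torch.Tensor, train_feats: torch.Tensor,
                   threshold: float = 0.95) -> bool:
    """Flag a generated glyph whose embedding nearly matches a training font.

    gen_feat: (D,) embedding of the generated glyph.
    train_feats: (M, D) embeddings of glyphs from the training fonts.
    """
    sims = F.cosine_similarity(gen_feat.unsqueeze(0), train_feats, dim=1)
    return bool(sims.max().item() >= threshold)
```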

在一个具体的实施例中,可以选取预设数量个字体风格,每种字体包含预设数量个常用字符(内容),作为样本文字集。并按照上述训练过程对该样本文字集进行训练。图9是根据一示例性实施例示出的采用本申请实施例的文字生成方法生成的目标文字的示意图。如图9所示,采用本申请实施例提供的字体风格编码网络和文字生成网络可以生成风格各异的文字,包括方正平直的文字、笔画抽象具有艺术效果的文字以及粗细程度各不相同的文字,并且同一种文字能保持自身的风格。在保证了风格多样性的前提下,生成的文字完整性好,质量较高,基本不会出现笔画缺失和合并的问题。In a specific embodiment, a preset number of font styles may be selected, and each font contains a preset number of commonly used characters (contents) as a sample character set. And train the sample text set according to the above training process. FIG. 9 is a schematic diagram of a target text generated by using the text generation method according to an embodiment of the present application, according to an exemplary embodiment. As shown in FIG. 9 , characters with different styles can be generated by using the font style coding network and the character generation network provided by the embodiments of the present application, including square and straight characters, characters with abstract strokes and artistic effects, and characters with different thicknesses. text, and the same text can maintain its own style. Under the premise of ensuring the diversity of styles, the generated text has good integrity and high quality, and there is basically no problem of missing and merged strokes.

在一个可行的实施例中，还可以训练一个经验模态分解模型（Empirical Mode Decomposition，EMD），并通过该EMD来生成文字。表1是采用EMD生成的文字与采用本申请实施例中的方法生成的目标文字之间的融合效果对比表。图10是根据一示例性实施例示出的融合效果对比图。In a feasible embodiment, an Empirical Mode Decomposition (EMD) model can also be trained, and text is generated through the EMD. Table 1 compares the fusion effect of the text generated by EMD with that of the target text generated by the method in the embodiment of the present application. FIG. 10 is a comparison diagram of the fusion effects according to an exemplary embodiment.

如表1所示，采用本申请实施例中的方法生成的目标文字的融合效果更好。其中，FID是Frechet Inception Distance score的缩写，即Frechet Inception距离得分，数值越低越好；良品率指的是生成的质量较好的文字与生成的全部文字之间的比值，数值越高越好。As shown in Table 1, the fusion effect of the target text generated by the method in the embodiment of the present application is better. FID is the abbreviation of Frechet Inception Distance score (lower is better); the yield rate refers to the ratio of generated text of good quality to the total generated text (higher is better).

表1融合效果对比Table 1 Comparison of fusion effects

模型 (Model)              FID↓      良品率 (Yield rate)↑
EMD                       32.66     63.30%
本申请 (This application)  28.10     91.49%

如图10所示,EMD融合的文字容易出现笔画缺失、笔画合并等字体不完整的问题,例如第一行的“真”字缺少了一横,而“练”和“解”字出现了笔画合并进而导致生成了错别字。而采用本申请实施例生成的目标子图,很少出现笔画缺少和合并的问题,能很好地保留文字的结构,极少出现错字,大大提升了字体的质量。As shown in Figure 10, the characters fused by EMD are prone to problems of incomplete fonts such as missing strokes and merging of strokes. For example, the word "True" in the first line is missing a horizontal line, while the words "Lian" and "Jie" have strokes. The merge in turn resulted in a typo. However, using the target subgraph generated by the embodiment of the present application, the problem of lack of strokes and merging rarely occurs, the structure of the text can be well preserved, typos rarely occur, and the quality of the font is greatly improved.

图11是根据一示例性实施例示出的通过字体相似度检测模型对目标文字的字体进行检测,所得到的相似字体的示意图。如图11所示,从前三行可以看到,生成的目标文字的字体与训练集中的文字的字体(即图11中的最相近字体和次相近字体)有明显的风格差异。然而,第四行展示的文字的字体与字库中的文字的字体高度相似,可以自动地过滤掉该文字,以筛除不合适的文字,进一步提高了目标文字生成的质量。FIG. 11 is a schematic diagram of a similar font obtained by detecting the font of a target text by a font similarity detection model according to an exemplary embodiment. As shown in Figure 11, it can be seen from the first three lines that the font of the generated target text has obvious style differences with the font of the text in the training set (ie, the closest font and the next closest font in Figure 11). However, the font height of the text displayed in the fourth row is similar to the font height of the text in the font library, and the text can be automatically filtered out to filter out the inappropriate text, which further improves the quality of the generated target text.

在一个可行的实施例中,如本申请所公开的文字生成方法、文字生成的网络训练方法,其中字体风格信息、文字内容信息等可保存于区块链上。In a feasible embodiment, as disclosed in the present application for the text generation method and the text generation network training method, font style information, text content information, etc. can be stored on the blockchain.

图12是根据一示例性实施例示出的一种文字生成装置。如图12所示,该装置可以至少包括:Fig. 12 shows a text generating apparatus according to an exemplary embodiment. As shown in Figure 12, the device may include at least:

文字获取模块501,可以用于从待处理文字集中获取字体风格信息不同的至少两种候选文字。The character acquisition module 501 may be configured to acquire at least two candidate characters with different font style information from the to-be-processed character set.

字体风格信息生成模块503,可以用于基于字体风格编码网络生成上述至少两种候选文字各自对应的字体风格信息。The font style information generating module 503 may be configured to generate font style information corresponding to each of the at least two candidate characters above based on the font style coding network.

目标文字生成模块505,可以用于基于文字生成网络,生成上述字体风格信息和文字内容信息对应的目标文字;上述文字内容信息表征上述待处理文字集中的文字的内容。The target text generation module 505 can be configured to generate target text corresponding to the font style information and text content information based on the text generation network; the text content information represents the content of the text in the to-be-processed text set.

其中,上述字体风格编码网络和文字生成网络为对预设神经网络进行文字生成训练得到,文字生成训练过程中所使用到的样本字体风格信息的隐空间被约束为正态分布。Wherein, the above-mentioned font style coding network and text generation network are obtained by performing text generation training on a preset neural network, and the latent space of the sample font style information used in the text generation training process is constrained to be a normal distribution.

在一示例性的实施方式中,上述字体风格信息生成模块503,可以用于基于上述字体风格编码网络将上述至少两种候选文字映射到上述正态分布中,得到上述至少两种候选文字各自对应的字体风格信息。In an exemplary embodiment, the font style information generation module 503 can be configured to map the at least two candidate characters to the normal distribution based on the font style coding network, so as to obtain the corresponding correspondence between the at least two candidate characters. font style information.

在一示例性的实施方式中,上述装置还可以包括:In an exemplary embodiment, the above-mentioned apparatus may further include:

目标字体风格信息确定模块,可以用于基于上述字体风格编码网络,计算上述字体风格信息的平均值,得到目标字体风格信息。The target font style information determination module may be configured to calculate the average value of the above font style information based on the above font style coding network to obtain the target font style information.

相应地,上述目标文字生成模块505,可以用于基于上述文字生成网络,生成上述目标字体风格信息和上述文字内容信息对应的上述目标文字。Correspondingly, the target text generation module 505 may be configured to generate the target text corresponding to the target font style information and the text content information based on the text generation network.

在一示例性的实施方式中,上述装置还可以包括:In an exemplary embodiment, the above-mentioned apparatus may further include:

第一解耦模块,可以用于解耦上述字体风格信息。The first decoupling module can be used to decouple the above font style information.

第一标准化处理模块,可以用于对解耦后的字体风格信息进行标准化处理,得到标准字体风格信息。The first standardization processing module can be used to standardize the decoupled font style information to obtain standard font style information.

第二标准化处理模块,可以用于对上述文字内容信息进行标准化处理,得到标准文字内容信息。The second standardization processing module may be configured to perform standardization processing on the above-mentioned text content information to obtain standard text content information.

相应地,上述目标文字生成模块505,可以用于基于上述文字生成网络,生成上述标准字体风格信息和上述标准文字内容信息对应的上述目标文字。Correspondingly, the target text generation module 505 may be configured to generate the target text corresponding to the standard font style information and the standard text content information based on the text generation network.

图13是根据一示例性实施例示出的一种文字生成的网络训练装置。如图13所示,该装置还可以包括:Fig. 13 shows a network training apparatus for text generation according to an exemplary embodiment. As shown in Figure 13, the device may also include:

样本文字获取模块601,用于从样本文字集中提取第一样本文字和第二样本文字。The sample text acquisition module 601 is used for extracting the first sample text and the second sample text from the sample text set.

训练模块603,用于基于上述第一样本文字的样本字体风格信息和上述第二样本文字的样本文字内容信息,对预设神经网络进行文字生成训练,在上述文字生成训练过程中,将上述样本字体风格信息的隐空间约束为正态分布,得到字体风格编码网络和文字生成网络。The training module 603 is configured to perform text generation training on the preset neural network based on the sample font style information of the above-mentioned first sample text and the sample text content information of the above-mentioned second sample text. The latent space constraint of the sample font style information is normal distribution, and the font style encoding network and the text generation network are obtained.

在一示例性的实施方式中,上述预设神经网络包括预设字体风格编码网络、预设文字生成网络和预设判别网络,上述训练模块603,可以包括:In an exemplary embodiment, the above-mentioned preset neural network includes a preset font style encoding network, a preset text generation network and a preset discrimination network, and the above-mentioned training module 603 may include:

映射单元，可以用于基于上述预设字体风格编码网络将上述第一样本文字映射到最新正态分布中，得到当前样本字体风格信息。The mapping unit may be configured to map the above-mentioned first sample text to the latest normal distribution based on the above-mentioned preset font style coding network, so as to obtain the current sample font style information.

样本字体风格信息生成单元,可以用于基于上述预设字体风格编码网络对标准正态分布和上述当前样本字体风格信息进行处理,得到上述样本字体风格信息。The sample font style information generating unit may be configured to process the standard normal distribution and the current sample font style information based on the preset font style coding network to obtain the sample font style information.

当前文字生成单元,可以用于基于上述预设文字生成网络,生成上述样本字体风格信息和上述样本文字内容信息对应的当前文字。The current text generation unit may be configured to generate the current text corresponding to the sample font style information and the sample text content information based on the preset text generation network.

损失信息确定单元,可以用于基于上述预设判别网络,对上述当前文字、参考文字、上述样本字体风格信息和上述样本文字内容信息进行判别处理,得到损失信息。The loss information determining unit may be configured to perform a discrimination process on the current text, the reference text, the sample font style information and the sample text content information based on the preset discrimination network to obtain loss information.

网络生成单元,可以用于基于上述损失信息训练上述预设字体风格编码网络和上述预设文字生成网络,得到上述字体风格编码网络和上述文字生成网络;上述参考文字表征上述样本文字内容信息在上述样本字体风格信息下的文字。The network generation unit can be used to train the preset font style coding network and the preset text generation network based on the loss information, and obtain the font style coding network and the text generation network; the reference text represents the content information of the sample text in the above Text under sample font style information.

在一示例性的实施方式中,上述样本字体风格信息生成单元,可以包括:In an exemplary embodiment, the above-mentioned sample font style information generating unit may include:

特征向量获取子单元,可以用于基于上述预设字体风格编码网络,从上述标准正态分布中随机获取与上述当前样本字体风格信息对应的特征向量。The feature vector obtaining subunit may be configured to randomly obtain the feature vector corresponding to the font style information of the current sample from the standard normal distribution based on the preset font style coding network.

更新子单元，可以用于基于上述预设字体风格编码网络，确定上述特征向量和上述当前样本字体风格信息之间的差异信息，基于上述差异信息对上述最新正态分布进行更新，将更新后的最新正态分布重新作为上述最新正态分布。The update subunit may be configured to determine, based on the above-mentioned preset font style coding network, the difference information between the above-mentioned feature vector and the above-mentioned current sample font style information, update the above-mentioned latest normal distribution based on the difference information, and take the updated latest normal distribution as the above-mentioned latest normal distribution again.

重复子单元，可以用于基于上述预设字体风格编码网络，在上述将上述第一样本文字映射到最新正态分布中，得到当前样本字体风格信息，和上述将更新后的最新正态分布重新作为上述最新正态分布之间重复，直至上述差异信息满足预设条件时停止。The repeating subunit may be configured to, based on the above-mentioned preset font style coding network, repeat between the above-mentioned mapping of the first sample text to the latest normal distribution to obtain the current sample font style information and the above-mentioned taking of the updated latest normal distribution as the latest normal distribution again, until the above-mentioned difference information satisfies the preset condition.

样本字体风格信息确定子单元,可以用于基于上述预设字体风格编码网络将上述差异信息满足预设条件时的当前样本字体风格信息,作为上述样本字体风格信息。The sample font style information determination subunit may be configured to use the current sample font style information when the difference information meets the preset condition based on the preset font style coding network as the sample font style information.

在一示例性的实施方式中,上述装置还可以包括:In an exemplary embodiment, the above-mentioned apparatus may further include:

第二解耦模块,可以用于解耦上述样本字体风格信息。The second decoupling module can be used to decouple the above-mentioned sample font style information.

第三标准化处理模块,可以用于对解耦后的样本字体风格信息进行标准化处理,得到标准样本字体风格信息。The third standardization processing module can be used to standardize the decoupled sample font style information to obtain standard sample font style information.

第四标准化处理模块,可以用于对上述样本文字内容信息进行标准化处理,得到标准样本文字内容信息。The fourth standardization processing module may be configured to perform standardization processing on the above-mentioned sample text content information to obtain standard sample text content information.

相应地,上述当前文字生成单元,可以用于基于上述预设文字生成网络,生成上述标准样本文字内容信息和上述标准样本字体风格信息对应的上述当前文字。Correspondingly, the above-mentioned current character generation unit may be configured to generate the above-mentioned current character corresponding to the above-mentioned standard sample text content information and the above-mentioned standard sample font style information based on the above-mentioned preset character generation network.

需要说明的是,本申请实施例提供的装置实施例与上述方法实施例基于相同的发明构思。It should be noted that the apparatus embodiments provided in the embodiments of the present application and the above-mentioned method embodiments are based on the same inventive concept.

本申请实施例还提供了一种电子设备,该电子设备包括处理器和存储器,存储器中存储有至少一条指令或至少一段程序,至少一条指令或至少一段程序由处理器加载并执行以实现如上述任一实施例提供的文字生成方法或文字生成的网络训练方法。An embodiment of the present application further provides an electronic device, the electronic device includes a processor and a memory, the memory stores at least one instruction or at least one program, and the at least one instruction or at least one program is loaded and executed by the processor to achieve the above The text generation method or the network training method for text generation provided by any of the embodiments.

本申请的实施例还提供了一种计算机可读存储介质,该计算机可读存储介质可设置于终端之中以保存用于实现方法实施例中一种文字生成方法或文字生成的网络训练方法相关的至少一条指令或至少一段程序,至少一条指令或至少一段程序由处理器加载并执行以实现如上述方法实施例提供的文字生成方法或文字生成的网络训练方法。Embodiments of the present application further provide a computer-readable storage medium, which can be set in a terminal to store information about a text generation method or a network training method for text generation in the method embodiment. At least one instruction or at least one segment of program of the , at least one instruction or at least one segment of program is loaded and executed by the processor to implement the character generation method or the character generation network training method provided by the above method embodiments.

可选地,在本说明书实施例中,存储介质可以位于计算机网络的多个网络服务器中的至少一个网络服务器。可选地,在本实施例中,上述存储介质可以包括但不限于:U盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、移动硬盘、磁碟或者光盘等各种可以存储程序代码的介质。Optionally, in this embodiment of the present specification, the storage medium may be located in at least one network server among multiple network servers of a computer network. Optionally, in this embodiment, the above-mentioned storage medium may include but is not limited to: a U disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a mobile hard disk, a magnetic Various media that can store program codes, such as a disc or an optical disc.

本说明书实施例存储器可用于存储软件程序以及模块,处理器通过运行存储在存储器的软件程序以及模块,从而执行各种功能应用程序以及数据处理。存储器可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、功能所需的应用程序等;存储数据区可存储根据设备的使用所创建的数据等。此外,存储器可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他易失性固态存储器件。相应地,存储器还可以包括存储器控制器,以提供处理器对存储器的访问。The memory in the embodiments of this specification can be used to store software programs and modules, and the processor executes various functional application programs and data processing by running the software programs and modules stored in the memory. The memory may mainly include a stored program area and a stored data area, wherein the stored program area may store the operating system, application programs required for functions, etc.; the stored data area may store data created according to the use of the device, and the like. Additionally, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory may also include a memory controller to provide processor access to the memory.

本申请实施例还提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行上述方法实施例提供的文字生成方法或文字生成的网络训练方法。Embodiments of the present application also provide a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the character generation method or the character generation network training method provided by the above method embodiments.

本申请实施例所提供的文字生成方法或文字生成的网络训练方法实施例可以在终端、计算机终端、服务器或者类似的运算装置中执行。以运行在服务器上为例,图14是本申请实施例提供的一种文字生成或文字生成的网络训练的服务器的硬件结构框图。如图14所示,该服务器700可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上中央处理器(Central Processing Units,CPU)710(中央处理器710可以包括但不限于微处理器MCU或可编程逻辑器件FPGA等的处理装置)、用于存储数据的存储器730,一个或一个以上存储应用程序723或数据722的存储介质720(例如一个或一个以上海量存储设备)。其中,存储器730和存储介质720可以是短暂存储或持久存储。存储在存储介质720的程序可以包括一个或一个以上模块,每个模块可以包括对服务器中的一系列指令操作。更进一步地,中央处理器710可以设置为与存储介质720通信,在服务器700上执行存储介质720中的一系列指令操作。服务器700还可以包括一个或一个以上电源760,一个或一个以上有线或无线网络接口750,一个或一个以上输入输出接口740,和/或,一个或一个以上操作系统721,例如Windows ServerTM,MacOS XTM,UnixTM,LinuxTM,FreeBSDTM等等。The embodiments of the character generation method or the character generation network training method provided by the embodiments of the present application may be executed in a terminal, a computer terminal, a server, or a similar computing device. Taking running on a server as an example, FIG. 14 is a hardware structural block diagram of a server for character generation or character generation network training provided by an embodiment of the present application. As shown in FIG. 14 , the server 700 may vary greatly due to different configurations or performances, and may include one or more central processing units (Central Processing Units, CPU) 710 (the central processing unit 710 may include, but is not limited to, a microcomputer). A processor MCU or a processing device such as a programmable logic device FPGA), a memory 730 for storing data, and one or more storage media 720 (eg, one or more mass storage devices) for storing application programs 723 or data 722. Among them, the memory 730 and the storage medium 720 may be short-term storage or persistent storage. The program stored in the storage medium 720 may include one or more modules, and each module may include a series of instructions to operate on the server. Furthermore, the central processing unit 710 may be configured to communicate with the storage medium 720 to execute a series of instruction operations in the storage medium 720 on the server 700 . Server 700 may also include one or more power supplies 760, one or more wired or wireless network interfaces 750, one or more input and output interfaces 740, and/or, one or more operating systems 721, such as Windows Server™, MacOS X™ , UnixTM, LinuxTM, FreeBSDTM and more.

输入输出接口740可以用于经由一个网络接收或者发送数据。上述的网络具体实例可包括服务器700的通信供应商提供的无线网络。在一个实例中,输入输出接口740包括一个网络适配器(Network Interface Controller,NIC),其可通过基站与其他网络设备相连从而可与互联网进行通讯。在一个实例中,输入输出接口740可以为射频(RadioFrequency,RF)模块,其用于通过无线方式与互联网进行通讯。Input-output interface 740 may be used to receive or transmit data via a network. The specific example of the above-mentioned network may include a wireless network provided by the communication provider of the server 700 . In one example, the I/O interface 740 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through the base station so as to communicate with the Internet. In one example, the input/output interface 740 may be a radio frequency (Radio Frequency, RF) module, which is used to communicate with the Internet in a wireless manner.

本领域普通技术人员可以理解,图14所示的结构仅为示意,其并不对上述电子装置的结构造成限定。例如,服务器700还可包括比图14中所示更多或者更少的组件,或者具有与图14所示不同的配置。Those of ordinary skill in the art can understand that the structure shown in FIG. 14 is only a schematic diagram, which does not limit the structure of the above-mentioned electronic device. For example, server 700 may also include more or fewer components than those shown in FIG. 14 , or have a different configuration than that shown in FIG. 14 .

需要说明的是:上述本申请实施例先后顺序仅仅为了描述,不代表实施例的优劣。且上述对本说明书特定实施例进行了描述。其它实施例在所附权利要求书的范围内。在一些情况下,在权利要求书中记载的动作或步骤可以按照不同于实施例中的顺序来执行并且仍然可以实现期望的结果。另外,在附图中描绘的过程不一定要求示出的特定顺序或者连续顺序才能实现期望的结果。在某些实施方式中,多任务处理和并行处理也是可以的或者可能是有利的。It should be noted that: the above-mentioned order of the embodiments of the present application is only for description, and does not represent the advantages and disadvantages of the embodiments. And the foregoing describes specific embodiments of the present specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve desirable results. Additionally, the processes depicted in the figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

本说明书中的各个实施例均采用递进的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于装置和服务器实施例而言,由于其基本相似于方法实施例,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。Each embodiment in this specification is described in a progressive manner, and the same and similar parts between the various embodiments may be referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, for the apparatus and server embodiments, since they are basically similar to the method embodiments, the description is relatively simple, and for related parts, please refer to the partial descriptions of the method embodiments.

本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,的程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。Those of ordinary skill in the art can understand that all or part of the steps of implementing the above embodiments can be completed by hardware, or can be completed by instructing relevant hardware through a program, and the program can be stored in a computer-readable storage medium. The storage medium can be read-only memory, magnetic disk or optical disk, etc.

以上仅为本申请的较佳实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。The above are only preferred embodiments of the present application and are not intended to limit the present application. Any modifications, equivalent replacements, improvements, etc. made within the spirit and principles of the present application shall be included in the protection scope of the present application. Inside.

Claims (13)

1. A method for generating words, the method comprising:
acquiring at least two candidate characters with different font style information from a character set to be processed;
generating font style information corresponding to the at least two candidate characters based on a font style coding network;
generating target characters corresponding to the font style information and the character content information based on a character generation network; the text content information represents the content of the text in the text set to be processed;
the font style coding network and the character generation network are obtained by performing character generation training on a preset neural network, and the hidden space of the sample font style information used in the character generation training process is constrained to be in normal distribution.
2. The method of claim 1, wherein generating font style information corresponding to each of the at least two candidate words based on the font style encoding network comprises:
and mapping the at least two candidate characters to the normal distribution based on the font style coding network to obtain font style information corresponding to the at least two candidate characters.
3. The method of claim 1, further comprising:
calculating the average value of the font style information based on the font style coding network to obtain target font style information;
generating the target characters corresponding to the font style information and the character content information based on the character generation network, wherein the generating comprises:
and generating the target characters corresponding to the target font style information and the character content information based on the character generation network.
4. The method according to any one of claims 1 to 3, further comprising:
decoupling said font style information;
standardizing the decoupled font style information to obtain standard font style information;
standardizing the text content information to obtain standard text content information;
correspondingly, the generating the target text corresponding to the font style information and the text content information based on the text generation network includes:
and generating the target characters corresponding to the standard font style information and the standard character content information based on the character generation network.
5. A network training method for generating characters is characterized by comprising the following steps:
extracting a first sample word and a second sample word from the sample word set;
and performing character generation training on a preset neural network based on the sample font style information of the first sample character and the sample character content information of the second sample character, and constraining the hidden space of the sample font style information into normal distribution in the character generation training process to obtain a font style coding network and a character generation network.
6. The method according to claim 5, wherein the preset neural network includes a preset font style coding network, a preset character generation network, and a preset discrimination network, the character generation training is performed on the preset neural network based on the sample font style information of the first sample character and the sample character content information of the second sample character, and in the character generation training process, the hidden space of the sample font style information is constrained to be normally distributed to obtain the font style coding network and the character generation network, including:
mapping the first sample font to the latest normal distribution based on the preset font style coding network to obtain the current sample font style information;
processing the standard normal distribution and the current sample font style information based on the preset font style coding network to obtain the sample font style information;
generating the current characters corresponding to the sample font style information and the sample character content information based on the preset character generation network;
based on the preset discrimination network, performing discrimination processing on the current characters, the reference characters, the sample font style information and the sample character content information to obtain loss information;
training the preset font style coding network and the preset character generation network based on the loss information to obtain the font style coding network and the character generation network; the reference characters represent characters of the sample character content information under the sample font style information.
7. The method according to claim 6, wherein the processing the standard normal distribution and the current sample font style information based on the preset font style encoding network to obtain the sample font style information comprises:
based on the preset font style coding network, randomly acquiring a feature vector corresponding to the font style information of the current sample from the standard normal distribution;
determining difference information between the feature vector and the current sample font style information based on the preset font style encoding network, updating the latest normal distribution based on the difference information, and taking the updated latest normal distribution as the latest normal distribution again;
based on the preset font style coding network, repeating between the mapping of the first sample text to the latest normal distribution to obtain the current sample font style information and the taking of the updated latest normal distribution as the latest normal distribution, until the difference information meets the preset condition;
and taking the current sample font style information when the difference information meets the preset condition as the sample font style information based on the preset font style coding network.
8. The method of claim 6, further comprising:
decoupling the sample font style information;
standardizing the decoupled sample font style information to obtain standard sample font style information;
standardizing the sample text content information to obtain standard sample text content information;
correspondingly, the generating the current text corresponding to the sample font style information and the sample text content information based on the preset text generation network includes:
and generating the current characters corresponding to the standard sample character content information and the standard sample font style information based on the preset character generation network.
9. A text generation apparatus, the apparatus comprising:
the character acquisition module is used for acquiring at least two candidate characters with different font style information from the character set to be processed;
the font style information generating module is used for generating font style information corresponding to the at least two candidate characters based on a font style coding network;
the target character generation module is used for generating target characters corresponding to the font style information and the character content information based on a character generation network; the text content information represents the content of the text in the text set to be processed;
the font style coding network and the character generation network are obtained by performing character generation training on a preset neural network, and the hidden space of the sample font style information used in the character generation training process is constrained to be in normal distribution.
10. A network training apparatus for character generation, the apparatus comprising:
the sample character acquisition module is used for extracting a first sample character and a second sample character from the sample character set;
and the training module is used for performing character generation training on a preset neural network based on the sample font style information of the first sample characters and the sample character content information of the second sample characters, and in the character generation training process, the hidden space of the sample font style information is restricted to normal distribution to obtain a font style coding network and a character generation network.
11. An electronic device, comprising a processor and a memory, wherein at least one instruction or at least one program is stored in the memory, and the at least one instruction or the at least one program is loaded by the processor and executed to implement the method for generating words according to any one of claims 1 to 4 or the method for training a network for generating words according to any one of claims 5 to 8.
12. A computer-readable storage medium, having at least one instruction or at least one program stored therein, which is loaded and executed by a processor to implement the method for generating words according to any one of claims 1 to 4 or the method for network training of word generation according to any one of claims 5 to 8.
13. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method of generating a word according to any one of claims 1 to 4 and the method of network training for word generation according to any one of claims 5 to 8.
CN202210144287.XA 2022-02-17 2022-02-17 Text generation method, network training method, device, equipment and storage medium Active CN114626335B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210144287.XA CN114626335B (en) 2022-02-17 2022-02-17 Text generation method, network training method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210144287.XA CN114626335B (en) 2022-02-17 2022-02-17 Text generation method, network training method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114626335A true CN114626335A (en) 2022-06-14
CN114626335B CN114626335B (en) 2025-01-14

Family

ID=81899532

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210144287.XA Active CN114626335B (en) 2022-02-17 2022-02-17 Text generation method, network training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114626335B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115147850A (en) * 2022-06-30 2022-10-04 北京百度网讯科技有限公司 Training method of character generation model, character generation method and device thereof
CN115222845A (en) * 2022-08-01 2022-10-21 北京元亦科技有限公司 Method and device for generating style font picture, electronic equipment and medium
WO2024235314A1 (en) * 2023-05-17 2024-11-21 人工智能设计研究所有限公司 Font generation model training method and related device


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113095038A (en) * 2021-05-08 2021-07-09 杭州王道控股有限公司 Font generation method and device for generating countermeasure network based on multitask discriminator
CN113792855A (en) * 2021-09-09 2021-12-14 北京百度网讯科技有限公司 Model training and word stock establishing method, device, equipment and storage medium
CN113807430A (en) * 2021-09-15 2021-12-17 网易(杭州)网络有限公司 Model training method and device, computer equipment and storage medium


Also Published As

Publication number Publication date
CN114626335B (en) 2025-01-14

Similar Documents

Publication Publication Date Title
CN110162593B (en) Search result processing and similarity model training method and device
US11468262B2 (en) Deep network embedding with adversarial regularization
JP7414901B2 (en) Living body detection model training method and device, living body detection method and device, electronic equipment, storage medium, and computer program
CN109918560B (en) Question and answer method and device based on search engine
CN110909165B (en) Data processing method, device, medium and electronic equipment
CN110472675B (en) Image classification method, image classification device, storage medium and electronic equipment
CN112016553B (en) Optical Character Recognition (OCR) system, automatic OCR correction system, method
CN114626335A (en) Character generation method, network training method, device, equipment and storage medium
CN111523621A (en) Image recognition method and device, computer equipment and storage medium
CN111738351B (en) Model training method and device, storage medium and electronic equipment
CN113392253B (en) Visual question-answering model training and visual question-answering method, device, equipment and medium
CN114529765B (en) Data processing method, device and computer readable storage medium
CN112163637B (en) Image classification model training method and device based on unbalanced data
CN110322254B (en) Online fraud identification method, device, medium and electronic equipment
CN114611672B (en) Model training method, face recognition method and device
CN112613293A (en) Abstract generation method and device, electronic equipment and storage medium
CN116226785A (en) Target object recognition method, multi-mode recognition model training method and device
CN114492601A (en) Resource classification model training method and device, electronic equipment and storage medium
CN116467141A (en) Log recognition model training, log clustering method, related system and equipment
CN116975711A (en) Multi-view data classification method and related equipment
CN113408282B (en) Method, device, equipment and storage medium for topic model training and topic prediction
CN118468061B (en) Automatic algorithm matching and parameter optimizing method and system
CN111582382A (en) State recognition method and device and electronic equipment
CN111091198A (en) Data processing method and device
CN117475340A (en) Video data processing method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant