
CN115146587B - Method, system, electronic equipment and storage medium for generating character library in handwriting - Google Patents

Method, system, electronic equipment and storage medium for generating character library in handwriting

Info

Publication number
CN115146587B
CN115146587B (granted publication of application CN202210752549.0A; earlier publication CN115146587A)
Authority
CN
China
Prior art keywords
style
samples
content
font
encoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210752549.0A
Other languages
Chinese (zh)
Other versions
CN115146587A (en)
Inventor
岳强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI YICHUANG INFORMATION TECHNOLOGY CO LTD
Beijing Hanyi Innovation Technology Co ltd
Original Assignee
SHANGHAI YICHUANG INFORMATION TECHNOLOGY CO LTD
Beijing Hanyi Innovation Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANGHAI YICHUANG INFORMATION TECHNOLOGY CO LTD, Beijing Hanyi Innovation Technology Co ltd filed Critical SHANGHAI YICHUANG INFORMATION TECHNOLOGY CO LTD
Priority to CN202210752549.0A
Publication of CN115146587A
Application granted
Publication of CN115146587B
Active legal status
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/10: Text processing
    • G06F 40/103: Formatting, i.e. changing of presentation of documents
    • G06F 40/109: Font handling; Temporal or kinetic typography
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/088: Non-supervised learning, e.g. competitive learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10: Character recognition
    • G06V 30/18: Extraction of features or characteristics of the image
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10: Character recognition
    • G06V 30/24: Character recognition characterised by the processing or recognition method
    • G06V 30/242: Division of the character sequences into groups prior to recognition; Selection of dictionaries
    • G06V 30/244: Division of the character sequences into groups prior to recognition; Selection of dictionaries using graphical properties, e.g. alphabet type or font
    • G06V 30/245: Font recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Character Discrimination (AREA)

Abstract

The disclosure relates to a method, a system, an electronic device, and a storage medium for generating a handwritten-character font library. The method comprises the following steps: acquiring a glyph picture; training on the glyph picture using a content encoder and a multi-input style encoder in a generative adversarial network, wherein the content encoder extracts the content feature vector from the glyph picture, and the multi-input style encoder receives a plurality of style samples of a designated style as input, computes the weight relation between the style samples and the target style to be generated so as to obtain a different style weight for each sample, and outputs the style feature vector of the samples from the style samples and their corresponding style weights; the content feature vector and the style feature vector are fed into a generator to generate Chinese character images. The method greatly shortens the font-library production cycle and the labor cost of glyph touch-up, makes personalized font-library generation simpler, more convenient, and of higher quality, and further promotes the personalized application of fonts.

Description

Method, system, electronic device and storage medium for generating a handwritten-character font library
Technical Field
The disclosure relates to the field of font-library generation, in particular to a method, a system, an electronic device, and a storage medium for generating a handwritten-character font library.
Background
With the development of the internet, the upgrading of portable smart devices, and the continuous innovation of internet companies, the services provided on the network can meet most basic living and entertainment needs, so the number of network users is very large; and the most basic element on the network, as a medium of information transmission, is text. Many people read text every day, and text conveys not only information but also personality and strength, culture and connotation, for example in corporate logos, personalized shop signs on the streets, and personalized signatures. Personalized font production is therefore becoming more and more important.
According to the requirements of GB2312, a font library contains at least 6763 simplified Chinese characters. The data are generally collected by having a calligrapher or font enthusiast write a large number of characters by hand while keeping the style uniform, so the writer's workload is huge and data collection is particularly time-consuming and labor-intensive, often taking weeks or even months. After data collection, a manuscript file is obtained; a preliminary font library is then produced using existing character segmentation and contour extraction; and finally a professional touches up every character to obtain a sellable font library file that meets the requirements.
With the development of artificial intelligence, some practitioners have attempted to simplify the above traditional process using AI techniques, but drawbacks remain. zi2zi (https://github.com/kaonashi-tyc/zi2zi) and Rewrite (Yuchen Tian. 2016. Rewrite: Neural Style Transfer For Chinese Fonts. Retrieved Nov 23, 2016 from https://github.com/kaonashi-tyc/Rewrite) are two representative works, whose main contribution is that a font of one style can be transformed into another style without training on one-to-one paired character data for the two styles. However, the target style can only be one of a fixed number of styles present in the training set, and characters generated in a new style suffer from problems such as missing strokes and an indistinct style. MXFont (Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts, 2021) uses component information to enhance local style generation, i.e., the local detail quality of the generated characters. It first passes the input picture through several expert networks built from convolutional neural networks, then transforms the expert outputs through a fully connected network into a style feature vector and a content feature vector, concatenates the two, feeds them into a generator network built from convolutional neural networks, and outputs the final result. During training, the component information obtained by splitting Chinese characters is used as a supervision signal.
However, at the component level not all of the 6763 Chinese characters can be split, and the distribution of the split components is extremely unbalanced: some components appear with very high frequency while most others appear very rarely. As a result, only part of the generated output meets the standard for font-library production, and the rest does not. Other work (Generating Handwritten Chinese Characters using CycleGAN, 2018) uses adversarial training and densely connected structures, which can likewise be trained with only unpaired data, to transform an original font into a target font: the input glyph first passes through an encoder network to obtain a feature vector of the character, the feature vector is then converted into the feature vector of the target-style font through the densely connected structure, and the target-style glyph is finally produced by the generator network. This method adopts a one-to-one generation scheme: to generate a set of fonts in a new style, the model must be retrained, a cycle that often takes days or tens of days, and the generated glyphs are not clear enough to meet the requirements of font-library production.
In summary, the prior art has the following disadvantages:
1) The generated glyph pictures may have missing strokes or components.
2) Generating a set of fonts in a new style requires tens of days of training; the generation cycle is long.
3) The strokes and character structure of the generated content cannot be controlled.
Disclosure of Invention
The invention provides a method, a system, an electronic device, and a storage medium for generating a handwritten-character font library, which overcome the poor quality, blurred images, and long cycle of generating a set of new-style fonts with existing methods. The time to produce a handwritten font library is shortened from the several weeks of manual collection, or the tens of days of retraining required by other deep-neural-network methods, to 2-3 hours; the content and style of the generated characters can be specified; and the results meet the standard for font-library production. To solve the technical problems, the present disclosure provides the following technical solutions:
As an aspect of the embodiments of the present disclosure, there is provided a method for generating a handwritten-character font library, including the following steps:
Acquiring a glyph picture;
Training on the glyph picture using a content encoder and a multi-input style encoder in a generative adversarial network, wherein the content encoder extracts the content feature vector from the glyph picture, and the multi-input style encoder receives a plurality of style samples of a designated style as input, computes the weight relation between the style samples and the target style to be generated so as to obtain a different style weight for each sample, and outputs the style feature vector of the samples from the style samples and their corresponding style weights;
the content feature vector and the style feature vector are fed into a generator to generate a Chinese character image.
Optionally, the step of acquiring the glyph picture specifically includes: selecting a plurality of font files and rendering them into glyph pictures with black characters on a white background.
Optionally, before the step of acquiring the glyph picture, the method further includes training the generative adversarial network, wherein the loss function of the generative adversarial network comprises an adversarial loss, an L1 loss, and a content loss.
Optionally, a step of fine-tuning part of the parameters of the generative adversarial network is further included before the step of acquiring the glyph picture; the fine-tuning is achieved by adding a new consistency loss, L1_loss, expressed as follows:
L1_loss=||ContEnc(I_c)-ContEnc(I_f)||,
wherein ContEnc(I_c) is the content feature vector of the content glyph, and ContEnc(I_f) is the content feature vector of the generated glyph.
Optionally, the content encoder is composed of a plurality of convolutional layer-normalization layer-activation layer blocks.
Optionally, the multi-input style encoder is composed of an attention layer and a residual layer; the attention layer receives the input of a plurality of style samples, and the residual layer produces and outputs the style feature vector of the samples.
As another aspect of an embodiment of the present disclosure, there is provided a handwritten-character font library generating system, including:
a glyph picture acquisition module for acquiring glyph pictures;
an encoder module for training on the glyph picture using a content encoder and a multi-input style encoder in a generative adversarial network, wherein the content encoder extracts the content feature vector from the glyph picture, and the multi-input style encoder receives a plurality of style samples of a designated style as input, computes the weight relation between the style samples and the target style to be generated so as to obtain a different style weight for each sample, and outputs the style feature vector of the samples from the style samples and their corresponding style weights;
and a generator into which the content feature vector and the style feature vector are fed to generate a Chinese character image.
Optionally, the system further includes a fine-tuning module for fine-tuning the loss function during generative-adversarial-network training; the loss function adds a consistency loss, given by:
L1_loss=||ContEnc(I_c)-ContEnc(I_f)||,
wherein ContEnc(I_c) is the content feature vector of the content glyph, and ContEnc(I_f) is the content feature vector of the generated glyph.
As another aspect of the embodiments of the present disclosure, there is further provided an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the above method for generating a handwritten-character font library when executing the computer program.
As another aspect of the embodiments of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above method for generating a handwritten-character font library.
With the embodiments of the present disclosure, a user can generate glyph pictures of any content in the corresponding style in a short time by writing only a small number of characters (some three hundred), shortening the time to produce the handwritten Chinese font library specified in GB2312 from the tens of days of existing methods to a few hours, with a better generation effect than other methods for quickly producing handwritten Chinese font libraries. The whole method does not depend on stroke components and does not require large amounts of manual touch-up of the generated results. The font-library production cycle and the labor cost of glyph touch-up are therefore greatly reduced, personalized font-library generation becomes simpler, more convenient, and of higher quality, and the personalized application of fonts is further promoted. The present disclosure also solves the missing-stroke problem of glyphs generated by existing methods by fine-tuning the generative adversarial network with a consistency loss.
Drawings
Fig. 1 is a flowchart of the method for generating a handwritten-character font library in embodiment 1;
Fig. 2 is a diagram of the training process of the generative adversarial network;
Fig. 3 is a diagram of the font generation effect;
Fig. 4 is a block diagram of the handwritten-character font library generating system;
Fig. 5(a), 5(b) and 5(c) are diagrams of examples of generated Chinese character images.
Detailed Description
The following description of the technical solutions in the embodiments of the present disclosure will be made clearly and completely with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, not all embodiments. Based on the embodiments in this disclosure, all other embodiments that a person of ordinary skill in the art would obtain without making any inventive effort are within the scope of protection of this disclosure.
Example 1
As an aspect of the embodiments of the present disclosure, the present embodiment provides a method for generating a handwritten-character font library, as shown in Fig. 1, including the following steps:
S10, acquiring a font picture;
S20, training on the glyph picture using a content encoder and a multi-input style encoder in a generative adversarial network, wherein the content encoder extracts the content feature vector from the glyph picture, and the multi-input style encoder receives a plurality of style samples of a designated style as input, computes the weight relation between the style samples and the target style to be generated so as to obtain a different style weight for each sample, and outputs the style feature vector of the samples from the style samples and their corresponding style weights;
S30, the content feature vector and the style feature vector are fed into a generator to generate a Chinese character image.
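The data flow of steps S10 to S30 can be sketched with stub components. This is an illustrative assumption only: the function names and placeholder feature shapes are taken from the shapes quoted later in the text (a 256x256 input glyph, a 512x16x16 content feature, a 1x256 style feature), and the stubs stand in for the patent's actual neural networks.

```python
# Illustrative data-flow sketch for S10-S30; the encoders and generator
# are stubs that only track the feature shapes quoted in the text.

def content_encoder(glyph):                      # S20, content branch
    assert len(glyph) == 256 and len(glyph[0]) == 256
    return ("content_code", (512, 16, 16))       # placeholder content feature

def style_encoder(style_samples):                # S20, style branch
    assert len(style_samples) >= 1               # e.g. 5 samples of one style
    return ("style_code", (1, 256))              # placeholder style feature

def generator(content_code, style_code):         # S30, synthesis
    return {"content": content_code, "style": style_code,
            "image_size": (256, 256)}            # generated character image

glyph = [[0] * 256 for _ in range(256)]          # S10, a rendered glyph picture
samples = [glyph] * 5                            # several samples of one style
out = generator(content_encoder(glyph), style_encoder(samples))
```

The point of the sketch is the factored interface: content and style are encoded independently, so a new style only changes the style branch's input.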
With this configuration, a user can generate glyph pictures of any content in the corresponding style in a short time by writing only a small number of characters, shortening the time to produce the handwritten Chinese font library specified in GB2312 from the tens of days of existing methods to a few hours, with a better generation effect than other methods for quickly producing handwritten Chinese font libraries. The whole method does not depend on stroke components and does not require large amounts of manual touch-up of the generated results. The font-library production cycle and the labor cost of glyph touch-up are therefore greatly reduced, personalized font-library generation becomes simpler, more convenient, and of higher quality, and the personalized application of fonts is further promoted.
The steps of the embodiments of the present disclosure are described in detail below, respectively.
S10, acquiring a glyph picture. This step specifically includes: selecting a plurality of font files and rendering them into glyph pictures with black characters on a white background. The font files may be in ttf or otf format; the characters specified in GB2312 are then rendered from the font file into glyph picture files, for example as black characters on a white background.
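As a sanity check on the 6763-character figure cited throughout, the GB2312 hanzi can be enumerated directly from the encoding using only the Python standard library. This is a standalone illustration of the character set that gets rendered, not the patent's rendering code (which rasterizes ttf/otf files and is not shown here).

```python
# Enumerate the 6763 Chinese characters of GB2312 via Python's gb2312 codec.
# Level-1 hanzi occupy rows (qu) 16-55 and level-2 hanzi rows 56-87; each
# row has up to 94 cells (wei). Undefined cells fail to decode and are
# skipped (e.g. the last five cells of row 55 are unassigned).

def gb2312_hanzi():
    chars = []
    for row in range(16, 88):          # rows 16..87 hold the hanzi
        for cell in range(1, 95):      # cells 1..94 within a row
            try:
                chars.append(bytes([0xA0 + row, 0xA0 + cell]).decode("gb2312"))
            except UnicodeDecodeError:
                pass                   # unassigned code point
    return chars

hanzi = gb2312_hanzi()
```

Each of these characters would be rendered once per source font to build the training glyph pictures.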
S20, training on the glyph picture using a content encoder and a multi-input style encoder in a generative adversarial network, wherein the content encoder extracts the content feature vector from the glyph picture, and the multi-input style encoder receives a plurality of style samples of a designated style as input, computes the weight relation between the style samples and the target style to be generated so as to obtain a different style weight for each sample, and outputs the style feature vector of the samples from the style samples and their corresponding style weights.
As shown in Fig. 2, the generative adversarial network to be trained mainly comprises four neural networks built from convolutional layers: a content encoder ContEnc, a multi-input style encoder StyleEnc, a generator G, and a discriminator D.
In some embodiments, the content encoder is composed of a plurality of convolutional layer-normalization layer-activation layer blocks. For example, the content encoder ContEnc consists of five such blocks and maps a 256x256-pixel glyph picture to a 512x16x16 content feature vector that characterizes the content of the glyph image.
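The stated shapes can be checked with standard convolution output-size arithmetic. The kernel/stride/padding plan and the channel widths below are assumptions: one stride-1 block plus four stride-2 blocks is one plan that reaches the 16x16 spatial size, while the patent only fixes the block count and the input/output shapes.

```python
# Shape arithmetic for a 5-block conv-norm-activation encoder mapping a
# 256x256 glyph to a 512x16x16 feature map, under an assumed layer plan.

def conv2d_out(size, kernel=3, stride=1, padding=1):
    # Standard convolution output-size formula (dilation 1).
    return (size + 2 * padding - kernel) // stride + 1

# Assumed plan: a stride-1 stem, then four stride-2 downsampling blocks
# (256 -> 128 -> 64 -> 32 -> 16), widening channels up to 512.
channels = [1, 64, 128, 256, 512, 512]
strides = [1, 2, 2, 2, 2]

size = 256
for s in strides:
    size = conv2d_out(size, kernel=3, stride=s, padding=1)

content_shape = (channels[-1], size, size)   # expected (512, 16, 16)
```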
In some embodiments, the multi-input style encoder is composed of an attention layer and a residual layer; the attention layer receives the input of a plurality of style samples, and the residual layer produces and outputs the style feature vector of the samples. For example, the attention layer accepts a number of style samples, say 5, then computes the weight relation between the 5 style samples and the target style to be generated, obtaining a different weight for each of the 5 samples; finally, the residual layer combines the obtained style weights with the style samples and outputs a 1x256-dimensional style feature as the style representation of the 5 samples.
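The weighting step described here can be sketched as a softmax-normalized combination of per-sample style vectors. This is a hedged illustration: the patent does not give the attention formula, so the softmax normalization, the toy scores, and the helper names `softmax` and `fuse_styles` are assumptions; only the sample count (5) and the 256-dimensional output follow the text.

```python
# Sketch: attention scores over K style samples are normalized into
# weights, and the weighted sum of per-sample style vectors forms the
# 1x256 style representation.
import math

def softmax(scores):
    m = max(scores)                              # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_styles(sample_vectors, attention_scores):
    weights = softmax(attention_scores)          # one weight per style sample
    dim = len(sample_vectors[0])                 # 256 in the text
    fused = [sum(w * v[i] for w, v in zip(weights, sample_vectors))
             for i in range(dim)]
    return fused, weights

samples = [[float(k)] * 256 for k in range(5)]   # 5 toy style vectors
fused, weights = fuse_styles(samples, [0.1, 0.9, 0.3, 0.2, 0.5])
```

The sample whose score is highest contributes most to the fused style, which is the intended effect of weighting samples by their relation to the target style.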
In some embodiments, the method further comprises a step of training the generative adversarial network before the glyph picture is acquired; the loss function of the generative adversarial network comprises an adversarial loss, an L1 loss, and a content loss. The adversarial loss, L1 loss, and content loss can be expressed by loss functions from the prior art, and the goal of training is to minimize the sum of the three losses.
In some embodiments, a step of fine-tuning part of the parameters of the generative adversarial network is further included before the glyph picture is acquired: the new-style font is rendered into glyph pictures, part of the parameters of the base model are fine-tuned, and the fine-tuned model then generates the new-style font. Fine-tuning can be completed in 2-3 hours. The fine-tuning is achieved by adding a new consistency loss, L1_loss, as follows:
L1_loss=||ContEnc(I_c)-ContEnc(I_f)||,
wherein ContEnc(I_c) is the content feature vector of the content glyph and ContEnc(I_f) is the content feature vector of the generated glyph; this ensures that the generated content stays consistent with the given content.
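A minimal sketch of this consistency loss, treating the ContEnc outputs as flat vectors (in the text they are 512x16x16 feature maps) and reading ||.|| as the L1 norm, which matches the name L1_loss; both simplifications are assumptions for illustration.

```python
# Consistency loss L1_loss = ||ContEnc(I_c) - ContEnc(I_f)||:
# L1 distance between the content features of the input content glyph
# and of the generated glyph. Zero when the generated glyph preserves
# the content exactly.

def l1_consistency_loss(feat_content, feat_generated):
    assert len(feat_content) == len(feat_generated)
    return sum(abs(c - f) for c, f in zip(feat_content, feat_generated))

feat_c = [0.2, -1.0, 0.5]      # stand-in for ContEnc(I_c)
feat_f = [0.2, -1.0, 0.5]      # identical features: content preserved
loss_same = l1_consistency_loss(feat_c, feat_f)
loss_diff = l1_consistency_loss(feat_c, [0.0, 0.0, 0.0])
```

Minimizing this term during fine-tuning penalizes the generator whenever the generated glyph's content features drift from those of the given content glyph, which is how the missing-stroke problem is addressed.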
Fig. 3 shows the generation effect of fonts in a font library produced by the generator.
The embodiment of the disclosure can take a new style as input and, by varying the given content glyph, generate the 6763 characters specified by GB2312. A font library for the roughly 20,000-character set specified by GB18030 may also be generated.
Example 2
As another aspect of the embodiments of the present disclosure, this embodiment provides a handwritten-character font library generating system 100, as shown in Fig. 4, including:
a glyph picture acquisition module 1 for acquiring glyph pictures: it selects a plurality of font files, for example font files covering the GB2312 character set, and renders them into glyph pictures with black characters on a white background. The font files may be in ttf or otf format; the characters specified in GB2312 are then rendered from the font file into glyph picture files, for example as black characters on a white background.
an encoder module 2 for training on the glyph picture using a content encoder and a multi-input style encoder in a generative adversarial network, wherein the content encoder extracts the content feature vector from the glyph picture, and the multi-input style encoder receives a plurality of style samples of a designated style as input, computes the weight relation between the style samples and the target style to be generated so as to obtain a different style weight for each sample, and outputs the style feature vector of the samples from the style samples and their corresponding style weights;
and a generator 3 that receives the content feature vector and the style feature vector and generates a Chinese character image.
For example, Fig. 5 shows an example of handwritten font library generation: Fig. 5(a) is the specified content input, i.e., the content input as glyph pictures; Fig. 5(b) shows a plurality of style samples, i.e., the designated style; Fig. 5(c) is the generated Chinese character image.
As shown in Fig. 2, the generative adversarial network to be trained mainly comprises four neural networks built from convolutional layers: a content encoder ContEnc, a multi-input style encoder StyleEnc, a generator G, and a discriminator D.
In some embodiments, the content encoder is composed of a plurality of convolutional layer-normalization layer-activation layer blocks. For example, the content encoder ContEnc consists of five such blocks and maps a 256x256-pixel glyph picture to a 512x16x16 content feature vector that characterizes the content of the glyph image.
In some embodiments, the multi-input style encoder is composed of an attention layer and a residual layer; the attention layer receives the input of a plurality of style samples, and the residual layer produces and outputs the style feature vector of the samples. For example, the attention layer accepts a number of style samples, say 5, then computes the weight relation between the 5 style samples and the target style to be generated, obtaining a different weight for each of the 5 samples; finally, the residual layer combines the obtained style weights with the style samples and outputs a 1x256-dimensional style feature as the style representation of the 5 samples.
In some embodiments, the method further comprises a step of training the generative adversarial network before the glyph picture is acquired; the loss function of the generative adversarial network comprises an adversarial loss, an L1 loss, and a content loss. The adversarial loss, L1 loss, and content loss can be expressed by loss functions from the prior art, and the goal of training is to minimize the sum of the three losses.
In some embodiments, the system further comprises a fine-tuning module for fine-tuning the loss function during generative-adversarial-network training; the loss function adds a consistency loss, given by:
L1_loss=||ContEnc(I_c)-ContEnc(I_f)||,
wherein ContEnc(I_c) is the content feature vector of the content glyph, and ContEnc(I_f) is the content feature vector of the generated glyph.
In some embodiments, the discriminator D adopts the same structure as VGG16 (Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)); it takes a generated image or a real image as input and outputs, respectively, the probability that the input is a generated image and the probability that it is a real image.
The embodiment of the disclosure can take a new style as input and, by varying the given content glyph, generate the 6763 characters specified by GB2312. A font library for the roughly 20,000-character set specified by GB18030 may also be generated.
Example 3
An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the method for generating a handwritten-character font library of embodiment 1 when executing the computer program.
Embodiment 3 of the present disclosure is merely an example, and should not be construed as limiting the functionality and scope of use of the embodiments of the present disclosure.
The electronic device may be in the form of a general purpose computing device, which may be a server device, for example. Components of an electronic device may include, but are not limited to: at least one processor, at least one memory, a bus connecting different system components, including the memory and the processor.
The buses include a data bus, an address bus, and a control bus.
The memory may include volatile memory such as Random Access Memory (RAM) and/or cache memory, and may further include Read Only Memory (ROM).
The memory may also include program means having a set (at least one) of program modules including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The processor executes various functional applications and data processing by running computer programs stored in the memory.
The electronic device may also communicate with one or more external devices (e.g., a keyboard, a pointing device, etc.). Such communication may take place through an input/output (I/O) interface. The electronic device may also communicate with one or more networks, such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet, through a network adapter. The network adapter communicates with the other modules of the electronic device via the bus. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with the electronic device, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, data backup storage systems, and the like.
It should be noted that although several units/modules or sub-units/modules of an electronic device are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more units/modules described above may be embodied in one unit/module in accordance with embodiments of the present application. Conversely, the features and functions of one unit/module described above may be further divided into ones that are embodied by a plurality of units/modules.
Example 4
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for generating a handwritten-character font library in embodiment 1.
More specifically, the readable storage medium may be, for example and without limitation: a portable disk, a hard disk, a random access memory, a read-only memory, an erasable programmable read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a possible embodiment, the present disclosure may also be implemented in the form of a program product comprising program code which, when the program product is run on a terminal device, causes the terminal device to carry out the steps of the method for generating a character library in handwriting described in embodiment 1.
The program code for carrying out the present disclosure may be written in any combination of one or more programming languages, and may execute entirely on the user's device, partly on the user's device as a stand-alone software package, partly on the user's device and partly on a remote device, or entirely on the remote device.
Although embodiments of the present disclosure have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the disclosure, the scope of which is defined in the appended claims and their equivalents.

Claims (6)

1. A method for generating a character library in handwriting is characterized by comprising the following steps:
Acquiring a font picture;
training on the font picture by using a content encoder and a style multi-input encoder in a generative adversarial network, wherein the content encoder is used for extracting content feature vectors from the font picture, and the style multi-input encoder receives as input a plurality of style samples of a designated style, obtains the weight relations between the plurality of style samples and the target style to be generated so as to obtain the different style weights corresponding to the plurality of style samples, and outputs the style feature vectors of the plurality of samples according to the plurality of style samples and their corresponding style weights;
Feeding the content feature vector and the style feature vector into a generator to generate a Chinese character image;
the content encoder is composed of a plurality of convolution layer-normalization layer-activation layer structures;
The method further comprises, before the step of acquiring the font picture, a step of training the generative adversarial network, wherein the loss function of the generative adversarial network comprises an adversarial loss, an L1 loss, and a content loss;
The method further comprises, before the step of acquiring the font picture, a step of fine-tuning part of the parameters of the generative adversarial network, wherein the fine-tuning is realized by adding a new consistency loss, and the consistency loss L1_loss is expressed as follows:
L1_loss=||ContEnc(I_c)-ContEnc(I_f)||,
wherein ContEnc(I_c) is the content feature vector of the content glyph, and ContEnc(I_f) is the content feature vector of the generated glyph.
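As an illustrative sketch only (not the patented implementation), the consistency loss above is the L1 distance between the content features of the content glyph and of the generated glyph. The `content_encoder` below is a hypothetical stand-in (a fixed random linear map) for the trained convolution-normalization-activation encoder ContEnc:

```python
import numpy as np

def consistency_loss(feat_content, feat_generated):
    """L1_loss = ||ContEnc(I_c) - ContEnc(I_f)||: L1 distance between
    the content features of the content glyph and the generated glyph."""
    return np.abs(feat_content - feat_generated).sum()

# Hypothetical stand-in for the trained content encoder ContEnc:
# a fixed linear map from a flattened glyph image to a feature vector.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 256))           # 64-dim features, 16x16 glyphs

def content_encoder(glyph):                  # glyph: (16, 16) array
    return W @ glyph.reshape(-1)

I_c = rng.random((16, 16))                   # content (source) glyph
I_f = I_c + 0.01 * rng.random((16, 16))      # generated glyph, close to I_c

loss = consistency_loss(content_encoder(I_c), content_encoder(I_f))
print(loss >= 0.0)  # True
```

During fine-tuning, minimizing this term pulls the generated glyph's content features toward those of the source glyph, which helps preserve character identity while the style changes.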
2. The method for generating a character library in handwriting as claimed in claim 1, wherein the step of acquiring a font picture comprises: selecting a plurality of font files, and rendering the font files into a plurality of font pictures with a white background and black characters.
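A minimal sketch of the rendering step in claim 2, assuming the Pillow imaging library. A real pipeline would load each selected font file with `ImageFont.truetype(path, size)`; this sketch falls back to Pillow's built-in bitmap font so it runs without any font file:

```python
from PIL import Image, ImageDraw, ImageFont

def render_glyph(char, size=256, font=None):
    """Render one character as a white-background, black-character picture."""
    img = Image.new("L", (size, size), color=255)        # white background
    draw = ImageDraw.Draw(img)
    font = font or ImageFont.load_default()              # stand-in for a .ttf font file
    draw.text((size // 4, size // 4), char, fill=0, font=font)  # black character
    return img

pic = render_glyph("A")
print(pic.size)  # (256, 256)
```

Rendering every character of a font file this way yields the white-background, black-character font pictures used as training input.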
3. The method of claim 1, wherein the style multi-input encoder is composed of an attention layer and a residual layer, the attention layer being used for receiving the input of the plurality of style samples, and the residual layer being used for obtaining and outputting the style feature vectors of the plurality of samples.
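A schematic numpy sketch of the weighting idea behind claims 1 and 3 (a simplification under assumed embeddings, not the patented network): each style-sample embedding is scored against a target-style query, a softmax turns the scores into style weights, and the weighted combination gives the aggregated style feature vector:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def aggregate_style(style_embeddings, target_query):
    """style_embeddings: (k, d) embeddings of k style samples;
    target_query: (d,) embedding representing the target style.
    Returns (per-sample style weights, aggregated style vector)."""
    scores = style_embeddings @ target_query      # similarity of each sample to the target
    weights = softmax(scores)                     # different style weights per sample
    style_vector = weights @ style_embeddings     # weighted combination of the samples
    return weights, style_vector

rng = np.random.default_rng(1)
samples = rng.standard_normal((5, 8))             # 5 style samples, 8-dim embeddings
query = rng.standard_normal(8)
w, v = aggregate_style(samples, query)
print(np.isclose(w.sum(), 1.0), v.shape)  # True (8,)
```

Samples more similar to the target style receive larger weights, so the aggregated style vector is dominated by the most relevant style samples.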
4. A system for generating a character library in handwriting, characterized by comprising:
a font picture acquisition module, which acquires font pictures;
an encoder module, which trains on the font picture by using a content encoder and a style multi-input encoder in a generative adversarial network, wherein the content encoder is used for extracting content feature vectors from the font picture, and the style multi-input encoder receives as input a plurality of style samples of a designated style, obtains the weight relations between the plurality of style samples and the target style to be generated so as to obtain the different style weights corresponding to the plurality of style samples, and outputs the style feature vectors of the plurality of samples according to the plurality of style samples and their corresponding style weights;
a generator, into which the content feature vector and the style feature vector are fed to generate a Chinese character image;
the content encoder is composed of a plurality of convolution layer-normalization layer-activation layer structures;
a fine-tuning module, which performs fine-tuning with the loss function in the process of training the generative adversarial network, wherein the loss function further comprises a consistency loss, the formula of which is as follows:
L1_loss=||ContEnc(I_c)-ContEnc(I_f)||,
wherein ContEnc(I_c) is the content feature vector of the content glyph, and ContEnc(I_f) is the content feature vector of the generated glyph.
5. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method for generating a character library in handwriting as claimed in any one of claims 1 to 3 when executing the computer program.
6. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, realizes the steps of the method for generating a character library in handwriting according to any one of claims 1 to 3.
CN202210752549.0A 2022-06-28 2022-06-28 Method, system, electronic equipment and storage medium for generating character library in handwriting Active CN115146587B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210752549.0A CN115146587B (en) 2022-06-28 2022-06-28 Method, system, electronic equipment and storage medium for generating character library in handwriting

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210752549.0A CN115146587B (en) 2022-06-28 2022-06-28 Method, system, electronic equipment and storage medium for generating character library in handwriting

Publications (2)

Publication Number Publication Date
CN115146587A CN115146587A (en) 2022-10-04
CN115146587B true CN115146587B (en) 2024-10-15

Family

ID=83409855

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210752549.0A Active CN115146587B (en) 2022-06-28 2022-06-28 Method, system, electronic equipment and storage medium for generating character library in handwriting

Country Status (1)

Country Link
CN (1) CN115146587B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113095038A (en) * 2021-05-08 2021-07-09 杭州王道控股有限公司 Font generation method and device for generating countermeasure network based on multitask discriminator
CN113792854A (en) * 2021-09-09 2021-12-14 北京百度网讯科技有限公司 Model training and word stock establishing method, device, equipment and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107644006B (en) * 2017-09-29 2020-04-03 北京大学 Automatic generation method of handwritten Chinese character library based on deep neural network
US11250252B2 (en) * 2019-12-03 2022-02-15 Adobe Inc. Simulated handwriting image generator
CN111815509B (en) * 2020-09-02 2021-01-01 北京邮电大学 Image style conversion and model training method and device
CN113393370A (en) * 2021-06-02 2021-09-14 西北大学 Method, system and intelligent terminal for migrating Chinese calligraphy character and image styles
CN113837366A (en) * 2021-09-23 2021-12-24 中国计量大学 Multi-style font generation method
CN114139495B (en) * 2021-11-29 2024-10-22 合肥高维数据技术有限公司 Chinese font style migration method based on self-adaptive generation countermeasure network
CN114495118B (en) * 2022-04-15 2022-08-09 华南理工大学 Personalized handwritten character generation method based on countermeasure decoupling

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113095038A (en) * 2021-05-08 2021-07-09 杭州王道控股有限公司 Font generation method and device for generating countermeasure network based on multitask discriminator
CN113792854A (en) * 2021-09-09 2021-12-14 北京百度网讯科技有限公司 Model training and word stock establishing method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN115146587A (en) 2022-10-04

Similar Documents

Publication Publication Date Title
CN111027563A (en) Text detection method, device and recognition system
CN111738169B (en) Handwriting formula recognition method based on end-to-end network model
CN110569359B (en) Training and application method and device of recognition model, computing equipment and storage medium
CN110570481A (en) calligraphy word stock automatic repairing method and system based on style migration
CN114596566B (en) Text recognition method and related device
CN110781672B (en) Question bank production method and system based on machine intelligence
US11599727B2 (en) Intelligent text cleaning method and apparatus, and computer-readable storage medium
CN110991279B (en) Document Image Analysis and Recognition Method and System
CN110851644A (en) Image retrieval method and device, computer-readable storage medium and electronic device
CN114612921B (en) Form recognition method and device, electronic equipment and computer readable medium
CN117746186A (en) Training method of low-rank adaptive model, text image generation method and system
CN116029273A (en) Text processing method, device, computer equipment and storage medium
CN112101364A (en) Semantic segmentation method based on parameter importance incremental learning
Vafaie et al. Handwritten and printed text identification in historical archival documents
Davoudi et al. Ancient document layout analysis: Autoencoders meet sparse coding
DE102022131824A1 (en) Visual speech recognition for digital videos using generative-adversative learning
CN115690276A (en) Video generation method and device of virtual image, computer equipment and storage medium
CN114330514A (en) Data reconstruction method and system based on depth features and gradient information
CN115146587B (en) Method, system, electronic equipment and storage medium for generating character library in handwriting
CN112307749A (en) Text error detection method and device, computer equipment and storage medium
CN111881900A (en) Corpus generation, translation model training and translation method, apparatus, device and medium
US20240153259A1 (en) Single image concept encoder for personalization using a pretrained diffusion model
US20240104951A1 (en) Image and semantic based table recognition
CN114399782B (en) Text image processing method, apparatus, device, storage medium, and program product
CN115690816A (en) Text element extraction method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant