
CN117132994B - Handwritten character erasing method based on a generative adversarial network - Google Patents

Handwritten character erasing method based on a generative adversarial network

Info

Publication number
CN117132994B
CN117132994B CN202311039086.4A
Authority
CN
China
Prior art keywords
layer
network
handwritten
document image
handwritten character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311039086.4A
Other languages
Chinese (zh)
Other versions
CN117132994A (en)
Inventor
金连文
黄鎏丰
周伟英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202311039086.4A priority Critical patent/CN117132994B/en
Publication of CN117132994A publication Critical patent/CN117132994A/en
Application granted granted Critical
Publication of CN117132994B publication Critical patent/CN117132994B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/191Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/094Adversarial learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/16Image preprocessing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/199Arrangements for recognition using optical reference masks, e.g. holographic masks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Character Input (AREA)
  • Character Discrimination (AREA)

Abstract

The application discloses a handwritten character erasing method based on a generative adversarial network, comprising the following steps: collecting document images containing handwritten characters and manually annotating them to obtain a dataset; generating a handwritten character stroke mask based on the dataset; preprocessing the document images to obtain processed data; building a handwritten character erasure model based on a generative adversarial network according to the handwritten character stroke mask; training the handwritten character erasure model with the processed data to obtain a final model; and erasing the handwritten character portion of a document image with the final model. The application uses a deep learning network to extract features from the document image, predicts an accurate stroke-level region for the handwritten characters to be erased, and erases the handwritten characters with a cascaded generative adversarial network, offering simple implementation, high speed, and a good erasing effect.

Description

Handwritten character erasing method based on a generative adversarial network
Technical Field
The application relates to the field of image processing, in particular to a handwritten character erasing method based on a generative adversarial network.
Background
Handwritten text plays an important role in modern society and is widely used in fields including document editing, signing, and artistic creation. However, it is sometimes necessary to modify or erase handwritten text to meet specific needs or correct errors. Manually erasing handwritten text with image-editing tools such as Adobe Photoshop is inefficient and demands expertise, so an efficient and accurate automatic handwritten-text erasing method is needed.
When erasing handwritten text, care must be taken not to erase other content by mistake: document images contain a great deal of printed text, and the handwritten text often lies close to it, so erasing handwritten text accurately is a challenging problem. Existing methods struggle to erase handwritten characters in documents with complex layouts, and handwritten-character erasure faces various problems, such as preserving the naturalness of the erased image and eliminating erasure traces. An innovative handwritten character erasing method that effectively addresses these problems is therefore needed. With the rapid development of deep learning, generative adversarial networks (GANs) have demonstrated powerful capabilities in image processing. A GAN consists of a generator and a discriminator: through adversarial training, the generator gradually learns to produce realistic data, while the discriminator continually improves its ability to distinguish real data from generated data. This technique has achieved remarkable results in image generation, translation, and inpainting.
Disclosure of Invention
The application aims to provide a handwritten character erasing method based on a generative adversarial network, which automatically removes handwritten content from a document image to obtain a clean document image.
In order to achieve the above object, the present application provides a handwritten character erasing method based on a generative adversarial network, comprising the following steps:
collecting document images with handwritten characters, and manually annotating the documents to obtain a dataset;
generating a handwritten character stroke mask based on the dataset;
preprocessing the document images to obtain processed data;
building a handwritten character erasure model based on a generative adversarial network according to the handwritten character stroke mask;
training the handwritten character erasure model with the processed data to obtain a final model;
and erasing the handwritten character portion of the document image with the final model.
Preferably, the method for performing the manual labeling comprises the following steps: erasing handwritten characters in the document image by using an Adobe Photoshop tool to obtain an erased document image; and marking the coordinates of the handwritten characters in the document image by using a quadrilateral frame to obtain a quadrilateral mask of the handwritten characters.
Preferably, the method for generating the handwritten character stroke mask comprises the following steps: computing the difference between the original document image and the erased document image, then applying threshold binarization, erosion and dilation to remove noise, inward shrinking to obtain the character skeleton, and outward smooth expansion to obtain the character outer boundary, thereby automatically generating the handwritten character stroke mask.
Preferably, the method for obtaining the processed data comprises the following steps: cutting the complete document image into image blocks of 512 × 512 pixels, randomly rotating the image blocks within ±10°, and horizontally flipping them with probability 0.5 to obtain the processed data.
Preferably, the constructed handwritten character erasure model comprises: a cascade generator and a discriminator;
the cascade generator is used for generating the erased document image;
the discriminator is used for judging whether an image is real or generated.
Preferably, the cascade generator includes: a rough erasure network and a fine erasure network;
the rough erasure network is used for predicting the handwritten character stroke mask and the handwritten character quadrilateral mask and generating a preliminary erasure result;
and the fine erasure network is used for finely erasing the preliminary erasure result to generate a document image with the handwritten characters erased.
Preferably, the rough erasure network comprises: an encoder, a decoder, and a handwritten character mask prediction head;
the encoder is used for extracting document image features;
the decoder is used for decoding the document image features into the preliminary erasure result;
the handwritten character mask prediction head is used for predicting the handwritten character stroke mask and the handwritten character quadrilateral mask from the document image features.
Preferably, the fine erasure network adopts a U-net network structure.
Compared with the prior art, the application has the following beneficial effects:
The application provides a handwritten character erasing method based on a generative adversarial network, achieving automatic, high-quality erasure of handwritten characters. A deep learning network extracts features from the document image, the stroke regions of the handwritten characters to be erased are predicted for accurate localization, and a cascaded generative adversarial network performs the erasure. The method not only preserves the naturalness of the erased image but also effectively eliminates erasure traces, meeting the processing requirements of various kinds of handwriting. By combining deep learning with image processing, the method offers simple implementation, high speed, and a good erasing effect.
Drawings
In order to more clearly illustrate the technical solutions of the present application, the drawings that are needed in the embodiments are briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of the method of the present application;
FIG. 2 is a flow chart of generating a handwritten character stroke mask in accordance with an embodiment of the present application;
FIG. 3 is a schematic diagram of the overall architecture of a cascade generator according to an embodiment of the application;
FIG. 4 is a schematic diagram of a residual connection block structure according to an embodiment of the present application;
fig. 5 is a schematic diagram of the structure of a fine generator according to an embodiment of the present application;
Fig. 6 is a schematic diagram of a handwritten character erasing effect according to an embodiment of the application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In order that the above-recited objects, features and advantages of the present application will become more readily apparent, a more particular description of the application will be rendered by reference to the appended drawings and appended detailed description.
Fig. 1 shows the flow chart of the method of this embodiment, which comprises the following steps:
S1, collecting document images with handwritten characters, and manually labeling the documents to obtain a data set.
In this embodiment, 545 images with handwritten text are collected. The pixels of the handwritten-text regions in each document image are replaced with pixels of the document background using the stamp tool in Adobe Photoshop; note that the handwritten text must be erased precisely and the printed text must not be erased by mistake. This yields the erased document images. Meanwhile, the coordinate position of the handwritten text is annotated with a quadrilateral frame to obtain the handwritten-text quadrilateral mask, in which 1 denotes handwritten text and 0 denotes non-handwritten text. The dataset is then randomly divided into a training set of 430 images and a test set of 115 images.
S2, generating a handwritten character stroke mask based on the data set.
As shown in fig. 2, the difference between the original document image and the erased document image is computed and binarized with a threshold of 20, giving a noisy handwritten character stroke mask; erosion and dilation are then applied to this mask to filter out the noise, giving a denoised handwritten character stroke mask; finally, the denoised mask is shrunk inward to obtain the character skeleton and then smoothly expanded outward to obtain the character outer boundary, yielding a smooth handwritten character stroke mask.
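The mask-generation pipeline above can be sketched in a few lines of NumPy. This is a minimal illustration, not the patent's implementation: the helper names are invented, the morphology uses a fixed 3 × 3 square structuring element, and a real pipeline would typically rely on library routines such as OpenCV's `cv2.erode`/`cv2.dilate`.

```python
import numpy as np

def binary_dilate(mask, k=3):
    """Dilate a binary {0,1} mask with a k x k square structuring element."""
    pad = k // 2
    padded = np.pad(mask, pad)
    out = np.zeros_like(mask)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def binary_erode(mask, k=3):
    """Erosion is the complement of dilating the complement."""
    return 1 - binary_dilate(1 - mask, k)

def stroke_mask(original, erased, thresh=20):
    """Difference -> threshold -> erode/dilate (denoise) -> shrink inward -> expand outward."""
    diff = np.abs(original.astype(np.int32) - erased.astype(np.int32))
    mask = (diff > thresh).astype(np.uint8)       # threshold binarization at 20
    mask = binary_dilate(binary_erode(mask))      # opening removes speckle noise
    skeleton = binary_erode(mask)                 # shrink inward toward the stroke skeleton
    return binary_dilate(skeleton)                # expand outward to a smooth outer boundary
```

In practice the inward/outward steps may iterate several times; a single pass is shown here for clarity.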
S3, preprocessing the document image to obtain processed data.
Because the images have many pixels, feeding a whole image into the network for training would consume a large amount of memory, so each complete document image is cut into image blocks of 512 × 512 pixels. In total, the 430 training-set images are cut into 4995 image blocks. To increase sample diversity, this embodiment randomly rotates each image block within ±10° and horizontally flips it with probability 0.5.
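The patch extraction and augmentation above can be sketched with NumPy. The function names are illustrative; edge remainders are simply dropped in this sketch, and the ±10° rotation is only noted in a comment, since it would normally be delegated to a library routine such as `scipy.ndimage.rotate`.

```python
import numpy as np

def extract_patches(image, size=512):
    """Cut a document image into non-overlapping size x size blocks
    (edge remainders are dropped in this sketch)."""
    h, w = image.shape[:2]
    return [image[y:y + size, x:x + size]
            for y in range(0, h - size + 1, size)
            for x in range(0, w - size + 1, size)]

def augment(patch, rng):
    """Horizontally flip with probability 0.5; a random rotation within
    +/-10 degrees would additionally be applied, e.g. via scipy.ndimage.rotate."""
    if rng.random() < 0.5:
        patch = patch[:, ::-1]
    return patch
```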
S4, building a handwritten character erasing model based on a generated countermeasure network according to the handwritten character stroke mask.
The handwritten character erasure model comprises a cascade generator and a discriminator. The cascade generator generates the document image with the handwritten characters erased; the discriminator judges authenticity, i.e., whether a given document image was produced by the generator.
A cascade generator:
The structure of the cascade generator is shown in fig. 3. It comprises a rough generator and a fine generator. The rough generator predicts the handwritten character stroke mask and the handwritten character quadrilateral mask while preliminarily erasing the handwritten characters in the document image to obtain a preliminary erasure result; the fine generator then erases the preliminary erasure result more finely according to the predicted handwritten character stroke mask, obtaining a document image with the handwritten characters erased.
The rough generator includes an encoder, a decoder, and a handwritten character mask prediction head. The encoder extracts document image features using a convolutional neural network; the decoder decodes the document image features into the preliminary erasure result; the handwritten character mask prediction head decodes the handwritten character stroke mask and the handwritten character quadrilateral mask from the document image features.
The encoder consists of a 3-layer convolutional neural network and 8 residual connection blocks. Each convolutional layer is followed by a batch normalization layer and a ReLU activation layer; except for the first convolutional layer, whose kernel is 7 × 7, all kernels are 3 × 3. The output channel numbers of the convolutional layers are 32, 32, and 64, and their strides are 2, 1, and 2, respectively. As shown in fig. 4, each residual connection block consists of a residual branch with one 1 × 1 convolution and a non-residual branch with two 3 × 3 convolutions. The output channel numbers of the residual connection blocks are 64, 64, 128, 128, 256, 256, 512, and 512; the 3rd, 5th, and 7th residual connection blocks use stride 2, each downsampling the input feature map by a factor of 2, while the remaining blocks use stride 1. The encoder's final output is a feature map downsampled by a factor of 32 with 512 channels.
The decoder and the handwritten character prediction head each consist of a 5-layer deconvolution network. Each layer comprises a deconvolution layer, a batch normalization layer, and a ReLU activation layer; the deconvolution kernels are 3 × 3 with stride 2. Each deconvolution layer doubles the feature resolution, so after 5 layers the output resolution is 32 times that of the input features, equal to the resolution of the original document image. The output channel numbers of the first 4 deconvolution layers of both the decoder and the prediction head are 256, 128, 64, and 32. The 5th deconvolution layer of the decoder has 3 output channels, producing the preliminarily erased image; the 5th deconvolution layer of the prediction head has 2 output channels, producing the handwritten character stroke mask and the handwritten character quadrilateral mask.
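The stated resolutions can be sanity-checked from the strides alone: the encoder's stride-2 layers multiply to a 32× downsampling, and the five stride-2 deconvolution layers upsample by 2^5 = 32, restoring the original resolution. A small illustrative sketch (not part of the patent):

```python
# Encoder strides: the three stem convolutions use strides (2, 1, 2); of the
# eight residual blocks, the 3rd, 5th and 7th use stride 2, the rest stride 1.
encoder_strides = [2, 1, 2] + [2 if i in (3, 5, 7) else 1 for i in range(1, 9)]

def total_factor(strides):
    """Cumulative downsampling factor of a stack of strided layers."""
    factor = 1
    for s in strides:
        factor *= s
    return factor

# Five stride-2 deconvolution layers upsample by 2**5, undoing the encoder.
decoder_upsample = 2 ** 5
```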
Since the rough generator may still leave part of the handwritten text unerased, the fine generator performs further erasure. Its input comprises the preliminary erasure result and the predicted handwritten character stroke mask, which guides precise erasure of the preliminarily erased image. As shown in fig. 5, the fine generator adopts a U-net structure comprising a fine encoder and a fine decoder with skip connections between them: the features output by each layer of the fine encoder are fed into the corresponding layer of the fine decoder. The fine encoder consists of a 6-layer convolutional neural network and a dilated convolution block. Each of the 6 convolutional layers is followed by a batch normalization layer and a ReLU activation layer; except for the first layer, whose kernel is 7 × 7, all kernels are 3 × 3. The output channel numbers of the convolutional layers are 32, 64, 64, 128, 128, and 128; the 2nd and 4th layers use stride 2 and the rest use stride 1. The dilated convolution block stacks 4 dilated convolutions, each with 128 output channels and 3 × 3 kernels, with dilation rates of 2, 4, 8, and 16, respectively. The dilated convolution block enlarges the receptive field of the convolution kernel and captures context at multiple scales. The fine encoder's final output is a feature map downsampled by a factor of 4 with 128 channels.
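The receptive-field claim for the dilated block can be made concrete. For stride-1 convolutions, each layer adds (k − 1) · d pixels of receptive field, so 3 × 3 kernels with dilations 2, 4, 8, and 16 grow a single pixel's view to 61 feature-map pixels. A rough sketch of this arithmetic (illustrative, not from the patent text):

```python
def receptive_field(kernel=3, dilations=(2, 4, 8, 16)):
    """Receptive field after a stack of stride-1 dilated convolutions,
    starting from a single feature-map pixel."""
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d  # each stride-1 layer adds (k-1)*dilation
    return rf
```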
The fine decoder consists of 4 convolutional layers and 2 deconvolution layers: layers 1, 2, 4, and 6 are convolutional and layers 3 and 5 are deconvolutional. All kernels are 3 × 3; the convolutional layers use stride 1 and the deconvolution layers use stride 2. The output channel numbers of the fine decoder's layers are 128, 128, 64, 64, 32, and 3. The 3-channel image output by the last layer is the network's final document image with the handwritten characters erased.
A discriminator:
The discriminator consists of a global feature encoder, a local feature encoder, and a linear regression layer. The global feature encoder extracts global features of the whole image; the local feature encoder extracts local features of the handwritten-text region; the linear regression layer concatenates the global and local features and, combining the two, judges whether the image is network-generated or real.
The global feature encoder and the local feature encoder share the same structure: each consists of 6 downsampling convolutions with 4 × 4 kernels and stride 2, with feature dimensions of [64, 128, 256, 256, 256, 256], downsampling the input image by a factor of 64. Their inputs differ: the global feature encoder takes the document image, while the local feature encoder takes the image obtained by multiplying the handwritten-text quadrilateral mask with the document image.
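The two discriminator inputs can be formed with a single broadcasted multiplication. A minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def discriminator_inputs(image, quad_mask):
    """Global branch sees the whole document image; the local branch sees the
    image restricted to the handwritten-text region by the quadrilateral
    mask (1 = handwritten text, 0 = background)."""
    local = image * quad_mask[..., None]  # broadcast the H x W mask over channels
    return image, local
```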
By extracting the global features and the local features of the image, the discriminator can pay attention to the details of the whole image and the handwritten text area at the same time, so that the generated image is natural as a whole and the detail textures are finer.
S5, training the handwritten character erasing model by using the processed data to obtain a final model.
The processed data are input into the handwritten character erasure model, which is trained for 100 epochs with the Adam optimizer at a learning rate of 0.0001, β = (0.5, 0.9), and a batch size of 4.
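For reference, one Adam update with the hyperparameters stated above (learning rate 0.0001, β = (0.5, 0.9)) looks as follows in NumPy. This is the standard Adam rule, not code from the patent:

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.5, beta2=0.9, eps=1e-8):
    """One Adam update with the hyperparameters used in this embodiment."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment (variance) estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```

In practice the generator and discriminator would each maintain their own optimizer state and alternate updates, as is usual for adversarial training.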
S6, erasing the handwritten character part in the document image by utilizing the final model.
The document image is input into the trained handwritten character erasure model to obtain a document image with the handwritten characters erased; the result is shown in fig. 6. The results of comparative experiments between this embodiment and other methods are shown in table 1.
TABLE 1
Method     PSNR   MSSIM  MSE   AGE
Pix2Pix    28.99  89.54  0.16  3.89
MTRNet++   32.77  92.64  0.08  2.51
EraseNet   33.84  93.69  0.07  2.55
EnsNet     33.87  94.93  0.07  2.25
Ours       36.05  96.59  0.05  1.43
The above embodiments merely illustrate preferred embodiments of the present application and do not limit its scope; various modifications and improvements made by those skilled in the art without departing from the spirit of the present application fall within the scope defined by the appended claims.

Claims (3)

1. A handwritten character erasing method based on a generative adversarial network, characterized by comprising the following steps:
collecting document images with handwritten characters, and manually annotating the documents to obtain a dataset; the method of manual annotation comprises the following steps: erasing handwritten characters in the document image with the Adobe Photoshop tool to obtain an erased document image; and marking the coordinates of the handwritten characters in the document image with a quadrilateral frame to obtain a handwritten character quadrilateral mask;
generating a handwritten text stroke mask based on the dataset;
preprocessing the document image to obtain processed data;
building a handwritten character erasure model based on a generative adversarial network according to the handwritten character stroke mask; the constructed handwritten character erasure model comprises: a cascade generator and a discriminator;
The cascade generator is used for generating the erased document image;
the cascade generator includes: a coarse erase network and a fine erase network;
the rough erasure network is used for predicting the handwritten character stroke mask and the handwritten character quadrilateral mask and generating a preliminary erasure result; the rough erasure network comprises: an encoder, a decoder, and a handwritten character mask prediction head; wherein the encoder is used for extracting document image features; the encoder consists of a 3-layer convolutional neural network and 8 residual connection blocks, wherein each convolutional layer is followed by a batch normalization layer and a ReLU activation layer; except for the first convolutional layer, whose kernel is 7 × 7, all kernels are 3 × 3; the output channel numbers of the convolutional layers are 32, 32, and 64, and their strides are 2, 1, and 2, respectively; each residual connection block consists of a residual branch with one 1 × 1 convolution and a non-residual branch with two 3 × 3 convolutions; the output channel numbers of the residual connection blocks are 64, 64, 128, 128, 256, 256, 512, and 512, wherein the 3rd, 5th, and 7th residual connection blocks use stride 2, each downsampling the input feature map by a factor of 2, while the remaining residual connection blocks use stride 1;
the decoder is used for decoding the document image characteristics into the preliminary erasure result;
the handwritten character mask prediction head is used for predicting the handwritten character stroke mask and the handwritten character quadrilateral mask from the document image features; the decoder and the handwritten character prediction head each consist of a 5-layer deconvolution network; each layer comprises a deconvolution layer, a batch normalization layer, and a ReLU activation layer, with 3 × 3 kernels and stride 2; each deconvolution layer doubles the feature resolution, so after 5 layers the output resolution is 32 times that of the input features, equal to the resolution of the original document image; the output channel numbers of the first 4 deconvolution layers of both the decoder and the prediction head are 256, 128, 64, and 32; the 5th deconvolution layer of the decoder has 3 output channels, producing the preliminarily erased image, and the 5th deconvolution layer of the prediction head has 2 output channels, producing the handwritten character stroke mask and the handwritten character quadrilateral mask;
the fine erasure network is used for finely erasing the preliminary erasure result to generate a document image with the handwritten characters erased;
The fine erasing network adopts a U-net network structure; the fine erasing network includes a fine encoder and a fine decoder, with skip connections between them: the features output by each layer of the fine encoder are fed into the corresponding layer of the fine decoder; the fine encoder consists of a 6-layer convolutional neural network followed by a dilated convolution block; each of the 6 convolutional layers is composed of a convolution layer, a batch normalization layer, and a ReLU activation layer; the first convolutional layer has a 7×7 kernel, and the remaining layers have 3×3 kernels; the numbers of output channels of the convolutional layers are 32, 64, 64, 128, 128, and 128, respectively; the 2nd and 4th convolutional layers use a kernel stride of 2, and the remaining layers use a stride of 1; the dilated convolution block is formed by stacking 4 dilated convolutions, each with 128 output channels, a 3×3 kernel, and dilation rates of 2, 4, 8, and 16, respectively; the dilated convolution block enlarges the receptive field of the convolution kernel and captures multi-scale context information; the final output of the fine encoder is a feature map downsampled to 1/4 of the input resolution, with 128 channels; the fine decoder consists of 4 convolutional layers and 2 deconvolution layers, where layers 1, 2, 4, and 6 are convolutional layers and layers 3 and 5 are deconvolution layers; the kernels of both the convolutional and deconvolution layers are 3×3, the convolutional layers use a stride of 1, and the deconvolution layers use a stride of 2; the numbers of output channels of the decoder layers are 128, 128, 64, 64, 32, and 3, respectively; the 3-channel image output by the last layer is the document image finally output by the network after the handwritten characters are erased;
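The claim above states that the dilated convolution block enlarges the receptive field. As a rough sanity check (not part of the patent), the following pure-Python sketch computes the theoretical receptive field of the fine encoder from the kernel sizes, strides, and dilation rates listed in the claim:

```python
def receptive_field(layers):
    """Theoretical receptive field of a stack of convolutions.

    layers: list of (kernel_size, stride, dilation) tuples, in
    input-to-output order.  Uses the standard recurrence
    rf += (k - 1) * d * jump, where jump is the cumulative stride.
    """
    rf, jump = 1, 1
    for k, s, d in layers:
        rf += (k - 1) * d * jump
        jump *= s
    return rf

# Six encoder layers: a 7x7 kernel, then 3x3 kernels; layers 2 and 4 use stride 2.
encoder = [(7, 1, 1), (3, 2, 1), (3, 1, 1), (3, 2, 1), (3, 1, 1), (3, 1, 1)]
# Dilated block: four 3x3 convolutions with dilation rates 2, 4, 8, 16.
dilated_block = [(3, 1, d) for d in (2, 4, 8, 16)]

print(receptive_field(encoder))                  # 33 without the dilated block
print(receptive_field(encoder + dilated_block))  # 273 with it
```

With the four dilated convolutions the theoretical receptive field grows from 33×33 to 273×273 input pixels, which is what allows the block to capture multi-scale context at 1/4 resolution.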
the discriminator is used for discriminating whether an image is real or fake; the discriminator consists of a global feature encoder, a local feature encoder, and a linear regression layer; the global feature encoder is used for extracting global features of the whole image; the local feature encoder is used for extracting local features of the handwritten character regions; the linear regression layer concatenates the global features and the local features and, combining the two kinds of features, comprehensively judges whether the image is generated by the network or is a real image;
The global feature encoder and the local feature encoder have the same structure: each consists of 6 downsampling convolutions with 4×4 kernels and a stride of 2, and output feature dimensions of [64, 128, 256, 256, 256, 256]; their input images differ: the input of the global feature encoder is the document image, while the input of the local feature encoder is the image obtained by multiplying the handwritten-text quadrilateral mask with the document image;
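For concreteness, the six stride-2 convolutions of each discriminator encoder reduce a 512×512 training patch (the patch size used in claim 3) to an 8×8 feature map. The sketch below assumes a padding of 1; the claim specifies only the 4×4 kernel and stride 2, so the padding value is an assumption:

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    # Standard convolution output-size formula.
    return (size + 2 * padding - kernel) // stride + 1

size = 512
for _ in range(6):      # six downsampling convolutions
    size = conv_out(size)
print(size)             # 8: each encoder ends in an 8x8 map of 256 channels
```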
Training the handwritten character erasure model by using the processed data to obtain a final model;
and erasing the handwritten character part in the document image by using the final model.
2. The handwritten character erasing method based on a generative adversarial network of claim 1, wherein the method of generating the handwritten character stroke mask includes: computing the difference between the original document image and the erased document image, then performing threshold binarization, erosion and dilation to eliminate noise, shrinking inward to obtain the character skeleton, and smoothly expanding outward to obtain the character outer boundary, thereby automatically generating the handwritten character stroke mask.
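A minimal NumPy sketch of this mask-generation pipeline. This is a simplified stand-in, not the patent's implementation: the threshold value of 25 and the 4-neighborhood cross used for erosion/dilation are assumptions of this sketch.

```python
import numpy as np

def binary_dilate(mask, iterations=1):
    # Dilation with a 4-neighborhood cross: a pixel turns on if any neighbor is on.
    for _ in range(iterations):
        p = np.pad(mask, 1)
        mask = p[:-2, 1:-1] | p[2:, 1:-1] | p[1:-1, :-2] | p[1:-1, 2:] | p[1:-1, 1:-1]
    return mask

def binary_erode(mask, iterations=1):
    # Erosion: a pixel stays on only if itself and all 4 neighbors are on.
    for _ in range(iterations):
        p = np.pad(mask, 1, constant_values=1)
        mask = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:] & p[1:-1, 1:-1]
    return mask

def stroke_mask(original, erased, thresh=25):
    # Difference between the original and the erased document image.
    diff = np.abs(original.astype(np.int16) - erased.astype(np.int16))
    mask = (diff > thresh).astype(np.uint8)          # threshold binarization
    mask = binary_dilate(binary_erode(mask))         # opening: remove speckle noise
    return binary_dilate(mask)                       # expand outward to the outer boundary
```

A real pipeline would typically use library morphology (e.g. OpenCV's `cv2.erode`/`cv2.dilate` with tuned kernel sizes) rather than these hand-rolled helpers.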
3. The handwritten character erasing method based on a generative adversarial network of claim 1, wherein the method of obtaining the processed data includes: cutting the complete document image into a plurality of image blocks of 512×512 pixels, randomly rotating the image blocks within a range of ±10 degrees, and horizontally flipping the image blocks with a probability of 0.5 to obtain the processed data.
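The cropping and flipping steps can be sketched as follows. This is a NumPy-only illustration, not the patent's code: the ±10° random rotation is omitted because it requires an interpolating image library (e.g. Pillow's `Image.rotate` would be one option).

```python
import numpy as np

def make_patches(img, size=512, rng=None):
    """Cut a document image into non-overlapping size×size patches,
    horizontally flipping each patch with probability 0.5."""
    if rng is None:
        rng = np.random.default_rng(0)
    h, w = img.shape[:2]
    patches = []
    for y in range(0, h - size + 1, size):
        for x in range(0, w - size + 1, size):
            patch = img[y:y + size, x:x + size]
            if rng.random() < 0.5:
                patch = patch[:, ::-1]   # horizontal flip
            patches.append(patch)
    return patches
```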
CN202311039086.4A 2023-08-17 2023-08-17 Handwritten character erasing method based on generation countermeasure network Active CN117132994B (en)

Publications (2)

Publication Number Publication Date
CN117132994A 2023-11-28
CN117132994B 2024-07-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant