CN117132994B - Handwritten text erasing method based on a generative adversarial network - Google Patents
Handwritten text erasing method based on a generative adversarial network
- Publication number
- CN117132994B (application CN202311039086.4A)
- Authority
- CN
- China
- Legal status
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/191—Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06V30/19147—Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/094—Adversarial learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/16—Image preprocessing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/199—Arrangements for recognition using optical reference masks, e.g. holographic masks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application discloses a handwritten text erasing method based on a generative adversarial network, comprising the following steps: collecting document images containing handwritten text and manually annotating them to obtain a dataset; generating a handwritten text stroke mask based on the dataset; preprocessing the document images to obtain processed data; building a handwritten text erasure model based on a generative adversarial network according to the handwritten text stroke mask; training the handwritten text erasure model with the processed data to obtain a final model; and erasing the handwritten text in a document image using the final model. The application uses a deep learning network to extract document image features, predicts an accurate stroke-level region for the handwritten text to be erased, and erases the handwritten text with a cascaded generative adversarial network; the method is simple to implement, fast, and produces a clean erasure result.
Description
Technical Field
The application relates to the field of image processing, and in particular to a handwritten text erasing method based on a generative adversarial network.
Background
Handwritten text plays an important role in modern society and is widely used in fields including document editing, signatures, and artistic creation. However, handwritten text sometimes has to be modified or erased to meet specific needs or to correct errors. Manually erasing handwritten text in images with editing tools such as Adobe Photoshop is inefficient and demands considerable skill from the user, so an efficient and accurate automatic handwritten text erasing method is needed.
When erasing handwritten text, nothing else should be erased by mistake; yet document images contain a great deal of printed text, and handwritten text often lies close to it, so erasing the handwritten text accurately is a challenging problem. Existing methods struggle to erase handwritten text from documents with complex layouts, and erasure faces further problems such as the naturalness of the erased image and the removal of erasure traces. An innovative handwritten text erasing method that effectively addresses these problems is therefore needed. With the rapid development of deep learning, generative adversarial networks (GANs) have demonstrated powerful capability in the field of image processing. A GAN consists of a generator and a discriminator: through adversarial training, the generator gradually learns to produce realistic data, while the discriminator continuously improves its ability to distinguish real data from generated data. This technique has achieved remarkable results in image generation, translation, and inpainting.
Disclosure of Invention
The application aims to provide a handwritten text erasing method based on a generative adversarial network that automatically removes handwritten marks from a document image to obtain a clean document image.
To achieve the above object, the present application provides a handwritten text erasing method based on a generative adversarial network, comprising the following steps:
collecting a document image with handwritten characters, and manually marking the document to obtain a data set;
generating a handwritten text stroke mask based on the dataset;
preprocessing the document image to obtain processed data;
building a handwritten text erasure model based on a generative adversarial network according to the handwritten text stroke mask;
Training the handwritten character erasure model by using the processed data to obtain a final model;
and erasing the handwritten character part in the document image by using the final model.
Preferably, the method for performing the manual labeling comprises the following steps: erasing handwritten characters in the document image by using an Adobe Photoshop tool to obtain an erased document image; and marking the coordinates of the handwritten characters in the document image by using a quadrilateral frame to obtain a quadrilateral mask of the handwritten characters.
Preferably, the method for generating the handwritten text stroke mask comprises: taking the difference between the original document image and the erased document image, then applying threshold binarization, erosion and dilation to remove noise, inward shrinking to obtain the character skeleton, and outward smooth expansion to obtain the character outer boundary, thereby automatically generating the handwritten text stroke mask.
Preferably, the method for obtaining the processed data comprises: cutting the complete document image into image blocks of 512 × 512 pixels, randomly rotating the blocks within ±10°, and horizontally flipping them with probability 0.5 to obtain the processed data.
Preferably, the constructed handwritten text erasure model comprises: a cascade generator and a discriminator;
The cascade generator is used for generating the erased document image;
the discriminator is used for discriminating the true and false of the image.
Preferably, the cascade generator includes: a coarse erase network and a fine erase network;
the rough erasing network is used for predicting the handwritten character stroke mask and the handwritten character quadrilateral mask and generating a preliminary erasing result;
and the fine erasure network is used for refining the preliminary erasure result to generate a document image with the handwritten text erased.
Preferably, the rough erasure network comprises: an encoder, a decoder, and a handwritten text mask prediction head;
The encoder is used for extracting the characteristics of the document image;
the decoder is used for decoding the document image characteristics into the preliminary erasure result;
The handwritten character mask prediction head is used for predicting the handwritten character stroke mask and the handwritten character quadrilateral mask from the document image features.
Preferably, the fine erasure network adopts a U-net network structure.
Compared with the prior art, the application has the following beneficial effects:
The application provides a handwritten text erasing method that achieves automatic, high-quality erasure using a generative adversarial network. A deep learning network extracts document image features and predicts an accurate stroke-level localization of the handwritten text to be erased, and a cascaded generative adversarial network performs the erasure. The method not only preserves the naturalness of the erased image but also effectively removes erasure traces, meeting the processing requirements of various kinds of handwriting. By combining deep learning with image processing, it is simple to implement, fast, and produces a clean erasure result.
Drawings
In order to more clearly illustrate the technical solutions of the present application, the drawings that are needed in the embodiments are briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of the method of the present application;
FIG. 2 is a flow chart of generating a handwritten character stroke mask in accordance with an embodiment of the present application;
FIG. 3 is a schematic diagram of the overall architecture of a cascade generator according to an embodiment of the application;
FIG. 4 is a schematic diagram of a residual connection block structure according to an embodiment of the present application;
fig. 5 is a schematic diagram of the structure of a fine generator according to an embodiment of the present application;
Fig. 6 is a schematic diagram of a handwritten character erasing effect according to an embodiment of the application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In order that the above-recited objects, features and advantages of the present application will become more readily apparent, a more particular description of the application will be rendered by reference to the appended drawings and appended detailed description.
As shown in fig. 1, a flow chart of a method of the present embodiment includes the steps of:
S1, collecting document images with handwritten characters, and manually labeling the documents to obtain a data set.
In this embodiment, 545 images containing handwritten text are collected. The stamp tool in Adobe Photoshop is used to replace the pixels of each handwritten region in the document image with pixels of the document background; note that the handwritten text must be erased precisely without erroneously erasing printed text, yielding the erased document image. Meanwhile, the coordinates of the handwritten text are annotated with quadrilateral boxes to obtain the handwritten text quadrilateral mask, in which 1 denotes handwritten text and 0 denotes non-handwritten regions. The dataset is then randomly split into a training set of 430 images and a test set of 115 images.
S2, generating a handwritten character stroke mask based on the data set.
As shown in fig. 2, the original document image and the erased document image are differenced and binarized with a threshold of 20, giving a noisy handwritten text stroke mask; erosion and dilation are then applied to filter out the noise; finally, the denoised mask is shrunk inward to obtain the character skeleton and smoothly expanded outward to obtain the character outer boundary, yielding a smooth handwritten text stroke mask.
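As an illustrative sketch of the pipeline just described, the NumPy code below differences the two images, binarizes with the threshold of 20, and applies a naive square-kernel erosion and dilation in place of whatever morphology implementation (e.g. OpenCV) the embodiment actually uses; the skeleton/boundary refinement steps are omitted.

```python
import numpy as np

def binary_dilate(mask, k=3):
    """Naive binary dilation with a k x k square structuring element."""
    pad = k // 2
    padded = np.pad(mask, pad, mode="constant")
    out = np.zeros_like(mask)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def binary_erode(mask, k=3):
    """Erosion via the complement of a dilated complement."""
    return 1 - binary_dilate(1 - mask, k)

def stroke_mask(original, erased, thresh=20):
    """Difference -> threshold binarization -> open (erode then dilate)
    to drop isolated noise, as in the embodiment above."""
    diff = np.abs(original.astype(np.int32) - erased.astype(np.int32))
    mask = (diff > thresh).astype(np.uint8)
    mask = binary_erode(mask)   # removes small noise specks
    mask = binary_dilate(mask)  # restores a smooth outer boundary
    return mask
```

A real implementation would replace the Python loops with `cv2.erode`/`cv2.dilate`, but the morphological behavior is the same.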
S3, preprocessing the document image to obtain processed data.
Because the images have many pixels, feeding a whole image into the network for training would consume a large amount of memory, so each image is cut into blocks of 512 × 512 pixels; the 430 training-set images yield 4995 image blocks. To increase sample diversity, this embodiment randomly rotates each block within ±10° and horizontally flips it with probability 0.5.
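The block-cutting and flipping steps can be sketched as follows. The non-overlapping block layout (edges dropped) is an assumption, and the ±10° rotation is omitted since it requires an interpolating rotate such as `scipy.ndimage.rotate`.

```python
import numpy as np

def make_patches(img, size=512):
    """Cut a document image into non-overlapping size x size blocks
    (incomplete edge blocks are dropped in this sketch)."""
    h, w = img.shape[:2]
    return [img[y:y + size, x:x + size]
            for y in range(0, h - size + 1, size)
            for x in range(0, w - size + 1, size)]

def augment(patch, rng):
    """Horizontal flip with probability 0.5. The +/-10 degree random
    rotation from the embodiment would be applied here as well."""
    if rng.random() < 0.5:
        patch = patch[:, ::-1]
    return patch
```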
S4, building a handwritten text erasure model based on a generative adversarial network according to the handwritten text stroke mask.
The handwritten text erasure model comprises a cascade generator and a discriminator. The cascade generator generates the document image with handwritten text erased; the discriminator judges whether a given document image is real or produced by the generator.
A cascade generator:
The structure of the cascade generator is shown in fig. 3. It comprises a coarse generator and a fine generator. The rough generator is used for predicting the handwriting stroke mask and the handwriting quadrilateral mask, and meanwhile, the handwriting in the document image is preliminarily erased to obtain a preliminary erasing result; and the fine generator performs finer erasure on the preliminary erasure result according to the predicted handwritten character stroke mask, and obtains a document image after handwritten characters are erased.
The rough generator comprises an encoder, a decoder, and a handwritten text mask prediction head. The encoder extracts document image features with a convolutional neural network; the decoder decodes the features into the preliminary erasure result; the mask prediction head decodes the handwritten text stroke mask and the handwritten text quadrilateral mask from the features.
The encoder consists of 3 convolutional layers and 8 residual connection blocks. Each convolutional layer comprises a convolution, a batch normalization layer, and a ReLU activation; the first layer uses a 7×7 kernel and the rest use 3×3 kernels, with output channel numbers 32, 32, 64 and strides 2, 1, 2 respectively. As shown in fig. 4, each residual connection block consists of a residual branch with one 1×1 convolution and a main branch with two 3×3 convolutions. The output channel numbers of the 8 blocks are 64, 64, 128, 128, 256, 256, 512, 512; the 3rd, 5th and 7th blocks use stride 2, each downsampling the feature map by a factor of 2, while the remaining blocks use stride 1. The encoder finally outputs a feature map downsampled 32× with 512 channels.
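A PyTorch sketch consistent with the layer counts, kernel sizes, channels, and strides above; padding choices and the exact placement of normalization inside the residual block are assumptions, since the patent does not specify them.

```python
import torch
import torch.nn as nn

def conv_bn_relu(cin, cout, k, stride):
    """Conv + batch normalization + ReLU, as each encoder layer is described."""
    return nn.Sequential(
        nn.Conv2d(cin, cout, k, stride, padding=k // 2),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class ResBlock(nn.Module):
    """Residual connection block: 1x1 conv shortcut + two 3x3 convs."""
    def __init__(self, cin, cout, stride=1):
        super().__init__()
        self.shortcut = nn.Conv2d(cin, cout, 1, stride)
        self.body = nn.Sequential(
            conv_bn_relu(cin, cout, 3, stride),
            nn.Conv2d(cout, cout, 3, 1, padding=1),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))

class CoarseEncoder(nn.Module):
    """3 conv layers (strides 2, 1, 2) + 8 residual blocks;
    blocks 3, 5 and 7 downsample, for 32x total downsampling."""
    def __init__(self):
        super().__init__()
        chans = [64, 64, 128, 128, 256, 256, 512, 512]
        strides = [1, 1, 2, 1, 2, 1, 2, 1]
        layers = [conv_bn_relu(3, 32, 7, 2),
                  conv_bn_relu(32, 32, 3, 1),
                  conv_bn_relu(32, 64, 3, 2)]
        cin = 64
        for c, s in zip(chans, strides):
            layers.append(ResBlock(cin, c, s))
            cin = c
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```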
The decoder and the handwritten text mask prediction head each consist of 5 deconvolution layers. Each deconvolution layer comprises a transposed convolution (3×3 kernel, stride 2), a batch normalization layer, and a ReLU activation. Each layer doubles the spatial resolution, so after 5 layers the output resolution is 32 times that of the input features, equal to that of the original document image. The output channel numbers of the first 4 layers are 256, 128, 64, 32 for both the decoder and the prediction head; the 5th layer of the decoder outputs 3 channels, namely the preliminarily erased image, while the 5th layer of the prediction head outputs 2 channels, namely the handwritten text stroke mask and the handwritten text quadrilateral mask.
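The shared 5-layer deconvolution shape of the decoder and the mask prediction head can be sketched as below. The `padding=1, output_padding=1` choice that makes a 3×3 stride-2 transposed convolution exactly double the resolution is an assumption, as is omitting an activation on the final output layer.

```python
import torch
import torch.nn as nn

def deconv_bn_relu(cin, cout, last=False):
    """Transposed conv (k=3, s=2) that exactly doubles spatial resolution."""
    layers = [nn.ConvTranspose2d(cin, cout, 3, stride=2,
                                 padding=1, output_padding=1)]
    if not last:
        layers += [nn.BatchNorm2d(cout), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class DecoderHead(nn.Module):
    """5 deconv layers with channels 256, 128, 64, 32, then out_ch:
    out_ch=3 for the image decoder, out_ch=2 for the mask prediction head
    (stroke mask + quadrilateral mask)."""
    def __init__(self, out_ch):
        super().__init__()
        chans = [512, 256, 128, 64, 32]
        self.net = nn.Sequential(
            *[deconv_bn_relu(chans[i], chans[i + 1]) for i in range(4)],
            deconv_bn_relu(32, out_ch, last=True),
        )

    def forward(self, x):
        return self.net(x)
```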
Since the rough generator may still leave part of the handwritten text unerased, the fine generator performs further erasure. Its inputs are the preliminary erasure result and the predicted handwritten text stroke mask, which guides precise erasure of the remaining strokes. As shown in fig. 5, the fine generator adopts a U-net structure comprising a fine encoder and a fine decoder with skip connections between them: the features output by each encoder layer are fed into the corresponding decoder layer. The fine encoder consists of 6 convolutional layers and a dilated convolution block. Each convolutional layer comprises a convolution, a batch normalization layer, and a ReLU activation; the first layer uses a 7×7 kernel and the rest 3×3, with output channel numbers 32, 64, 64, 128, 128, 128; the 2nd and 4th layers use stride 2 and the rest stride 1. The dilated convolution block stacks 4 dilated convolutions, each with 128 output channels, a 3×3 kernel, and dilation rates 2, 4, 8, 16 respectively; it enlarges the receptive field and captures multi-scale context. The fine encoder finally outputs a feature map downsampled 4× with 128 channels.
The fine decoder consists of 4 convolutional layers and 2 deconvolution layers: layers 1, 2, 4 and 6 are convolutions and layers 3 and 5 are deconvolutions, all with 3×3 kernels; the convolutions use stride 1 and the deconvolutions stride 2. The output channel numbers of the fine decoder layers are 128, 128, 64, 64, 32, 3. The 3-channel image output by the last layer is the final document image with handwritten text erased.
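The dilated convolution block at the heart of the fine encoder can be sketched as follows; ReLU placement is an assumption, since the patent specifies only the channel count, kernel size, and dilation rates.

```python
import torch
import torch.nn as nn

class DilatedBlock(nn.Module):
    """4 stacked 3x3 dilated convs (dilation 2, 4, 8, 16; 128 channels each).
    Setting padding equal to the dilation keeps the spatial size unchanged
    while enlarging the receptive field, capturing multi-scale context."""
    def __init__(self, ch=128):
        super().__init__()
        self.net = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(ch, ch, 3, padding=d, dilation=d),
                          nn.ReLU(inplace=True))
            for d in (2, 4, 8, 16)
        ])

    def forward(self, x):
        return self.net(x)
```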
A discriminator:
The discriminator consists of a global feature encoder, a local feature encoder, and a linear regression layer. The global feature encoder extracts global features of the whole image; the local feature encoder extracts local features of the handwritten text regions; the linear regression layer concatenates the two feature sets and judges from both whether the image is generated by the network or real.
The global and local feature encoders share the same structure: 6 downsampling convolutions with 4×4 kernels, stride 2, and output channel numbers 64, 128, 256, 256, 256, 256, downsampling the input image by a factor of 64. Their inputs differ: the global encoder receives the document image, while the local encoder receives the document image multiplied by the handwritten text quadrilateral mask.
By extracting the global features and the local features of the image, the discriminator can pay attention to the details of the whole image and the handwritten text area at the same time, so that the generated image is natural as a whole and the detail textures are finer.
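A sketch of this two-branch discriminator is given below; the activation function and the exact form of the final scoring layer are assumptions, since the patent specifies only the convolution shapes and the linear regression layer.

```python
import torch
import torch.nn as nn

class FeatEncoder(nn.Module):
    """6 stride-2 4x4 convs -> 64x spatial downsampling, channels
    64, 128, 256, 256, 256, 256 as described above."""
    def __init__(self):
        super().__init__()
        chans = [3, 64, 128, 256, 256, 256, 256]
        self.net = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(chans[i], chans[i + 1], 4, 2, padding=1),
                          nn.LeakyReLU(0.2, inplace=True))
            for i in range(6)
        ])

    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    """Global branch sees the whole image; local branch sees the image
    multiplied by the quadrilateral mask; a linear layer scores the
    concatenated features (real vs. generated)."""
    def __init__(self, img_size=512):
        super().__init__()
        self.global_enc = FeatEncoder()
        self.local_enc = FeatEncoder()
        feat = 256 * (img_size // 64) ** 2
        self.fc = nn.Linear(2 * feat, 1)

    def forward(self, img, quad_mask):
        g = self.global_enc(img).flatten(1)
        l = self.local_enc(img * quad_mask).flatten(1)
        return self.fc(torch.cat([g, l], dim=1))
```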
S5, training the handwritten character erasing model by using the processed data to obtain a final model.
The processed data is input into the handwritten text erasure model, which is trained for 100 epochs with an Adam optimizer (learning rate 0.0001, β = (0.5, 0.9)) and a batch size of 4.
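The optimizer settings and one alternating GAN update can be sketched as follows. The patent does not publish its loss functions, so plain BCE adversarial plus L1 reconstruction losses are assumed here purely for illustration.

```python
import torch
import torch.nn.functional as F

def make_optimizers(gen, disc):
    """Adam with the hyper-parameters stated above: lr=1e-4, betas=(0.5, 0.9)."""
    kw = dict(lr=1e-4, betas=(0.5, 0.9))
    return (torch.optim.Adam(gen.parameters(), **kw),
            torch.optim.Adam(disc.parameters(), **kw))

def train_step(gen, disc, img, gt, opt_g, opt_d):
    """One alternating GAN update (losses assumed, see lead-in)."""
    fake = gen(img)

    # Discriminator: push real (ground-truth erased) images toward 1, fakes toward 0.
    opt_d.zero_grad()
    real_logit = disc(gt)
    fake_logit = disc(fake.detach())
    d_loss = (F.binary_cross_entropy_with_logits(real_logit, torch.ones_like(real_logit))
              + F.binary_cross_entropy_with_logits(fake_logit, torch.zeros_like(fake_logit)))
    d_loss.backward()
    opt_d.step()

    # Generator: fool the discriminator while staying close to the ground truth.
    opt_g.zero_grad()
    adv_logit = disc(fake)
    g_loss = (F.binary_cross_entropy_with_logits(adv_logit, torch.ones_like(adv_logit))
              + F.l1_loss(fake, gt))
    g_loss.backward()
    opt_g.step()
    return float(d_loss), float(g_loss)
```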
S6, erasing the handwritten character part in the document image by utilizing the final model.
The document image is input into the trained handwritten text erasure model to obtain the document image with handwritten text erased; the result is shown in fig. 6. Comparative results of this embodiment against other methods are shown in Table 1.
TABLE 1
| Method | PSNR ↑ | MSSIM ↑ | MSE ↓ | AGE ↓ |
| --- | --- | --- | --- | --- |
| Pix2Pix | 28.99 | 89.54 | 0.16 | 3.89 |
| MTRNet++ | 32.77 | 92.64 | 0.08 | 2.51 |
| EraseNet | 33.84 | 93.69 | 0.07 | 2.55 |
| EnsNet | 33.87 | 94.93 | 0.07 | 2.25 |
| Ours | 36.05 | 96.59 | 0.05 | 1.43 |
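The metrics in Table 1 are standard image-restoration measures. Conventional definitions of MSE, PSNR, and AGE (average gray-level error) are sketched below under the value-range assumptions noted in the comments; the patent does not state its exact evaluation protocol, and MSSIM is omitted for brevity.

```python
import numpy as np

def mse(a, b):
    """Mean squared error on images rescaled from 0-255 to [0, 1]."""
    a = a.astype(np.float64) / 255.0
    b = b.astype(np.float64) / 255.0
    return float(np.mean((a - b) ** 2))

def psnr(a, b):
    """Peak signal-to-noise ratio in dB (higher is better)."""
    m = mse(a, b)
    return float("inf") if m == 0 else 10.0 * np.log10(1.0 / m)

def age(a, b):
    """Average gray-level error: mean absolute difference on 0-255 images."""
    return float(np.mean(np.abs(a.astype(np.float64) - b.astype(np.float64))))
```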
The above embodiments merely describe preferred modes of the present application and do not limit its scope; various modifications and improvements made by those skilled in the art without departing from the spirit of the application all fall within the scope defined by the appended claims.
Claims (3)
1. A handwritten text erasing method based on a generative adversarial network, characterized by comprising the following steps:
collecting a document image with handwritten text, and manually annotating the document to obtain a dataset; the manual annotation comprises: erasing handwritten text in the document image with the Adobe Photoshop tool to obtain an erased document image; and annotating the coordinates of the handwritten text in the document image with quadrilateral boxes to obtain a handwritten text quadrilateral mask;
generating a handwritten text stroke mask based on the dataset;
preprocessing the document image to obtain processed data;
building a handwritten text erasure model based on a generative adversarial network according to the handwritten text stroke mask; the constructed handwritten text erasure model comprises: a cascade generator and a discriminator;
The cascade generator is used for generating the erased document image;
the cascade generator includes: a coarse erase network and a fine erase network;
the rough erasure network is used for predicting the handwritten text stroke mask and the handwritten text quadrilateral mask and generating a preliminary erasure result; the rough erasure network comprises: an encoder, a decoder, and a handwritten text mask prediction head; wherein the encoder is used for extracting document image features; the encoder consists of 3 convolutional layers and 8 residual connection blocks, each convolutional layer comprising a convolution, a batch normalization layer, and a ReLU activation; the first convolutional layer uses a 7×7 kernel and the rest use 3×3 kernels; the output channel numbers of the convolutional layers are 32, 32, 64 and their strides are 2, 1, 2 respectively; each residual connection block consists of a residual branch with one 1×1 convolution and a main branch with two 3×3 convolutions; the output channel numbers of the residual connection blocks are 64, 64, 128, 128, 256, 256, 512, 512, wherein the 3rd, 5th and 7th blocks use stride 2, each downsampling the input feature map by a factor of 2, and the remaining blocks use stride 1;
the decoder is used for decoding the document image characteristics into the preliminary erasure result;
the handwritten text mask prediction head is used for predicting the handwritten text stroke mask and the handwritten text quadrilateral mask from the document image features; the decoder and the mask prediction head each consist of 5 deconvolution layers, each comprising a transposed convolution with a 3×3 kernel and stride 2, a batch normalization layer, and a ReLU activation; each deconvolution layer doubles the resolution of the output features, so after 5 layers the output resolution is 32 times that of the input features, equal to that of the original document image; the output channel numbers of the first 4 deconvolution layers of both the decoder and the mask prediction head are 256, 128, 64, 32; the 5th deconvolution layer of the decoder outputs 3 channels, namely the preliminarily erased image, and the 5th deconvolution layer of the mask prediction head outputs 2 channels, namely the handwritten text stroke mask and the handwritten text quadrilateral mask;
the fine erasure network is used for refining the preliminary erasure result to generate a document image with the handwritten text erased;
The fine erasing network adopts a U-Net structure; the fine erasing network comprises a fine encoder and a fine decoder with skip connections between them, so that the features output by each layer of the fine encoder are fed into the corresponding layer of the fine decoder. The fine encoder consists of a 6-layer convolutional neural network followed by a dilated convolution block; each of the 6 convolutional layers comprises a convolution, a batch normalization layer and a ReLU activation layer. Except for the first convolutional layer, whose kernel size is 7×7, all kernels are 3×3; the output channel counts of the convolutional layers are 32, 64, 64, 128, 128 and 128 respectively; the 2nd and 4th convolutional layers use stride 2 and the remaining layers use stride 1. The dilated convolution block is formed by stacking 4 dilated convolutions, each with 128 output channels and 3×3 kernels, with dilation rates of 2, 4, 8 and 16 respectively; the dilated convolution block enlarges the receptive field of the convolution kernel and captures multi-scale context information. The final output of the fine encoder is a feature map downsampled 4× in resolution with 128 channels. The fine decoder consists of 4 convolutional layers and 2 deconvolution layers, where layers 1, 2, 4 and 6 are convolutional layers and layers 3 and 5 are deconvolution layers; the kernels of both the convolutional and deconvolution layers are 3×3, the convolutional layers use stride 1 and the deconvolution layers use stride 2. The output channel counts of the fine decoder layers
are, respectively: 128, 128, 64, 64, 32, 3; the 3-channel image output by the last layer is the document image, with the handwritten characters erased, finally output by the network;
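The claim above states that the dilated convolution block enlarges the receptive field and that the encoder downsamples by 4×. A minimal sketch, assuming the layer stack as described (7×7 first layer, stride 2 at layers 2 and 4, dilation rates 2/4/8/16), makes both quantities explicit; the `receptive_field` helper and the exact layer tuples are my own illustration, not part of the patent:

```python
def receptive_field(layers):
    """Compute receptive field and total stride for a stack of conv layers.

    Each layer is a tuple (kernel_size, stride, dilation)."""
    rf, jump = 1, 1
    for k, s, d in layers:
        eff_k = d * (k - 1) + 1          # effective kernel size under dilation
        rf += (eff_k - 1) * jump         # growth scaled by accumulated stride
        jump *= s
    return rf, jump

# Fine encoder as described in the claim: 7x7 conv, then five 3x3 convs
# (stride 2 at layers 2 and 4), then the dilated block with rates 2, 4, 8, 16.
encoder = [(7, 1, 1), (3, 2, 1), (3, 1, 1), (3, 2, 1), (3, 1, 1), (3, 1, 1)]
dilated = [(3, 1, 2), (3, 1, 4), (3, 1, 8), (3, 1, 16)]

rf_enc, stride_enc = receptive_field(encoder)
rf_all, stride_all = receptive_field(encoder + dilated)
print(stride_enc)        # 4  -> matches the claimed 4x downsampling
print(rf_enc, rf_all)    # 33 273 -> the dilated block grows the receptive field ~8x
```

The jump in receptive field from 33 to 273 pixels at unchanged resolution is what lets the block "capture the context information of multiple scales" claimed above.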
the discriminator is used to distinguish real images from generated ones; the discriminator consists of a global feature encoder, a local feature encoder and a linear regression layer; the global feature encoder extracts global features of the whole picture; the local feature encoder extracts local features of the handwritten character region; the linear regression layer concatenates the global and local features and, combining the two, determines whether the picture is a network-generated image or a real image;
The global feature encoder and the local feature encoder share the same structure: each consists of 6 downsampling convolutions with kernel size 4×4 and stride 2, with feature dimensions [64, 128, 256, 256, 256, 256]. Their inputs differ: the input of the global feature encoder is the document image, while the input of the local feature encoder is the image obtained by multiplying the handwritten-text quadrilateral mask with the document image;
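To see what the two discriminator encoders produce, a short sketch traces the feature-map sizes through the 6 stride-2, 4×4 convolutions on a 512×512 input (the patch size used in claim 3). Padding of 1 is an assumption on my part, chosen so each layer exactly halves the resolution, which the claim's "downsampling" wording suggests but does not specify:

```python
def conv_out(n, k=4, s=2, p=1):
    """Output side length of a square conv: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

channels = [64, 128, 256, 256, 256, 256]   # feature dims from the claim
size = 512                                  # input patch side length
shapes = []
for c in channels:
    size = conv_out(size)                   # each 4x4/stride-2 conv halves the side
    shapes.append((c, size, size))
print(shapes[-1])  # (256, 8, 8)
```

Both encoders therefore end in a 256-channel 8×8 map; the linear regression layer concatenates the two flattened maps before scoring real vs. generated.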
training the handwritten character erasure model with the processed data to obtain a final model;
and erasing the handwritten character portion of the document image using the final model.
2. The handwritten character erasing method based on a generative adversarial network according to claim 1, wherein the method of generating the handwritten character stroke mask comprises: taking the difference between the original document image and the erased document image, then applying threshold binarization, applying erosion and dilation to eliminate noise, shrinking inward to obtain the character skeleton, and smoothly expanding outward to obtain the character outer boundary, thereby automatically generating the handwritten character stroke mask.
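The mask-generation steps of claim 2 can be sketched with plain NumPy. The threshold value and the 3×3 structuring element are assumptions (the claim does not fix either), and `binary_dilate`/`binary_erode`/`stroke_mask` are illustrative names of my own:

```python
import numpy as np

def binary_dilate(m):
    """3x3 dilation: a pixel turns on if any 8-neighbour (or itself) is on."""
    p = np.pad(m, 1)
    out = np.zeros_like(m)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out |= p[1 + dy:1 + dy + m.shape[0], 1 + dx:1 + dx + m.shape[1]]
    return out

def binary_erode(m):
    """3x3 erosion via duality: erode(m) = complement of dilate(complement)."""
    return 1 - binary_dilate(1 - m)

def stroke_mask(original, erased, thresh=30):
    """Claim-2 pipeline: difference -> threshold -> open (denoise) -> expand."""
    diff = np.abs(original.astype(np.int16) - erased.astype(np.int16))
    mask = (diff > thresh).astype(np.uint8)       # threshold binarization
    mask = binary_dilate(binary_erode(mask))      # erosion+dilation removes noise
    return binary_dilate(mask)                    # smooth outward expansion
```

Eroding first kills isolated noise pixels (they have no full 3×3 neighbourhood), while the final dilation gives the stroke mask a margin covering the character outer boundary.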
3. The handwritten character erasing method based on a generative adversarial network according to claim 1, wherein the method of obtaining the processed data comprises: cutting the complete document image into image blocks of 512×512 pixels, randomly rotating the blocks within ±10 degrees, and horizontally flipping them with probability 0.5 to obtain the processed data.
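The preprocessing of claim 3 can be sketched as follows. This is a minimal NumPy-only version: the tiling discards the ragged right/bottom remainder (the claim does not say how partial blocks are handled), and the ±10° rotation is noted but omitted because it needs an interpolating rotate (e.g. `scipy.ndimage.rotate`):

```python
import numpy as np

def make_patches(img, size=512):
    """Cut a document image (H, W, C) into non-overlapping size x size tiles,
    dropping any partial tile at the right/bottom edge."""
    h, w = img.shape[:2]
    return [img[y:y + size, x:x + size]
            for y in range(0, h - size + 1, size)
            for x in range(0, w - size + 1, size)]

def augment(patch, rng):
    # Random rotation within +/-10 degrees would go here (interpolating
    # rotate omitted to keep this sketch NumPy-only).
    if rng.random() < 0.5:        # horizontal flip with probability 0.5
        patch = patch[:, ::-1]
    return patch

rng = np.random.default_rng(0)
img = np.zeros((1100, 1600, 3), np.uint8)     # a synthetic document page
patches = [augment(p, rng) for p in make_patches(img)]
print(len(patches))  # 6  (2 rows x 3 columns of full 512x512 tiles)
```

Tiling before augmentation keeps every training sample at the fixed 512×512 resolution the fine network and discriminator expect.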
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311039086.4A CN117132994B (en) | 2023-08-17 | 2023-08-17 | Handwritten character erasing method based on generation countermeasure network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117132994A CN117132994A (en) | 2023-11-28 |
CN117132994B true CN117132994B (en) | 2024-07-02 |
Family
ID=88853874
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116009749A (en) * | 2022-11-08 | 2023-04-25 | 福建亿能达信息技术股份有限公司 | Handwritten character erasing method and system based on attention mechanism |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109492627B (en) * | 2019-01-22 | 2022-11-08 | 华南理工大学 | Scene text erasing method based on depth model of full convolution network |
US11250252B2 (en) * | 2019-12-03 | 2022-02-15 | Adobe Inc. | Simulated handwriting image generator |
US20210286946A1 (en) * | 2020-03-16 | 2021-09-16 | Samsung Sds Co., Ltd. | Apparatus and method for learning text detection model |
CN114708601A (en) * | 2022-04-18 | 2022-07-05 | 南京大学 | Handwritten character erasing method based on deep learning |
CN115578403A (en) * | 2022-09-20 | 2023-01-06 | 上海合合信息科技股份有限公司 | Erasing optimization method and device for handwritten contents in document image |
CN115965975A (en) * | 2022-09-21 | 2023-04-14 | 复旦大学 | Scene image character detection method based on multi-scale feature aggregation |
CN116091630A (en) * | 2022-11-01 | 2023-05-09 | 哈尔滨工业大学(深圳) | Method and device for training image generation model |
CN116051686B (en) * | 2023-01-13 | 2023-08-01 | 中国科学技术大学 | Method, system, equipment and storage medium for erasing characters on graph |
CN116012835A (en) * | 2023-02-20 | 2023-04-25 | 张国栋 | Two-stage scene text erasing method based on text segmentation |
CN115862030B (en) * | 2023-02-24 | 2023-05-16 | 城云科技(中国)有限公司 | Algorithm model for removing text in image, construction method, device and application thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |