
CN114418869B - Document image geometric correction method, system, device and medium - Google Patents

Document image geometric correction method, system, device and medium Download PDF

Info

Publication number
CN114418869B
Authority
CN
China
Prior art keywords
document image
document
image
coordinate
acquiring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111584077.4A
Other languages
Chinese (zh)
Other versions
CN114418869A (en)
Inventor
金连文 (Jin Lianwen)
张家鑫 (Zhang Jiaxin)
罗灿杰 (Luo Canjie)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Zhuhai Institute of Modern Industrial Innovation of South China University of Technology
Original Assignee
South China University of Technology SCUT
Zhuhai Institute of Modern Industrial Innovation of South China University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT, Zhuhai Institute of Modern Industrial Innovation of South China University of Technology filed Critical South China University of Technology SCUT
Priority to CN202111584077.4A priority Critical patent/CN114418869B/en
Publication of CN114418869A publication Critical patent/CN114418869A/en
Application granted granted Critical
Publication of CN114418869B publication Critical patent/CN114418869B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/80 Geometric correction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30176 Document

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a document image geometric correction method, a system, a device and a medium, wherein the method comprises the following steps: acquiring a first document image, classifying pixels in the first document image, distinguishing a foreground document area and an environment boundary area in the document image, and acquiring a mask image of the foreground document area; extracting control points from the mask map, performing preliminary correction on the first document image according to the control points, deleting the environment boundary, and obtaining a second document image with the preliminary correction and the environment boundary deletion; and acquiring a first coordinate offset matrix of the second document image, and acquiring a corrected third document image after offsetting the second document image according to the first coordinate offset matrix. The invention can process the photographed document image with different environment boundary areas, including the case of having a smaller environment boundary area, having a larger environment boundary area, or having no environment boundary area. The invention can be widely applied to the technical fields of pattern recognition and artificial intelligence.

Description

Document image geometric correction method, system, device and medium
Technical Field
The invention relates to the technical field of pattern recognition and artificial intelligence, in particular to a method, a system, a device and a medium for geometric correction of a document image.
Background
With the development of semiconductor technology, the cameras built into mobile devices have become more capable and their imaging quality higher, so photographing a document with a built-in camera has become a convenient way to digitize it. However, perspective distortion caused by an improper camera position and angle during shooting, together with deformations of the document itself such as bending, folding and wrinkling, leaves the captured document image geometrically deformed. These deformations degrade the performance of optical character recognition systems and also harm the appearance and readability of the document image. Document image geometric correction methods based on deep learning have greatly improved correction performance and robustness to different document layouts. However, existing deep learning correction methods focus only on correcting cropped document images, i.e. document images with a small environment boundary area, and they require a complete document boundary. In practice the environment boundary conditions vary widely: some document images have a large environment boundary area in which the foreground document occupies only a small part, and some document images have no environment boundary area at all and therefore no complete document boundary. The aforementioned deep learning correction methods perform poorly on such images.
Disclosure of Invention
In order to solve at least one of the technical problems existing in the prior art to a certain extent, the invention aims to provide a document image geometric correction method, a system, a device and a medium.
The technical scheme adopted by the invention is as follows:
a document image geometry correction method comprising the steps of:
Acquiring a first document image, classifying pixels in the first document image, distinguishing a foreground document area and an environment boundary area in the document image, and acquiring a mask image of the foreground document area;
extracting control points from the mask map, performing preliminary correction on the first document image according to the control points, deleting the environment boundary, and obtaining a second document image with the preliminary correction and the environment boundary deletion;
and acquiring a first coordinate offset matrix of the second document image, and acquiring a corrected third document image after offsetting the second document image according to the first coordinate offset matrix.
Further, the obtaining the corrected third document image includes:
Judging whether to execute an iteration step according to the first coordinate offset matrix, and taking the third document image as an output image if the iteration step is not required to be executed; otherwise, executing an iteration step;
The iterative steps include:
Acquiring a second coordinate offset matrix of the third document image, shifting the third document image according to the second coordinate offset matrix, updating the corrected image into the third document image, and recording the second coordinate offset matrix in the iterative step;
Judging whether to continue to execute the iteration step according to the second coordinate offset matrix, if so, returning to execute the previous step; and otherwise, shifting the second document image according to the first coordinate shift matrix and all second coordinate shift matrices in the record to obtain a corrected image as an output image.
Further, the classifying the pixels in the first document image includes:
Acquiring the classification confidence coefficient of each pixel position in the first document image by adopting a first deep convolutional neural network, and classifying according to the classification confidence coefficient;
the acquiring the first coordinate offset matrix of the second document image includes:
And acquiring a first coordinate offset matrix of the second document image by adopting a second deep convolutional neural network.
Further, the extracting the control point on the mask map, performing preliminary correction on the first document image according to the control point, deleting the environmental boundary, obtaining a second document image with preliminary correction and deleting the environmental boundary, including:
Extracting four corner points of the document on a mask diagram of a foreground document area by using a polygon fitting algorithm;
Drawing a vertical bisector on a line segment formed by taking adjacent corner points as endpoints according to a preset bisector proportion, and taking the intersection point of the bisector and the mask diagram boundary of the foreground document area as the bisector of the document boundary;
drawing a quadrilateral mask diagram by using four corner points, and calculating the intersection ratio according to the quadrilateral mask diagram and the mask diagram of the foreground document area;
if the intersection ratio is smaller than a first preset threshold value, not correcting, and taking the first document image as a second document image;
If the intersection ratio is larger than a first preset threshold value, using four corner points and a plurality of boundary equally dividing points as control points, using a thin plate spline interpolation algorithm to perform preliminary correction on the first document image, deleting the environment boundary, and obtaining a second document image with preliminary correction and deleting the environment boundary.
Further, after the second document image is shifted according to the first coordinate shift matrix, a corrected third document image is obtained, including:
The first coordinate offset matrix designates a two-dimensional offset vector for each pixel point in the second document image, and each pixel is offset according to the corresponding offset vector to obtain a corrected third document image;
the offset vector is used for representing the offset direction and distance on the two-dimensional plane.
Further, the determining whether to continue to execute the iterative step according to the second coordinate offset matrix includes:
Calculating the standard deviation of the second coordinate offset matrix;
if the standard deviation of the second coordinate offset matrix is larger than a second preset threshold value, continuing to execute the iteration step;
And if the standard deviation of the second coordinate offset matrix is smaller than a second preset threshold value, stopping executing the iteration step.
Further, the shifting the second document image according to the first coordinate shift matrix and all second coordinate shift matrices in the record, to obtain a corrected image as an output image, including:
Summing the first coordinate offset matrix and all of the second coordinate offset matrices recorded in the iterative steps to obtain a final coordinate offset matrix, and shifting the second document image according to the final coordinate offset matrix to obtain a corrected image as the output image.
The invention adopts another technical scheme that:
A document image geometry correction system comprising:
The pixel classification module is used for acquiring a first document image, classifying pixels in the first document image, distinguishing a foreground document area from an environment boundary area in the document image, and acquiring a mask image of the foreground document area;
The preliminary correction module is used for extracting control points from the mask image, carrying out preliminary correction on the first document image according to the control points, deleting the environment boundary, and obtaining a second document image with the preliminary correction and the environment boundary deletion;
And the offset correction module is used for acquiring a first coordinate offset matrix of the second document image, and obtaining a corrected third document image after the second document image is offset according to the first coordinate offset matrix.
The invention adopts another technical scheme that:
a document image geometry correction device comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method described above.
The invention adopts another technical scheme that:
A computer readable storage medium, in which a processor executable program is stored, which when executed by a processor is adapted to carry out the method as described above.
The beneficial effects of the invention are as follows: the invention can process the photographed document image with different environment boundary areas, including the case of having a smaller environment boundary area, having a larger environment boundary area, or having no environment boundary area.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description refers to the accompanying drawings of the embodiments of the present invention or of the related prior art. It should be understood that the drawings described below show only some embodiments of the technical solutions of the present invention, and that other drawings may be obtained from these drawings by those skilled in the art without inventive effort.
FIG. 1 is a general flow chart of a method for geometric correction of a captured document image applicable to various environmental boundary conditions in an embodiment of the present invention;
FIG. 2 is a schematic diagram of a document image with environmental border areas removed by control point extraction and preliminary correction in an embodiment of the present invention;
FIG. 3 is a schematic diagram of iterative correction in an embodiment of the invention;
FIG. 4 is a graph of corrective effects on document images with different environmental boundary conditions in an embodiment of the invention.
FIG. 5 is a flowchart illustrating steps of a document image geometry correction method according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention. The step numbers in the following embodiments are set for convenience of illustration only, and the order between the steps is not limited in any way, and the execution order of the steps in the embodiments may be adaptively adjusted according to the understanding of those skilled in the art.
In the description of the present invention, it should be understood that references to orientation descriptions such as upper, lower, front, rear, left, right, etc. are based on the orientation or positional relationship shown in the drawings, are merely for convenience of description of the present invention and to simplify the description, and do not indicate or imply that the apparatus or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus should not be construed as limiting the present invention.
In the description of the present invention, "several" means one or more and "a plurality of" means two or more; greater than, less than, exceeding, etc. are understood to exclude the stated number, while above, below, within, etc. are understood to include the stated number. The description of first and second is only for the purpose of distinguishing technical features and should not be construed as indicating or implying relative importance, implicitly indicating the number of technical features indicated, or implicitly indicating the precedence of the technical features indicated.
In the description of the present invention, unless explicitly defined otherwise, terms such as arrangement, installation, connection, etc. should be construed broadly and the specific meaning of the terms in the present invention can be reasonably determined by a person skilled in the art in combination with the specific contents of the technical scheme.
As shown in fig. 5, the present embodiment provides a document image geometry correction method including the steps of:
S101, acquiring a first document image, classifying pixels in the first document image, distinguishing a foreground document area and an environment boundary area in the document image, and acquiring a mask image of the foreground document area;
S102, extracting control points from the mask map, performing preliminary correction on the first document image according to the control points, deleting the environment boundary, and obtaining a second document image with the preliminary correction and the environment boundary deletion;
s103, acquiring a first coordinate offset matrix of the second document image, and obtaining a corrected third document image after offsetting the second document image according to the first coordinate offset matrix.
In this embodiment, the first document image may be a document image obtained by photographing by a photographing device (such as an intelligent terminal, a video camera, or the like), a document image obtained by scanning, or the like.
In some embodiments, the step of obtaining the corrected third document image in step S103 specifically includes:
judging whether to execute the iteration step according to the first coordinate offset matrix, and if not, taking the third document image as an output image; otherwise, executing an iteration step;
Wherein the iterative steps include A1-A2:
A1, acquiring a second coordinate offset matrix of the third document image, shifting the third document image according to the second coordinate offset matrix, updating the corrected image into the third document image, and recording the second coordinate offset matrix in the iterative step;
A2, judging whether to continue to execute the iteration step according to the second coordinate offset matrix, if so, returning to execute the previous step; and otherwise, shifting the second document image according to the first coordinate shift matrix and all the second coordinate shift matrices in the record to obtain a corrected image as an output image.
In this embodiment, the statistical information of the coordinate offset matrix reflects the flatness of the input image: a low response indicates a flatter input image. When the statistic of the first coordinate offset matrix is smaller than a preset threshold value, the third document image is taken directly as the output image. When it is larger than the preset threshold value, the iteration step is performed; as the number of iterations increases, the response of the coordinate offset matrix becomes lower and lower, until the statistic of the coordinate offset matrix of the document image falls below the preset threshold value. All of the obtained coordinate offset matrices are then added together and used to shift the second document image, and the resulting corrected image is taken as the output image.
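The adaptive loop described above can be summarized in a short sketch. The following Python snippet is a minimal illustration rather than the patent's implementation: predict_offset_fn stands in for the trained offset-regression network of step S103, the standard deviation threshold and the iteration cap are illustrative placeholders, and the offset matrix is treated as a backward sampling map applied with OpenCV's cv2.remap.

```python
import numpy as np
import cv2

def apply_offset(image: np.ndarray, offset: np.ndarray) -> np.ndarray:
    """Shift every pixel by its 2D offset vector using bilinear interpolation.

    The offset is treated as a backward map: output pixel (x, y) samples the
    input image at (x + dx, y + dy).
    """
    h, w = image.shape[:2]
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    map_x = xs + offset[..., 0].astype(np.float32)
    map_y = ys + offset[..., 1].astype(np.float32)
    return cv2.remap(image, map_x, map_y, cv2.INTER_LINEAR)

def iterative_correct(second_image, predict_offset_fn, std_threshold=0.5, max_iter=5):
    offsets = [predict_offset_fn(second_image)]        # first coordinate offset matrix
    current = apply_offset(second_image, offsets[0])   # third document image
    # Keep iterating while the latest offset matrix still has a large standard deviation
    while offsets[-1].std() > std_threshold and len(offsets) <= max_iter:
        offsets.append(predict_offset_fn(current))     # offset matrix of this iteration
        current = apply_offset(current, offsets[-1])   # updated corrected image
    # Sum all recorded offset matrices and warp the second document image only once,
    # so the output image is resampled a single time.
    return apply_offset(second_image, np.sum(offsets, axis=0))
```

Because the accumulated offsets are applied to the second document image in a single final remap, the image is resampled only once, which is the blur-avoidance point made later in the detailed description.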
The above method is explained in detail below with reference to specific embodiments and drawings.
As shown in fig. 1, the present embodiment provides a geometric correction method of a captured document image suitable for various environmental boundary conditions, which is used for solving the geometric correction problem of a captured document image with different environmental boundary conditions in an actual scene. The method specifically comprises the following steps:
s1, classifying each pixel of an input shot document image (namely a first document image), and distinguishing a foreground document area and an environment boundary area on the image to obtain a mask image of the foreground document area. This allows the foreground document region to be accurately separated from the document image.
S2, extracting control points on the mask map, performing preliminary correction on the input document image by using the control points, and removing the environment boundary to obtain a document image (namely a second document image) with the preliminary correction and the environment boundary removal as input of the next step.
The control points are extracted on the mask map because it is a binary map; they are then used to perform the preliminary correction of the input document image while the environment boundary is removed, yielding a preliminarily corrected document image with the environment boundary removed as the input to the next step.
Specifically, as shown in fig. 2, the step S2 includes steps S21 to S24:
S21, extracting four corner points of a document on a foreground document area mask diagram by using a Douglas-Peucker polygon fitting algorithm;
S22, distinguishing upper left, upper right, lower right and lower left corner points according to the relative position relation of the four corner points;
s23, drawing a vertical bisector on a line segment formed by taking adjacent corner points as endpoints according to a preset bisector proportion, and taking an intersection point of the bisector and a mask diagram boundary of a foreground document area as a bisector of a document boundary;
S24, drawing a quadrilateral mask map using the four corner points and calculating its intersection ratio with the foreground document region mask map. If the intersection ratio is smaller than a preset threshold (a small intersection ratio indicates that the input captured document image does not contain a complete document boundary), no correction is performed and the input document image is used as the input of the next step. If the intersection ratio is larger than the preset threshold (a large intersection ratio indicates that the input captured document image contains a complete document boundary), the four corner points and the several boundary equally dividing points are used as control points, the input document image is preliminarily corrected with a thin plate spline interpolation algorithm, and the environment boundary is removed at the same time, so that a preliminarily corrected document image with the environment boundary removed is obtained as the input of the next step. Because the preliminary correction relies on the document boundaries, it would be unreasonable to apply the thin plate spline correction to an input document image that lacks a complete document boundary; the intersection-ratio threshold is therefore used to detect such images, which skip the thin plate spline correction and are passed directly to the next step.
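As a rough illustration of steps S21 and S24, the sketch below uses OpenCV's Douglas-Peucker implementation (cv2.approxPolyDP) to fit the document quadrilateral and an intersection-over-union check to decide whether the thin plate spline preliminary correction should run. The iou_threshold value and the epsilon factor are illustrative assumptions, and the bisection-point extraction (S23) and the thin plate spline warp itself (available, for example, as cv2.createThinPlateSplineShapeTransformer in opencv-contrib) are omitted for brevity.

```python
import numpy as np
import cv2

def extract_corners_and_gate(mask: np.ndarray, iou_threshold: float = 0.8):
    """Fit the document quadrilateral on the foreground mask and decide whether
    the thin plate spline preliminary correction should be applied."""
    mask = (mask > 0).astype(np.uint8)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contour = max(contours, key=cv2.contourArea)

    # S21: Douglas-Peucker polygon fitting; the epsilon factor may need tuning
    # so that the fit yields exactly four corner points.
    epsilon = 0.02 * cv2.arcLength(contour, True)
    quad = cv2.approxPolyDP(contour, epsilon, True)

    # S24: rasterise the fitted quadrilateral and compare it with the mask via IoU.
    quad_mask = np.zeros_like(mask)
    cv2.fillPoly(quad_mask, [quad.reshape(-1, 2)], 1)
    inter = np.logical_and(quad_mask, mask).sum()
    union = np.logical_or(quad_mask, mask).sum()
    iou = float(inter) / max(float(union), 1.0)

    # A low IoU means the image lacks a complete document boundary, so the thin
    # plate spline correction is skipped and the input image is passed on as-is.
    do_tps = (len(quad) == 4) and (iou >= iou_threshold)
    return quad.reshape(-1, 2), iou, do_tps
```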
S3, predicting a coordinate offset matrix for the document image and shifting the document image according to it to obtain a corrected document image; the statistical information of the coordinate offset matrix is used to adaptively decide whether the corrected document image should be fed back into step S3 for further iterative correction. After the iteration stops, the final corrected document image is obtained from all of the obtained coordinate offset matrices.
In some optional embodiments, in step S3, the coordinate offset matrix designates a two-dimensional offset vector for each pixel point in the input image, where the offset vector indicates an offset direction and a distance on the two-dimensional plane, and the pixels are offset according to the corresponding offset vectors to obtain the corrected document image, and the offset process uses linear interpolation.
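For completeness, a differentiable version of this per-pixel shift can be written with PyTorch's grid_sample, which performs the bilinear (linear interpolation) sampling mentioned above. This is only a sketch: the (dx, dy) channel order and the use of pixel-unit offsets are assumptions, not details given in the patent.

```python
import torch
import torch.nn.functional as F

def warp_by_offset(image: torch.Tensor, offset: torch.Tensor) -> torch.Tensor:
    """image: (N, C, H, W); offset: (N, 2, H, W) pixel-unit offsets ordered (dx, dy)."""
    n, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32, device=image.device),
                            torch.arange(w, dtype=torch.float32, device=image.device),
                            indexing="ij")
    base = torch.stack((xs, ys), dim=0).unsqueeze(0)   # (1, 2, H, W) pixel coordinate grid
    coords = base + offset                             # absolute sampling positions
    # Normalise to [-1, 1], the coordinate range expected by grid_sample
    gx = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)               # (N, H, W, 2)
    return F.grid_sample(image, grid, mode="bilinear", align_corners=True)
```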
In some alternative embodiments, step S3 uses the standard deviation as the statistical information of the coordinate offset matrix: as the number of iterations increases, the input image becomes flatter, the response of the predicted coordinate offset matrix becomes lower, and its standard deviation becomes smaller. When the standard deviation is smaller than a given threshold value, the corresponding input document image can be considered sufficiently flat and the iteration stops; otherwise the iteration continues. In this way a balance between correction performance and correction efficiency is achieved, making the system more efficient.
In some alternative embodiments, as shown in fig. 3, the correcting based on the plurality of coordinate offset matrices in step S3 includes: first summing the plurality of coordinate offset matrices to obtain a final coordinate offset matrix, and then performing a coordinate shift on the output of step S2 based on this final coordinate offset matrix to obtain the final corrected document image. The last intermediate corrected image is not adopted directly as the final output because, by that point, the document image has been sampled several times, which may introduce blurring. The corrected document image obtained by applying the final coordinate offset matrix to the output of step S2 is sampled only once, so the blurring problem is effectively avoided.
In some optional embodiments, in step S1, a deep convolutional neural network is used to obtain a classification confidence coefficient of each pixel position, the parameters of the network are trained and optimized in advance by using synthetic data, and binarization is performed through a threshold of 0.5 during prediction, so as to obtain a final foreground document region mask map. The network parameter optimization specifically comprises the following steps:
(1) Data acquisition: 100000 data samples from the Doc3D public synthetic dataset (each data sample comprises an input captured document image and the corresponding foreground document region mask map) are used, 90000 samples for training and 10000 samples for validation;
(2) Training a network:
(2-1) Construction of the deep neural network: the DeepLabv3+ segmentation model is used as the network structure, and the number of output classes is set to 1, i.e. the number of channels of the output of the last network layer is 1.
(2-2) Training mode: training uses a gradient descent algorithm; the gradient is computed from the last layer, propagated back layer by layer, and all parameters are updated. The loss function during training is a binary cross-entropy loss.
(2-3) Setting of training parameters:
Number of iterations: 50 epochs
Optimizer: Adam
Learning rate: 0.0001 (learning rate update strategy: the learning rate decays to 1/2 of the original after every 5 iterations)
Weight decay: 0.0005
(2-4) Starting training the deep neural network under random initialization parameters.
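A minimal training sketch consistent with the settings above might look as follows; it assumes a torchvision DeepLabV3 model as a stand-in for the DeepLabv3+ structure named here, and a train_loader (not defined in the patent) yielding (image, mask) pairs prepared from the Doc3D samples.

```python
import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(num_classes=1)               # single output channel
criterion = nn.BCEWithLogitsLoss()                      # binary cross-entropy loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

for epoch in range(50):
    for image, mask in train_loader:                    # mask: (N, 1, H, W) in {0, 1}
        logits = model(image)["out"]
        loss = criterion(logits, mask.float())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()                                    # StepLR halves the learning rate every 5 epochs

# At prediction time the confidence map is binarised at 0.5 to get the mask:
# pred_mask = torch.sigmoid(model(image)["out"]) > 0.5
```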
In some optional embodiments, in step S3, a deep convolutional neural network is used to obtain a coordinate offset matrix, and parameters of the network are pre-trained and optimized through synthetic data, which specifically includes:
(1) Data acquisition: 100000 data samples from the Doc3D public synthetic dataset (each data sample comprises an input captured document image and the corresponding coordinate offset matrix) are used, 90000 samples for training and 10000 samples for validation; before training, the samples are processed through steps S1 and S2 to remove the environment boundary;
(2) Training a network:
(2-1) Construction of the deep neural network: the network adopts an encoder-decoder structure that first downsamples and then upsamples, with skip connections used to preserve detail features and to ease gradient backpropagation, as shown in Table 1 below:
TABLE 1
Network layer | Detailed operation | Feature map size
Input layer | - | 3*448*448
Convolutional layer | 32 kernels, kernel 3*3, stride 1*1, with padding | 32*448*448
Nonlinear layer | - | 32*448*448
Pooling layer | pooling kernel 2*2, stride 2*2 | 32*224*224
Convolutional layer | 64 kernels, kernel 3*3, stride 1*1, with padding | 64*224*224
Nonlinear layer | - | 64*224*224
Pooling layer | pooling kernel 2*2, stride 2*2 | 64*112*112
Convolutional layer | 128 kernels, kernel 3*3, stride 1*1, with padding | 128*112*112
Nonlinear layer | - | 128*112*112
Pooling layer | pooling kernel 2*2, stride 2*2 | 128*56*56
Convolutional layer | 256 kernels, kernel 3*3, stride 1*1, with padding | 256*56*56
Nonlinear layer | - | 256*56*56
Pooling layer | pooling kernel 2*2, stride 2*2 | 256*28*28
Convolutional layer | 512 kernels, kernel 3*3, stride 1*1, with padding | 512*28*28
Nonlinear layer | - | 512*28*28
Transposed convolutional layer | 256 kernels, kernel 4*4, stride 2*2, with padding | 256*56*56
Nonlinear layer | - | 256*56*56
Skip connection layer | concatenate the corresponding downsampling-path feature map along the channel dimension | 512*56*56
Convolutional layer | 256 kernels, kernel 3*3, stride 1*1, with padding | 256*56*56
Nonlinear layer | - | 256*56*56
Transposed convolutional layer | 128 kernels, kernel 4*4, stride 2*2, with padding | 128*112*112
Nonlinear layer | - | 128*112*112
Skip connection layer | concatenate the corresponding downsampling-path feature map along the channel dimension | 256*112*112
Convolutional layer | 128 kernels, kernel 3*3, stride 1*1, with padding | 128*112*112
Nonlinear layer | - | 128*112*112
Transposed convolutional layer | 64 kernels, kernel 4*4, stride 2*2, with padding | 64*224*224
Nonlinear layer | - | 64*224*224
Skip connection layer | concatenate the corresponding downsampling-path feature map along the channel dimension | 128*224*224
Convolutional layer | 64 kernels, kernel 3*3, stride 1*1, with padding | 64*224*224
Nonlinear layer | - | 64*224*224
Transposed convolutional layer | 32 kernels, kernel 4*4, stride 2*2, with padding | 32*448*448
Nonlinear layer | - | 32*448*448
Skip connection layer | concatenate the corresponding downsampling-path feature map along the channel dimension | 64*448*448
Convolutional layer | 32 kernels, kernel 3*3, stride 1*1, with padding | 32*448*448
Nonlinear layer | - | 32*448*448
Convolutional layer | 2 kernels, kernel 3*3, stride 1*1, with padding | 2*448*448
Nonlinear layer | - | 2*448*448
(2-2) Training mode: training uses a gradient descent algorithm; the gradient is computed from the last layer, propagated back layer by layer, and all parameters are updated. The loss function during training is a mean square error loss.
(2-3) Setting of training parameters:
Number of iterations: 50 epochs
Optimizer: Adam
Learning rate: 0.0001 (learning rate update strategy: the learning rate decays to 1/2 of the original after every 5 iterations)
Weight decay: 0.0005
(2-4) Starting training the deep neural network under random initialization parameters.
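The encoder-decoder of Table 1 can be approximated by the following PyTorch sketch: four convolution-plus-pooling downsampling stages, a 512-channel bottleneck, four transposed-convolution upsampling stages with channel-wise skip connections, and a 2-channel coordinate-offset head. The use of ReLU as the nonlinear layer is an assumption, and the table's final nonlinear layer after the 2-channel output is omitted since its type is not specified.

```python
import torch
import torch.nn as nn

class OffsetNet(nn.Module):
    """U-Net-style regressor mapping a 3*448*448 image to a 2*448*448 offset matrix."""

    def __init__(self):
        super().__init__()
        def conv(cin, cout):   # convolution + nonlinearity, keeps spatial size
            return nn.Sequential(nn.Conv2d(cin, cout, 3, 1, 1), nn.ReLU(inplace=True))
        def up(cin, cout):     # transposed convolution + nonlinearity, doubles spatial size
            return nn.Sequential(nn.ConvTranspose2d(cin, cout, 4, 2, 1), nn.ReLU(inplace=True))
        self.enc1, self.enc2 = conv(3, 32), conv(32, 64)
        self.enc3, self.enc4 = conv(64, 128), conv(128, 256)
        self.pool = nn.MaxPool2d(2, 2)
        self.bottleneck = conv(256, 512)
        self.up4, self.dec4 = up(512, 256), conv(512, 256)   # 256 upsampled + 256 skip
        self.up3, self.dec3 = up(256, 128), conv(256, 128)
        self.up2, self.dec2 = up(128, 64), conv(128, 64)
        self.up1, self.dec1 = up(64, 32), conv(64, 32)
        self.head = nn.Conv2d(32, 2, 3, 1, 1)                # 2-channel coordinate offset map

    def forward(self, x):                                    # x: (N, 3, 448, 448)
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        e4 = self.enc4(self.pool(e3))
        b = self.bottleneck(self.pool(e4))
        d4 = self.dec4(torch.cat([self.up4(b), e4], dim=1))  # skip connection at 56*56
        d3 = self.dec3(torch.cat([self.up3(d4), e3], dim=1)) # skip connection at 112*112
        d2 = self.dec2(torch.cat([self.up2(d3), e2], dim=1)) # skip connection at 224*224
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1)) # skip connection at 448*448
        return self.head(d1)                                 # (N, 2, 448, 448)
```

Training this regressor against the ground-truth coordinate offset matrices with a mean square error loss, Adam, and the learning-rate schedule listed in (2-3) matches the settings described above.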
As shown in fig. 4, the method provided in this embodiment can process the photographed document images of various environmental boundary conditions, and can obtain a good correction effect. In summary, the method proposed in the present embodiment can process the captured document image with different environmental boundary areas, including the case of having a smaller environmental boundary area, having a larger environmental boundary area, and not having an environmental boundary area. Meanwhile, aiming at document images with different geometric deformation degrees, the method provided by the embodiment can adaptively determine the iteration times, and a better correction effect is obtained.
The embodiment also provides a document image geometry correction system, which comprises:
The pixel classification module is used for acquiring a first document image, classifying pixels in the first document image, distinguishing a foreground document area from an environment boundary area in the document image, and acquiring a mask image of the foreground document area;
The preliminary correction module is used for extracting control points from the mask image, carrying out preliminary correction on the first document image according to the control points, deleting the environment boundary, and obtaining a second document image with the preliminary correction and the environment boundary deletion;
And the offset correction module is used for acquiring a first coordinate offset matrix of the second document image, and obtaining a corrected third document image after the second document image is offset according to the first coordinate offset matrix.
The document image geometric correction system can execute any combination of the implementation steps of the document image geometric correction method provided by the method embodiments of the invention, and has the corresponding functions and beneficial effects of the method.
The embodiment also provides a document image geometry correction device, which comprises:
at least one processor;
at least one memory for storing at least one program;
The at least one program, when executed by the at least one processor, causes the at least one processor to implement the method illustrated in fig. 5.
The document image geometric correction device can execute any combination of the implementation steps of the document image geometric correction method provided by the method embodiments of the invention, and has the corresponding functions and beneficial effects of the method.
Embodiments of the present application also disclose a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions may be read from a computer-readable storage medium by a processor of a computer device, and executed by the processor, to cause the computer device to perform the method shown in fig. 5.
The embodiment also provides a storage medium which stores instructions or a program for executing the document image geometric correction method provided by the method embodiments; when the instructions or program are run, any combination of the implementation steps of the method embodiments can be executed, with the corresponding functions and beneficial effects of the method.
In some alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flowcharts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed, and in which sub-operations described as part of a larger operation are performed independently.
Furthermore, while the invention is described in the context of functional modules, it should be appreciated that, unless otherwise indicated, one or more of the described functions and/or features may be integrated in a single physical device and/or software module or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary to an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be apparent to those skilled in the art from consideration of their attributes, functions and internal relationships. Accordingly, one of ordinary skill in the art can implement the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative and are not intended to be limiting upon the scope of the invention, which is to be defined in the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a usb disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, may be implemented using any one or combination of the following techniques, as is well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable Gate Arrays (PGAs), field Programmable Gate Arrays (FPGAs), and the like.
In the foregoing description of the present specification, reference to the terms "one embodiment/example", "another embodiment/example", "certain embodiments/examples", and the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiment of the present application has been described in detail, the present application is not limited to the above embodiments, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the present application, and these equivalent modifications and substitutions are intended to be included in the scope of the present application as defined in the appended claims.

Claims (7)

1. A document image geometry correction method, comprising the steps of:
Acquiring a first document image, classifying pixels in the first document image, distinguishing a foreground document area and an environment boundary area in the document image, and acquiring a mask image of the foreground document area;
extracting control points from the mask map, performing preliminary correction on the first document image according to the control points, deleting the environment boundary, and obtaining a second document image with the preliminary correction and the environment boundary deletion;
acquiring a first coordinate offset matrix of the second document image, and acquiring a corrected third document image after offsetting the second document image according to the first coordinate offset matrix;
the obtaining the corrected third document image includes:
Judging whether to execute an iteration step according to the first coordinate offset matrix, and taking the third document image as an output image if the iteration step is not required to be executed; otherwise, executing an iteration step;
The iterative steps include:
Acquiring a second coordinate offset matrix of the third document image, shifting the third document image according to the second coordinate offset matrix, updating the corrected image into the third document image, and recording the second coordinate offset matrix in the iterative step;
Judging whether to continue to execute the iteration step according to the second coordinate offset matrix, if so, returning to execute the previous step; otherwise, shifting the second document image according to the first coordinate shift matrix and all second coordinate shift matrices in the record to obtain a corrected image as an output image;
the step of judging whether to continue to execute the iteration step according to the second coordinate offset matrix comprises the following steps:
Calculating the standard deviation of the second coordinate offset matrix;
if the standard deviation of the second coordinate offset matrix is larger than a second preset threshold value, continuing to execute the iteration step;
if the standard deviation of the second coordinate offset matrix is smaller than a second preset threshold value, stopping executing the iteration step;
the shifting of the second document image according to the first coordinate shift matrix and all the second coordinate shift matrices in the record, and obtaining a corrected image as an output image, including:
Summing the first coordinate offset matrix and all of the second coordinate offset matrices recorded in the iterative steps to obtain a final coordinate offset matrix, and shifting the second document image according to the final coordinate offset matrix to obtain a corrected image as the output image.
2. The method for geometrically correcting a document image according to claim 1, wherein the classifying pixels in the first document image includes:
Acquiring the classification confidence coefficient of each pixel position in the first document image by adopting a first deep convolutional neural network, and classifying according to the classification confidence coefficient;
the acquiring the first coordinate offset matrix of the second document image includes:
And acquiring a first coordinate offset matrix of the second document image by adopting a second deep convolutional neural network.
3. The method for geometrically correcting a document image according to claim 1, wherein said extracting control points on said mask map, performing preliminary correction on said first document image based on said control points, deleting an environmental boundary, obtaining a preliminary corrected second document image with the environmental boundary deleted, comprises:
extracting four corner points of the document on a mask map of a foreground document area by using a polygon fitting algorithm;
Drawing a vertical bisector on a line segment formed by taking adjacent corner points as endpoints according to a preset bisector proportion, and taking the intersection point of the bisector and the mask diagram boundary of the foreground document area as the bisector of the document boundary;
drawing a quadrilateral mask diagram by using four corner points, and calculating the intersection ratio according to the quadrilateral mask diagram and the mask diagram of the foreground document area;
if the intersection ratio is smaller than a first preset threshold value, not correcting, and taking the first document image as a second document image;
If the intersection ratio is larger than a first preset threshold value, using four corner points and a plurality of boundary equally dividing points as control points, using a thin plate spline interpolation algorithm to perform preliminary correction on the first document image, deleting the environment boundary, and obtaining a second document image with preliminary correction and deleting the environment boundary.
4. The method of claim 1, wherein said obtaining a corrected third document image after said shifting said second document image according to said first coordinate shift matrix comprises:
The first coordinate offset matrix designates a two-dimensional offset vector for each pixel point in the second document image, and each pixel is offset according to the corresponding offset vector to obtain a corrected third document image;
the offset vector is used for representing the offset direction and distance on the two-dimensional plane.
5. A document image geometry correction system for performing a document image geometry correction method according to any one of claims 1-4, comprising:
The pixel classification module is used for acquiring a first document image, classifying pixels in the first document image, distinguishing a foreground document area from an environment boundary area in the document image, and acquiring a mask image of the foreground document area;
the preliminary correction module is used for extracting control points from the mask image, carrying out preliminary correction on the first document image according to the control points, deleting the environment boundary, and obtaining a second document image with the preliminary correction and the environment boundary deletion;
And the offset correction module is used for acquiring a first coordinate offset matrix of the second document image, and obtaining a corrected third document image after the second document image is offset according to the first coordinate offset matrix.
6. A document image geometry correction device, comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method of any one of claims 1-4.
7. A computer readable storage medium, in which a processor executable program is stored, characterized in that the processor executable program is for performing the method according to any of claims 1-4 when being executed by a processor.
CN202111584077.4A 2021-12-22 2021-12-22 Document image geometric correction method, system, device and medium Active CN114418869B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111584077.4A CN114418869B (en) 2021-12-22 2021-12-22 Document image geometric correction method, system, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111584077.4A CN114418869B (en) 2021-12-22 2021-12-22 Document image geometric correction method, system, device and medium

Publications (2)

Publication Number Publication Date
CN114418869A CN114418869A (en) 2022-04-29
CN114418869B true CN114418869B (en) 2024-08-13

Family

ID=81267830

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111584077.4A Active CN114418869B (en) 2021-12-22 2021-12-22 Document image geometric correction method, system, device and medium

Country Status (1)

Country Link
CN (1) CN114418869B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115187995B (en) * 2022-07-08 2023-04-18 北京百度网讯科技有限公司 Document correction method, device, electronic equipment and storage medium
CN116030120B (en) * 2022-09-09 2023-11-24 北京市计算中心有限公司 Method for identifying and correcting hexagons
CN117853382B (en) * 2024-03-04 2024-05-28 武汉人工智能研究院 Sparse marker-based image correction method, device and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112767270A (en) * 2021-01-19 2021-05-07 中国科学技术大学 Fold document image correction system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019178702A1 (en) * 2018-03-23 2019-09-26 The Governing Council Of The University Of Toronto Systems and methods for polygon object annotation and a method of training an object annotation system
CN111414915B (en) * 2020-02-21 2024-03-26 华为技术有限公司 Character recognition method and related equipment
KR102361444B1 (en) * 2020-03-06 2022-02-11 주식회사 테스트웍스 System and method of quality adjustment of object detection based on polyggon
CN111401371B (en) * 2020-06-03 2020-09-08 中邮消费金融有限公司 Text detection and identification method and system and computer equipment
CN112766266B (en) * 2021-01-29 2021-12-10 云从科技集团股份有限公司 Text direction correction method, system and device based on staged probability statistics

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112767270A (en) * 2021-01-19 2021-05-07 中国科学技术大学 Fold document image correction system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Marior: Margin Removal and Iterative Content Rectification for Document Dewarping in the Wild; Jiaxin Zhang et al.; arXiv:2207.11515v1; 2022-07-23; pp. 1-11 *

Also Published As

Publication number Publication date
CN114418869A (en) 2022-04-29

Similar Documents

Publication Publication Date Title
CN114418869B (en) Document image geometric correction method, system, device and medium
CN108388896B (en) License plate identification method based on dynamic time sequence convolution neural network
WO2022141178A1 (en) Image processing method and apparatus
CN111091567B (en) Medical image registration method, medical device and storage medium
CN110807731B (en) Method, device, system and storage medium for compensating image dead pixel
CN113076871A (en) Fish shoal automatic detection method based on target shielding compensation
CN112241976A (en) Method and device for training model
CN114529837A (en) Building outline extraction method, system, computer equipment and storage medium
CN110909663A (en) Human body key point identification method and device and electronic equipment
CN105701770B (en) A kind of human face super-resolution processing method and system based on context linear model
CN115410030A (en) Target detection method, target detection device, computer equipment and storage medium
CN114140623A (en) Image feature point extraction method and system
CN111914596A (en) Lane line detection method, device, system and storage medium
CN110516731B (en) Visual odometer feature point detection method and system based on deep learning
CN114782355B (en) Gastric cancer digital pathological section detection method based on improved VGG16 network
CN113744280B (en) Image processing method, device, equipment and medium
CN116091823A (en) Single-feature anchor-frame-free target detection method based on fast grouping residual error module
CN113808033A (en) Image document correction method, system, terminal and medium
CN113963072A (en) Binocular camera calibration method and device, computer equipment and storage medium
CN111612827B (en) Target position determining method and device based on multiple cameras and computer equipment
CN113191959B (en) Digital imaging system limit image quality improving method based on degradation calibration
CN109816613B (en) Image completion method and device
CN113497886B (en) Video processing method, terminal device and computer-readable storage medium
CN113255405B (en) Parking space line identification method and system, parking space line identification equipment and storage medium
CN114445277A (en) Depth image pixel enhancement method and device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant