CN117095417A - Screen shot form image text recognition method, device, equipment and storage medium - Google Patents
- Publication number
- CN117095417A (application CN202311076101.2A)
- Authority
- CN
- China
- Prior art keywords
- target
- pixel matrix
- preset
- image
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/412—Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/1444—Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/15—Cutting or merging image elements, e.g. region growing, watershed or clustering-based techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/18—Extraction of features or characteristics of the image
- G06V30/1801—Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/18—Extraction of features or characteristics of the image
- G06V30/18105—Extraction of features or characteristics of the image related to colour
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/02—Recognising information on displays, dials, clocks
Abstract
The application discloses a screen shot form image text recognition method, a device, equipment and a storage medium, relating to the technical field of image recognition, comprising the following steps: inputting a target screen shot form image into a preset moire elimination model to obtain a first target pixel matrix with moire removed, and performing exposure processing on the first target pixel matrix to determine a second target pixel matrix corresponding to a target form area on the target screen shot form image; inputting the second target pixel matrix into a preset table detection model to obtain target cell vertex coordinates, and determining a cell pixel matrix based on the target cell vertex coordinates; and splicing the cells in the cell pixel matrix to obtain a target image, and performing text detection on the target image by using a preset optical character recognition technology to obtain the text on the target form area. Therefore, automatic image calibration and character recognition are realized, manual intervention is reduced, and recognition efficiency is improved.
Description
Technical Field
The present application relates to the field of image recognition technologies, and in particular, to a method, an apparatus, a device, and a storage medium for recognizing a screen shot form image text.
Background
Today, image processing techniques are widely used across industries. However, the recognition of low-quality screen shot form images has remained a major concern due to technical and physical limitations. Such images often suffer from high noise, distortion, and low contrast, which poses a significant challenge for form detection and text content recognition. In the prior art, the form on the image is detected after image preprocessing operations such as gamma transformation and perspective transformation. Since photographing form data on an electronic screen is one of the typical natural shooting scenarios, array aliasing occurs between the display elements of the display device and the photosensitive elements of the photographing device, making the moire phenomenon difficult to avoid. Prior-art image preprocessing methods (such as graying, gamma transformation, and perspective transformation) can hardly eliminate the moire on the image, and the moire may cover the text content and the table frame lines, causing missed detection, false detection, and similar problems in the recognition results of an OCR (Optical Character Recognition) model. In view of this, low-quality screen shot form image recognition faces a number of difficulties and challenges, including high noise and moire, that conventional techniques struggle to address.
Disclosure of Invention
In view of the above, the present application aims to provide a method, an apparatus, a device, and a storage medium for recognizing screen shot form image text, which can improve the accuracy and efficiency of automatic image calibration and text recognition. The specific scheme is as follows:
in a first aspect, the application discloses a screen shot form image text recognition method, which comprises the following steps:
inputting a target screen shot form image into a preset moire elimination model to obtain a first target pixel matrix with moire removed, and performing exposure processing on the first target pixel matrix to determine a second target pixel matrix corresponding to a target form area on the target screen shot form image;
inputting the second target pixel matrix into a preset table detection model to obtain target cell vertex coordinates, and determining a cell pixel matrix based on the target cell vertex coordinates;
and splicing the cells in the cell pixel matrix to obtain a target image, and performing text detection on the target image by using a preset optical character recognition technology to obtain the text on the target form area.
Optionally, before inputting the target screen shot form image to the preset moire elimination model to obtain the first target pixel matrix with the moire removed, the method further includes:
and adjusting the initial moire elimination model through a preset screen shot image data set to obtain the preset moire elimination model.
Optionally, the exposing the first target pixel matrix to determine a second target pixel matrix corresponding to a target form area on the target screen shot form image includes:
performing exposure processing on the first target pixel matrix based on a preset linear transformation formula to obtain an exposed pixel matrix, and cutting the exposed pixel matrix based on a preset pixel cutting range to obtain a target exposed pixel matrix;
and determining a target form area on the target screen shot form image according to the outline vertex coordinates and the preset vertex coordinates corresponding to the target exposed pixel matrix, and calibrating the target form area through a preset perspective transformation algorithm to obtain a second target pixel matrix corresponding to the target form area.
Optionally, before performing exposure processing on the first target pixel matrix based on the preset linear transformation formula to obtain an exposed pixel matrix and clipping the exposed pixel matrix based on a preset pixel clipping range to obtain the target exposed pixel matrix, the method further includes:
converting the first target pixel matrix from an RGB color space to an HLS color space;
correspondingly, after performing exposure processing on the first target pixel matrix based on a preset linear transformation formula to obtain an exposed pixel matrix and cutting the exposed pixel matrix based on a preset pixel cutting range to obtain a target exposed pixel matrix, the method further comprises:
the first target pixel matrix is restored from HLS color space to RGB color space.
Optionally, the inputting the second target pixel matrix to a preset table detection model to obtain a target cell vertex coordinate, and determining the cell pixel matrix based on the target cell vertex coordinate includes:
inputting the second target pixel matrix to a preset table detection model to obtain the vertex coordinates of the target cells;
determining a corresponding cell pixel matrix from the second target pixel matrix based on the target cell vertex coordinates;
and ordering the cell pixel matrix based on a preset ordering rule to determine the cell pixel matrix.
Optionally, the stitching the cells in the cell pixel matrix to obtain a target image, and performing text detection on the target image by using a preset optical character recognition technology to obtain a text on the target form area, including:
adjusting the widths of the cells in the cell pixel matrix to obtain a new cell pixel matrix, and splicing the cells in the new cell pixel matrix in the vertical direction to obtain a target image;
and carrying out text detection on the target image by using a preset optical character recognition technology to obtain text content on the target form area and coordinates corresponding to the text content.
Optionally, after performing text detection on the target image by using a preset optical character recognition technology to obtain text content on the target form area and coordinates corresponding to the text content, the method further includes:
integrating the text content and the coordinates corresponding to the text content to generate a target table, and storing the target table through a preset file format.
In a second aspect, the present application discloses a screen shot form image text recognition device, comprising:
the image preprocessing module is used for inputting a target screen shot form image into a preset moire elimination model to obtain a first target pixel matrix with moire removed, and carrying out exposure processing on the first target pixel matrix to determine a second target pixel matrix corresponding to a target form area on the target screen shot form image;
the table detection module is used for inputting the second target pixel matrix into a preset table detection model to obtain target cell vertex coordinates, and determining a cell pixel matrix based on the target cell vertex coordinates;
and the cell text recognition module is used for splicing cells in the cell pixel matrix to obtain a target image, and performing text detection on the target image by utilizing a preset optical character recognition technology to obtain the text on the target form area.
In a third aspect, the present application discloses an electronic device, comprising:
a memory for storing a computer program;
and the processor is used for executing the computer program to realize the screen shot form image text recognition method.
In a fourth aspect, the present application discloses a computer readable storage medium storing a computer program which, when executed by a processor, implements the aforementioned method for identifying a screen shot form image text.
In the method, a target screen shot form image is input into a preset moire elimination model to obtain a first target pixel matrix with the moire removed, and exposure processing is performed on the first target pixel matrix to determine a second target pixel matrix corresponding to a target form area on the target screen shot form image; the second target pixel matrix is input into a preset table detection model to obtain target cell vertex coordinates, and a cell pixel matrix is determined based on the target cell vertex coordinates; the cells in the cell pixel matrix are spliced to obtain a target image, and text detection is performed on the target image by using a preset optical character recognition technology to obtain the text on the target form area. That is, the moire on the target screen shot form image is eliminated through the preset moire elimination model to obtain the first target pixel matrix; exposure processing and calibration are performed on the first target pixel matrix to determine the second target pixel matrix corresponding to the target form area; the second target pixel matrix is detected through the preset form detection model to obtain the vertex coordinate information of each cell in the second target pixel matrix; the cell pixels are extracted and arranged to generate the target image; and the target image is detected through the preset optical character recognition technology, with each cell on the target image detected individually, so as to obtain the text on the target form area. In this application, operations such as moire elimination and image exposure improve the character recognition performance of the preset optical character recognition technology, and because each cell on the target image is recognized separately, recognition is more accurate than whole-image recognition, reducing the occurrence of false detection or missed detection.
Therefore, the problems of low accuracy, low efficiency, and poor effect in form detection and character recognition caused by high noise, moire, and the like can be solved, the recognition accuracy and efficiency are improved, and the method can be widely applied to fields such as image processing, document management, and financial and tax management.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for recognizing a screen shot form image text;
FIG. 2 is a flowchart of image preprocessing for a specific screen shot form image disclosed in the present application;
FIG. 3 is a flowchart of a specific method for identifying text in a screen shot form image according to the present disclosure;
FIG. 4 is a flowchart of form verification for a specific screen shot form image text disclosed herein;
FIG. 5 is a flowchart of a specific method for identifying text in a screen shot form image according to the present disclosure;
FIG. 6 is a schematic diagram of a screen shot form image text recognition device according to the present application;
fig. 7 is a block diagram of an electronic device according to the present disclosure.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In low-quality screen shot form image recognition, conventional technology faces many difficulties and challenges. This embodiment specifically introduces a screen shot form image text recognition method that can effectively overcome these problems and improve recognition precision and efficiency.
Referring to fig. 1, the embodiment of the application discloses a screen shot form image text recognition method, which comprises the following steps:
step S11: inputting a target screen shot form image into a preset moire elimination model to obtain a first target pixel matrix with moire removed, and carrying out exposure processing on the first target pixel matrix to determine a second target pixel matrix corresponding to a target form area on the target screen shot form image.
In this embodiment, as shown in fig. 2, before inputting the target screen shot form image to the preset moire elimination model to obtain the first target pixel matrix with the moire removed, the method further includes: adjusting the initial moire elimination model through a preset screen shot image data set to obtain the preset moire elimination model. The initial moire elimination model used in the application is constructed based on MRNet, the champion scheme of the NTIRE 2021 challenge. After the initial moire elimination model is obtained, it is fine-tuned on a really acquired screen shot image data set to obtain the preset moire elimination model, so that the preset moire elimination model better matches the conditions of the current image processing task. The acquired target screen shot form image I_raw(x, y) is then input into the preset moire elimination model, which removes the moire of the target screen shot form image to obtain the first target pixel matrix I_mr(x, y) with the moire removed. Exposure processing is then performed on the first target pixel matrix based on a preset linear transformation formula to obtain an exposed pixel matrix; that is, the first target pixel matrix I_mr(x, y) is converted from the RGB (red, green, blue) color space to the HLS (hue, lightness, saturation) color space as I_hls(x, y), and the brightness is then adjusted using a linear transformation:
I_hls(x, y) = α · I_mr(x, y) + β;
where α is the scaling factor of the linear transformation and β is its offset, both of which can be obtained by analyzing the image histogram. The exposed pixel matrix is then clipped based on a preset pixel clipping range to obtain the target exposed pixel matrix. The clipping of pixel values can be expressed as:
I_hls(x, y) = min(max(I_hls(x, y), v_low), v_high);
where [v_low, v_high] is the preset pixel clipping range.
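As a sketch of this exposure step (the scale factor α and offset β below are illustrative values, not ones derived from a real image histogram), the linear brightness transform followed by clipping to the preset pixel range could look like:

```python
import numpy as np

def expose_and_clip(lightness, alpha=1.3, beta=10.0, lo=0.0, hi=255.0):
    """Apply the linear transform I' = alpha * I + beta to a lightness
    channel, then clip the result to the preset pixel range [lo, hi]."""
    exposed = alpha * lightness.astype(np.float64) + beta
    return np.clip(exposed, lo, hi).astype(np.uint8)

# A tiny 2x2 synthetic lightness matrix: dark pixels are brightened,
# bright pixels saturate at the upper clipping bound.
L = np.array([[10, 100], [200, 250]], dtype=np.uint8)
out = expose_and_clip(L)  # [[23, 140], [255, 255]]
```

In a real pipeline the conversion to and from HLS would be done with an image library (e.g. OpenCV's color-space conversions) before and after this step.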
the first target pixel matrix is then restored from the HLS color space to the RGB color space, i.e. the first target pixel matrix I hls (x, y) reduction from HLS color space to RGB color space I rgb (x, y). Then in turn to I rgb And (x, y) carrying out graying and Gaussian blur, constructing an edge detection operator to finish edge and contour detection and contour polygon fitting, and finally obtaining four vertex coordinates of the contour of the outermost layer of the form area. Then according to the form outermost contour vertex coordinates and target vertex coordinates, solving to obtain a mapping exchange matrix M of the form outermost contour vertex coordinates and target vertex coordinates, and finally carrying out perspective transformation on I through a perspective transformation algorithm based on the mapping exchange matrix M rgb (x, y) calibrating to obtain a calibrated second target pixel matrix I corresponding to the target form area cal (x, y). The perspective transformation can be expressed as:
step S12: and inputting the second target pixel matrix into a preset table detection model to obtain target cell vertex coordinates, and determining a cell pixel matrix based on the target cell vertex coordinates.
In this embodiment, the second target pixel matrix I_cal(x, y) is input into the preset table detection model to obtain the vertex coordinate information of each cell in the second target pixel matrix, and the corresponding cell pixel matrices are extracted according to the cell vertex coordinate information to obtain a list of cell pixel matrices.
Step S13: and splicing the cells in the cell pixel matrix to obtain a target image, and performing text detection on the target image by using a preset optical character recognition technology to obtain the text on the target form area.
In this embodiment, the elements in the list of cell pixel matrices are padded accordingly so that the widths of all cells are equal, creating a new list of cell pixel matrices. The list elements are then spliced along the vertical direction to obtain a new image I_new(x, y). Text detection is then performed on the target image by using the preset optical character recognition technology to obtain the text on the target form area. It should be noted that, when the text is detected by using the preset optical character recognition technology, detection is performed cell by cell, which greatly improves the recognition accuracy.
In this embodiment, a target screen shot form image is input into a preset moire elimination model to obtain a first target pixel matrix with the moire removed, and exposure processing is performed on the first target pixel matrix to determine a second target pixel matrix corresponding to a target form area on the target screen shot form image; the second target pixel matrix is input into a preset table detection model to obtain target cell vertex coordinates, and a cell pixel matrix is determined based on the target cell vertex coordinates; the cells in the cell pixel matrix are spliced to obtain a target image, and text detection is performed on the target image by using a preset optical character recognition technology to obtain the text on the target form area. That is, the moire on the target screen shot form image is eliminated through the preset moire elimination model to obtain the first target pixel matrix; exposure processing and calibration are performed on the first target pixel matrix to determine the second target pixel matrix corresponding to the target form area; the second target pixel matrix is detected through the preset form detection model to obtain the vertex coordinate information of each cell in the second target pixel matrix; the cell pixels are extracted and arranged to generate the target image; and the target image is detected through the preset optical character recognition technology, with each cell on the target image detected individually, so as to obtain the text on the target form area. In this application, operations such as moire elimination and image exposure improve the character recognition performance of the preset optical character recognition technology, and because each cell on the target image is recognized separately, recognition is more accurate than whole-image recognition, reducing the occurrence of false detection or missed detection.
Therefore, the problems of low accuracy, low efficiency, and poor effect in form detection and character recognition caused by high noise, moire, and the like can be solved, the recognition accuracy and efficiency are improved, and the method can be widely applied to fields such as image processing, document management, and financial and tax management.
The above embodiment mainly introduces a screen shot form image text recognition method from the aspect of picture preprocessing, and the present embodiment introduces a screen shot form image text recognition method from the aspect of form detection and text recognition.
Referring to fig. 3, the embodiment of the application discloses a specific screen shot form image text recognition method, which comprises the following steps:
step S21: and inputting the second target pixel matrix into a preset table detection model to obtain the vertex coordinates of the target cells.
In this embodiment, as shown in FIG. 4, the second target pixel matrix I_cal(x, y) is input into a preset table detection model, which outputs the vertex coordinate information V = {v_1, v_2, …, v_n} of each cell in the table, where v_i = (x_i1, y_i1, x_i2, y_i2, x_i3, y_i3, x_i4, y_i4) represents the four vertex coordinates of the i-th cell. It should be noted that the preset table detection model is developed from the Table-OCR model commonly used for automatic table detection tasks; Table-OCR realizes automatic detection and reconstruction of document tables based on the darknet framework. In this way, the accuracy of table detection can be improved.
Step S22: and determining a corresponding cell pixel matrix from the second target pixel matrix based on the target cell vertex coordinates.
In this embodiment, the output cell vertex coordinate information is used to extract the pixel matrix of the corresponding cell from I_cal(x, y); for the i-th cell, the corresponding cell pixel matrix P_i can be extracted from I_cal(x, y) as P_i = {I_cal(x, y) | x ∈ [x_i1, x_i3], y ∈ [y_i1, y_i2]}.
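Following that set notation, and assuming the cell is axis-aligned after calibration, the per-cell crop can be sketched as (the 6x6 image below is synthetic):

```python
import numpy as np

def extract_cell(image, x1, y1, x3, y2):
    """Crop the cell pixel matrix P_i = {I(x, y) | x in [x1, x3], y in [y1, y2]}
    from a calibrated image, assuming the cell is axis-aligned."""
    return image[y1:y2, x1:x3]

img = np.arange(36).reshape(6, 6)     # synthetic 6x6 "calibrated image"
cell = extract_cell(img, 1, 1, 4, 4)  # 3x3 crop of the interior
```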
Step S23: and ordering the cell pixel matrix based on a preset ordering rule to determine the cell pixel matrix.
In this embodiment, the cell pixel matrices are ordered based on a preset ordering rule to determine the cell pixel matrix; that is, assuming the form has m rows and n columns of cells, the cells may be arranged in "left-to-right, top-to-bottom" order, finally obtaining a list of cell pixel matrices P = {P_1, P_2, …, P_mn}.
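The "left-to-right, top-to-bottom" arrangement can be sketched as follows; the row tolerance row_tol is an assumed parameter for grouping slightly misaligned boxes into one row, not something specified by the application:

```python
def sort_cells(cells, row_tol=10):
    """Sort cell boxes, given as (x, y) of the top-left vertex, into
    top-to-bottom rows, then left-to-right within each row. Boxes whose
    y differs by less than row_tol are treated as the same row."""
    cells = sorted(cells, key=lambda c: c[1])   # rough top-to-bottom order
    rows, current = [], [cells[0]]
    for c in cells[1:]:
        if abs(c[1] - current[0][1]) < row_tol:
            current.append(c)                   # same row
        else:
            rows.append(sorted(current, key=lambda c: c[0]))
            current = [c]
    rows.append(sorted(current, key=lambda c: c[0]))
    return [c for row in rows for c in row]

# Two rows of two cells each, with a 1-2 px vertical jitter inside a row.
ordered = sort_cells([(50, 2), (0, 0), (0, 30), (50, 31)])
```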
Step S24: and adjusting the cell width in the cell pixel matrix to obtain a new cell pixel matrix, and splicing the cells in the new cell pixel matrix in the vertical direction to obtain a target image.
In this embodiment, as shown in fig. 5, the elements of the cell pixel matrix list P = {P_1, P_2, …, P_mn} are padded: both sides of each cell are padded in turn with a white background of pixel value 255 so that the widths of all cells are equal, generating a new list of cell pixel matrices. Since the serial single-image inference mode of the OCR model would lead to an overlong inference time, the multiple cell pixel matrices of the form are spliced into a new image I_new(x, y) by vertical stacking to reduce the number of inferences and the time required. Specifically, for the new cell pixel matrix list, the list elements are spliced along the vertical direction to obtain the new image I_new(x, y).
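The padding-then-stacking step can be sketched as follows; the symmetric split of the padding between left and right sides is an assumption for illustration:

```python
import numpy as np

def pad_and_stack(cells, pad_value=255):
    """Pad each cell matrix on both sides with white (255) so all widths
    match the widest cell, then splice them vertically into one image."""
    max_w = max(c.shape[1] for c in cells)
    padded = []
    for c in cells:
        extra = max_w - c.shape[1]
        left, right = extra // 2, extra - extra // 2
        padded.append(np.pad(c, ((0, 0), (left, right)),
                             constant_values=pad_value))
    return np.vstack(padded)

# Two synthetic "cells" of different widths become one 3x4 stacked image.
a = np.zeros((2, 2), dtype=np.uint8)
b = np.zeros((1, 4), dtype=np.uint8)
img = pad_and_stack([a, b])
```

Stacking the cells into a single image means the OCR model runs once per form rather than once per cell, which is the inference-time saving the paragraph above describes.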
Step S25: and carrying out text detection on the target image by using a preset optical character recognition technology to obtain text content on a target form area and coordinates corresponding to the text content.
In the present embodiment, a preset optical character recognition model is constructed: the text detection module is built on the EAST (Efficient and Accurate Scene Text) model with a MobileNetV3 backbone network, and the text recognition module is built on the RARE (Robust text recognizer with Automatic Rectification) model with a MobileNetV3 backbone network. Text detection is then performed on each cell in the target image by using the preset optical character recognition technology to obtain the text content on the target form area and the coordinates corresponding to the text content. That is, for the image I_new(x, y), the recognized text content and its coordinates are T = {(t_1, x_1, y_1), (t_2, x_2, y_2), …, (t_k, x_k, y_k)}, where t_i represents the text content of the i-th cell, and x_i and y_i represent the upper-left corner coordinates of the text box of the i-th cell.
Step S26: integrating the text content and the coordinates corresponding to the text content to generate a target table, and storing the target table through a preset file format.
In this embodiment, the text content and the coordinates corresponding to the text content are integrated and output, and finally presented in the form of m rows and n columns. The result may be stored as an Excel table according to actual requirements, or used for data processing and analysis in other forms.
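Writing a genuine Excel file typically requires a third-party library, so the sketch below stores the integrated m x n result as CSV instead; the helper names and the assumption that the recognized triples arrive already in "left-to-right, top-to-bottom" order are illustrative:

```python
import csv
import io

def to_grid(items, n_cols):
    """Arrange recognized (text, x, y) triples into an m x n table,
    assuming the triples are already in reading order."""
    texts = [t for t, _x, _y in items]
    return [texts[i:i + n_cols] for i in range(0, len(texts), n_cols)]

def grid_to_csv(grid):
    """Serialize the table rows to CSV text."""
    buf = io.StringIO()
    csv.writer(buf).writerows(grid)
    return buf.getvalue()

items = [("Name", 0, 0), ("Qty", 50, 0), ("Apple", 0, 30), ("3", 50, 30)]
grid = to_grid(items, n_cols=2)
```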
Therefore, in the embodiment of the application, operations such as moire elimination and image exposure improve the character recognition performance of the preset optical character recognition technology, and because each cell on the target image is recognized separately, recognition is more accurate than whole-image recognition, reducing the occurrence of false detection or missed detection. The problems of low accuracy, low efficiency, and poor effect in form detection and character recognition caused by high noise, moire, and the like can thus be solved, the recognition accuracy and efficiency are improved, and the method can be widely applied to fields such as image processing, document management, and financial and tax management.
Referring to fig. 6, the embodiment of the present application further correspondingly discloses a device for recognizing text in a screen shot form image, including:
the image preprocessing module 11 is configured to input a target screen shot form image to a preset moire elimination model to obtain a first target pixel matrix with moire removed, and perform exposure processing on the first target pixel matrix to determine a second target pixel matrix corresponding to a target form area on the target screen shot form image;
a table detection module 12, configured to input the second target pixel matrix to a preset table detection model to obtain a target cell vertex coordinate, and determine a cell pixel matrix based on the target cell vertex coordinate;
and the cell text recognition module 13 is used for splicing cells in the cell pixel matrix to obtain a target image, and performing text detection on the target image by utilizing a preset optical character recognition technology to obtain the text on the target form area.
In this embodiment, a target screen shot form image is input to a preset moire elimination model to obtain a first target pixel matrix with moire removed, and exposure processing is performed on the first target pixel matrix to determine a second target pixel matrix corresponding to a target form area on the target screen shot form image; the second target pixel matrix is input to a preset table detection model to obtain target cell vertex coordinates, and a cell pixel matrix is determined based on the target cell vertex coordinates; and the cells in the cell pixel matrix are spliced to obtain a target image, and text detection is performed on the target image by using a preset optical character recognition technology to obtain the text on the target form area. That is, moire on the target screen shot form image is eliminated through the preset moire elimination model to obtain the first target pixel matrix; exposure processing is performed on the first target pixel matrix to determine the second target pixel matrix corresponding to the target form area; the second target pixel matrix is detected through the preset table detection model to obtain the vertex coordinate information of each cell in the second target pixel matrix; the pixels of the cells are extracted and arranged to generate the target image; and the target image is detected through the preset optical character recognition technology, with each cell on the target image detected individually, so as to obtain the text on the target form area. According to the application, the performance of character recognition by the preset optical character recognition technology is improved through operations such as moire elimination and image exposure, and since each cell on the target image is recognized individually, the method is more accurate than recognizing the whole image at once and reduces the occurrence of false detection or missed detection.
Therefore, the problems of low accuracy, low efficiency and poor usability of form detection and character recognition caused by high noise, moire and the like can be solved, the recognition accuracy and efficiency are improved, and the method can be widely applied to the fields of image processing, document management, financial and tax management and the like.
In some specific embodiments, the device for identifying text of a screen shot form image may further include:
and the model fine-tuning module is used for adjusting the initial moire elimination model through a preset screen shot image data set to obtain the preset moire elimination model.
In some specific embodiments, the image preprocessing module 11 may specifically include:
the image exposure unit is used for carrying out exposure processing on the first target pixel matrix based on a preset linear transformation formula to obtain an exposed pixel matrix, and cutting the exposed pixel matrix based on a preset pixel cutting range to obtain a target exposed pixel matrix;
and the image calibration unit is used for determining a target form area on the target screen shot form image according to the outline vertex coordinates and the preset vertex coordinates corresponding to the target exposed pixel matrix, and calibrating the target form area through a preset perspective transformation algorithm to obtain a second target pixel matrix corresponding to the target form area.
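The exposure step performed by the image exposure unit can be sketched with NumPy as a linear transformation followed by clipping. The alpha/beta values and the [0, 255] clipping range are assumptions for illustration; the patent only specifies "a preset linear transformation formula" and "a preset pixel clipping range".

```python
import numpy as np

def expose(pixels, alpha=1.5, beta=10.0, lo=0, hi=255):
    """Apply the linear transformation g = alpha * f + beta, then clip
    the result to [lo, hi]. alpha/beta/lo/hi are assumed defaults."""
    exposed = alpha * pixels.astype(np.float64) + beta
    return np.clip(exposed, lo, hi).astype(np.uint8)

m = np.array([[100, 200], [0, 255]], dtype=np.uint8)
out = expose(m)  # bright pixels saturate at 255 after clipping
```

Raising alpha above 1 brightens the form area so that faint cell borders and strokes survive the subsequent table detection; the clip keeps overexposed pixels inside the valid 8-bit range.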
In some specific embodiments, the device for identifying text of a screen shot form image may further include:
a color space conversion module for converting the first target pixel matrix from an RGB color space to an HLS color space;
and the color space restoring module is used for restoring the first target pixel matrix from the HLS color space to the RGB color space.
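The two color-space modules above can be sketched per pixel with Python's standard-library colorsys, which works on channel values in [0, 1], so 8-bit channels are scaled accordingly. Converting to HLS lets the exposure step operate on the lightness channel before the matrix is restored to RGB.

```python
import colorsys

def rgb_to_hls(r, g, b):
    """Convert one 8-bit RGB pixel to HLS (all components in [0, 1])."""
    return colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)

def hls_to_rgb(h, l, s):
    """Convert an HLS pixel back to 8-bit RGB."""
    r, g, b = colorsys.hls_to_rgb(h, l, s)
    return round(r * 255), round(g * 255), round(b * 255)

h, l, s = rgb_to_hls(200, 100, 50)
restored = hls_to_rgb(h, l, s)  # round-trips back to the original pixel
```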
In some specific embodiments, the table detection module 12 may specifically include:
the vertex coordinate determining unit is used for inputting the second target pixel matrix into the preset table detection model to obtain the target cell vertex coordinates;
a cell pixel matrix determining unit, configured to determine a corresponding cell pixel matrix from the second target pixel matrix based on the target cell vertex coordinates;
and the cell sorting unit is used for sorting the cells in the cell pixel matrix based on a preset sorting rule to determine the final cell pixel matrix.
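One plausible sorting rule is reading order: cells are ordered top-to-bottom by their top-left vertex y coordinates, then left-to-right by x within each row. The row tolerance below is an assumed parameter, since the patent only refers to "a preset sorting rule".

```python
def sort_cells(cells, row_tol=5):
    """Order (x, y) top-left vertices into reading order, treating y
    values within row_tol of each other as belonging to one row."""
    grouped = []
    for x, y in sorted(cells, key=lambda c: c[1]):
        if grouped and abs(y - grouped[-1][0][1]) <= row_tol:
            grouped[-1].append((x, y))   # same row as the previous cell
        else:
            grouped.append([(x, y)])     # start a new row
    # Flatten, sorting each row left to right by x.
    return [c for row in grouped for c in sorted(row)]

cells = [(50, 101), (0, 100), (0, 200), (50, 201)]
ordered = sort_cells(cells)
```

With the cells in a deterministic order, the later splicing and recognition steps can map recognized text back to its m-row, n-column position.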
In some specific embodiments, the cell text recognition module 13 may specifically include:
the cell adjusting unit is used for adjusting the width of the cells in the cell pixel matrix to obtain a new cell pixel matrix, and splicing the cells in the new cell pixel matrix in the vertical direction to obtain a target image;
and the text detection unit is used for carrying out text detection on the target image by utilizing a preset optical character recognition technology so as to obtain text content on the target form area and coordinates corresponding to the text content.
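The width adjustment and vertical splicing can be sketched as follows: each cell (a list of pixel rows) is padded on the right to a common width, then the cells are concatenated vertically into one tall target image. Padding with white (255) is an assumption; the patent only says the cell widths are adjusted before splicing.

```python
def splice_cells(cells, pad_value=255):
    """Pad every row of every cell to the widest row, then stack the
    cells top to bottom into a single image (list of pixel rows)."""
    width = max(len(row) for cell in cells for row in cell)
    spliced = []
    for cell in cells:
        for row in cell:
            # Right-pad with pad_value so all rows share one width.
            spliced.append(row + [pad_value] * (width - len(row)))
    return spliced

# Two toy single-row cells of different widths.
cells = [[[1, 2]], [[3, 4, 5]]]
image = splice_cells(cells)
```

Stacking all cells into one column means a single OCR pass still sees each cell in isolation, one beneath the other, rather than the whole form at once.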
In some specific embodiments, the device for identifying text of a screen shot form image may further include:
and the text storage module is used for integrating the text content and the coordinates corresponding to the text content to generate a target table and storing the target table through a preset file format.
Further, the embodiment of the present application also discloses an electronic device, and fig. 7 is a block diagram of an electronic device 20 according to an exemplary embodiment; nothing in the figure should be considered a limitation on the scope of use of the present application.
Fig. 7 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present application. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26. The memory 22 is configured to store a computer program that is loaded and executed by the processor 21 to implement the relevant steps of the screen shot form image text recognition method disclosed in any of the foregoing embodiments. In addition, the electronic device 20 in this embodiment may specifically be an electronic computer.
In this embodiment, the power supply 23 is configured to provide an operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and the communication protocol to be followed is any communication protocol applicable to the technical solution of the present application, which is not specifically limited herein; the input/output interface 25 is used for acquiring external input data or outputting external output data, and the specific interface type thereof may be selected according to the specific application requirement, which is not limited herein.
The memory 22 may be a carrier for storing resources, such as a read-only memory, a random access memory, a magnetic disk, or an optical disk, and the resources stored thereon may include an operating system 221, a computer program 222, and the like, and the storage may be temporary storage or permanent storage.
The operating system 221 is used for managing and controlling the various hardware devices and the computer program 222 on the electronic device 20, and may be Windows Server, Netware, Unix, Linux, etc. In addition to the computer program that performs the screen shot form image text recognition method disclosed in any of the previous embodiments and executed by the electronic device 20, the computer program 222 may further include computer programs for performing other specific tasks.
Further, the application also discloses a computer readable storage medium for storing a computer program; the computer program, when executed by the processor, implements the screen shot form image text recognition method disclosed above. For specific steps of the method, reference may be made to the corresponding contents disclosed in the foregoing embodiments, and no further description is given here.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, so that the same or similar parts between the embodiments are referred to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
Those of skill will further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, the various illustrative elements and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may be disposed in Random Access Memory (RAM), flash memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it is further noted that relational terms such as first and second, and the like, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing has described in detail the screen shot form image text recognition method, device, equipment and storage medium provided by the present application. Specific examples have been applied herein to illustrate the principles and embodiments of the present application, and the description of the above embodiments is only intended to help in understanding the method of the present application and its core ideas. Meanwhile, those skilled in the art will make variations in the specific embodiments and application scope in accordance with the ideas of the present application; in view of the above, the contents of this description should not be construed as limiting the present application.
Claims (10)
1. A method for identifying text of a screen shot form image, comprising the steps of:
inputting a target screen shot form image into a preset moire elimination model to obtain a first target pixel matrix with moire removed, and performing exposure processing on the first target pixel matrix to determine a second target pixel matrix corresponding to a target form area on the target screen shot form image;
inputting the second target pixel matrix into a preset table detection model to obtain target cell vertex coordinates, and determining a cell pixel matrix based on the target cell vertex coordinates;
and splicing the cells in the cell pixel matrix to obtain a target image, and performing text detection on the target image by using a preset optical character recognition technology to obtain the text on the target form area.
2. The method for recognizing text in a screen shot form image according to claim 1, wherein before inputting the target screen shot form image to a preset moire elimination model to obtain a first target pixel matrix with moire removed, the method further comprises:
and adjusting the initial moire elimination model through a preset screen shot image data set to obtain the preset moire elimination model.
3. The method of claim 1, wherein exposing the first matrix of target pixels to determine a second matrix of target pixels corresponding to a target form area on the target screen form image comprises:
performing exposure processing on the first target pixel matrix based on a preset linear transformation formula to obtain an exposed pixel matrix, and cutting the exposed pixel matrix based on a preset pixel cutting range to obtain a target exposed pixel matrix;
and determining a target form area on the target screen shot form image according to the outline vertex coordinates and the preset vertex coordinates corresponding to the target exposed pixel matrix, and calibrating the target form area through a preset perspective transformation algorithm to obtain a second target pixel matrix corresponding to the target form area.
4. The method for recognizing text in a screen shot form image according to claim 3, wherein before performing exposure processing on the first target pixel matrix based on a preset linear transformation formula to obtain an exposed pixel matrix and clipping the exposed pixel matrix based on a preset pixel clipping range to obtain a target exposed pixel matrix, the method further comprises:
converting the first target pixel matrix from an RGB color space to an HLS color space;
correspondingly, after performing exposure processing on the first target pixel matrix based on a preset linear transformation formula to obtain an exposed pixel matrix and cutting the exposed pixel matrix based on a preset pixel cutting range to obtain a target exposed pixel matrix, the method further comprises:
the first target pixel matrix is restored from HLS color space to RGB color space.
5. The method of claim 1, wherein inputting the second matrix of target pixels into a predetermined table detection model to obtain target cell vertex coordinates, and determining the matrix of cell pixels based on the target cell vertex coordinates, comprises:
inputting the second target pixel matrix to a preset table detection model to obtain the vertex coordinates of the target cells;
determining a corresponding cell pixel matrix from the second target pixel matrix based on the target cell vertex coordinates;
and ordering the cell pixel matrix based on a preset ordering rule to determine the cell pixel matrix.
6. The method for recognizing text in a screen shot form image according to any one of claims 1 to 5, wherein the splicing the cells in the cell pixel matrix to obtain a target image, and performing text detection on the target image by using a preset optical character recognition technology to obtain text on the target form area, includes:
adjusting the widths of the cells in the cell pixel matrix to obtain a new cell pixel matrix, and splicing the cells in the new cell pixel matrix in the vertical direction to obtain a target image;
and carrying out text detection on the target image by using a preset optical character recognition technology to obtain text content on the target form area and coordinates corresponding to the text content.
7. The method for text recognition of a screen shot form image according to claim 6, wherein after text detection is performed on the target image by using a preset optical character recognition technology to obtain text content on the target form area and coordinates corresponding to the text content, further comprising:
integrating the text content and the coordinates corresponding to the text content to generate a target table, and storing the target table through a preset file format.
8. A screen shot form image text recognition device, comprising:
the image preprocessing module is used for inputting a target screen shot form image into a preset moire elimination model to obtain a first target pixel matrix with moire removed, and carrying out exposure processing on the first target pixel matrix to determine a second target pixel matrix corresponding to a target form area on the target screen shot form image;
the table detection module is used for inputting the second target pixel matrix into a preset table detection model to obtain target cell vertex coordinates, and determining a cell pixel matrix based on the target cell vertex coordinates;
and the cell text recognition module is used for splicing cells in the cell pixel matrix to obtain a target image, and performing text detection on the target image by utilizing a preset optical character recognition technology to obtain the text on the target form area.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the screen shot form image text recognition method of any one of claims 1 to 7.
10. A computer readable storage medium for storing a computer program which, when executed by a processor, implements the screen shot form image text recognition method of any one of claims 1 to 7.
Publications (1)
Publication Number | Publication Date
---|---
CN117095417A (en) | 2023-11-21