CN113420684A - Report recognition method and device based on feature extraction, electronic equipment and medium
- Publication number: CN113420684A (application CN202110728172.0A)
- Authority: CN (China)
- Prior art keywords: report, image, report image, text, target
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/24 Pattern recognition; Analysing; Classification techniques
- G06N3/045 Neural networks; Architecture; Combinations of networks
- G06N3/08 Neural networks; Learning methods
- G06T5/70 Image enhancement or restoration; Denoising; Smoothing
Abstract
The invention relates to the technical field of data display, and discloses a report recognition method based on feature extraction, which comprises the following steps: acquiring a storage path of a report image, acquiring the report image according to the storage path, and extracting image features of the report image; determining the text direction of the report image according to the image features; judging whether the text direction is a preset direction; if so, confirming that the report image is a target report image, and if not, performing angle conversion on the report image to obtain the target report image; and acquiring a pre-trained feature extraction network, and extracting text information from the target report image by using the feature extraction network to obtain a target text. The invention also provides a report recognition device, an electronic device and a storage medium based on feature extraction. The invention further relates to blockchain technology, and the report image can be stored in a blockchain node. The invention can improve the accuracy of report recognition.
Description
Technical Field
The invention relates to the technical field of data display, in particular to a report recognition method and device based on feature extraction, electronic equipment and a computer readable storage medium.
Background
Existing data management generally enters data through reports, which makes the data convenient to check and sort and makes changes among the data easy to reflect. During data entry, reports usually need to be recognized; however, because reports vary in style and quality, various recognition errors occur when scanned report images are recognized, so the accuracy of report recognition is low.
Disclosure of Invention
The invention provides a report recognition method and device based on feature extraction, electronic equipment and a computer readable storage medium, and mainly aims to improve the accuracy of report recognition.
In order to achieve the above object, the present invention provides a report recognition method based on feature extraction, which comprises:
acquiring a storage path of a report image, acquiring the report image according to the storage path, and extracting the image characteristics of the report image;
determining the text direction of the report image according to the image characteristics;
judging whether the text direction is a preset direction or not;
if so, confirming that the report image is a target report image;
if not, performing angle conversion on the report image to obtain the target report image;
and acquiring a pre-trained feature extraction network, and extracting text information from the target report image by using the feature extraction network to obtain a target text.
Optionally, the obtaining the report image according to the storage path includes:
acquiring a storage address and a storage serial number of the report image from the storage path;
inquiring whether the report image corresponding to the storage serial number is unique in a database corresponding to the storage address;
if the report image corresponding to the storage serial number is not unique, discarding the report image corresponding to the storage serial number, and re-acquiring a storage path of a report image;
and if the report image corresponding to the storage serial number is unique, acquiring the report image according to the storage address.
Optionally, before extracting the image feature of the report image, the method further includes:
acquiring the report image, and performing a binarization operation on the report image to obtain a binarized report image;
denoising the binarized report image to obtain a denoised report image;
and detecting straight line groups in the denoised report image by using a preset straight line detection method, and performing straight line compensation on the straight line groups of the denoised report image.
Optionally, the determining the text direction of the report image according to the image feature includes:
taking a plurality of different directions as preset text directions of the report image, and recognizing the characters of the report image in the different directions together with their confidences;
identifying the text category in the report image according to the image features;
reducing, according to a preset proportion, the confidence of those characters in the different directions that do not belong to the text category of the report image;
and counting the cumulative confidence in each text direction, and determining the text direction corresponding to the maximum cumulative confidence as the text direction of the report image.
Optionally, the obtaining a pre-trained feature extraction network, and extracting text information of the target report image by using the feature extraction network to obtain a target text includes:
selecting a feature dimension according to the target report image, and performing feature extraction on the target report image according to the feature dimension to obtain report features;
and performing dimension reduction on the report features to obtain reduced-dimension report features, and classifying the reduced-dimension report features by using a classifier in the feature extraction network to obtain the target text.
Optionally, the performing angle conversion on the report image to obtain the target report image includes:
creating an original transformation matrix according to the report image;
constructing an original transformation equation containing unknown parameters according to the original transformation matrix;
calculating by using the edge coordinate points of the report image and the original transformation equation to obtain a standard transformation matrix;
and performing angle conversion on the report image according to the standard transformation matrix to obtain the target report image.
Optionally, the extracting the image feature of the report image includes:
constructing a first convolution layer through convolution operation, normalization operation and activation operation;
constructing a second convolution layer by utilizing a preset combination function and a preset addition function, and constructing a convolutional neural network through the first convolution layer and the second convolution layer;
and extracting the image characteristics of the report image by using the convolutional neural network.
In order to solve the above problem, the present invention further provides a report recognition apparatus based on feature extraction, the apparatus including:
the report image acquisition module is used for acquiring a storage path of a report image, acquiring the report image according to the storage path and extracting the image characteristics of the report image;
the text direction identification module is used for determining the text direction of the report image according to the image characteristics;
the text direction judging module is used for judging whether the text direction is a preset direction or not;
a target report acquisition module, configured to determine that the report image is a target report image if the text direction is a preset direction, and perform angle conversion on the report image to obtain the target report image if the text direction is not the preset direction;
and the target text extraction module is used for acquiring a pre-trained feature extraction network, and extracting text information of the target report image by using the feature extraction network to obtain a target text.
In order to solve the above problem, the present invention also provides an electronic device, including:
a memory storing at least one computer program; and
and the processor executes the computer program stored in the memory to realize the report recognition method based on the feature extraction.
In order to solve the above problem, the present invention further provides a computer-readable storage medium including a storage data area and a storage program area, the storage data area storing created data, the storage program area storing a computer program; wherein the computer program, when executed by a processor, implements a report recognition method based on feature extraction as described above.
In the embodiment of the invention, the report image is acquired according to the storage path of the report image, the text direction of the report image is extracted, whether the report image is a target report image is judged according to the extraction result, and when the report image is not the target report image, direction conversion is performed on the report image to obtain the target report image, so that the report image can be corrected when the target report has a direction deviation. Further, a pre-trained feature extraction network is used to extract text information from the corrected target report image to obtain a target text, thereby achieving the purpose of improving the accuracy of report recognition.
Drawings
Fig. 1 is a schematic flow chart of a report recognition method based on feature extraction according to an embodiment of the present invention;
fig. 2 is a schematic block diagram of a report recognition apparatus based on feature extraction according to an embodiment of the present invention;
fig. 3 is a schematic diagram of an internal structure of an electronic device implementing a report recognition method based on feature extraction according to an embodiment of the present invention;
the implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the application provides a report recognition method based on feature extraction. The execution subject of the report recognition method based on feature extraction includes, but is not limited to, at least one of electronic devices that can be configured to execute the method provided by the embodiments of the present application, such as a server, a terminal, and the like. In other words, the report recognition method based on feature extraction may be executed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.
Fig. 1 is a schematic flow chart of a report recognition method based on feature extraction according to an embodiment of the present invention. In this embodiment, the report recognition method based on feature extraction includes:
s1, acquiring a storage path of the report image, acquiring the report image according to the storage path, and extracting the image characteristics of the report image.
In the embodiment of the present invention, the report image may be stored in a Network Attached Storage (NAS), and the report image may be a financial report image.
Specifically, the network attached storage is a device that is connected to the network and has a data storage function. By adopting network attached storage, the storage device can be separated from the server, so that data can be centrally managed, bandwidth can be freed, data transmission performance can be improved, and the like.
In the embodiment of the invention, the report to be entered at each client (i.e. user end) can be scanned in advance to obtain the report image, and the report image is stored, wherein the report to be entered at the client is the report input by the user or the report acquired from the preset report repository to be entered.
In another embodiment of the present invention, when the reports to be entered at each client are scanned to obtain report images, a camera can be controlled and a scanning program started to scan the multiple reports to be entered based on the three primary colors respectively, so as to obtain multiple red reports, green reports and blue reports, and then the red report, the green report and the blue report corresponding to the same report to be entered are combined to obtain the report image.
In this scheme, the report to be entered is scanned by a three-color separation technique to obtain the red report, the green report and the blue report, and the three are combined to obtain the report image, as sketched below; the image quality of the report image acquired in this embodiment is therefore higher.
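As an illustration of the three-color scanning described above, the following minimal Python sketch (not part of the original disclosure) merges separately scanned red, green and blue reports into one report image; the file-path arguments and grayscale loading are assumptions made only for the example.

```python
import cv2

def merge_color_scans(red_path, green_path, blue_path):
    """Combine the red, green and blue scans of one report into a report image."""
    r = cv2.imread(red_path, cv2.IMREAD_GRAYSCALE)
    g = cv2.imread(green_path, cv2.IMREAD_GRAYSCALE)
    b = cv2.imread(blue_path, cv2.IMREAD_GRAYSCALE)
    # OpenCV stores color channels in B, G, R order, so the blue scan comes first.
    return cv2.merge([b, g, r])
```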
In this embodiment, the image features mainly include color features, texture features, shape features, spatial relationship features, and the like of an image.
Preferably, the image features are shape features and spatial relationship features of the report image.
In detail, the obtaining the report image according to the storage path includes:
acquiring a storage address and a storage serial number of the report image from the storage path;
inquiring whether the report image corresponding to the storage serial number is unique in a database corresponding to the storage address;
if the report image corresponding to the storage serial number is not unique, discarding the report image corresponding to the storage serial number, and obtaining the storage path of the acquired report image again;
and if the report image corresponding to the storage serial number is unique, acquiring the report image according to the storage address.
Specifically, the storage serial number is a serial number for determining the storage of the report image, and the report image has a unique storage serial number.
Specifically, the storage serial number is used to query a unique report image, which makes the report image easy to trace. When the report image corresponding to the storage serial number is not unique, the report image is discarded, that is, the non-unique report image is not selected; after the report image is discarded, the operation of obtaining a storage path of a report image is executed again, and a new report image is obtained accordingly.
In this embodiment, for example, if the storage serial number of the report image obtained from the storage path is A, and report image A1 and report image A2 are both found by querying storage serial number A, then report image A1, report image A2 and storage serial number A are all discarded, and a storage path of a report image is obtained again.
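The storage-path handling above can be sketched as follows. This is only a hedged illustration: the report_index.db database, its report_images table, and the assumed path layout storage_address/serial_number.ext are hypothetical stand-ins for whatever store the system actually uses.

```python
import os
import sqlite3

def fetch_report_image(storage_path, index_db="report_index.db"):
    """Resolve a storage path into a storage address and serial number, and keep
    the image only if the serial number maps to exactly one stored record."""
    storage_address, file_name = os.path.split(storage_path)
    serial_number = os.path.splitext(file_name)[0]

    conn = sqlite3.connect(index_db)
    try:
        rows = conn.execute(
            "SELECT file_name FROM report_images WHERE serial_number = ?",
            (serial_number,),
        ).fetchall()
    finally:
        conn.close()

    if len(rows) != 1:
        # Non-unique (or missing) serial number: discard it and let the caller
        # re-acquire a storage path, as described in the embodiment.
        return None

    # Unique serial number: acquire the report image from the storage address.
    with open(os.path.join(storage_address, file_name), "rb") as f:
        return f.read()
```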
In detail, the extracting the image features of the report image includes:
constructing a first convolution layer through convolution operation, normalization operation and activation operation;
constructing a second convolution layer by utilizing a preset combination function and a preset addition function, and constructing a convolutional neural network through the first convolution layer and the second convolution layer;
and extracting the image characteristics of the report image by using the convolutional neural network.
The convolution operation is a 2D convolution operation and is used to obtain convolution maps from the report image by performing convolution with 2D convolution kernels having different functions; the normalization operation is used to reduce the pixel values of the pixel points in the convolution maps by using a normalization function; the activation operation reduces the area size of the convolution maps by using an activation function; the merge function is used to connect two or more first convolution layers, and the add function is used to add the first convolution layers along the flow formed by the merge function.
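One possible reading of this construction, sketched in PyTorch and not taken from the original disclosure: the channel counts, the 3×3 kernels, and the use of BatchNorm and ReLU as the normalization and activation operations are assumptions, and the merge and add functions are realized with torch.cat and tensor addition.

```python
import torch
import torch.nn as nn

class FirstConvLayer(nn.Sequential):
    """First convolution layer: convolution + normalization + activation."""
    def __init__(self, in_ch, out_ch):
        super().__init__(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),  # 2D convolution
            nn.BatchNorm2d(out_ch),                              # normalization
            nn.ReLU(inplace=True),                               # activation
        )

class SecondConvLayer(nn.Module):
    """Second convolution layer: two first layers joined by merge and add."""
    def __init__(self, channels):
        super().__init__()
        self.branch_a = FirstConvLayer(channels, channels)
        self.branch_b = FirstConvLayer(channels, channels)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x):
        merged = torch.cat([self.branch_a(x), self.branch_b(x)], dim=1)  # merge
        return self.fuse(merged) + x                                     # add

class ReportFeatureCNN(nn.Module):
    """Convolutional neural network built from the first and second layers."""
    def __init__(self):
        super().__init__()
        self.stem = FirstConvLayer(1, 32)   # grayscale report image in
        self.body = SecondConvLayer(32)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, image):               # image: (N, 1, H, W)
        features = self.body(self.stem(image))
        return self.pool(features).flatten(1)   # image feature vector
```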
In an embodiment of the present invention, before extracting the image feature of the report image, the method further includes:
acquiring the report image, and performing a binarization operation on the report image to obtain a binarized report image;
denoising the binarized report image to obtain a denoised report image;
and detecting straight line groups in the denoised report image by using a preset straight line detection method, and performing straight line compensation on the straight line groups of the denoised report image.
In the embodiment of the invention, binarizing the report image makes it convenient to extract the information in the image, and denoising the binarized report image can reduce data loss and interference introduced during image transmission or other processes.
Further, the straight line detection methods include the Hough line detection method, the FLD line detection method and the like. The table lines of the report image can be detected and compensated by the straight line detection method, which reduces blurred or broken table lines.
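A hedged OpenCV sketch of this preprocessing chain is given below; the Otsu threshold, the median filter, and the Hough transform parameters are illustrative choices and not values fixed by the embodiment.

```python
import cv2
import numpy as np

def preprocess_report(image_path):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)

    # Binarization (Otsu threshold used here as an example).
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    # Denoising (a median filter removes isolated speckle noise).
    denoised = cv2.medianBlur(binary, 3)

    # Straight line detection with the probabilistic Hough transform.
    lines = cv2.HoughLinesP(denoised, 1, np.pi / 180, 80,
                            minLineLength=60, maxLineGap=10)

    # Straight line compensation: redraw every detected table line so that
    # blurred or broken segments are filled in.
    compensated = denoised.copy()
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            cv2.line(compensated, (x1, y1), (x2, y2), 255, 1)
    return compensated
```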
And S2, determining the text direction of the report image according to the image characteristics.
In the embodiment of the invention, the text direction is the direction of the characters, represented as the direction in which the text points vertically upward.
In detail, the determining the text direction of the report image according to the image features includes:
taking a plurality of different directions as preset text directions of the report image, and recognizing the characters of the report image in the different directions together with their confidences;
identifying the text category in the report image according to the image features;
reducing, according to a preset proportion, the confidence of those characters in the different directions that do not belong to the text category of the report image;
and counting the cumulative confidence in each text direction, and determining the text direction corresponding to the maximum cumulative confidence as the text direction of the report image.
In this embodiment, the plurality of directions include, but are not limited to, a vertically upward direction, a vertically downward direction, a horizontally leftward direction, and a horizontally rightward direction.
In the embodiment of the invention, the text type of the report image can be Chinese text, English text, Arabic numerals and the like, and the text type is different according to the content of the report image.
In the embodiment of the invention, the confidence of each character in different directions can be recognized during recognition. The confidence degree can be obtained by calculating the similarity or distance between the characters in different directions and the standard character direction. The higher the confidence of a character in a certain direction, the greater the likelihood that the character is in that direction.
Further, the preset ratio may be 0.5 or 0.3.
Specifically, the cumulative confidence is the sum of the confidences of all characters in one direction.
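The direction-voting logic can be sketched as follows; recognize() stands in for any OCR routine that returns (character, confidence) pairs for the report image in a given candidate direction, and both it and the simple char_category() helper are assumptions made for the example.

```python
CANDIDATE_DIRECTIONS = ["up", "down", "left", "right"]

def char_category(char):
    """Rough text-category guess for a single character (illustrative only)."""
    if char.isdigit():
        return "arabic_numeral"
    if "\u4e00" <= char <= "\u9fff":
        return "chinese"
    return "english" if char.isalpha() else "other"

def choose_text_direction(image, report_text_category, recognize, penalty=0.5):
    cumulative = {}
    for direction in CANDIDATE_DIRECTIONS:
        total = 0.0
        for char, confidence in recognize(image, direction):
            if char_category(char) != report_text_category:
                confidence *= penalty   # preset proportion, e.g. 0.5 or 0.3
            total += confidence
        cumulative[direction] = total
    # The direction with the largest cumulative confidence is the text direction.
    return max(cumulative, key=cumulative.get)
```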
And S3, judging whether the text direction is a preset direction.
In the embodiment of the present invention, the preset direction may be a reading direction of a user, for example, a vertical upward direction of the report image is taken as the preset direction.
Further, a unit vector in the preset direction and a unit vector in the text direction are obtained, the cosine similarity between the two unit vectors is calculated, and whether the text direction is the preset direction is judged according to the cosine similarity.
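A small sketch of that check, assuming the vertically upward unit vector (0, 1) as the preset direction and an illustrative similarity tolerance:

```python
import numpy as np

def is_preset_direction(text_dir, preset_dir=(0.0, 1.0), tol=0.99):
    """Compare the text-direction and preset-direction unit vectors by cosine similarity."""
    a = np.asarray(text_dir, dtype=float)
    b = np.asarray(preset_dir, dtype=float)
    cos_sim = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return cos_sim >= tol
```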
And S4, if yes, determining that the report image is the target report image.
In the embodiment of the invention, the target report image is an image from which features can be directly extracted by the feature extraction network.
And S5, if not, performing angle conversion on the report image to obtain the target report image.
In detail, the performing angle conversion on the report image to obtain the target report image includes:
creating an original transformation matrix according to the report image;
constructing an original transformation equation containing unknown parameters according to the original transformation matrix;
calculating by using the edge coordinate points of the report image and the original transformation equation to obtain a standard transformation matrix;
and performing angle conversion on the report image according to the standard transformation matrix to obtain the target report image.
In the embodiment of the present invention, the original transformation matrix is created according to the size of the report image, and the original transformation equation is constructed according to the numbers in the original transformation matrix.
In the scheme, the direction of the text in the obtained target report image is the preset direction.
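The matrix example referenced in the original description is not reproduced here. The following sketch shows one common way to realize the described angle conversion, assuming the four edge coordinate points of the report region are known and using OpenCV's perspective transform to solve the transformation equations for the standard transformation matrix.

```python
import cv2
import numpy as np

def correct_report_angle(report_image, edge_points):
    """edge_points: four edge coordinate points of the report region, ordered
    top-left, top-right, bottom-right, bottom-left (an assumed convention)."""
    src = np.float32(edge_points)
    h, w = report_image.shape[:2]
    dst = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])

    # Solving the transformation equations with the four point pairs yields
    # the standard transformation matrix.
    matrix = cv2.getPerspectiveTransform(src, dst)

    # Apply the matrix to obtain the angle-corrected target report image.
    return cv2.warpPerspective(report_image, matrix, (w, h))
```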
And S6, acquiring a pre-trained feature extraction network, and extracting text information of the target report image by using the feature extraction network to obtain a target text.
In the embodiment of the invention, the feature extraction network is a convolutional neural network, and the convolutional neural network is constructed for simulating a visual perception mechanism of a living being and can be used for supervised learning and unsupervised learning.
In detail, the obtaining of the pre-trained feature extraction network, extracting the text information of the target report image by using the feature extraction network, and obtaining the target text, includes:
selecting a feature dimension according to the target report image, and performing feature extraction on the target report image according to the feature dimension to obtain report features;
and performing dimension reduction on the report features to obtain reduced-dimension report features, and classifying the reduced-dimension report features by using a classifier in the feature extraction network to obtain the target text.
In this embodiment, the feature dimension may be determined according to the pixels of the target report image. For example, if the target report image has 128 × 128 = 16384 pixel points, and 256 feature points are obtained according to the gray values of the pixel points, the feature dimension of the target report image is 256.
In the embodiment of the invention, the feature dimension reflects the size of the target report image: the larger the feature dimension of the target report image, the larger the size of the target report image and the more pixel points it has.
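A minimal sketch of the dimension-reduction and classification head described above, meant to sit on top of the feature extraction network sketched earlier; the 256-dimensional input follows the example above, while the reduced dimension and the number of text classes are assumptions.

```python
import torch.nn as nn

class ReportTextHead(nn.Module):
    """Dimension reduction followed by a classifier over text classes."""
    def __init__(self, feature_dim=256, reduced_dim=64, num_classes=5000):
        super().__init__()
        self.reduce = nn.Linear(feature_dim, reduced_dim)     # dimension reduction
        self.classify = nn.Linear(reduced_dim, num_classes)   # classifier

    def forward(self, report_features):           # (N, feature_dim)
        reduced = self.reduce(report_features)    # reduced-dimension report features
        return self.classify(reduced)             # logits over target text classes
```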
Further, in other optional embodiments of the present invention, after obtaining the target text, the method further includes:
and carrying out structured arrangement on the target text to obtain a standard table, and sending the standard table to a preset report entry system, so that the report entry system reads the standard table and enters the target text.
In the embodiment of the present invention, the structured arrangement organizes the text hierarchically according to the text relationships of the target text, so that the text is arranged with a clear structure and outline. The text relationships include the temporal order of text entry, semantic association, and the like.
In this embodiment, the report entry system can analyze and enter the report.
The standard table obtained in the embodiment of the invention has clear text and no damaged data and can be read directly, thereby improving the utilization rate of the recognized report content.
In the embodiment of the invention, the report image is acquired according to the storage path of the report image, the text direction of the report image is extracted, whether the report image is a target report image is judged according to the extraction result, and when the report image is not the target report image, direction conversion is performed on the report image to obtain the target report image, so that the report image can be corrected when the target report has a direction deviation. Further, a pre-trained feature extraction network is used to extract text information from the corrected target report image to obtain a target text, thereby achieving the purpose of improving the accuracy of report recognition.
Fig. 2 is a schematic block diagram of a report recognition apparatus based on feature extraction according to the present invention.
The report recognition apparatus 100 based on feature extraction according to the present invention can be installed in an electronic device. According to the realized function, the report recognition device based on feature extraction may include a report image obtaining module 101, a text direction recognition module 102, a text direction determination module 103, a target report obtaining module 104, and a target text extraction module 105. The module of the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the report image obtaining module 101 is configured to obtain a storage path of a report image, obtain the report image according to the storage path, and extract image features of the report image. In the embodiment of the present invention, the report image may be stored in a Network Attached Storage (NAS), and the report image may be a financial report image.
Specifically, the network attached storage is a device that is connected to the network and has a data storage function. By adopting network attached storage, the storage device can be separated from the server, so that data can be centrally managed, bandwidth can be freed, data transmission performance can be improved, and the like.
In the embodiment of the invention, the report to be entered at each client (i.e. user end) can be scanned in advance to obtain the report image, and the report image is stored, wherein the report to be entered at the client is the report input by the user or the report acquired from the preset report repository to be entered.
In another embodiment of the present invention, when the reports to be entered at each client are scanned to obtain report images, a camera can be controlled and a scanning program started to scan the multiple reports to be entered based on the three primary colors respectively, so as to obtain multiple red reports, green reports and blue reports, and then the red report, the green report and the blue report corresponding to the same report to be entered are combined to obtain the report image.
In this scheme, the report to be entered is scanned by a three-color separation technique to obtain the red report, the green report and the blue report, and the three are combined to obtain the report image; the image quality of the report image acquired in this embodiment is therefore higher.
In this embodiment, the image features mainly include color features, texture features, shape features, spatial relationship features, and the like of an image.
Preferably, the image features are shape features and spatial relationship features of the report image.
In detail, the report image obtaining module 101 obtains the report image according to the storage path by performing the following operations:
acquiring a storage address and a storage serial number of the report image from the storage path;
inquiring whether the report image corresponding to the storage serial number is unique in a database corresponding to the storage address;
if the report image corresponding to the storage serial number is not unique, discarding the report image corresponding to the storage serial number, and re-acquiring a storage path of a report image;
and if the report image corresponding to the storage serial number is unique, acquiring the report image according to the storage address.
Specifically, the storage serial number is a serial number for determining the storage of the report image, and the report image has a unique storage serial number.
Specifically, the storage serial number is used to query a unique report image, which makes the report image easy to trace. When the report image corresponding to the storage serial number is not unique, the report image is discarded, that is, the non-unique report image is not selected; after the report image is discarded, the operation of obtaining a storage path of a report image is executed again, and a new report image is obtained accordingly.
In this embodiment, for example, if the storage serial number of the report image obtained from the storage path is A, and report image A1 and report image A2 are both found by querying storage serial number A, then report image A1, report image A2 and storage serial number A are all discarded, and a storage path of a report image is obtained again.
In detail, the report image obtaining module 101 extracts the image features of the report image by performing the following operations:
constructing a first convolution layer through convolution operation, normalization operation and activation operation;
constructing a second convolution layer by utilizing a preset combination function and a preset addition function, and constructing a convolutional neural network through the first convolution layer and the second convolution layer;
and extracting the image characteristics of the report image by using the convolutional neural network.
The convolution operation is a 2D convolution operation and is used to obtain convolution maps from the report image by performing convolution with 2D convolution kernels having different functions; the normalization operation is used to reduce the pixel values of the pixel points in the convolution maps by using a normalization function; the activation operation reduces the area size of the convolution maps by using an activation function; the merge function is used to connect two or more first convolution layers, and the add function is used to add the first convolution layers along the flow formed by the merge function.
In an embodiment of the present invention, the apparatus further includes an image processing module, where the image processing module is configured to:
before extracting the image features of the report image, acquiring the report image and performing a binarization operation on the report image to obtain a binarized report image;
denoising the binarized report image to obtain a denoised report image;
and detecting straight line groups in the denoised report image by using a preset straight line detection method, and performing straight line compensation on the straight line groups of the denoised report image.
In the embodiment of the invention, binarizing the report image makes it convenient to extract the information in the image, and denoising the binarized report image can reduce data loss and interference introduced during image transmission or other processes.
Further, the straight line detection methods include the Hough line detection method, the FLD line detection method and the like. The table lines of the report image can be detected and compensated by the straight line detection method, which reduces blurred or broken table lines.
The text direction identification module 102 is configured to determine a text direction of the report image according to the image feature.
In the embodiment of the invention, the text direction is the direction of the characters, represented as the direction in which the text points vertically upward.
In detail, the text direction recognition module 102 is specifically configured to:
taking a plurality of different directions as preset text directions of the report image, and recognizing the characters of the report image in the different directions together with their confidences;
identifying the text category in the report image according to the image features;
reducing, according to a preset proportion, the confidence of those characters in the different directions that do not belong to the text category of the report image;
and counting the cumulative confidence in each text direction, and determining the text direction corresponding to the maximum cumulative confidence as the text direction of the report image.
In this embodiment, the plurality of directions include, but are not limited to, a vertically upward direction, a vertically downward direction, a horizontally leftward direction, and a horizontally rightward direction.
In the embodiment of the invention, the text type of the report image can be Chinese text, English text, Arabic numerals and the like, and the text type is different according to the content of the report image.
In the embodiment of the invention, the confidence of each character in different directions can be recognized during recognition. The confidence degree can be obtained by calculating the similarity or distance between the characters in different directions and the standard character direction. The higher the confidence of a character in a certain direction, the greater the likelihood that the character is in that direction.
Further, the preset ratio may be 0.5 or 0.3.
Specifically, the cumulative confidence is the sum of the confidences of all characters in one direction.
The text direction determining module 103 is configured to determine whether the text direction is a preset direction.
In the embodiment of the present invention, the preset direction may be a reading direction of a user, for example, a vertical upward direction of the report image is taken as the preset direction.
Further, a unit vector in the preset direction and a unit vector in the text direction are obtained, the cosine similarity between the two unit vectors is calculated, and whether the text direction is the preset direction is judged according to the cosine similarity.
The target report obtaining module 104 is configured to determine that the report image is the target report image if the text direction is the preset direction, and perform angle conversion on the report image to obtain the target report image if the text direction is not the preset direction.
In the embodiment of the invention, the target report image is an image from which features can be directly extracted by the feature extraction network.
In detail, the target report acquisition module 104 performs angle conversion on the report image by executing the following operations to obtain the target report image:
creating an original transformation matrix according to the report image;
constructing an original transformation equation containing unknown parameters according to the original transformation matrix;
calculating by using the edge coordinate points of the report image and the original transformation equation to obtain a standard transformation matrix;
and performing angle conversion on the report image according to the standard transformation matrix to obtain the target report image.
In the embodiment of the present invention, the original transformation matrix is created according to the size of the report image, and the original transformation equation is constructed according to the numbers in the original transformation matrix.
In the scheme, the direction of the text in the obtained target report image is the preset direction.
The target text extraction module 105 is configured to obtain a pre-trained feature extraction network, and extract text information of the target report image by using the feature extraction network to obtain a target text.
In detail, the target text extraction module 105 is specifically configured to:
selecting a feature dimension according to the target report image, and performing feature extraction on the target report image according to the feature dimension to obtain report features;
and performing dimension reduction on the report features to obtain reduced-dimension report features, and classifying the reduced-dimension report features by using a classifier in the feature extraction network to obtain the target text.
In this embodiment, the feature dimension may be determined according to the pixels of the target report image. For example, if the target report image has 128 × 128 = 16384 pixel points, and 256 feature points are obtained according to the gray values of the pixel points, the feature dimension of the target report image is 256.
In the embodiment of the invention, the feature dimension reflects the size of the target report image: the larger the feature dimension of the target report image, the larger the size of the target report image and the more pixel points it has.
Further, in other optional embodiments of the present invention, the apparatus further includes a text adjustment module, where the text adjustment module is configured to:
after the target text is obtained, carrying out structured arrangement on the target text to obtain a standard table, and sending the standard table to a preset report entry system, so that the report entry system reads the standard table and enters the target text.
In the embodiment of the present invention, the structured arrangement organizes the text hierarchically according to the text relationships of the target text, so that the text is arranged with a clear structure and outline. The text relationships include the temporal order of text entry, semantic association, and the like.
In this embodiment, the report entry system can analyze and enter the report.
The standard table obtained in the embodiment of the invention has clear text and no damaged data and can be read directly, thereby improving the utilization rate of the recognized report content.
In the embodiment of the invention, the report image is acquired according to the storage path of the report image, the text direction of the report image is extracted, whether the report image is a target report image is judged according to the extraction result, and when the report image is not the target report image, direction conversion is performed on the report image to obtain the target report image, so that the report image can be corrected when the target report has a direction deviation. Further, a pre-trained feature extraction network is used to extract text information from the corrected target report image to obtain a target text, thereby achieving the purpose of improving the accuracy of report recognition.
Fig. 3 is a schematic structural diagram of an electronic device for implementing a report recognition method based on feature extraction according to the present invention.
The electronic device may include a processor 10, a memory 11, a communication bus 12, and a communication interface 13, and may further include a computer program stored in the memory 11 and executable on the processor 10, such as a report recognition program based on feature extraction.
In some embodiments, the processor 10 may be composed of an integrated circuit, for example a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is the control unit of the electronic device; it connects the various components of the whole electronic device by using various interfaces and lines, and executes the various functions of the electronic device and processes data by running or executing programs or modules stored in the memory 11 (for example, the report recognition program based on feature extraction) and calling data stored in the memory 11.
The memory 11 includes at least one type of readable storage medium including flash memory, removable hard disks, multimedia cards, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disks, optical disks, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device, for example a removable hard disk of the electronic device. The memory 11 may also be an external storage device of the electronic device in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device. The memory 11 may be used not only to store application software installed in the electronic device and various types of data, such as codes of a report recognition program based on feature extraction, but also to temporarily store data that has been output or is to be output.
The communication bus 12 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
The communication interface 13 is used for communication between the electronic device and other devices, and includes a network interface and a user interface. Optionally, the network interface may include a wired interface and/or a wireless interface (e.g., WI-FI interface, bluetooth interface, etc.), which are typically used to establish a communication connection between the electronic device and other electronic devices. The user interface may be a Display (Display), an input unit such as a Keyboard (Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable, among other things, for displaying information processed in the electronic device and for displaying a visualized user interface.
Fig. 3 shows only an electronic device having components, and those skilled in the art will appreciate that the structure shown in fig. 3 does not constitute a limitation of the electronic device, and may include fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
For example, although not shown, the electronic device may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management and the like are realized through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The report recognition program based on feature extraction stored in the memory 11 of the electronic device is a combination of a plurality of computer programs, and when running in the processor 10, can realize:
acquiring a storage path of a report image, acquiring the report image according to the storage path, and extracting the image characteristics of the report image;
determining the text direction of the report image according to the image characteristics;
judging whether the text direction is a preset direction or not;
if so, confirming that the report image is a target report image;
if not, performing angle conversion on the report image to obtain the target report image;
and acquiring a pre-trained feature extraction network, and extracting text information from the target report image by using the feature extraction network to obtain a target text.
Further, if the integrated modules/units of the electronic device are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a computer-readable storage medium, which may be volatile or non-volatile. For example, the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a read-only memory (ROM).
The present invention also provides a computer-readable storage medium, storing a computer program which, when executed by a processor of an electronic device, may implement:
acquiring a storage path of a report image, acquiring the report image according to the storage path, and extracting the image characteristics of the report image;
determining the text direction of the report image according to the image characteristics;
judging whether the text direction is a preset direction or not;
if so, confirming that the report image is a target report image;
if not, performing angle conversion on the report image to obtain the target report image;
and acquiring a pre-trained feature extraction network, and extracting text information from the target report image by using the feature extraction network to obtain a target text.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database, a series of data blocks associated with one another by cryptographic methods; each data block contains information about a batch of network transactions, which is used to verify the validity (anti-counterfeiting) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or devices recited in the system claims may also be implemented by one unit or device through software or hardware. Terms such as first, second, etc. are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.
Claims (10)
1. A report recognition method based on feature extraction is characterized by comprising the following steps:
acquiring a storage path of a report image, acquiring the report image according to the storage path, and extracting the image characteristics of the report image;
determining the text direction of the report image according to the image characteristics;
judging whether the text direction is a preset direction or not;
if so, confirming that the report image is a target report image;
if not, performing angle conversion on the report image to obtain the target report image;
and acquiring a pre-trained feature extraction network, and extracting text information from the target report image by using the feature extraction network to obtain a target text.
2. The method for recognizing a report according to claim 1, wherein the obtaining the report image according to the storage path comprises:
acquiring a storage address and a storage serial number of the report image from the storage path;
inquiring whether the report image corresponding to the storage serial number is unique in a database corresponding to the storage address;
if the report image corresponding to the storage serial number is not unique, discarding the report image corresponding to the storage serial number, and re-acquiring a storage path of a report image;
and if the report image corresponding to the storage serial number is unique, acquiring the report image according to the storage address.
3. The report recognition method based on feature extraction according to claim 1, wherein before extracting the image features of the report image, the method further comprises:
acquiring the report image, and performing a binarization operation on the report image to obtain a binarized report image;
denoising the binarized report image to obtain a denoised report image;
and detecting a straight-line group of the denoised report image by a preset straight line detection method, and performing straight line compensation on the straight-line group of the denoised report image.
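A minimal OpenCV-based sketch of the preprocessing in claim 3 might look as follows; the specific choices (Otsu thresholding, median filtering, a probabilistic Hough transform) are assumptions standing in for the unspecified preset methods.

```python
import cv2
import numpy as np


def preprocess_report(report_image: np.ndarray) -> np.ndarray:
    """Binarize, denoise, then detect and redraw (compensate) straight lines such as table rules."""
    gray = cv2.cvtColor(report_image, cv2.COLOR_BGR2GRAY)
    # Binarization: Otsu's method picks the threshold automatically.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Denoising: a median filter removes salt-and-pepper noise without blurring edges much.
    denoised = cv2.medianBlur(binary, 3)
    # Straight-line detection: the probabilistic Hough transform returns line segments.
    edges = cv2.Canny(denoised, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                            minLineLength=60, maxLineGap=10)
    # Line compensation: redraw each detected segment so broken rules become continuous.
    if lines is not None:
        for x1, y1, x2, y2 in lines.reshape(-1, 4):
            cv2.line(denoised, (x1, y1), (x2, y2), 0, 1)
    return denoised
```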
4. The report recognition method based on feature extraction according to claim 1, wherein the determining the text direction of the report image according to the image features comprises:
taking a plurality of different directions as preset text directions of the report image, and identifying characters and confidence degrees of the report image in the different directions;
identifying the text type in the report image according to the image characteristics;
reducing, according to a preset proportion, the confidence degrees of characters that do not belong to the text category of the report image among the characters recognized in the different directions;
and counting the accumulated confidence degrees in each text direction, and determining the text direction corresponding to the maximum accumulated confidence degree as the text direction of the report image.
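For illustration only, the direction-voting scheme of claim 4 could be expressed as below; the injected `recognize` callable (returning character/confidence pairs), the candidate directions, and the penalty ratio are assumptions.

```python
from typing import Callable, Iterable, List, Tuple
import numpy as np

# A recognizer returns (character, confidence) pairs for an image; it is an assumed dependency here.
Recognizer = Callable[[np.ndarray], List[Tuple[str, float]]]


def estimate_text_direction(report_image: np.ndarray,
                            recognize: Recognizer,
                            allowed_chars: Iterable[str],
                            directions=(0, 90, 180, 270),
                            penalty: float = 0.5) -> int:
    """Pick the rotation whose recognized characters accumulate the highest confidence."""
    allowed = set(allowed_chars)
    best_direction, best_score = directions[0], float("-inf")
    for direction in directions:
        rotated = np.rot90(report_image, direction // 90)
        score = 0.0
        for char, conf in recognize(rotated):
            # Characters outside the report's text category are down-weighted by a preset ratio.
            score += conf if char in allowed else conf * penalty
        if score > best_score:
            best_direction, best_score = direction, score
    return best_direction
```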
5. The report recognition method based on feature extraction according to any one of claims 1 to 4, wherein the obtaining a pre-trained feature extraction network and extracting text information of the target report image by using the feature extraction network to obtain a target text comprises:
selecting a feature dimension according to the target report image, and performing feature extraction on the target report image according to the feature dimension to obtain report features;
and reducing the dimensionality of the report features to obtain dimension-reduced report features, and classifying the dimension-reduced report features by using a classifier in the feature extraction network to obtain the target text.
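Purely as an assumed illustration of the two-stage step in claim 5 (dimensionality reduction followed by classification), the scikit-learn sketch below runs on synthetic feature vectors; the library choice, PCA, logistic regression, and the toy data are not part of the claims.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy report-feature matrix: 100 samples, 512-dimensional features, 3 text classes (synthetic data).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 512))
y = rng.integers(0, 3, size=100)

# Dimensionality reduction followed by a classifier, mirroring the claimed two-stage step.
model = make_pipeline(PCA(n_components=32), LogisticRegression(max_iter=1000))
model.fit(X, y)
print(model.predict(X[:5]))  # predicted text classes for the first five feature vectors
```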
6. The report recognition method based on feature extraction according to any one of claims 1 to 4, wherein the angle conversion of the report image to obtain the target report image comprises:
creating an original transformation matrix according to the report image;
constructing an original transformation equation containing unknown parameters according to the original transformation matrix;
solving the original transformation equation by using the edge coordinate points of the report image, so as to obtain a standard transformation matrix;
and performing angle conversion on the report image according to the standard transformation matrix to obtain the target report image.
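One conventional way to realize such an angle correction, offered here only as an assumed illustration, is a perspective warp: `cv2.getPerspectiveTransform` solves the unknown parameters of a 3x3 matrix from four edge coordinate points, and `cv2.warpPerspective` applies it to the report image.

```python
import cv2
import numpy as np


def correct_report_angle(report_image: np.ndarray,
                         edge_points: np.ndarray,
                         out_size=(800, 1100)) -> np.ndarray:
    """Solve for a transformation matrix from four edge coordinate points and warp the image.

    edge_points is assumed to hold the four corners in top-left, top-right,
    bottom-right, bottom-left order.
    """
    w, h = out_size
    # Destination corners of the upright target report image.
    dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    # Four point correspondences determine the eight unknown parameters of the matrix.
    matrix = cv2.getPerspectiveTransform(np.float32(edge_points), dst)
    return cv2.warpPerspective(report_image, matrix, (w, h))
```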
7. The report recognition method based on feature extraction according to any one of claims 1 to 4, wherein the extracting the image features of the report image comprises:
constructing a first convolution layer through convolution operation, normalization operation and activation operation;
constructing a second convolution layer by utilizing a preset combination function and a preset addition function, and constructing a convolutional neural network through the first convolution layer and the second convolution layer;
and extracting the image characteristics of the report image by using the convolutional neural network.
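Read as a network definition, claim 7 resembles a convolution-normalization-activation layer followed by a residual-style block; the PyTorch sketch below is one assumed interpretation, with layer sizes chosen arbitrarily.

```python
import torch
import torch.nn as nn


class ReportFeatureCNN(nn.Module):
    """Sketch: conv + normalization + activation, then a 'combine and add' second layer."""

    def __init__(self, channels: int = 32):
        super().__init__()
        # First convolution layer: convolution, normalization and activation operations.
        self.first = nn.Sequential(
            nn.Conv2d(3, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Second convolution layer: a combination of operations whose output is added
        # back to its input (one plausible reading of the combination and addition functions).
        self.second = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.first(x)
        return torch.relu(x + self.second(x))  # addition merges the two branches


features = ReportFeatureCNN()(torch.randn(1, 3, 224, 224))  # image features of a dummy report image
```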
8. A report recognition device based on feature extraction is characterized in that the device comprises:
the report image acquisition module is used for acquiring a storage path of a report image, acquiring the report image according to the storage path and extracting the image characteristics of the report image;
the text direction identification module is used for determining the text direction of the report image according to the image characteristics;
the text direction judging module is used for judging whether the text direction is a preset direction or not;
a target report acquisition module, configured to determine that the report image is a target report image if the text direction is the preset direction, and to perform angle conversion on the report image to obtain the target report image if the text direction is not the preset direction;
and the target text extraction module is used for acquiring a pre-trained feature extraction network, and extracting text information of the target report image by using the feature extraction network to obtain a target text.
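The module layout of claim 8 could be sketched, purely hypothetically, as a class whose fields are the four modules; the callable types below are placeholders rather than the claimed device.

```python
from dataclasses import dataclass
from typing import Callable
import numpy as np


@dataclass
class ReportRecognitionDevice:
    """Sketch of the module layout; each field stands in for one claimed module."""
    acquire: Callable[[str], np.ndarray]              # report image acquisition module
    estimate_direction: Callable[[np.ndarray], int]   # text direction identification module
    rotate: Callable[[np.ndarray, int], np.ndarray]   # angle conversion in the target report module
    extract_text: Callable[[np.ndarray], str]         # target text extraction module

    def run(self, storage_path: str, preset_direction: int = 0) -> str:
        image = self.acquire(storage_path)
        direction = self.estimate_direction(image)    # judged against the preset direction
        target = image if direction == preset_direction else self.rotate(image, preset_direction)
        return self.extract_text(target)
```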
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores computer program instructions executable by the at least one processor to enable the at least one processor to perform the report recognition method based on feature extraction according to any one of claims 1 to 7.
10. A computer-readable storage medium comprising a data storage area that stores created data and a program storage area that stores a computer program; wherein the computer program, when executed by a processor, implements the report recognition method based on feature extraction according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110728172.0A CN113420684A (en) | 2021-06-29 | 2021-06-29 | Report recognition method and device based on feature extraction, electronic equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113420684A true CN113420684A (en) | 2021-09-21 |
Family
ID=77717124
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110728172.0A Pending CN113420684A (en) | 2021-06-29 | 2021-06-29 | Report recognition method and device based on feature extraction, electronic equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113420684A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114154462A (en) * | 2021-10-29 | 2022-03-08 | 北京搜狗科技发展有限公司 | Structure picture restoration method, structure picture restoration device, electronic equipment, medium and program product |
CN114154464A (en) * | 2021-10-29 | 2022-03-08 | 北京搜狗科技发展有限公司 | Structure picture restoration method, structure picture restoration device, electronic equipment, medium and program product |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109993160A (en) * | 2019-02-18 | 2019-07-09 | 北京联合大学 | A kind of image flame detection and text and location recognition method and system |
CN112036259A (en) * | 2020-08-10 | 2020-12-04 | 晶璞(上海)人工智能科技有限公司 | Form correction and recognition method based on combination of image processing and deep learning |
CN112102203A (en) * | 2020-09-27 | 2020-12-18 | 中国建设银行股份有限公司 | Image correction method, device and equipment |
CN112464798A (en) * | 2020-11-24 | 2021-03-09 | 创新奇智(合肥)科技有限公司 | Text recognition method and device, electronic equipment and storage medium |
CN112561889A (en) * | 2020-12-18 | 2021-03-26 | 深圳赛安特技术服务有限公司 | Target detection method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112699775B (en) | Certificate identification method, device, equipment and storage medium based on deep learning | |
CN112528863A (en) | Identification method and device of table structure, electronic equipment and storage medium | |
CN112861648A (en) | Character recognition method and device, electronic equipment and storage medium | |
CN112396005A (en) | Biological characteristic image recognition method and device, electronic equipment and readable storage medium | |
CN112507934A (en) | Living body detection method, living body detection device, electronic apparatus, and storage medium | |
CN112528616B (en) | Service form generation method and device, electronic equipment and computer storage medium | |
CN113487621A (en) | Medical image grading method and device, electronic equipment and readable storage medium | |
CN113705462A (en) | Face recognition method and device, electronic equipment and computer readable storage medium | |
CN114049568B (en) | Target object deformation detection method, device, equipment and medium based on image comparison | |
CN114708461A (en) | Multi-modal learning model-based classification method, device, equipment and storage medium | |
CN114881698A (en) | Advertisement compliance auditing method and device, electronic equipment and storage medium | |
CN112860905A (en) | Text information extraction method, device and equipment and readable storage medium | |
CN113420684A (en) | Report recognition method and device based on feature extraction, electronic equipment and medium | |
CN114267064A (en) | Face recognition method and device, electronic equipment and storage medium | |
CN113887438A (en) | Watermark detection method, device, equipment and medium for face image | |
CN111177450B (en) | Image retrieval cloud identification method and system and computer readable storage medium | |
CN112668575A (en) | Key information extraction method and device, electronic equipment and storage medium | |
CN113704474A (en) | Bank outlet equipment operation guide generation method, device, equipment and storage medium | |
CN112329666A (en) | Face recognition method and device, electronic equipment and storage medium | |
CN116664066B (en) | Method and system for managing enterprise planning income and actual income | |
CN112528903A (en) | Face image acquisition method and device, electronic equipment and medium | |
CN113869455B (en) | Unsupervised clustering method and device, electronic equipment and medium | |
CN110414497A (en) | Method, device, server and storage medium for electronizing object | |
CN113221888B (en) | License plate number management system test method and device, electronic equipment and storage medium | |
CN114187476A (en) | Vehicle insurance information checking method, device, equipment and medium based on image analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40053540; Country of ref document: HK |