WO2017162069A1

WO2017162069A1 - Image text identification method and apparatus

Info

Publication number: WO2017162069A1
Application number: PCT/CN2017/076548
Authority: WO
Inventors: 毛旭东; 施兴; 褚崴; 程孟力; 周文猛
Original assignee: 阿里巴巴集团控股有限公司
Priority date: 2016-03-25
Filing date: 2017-03-14
Publication date: 2017-09-28
Also published as: TWI774659B; CN107229932A; TW201740316A; CN107229932B

Abstract

An image text identification method and apparatus. The method comprises: acquiring an image to be identified, wherein the image comprises a plurality of pixel points (101); according to the plurality of pixel points, determining a first text region of the image (102); according to a pre-set rule, extracting a second text region from the first text region (103); and identifying the second text region. The present invention can effectively remove noises in an image to be identified, thereby greatly improving the accuracy rate of identification.

Description

Method and device for identifying image text

The present application claims priority to a Chinese patent application filed on March 25, 2016.

Technical field

The present application relates to the field of character recognition technology, and in particular, to a method for recognizing image text and an apparatus for recognizing image text.

Background technique

The purpose of pattern recognition technology research is to construct a machine system that can replace the task of human classification and identification according to the recognition mechanism of human brain, and then carry out automatic information processing. Among them, Chinese character recognition is an important field of pattern recognition application. The most typical one is ID card identification, which automatically recognizes the name, ID number, address, gender and other information.

Traditional Chinese character recognition methods mainly preprocess the image (for example, grayscale conversion and noise reduction), extract traditional image features, and then train a Chinese character recognition model with a classifier such as a support vector machine (SVM) or a neural network. Traditional Chinese character recognition usually extracts image features based on manual experience and has limited noise immunity. Therefore, when noise interference is large, the recognition rate of Chinese characters is low. In recent years, with the great success of convolutional neural networks (CNN) in the field of computer vision, CNN has also been applied to Chinese character recognition, and its recognition performance is greatly improved compared with the traditional methods.

However, the biggest problem in Chinese character recognition lies in the diversity of Chinese characters, especially characters with similar shapes (near-words). Traditional Chinese character recognition methods are almost powerless against such characters. Many Chinese characters become a different character when one radical is added, such as "可" (kě) and "何" (hé). If the input picture shows "可" but there is some small noise on the left side of the picture, then, because a CNN is sensitive to location information, especially when the input data is insufficient, existing CNN-based methods have difficulty distinguishing whether the input picture shows "可" or "何". Therefore, existing methods that use a CNN to recognize Chinese characters still cannot solve the problem of recognizing similar-shaped characters well.

Summary of the invention

In view of the above problems, embodiments of the present application have been made in order to provide an image text recognition method and a corresponding image text recognition apparatus that overcome the above problems or at least partially solve the above problems.

In order to solve the above problem, the present application discloses a method for identifying an image text, including:

Obtaining an image to be identified, the image comprising a plurality of pixel points;

Determining a first text region of the image based on the plurality of pixel points;

Extracting a second text area from the first text area according to a preset rule;

The second text area is identified.

Optionally, the determining, according to the plurality of pixel points, the first text area of the image comprises:

Calculating a histogram of the image for the plurality of pixel points, the histogram having a corresponding plurality of feature values;

Performing contrast normalization processing on the histogram according to the plurality of characteristic values to obtain a contrast normalization processing result;

The result of the contrast normalization processing is binarized to obtain a first text region of the image.

Optionally, the step of performing normalization processing on the histogram according to the plurality of feature values to obtain a contrast normalization processing result includes:

Adjusting the plurality of feature values proportionally, so that the sum of the adjusted plurality of feature values is a specific value;

And transforming the adjusted plurality of feature values by using a cumulative distribution function to obtain a plurality of transformed feature values;

And mapping the transformed plurality of feature values to the plurality of pixel points respectively to obtain mapped pixel values of the plurality of pixel points.

Optionally, the performing the binarization processing on the contrast normalization processing result, and obtaining the first text region of the image includes:

Determining, respectively, whether a mapped pixel value of the plurality of pixel points in the image is greater than a first preset threshold;

If yes, marking the pixel point as a first background area pixel point;

If not, marking the pixel as a first text area pixel;

A circumscribed rectangle having the smallest area including all the pixels of the first text region is extracted from the image.

Optionally, before the step of extracting the second text area from the first text area according to the preset rule, the method further includes:

The first text area is binarized.

Optionally, the step of performing binarization processing on the first text area includes:

Determining, respectively, whether a mapped pixel value of the plurality of pixel points in the first text area is greater than a second preset threshold;

If yes, marking the pixel point as a second background area pixel point;

If not, the pixel is marked as a second text area pixel.

Optionally, the step of extracting the second text area from the first text area according to the preset rule comprises:

Determining a plurality of connected regions in the first text region;

Determining, respectively, whether the plurality of connected areas meet a preset rule;

If so, a plurality of corresponding connected areas are extracted as the second text area.

Optionally, the determining the multiple connectivity areas in the first text area includes:

Traversing the second text area pixel point;

And connecting the current second text area pixel point to the adjacent second text area pixel point to obtain a polygon with the second text area pixel point as a vertex;

A circumscribed rectangle having the smallest area including the polygon is determined as a connected region.

Optionally, the step of identifying the second text area includes:

The second text region is identified using a convolutional neural network CNN Chinese character recognition model.

In order to solve the above problem, the present application discloses an image text recognition apparatus, including:

An acquiring module, configured to acquire an image to be identified, where the image includes a plurality of pixel points;

a determining module, configured to determine a first text region of the image according to the plurality of pixel points;

An extracting module, configured to extract a second text area from the first text area according to a preset rule;

And an identification module, configured to identify the second text area.

Optionally, the determining module includes:

a histogram calculation submodule, configured to calculate a histogram of the image for the plurality of pixel points, the histogram having a corresponding plurality of feature values;

a contrast normalization processing sub-module, configured to perform contrast normalization processing on the histogram according to the plurality of feature values, to obtain a contrast normalization processing result;

The first text area obtaining submodule is configured to perform binarization processing on the contrast normalization processing result to obtain a first text area of the image.

Optionally, the contrast normalization processing submodule includes:

a feature value adjustment unit, configured to adjust the plurality of feature values proportionally, so that the sum of the adjusted plurality of feature values is a specific value;

a feature value transformation unit, configured to transform the adjusted plurality of feature values by using a cumulative distribution function to obtain a plurality of transformed feature values;

a feature value mapping unit, configured to respectively map the transformed plurality of feature values to the plurality of pixel points to obtain mapped pixel values of the plurality of pixel points.

Optionally, the first text area obtaining submodule includes:

a first preset threshold determining unit, configured to determine, respectively, whether a mapped pixel value of the plurality of pixel points in the image is greater than a first preset threshold;

a first background area pixel point marking unit, configured to mark the pixel point as a first background area pixel point when a mapped pixel value of the pixel point is greater than a first preset threshold;

a first text area pixel point marking unit, configured to mark the pixel point as a first text area pixel point when a mapped pixel value of the pixel point is not greater than a first preset threshold;

a first text area extracting unit, configured to extract, from the image, the smallest-area circumscribed rectangle containing all the first text area pixel points.

Optionally, the device further includes:

A binarization processing module is configured to perform binarization processing on the first text region.

Optionally, the binarization processing module includes:

a second preset threshold determining sub-module, configured to determine, respectively, whether a mapped pixel value of the plurality of pixel points in the first text area is greater than a second preset threshold;

a second background area pixel point marking submodule, configured to mark the pixel point as a second background area pixel point when the mapped pixel value of the pixel point is greater than a second preset threshold;

a second text area pixel point marking submodule, configured to mark the pixel point as a second text area pixel point when the mapped pixel value of the pixel point is not greater than a second preset threshold.

Optionally, the extraction module includes:

a connected area determining submodule, configured to determine a plurality of connected areas in the first text area;

a preset rule determining sub-module, configured to respectively determine whether the plurality of connected areas meet a preset rule;

The second text area extraction submodule is configured to extract a corresponding plurality of connected areas as the second text area when the plurality of connected areas satisfy the preset rule.

Optionally, the connectivity area determining submodule includes:

a second text area pixel traversal unit for traversing the second text area pixel point;

a second text area pixel point connecting unit, configured to connect the current second text area pixel point to the adjacent second text area pixel point to obtain a polygon with the second text area pixel point as a vertex;

a connected area determining unit, configured to determine the smallest-area circumscribed rectangle containing the polygon as a connected area.

Optionally, the identifying module includes:

And a recognition submodule for identifying the second text area by using a convolutional neural network CNN Chinese character recognition model.

Compared with the background art, the embodiments of the present application include the following advantages:

In the embodiment of the present application, the first text region is extracted by performing contrast normalization processing and binarization processing on the image to be recognized, and the second text region is then obtained on the basis of the connected regions of the first text region. This effectively removes the noise in the image to be recognized. The image text is recognized by recognizing the second text region, which avoids the interference of noise with image text recognition and greatly improves the accuracy of recognition.

Secondly, in the embodiment of the present application, for text recognition scenes with a single font and a simple background, such as ID cards and passports, performing spatial normalization processing on the image to be recognized makes the training data and the test data as spatially consistent as possible. As a result, similar-shaped characters (near-words) exhibit distinct features after spatial normalization, which enables the CNN Chinese character recognition model to recognize them more accurately.

Drawings

FIG. 1 is a flow chart showing the steps of Embodiment 1 of an image text identification method according to the present application;

FIG. 2 is a flow chart showing the steps of Embodiment 2 of an image text identification method according to the present application;

FIG. 3 is a structural block diagram of an embodiment of an apparatus for identifying an image text according to the present application.

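Detailed description

To make the above objects, features, and advantages of the present application more apparent and easier to understand, the present application is described in further detail below with reference to the drawings and specific embodiments.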

Referring to FIG. 1 , a flow chart of a first embodiment of a method for identifying an image text according to the present application is shown. Specifically, the method may include the following steps:

Step 101: Acquire an image to be identified;

In the embodiment of the present application, the image to be identified may be any of various types of ID document images, such as an ID card or a passport. The image includes a plurality of pixel points: an image is divided into many small squares, each of which is called a pixel point, and the grid formed by these pixel points is called a "raster". A computer can represent the entire image by recording the position, color, brightness, and other information of these pixels.

Text in ID documents usually differs from Chinese text in other natural scenes. Its characteristics are: 1) the text is printed; 2) the text uses a single font (or a small number of fonts), for example, all in the Song typeface, or all in either the Song typeface or one other typeface; 3) the image background is simple.

Step 102: Determine, according to the plurality of pixel points, a first text area of the image;

Generally, in order to identify text in an image, some background regions may be excluded based on the plurality of pixels to determine a first text region of the image.

In the embodiment of the present application, the first text area may be an area including text information determined through preliminary screening, thereby facilitating further targeted recognition of text of the corresponding area.

In a preferred embodiment of the present application, the step of determining the first text area of the image according to the plurality of pixel points may specifically include the following sub-steps:

Sub-step 1021, calculating a histogram of the image for the plurality of pixel points;

In a specific implementation, after the image to be identified is obtained, a histogram of the image may first be calculated for the plurality of pixel points in the image. A histogram is a graph that describes the distribution of gray values in an image and can display the image data within a certain range. By viewing the histogram of an image, one can judge the exposure of the image or whether the image is soft. The histogram may have a corresponding plurality of feature values, i.e., RGB values representing different brightness levels.

In general, the horizontal axis of the histogram represents image brightness and the vertical axis represents the number of pixels. Brightness increases from left to right along the horizontal axis, from 0 to 255, where 0 is black and 255 is white. The higher the peak at a given position, the more pixels have that brightness.
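As an illustration of the histogram described above, the following is a minimal Python sketch, assuming a single-channel grayscale image stored as a list of rows of integers from 0 to 255. The function name `gray_histogram` and the sample image are hypothetical and are not taken from the application.

```python
def gray_histogram(image):
    """Count how many pixels take each brightness level from 0 to 255."""
    hist = [0] * 256  # one bin per brightness level (horizontal axis)
    for row in image:
        for value in row:
            hist[value] += 1  # bin height = number of pixels (vertical axis)
    return hist


# Hypothetical 2x3 grayscale image: 0 is black, 255 is white.
image = [
    [0, 0, 255],
    [128, 255, 255],
]
hist = gray_histogram(image)
print(hist[0], hist[128], hist[255])
```

The list `hist` plays the role of the "plurality of feature values" in this sketch: entry `i` is the number of pixels with brightness `i`.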

Sub-step 1022, performing contrast normalization processing on the histogram according to the plurality of feature values, to obtain a contrast normalization processing result;

In a specific implementation, the contrast normalization process on the histogram may first adjust the plurality of feature values of the histogram proportionally, so that the sum of the adjusted plurality of feature values is 255. For example, if the sum of the plurality of feature values of the histogram is 765, each feature value may be multiplied by 1/3, so that the sum of the adjusted plurality of feature values is 255 (765 * 1/3 = 255). If the sum of the plurality of feature values of the histogram is less than 255, each feature value may be scaled up so that the sum of the adjusted plurality of feature values satisfies the above requirement.

Then, the adjusted plurality of feature values may be transformed by using a cumulative distribution function to obtain a plurality of transformed feature values. The cumulative distribution function is the integral of the probability density function and can fully describe the probability distribution of a real random variable X.

Further, the transformed feature values may be used as a mapping table: they are respectively mapped to the plurality of pixel points of the image and used as the mapped pixel values of those pixel points, thereby replacing the original pixel values.
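The three operations above (proportional scaling to a sum of 255, the cumulative transform, and the pixel mapping) can be sketched in Python as follows. This is only a sketch under stated assumptions: the image is a grayscale list of rows, the cumulative sum includes the current bin, and the result is rounded to an integer. The name `contrast_normalize` is hypothetical.

```python
def contrast_normalize(image, target_sum=255.0):
    """Scale the histogram to sum to target_sum, accumulate it, and remap pixels."""
    hist = [0] * 256
    for row in image:
        for value in row:
            hist[value] += 1

    total = sum(hist)
    if total == 0:
        return [list(row) for row in image]  # empty image: nothing to remap

    # Step 1: proportional adjustment so that the feature values sum to target_sum.
    scale = target_sum / total
    scaled = [count * scale for count in hist]

    # Step 2: cumulative transform (assumption: the running sum includes bin j).
    mapping = []
    running = 0.0
    for value in scaled:
        running += value
        mapping.append(min(255, int(round(running))))

    # Step 3: use the transformed values as a lookup table for every pixel.
    return [[mapping[value] for value in row] for row in image]


# Hypothetical low-contrast image whose values sit in a narrow dark range.
image = [[10, 10, 20, 30]]
normalized = contrast_normalize(image)
print(normalized)
```

In this sketch the brightest input level is always mapped to 255, so a narrow range of input values is stretched over the full output range.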

Sub-step 1023, performing binarization processing on the contrast normalization processing result to obtain a first text region of the image.

In the embodiment of the present application, the mapped pixel values of the plurality of pixel points may first be traversed to determine whether each mapped pixel value is greater than the first preset threshold. If so, the pixel point may be marked as a first background area pixel point; if not, the pixel point may be marked as a first text area pixel point. The smallest-area circumscribed rectangle containing all the first text area pixel points is then extracted from the image, and this circumscribed rectangle is the first text area of the image.

In a specific implementation, the first preset threshold may be calculated by the Otsu algorithm (OTSU algorithm), an efficient algorithm for binarizing images. Using the idea of clustering, it divides the gray values of the image into two classes by gray level so that the difference in gray value between the two classes is the largest and the difference within each class is the smallest; a suitable gray level for the split is found by calculating the variance. Therefore, the Otsu algorithm can be used to select the binarization threshold automatically. It is widely regarded as one of the best algorithms for threshold selection in image segmentation, is simple to compute, and is largely unaffected by image brightness and contrast.
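For reference, a minimal Python sketch of Otsu threshold selection is shown below. It uses the standard formulation that maximizes the between-class variance over a 256-bin histogram; the application does not give its own implementation, so the function name `otsu_threshold` and the sample histogram are assumptions made for illustration.

```python
def otsu_threshold(hist):
    """Return the gray level t that maximizes the between-class variance.

    Pixels with value <= t form one class and pixels with value > t the other.
    """
    total = sum(hist)
    weighted_total = sum(level * count for level, count in enumerate(hist))

    best_level, best_variance = 0, -1.0
    count_low = 0      # number of pixels in the low (darker) class
    weighted_low = 0   # sum of gray levels in the low class
    for level, count in enumerate(hist):
        count_low += count
        weighted_low += level * count
        count_high = total - count_low
        if count_low == 0:
            continue   # low class still empty
        if count_high == 0:
            break      # high class empty from here on
        mean_low = weighted_low / count_low
        mean_high = (weighted_total - weighted_low) / count_high
        # Between-class variance, up to a constant factor of 1 / total**2.
        variance = count_low * count_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_level, best_variance = level, variance
    return best_level


# Hypothetical bimodal histogram: dark text near 20, bright background near 200.
hist = [0] * 256
hist[20] = 50
hist[200] = 50
threshold = otsu_threshold(hist)
print(threshold)
```

With a threshold chosen this way, the comparison "mapped pixel value greater than the threshold" separates the brighter background class from the darker text class.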

Step 103: Extract a second text area from the first text area according to a preset rule.

In the embodiment of the present application, after the first text area is extracted from the image, the first text area may be further filtered to further exclude the background area, and a second text area including text information is obtained.

In a preferred embodiment of the present application, before the step of extracting the second text area from the first text area according to the preset rule, the method may further include the following:

The first text area is binarized.

In a specific implementation, the first text region may be subjected to a second binarization process according to the method in step 102, thereby marking the second text area pixel points; the second text region is then extracted based on these pixel points.

In a preferred embodiment of the present application, the step of extracting the second text area from the first text area according to the preset rule may specifically include the following sub-steps:

Sub-step 1031, determining a plurality of connected areas in the first text area;

Sub-step 1032, respectively, determining whether the plurality of connected areas meet the preset rule, and if yes, extracting the corresponding multiple connected areas as the second text area.

In a specific implementation, the connected graph algorithm may be applied to the second text area pixel points marked by the second binarization process to determine the plurality of connected regions in the first text region. The original publication gives a pseudocode example of the connected graph algorithm used in the embodiments at this point; that pseudocode is not reproduced in this text.
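The sketch below is not the patent's pseudocode. It is a minimal Python illustration of one common way to find connected regions: a breadth-first search over text pixels, assuming 8-connectivity and the marking convention that 0 denotes a text pixel and 1 a background pixel. The function name `connected_regions` and the sample mask are hypothetical. Each region is reported as the bounding rectangle `(left, top, right, bottom)` of its pixels, in inclusive pixel coordinates.

```python
from collections import deque


def connected_regions(dst):
    """Group text pixels (value 0) into 8-connected regions.

    Returns one bounding rectangle (left, top, right, bottom) per region.
    """
    height = len(dst)
    width = len(dst[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    regions = []

    for y in range(height):
        for x in range(width):
            if dst[y][x] != 0 or seen[y][x]:
                continue
            # Start a new region and grow it with a breadth-first search.
            queue = deque([(x, y)])
            seen[y][x] = True
            left = right = x
            top = bottom = y
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        inside = 0 <= nx < width and 0 <= ny < height
                        if inside and dst[ny][nx] == 0 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            regions.append((left, top, right, bottom))
    return regions


# Hypothetical binarized mask with two separate groups of text pixels.
dst = [
    [0, 0, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
]
regions = connected_regions(dst)
print(regions)  # [(0, 0, 1, 1), (4, 1, 4, 2)]
```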

Then, each connected area can be separately judged, and the connected area that does not satisfy the preset rule is deleted, thereby obtaining a second text area.

In the embodiment of the present application, the connected area that does not satisfy the preset rule may include a connected area with a small area, and a connected area with a large distance from the largest connected area.

Step 104: Identify the second text area.

Generally, after the processing of steps 102 and 103 is completed on the image to be recognized, the purpose of removing noise has been substantially achieved, so that the second text area can be identified and the text information in the image to be recognized is obtained.

In the embodiment of the present application, contrast normalization processing and binarization processing are performed on the image to be recognized to extract the first text region, and the second text region is then obtained on the basis of the connected regions of the first text region. This effectively removes the noise in the image to be recognized. The image text is recognized by recognizing the second text area, which avoids the interference of noise with image text recognition and greatly improves the recognition accuracy.

Referring to FIG. 2, a flow chart of the steps of the second embodiment of the method for identifying the image text of the present application is shown, which may specifically include the following steps:

Step 201: Acquire an image to be identified.

In the embodiment of the present application, the image to be identified may be any of various types of ID document images, such as an ID card or a passport. Text in ID document images usually differs from Chinese text in other natural scenes. Its characteristics are: 1) the text is printed; 2) the text uses a single font (or a small number of fonts), for example, all in the Song typeface, or all in either the Song typeface or one other typeface; 3) the image background is simple. Therefore, image text recognition based on spatial normalization operations can be applied to document identification scenes.

Step 202: Calculate a histogram of the image for the plurality of pixel points;

Each image includes multiple pixels, and a computer can represent the entire image by recording the position, color, brightness, and other information of these pixels. Therefore, in the embodiment of the present application, a histogram of the image may be calculated for the plurality of pixel points.

Step 203: Perform contrast normalization processing on the histogram according to the plurality of feature values, to obtain a contrast normalization processing result;

In a preferred embodiment of the present application, the step of performing normalization processing on the histogram according to the plurality of feature values to obtain a contrast normalization processing result may specifically include the following sub-steps:

Sub-step 2031, adjusting the plurality of feature values proportionally, so that the sum of the adjusted plurality of feature values is a specific value;

Sub-step 2032, transforming the adjusted plurality of feature values by using a cumulative distribution function to obtain a plurality of transformed feature values;

Sub-step 2033, respectively mapping the transformed plurality of feature values to the plurality of pixel points to obtain mapped pixel values of the plurality of pixel points.

Then, the adjusted plurality of feature values may be transformed by using a cumulative distribution function to obtain a plurality of transformed feature values. The cumulative distribution function is the integral of the probability density function and can fully describe the probability distribution of a real random variable X. That is, the value corresponding to the j-th feature value after the transformation should be the sum of all the feature values preceding it.
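Sub-steps 2031 to 2033 can be written compactly as follows. The notation is introduced here for illustration and is not used in the application: $h_i$ is the histogram feature value for brightness level $i$, $S$ is the specific value (255 in the first embodiment), and $p(x, y)$ is the original pixel value. Whether the cumulative sum includes the $j$-th term itself is not stated explicitly; the inclusive form is assumed here.

```latex
% Sub-step 2031: proportional adjustment so that the feature values sum to S
h'_i = \frac{S \, h_i}{\sum_{k=0}^{255} h_k}, \qquad \sum_{i=0}^{255} h'_i = S

% Sub-step 2032: cumulative transform (inclusive sum assumed)
c_j = \sum_{i=0}^{j} h'_i

% Sub-step 2033: mapping the transformed values back to the pixels
p'(x, y) = c_{\,p(x, y)}
```

For instance, with the figures from the first embodiment, $\sum_k h_k = 765$ and $S = 255$ give a scaling factor of $255 / 765 = 1/3$.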

Further, the transformed feature values may be used as a mapping table: they are respectively mapped to the plurality of pixel points of the image and used as the mapped pixel values of those pixel points, replacing the original pixel values.

Step 204: Perform binarization processing on the contrast normalization processing result to obtain a first text region of the image;

In a specific implementation, the Otsu algorithm (OTSU algorithm) may be used to calculate a first preset threshold, and the first text region of the image is obtained by comparing the mapped pixel value of each pixel with the first preset threshold.

In a preferred embodiment of the present application, the step of performing binarization processing on the contrast normalization processing result to obtain the first text region of the image may specifically include the following sub-steps:

Sub-step 2041, respectively, determining whether a mapped pixel value of the plurality of pixel points in the image is greater than a first preset threshold value;

Sub-step 2042, if yes, marking the pixel point as a first background area pixel point;

Sub-step 2043, if no, marking the pixel point as a first text area pixel point;

Sub-step 2044, extracting from the image the smallest-area circumscribed rectangle containing all the first text area pixel points.

In a specific implementation, the mapped pixel value of each pixel point may be compared with the first preset threshold. If the mapped pixel value is greater than the first preset threshold, the pixel may be marked as a first background area pixel, for example with the mark dst(x, y) = 1. If the mapped pixel value is not greater than the first preset threshold, the pixel may be marked as a first text area pixel, for example with the mark dst(x, y) = 0.

Then, the rectangle that has the smallest area and contains all the pixels with dst(x, y) = 0 is found in the image. The image within this rectangle is the result of the first stage of spatial normalization, i.e., the first text region.
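The marking and cropping just described can be sketched in Python as follows, assuming an axis-aligned rectangle and a grayscale image stored as a list of rows. The marks follow the dst(x, y) convention in the text; the function names `binarize`, `text_bounding_rectangle`, and `crop`, the threshold of 128, and the sample data are hypothetical.

```python
def binarize(mapped, threshold):
    """Mark background pixels (value > threshold) as 1 and text pixels as 0."""
    return [[1 if value > threshold else 0 for value in row] for row in mapped]


def text_bounding_rectangle(dst):
    """Smallest axis-aligned rectangle (left, top, right, bottom) holding every 0."""
    xs = [x for row in dst for x, mark in enumerate(row) if mark == 0]
    ys = [y for y, row in enumerate(dst) for mark in row if mark == 0]
    if not xs:
        return None  # no text pixels were marked
    return (min(xs), min(ys), max(xs), max(ys))


def crop(image, rect):
    """Cut the rectangle (inclusive coordinates) out of the image."""
    left, top, right, bottom = rect
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Hypothetical mapped pixel values: dark text surrounded by bright background.
mapped = [
    [250, 250, 250, 250],
    [250, 30, 40, 250],
    [250, 250, 20, 250],
]
dst = binarize(mapped, threshold=128)
rect = text_bounding_rectangle(dst)
first_text_region = crop(mapped, rect)
print(rect)               # (1, 1, 2, 2)
print(first_text_region)  # [[30, 40], [250, 20]]
```

In practice the threshold would come from the Otsu algorithm rather than a fixed constant; a constant is used here only to keep the sketch short.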

Step 205: Perform binarization processing on the first text area.

In the embodiment of the present application, the process of performing the binarization process on the first text area is the same as the step 204. The step of performing the binarization process on the first text area may specifically include the following sub-steps:

Sub-step 2051, respectively, determining whether a mapped pixel value of the plurality of pixel points in the first text area is greater than a second preset threshold;

Sub-step 2052, if yes, marking the pixel point as a second background area pixel point;

Sub-step 2053, if no, marks the pixel as a second text area pixel.

It should be noted that when the second binarization process is performed on the first text region, the preset threshold needs to be recalculated; that is, the second preset threshold needs to be calculated by the Otsu algorithm (OTSU algorithm). The mapped pixel value of each pixel is then compared with the second preset threshold to mark the second background area pixels and the second text area pixels. For example, if the mapped pixel value is greater than the second preset threshold, the pixel may be marked as a second background area pixel with the mark dst(x, y) = 1; if the mapped pixel value is not greater than the second preset threshold, the pixel may be marked as a second text area pixel with the mark dst(x, y) = 0.

Step 206: Determine a plurality of connected areas in the first text area.

In the embodiment of the present application, a plurality of connected regions in the first text region may be determined by using a connectivity graph algorithm based on the second text region pixel of the second binarization process.

In a preferred embodiment of the present application, the step of determining a plurality of connected regions in the first text region may specifically include the following sub-steps:

Sub-step 2061, traversing the second text area pixel point;

Sub-step 2062, the current second text area pixel point is connected to the adjacent second text area pixel point to obtain a polygon with the second text area pixel point as a vertex;

Sub-step 2063, the smallest-area circumscribed rectangle containing the polygon is determined as a connected area.

In a specific implementation, the second text area pixel points, that is, the pixel points marked as dst(x, y) = 0 during the binarization processing in step 205, may be traversed. The current second text area pixel point is connected to the adjacent second text area pixel points to obtain a polygon whose vertices are the second text area pixel points. The rectangle that has the smallest area and contains the polygon is then found in the first text area. The image within this rectangle is a connected area.

Step 207: Determine whether the multiple connected areas meet the preset rule.

In the embodiment of the present application, after all the connected areas are determined, it may be determined one by one whether each connected area satisfies a preset rule. If a connected area does not satisfy the preset rule, the connected area may be deleted, and finally a second text region composed of the remaining connected regions that satisfy the preset rule is obtained.

In a specific implementation, the connected areas that do not satisfy the preset rule may include connected areas that are too small and connected areas that are far from the largest connected area, for example, a connected area with an area of less than 2*2 pixels, or a connected area whose distance from the largest connected area is more than 0.06.
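A minimal Python sketch of such a filter follows. The two limits (2*2 pixels and 0.06) come from the text, but the application does not say how the distance is measured or in what unit. The sketch therefore assumes, purely for illustration, that the distance is the gap between two bounding rectangles divided by the longer side of the first text region. All function names are hypothetical.

```python
import math


def rect_area(rect):
    """Area in pixels of a rectangle (left, top, right, bottom), inclusive."""
    left, top, right, bottom = rect
    return (right - left + 1) * (bottom - top + 1)


def rect_gap(a, b):
    """Shortest distance between two axis-aligned rectangles (0 if they overlap)."""
    dx = max(b[0] - a[2], a[0] - b[2], 0)
    dy = max(b[1] - a[3], a[1] - b[3], 0)
    return math.hypot(dx, dy)


def filter_regions(regions, width, height, min_area=2 * 2, max_distance=0.06):
    """Keep regions that are large enough and close to the largest region."""
    if not regions:
        return []
    largest = max(regions, key=rect_area)
    scale = max(width, height)  # assumed normalization for the 0.06 limit
    kept = []
    for rect in regions:
        if rect_area(rect) < min_area:
            continue  # too small: treated as noise
        if rect_gap(rect, largest) / scale > max_distance:
            continue  # too far from the largest connected region
        kept.append(rect)
    return kept


# Hypothetical regions inside a 100x40 first text region.
regions = [(10, 5, 40, 35), (42, 5, 60, 35), (0, 0, 0, 0), (90, 30, 95, 36)]
second_text_regions = filter_regions(regions, width=100, height=40)
print(second_text_regions)  # [(10, 5, 40, 35), (42, 5, 60, 35)]
```

Here the single-pixel region is dropped for its area and the far-right region for its distance; the two remaining rectangles together would form the second text area.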

Step 208, extracting corresponding multiple connected areas as the second text area;

Step 209: Identify the second text area by using a convolutional neural network CNN Chinese character recognition model.

In the embodiment of the present application, after the second text area image is obtained, the second text area may be identified by using a convolutional neural network (CNN) Chinese character recognition model. A convolutional neural network (CNN) is a feedforward neural network whose artificial neurons respond to surrounding units within a limited coverage range, and it performs well on large-scale image processing.

In a specific implementation, the training data may be spatially normalized by using the method described in the foregoing steps 201 to 208, and used for training the CNN Chinese character recognition model, thereby obtaining a convolutional neural network CNN Chinese character recognition model. Then, in the image text recognition task, an image to be recognized is given, and the trained CNN Chinese character recognition model is used for recognition.

In the embodiment of the present application, for text recognition scenarios with a single font and a simple background, such as ID cards and passports, performing spatial normalization processing on the image to be recognized makes the training data and the test data as spatially consistent as possible. Similar-shaped characters, once spatially normalized, then present different features, which allows the CNN Chinese character recognition model to recognize similar-shaped characters more accurately.

It should be noted that, for simplicity of description, the method embodiments are expressed as a series of action combinations, but those skilled in the art should understand that the embodiments of the present application are not limited by the described order of actions, because, in accordance with the embodiments of the present application, certain steps may be performed in other orders or concurrently. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present application.

Referring to FIG. 3, a structural block diagram of an embodiment of an apparatus for identifying an image text according to the present application is shown. Specifically, the following modules may be included:

An obtaining module 301, configured to acquire an image to be identified, where the image includes a plurality of pixel points;

a determining module 302, configured to determine, according to the plurality of pixel points, a first text area of the image;

The extracting module 303 is configured to extract a second text area from the first text area according to a preset rule;

The identification module 304 is configured to identify the second text area.
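For illustration only, the cooperation of the four modules listed above could be sketched in Python as a simple chain of stages. The class name ImageTextRecognizer, its field names, and the use of caller-supplied callables are assumptions of this sketch. The present application does not prescribe any particular software structure.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ImageTextRecognizer:
    """Chains four caller-supplied stages that mirror modules 301 to 304."""
    acquire: Callable[[Any], Any]                 # obtaining module 301
    determine_first_region: Callable[[Any], Any]  # determining module 302
    extract_second_region: Callable[[Any], Any]   # extracting module 303
    identify: Callable[[Any], str]                # identification module 304

    def run(self, source: Any) -> str:
        image = self.acquire(source)
        first_region = self.determine_first_region(image)
        second_region = self.extract_second_region(first_region)
        return self.identify(second_region)
```

Each stage can be swapped independently, for example replacing the identification stage with a trained CNN Chinese character recognition model while leaving the region extraction unchanged.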

In the embodiment of the present application, the determining module 302 may specifically include the following submodules:

a histogram calculation sub-module 3021, configured to calculate a histogram of the image for the plurality of pixel points, the histogram having a corresponding plurality of feature values;

The contrast normalization processing sub-module 3022 is configured to perform contrast normalization processing on the histogram according to the plurality of feature values to obtain a contrast normalization processing result;

The first text area obtaining sub-module 3023 is configured to perform binarization processing on the contrast normalization processing result to obtain a first text area of the image.

In the embodiment of the present application, the contrast normalization processing sub-module 3022 may specifically include the following units:

The feature value adjustment unit 221 is configured to adjust the plurality of feature values proportionally, so that the sum of the adjusted plurality of feature values is a specific value;

The feature value transformation unit 222 is configured to transform the adjusted plurality of feature values by using a cumulative distribution function to obtain a plurality of transformed feature values;

The feature value mapping unit 223 is configured to respectively map the transformed plurality of feature values to the plurality of pixel points to obtain mapped pixel values of the plurality of pixel points.
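The three units above together amount to a form of histogram equalization, which may be sketched in Python as follows. The sketch assumes 8-bit gray values (256 levels) and takes 255 as the "specific value" to which the histogram is scaled. Both are assumptions, as is the function name normalize_contrast.

```python
def normalize_contrast(pixels, levels=256, total=255.0):
    """Histogram-equalize a 2-D list of integer gray values in [0, levels)."""
    # Histogram: one count (feature value) per gray level.
    hist = [0] * levels
    for row in pixels:
        for value in row:
            hist[value] += 1
    count = sum(hist)
    if count == 0:
        return []
    # Adjust the feature values proportionally so that they sum to `total`.
    scaled = [h * total / count for h in hist]
    # Transform the adjusted values with a cumulative distribution function.
    cdf = []
    running = 0.0
    for s in scaled:
        running += s
        cdf.append(running)
    # Map every pixel through the transformed values to get its mapped value.
    return [[int(round(cdf[value])) for value in row] for row in pixels]
```

Under these assumptions the brightest gray level present in the image is always mapped to 255, and the ordering of gray levels is preserved.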

In the embodiment of the present application, the first text area obtaining submodule 3023 may specifically include the following units:

The first preset threshold determining unit 231 is configured to determine, respectively, whether a mapped pixel value of the plurality of pixel points in the image is greater than a first preset threshold;

a first background area pixel point marking unit 232, configured to mark the pixel point as a first background area pixel point when the mapped pixel value of the pixel point is greater than a first preset threshold;

a first text area pixel point marking unit 233, configured to mark the pixel point as a first text area pixel point when the mapped pixel value of the pixel point is not greater than a first preset threshold;

The first text area extracting unit 234 is configured to extract, from the image, the circumscribed rectangle of smallest area that contains all the first text area pixel points.
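As a hedged sketch, the thresholding and rectangle extraction performed by units 231 to 234 could look like the following Python function. It assumes the mapped pixel values are held in a list of rows and that the circumscribed rectangle is axis-aligned. The name extract_text_rectangle and the threshold value used in any call are hypothetical.

```python
def extract_text_rectangle(mapped, threshold):
    """Crop the smallest axis-aligned rectangle containing every text pixel.

    mapped: 2-D list of mapped pixel values.
    A pixel is a text pixel when its value is not greater than `threshold`;
    otherwise it is a background pixel.
    Returns (box, crop), or (None, []) when no text pixel is found.
    """
    xs, ys = [], []
    for y, row in enumerate(mapped):
        for x, value in enumerate(row):
            if value <= threshold:  # not greater than the threshold: text
                xs.append(x)
                ys.append(y)
    if not xs:
        return None, []
    left, right = min(xs), max(xs)
    top, bottom = min(ys), max(ys)
    crop = [row[left:right + 1] for row in mapped[top:bottom + 1]]
    return (left, top, right, bottom), crop
```

The returned crop corresponds to the first text area, and the box gives its inclusive (left, top, right, bottom) position in the original image.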

In the embodiment of the present application, the device may further include the following modules:

The binarization processing module 305 is configured to perform binarization processing on the first text region.

In the embodiment of the present application, the binarization processing module 305 may specifically include the following submodules:

a second preset threshold determining sub-module 3051, configured to determine, respectively, whether a mapped pixel value of the plurality of pixel points in the first text region is greater than a second preset threshold;

a second background area pixel point sub-module 3052, configured to mark the pixel point as a second background area pixel point when the mapped pixel value of the pixel point is greater than a second preset threshold;

The second text area pixel point sub-module 3053 is configured to mark the pixel point as a second text area pixel point when the mapped pixel value of the pixel point is not greater than a second preset threshold.

In the embodiment of the present application, the extraction module 303 may specifically include the following submodules:

a connected area determining submodule 3031, configured to determine a plurality of connected areas in the first text area;

The preset rule determining sub-module 3032 is configured to determine whether the plurality of connected areas meet the preset rule respectively;

The second text area extraction sub-module 3033 is configured to extract a corresponding plurality of connected areas as the second text area when the plurality of connected areas satisfy the preset rule.

In the embodiment of the present application, the connected area determining submodule 3031 may specifically include the following units:

a second text area pixel traversing unit 311, configured to traverse the second text area pixel point;

a second text area pixel point connecting unit 312, configured to connect the current second text area pixel point with the adjacent second text area pixel point to obtain a polygon with the second text area pixel point as a vertex;

The connected area determining unit 313 is configured to determine the circumscribed rectangle of smallest area that contains the polygon as the connected area.

In the embodiment of the present application, the identification module 304 may specifically include the following sub-modules:

The identification sub-module 3041 is configured to identify the second text area by using the convolutional neural network (CNN) Chinese character recognition model.

For the device embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and the relevant parts can be referred to the description of the method embodiment.

The various embodiments in the present specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the various embodiments can be referred to each other.

Those skilled in the art will appreciate that the embodiments of the present application can be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments of the present application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

In a typical configuration, the computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer-readable medium, such as read-only memory (ROM) or flash memory. Memory is an example of a computer-readable medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cartridges, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

Embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the present application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

The computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising an instruction device, and the instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions can also be loaded onto a computer or other programmable data processing terminal device, such that a series of operational steps are performed on the computer or other programmable terminal device to produce computer-implemented processing, and the instructions executed on the computer or other programmable terminal device thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

While preferred embodiments of the present application have been described, those skilled in the art can make further changes and modifications to these embodiments once they are aware of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including all such changes and modifications.

Finally, it should also be noted that, in this context, relational terms such as first and second are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device that comprises the element.

The method for recognizing image text and the apparatus for recognizing image text provided by the present application are described in detail above. The principles and implementation manners of the present application are set out herein, and the description of the above embodiments is only intended to help understand the method of the present application and its core ideas. At the same time, for those of ordinary skill in the art, there will be changes in the specific implementation manners and the scope of application according to the idea of the present application. The contents of this specification are therefore not to be construed as limiting the present application.

Claims

A method for recognizing image text, comprising:

Obtaining an image to be identified, the image comprising a plurality of pixel points;

Determining a first text region of the image based on the plurality of pixel points;

Extracting a second text area from the first text area according to a preset rule;

The second text area is identified.
The method according to claim 1, wherein the determining the first text region of the image according to the plurality of pixel points comprises:

Calculating a histogram of the image for the plurality of pixel points, the histogram having a corresponding plurality of feature values;

Performing contrast normalization processing on the histogram according to the plurality of feature values to obtain a contrast normalization processing result;

The result of the contrast normalization processing is binarized to obtain a first text region of the image.
The method according to claim 2, wherein the step of performing contrast normalization processing on the histogram according to the plurality of feature values to obtain a contrast normalization processing result comprises:

Adjusting the plurality of feature values proportionally, so that the sum of the adjusted plurality of feature values is a specific value;

Transforming the adjusted plurality of feature values by using a cumulative distribution function to obtain a plurality of transformed feature values;

Mapping the transformed plurality of feature values to the plurality of pixel points respectively to obtain mapped pixel values of the plurality of pixel points.
The method according to claim 3, wherein the step of binarizing the result of the contrast normalization processing to obtain the first text region of the image comprises:

Determining, respectively, whether a mapped pixel value of the plurality of pixel points in the image is greater than a first preset threshold;

If yes, marking the pixel point as a first background area pixel point;

If not, marking the pixel as a first text area pixel;

A circumscribed rectangle having the smallest area including all the pixels of the first text region is extracted from the image.
The method according to any one of claims 1 to 4, wherein before the step of extracting the second text area from the first text area according to the preset rule, the method further comprises:

Performing binarization processing on the first text area.
The method according to claim 5, wherein the step of performing binarization processing on the first text region comprises:

Determining, respectively, whether a mapped pixel value of the plurality of pixel points in the first text area is greater than a second preset threshold;

If yes, marking the pixel point as a second background area pixel point;

If not, the pixel is marked as a second text area pixel.
The method according to claim 6, wherein the step of extracting the second text area from the first text area according to a preset rule comprises:

Determining a plurality of connected regions in the first text region;

Determining, respectively, whether the plurality of connected areas meet a preset rule;

If so, a plurality of corresponding connected areas are extracted as the second text area.
The method according to claim 7, wherein the determining the plurality of connected areas in the first text area comprises:

Traversing the second text area pixel points;

Connecting the current second text area pixel point to the adjacent second text area pixel points to obtain a polygon with the second text area pixel points as vertices;

Determining the circumscribed rectangle of smallest area that contains the polygon as a connected region.
The method according to claim 1 or 2 or 3 or 4 or 6 or 7 or 8, wherein said step of identifying said second text region comprises:

The second text region is identified using a convolutional neural network CNN Chinese character recognition model.
An apparatus for identifying an image text, comprising:

An acquiring module, configured to acquire an image to be identified, where the image includes a plurality of pixel points;

a determining module, configured to determine a first text region of the image according to the plurality of pixel points;

An extracting module, configured to extract a second text area from the first text area according to a preset rule;

An identification module, configured to identify the second text area.
The apparatus according to claim 10, wherein the determining module comprises:

a histogram calculation submodule, configured to calculate a histogram of the image for the plurality of pixel points, the histogram having a corresponding plurality of feature values;

a contrast normalization processing sub-module, configured to perform contrast normalization processing on the histogram according to the plurality of feature values, to obtain a contrast normalization processing result;

The first text area obtaining submodule is configured to perform binarization processing on the contrast normalization processing result to obtain a first text area of the image.
The apparatus according to claim 11, wherein said contrast normalization processing sub-module comprises:

a feature value adjustment unit, configured to adjust the plurality of feature values proportionally, so that the sum of the adjusted plurality of feature values is a specific value;

a feature value transformation unit, configured to transform the adjusted plurality of feature values by using a cumulative distribution function to obtain a plurality of transformed feature values;

a feature value mapping unit, configured to respectively map the transformed plurality of feature values to the plurality of pixel points to obtain mapped pixel values of the plurality of pixel points.
The apparatus according to claim 12, wherein the first text area obtaining submodule comprises:

a first preset threshold determining unit, configured to determine, respectively, whether a mapped pixel value of the plurality of pixel points in the image is greater than a first preset threshold;

a first background area pixel point marking unit, configured to mark the pixel point as a first background area pixel point when a mapped pixel value of the pixel point is greater than a first preset threshold;

a first text area pixel point marking unit, configured to mark the pixel point as a first text area pixel point when a mapped pixel value of the pixel point is not greater than a first preset threshold;

a first text area extracting unit, configured to extract, from the image, the circumscribed rectangle of smallest area that contains all the first text area pixel points.
The device according to any one of claims 10-13, further comprising:

A binarization processing module is configured to perform binarization processing on the first text region.
The device according to claim 14, wherein the binarization processing module comprises:

a second preset threshold determining sub-module, configured to determine, respectively, whether a mapped pixel value of the plurality of pixel points in the first text area is greater than a second preset threshold;

a second background area pixel point marking submodule, configured to mark the pixel point as a second background area pixel point when the mapped pixel value of the pixel point is greater than a second preset threshold;

a second text area pixel point marking submodule, configured to mark the pixel point as a second text area pixel point when the mapped pixel value of the pixel point is not greater than a second preset threshold.
The device according to claim 15, wherein the extraction module comprises:

a connected area determining submodule, configured to determine a plurality of connected areas in the first text area;

a preset rule determining sub-module, configured to respectively determine whether the plurality of connected areas meet a preset rule;

The second text area extraction submodule is configured to extract a corresponding plurality of connected areas as the second text area when the plurality of connected areas satisfy the preset rule.
The apparatus according to claim 16, wherein the connected area determining submodule comprises:

a second text area pixel traversal unit for traversing the second text area pixel point;

a second text area pixel point connecting unit, configured to connect the current second text area pixel point to the adjacent second text area pixel point to obtain a polygon with the second text area pixel point as a vertex;

a connected area determining unit, configured to determine the circumscribed rectangle of smallest area that contains the polygon as the connected area.
The device according to claim 10 or 11 or 12 or 13 or 15 or 16 or 17, wherein the identification module comprises:

a recognition submodule, configured to identify the second text area by using a convolutional neural network (CNN) Chinese character recognition model.