
WO2017162069A1 - Image text identification method and apparatus - Google Patents


Publication number
WO2017162069A1
WO2017162069A1 PCT/CN2017/076548 CN2017076548W WO2017162069A1 WO 2017162069 A1 WO2017162069 A1 WO 2017162069A1 CN 2017076548 W CN2017076548 W CN 2017076548W WO 2017162069 A1 WO2017162069 A1 WO 2017162069A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
text area
pixel
image
area
Prior art date
Application number
PCT/CN2017/076548
Other languages
French (fr)
Chinese (zh)
Inventor
毛旭东
施兴
褚崴
程孟力
周文猛
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2017162069A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/32 Normalisation of the pattern dimensions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/30 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/28 Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
    • G06V30/287 Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet of Kanji, Hiragana or Katakana characters

Definitions

  • the present application relates to the field of character recognition technology, and in particular, to a method for recognizing image text and an apparatus for recognizing image text.
  • research on pattern recognition technology aims to construct machine systems that, modeled on the recognition mechanism of the human brain, can take over human classification and identification tasks and thereby process information automatically.
  • Chinese character recognition is an important application field of pattern recognition.
  • one example is ID card identification, which automatically recognizes the name, ID number, address, gender and other information.
  • embodiments of the present application have been made in order to provide an image text recognition method and a corresponding image text recognition apparatus that overcome the above problems or at least partially solve the above problems.
  • the present application discloses a method for identifying an image text, including:
  • the second text area is identified.
  • the determining, according to the plurality of pixel points, the first text area of the image comprises:
  • the result of the contrast normalization processing is binarized to obtain a first text region of the image.
  • the step of performing normalization processing on the histogram according to the plurality of feature values to obtain a contrast normalization processing result includes:
  • the performing the binarization processing on the contrast normalization processing result, and obtaining the first text region of the image includes:
  • a circumscribed rectangle having the smallest area including all the pixels of the first text region is extracted from the image.
  • before the step of extracting the second text area from the first text area according to the preset rule, the method further includes:
  • the first text area is binarized.
  • the step of performing binarization processing on the first text area includes:
  • the pixel is marked as a second text area pixel.
  • the step of extracting the second text area from the first text area according to the preset rule comprises:
  • the determining the multiple connectivity areas in the first text area includes:
  • a circumscribed rectangle having the smallest area including the polygon is determined as a connected region.
  • the step of identifying the second text area includes:
  • the second text region is identified using a convolutional neural network CNN Chinese character recognition model.
  • an image text recognition apparatus including:
  • An acquiring module configured to acquire an image to be identified, where the image includes a plurality of pixel points
  • a determining module configured to determine a first text region of the image according to the plurality of pixel points
  • An extracting module configured to extract a second text area from the first text area according to a preset rule
  • an identification module configured to identify the second text area.
  • the determining module includes:
  • a histogram calculation submodule configured to calculate a histogram of the image for the plurality of pixel points, the histogram having a corresponding plurality of feature values
  • a contrast normalization processing sub-module configured to perform contrast normalization processing on the histogram according to the plurality of feature values, to obtain a contrast normalization processing result
  • the first text area obtaining submodule is configured to perform binarization processing on the contrast normalization processing result to obtain a first text area of the image.
  • the contrast normalization processing submodule includes:
  • an eigenvalue adjustment unit configured to adjust the plurality of feature values proportionally, so that the sum of the adjusted plurality of feature values is a specific value
  • An eigenvalue transformation unit configured to transform the adjusted plurality of eigenvalues by using a cumulative distribution function to obtain a plurality of transformed eigenvalues
  • the feature value mapping unit is configured to respectively map the transformed plurality of feature values to the plurality of pixel points to obtain mapped pixel values of the plurality of pixel points.
  • the first text area obtaining submodule includes:
  • a first preset threshold determining unit configured to determine, respectively, whether a mapped pixel value of the plurality of pixel points in the image is greater than a first preset threshold
  • a first background area pixel point marking unit configured to mark the pixel point as a first background area pixel point when a mapped pixel value of the pixel point is greater than a first preset threshold
  • a first text area pixel point marking unit configured to mark the pixel point as a first text area pixel point when a mapped pixel value of the pixel point is not greater than a first preset threshold
  • the first text area extracting unit is configured to extract, from the image, the circumscribed rectangle of smallest area that contains all the first text area pixel points.
  • the device further includes:
  • a binarization processing module is configured to perform binarization processing on the first text region.
  • the binarization processing module includes:
  • a second preset threshold determining sub-module configured to determine, respectively, whether a mapped pixel value of the plurality of pixel points in the first text area is greater than a second preset threshold
  • a second background area pixel point marking submodule configured to mark the pixel point as a second background area pixel point when the mapped pixel value of the pixel point is greater than a second preset threshold
  • the second text area pixel point sub-module is configured to mark the pixel point as a second text area pixel point when the mapped pixel value of the pixel point is not greater than a second preset threshold.
  • the extraction module includes:
  • a connected area determining submodule configured to determine a plurality of connected areas in the first text area
  • a preset rule determining sub-module configured to respectively determine whether the plurality of connected areas meet a preset rule
  • the second text area extraction submodule is configured to extract a corresponding plurality of connected areas as the second text area when the plurality of connected areas satisfy the preset rule.
  • the connectivity area determining submodule includes:
  • a second text area pixel traversal unit for traversing the second text area pixel point
  • a second text area pixel point connecting unit configured to connect the current second text area pixel point to the adjacent second text area pixel point to obtain a polygon with the second text area pixel point as a vertex
  • the connected area determining unit is configured to determine the circumscribed rectangle of smallest area that contains the polygon as the connected area.
  • a recognition submodule configured to identify the second text area by using a convolutional neural network (CNN) Chinese character recognition model.
  • the embodiments of the present application include the following advantages:
  • the first text region is extracted by performing contrast normalization processing and binarization processing on the image to be recognized, and the second text region is then obtained based on the connected regions of the first text region, which effectively removes the noise in the image to be recognized. Recognizing the second text region then realizes the recognition of the image text, avoiding interference of noise with image text recognition and greatly improving recognition accuracy.
  • the training data and the test data can be spatially unified as much as possible, so that visually similar characters, once spatially normalized, exhibit distinguishable features, which allows the CNN Chinese character recognition model to recognize visually similar characters more accurately.
  • FIG. 1 is a flow chart showing the steps of Embodiment 1 of an image text identification method according to the present application;
  • FIG. 2 is a flow chart showing the steps of Embodiment 2 of an image text identification method according to the present application;
  • FIG. 3 is a structural block diagram of an embodiment of an apparatus for identifying an image text according to the present application.
  • Referring to FIG. 1, a flow chart of the steps of a first embodiment of a method for identifying an image text according to the present application is shown. Specifically, the method may include the following steps:
  • Step 101 Acquire an image to be identified
  • the image to be identified may be various types of ID images, such as an ID card, a passport, and the like.
  • the image includes a plurality of pixel points. A pixel point is one of the small squares into which an image is divided; together, the pixel points are arranged into a grid.
  • the computer can represent the entire image by indicating the position, color, brightness, etc. of these pixels.
  • recognizing the text in each type of document differs from Chinese character recognition in other natural scenes.
  • the characteristics of the text in the document are: 1) the text is printed; 2) the text uses a single font (or a small variety of fonts), for example, all in the Song typeface, or in the Song typeface and one other typeface; 3) the image background is simple.
  • Step 102 Determine, according to the plurality of pixel points, a first text area of the image
  • some background regions may be excluded based on the plurality of pixels to determine a first text region of the image.
  • the first text area may be an area including text information determined through preliminary screening, thereby facilitating further targeted recognition of text of the corresponding area.
  • the step of determining the first text area of the image according to the plurality of pixel points may specifically include the following sub-steps:
  • Sub-step 1021 calculating a histogram of the image for the plurality of pixel points
  • a histogram of the image may be first calculated for a plurality of pixel points in the image.
  • a histogram is a graph that describes the distribution of gray values in an image. It can display image data within a certain range. By viewing the histogram of an image, one can judge the exposure of the image, or whether the image is soft.
  • the histogram may have a corresponding plurality of feature values, i.e., values representing different brightnesses.
  • the horizontal axis of the histogram represents image brightness, and the vertical axis represents the number of pixels at each brightness.
  • from left to right, the horizontal axis runs from 0 to 255 with increasing brightness, where 0 is black and 255 is white. The higher the peak at a given position, the more pixels have that brightness.
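  • A minimal sketch of the histogram computation described above, assuming an 8-bit grayscale image stored as a list of rows. The function name `gray_histogram` and the sample image are hypothetical and are not taken from the application.

```python
def gray_histogram(pixels, levels=256):
    """Count how many pixels fall at each brightness level (0 = black, levels - 1 = white)."""
    hist = [0] * levels
    for row in pixels:
        for value in row:
            hist[value] += 1
    return hist


# Hypothetical 2x3 grayscale image used only for illustration.
image = [
    [0, 0, 255],
    [128, 255, 255],
]
hist = gray_histogram(image)
print(hist[0], hist[128], hist[255])
```

Each entry of `hist` plays the role of one feature value: index is the brightness, value is the pixel count.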
  • Sub-step 1022 performing contrast normalization processing on the histogram according to the plurality of feature values, to obtain a contrast normalization processing result
  • the adjusted plurality of feature values may be transformed by using a cumulative distribution function to obtain a plurality of transformed feature values.
  • the cumulative distribution function is the integral of the probability density function and can fully describe the probability distribution of a real random variable X.
  • the obtained plurality of transformed feature values may be used as a mapping table: the transformed feature values are respectively mapped to the plurality of pixel points of the image and used as the mapped pixel values of those pixel points, thereby replacing the original pixel values.
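  • A minimal sketch of the contrast normalization described above, which closely resembles histogram equalization. It assumes an 8-bit grayscale image, takes 255 as the "specific value" the scaled counts sum to, and treats the cumulative sum as inclusive of the current level; all three are assumptions, as is the name `contrast_normalize`.

```python
def contrast_normalize(pixels, levels=256, target_sum=255):
    """Remap pixel values through the cumulative distribution of the scaled histogram."""
    # Histogram: number of pixels at each gray level.
    hist = [0] * levels
    for row in pixels:
        for value in row:
            hist[value] += 1

    # Scale the counts proportionally so that they sum to target_sum (assumed 255).
    total = sum(hist)
    scaled = [count * target_sum / total for count in hist]

    # Cumulative distribution: entry j is the running sum of scaled[0..j].
    lookup = []
    running = 0.0
    for value in scaled:
        running += value
        lookup.append(running)

    # Use the cumulative values as a mapping table that replaces each original pixel value.
    return [[int(round(lookup[value])) for value in row] for row in pixels]


# Hypothetical low-contrast image: all values are squeezed into 100..102.
image = [
    [100, 100],
    [101, 102],
]
out = contrast_normalize(image)
print(out)
```

After the mapping, the narrow 100..102 range is stretched across most of the 0..255 scale while the ordering of brightness values is preserved.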
  • Sub-step 1023 performing binarization processing on the contrast normalization processing result to obtain a first text region of the image.
  • the mapped pixel values of the plurality of pixel points may be traversed first to determine whether each mapped pixel value is greater than the first preset threshold. If so, the pixel point may be marked as a first background area pixel point; if not, the pixel point may be marked as a first text area pixel point. The circumscribed rectangle of smallest area that contains all the first text area pixel points is then extracted from the image, and this circumscribed rectangle is the first text area of the image.
  • the first preset threshold may be calculated by an Otsu algorithm (OTSU algorithm).
  • the Otsu algorithm is an efficient algorithm for binarizing images. Using the idea of clustering, it divides the gray values of the image into two classes by gray level so that the difference in gray value between the two classes is as large as possible and the difference within each class is as small as possible; the dividing gray level is found by computing the variance. Therefore, the Otsu algorithm can be used to select the binarization threshold automatically.
  • the Otsu algorithm is widely regarded as a strong choice for threshold selection in image segmentation. It is simple to compute and is largely unaffected by overall image brightness and contrast.
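  • A minimal sketch of threshold selection with the Otsu algorithm followed by binarization, assuming an 8-bit grayscale image and dark text on a light background. It maximizes the between-class variance, which is the standard formulation of the method; the names `otsu_threshold` and `binarize` are hypothetical.

```python
def otsu_threshold(hist):
    """Return the gray level that maximizes the between-class variance."""
    total = sum(hist)
    weighted_total = sum(level * count for level, count in enumerate(hist))
    best_threshold, best_variance = 0, -1.0
    count_low = 0  # number of pixels with value <= current level
    sum_low = 0    # sum of gray values of those pixels
    for level, count in enumerate(hist):
        count_low += count
        sum_low += level * count
        count_high = total - count_low
        if count_low == 0 or count_high == 0:
            continue  # one class is empty, so no split is possible here
        mean_low = sum_low / count_low
        mean_high = (weighted_total - sum_low) / count_high
        variance = count_low * count_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(pixels, threshold):
    """True marks a text pixel (value not greater than threshold); False marks background."""
    return [[value <= threshold for value in row] for row in pixels]


# Hypothetical image: dark strokes (10) on a light background (200).
image = [
    [10, 200, 200],
    [200, 10, 200],
    [200, 200, 10],
]
hist = [0] * 256
for row in image:
    for value in row:
        hist[value] += 1

threshold = otsu_threshold(hist)
text_mask = binarize(image, threshold)
```

Pixels whose value is greater than the threshold are treated as background, matching the marking rule in the text.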
  • Step 103 Extract a second text area from the first text area according to a preset rule.
  • the first text area may be further filtered to further exclude the background area, and a second text area including text information is obtained.
  • the method may further include the following:
  • the first text area is binarized.
  • the first text region may be subjected to a second binarization process according to the method in step 102, thereby marking the second text region pixel points, and the second text region is then extracted based on those pixel points.
  • the step of extracting the second text area from the first text area according to the preset rule may specifically include the following sub-steps:
  • Sub-step 1031 determining a plurality of connected areas in the first text area
  • Sub-step 1032 respectively, determining whether the plurality of connected areas meet the preset rule, and if yes, extracting the corresponding multiple connected areas as the second text area.
  • based on the second text region pixel points marked by the second binarization process, a connected graph algorithm may be used to determine the plurality of connected regions in the first text region.
  • the following pseudo code is implemented in the present application.
  • each connected area can be separately judged, and the connected area that does not satisfy the preset rule is deleted, thereby obtaining a second text area.
  • the connected area that does not satisfy the preset rule may include a connected area with a small area, and a connected area with a large distance from the largest connected area.
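  • A minimal sketch of determining connected areas from the marked text pixels. The pseudo code of the application is not reproduced in this text, so this is a generic breadth-first labeling, assuming 8-neighbor adjacency and axis-aligned bounding rectangles; the name `connected_regions` is hypothetical.

```python
from collections import deque


def connected_regions(mask):
    """Return bounding boxes (top, left, bottom, right) of 8-connected groups of text pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Start a new region and grow it through adjacent text pixels.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Hypothetical mask: 1 marks a second text area pixel point, 0 marks background.
mask = [
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
boxes = connected_regions(mask)
print(boxes)
```

Each returned box is the smallest axis-aligned rectangle around one group of mutually adjacent text pixels.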
  • Step 104 Identify the second text area.
  • after the processing of steps 102 and 103 is completed on the image to be recognized, the purpose of removing noise has been substantially achieved, so that the second text area can be identified and the text information in the image to be recognized is obtained.
  • the contrast normalization process and the binarization process are performed on the image to be recognized, thereby extracting the first text region, and then obtaining the second text region on the basis of determining the connected region of the first text region.
  • the noise in the image to be recognized is effectively removed, and the recognition of the second text area is performed to realize the recognition of the image text, thereby avoiding the interference of the noise on the image text recognition, and the recognition accuracy is greatly improved.
  • Referring to FIG. 2, a flow chart of the steps of the second embodiment of the method for identifying the image text of the present application is shown, which may specifically include the following steps:
  • Step 201 Acquire an image to be identified.
  • the image to be identified may be various types of ID images, such as an ID card, a passport, and the like.
  • recognizing the text in each type of document image differs from Chinese character recognition in other natural scenes.
  • the characteristics of the text in the document are: 1) the text is printed; 2) the text uses a single font (or a small variety of fonts), for example, all in the Song typeface, or in the Song typeface and one other typeface; 3) the image background is simple. Therefore, image text recognition based on spatial normalization operations can be applied to document identification scenarios.
  • Step 202 Calculate a histogram of the image for the plurality of pixel points
  • Each image includes multiple pixels, and the computer can represent the entire image by indicating information such as the position, color, and brightness of these pixels. Therefore, in the embodiment of the present application, a histogram of the image may be calculated for the plurality of pixel points.
  • Step 203 Perform contrast normalization processing on the histogram according to the plurality of feature values, to obtain a contrast normalization processing result;
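  • the step of performing contrast normalization processing on the histogram according to the plurality of feature values to obtain a contrast normalization processing result may specifically include the following sub-steps:
  • Sub-step 2031 adjusting the plurality of feature values proportionally, so that the sum of the adjusted plurality of feature values is a specific value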
  • Sub-step 2032 transforming the adjusted plurality of feature values by using a cumulative distribution function to obtain a plurality of transformed feature values
  • Sub-step 2033 respectively mapping the transformed plurality of feature values to the plurality of pixel points to obtain mapped pixel values of the plurality of pixel points.
  • the adjusted plurality of feature values may be transformed by using a cumulative distribution function to obtain a plurality of transformed feature values.
  • the cumulative distribution function is the integral of the probability density function and can fully describe the probability distribution of a real random variable X. That is, the corresponding value after the transformation of the jth eigenvalue should be the sum of all the eigenvalues preceding it.
  • the obtained plurality of transformed feature values may be used as a mapping table: the transformed feature values are respectively mapped to the plurality of pixel points of the image and used as the mapped pixel values of those pixel points, replacing the original pixel values.
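  • The sub-steps of step 203 can be written compactly as follows. The symbols are assumptions introduced here for illustration: $h_j$ is the number of pixels at gray level $j$, $S$ is the specific value to which the adjusted feature values sum, $c_j$ is the transformed value of the $j$th feature value (taken here as a cumulative sum that includes level $j$), and $v(p)$ and $v'(p)$ are the original and mapped values of pixel point $p$.

```latex
\tilde{h}_j = S \cdot \frac{h_j}{\sum_{k} h_k}, \qquad
c_j = \sum_{i \le j} \tilde{h}_i, \qquad
v'(p) = c_{v(p)}
```

Under these assumptions the largest transformed value equals $S$, so the mapped pixel values span the range from near 0 up to $S$.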
  • Step 204 Perform binarization processing on the contrast normalization processing result to obtain a first text region of the image
  • the Otsu algorithm may be used to calculate a first preset threshold, and the first text region of the image is obtained by comparing the mapped pixel value of each pixel with a first preset threshold.
  • the step of performing binarization processing on the contrast normalization processing result to obtain the first text region of the image may specifically include the following sub-steps:
  • Sub-step 2041 respectively, determining whether a mapped pixel value of the plurality of pixel points in the image is greater than a first preset threshold value
  • Sub-step 2042 if yes, marking the pixel point as a first background area pixel point
  • Sub-step 2043 if no, marking the pixel point as a first text area pixel point
  • Sub-step 2044 extracting, from the image, the circumscribed rectangle of smallest area that contains all the first text region pixel points.
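  • the mapped pixel value of each pixel point may be compared with a first preset threshold. If the mapped pixel value is greater than the first preset threshold, the pixel may be marked as a first background area pixel point; otherwise, it may be marked as a first text area pixel point. The circumscribed rectangle of smallest area that contains all the first text area pixel points is then extracted.
  • the image within the rectangle is the result of the first stage of spatial normalization, i.e., the first text region.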
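  • A minimal sketch of sub-step 2044, assuming the circumscribed rectangle is axis-aligned and the text pixels are given as a boolean mask. The names `bounding_rectangle` and `crop` and the sample data are hypothetical.

```python
def bounding_rectangle(mask):
    """Smallest axis-aligned rectangle (top, left, bottom, right) covering every text pixel."""
    coords = [
        (r, c)
        for r, row in enumerate(mask)
        for c, is_text in enumerate(row)
        if is_text
    ]
    if not coords:
        return None  # no text pixel was marked
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


def crop(pixels, box):
    """Cut the rectangle (inclusive bounds) out of the image."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in pixels[top:bottom + 1]]


# Hypothetical 4x4 image whose pixel value encodes its position (row * 10 + column).
pixels = [[r * 10 + c for c in range(4)] for r in range(4)]
mask = [
    [False, False, False, False],
    [False, True, True, False],
    [False, False, True, False],
    [False, False, False, False],
]
box = bounding_rectangle(mask)
first_text_area = crop(pixels, box)
print(box, first_text_area)
```

Cropping to this rectangle discards the background margin around the text before the second binarization.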
  • Step 205 Perform binarization processing on the first text area.
  • the process of performing the binarization process on the first text area is the same as the step 204.
  • the step of performing the binarization process on the first text area may specifically include the following sub-steps:
  • Sub-step 2051 respectively, determining whether a mapped pixel value of the plurality of pixel points in the first text area is greater than a second preset threshold
  • Sub-step 2052 if yes, marking the pixel point as a second background area pixel point
  • Sub-step 2053 if no, marking the pixel point as a second text area pixel point.
  • the preset threshold needs to be recalculated; that is, the second preset threshold is calculated by the Otsu algorithm (OTSU algorithm), and the mapped pixel value of each pixel point is compared with the second preset threshold to mark the second background area pixel points and the second text area pixel points.
  • Step 206 Determine a plurality of connected areas in the first text area.
  • a plurality of connected regions in the first text region may be determined by using a connected graph algorithm based on the second text region pixel points marked by the second binarization process.
  • the step of determining a plurality of connected regions in the first text region may include the following substeps:
  • Sub-step 2061 traversing the second text area pixel point
  • Sub-step 2062 the current second text area pixel point is connected to the adjacent second text area pixel point to obtain a polygon with the second text area pixel point as a vertex;
  • Sub-step 2063 the circumscribed rectangle of smallest area that contains the polygon is determined as the connected area.
  • the image within the rectangle is a connected area.
  • Step 207 Determine whether the multiple connected areas meet the preset rule.
  • after all the connected areas are determined, it may be determined one by one whether each connected area satisfies a preset rule. If a connected area does not satisfy the preset rule, it may be deleted, and finally a second text region composed of the remaining connected regions that satisfy the preset rule is obtained.
  • the connected area that does not satisfy the preset rule may include a connected area that is too small in area, and a connected area that is far away from the largest connected area.
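  • A minimal sketch of the preset-rule check, assuming connected areas are given as bounding boxes and that distance is measured between box centers. The thresholds `min_area` and `max_distance` are hypothetical; the application does not state concrete values.

```python
def filter_regions(boxes, min_area=4, max_distance=50):
    """Keep boxes that are large enough and close enough to the largest box."""

    def area(box):
        top, left, bottom, right = box
        return (bottom - top + 1) * (right - left + 1)

    def center(box):
        top, left, bottom, right = box
        return (top + bottom) / 2, (left + right) / 2

    if not boxes:
        return []
    largest_y, largest_x = center(max(boxes, key=area))
    kept = []
    for box in boxes:
        y, x = center(box)
        distance = ((y - largest_y) ** 2 + (x - largest_x) ** 2) ** 0.5
        # Delete regions that are too small or too far from the largest region.
        if area(box) >= min_area and distance <= max_distance:
            kept.append(box)
    return kept


# Hypothetical boxes: two neighboring characters, one speck, one distant blob.
boxes = [
    (0, 0, 9, 29),
    (0, 32, 9, 40),
    (20, 3, 20, 3),
    (0, 200, 9, 210),
]
second_text_area = filter_regions(boxes)
print(second_text_area)
```

The speck is removed by the area rule and the distant blob by the distance rule, leaving the regions that make up the second text area.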
  • Step 208 extracting corresponding multiple connected areas as the second text area
  • Step 209 Identify the second text area by using a convolutional neural network CNN Chinese character recognition model.
  • the second text area may be identified by using a convolutional neural network CNN Chinese character recognition model.
  • the Convolutional Neural Network (CNN) is a feedforward neural network whose artificial neurons each respond to the units within a local region of the input, and it performs well on large-scale image processing.
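  • A minimal sketch of the local response that a convolutional layer computes, shown as a single "valid" 2D cross-correlation followed by a ReLU. It only illustrates the principle; it is not the CNN Chinese character recognition model of the application, and the name `conv2d_valid` and the sample kernel are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image; each output depends only on a local patch."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(max(total, 0))  # ReLU keeps only positive responses
        output.append(row)
    return output


# Hypothetical binary image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_kernel = [[-1, 1]]  # responds where brightness increases from left to right
feature_map = conv2d_valid(image, edge_kernel)
print(feature_map)
```

The only nonzero responses sit at the edge position, which shows how each output unit reacts to a small neighborhood rather than to the whole image.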
  • the training data may be spatially normalized by using the method described in the foregoing steps 201 to 208, and used for training the CNN Chinese character recognition model, thereby obtaining a convolutional neural network CNN Chinese character recognition model. Then, in the image text recognition task, an image to be recognized is given, and the trained CNN Chinese character recognition model is used for recognition.
  • the training data and the test data can be spatially unified as much as possible, so that visually similar characters, once spatially normalized, exhibit distinguishable features, which allows the CNN Chinese character recognition model to recognize visually similar characters more accurately.
  • Referring to FIG. 3, a structural block diagram of an embodiment of an apparatus for identifying an image text according to the present application is shown. Specifically, the following modules may be included:
  • An obtaining module 301 configured to acquire an image to be identified, where the image includes a plurality of pixel points;
  • a determining module 302 configured to determine, according to the plurality of pixel points, a first text area of the image
  • the extracting module 303 is configured to extract a second text area from the first text area according to a preset rule
  • the identification module 304 is configured to identify the second text area.
  • the determining module 302 may specifically include the following submodules:
  • a histogram calculation sub-module 3021 configured to calculate a histogram of the image for the plurality of pixel points, the histogram having a corresponding plurality of feature values
  • the contrast normalization processing sub-module 3022 may specifically include the following units:
  • the feature value adjustment unit 221 is configured to adjust the plurality of feature values proportionally, so that the sum of the adjusted plurality of feature values is a specific value;
  • the eigenvalue transformation unit 222 is configured to transform the adjusted plurality of eigenvalues by using a cumulative distribution function to obtain a plurality of transformed eigenvalues;
  • the feature value mapping unit 223 is configured to respectively map the transformed plurality of feature values to the plurality of pixel points to obtain mapped pixel values of the plurality of pixel points.
  • the first text area obtaining submodule 3023 may specifically include the following units:
  • the first preset threshold determining unit 231 is configured to determine, respectively, whether a mapped pixel value of the plurality of pixel points in the image is greater than a first preset threshold
  • a first background area pixel point marking unit 232 configured to mark the pixel point as a first background area pixel point when the mapped pixel value of the pixel point is greater than a first preset threshold
  • a first text area pixel point marking unit 233 configured to mark the pixel point as a first text area pixel point when the mapped pixel value of the pixel point is not greater than a first preset threshold
  • the first text area extracting unit 234 is configured to extract, from the image, the circumscribed rectangle of smallest area that contains all the first text area pixel points.
  • the binarization processing module 305 is configured to perform binarization processing on the first text region.
  • the binarization processing module 305 may specifically include the following submodules:
  • a second preset threshold determining sub-module 3051 configured to determine, respectively, whether a mapped pixel value of the plurality of pixel points in the first text region is greater than a second preset threshold
  • a second background area pixel point sub-module 3052 configured to mark the pixel point as a second background area pixel point when the mapped pixel value of the pixel point is greater than a second preset threshold
  • the second text area pixel point sub-module 3053 is configured to mark the pixel point as a second text area pixel point when the mapped pixel value of the pixel point is not greater than a second preset threshold.
  • the extraction module 303 may specifically include the following submodules:
  • a connected area determining submodule 3031 configured to determine a plurality of connected areas in the first text area
  • the preset rule determining sub-module 3032 is configured to determine whether the plurality of connected areas meet the preset rule respectively;
  • the second text area extraction sub-module 3033 is configured to extract a corresponding plurality of connected areas as the second text area when the plurality of connected areas satisfy the preset rule.
  • the connectivity area determining submodule 3031 may specifically include the following units:
  • a second text area pixel traversing unit 311, configured to traverse the second text area pixel point
  • a second text area pixel point connecting unit 312, configured to connect the current second text area pixel point with the adjacent second text area pixel point to obtain a polygon with the second text area pixel point as a vertex;
  • the connected area determining unit 313 is configured to determine the circumscribed rectangle of smallest area that contains the polygon as the connected area.
  • the identification module 304 may specifically include the following sub-modules:
  • the identification sub-module 3041 is configured to identify the second text area by using the convolutional neural network CNN Chinese character recognition model.
  • because the apparatus embodiment is substantially similar to the method embodiments, its description is relatively brief; for the relevant parts, refer to the description of the method embodiments.
  • embodiments of the present application can be provided as a method, apparatus, or computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, embodiments of the present application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cartridges, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • Computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.
  • Embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the present application. It will be understood that each flow and/or block of the flowchart illustrations and/or FIG.
  • These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Character Input (AREA)
  • Character Discrimination (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

An image text identification method and apparatus. The method comprises: acquiring an image to be identified, wherein the image comprises a plurality of pixel points (101); determining a first text region of the image according to the plurality of pixel points (102); extracting a second text region from the first text region according to a pre-set rule (103); and identifying the second text region. The present invention can effectively remove noise from an image to be identified, thereby greatly improving identification accuracy.

Description

Method and apparatus for identifying image text
The present application claims priority to Chinese Patent Application No. 201610179262.8, filed on March 25, 2016 and entitled "Method and apparatus for identifying image text", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of character recognition technology, and in particular to a method for identifying image text and an apparatus for identifying image text.
Background
The goal of pattern recognition research is to build, through computer simulation of the recognition mechanism of the human brain, machine systems that can take over classification and identification tasks from humans and thereby process information automatically. Chinese character recognition is an important application area of pattern recognition. The most typical example is ID card recognition, which automatically reads information such as the name, ID number, address, and gender.
Traditional Chinese character recognition methods preprocess the image (for example by grayscale conversion and noise reduction), extract traditional image features, and then train a Chinese character recognition model using a classifier such as a support vector machine (SVM) or a neural network. Because such methods usually extract image features based on manual experience, they have little resistance to noise, so the recognition rate for Chinese characters is low when noise interference is strong. In recent years, following the great success of convolutional neural networks (CNNs) in computer vision, CNNs have also been applied to Chinese character recognition, with much higher accuracy than traditional methods.
However, the biggest problem in Chinese character recognition is the diversity of Chinese characters, and in particular the large number of similar-looking characters, which traditional methods can hardly handle. Many Chinese characters become a different character when a radical is added, for example "可" and "何". If the input picture shows "可" but there is some small noise on its left side, existing CNN-based methods have difficulty deciding whether the picture shows "可" or "何", because a CNN is fairly sensitive to position information, especially when the input data is insufficient. Therefore, existing CNN-based Chinese character recognition methods still cannot solve the problem of recognizing similar-looking characters well.
Summary of the invention
In view of the above problems, embodiments of the present application are proposed to provide a method for identifying image text, and a corresponding apparatus for identifying image text, that overcome the above problems or at least partially solve them.
To solve the above problems, the present application discloses a method for identifying image text, including:
acquiring an image to be identified, the image including a plurality of pixel points;
determining a first text area of the image according to the plurality of pixel points;
extracting a second text area from the first text area according to a preset rule; and
identifying the second text area.
Optionally, the step of determining the first text area of the image according to the plurality of pixel points includes:
calculating a histogram of the image for the plurality of pixel points, the histogram having a corresponding plurality of feature values;
performing contrast normalization processing on the histogram according to the plurality of feature values to obtain a contrast normalization processing result; and
performing binarization processing on the contrast normalization processing result to obtain the first text area of the image.
Optionally, the step of performing contrast normalization processing on the histogram according to the plurality of feature values to obtain a contrast normalization processing result includes:
adjusting the plurality of feature values proportionally so that the sum of the adjusted feature values is a specific value;
transforming the adjusted feature values by using a cumulative distribution function to obtain a plurality of transformed feature values; and
mapping the transformed feature values to the plurality of pixel points respectively to obtain mapped pixel values of the plurality of pixel points.
Optionally, the step of performing binarization processing on the contrast normalization processing result to obtain the first text area of the image includes:
determining, for each of the plurality of pixel points in the image, whether its mapped pixel value is greater than a first preset threshold;
if yes, marking the pixel point as a first background area pixel point;
if not, marking the pixel point as a first text area pixel point; and
extracting, from the image, the circumscribed rectangle of smallest area that contains all the first text area pixel points.
Optionally, before the step of extracting the second text area from the first text area according to the preset rule, the method further includes:
performing binarization processing on the first text area.
Optionally, the step of performing binarization processing on the first text area includes:
determining, for each of the plurality of pixel points in the first text area, whether its mapped pixel value is greater than a second preset threshold;
if yes, marking the pixel point as a second background area pixel point; and
if not, marking the pixel point as a second text area pixel point.
Optionally, the step of extracting the second text area from the first text area according to the preset rule includes:
determining a plurality of connected areas in the first text area;
determining, for each of the plurality of connected areas, whether it satisfies the preset rule; and
if yes, extracting the corresponding connected areas as the second text area.
Optionally, the step of determining the plurality of connected areas in the first text area includes:
traversing the second text area pixel points;
connecting the current second text area pixel point with adjacent second text area pixel points to obtain a polygon whose vertices are second text area pixel points; and
determining the circumscribed rectangle of smallest area that contains the polygon as a connected area.
Optionally, the step of identifying the second text area includes:
identifying the second text area by using a convolutional neural network (CNN) Chinese character recognition model.
To solve the above problems, the present application further discloses an apparatus for identifying image text, including:
an acquiring module, configured to acquire an image to be identified, the image including a plurality of pixel points;
a determining module, configured to determine a first text area of the image according to the plurality of pixel points;
an extraction module, configured to extract a second text area from the first text area according to a preset rule; and
an identification module, configured to identify the second text area.
Optionally, the determining module includes:
a histogram calculation submodule, configured to calculate a histogram of the image for the plurality of pixel points, the histogram having a corresponding plurality of feature values;
a contrast normalization processing submodule, configured to perform contrast normalization processing on the histogram according to the plurality of feature values to obtain a contrast normalization processing result; and
a first text area obtaining submodule, configured to perform binarization processing on the contrast normalization processing result to obtain the first text area of the image.
Optionally, the contrast normalization processing submodule includes:
a feature value adjustment unit, configured to adjust the plurality of feature values proportionally so that the sum of the adjusted feature values is a specific value;
a feature value transformation unit, configured to transform the adjusted feature values by using a cumulative distribution function to obtain a plurality of transformed feature values; and
a feature value mapping unit, configured to map the transformed feature values to the plurality of pixel points respectively to obtain mapped pixel values of the plurality of pixel points.
Optionally, the first text area obtaining submodule includes:
a first preset threshold determining unit, configured to determine, for each of the plurality of pixel points in the image, whether its mapped pixel value is greater than a first preset threshold;
a first background area pixel point marking unit, configured to mark the pixel point as a first background area pixel point when the mapped pixel value of the pixel point is greater than the first preset threshold;
a first text area pixel point marking unit, configured to mark the pixel point as a first text area pixel point when the mapped pixel value of the pixel point is not greater than the first preset threshold; and
a first text area extraction unit, configured to extract, from the image, the circumscribed rectangle of smallest area that contains all the first text area pixel points.
Optionally, the apparatus further includes:
a binarization processing module, configured to perform binarization processing on the first text area.
Optionally, the binarization processing module includes:
a second preset threshold determining submodule, configured to determine, for each of the plurality of pixel points in the first text area, whether its mapped pixel value is greater than a second preset threshold;
a second background area pixel point marking submodule, configured to mark the pixel point as a second background area pixel point when the mapped pixel value of the pixel point is greater than the second preset threshold; and
a second text area pixel point marking submodule, configured to mark the pixel point as a second text area pixel point when the mapped pixel value of the pixel point is not greater than the second preset threshold.
Optionally, the extraction module includes:
a connected area determining submodule, configured to determine a plurality of connected areas in the first text area;
a preset rule determining submodule, configured to determine, for each of the plurality of connected areas, whether it satisfies the preset rule; and
a second text area extraction submodule, configured to extract the corresponding connected areas as the second text area when the connected areas satisfy the preset rule.
Optionally, the connected area determining submodule includes:
a second text area pixel point traversing unit, configured to traverse the second text area pixel points;
a second text area pixel point connecting unit, configured to connect the current second text area pixel point with adjacent second text area pixel points to obtain a polygon whose vertices are second text area pixel points; and
a connected area determining unit, configured to determine the circumscribed rectangle of smallest area that contains the polygon as a connected area.
Optionally, the identification module includes:
an identification submodule, configured to identify the second text area by using a convolutional neural network (CNN) Chinese character recognition model.
Compared with the background art, the embodiments of the present application have the following advantages:
In the embodiments of the present application, contrast normalization and binarization are applied to the image to be identified to extract a first text area, and a second text area is then obtained on the basis of the connected areas determined in the first text area. This effectively removes noise from the image to be identified. Identifying the image text by identifying the second text area avoids interference from noise and greatly improves identification accuracy.
Secondly, in the embodiments of the present application, for text recognition scenarios with a single font and a simple background, such as ID cards and passports, spatial normalization of the image to be identified makes the training data and the test data as spatially consistent as possible. Similar-looking characters then exhibit different features after spatial normalization, so the CNN Chinese character recognition model can identify them more accurately.
Description of the drawings
FIG. 1 is a flowchart of the steps of Embodiment 1 of a method for identifying image text according to the present application;
FIG. 2 is a flowchart of the steps of Embodiment 2 of a method for identifying image text according to the present application;
FIG. 3 is a structural block diagram of an embodiment of an apparatus for identifying image text according to the present application.
Detailed description
To make the above objects, features, and advantages of the present application more apparent and easier to understand, the present application is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to FIG. 1, a flowchart of the steps of Embodiment 1 of a method for identifying image text according to the present application is shown. The method may specifically include the following steps:
Step 101: acquiring an image to be identified;
In the embodiments of the present application, the image to be identified may be any of various kinds of document images, such as an ID card or a passport. The image includes a plurality of pixel points. If an image is divided into many small squares, each square is called a pixel point, and the grid formed by these pixel points is called a "raster". A computer represents the whole image by recording information such as the position, color, and brightness of these pixel points.
Recognizing the text in such documents generally differs from recognizing Chinese characters in other natural scenes. The text in a document has the following characteristics: 1) it is printed; 2) it uses a single font or only a few fonts, for example only the Song typeface, or only the Song and Kai typefaces; 3) the image background is simple.
Step 102: determining a first text area of the image according to the plurality of pixel points;
Generally, to identify the text in an image, some background areas may first be excluded according to the plurality of pixel points, so as to determine the first text area of the image.
In the embodiments of the present application, the first text area may be an area that contains text information and is determined by preliminary screening, which makes it possible to identify the text in that area in a more targeted way.
In a preferred embodiment of the present application, the step of determining the first text area of the image according to the plurality of pixel points may specifically include the following sub-steps:
Sub-step 1021: calculating a histogram of the image for the plurality of pixel points;
In a specific implementation, after the image to be identified is obtained, a histogram of the image may first be calculated from the plurality of pixel points in the image. A histogram is a chart that describes the gray values of an image and summarizes the image data over a given range. By inspecting the histogram, one can judge, for example, the exposure of the image or whether the picture is soft. The histogram may have a corresponding plurality of feature values, that is, RGB values representing different brightness levels.
Generally, the horizontal axis of a histogram represents image brightness and the vertical axis represents the number of pixels. Brightness increases from left to right along the horizontal axis, from 0 to 255, where 0 is black and 255 is white. The higher the peak at a given position, the more pixels there are at that brightness.
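The histogram described above can be sketched in a few lines. This is a minimal illustration rather than the implementation used in the application: it assumes a grayscale image stored as a list of rows of integers in the range 0 to 255, and the name `gray_histogram` is hypothetical.

```python
def gray_histogram(pixels):
    """Count how many pixel points take each gray level from 0 to 255.

    `pixels` is assumed to be a list of rows, each row a list of integers
    in the range 0..255 (0 is black, 255 is white).
    """
    hist = [0] * 256
    for row in pixels:
        for value in row:
            hist[value] += 1
    return hist


# A tiny 2x3 example image: dark text pixels on a light background.
image = [
    [255, 0, 255],
    [255, 0, 200],
]
hist = gray_histogram(image)
print(hist[0], hist[200], hist[255])
```

Each entry `hist[k]` is the height of the histogram at brightness `k`, so a tall entry near 255 corresponds to a large, light background.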
Sub-step 1022: performing contrast normalization processing on the histogram according to the plurality of feature values to obtain a contrast normalization processing result;
In a specific implementation, contrast normalization of the histogram may begin by adjusting the feature values of the histogram proportionally so that the adjusted feature values sum to 255. For example, if the feature values of the histogram sum to 765, each feature value may be multiplied by 1/3 so that the adjusted feature values sum to 255 (765*1/3=255). If the feature values of the histogram sum to less than 255, each feature value may be scaled up proportionally so that the adjusted sum meets the same requirement.
Then, the adjusted feature values may be transformed by using a cumulative distribution function to obtain a plurality of transformed feature values. The cumulative distribution function is the integral of the probability density function and completely describes the probability distribution of a real-valued random variable X.
Further, the transformed feature values may be used as a mapping table: they are mapped to the plurality of pixel points of the image, and each transformed feature value becomes the mapped pixel value of the corresponding pixel points, replacing their original pixel values.
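The three operations above (proportional scaling, cumulative transformation, and mapping back to the pixel points) can be sketched as follows. This is one possible reading of the text, not a confirmed implementation: it assumes the feature values are the per-brightness pixel counts of a grayscale image, and the name `contrast_normalize` and the rounding choice are assumptions.

```python
def contrast_normalize(pixels, target_sum=255):
    """Return a copy of `pixels` with contrast-normalized gray values.

    Assumption: the histogram feature values are pixel counts per gray level.
    """
    # Histogram of the input image.
    hist = [0] * 256
    for row in pixels:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    if total == 0:
        raise ValueError("image contains no pixels")

    # Scale the feature values proportionally so that they sum to target_sum.
    scaled = [count * target_sum / total for count in hist]

    # Cumulative transformation: running sum of the scaled values.
    lookup = []
    running = 0.0
    for value in scaled:
        running += value
        lookup.append(int(round(running)))

    # Use the transformed values as a mapping table for every pixel point.
    return [[lookup[value] for value in row] for row in pixels]


# A low-contrast image whose gray values all lie between 100 and 120.
image = [
    [100, 100],
    [110, 120],
]
normalized = contrast_normalize(image)
print(normalized)
```

Under these assumptions the procedure behaves like histogram equalization: the brightest input level is mapped to 255 and the order of gray levels is preserved, so a narrow range of input values is stretched over a wider output range.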
Sub-step 1023: performing binarization processing on the contrast normalization processing result to obtain the first text area of the image.
In the embodiments of the present application, the mapped pixel values of the plurality of pixel points may first be traversed, and each mapped pixel value is compared with a first preset threshold. If the value is greater than the threshold, the pixel point may be marked as a first background area pixel point; otherwise, it may be marked as a first text area pixel point. The circumscribed rectangle of smallest area that contains all the first text area pixel points is then extracted from the image, and this circumscribed rectangle is the first text area of the image.
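The thresholding and rectangle extraction just described can be sketched as below. The sketch assumes that the circumscribed rectangle is axis-aligned, which the text does not state explicitly; the function name `first_text_region` and the `(top, left, bottom, right)` tuple layout are hypothetical.

```python
def first_text_region(mapped, threshold):
    """Return (top, left, bottom, right) of the smallest axis-aligned
    rectangle containing every pixel point whose mapped value is not
    greater than `threshold`, or None if there is no such pixel point.
    """
    top = left = bottom = right = None
    for r, row in enumerate(mapped):
        for c, value in enumerate(row):
            if value > threshold:
                continue  # background area pixel point
            if top is None:
                top, left, bottom, right = r, c, r, c
            else:
                bottom = r  # rows are visited in increasing order
                left = min(left, c)
                right = max(right, c)
    if top is None:
        return None
    return top, left, bottom, right


mapped = [
    [255, 255, 255, 255],
    [255,  10, 255, 255],
    [255, 255,  20, 255],
    [255, 255, 255, 255],
]
region = first_text_region(mapped, threshold=128)
print(region)  # (1, 1, 2, 2)
```

Cropping the image to this rectangle discards the surrounding background before any further processing.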
In a specific implementation, the first preset threshold may be calculated by the Otsu algorithm (OTSU algorithm). The Otsu algorithm is an efficient algorithm for binarizing images. Following the idea of clustering, it divides the gray levels of the image into two parts such that the difference in gray value between the two parts is largest and the difference in gray value within each part is smallest, and it finds a suitable gray level for this division by calculating variances. The Otsu algorithm can therefore be used during binarization to select the threshold automatically. It is considered the best algorithm for threshold selection in image segmentation, is simple to compute, and is not affected by image brightness and contrast.
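As an illustration of how such a threshold can be selected, the following sketch implements the standard textbook form of Otsu's method on a 256-bin histogram. It is not taken from the application; the name `otsu_threshold` and the convention that values not greater than the threshold form the darker class are assumptions chosen to match the binarization rule above.

```python
def otsu_threshold(hist):
    """Return the gray level t that maximizes the between-class variance.

    `hist[k]` is the number of pixel points with gray level k. Pixel points
    with a value <= t form one class and those with a value > t the other.
    """
    total = sum(hist)
    weighted_total = sum(level * count for level, count in enumerate(hist))
    best_t, best_score = 0, -1.0
    count_low = 0  # number of pixel points with value <= t
    sum_low = 0    # sum of the gray values of those pixel points
    for t, count in enumerate(hist):
        count_low += count
        sum_low += t * count
        count_high = total - count_low
        if count_low == 0:
            continue  # the darker class is still empty
        if count_high == 0:
            break     # every pixel point is already in the darker class
        mean_low = sum_low / count_low
        mean_high = (weighted_total - sum_low) / count_high
        # Proportional to the between-class variance (counts instead of
        # probabilities), which is enough for finding the maximum.
        score = count_low * count_high * (mean_low - mean_high) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


# Ten dark pixel points at level 20 and ten light ones at level 200.
hist = [0] * 256
hist[20] = 10
hist[200] = 10
t = otsu_threshold(hist)
print(t)
```

Maximizing the between-class variance is equivalent to minimizing the within-class variance, which matches the description of making the two parts as different as possible while keeping each part uniform.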
Step 103: extracting a second text area from the first text area according to a preset rule;
In the embodiments of the present application, after the first text area is extracted from the image, the first text area may be screened further to exclude more of the background and obtain a second text area containing the text information.
In a preferred embodiment of the present application, before the step of extracting the second text area from the first text area according to the preset rule, the method may further include:
performing binarization processing on the first text area.
In a specific implementation, a second binarization may be applied to the first text area using the method of step 102, so as to mark the second text area pixel points; the second text area is then extracted on the basis of these pixel points.
In a preferred embodiment of the present application, the step of extracting the second text area from the first text area according to the preset rule may specifically include the following sub-steps:
Sub-step 1031: determining a plurality of connected areas in the first text area;
Sub-step 1032: determining, for each of the plurality of connected areas, whether it satisfies the preset rule, and if so, extracting the corresponding connected areas as the second text area.
In a specific implementation, a connected graph algorithm may be applied to the second text area pixel points marked by the second binarization to determine the plurality of connected areas in the first text area. Specifically, the following pseudocode is an example of the connected graph algorithm used in the embodiments of the present application:
[Figure PCTCN2017076548-appb-000001: pseudocode of the connected graph algorithm (image not reproduced)]
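The pseudocode in the figure is not available in this text. As a stand-in, the following sketch shows one common way to find connected areas: a breadth-first flood fill that returns the bounding rectangle of each group of adjacent text pixel points. It is not the algorithm from the figure; the 8-neighbour adjacency, the name `connected_regions`, and the `(top, left, bottom, right)` layout are assumptions.

```python
from collections import deque


def connected_regions(mask):
    """Return the bounding rectangle (top, left, bottom, right) of every
    group of mutually adjacent text pixel points in `mask`.

    `mask[r][c]` is truthy for a text pixel point. Two pixel points are
    treated as adjacent if they touch horizontally, vertically, or
    diagonally (an assumption).
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # Start a new region and grow it with a breadth-first search.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


mask = [
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
]
boxes = connected_regions(mask)
print(boxes)  # [(0, 0, 1, 1), (2, 4, 2, 4)]
```

The isolated pixel point in the bottom-right corner forms its own small region, which is the kind of candidate a later rule can discard as noise.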
Then, each connected area may be checked individually, and the connected areas that do not satisfy the preset rule are deleted, which yields the second text area.
In the embodiments of the present application, the connected areas that do not satisfy the preset rule may include connected areas whose area is too small and connected areas that are far from the largest connected area.
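A rule of this kind can be sketched as a simple filter over bounding rectangles. The text gives no concrete thresholds or distance measure, so `min_area`, `max_gap`, the Chebyshev distance between nearest pixel points, and all function names below are assumptions for illustration only.

```python
def box_area(box):
    """Area in pixel points of a (top, left, bottom, right) rectangle."""
    top, left, bottom, right = box
    return (bottom - top + 1) * (right - left + 1)


def box_gap(a, b):
    """Chebyshev distance between the nearest pixel points of two
    rectangles; 0 if the rectangles overlap."""
    dy = max(a[0] - b[2], b[0] - a[2], 0)
    dx = max(a[1] - b[3], b[1] - a[3], 0)
    return max(dy, dx)


def filter_regions(boxes, min_area=4, max_gap=10):
    """Keep rectangles that are large enough and close enough to the
    largest rectangle. Both limits are hypothetical values."""
    if not boxes:
        return []
    largest = max(boxes, key=box_area)
    return [
        box for box in boxes
        if box_area(box) >= min_area and box_gap(box, largest) <= max_gap
    ]


candidates = [
    (10, 10, 29, 29),  # main character body, area 400
    (12, 5, 13, 6),    # small stroke close to the body, area 4
    (0, 0, 0, 0),      # single-pixel speck, area 1
    (15, 60, 24, 69),  # large blob far to the right
]
kept = filter_regions(candidates)
print(kept)  # [(10, 10, 29, 29), (12, 5, 13, 6)]
```

In the "可" versus "何" situation from the background section, a filter like this is what would remove a small speck to the left of the character before the recognition model sees it, provided the thresholds suit the document type.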
Step 104: identifying the second text area.
Generally, once steps 102 and 103 have been applied to the image to be identified, most of the noise has been removed, so the second text area can be identified to obtain the text information in the image.
In the embodiments of the present application, contrast normalization and binarization are applied to the image to be identified to extract a first text area, and a second text area is then obtained on the basis of the connected areas determined in the first text area. This effectively removes noise from the image to be identified. Identifying the image text by identifying the second text area avoids interference from noise and greatly improves identification accuracy.
参照图2,示出了本申请的一种图像文本的识别方法实施例二的步骤流程图,具体可以包括如下步骤:Referring to FIG. 2, a flow chart of the steps of the second embodiment of the method for identifying the image text of the present application is shown, which may specifically include the following steps:
步骤201,获取待识别的图像;Step 201: Acquire an image to be identified.
在本申请实施例中,所述待识别的图像可以是各类证件图像,例如身份证、护照等。通常,各类证件图像中的文本都有别于其他自然场景的汉字识别。证件中的文本的特点是:1)文本都是印刷体;2)文本都是单一(或种类不多)的字体,例如都是宋体字,或都是宋体字或楷体字;3)图像背景简单。因此,基于空间归一化操作的图像文本识别可以应用于证件识别的场景中。In the embodiment of the present application, the image to be identified may be various types of ID images, such as an ID card, a passport, and the like. Usually, the text in each type of document image is different from the Chinese character recognition of other natural scenes. The characteristics of the text in the document are: 1) the text is printed; 2) the text is a single (or a small variety) font, for example, all are in the Song, or both in the Song or the Chinese characters; 3) the image background is simple. Therefore, image text recognition based on spatial normalization operations can be applied to scenes for document identification.
Step 202: for the plurality of pixels, calculate a histogram of the image;
Every image includes a plurality of pixels, and a computer can represent the whole image by representing information such as the position, color, and brightness of these pixels. Therefore, in this embodiment of the present application, a histogram of the image may be calculated for the plurality of pixels.
Step 203: perform contrast normalization on the histogram according to the plurality of feature values, to obtain a contrast normalization result;
In a preferred embodiment of the present application, the step of performing contrast normalization on the histogram according to the plurality of feature values to obtain a contrast normalization result may specifically include the following sub-steps:
Sub-step 2031: scale the plurality of feature values proportionally, so that the sum of the adjusted feature values is a specific value;
Sub-step 2032: transform the adjusted feature values using a cumulative distribution function, to obtain a plurality of transformed feature values;
Sub-step 2033: map the transformed feature values to the plurality of pixels, to obtain mapped pixel values of the plurality of pixels.
In a specific implementation, contrast normalization of the histogram may begin by scaling the feature values of the histogram proportionally, so that the sum of the adjusted feature values is 255. For example, if summation shows that the feature values of the histogram add up to 765, each feature value may be multiplied by 1/3, so that the adjusted feature values add up to 255 (765*1/3=255). If the feature values of the histogram add up to less than 255, each feature value may be scaled up proportionally so that the sum of the adjusted feature values meets the above requirement.
Then, the adjusted feature values may be transformed using a cumulative distribution function to obtain the transformed feature values. The cumulative distribution function is the integral of the probability density function and completely describes the probability distribution of a real-valued random variable X. That is, the transformed value corresponding to the j-th feature value should be the sum of all the feature values preceding it.
The transformed feature values may then be used as a mapping table: they are mapped to the pixels of the image, and each transformed feature value becomes the mapped pixel value of the corresponding pixels, replacing their original pixel values.
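The scaling, cumulative transform, and mapping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the image is assumed to be 8-bit grayscale stored as a list of rows, the histogram bin counts are taken to be the "feature values", the cumulative sum is assumed to include the current bin, and all function names are hypothetical.

```python
def build_histogram(pixels, levels=256):
    """Count how many pixels take each gray level (the 'feature values')."""
    hist = [0] * levels
    for row in pixels:
        for value in row:
            hist[value] += 1
    return hist


def scale_histogram(hist, target_sum=255.0):
    """Sub-step 2031: scale the bins proportionally so they sum to target_sum."""
    total = sum(hist)
    if total == 0:
        return [0.0] * len(hist)
    return [h * target_sum / total for h in hist]


def cumulative_lookup(scaled):
    """Sub-step 2032: running sum of the scaled bins, used as a mapping table.

    Assumption: the sum includes the current bin, as in a standard CDF.
    """
    lookup, running = [], 0.0
    for s in scaled:
        running += s
        lookup.append(int(round(running)))
    return lookup


def contrast_normalize(pixels, levels=256):
    """Sub-step 2033: replace each pixel value by its mapped pixel value."""
    lookup = cumulative_lookup(scale_histogram(build_histogram(pixels, levels)))
    return [[lookup[value] for value in row] for row in pixels]
```

With this sketch, a histogram whose bins sum to 765 is scaled by 1/3, matching the worked example in the text, and the brightest gray level present in the image is mapped to 255.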
Step 204: binarize the contrast normalization result, to obtain a first text region of the image;
In a specific implementation, the Otsu algorithm (OTSU algorithm) may be used to calculate a first preset threshold, and the first text region of the image is obtained by comparing the mapped pixel value of each pixel with the first preset threshold.
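For reference, the Otsu algorithm chooses the threshold that maximizes the between-class variance of the two pixel classes. The sketch below is a plain-Python illustration that assumes integer gray values in the range 0 to 255; the function name and the flat-list input format are hypothetical choices, not part of the embodiment.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold t that maximizes between-class variance.

    Pixels with value <= t form one class and pixels with value > t the other.
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of the values of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t
```

Because the comparison in the text is "greater than the threshold means background", a threshold returned by this sketch separates dark text (values at or below it) from a lighter background.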
In a preferred embodiment of the present application, the step of binarizing the contrast normalization result to obtain the first text region of the image may specifically include the following sub-steps:
Sub-step 2041: determine, for each of the plurality of pixels in the image, whether its mapped pixel value is greater than the first preset threshold;
Sub-step 2042: if so, mark the pixel as a first background region pixel;
Sub-step 2043: if not, mark the pixel as a first text region pixel;
Sub-step 2044: extract from the image the bounding rectangle of minimum area that contains all the first text region pixels.
In a specific implementation, the mapped pixel value of each pixel may be compared with the first preset threshold. If the mapped pixel value is greater than the first preset threshold, the pixel may be marked as a first background region pixel, for example dst(x,y)=1. If the mapped pixel value is not greater than the first preset threshold, the pixel may be marked as a first text region pixel, for example dst(x,y)=0.
Then, the rectangle of minimum area that contains all pixels with dst(x,y)=0 is found in the image. The image inside this rectangle is the result of first-order spatial normalization, that is, the first text region.
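The marking and rectangle extraction described above can be sketched as follows. The sketch assumes that the minimum-area rectangle is axis-aligned, which the text does not state explicitly, and it represents a rectangle as inclusive (left, top, right, bottom) coordinates; the function names are hypothetical.

```python
def binarize(pixels, threshold):
    """Mark background as 1 (value > threshold) and text as 0, as dst(x, y)."""
    return [[1 if value > threshold else 0 for value in row] for row in pixels]


def text_bounding_box(dst):
    """Smallest axis-aligned rectangle containing every pixel marked 0.

    Returns inclusive (left, top, right, bottom), or None if no text pixel exists.
    """
    xs, ys = [], []
    for y, row in enumerate(dst):
        for x, label in enumerate(row):
            if label == 0:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def crop(pixels, box):
    """Cut the rectangle out of the image; the result is the first text region."""
    left, top, right, bottom = box
    return [row[left:right + 1] for row in pixels[top:bottom + 1]]
```

For example, cropping the normalized image with the box returned for its binarized version yields a region whose border touches text pixels on all four sides, which is what makes the result independent of where the text sat in the original image.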
Step 205: binarize the first text region;
In this embodiment of the present application, the first text region is binarized in the same way as in step 204; that is, the step of binarizing the first text region may specifically include the following sub-steps:
Sub-step 2051: determine, for each of the plurality of pixels in the first text region, whether its mapped pixel value is greater than a second preset threshold;
Sub-step 2052: if so, mark the pixel as a second background region pixel;
Sub-step 2053: if not, mark the pixel as a second text region pixel.
It should be noted that the preset threshold must be recalculated for this second binarization of the first text region. That is, a second preset threshold is calculated with the Otsu algorithm (OTSU algorithm), and the mapped pixel value of each pixel is compared with the second preset threshold to mark the second background region pixels and the second text region pixels. For example, if the mapped pixel value is greater than the second preset threshold, the pixel may be marked as a second background region pixel, dst(x,y)=1; if the mapped pixel value is not greater than the second preset threshold, the pixel may be marked as a second text region pixel, dst(x,y)=0.
Step 206: determine a plurality of connected regions in the first text region;
In this embodiment of the present application, a connected-graph algorithm may be used to determine the plurality of connected regions in the first text region, based on the second text region pixels marked by the second binarization.
In a preferred embodiment of the present application, the step of determining the plurality of connected regions in the first text region may specifically include the following sub-steps:
Sub-step 2061: traverse the second text region pixels;
Sub-step 2062: connect the current second text region pixel with its adjacent second text region pixels, to obtain a polygon whose vertices are second text region pixels;
Sub-step 2063: determine the bounding rectangle of minimum area that contains the polygon as a connected region.
In a specific implementation, the second text region pixels, that is, the pixels marked dst(x,y)=0 during the binarization of step 205, may be traversed. The current second text region pixel is connected with its adjacent second text region pixels to obtain a polygon whose vertices are all second text region pixels. Then, the rectangle of minimum area that contains the polygon is found in the first text region. The image inside this rectangle is one connected region.
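One common way to realize this traversal is connected-component labeling with a breadth-first search, sketched below. The sketch treats the eight surrounding pixels as "adjacent" and uses axis-aligned bounding rectangles; both are assumptions, since the text specifies neither, and the function name is hypothetical.

```python
from collections import deque


def connected_regions(dst):
    """Group pixels marked 0 into 8-connected components.

    Returns one inclusive (left, top, right, bottom) rectangle per component,
    in the order in which the components are first met in a row-by-row scan.
    """
    height = len(dst)
    width = len(dst[0]) if dst else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if dst[y0][x0] != 0 or seen[y0][x0]:
                continue
            # Start a new component and grow it through adjacent text pixels.
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            left, top, right, bottom = x0, y0, x0, y0
            while queue:
                x, y = queue.popleft()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and not seen[ny][nx] and dst[ny][nx] == 0):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes
```

Each returned rectangle plays the role of one connected region in the description above.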
Step 207: determine, for each of the plurality of connected regions, whether it satisfies a preset rule;
In this embodiment of the present application, after all the connected regions have been determined, each connected region may be checked against the preset rule in turn. A connected region that does not satisfy the preset rule may be deleted, so that the second text region is finally formed from the remaining connected regions that satisfy the preset rule.
In a specific implementation, the connected regions that do not satisfy the preset rule may include connected regions whose area is too small and connected regions that are far from the largest connected region. Examples are connected regions with an area smaller than 2*2 pixels, and connected regions whose distance from the largest connected region is greater than 0.06.
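A filter of this kind might look like the sketch below. The text gives the value 0.06 without stating a unit or how the distance is measured, so the sketch makes two loudly labeled assumptions: the distance is the gap between the edges of two bounding rectangles, and it is expressed as a fraction of the diagonal of the first text region. Rectangles are inclusive (left, top, right, bottom) tuples, and all names are hypothetical.

```python
import math


def box_area(box):
    """Area in pixels of an inclusive (left, top, right, bottom) rectangle."""
    left, top, right, bottom = box
    return (right - left + 1) * (bottom - top + 1)


def box_gap(a, b):
    """Edge-to-edge distance between two rectangles (0 if they overlap)."""
    dx = max(b[0] - a[2], a[0] - b[2], 0)
    dy = max(b[1] - a[3], a[1] - b[3], 0)
    return math.hypot(dx, dy)


def filter_regions(boxes, width, height, min_area=4, max_rel_distance=0.06):
    """Keep regions that are large enough and close to the largest region.

    ASSUMPTION: distance is normalized by the diagonal of the first text
    region (width x height); the source does not specify this.
    """
    if not boxes:
        return []
    largest = max(boxes, key=box_area)
    diagonal = math.hypot(width, height)
    kept = []
    for box in boxes:
        if box_area(box) < min_area:
            continue  # too small, e.g. below 2*2 pixels
        if box_gap(box, largest) / diagonal > max_rel_distance:
            continue  # too far from the largest connected region
        kept.append(box)
    return kept
```

The surviving rectangles together would form the second text region; both thresholds are parameters so that a different interpretation of the rule can be substituted.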
Step 208: extract the corresponding connected regions as the second text region;
Step 209: recognize the second text region using a convolutional neural network (CNN) Chinese character recognition model.
In this embodiment of the present application, after the second text region image is obtained, the second text region may be recognized using a CNN Chinese character recognition model. A convolutional neural network (CNN) is a feedforward neural network whose artificial neurons respond to surrounding units within a local coverage area, and it performs very well on large-scale image processing.
In a specific implementation, the method of steps 201 to 208 above may be used to spatially normalize the training data, which is then used to train the CNN Chinese character recognition model. In an image text recognition task, given an image to be recognized, the trained CNN Chinese character recognition model is then used for recognition.
In this embodiment of the present application, for text recognition scenarios with a single typeface and a simple background, such as ID cards and passports, spatially normalizing the image to be recognized makes the training data and the test data as spatially uniform as possible. Visually similar characters then exhibit distinct features after spatial normalization, so the CNN Chinese character recognition model can recognize them more accurately.
It should be noted that, for simplicity of description, the method embodiments are expressed as a series of combined actions. Those skilled in the art should understand, however, that the embodiments of the present application are not limited by the described order of actions, because according to the embodiments of the present application some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present application.
Referring to FIG. 3, a structural block diagram of an embodiment of an image text recognition apparatus of the present application is shown, which may specifically include the following modules:
an acquisition module 301, configured to acquire an image to be recognized, the image including a plurality of pixels;
a determination module 302, configured to determine a first text region of the image according to the plurality of pixels;
an extraction module 303, configured to extract a second text region from the first text region according to a preset rule;
a recognition module 304, configured to recognize the second text region.
In this embodiment of the present application, the determination module 302 may specifically include the following sub-modules:
a histogram calculation sub-module 3021, configured to calculate a histogram of the image for the plurality of pixels, the histogram having a corresponding plurality of feature values;
a contrast normalization sub-module 3022, configured to perform contrast normalization on the histogram according to the plurality of feature values, to obtain a contrast normalization result;
a first text region obtaining sub-module 3023, configured to binarize the contrast normalization result, to obtain the first text region of the image.
In this embodiment of the present application, the contrast normalization sub-module 3022 may specifically include the following units:
a feature value adjustment unit 221, configured to scale the plurality of feature values proportionally, so that the sum of the adjusted feature values is a specific value;
a feature value transformation unit 222, configured to transform the adjusted feature values using a cumulative distribution function, to obtain a plurality of transformed feature values;
a feature value mapping unit 223, configured to map the transformed feature values to the plurality of pixels, to obtain mapped pixel values of the plurality of pixels.
In this embodiment of the present application, the first text region obtaining sub-module 3023 may specifically include the following units:
a first preset threshold determination unit 231, configured to determine, for each of the plurality of pixels in the image, whether its mapped pixel value is greater than a first preset threshold;
a first background region pixel marking unit 232, configured to mark the pixel as a first background region pixel when the mapped pixel value of the pixel is greater than the first preset threshold;
a first text region pixel marking unit 233, configured to mark the pixel as a first text region pixel when the mapped pixel value of the pixel is not greater than the first preset threshold;
a first text region extraction unit 234, configured to extract from the image the bounding rectangle of minimum area that contains all the first text region pixels.
In this embodiment of the present application, the apparatus may further include the following module:
a binarization module 305, configured to binarize the first text region.
In this embodiment of the present application, the binarization module 305 may specifically include the following sub-modules:
a second preset threshold determination sub-module 3051, configured to determine, for each of the plurality of pixels in the first text region, whether its mapped pixel value is greater than a second preset threshold;
a second background region pixel marking sub-module 3052, configured to mark the pixel as a second background region pixel when the mapped pixel value of the pixel is greater than the second preset threshold;
a second text region pixel marking sub-module 3053, configured to mark the pixel as a second text region pixel when the mapped pixel value of the pixel is not greater than the second preset threshold.
In this embodiment of the present application, the extraction module 303 may specifically include the following sub-modules:
a connected region determination sub-module 3031, configured to determine a plurality of connected regions in the first text region;
a preset rule determination sub-module 3032, configured to determine, for each of the plurality of connected regions, whether it satisfies the preset rule;
a second text region extraction sub-module 3033, configured to extract the corresponding connected regions as the second text region when the connected regions satisfy the preset rule.
In this embodiment of the present application, the connected region determination sub-module 3031 may specifically include the following units:
a second text region pixel traversal unit 311, configured to traverse the second text region pixels;
a second text region pixel connection unit 312, configured to connect the current second text region pixel with its adjacent second text region pixels, to obtain a polygon whose vertices are second text region pixels;
a connected region determination unit 313, configured to determine the bounding rectangle of minimum area that contains the polygon as a connected region.
In this embodiment of the present application, the recognition module 304 may specifically include the following sub-module:
a recognition sub-module 3041, configured to recognize the second text region using a convolutional neural network (CNN) Chinese character recognition model.
Because the apparatus embodiment is substantially similar to the method embodiments, it is described relatively briefly; for the relevant details, refer to the corresponding description of the method embodiments.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
In a typical configuration, the computer device includes one or more processors (CPUs), an input/output interface, a network interface, and memory. The memory may include non-permanent storage in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, which may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
The embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal device, so that a series of operational steps are performed on the computer or other programmable terminal device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present application have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present application.
Finally, it should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device that includes the element.
The image text recognition method and the image text recognition apparatus provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. At the same time, a person of ordinary skill in the art may, following the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (18)

  1. A method for recognizing image text, comprising:
    acquiring an image to be recognized, the image comprising a plurality of pixels;
    determining a first text region of the image according to the plurality of pixels;
    extracting a second text region from the first text region according to a preset rule;
    recognizing the second text region.
  2. The method according to claim 1, wherein the step of determining the first text region of the image according to the plurality of pixels comprises:
    calculating a histogram of the image for the plurality of pixels, the histogram having a corresponding plurality of feature values;
    performing contrast normalization on the histogram according to the plurality of feature values, to obtain a contrast normalization result;
    binarizing the contrast normalization result, to obtain the first text region of the image.
  3. The method according to claim 2, wherein the step of performing contrast normalization on the histogram according to the plurality of feature values to obtain a contrast normalization result comprises:
    scaling the plurality of feature values proportionally, so that the sum of the adjusted feature values is a specific value;
    transforming the adjusted feature values using a cumulative distribution function, to obtain a plurality of transformed feature values;
    mapping the transformed feature values to the plurality of pixels, to obtain mapped pixel values of the plurality of pixels.
  4. The method according to claim 3, wherein the step of binarizing the contrast normalization result to obtain the first text region of the image comprises:
    determining, for each of the plurality of pixels in the image, whether its mapped pixel value is greater than a first preset threshold;
    if so, marking the pixel as a first background region pixel;
    if not, marking the pixel as a first text region pixel;
    extracting from the image the bounding rectangle of minimum area that contains all the first text region pixels.
  5. The method according to any one of claims 1 to 4, further comprising, before the step of extracting the second text region from the first text region according to the preset rule:
    binarizing the first text region.
  6. The method according to claim 5, wherein the step of binarizing the first text region comprises:
    determining, for each of the plurality of pixels in the first text region, whether its mapped pixel value is greater than a second preset threshold;
    if so, marking the pixel as a second background region pixel;
    if not, marking the pixel as a second text region pixel.
  7. The method according to claim 6, wherein the step of extracting the second text area from the first text area according to the preset rule comprises:
    determining a plurality of connected regions in the first text area;
    determining, for each of the plurality of connected regions, whether it satisfies the preset rule; and
    if so, extracting the corresponding connected regions as the second text area.
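The filtering step of claim 7 amounts to keeping only the connected regions that pass a predicate. The claims in this section do not say what the preset rule is, so `example_rule` below, with its `min_side` and `max_aspect` parameters, is purely a hypothetical stand-in; only the overall keep-if-rule-holds shape comes from the claim.

```python
def extract_second_text_area(regions, rule):
    """Keep the connected regions (inclusive boxes) for which `rule` holds."""
    return [box for box in regions if rule(box)]


def example_rule(box, min_side=8, max_aspect=4.0):
    """Hypothetical preset rule: reject tiny or very elongated boxes.

    The thresholds are illustrative assumptions, not values from the patent.
    """
    left, top, right, bottom = box
    width = right - left + 1
    height = bottom - top + 1
    if width < min_side or height < min_side:
        return False
    return max(width, height) / min(width, height) <= max_aspect
```

Passing the rule as a function keeps the extraction step independent of whichever criteria an implementation actually adopts.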
  8. The method according to claim 7, wherein the step of determining the plurality of connected regions in the first text area comprises:
    traversing the second text area pixels;
    connecting the current second text area pixel to adjacent second text area pixels to obtain a polygon whose vertices are second text area pixels; and
    determining the smallest-area circumscribed rectangle containing the polygon as a connected region.
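One way to approximate claim 8 is standard connected-component labeling: walk outward from each unvisited text pixel through its neighbours and record the bounding box of everything reached. This swaps in a breadth-first flood fill for the explicit polygon construction the claim describes; for an axis-aligned rectangle the two give the same box, since the polygon's vertices are the text pixels themselves. The name `connected_regions`, the boolean `mask` layout, and the choice of 8-neighbour adjacency are assumptions.

```python
from collections import deque


def connected_regions(mask):
    """Return one inclusive box (left, top, right, bottom) per group of
    mutually adjacent text pixels. `mask[y][x]` is truthy for text pixels."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first walk over text pixels reachable from (x0, y0).
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            left = right = x0
            top = bottom = y0
            while queue:
                x, y = queue.popleft()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                # Visit the 8 neighbours (assumed adjacency).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes
```

Boxes are returned in the order their first pixel is met in a row-by-row scan, so the result can be passed directly to a rule-based filter.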
  9. The method according to claim 1, 2, 3, 4, 6, 7, or 8, wherein the step of identifying the second text area comprises:
    identifying the second text area using a convolutional neural network (CNN) Chinese character recognition model.
  10. An apparatus for identifying image text, comprising:
    an acquisition module configured to acquire an image to be identified, the image comprising a plurality of pixels;
    a determining module configured to determine a first text area of the image according to the plurality of pixels;
    an extraction module configured to extract a second text area from the first text area according to a preset rule; and
    an identification module configured to identify the second text area.
  11. The apparatus according to claim 10, wherein the determining module comprises:
    a histogram calculation submodule configured to calculate a histogram of the image for the plurality of pixels, the histogram having a corresponding plurality of feature values;
    a contrast normalization submodule configured to perform contrast normalization on the histogram according to the plurality of feature values to obtain a contrast normalization result; and
    a first text area obtaining submodule configured to binarize the contrast normalization result to obtain the first text area of the image.
  12. The apparatus according to claim 11, wherein the contrast normalization submodule comprises:
    a feature value adjustment unit configured to adjust the plurality of feature values proportionally so that the sum of the adjusted feature values equals a specific value;
    a feature value transformation unit configured to transform the adjusted feature values using a cumulative distribution function to obtain a plurality of transformed feature values; and
    a feature value mapping unit configured to map the transformed feature values to the plurality of pixels, respectively, to obtain mapped pixel values of the plurality of pixels.
  13. The apparatus according to claim 12, wherein the first text area obtaining submodule comprises:
    a first preset threshold determining unit configured to determine, for each of the plurality of pixels in the image, whether its mapped pixel value is greater than a first preset threshold;
    a first background area pixel marking unit configured to mark the pixel as a first background area pixel when its mapped pixel value is greater than the first preset threshold;
    a first text area pixel marking unit configured to mark the pixel as a first text area pixel when its mapped pixel value is not greater than the first preset threshold; and
    a first text area extraction unit configured to extract from the image the smallest-area circumscribed rectangle that contains all of the first text area pixels.
  14. The apparatus according to any one of claims 10 to 13, further comprising:
    a binarization module configured to binarize the first text area.
  15. The apparatus according to claim 14, wherein the binarization module comprises:
    a second preset threshold determining submodule configured to determine, for each of a plurality of pixels in the first text area, whether its mapped pixel value is greater than a second preset threshold;
    a second background area pixel marking submodule configured to mark the pixel as a second background area pixel when its mapped pixel value is greater than the second preset threshold; and
    a second text area pixel marking submodule configured to mark the pixel as a second text area pixel when its mapped pixel value is not greater than the second preset threshold.
  16. The apparatus according to claim 15, wherein the extraction module comprises:
    a connected region determining submodule configured to determine a plurality of connected regions in the first text area;
    a preset rule determining submodule configured to determine, for each of the plurality of connected regions, whether it satisfies the preset rule; and
    a second text area extraction submodule configured to extract, when connected regions satisfy the preset rule, the corresponding connected regions as the second text area.
  17. The apparatus according to claim 16, wherein the connected region determining submodule comprises:
    a second text area pixel traversal unit configured to traverse the second text area pixels;
    a second text area pixel connecting unit configured to connect the current second text area pixel to adjacent second text area pixels to obtain a polygon whose vertices are second text area pixels; and
    a connected region determining unit configured to determine the smallest-area circumscribed rectangle containing the polygon as a connected region.
  18. The apparatus according to claim 10, 11, 12, 13, 15, 16, or 17, wherein the identification module comprises:
    an identification submodule configured to identify the second text area using a convolutional neural network (CNN) Chinese character recognition model.
PCT/CN2017/076548 2016-03-25 2017-03-14 Image text identification method and apparatus WO2017162069A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610179262.8 2016-03-25
CN201610179262.8A CN107229932B (en) 2016-03-25 2016-03-25 Image text recognition method and device

Publications (1)

Publication Number Publication Date
WO2017162069A1 (en) 2017-09-28

Family

ID=59899251

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/076548 WO2017162069A1 (en) 2016-03-25 2017-03-14 Image text identification method and apparatus

Country Status (3)

Country Link
CN (1) CN107229932B (en)
TW (1) TWI774659B (en)
WO (1) WO2017162069A1 (en)

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109874313A (en) * 2017-10-13 2019-06-11 众安信息技术服务有限公司 Text line detection method and line of text detection device
CN110569835A (en) * 2018-06-06 2019-12-13 北京搜狗科技发展有限公司 Image identification method and device and electronic equipment
CN110619325A (en) * 2018-06-20 2019-12-27 北京搜狗科技发展有限公司 Text recognition method and device
CN111161185A (en) * 2019-12-30 2020-05-15 深圳蓝韵医学影像有限公司 Method and system for continuously adjusting X-ray image
CN111178362A (en) * 2019-12-16 2020-05-19 平安国际智慧城市科技股份有限公司 Text image processing method, device, equipment and storage medium
CN111192149A (en) * 2019-11-25 2020-05-22 泰康保险集团股份有限公司 Method and device for generating underwriting result data
CN111275051A (en) * 2020-02-28 2020-06-12 上海眼控科技股份有限公司 Character recognition method, character recognition device, computer equipment and computer-readable storage medium
CN111368822A (en) * 2020-03-20 2020-07-03 上海中通吉网络技术有限公司 Method, device, device, and storage medium for cutting express area in an image
CN111368837A (en) * 2018-12-25 2020-07-03 中移(杭州)信息技术有限公司 Image quality evaluation method and device, electronic equipment and storage medium
CN111553336A (en) * 2020-04-27 2020-08-18 西安电子科技大学 A system and method for image recognition of printed Uyghur documents based on conjoined segments
CN111723627A (en) * 2019-03-22 2020-09-29 北京搜狗科技发展有限公司 An image processing method, device and electronic device
CN111814508A (en) * 2019-04-10 2020-10-23 阿里巴巴集团控股有限公司 Method, system and device for character recognition
CN112634382A (en) * 2020-11-27 2021-04-09 国家电网有限公司大数据中心 Image recognition and replacement method and device for unnatural object
CN112784835A (en) * 2021-01-21 2021-05-11 恒安嘉新(北京)科技股份公司 Method and device for identifying authenticity of circular seal, electronic equipment and storage medium
CN113011409A (en) * 2021-04-02 2021-06-22 北京世纪好未来教育科技有限公司 Image identification method and device, electronic equipment and storage medium
CN113688811A (en) * 2021-10-26 2021-11-23 北京美摄网络科技有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN113793316A (en) * 2021-09-13 2021-12-14 合肥合滨智能机器人有限公司 Ultrasonic scanning area extraction method, device, equipment and storage medium
CN114245106A (en) * 2021-12-17 2022-03-25 杭州视洞科技有限公司 Camera picture movement detection area conversion method
CN114550173A (en) * 2020-11-26 2022-05-27 中移物联网有限公司 Image preprocessing method and device, electronic equipment and readable storage medium
CN115278104A (en) * 2022-07-04 2022-11-01 浙江大华技术股份有限公司 Image brightness adjusting method and device, electronic equipment and storage medium
CN115471709A (en) * 2022-09-28 2022-12-13 刘鹏 Directional signal intelligent analysis platform

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110717486B (en) * 2018-07-13 2022-08-05 杭州海康威视数字技术股份有限公司 Text detection method and device, electronic equipment and storage medium
CN110858404B (en) * 2018-08-22 2023-07-07 瑞芯微电子股份有限公司 Identification method and terminal based on regional offset
CN109409377B (en) * 2018-12-03 2020-06-02 龙马智芯(珠海横琴)科技有限公司 Method and device for detecting characters in image
CN111523315B (en) * 2019-01-16 2023-04-18 阿里巴巴集团控股有限公司 Data processing method, text recognition device and computer equipment
CN112101334B (en) 2019-06-18 2024-07-19 京东方科技集团股份有限公司 Method and device for determining area to be cleaned and dust cleaning device
CN113903043B (en) * 2021-12-11 2022-05-06 绵阳职业技术学院 Method for identifying printed Chinese character font based on twin metric model
CN115429157B (en) * 2022-08-29 2024-11-08 广州萨普拉智能科技有限公司 Method and device for determining cleaning range, cleaning robot and storage medium
CN118172777B (en) * 2024-05-16 2024-07-12 成都航空职业技术学院 Interactive virtual teaching aid implementation method based on image processing

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050013502A1 (en) * 2003-06-28 2005-01-20 Samsung Electronics Co., Ltd. Method of improving image quality
CN101599125A (en) * 2009-06-11 2009-12-09 上海交通大学 A Binarization Method for Image Processing under Complicated Background
CN104281850A (en) * 2013-07-09 2015-01-14 腾讯科技(深圳)有限公司 Character area identification method and device
CN105426818A (en) * 2015-10-30 2016-03-23 小米科技有限责任公司 Area extraction method and device

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100382096C (en) * 2003-08-20 2008-04-16 奥西-技术有限公司 Document scanner
US7570816B2 (en) * 2005-03-31 2009-08-04 Microsoft Corporation Systems and methods for detecting text
CN101615244A (en) * 2008-06-26 2009-12-30 上海梅山钢铁股份有限公司 Handwritten plate blank numbers automatic identifying method and recognition device
CN102314608A (en) * 2010-06-30 2012-01-11 汉王科技股份有限公司 Method and device for extracting rows from character image
CN102456137B (en) * 2010-10-20 2013-11-13 上海青研信息技术有限公司 Sight line tracking preprocessing method based on near-infrared reflection point characteristic
CN103336961B (en) * 2013-07-22 2016-06-29 中国科学院自动化研究所 A kind of interactively natural scene Method for text detection
CN104268150A (en) * 2014-08-28 2015-01-07 小米科技有限责任公司 Method and device for playing music based on image content
CN104573685B (en) * 2015-01-29 2017-11-21 中南大学 A kind of natural scene Method for text detection based on linear structure extraction
CN105335745B (en) * 2015-11-27 2018-12-18 小米科技有限责任公司 Digital recognition methods, device and equipment in image
CN105336169B (en) * 2015-12-09 2018-06-26 青岛海信网络科技股份有限公司 A kind of method and system that traffic congestion is judged based on video

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109874313A (en) * 2017-10-13 2019-06-11 众安信息技术服务有限公司 Text line detection method and line of text detection device
CN110569835A (en) * 2018-06-06 2019-12-13 北京搜狗科技发展有限公司 Image identification method and device and electronic equipment
CN110619325A (en) * 2018-06-20 2019-12-27 北京搜狗科技发展有限公司 Text recognition method and device
CN110619325B (en) * 2018-06-20 2024-03-08 北京搜狗科技发展有限公司 Text recognition method and device
CN111368837B (en) * 2018-12-25 2023-12-05 中移(杭州)信息技术有限公司 Image quality evaluation method and device, electronic equipment and storage medium
CN111368837A (en) * 2018-12-25 2020-07-03 中移(杭州)信息技术有限公司 Image quality evaluation method and device, electronic equipment and storage medium
CN111723627A (en) * 2019-03-22 2020-09-29 北京搜狗科技发展有限公司 An image processing method, device and electronic device
CN111814508B (en) * 2019-04-10 2024-01-09 阿里巴巴集团控股有限公司 Character recognition method, system and equipment
CN111814508A (en) * 2019-04-10 2020-10-23 阿里巴巴集团控股有限公司 Method, system and device for character recognition
CN111192149B (en) * 2019-11-25 2023-06-16 泰康保险集团股份有限公司 Nuclear insurance result data generation method and device
CN111192149A (en) * 2019-11-25 2020-05-22 泰康保险集团股份有限公司 Method and device for generating underwriting result data
CN111178362B (en) * 2019-12-16 2023-05-26 平安国际智慧城市科技股份有限公司 Text image processing method, device, equipment and storage medium
CN111178362A (en) * 2019-12-16 2020-05-19 平安国际智慧城市科技股份有限公司 Text image processing method, device, equipment and storage medium
CN111161185A (en) * 2019-12-30 2020-05-15 深圳蓝韵医学影像有限公司 Method and system for continuously adjusting X-ray image
CN111161185B (en) * 2019-12-30 2024-01-19 深圳蓝影医学科技股份有限公司 X-ray image continuous adjustment method and system
CN111275051A (en) * 2020-02-28 2020-06-12 上海眼控科技股份有限公司 Character recognition method, character recognition device, computer equipment and computer-readable storage medium
CN111368822A (en) * 2020-03-20 2020-07-03 上海中通吉网络技术有限公司 Method, device, device, and storage medium for cutting express area in an image
CN111368822B (en) * 2020-03-20 2023-09-19 上海中通吉网络技术有限公司 Method, device, equipment and storage medium for cutting express delivery form area in image
CN111553336B (en) * 2020-04-27 2023-03-24 西安电子科技大学 Print Uyghur document image recognition system and method based on link segment
CN111553336A (en) * 2020-04-27 2020-08-18 西安电子科技大学 A system and method for image recognition of printed Uyghur documents based on conjoined segments
CN114550173A (en) * 2020-11-26 2022-05-27 中移物联网有限公司 Image preprocessing method and device, electronic equipment and readable storage medium
CN112634382B (en) * 2020-11-27 2024-03-19 国家电网有限公司大数据中心 Method and device for identifying and replacing images of unnatural objects
CN112634382A (en) * 2020-11-27 2021-04-09 国家电网有限公司大数据中心 Image recognition and replacement method and device for unnatural object
CN112784835B (en) * 2021-01-21 2024-04-12 恒安嘉新(北京)科技股份公司 Method and device for identifying authenticity of circular seal, electronic equipment and storage medium
CN112784835A (en) * 2021-01-21 2021-05-11 恒安嘉新(北京)科技股份公司 Method and device for identifying authenticity of circular seal, electronic equipment and storage medium
CN113011409A (en) * 2021-04-02 2021-06-22 北京世纪好未来教育科技有限公司 Image identification method and device, electronic equipment and storage medium
CN113793316B (en) * 2021-09-13 2023-09-12 合肥合滨智能机器人有限公司 Ultrasonic scanning area extraction method, device, equipment and storage medium
CN113793316A (en) * 2021-09-13 2021-12-14 合肥合滨智能机器人有限公司 Ultrasonic scanning area extraction method, device, equipment and storage medium
CN113688811B (en) * 2021-10-26 2022-04-08 北京美摄网络科技有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN113688811A (en) * 2021-10-26 2021-11-23 北京美摄网络科技有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN114245106A (en) * 2021-12-17 2022-03-25 杭州视洞科技有限公司 Camera picture movement detection area conversion method
CN115278104B (en) * 2022-07-04 2024-02-09 浙江大华技术股份有限公司 Image brightness adjustment method and device, electronic equipment and storage medium
CN115278104A (en) * 2022-07-04 2022-11-01 浙江大华技术股份有限公司 Image brightness adjusting method and device, electronic equipment and storage medium
CN115471709B (en) * 2022-09-28 2023-06-27 武汉中安智通科技有限公司 Intelligent analysis system for directional signals
CN115471709A (en) * 2022-09-28 2022-12-13 刘鹏 Directional signal intelligent analysis platform

Also Published As

Publication number Publication date
TWI774659B (en) 2022-08-21
CN107229932A (en) 2017-10-03
TW201740316A (en) 2017-11-16
CN107229932B (en) 2021-05-28

Similar Documents

Publication Publication Date Title
WO2017162069A1 (en) Image text identification method and apparatus
CN112686812B (en) Bank card tilt correction detection method, device, readable storage medium and terminal
Khan et al. An efficient contour based fine-grained algorithm for multi category object detection
CN103914676B (en) A kind of method and apparatus used in recognition of face
CN106446952B (en) A kind of musical score image recognition methods and device
CN109522908A (en) Image significance detection method based on area label fusion
CN109615614B (en) Method for extracting blood vessels in fundus image based on multi-feature fusion and electronic equipment
CN107845068B (en) Image viewing angle conversion device and method
CN105512683A (en) Target positioning method and device based on convolution neural network
CN110766016B (en) Code-spraying character recognition method based on probabilistic neural network
WO2020253508A1 (en) Abnormal cell detection method and apparatus, and computer readable storage medium
JP2014531097A (en) Text detection using multi-layer connected components with histograms
CN108197644A (en) A kind of image-recognizing method and device
WO2019204577A1 (en) System and method for multimedia analytic processing and display
CN110472521B (en) Pupil positioning calibration method and system
CN105023253A (en) Visual underlying feature-based image enhancement method
CN111695373A (en) Zebra crossing positioning method, system, medium and device
CN104504368A (en) Image scene recognition method and image scene recognition system
CN107392968A (en) The image significance detection method of Fusion of Color comparison diagram and Color-spatial distribution figure
CN105225218B (en) Distortion correction method and equipment for file and picture
CN116580397A (en) Pathological image recognition method, device, equipment and storage medium
CN108960247B (en) Image significance detection method and device and electronic equipment
CN114926635B (en) Target segmentation method in multi-focus image combined with deep learning method
CN119048521B (en) Method, device and computer equipment for counting milk somatic cells
CN107368832A (en) Target detection and sorting technique based on image

Legal Events

Date Code Title Description
NENP Non-entry into the national phase. Ref country code: DE
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 17769346; country of ref document: EP; kind code of ref document: A1
122 EP: PCT application non-entry in European phase. Ref document number: 17769346; country of ref document: EP; kind code of ref document: A1