CN112348021A - Text detection method, device, equipment and storage medium - Google Patents

Info

Publication number
CN112348021A
Authority
CN
China
Prior art keywords
text
curve
contracted
starting point
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110020727.6A
Other languages
Chinese (zh)
Inventor
秦勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yizhen Xuesi Education Technology Co Ltd
Original Assignee
Beijing Yizhen Xuesi Education Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yizhen Xuesi Education Technology Co Ltd
Priority to CN202110020727.6A
Publication of CN112348021A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a text detection method, apparatus, device and storage medium. The method comprises: acquiring a contracted text curve probability map of a text image to be processed, wherein the contracted text curve probability map represents the probability that a pixel in the text image to be processed belongs to a contracted text curve; acquiring a starting point score map of the text image to be processed and a characteristic value of each corresponding starting point in the starting point score map, wherein the starting point score map represents whether a pixel in the text image to be processed belongs to the starting point of a contracted text curve; and determining a plurality of text regions corresponding to the text image to be processed based on each starting point in the starting point score map, the characteristic value of each starting point, and the plurality of contracted text curves corresponding to the contracted text curve probability map. Dense text can thus be detected quickly, which improves the efficiency of text detection.

Description

Text detection method, device, equipment and storage medium
Technical Field
The present application relates to image processing technologies, and in particular, to a text detection method, apparatus, device, and storage medium.
Background
Text detection has a wide range of applications and is a preliminary step in many computer vision tasks. It is mainly used to locate text lines or characters in an image. Accurate text localization is both important and challenging: compared with general object detection, text exhibits multiple orientations, irregular shapes, extreme aspect ratios, and varied fonts, colors and backgrounds, so algorithms that succeed at general object detection cannot be transferred directly to text detection. Although a number of text detection methods have been proposed with the rise of deep learning, for denser text, such as a primary school pupil's arithmetic exercises where one image may contain 100 text regions, the processing speed of existing methods decreases linearly as the number of text boxes grows. They therefore cannot meet the requirements of practical application scenarios, and the user experience suffers.
Disclosure of Invention
The embodiment of the application provides a text detection method, a text detection device, text detection equipment and a storage medium, which are used for solving the problems in the related technology, and the technical scheme is as follows:
in a first aspect, an embodiment of the present application provides a text detection method, including:
acquiring an inner contracted text curve probability graph of a text image to be processed, wherein the inner contracted text curve probability graph can represent the probability value that a pixel point in the text image to be processed belongs to an inner contracted text curve;
acquiring a starting point score map of the text image to be processed and a characteristic value of a corresponding starting point in the starting point score map, wherein the starting point score map can represent whether a pixel point in the text image to be processed belongs to a contracted text curve or not;
and determining a plurality of text regions corresponding to the text image to be processed based on each starting point corresponding to the starting point score map, the characteristic value of each starting point and a plurality of contracted text curves corresponding to the contracted text curve probability map.
In one embodiment, the method further comprises:
and inputting the text image to be processed into a preset model, and outputting an inner contracted text curve probability chart from a first branch of the preset model.
In one embodiment, the method further comprises:
inputting the text image to be processed into a preset model, and outputting the starting point score map and the characteristic values corresponding to the starting points from a second branch of the preset model, wherein the starting point score map and the characteristic values corresponding to the starting points are output from different channels of the second branch.
In one embodiment, the method further comprises:
determining coordinate information of each starting point and determining coordinate information of the contracted text curve;
determining the corresponding relation between the starting point and the contracted text curve based on the coordinate information;
determining a plurality of text regions corresponding to the text image to be processed based on the starting points corresponding to the starting point score maps, the feature values of the starting points and the contracted text curves, wherein the determining comprises:
and determining a plurality of text regions corresponding to the text image to be processed based on the corresponding relation between the starting point and the contracted text curve and the characteristic value of the starting point.
In one embodiment, the coordinate information of each starting point includes:
carrying out binarization processing on the initial point score map to obtain an initial point binary map;
and solving a connected domain for the initial point binary image to obtain coordinate information of each initial point corresponding to the initial point binary image.
In one embodiment, the determining the coordinate information of the contracted text curve includes:
carrying out binarization processing on the probability graph of the contracted text curve to obtain a binary graph of the contracted text curve;
and solving a connected domain of the binary image of the contracted text curve to obtain the coordinate information of the contracted text curve corresponding to the binary image of the contracted text curve.
In one embodiment, the determining a plurality of text regions corresponding to the text image to be processed based on the correspondence between the starting point and the contracted text curve and the feature value of the starting point includes:
determining an inner contracted text curve corresponding to each starting point based on the corresponding relation between the starting point and the inner contracted text curve;
drawing a circle along the contracted text curve corresponding to the initial point by taking the initial point corresponding to the contracted text curve as a circle center and the characteristic value corresponding to the initial point as a radius until the end point of the contracted text curve corresponding to the initial point;
and determining text regions corresponding to the contracted text curves based on the regions where the circles determined by circle drawing processing are located, so as to obtain a plurality of text regions corresponding to the text image to be processed.
In a second aspect, an embodiment of the present application provides a text detection apparatus, including:
the device comprises a first obtaining unit, a second obtaining unit and a third obtaining unit, wherein the first obtaining unit is used for obtaining an inner contracted text curve probability chart of a text image to be processed, and the inner contracted text curve probability chart can represent the probability value that a pixel point in the text image to be processed belongs to an inner contracted text curve;
the second obtaining unit is used for obtaining a starting point score map of the text image to be processed and a characteristic value of a corresponding starting point in the starting point score map, wherein the starting point score map can represent whether a pixel point in the text image to be processed belongs to a contracted text curve or not;
and the text detection unit is used for determining a plurality of text regions corresponding to the text image to be processed based on the starting points corresponding to the starting point score maps, the characteristic values of the starting points and a plurality of contracted text curves corresponding to the contracted text curve probability maps.
In an embodiment, the first obtaining unit is further configured to input the text image to be processed into a preset model, and output an indented text curve probability map from a first branch of the preset model.
In an embodiment, the second obtaining unit is further configured to input the text image to be processed into a preset model, and output the starting point score map and the feature values corresponding to the starting points from a second branch of the preset model, where the starting point score map and the feature values corresponding to the starting points are output from different channels of the second branch.
In one embodiment, a coordinate information determination unit is further included, wherein,
the coordinate information determining unit is used for determining the coordinate information of each starting point and determining the coordinate information of the contracted text curve; determining the corresponding relation between the starting point and the contracted text curve based on the coordinate information;
the text detection unit is further configured to determine a plurality of text regions corresponding to the text image to be processed based on the correspondence between the starting point and the contracted text curve and the characteristic value of the starting point.
In one embodiment, the coordinate information determination unit is further configured to:
carrying out binarization processing on the initial point score map to obtain an initial point binary map;
and solving a connected domain for the initial point binary image to obtain coordinate information of each initial point corresponding to the initial point binary image.
In one embodiment, the coordinate information determination unit is further configured to:
carrying out binarization processing on the probability graph of the contracted text curve to obtain a binary graph of the contracted text curve;
and solving a connected domain of the binary image of the contracted text curve to obtain the coordinate information of the contracted text curve corresponding to the binary image of the contracted text curve.
In one embodiment, the text detection unit is further configured to:
determining an inner contracted text curve corresponding to each starting point based on the corresponding relation between the starting point and the inner contracted text curve;
drawing a circle along the contracted text curve corresponding to the initial point by taking the initial point corresponding to the contracted text curve as a circle center and the characteristic value corresponding to the initial point as a radius until the end point of the contracted text curve corresponding to the initial point;
and determining text regions corresponding to the contracted text curves based on the regions where the circles determined by circle drawing processing are located, so as to obtain a plurality of text regions corresponding to the text image to be processed.
In a third aspect, an embodiment of the present application provides a text detection device, including: a memory and a processor. Wherein the memory and the processor are in communication with each other via an internal connection path, the memory is configured to store instructions, the processor is configured to execute the instructions stored by the memory, and the processor is configured to perform the method of any of the above aspects when the processor executes the instructions stored by the memory.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, which stores a computer program, and when the computer program runs on a computer, the method in any one of the above-mentioned aspects is executed.
The advantages or beneficial effects in the above technical solution at least include: the text detection method and the device can realize rapid detection on the text image, particularly on the dense text containing a plurality of text areas in the text image, improve the text detection efficiency, further meet the requirements of practical application scenes, and improve the user experience.
The foregoing summary is provided for the purpose of description only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features of the present application will be readily apparent by reference to the drawings and following detailed description.
Drawings
In the drawings, like reference numerals refer to the same or similar parts or elements throughout the several views unless otherwise specified. The figures are not necessarily to scale. It is appreciated that these drawings depict only some embodiments in accordance with the disclosure and are therefore not to be considered limiting of its scope.
FIG. 1 shows a flowchart of an implementation of a text detection method according to an embodiment of the present application;
FIG. 2 shows an implementation flowchart of a specific example according to an embodiment of the present application;
FIG. 3 shows a schematic structural diagram of a text detection apparatus according to an embodiment of the present application;
FIG. 4 shows a schematic structural diagram of a text detection device according to an embodiment of the present application.
Detailed Description
In the following, only certain exemplary embodiments are briefly described. As those skilled in the art will recognize, the described embodiments may be modified in various different ways, all without departing from the spirit or scope of the present application. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
Fig. 1 shows a flowchart of an implementation of a text detection method according to an embodiment of the present application. As shown in fig. 1, the method includes:
Step S101: acquire a contracted text curve probability map of the text image to be processed, where the contracted text curve probability map represents the probability that a pixel in the text image to be processed belongs to a contracted text curve. Here, a contracted text curve is obtained by compressing a text region into a line along the vertical direction. It is not a true line: it has a width of a few pixels, for example 3. The region is not compressed in the horizontal direction because, in practical applications, adjacent text usually sticks together vertically and almost never horizontally, especially for long curved text. For text adhesion in other scenes, the corresponding contracted text curve can be obtained in a similar manner, which is not limited by the scheme of the present application.
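The vertical compression described in step S101 can be illustrated with a minimal sketch. It assumes a binary mask containing a single text region and collapses each column to a 3-pixel band around the column's vertical midpoint. The function name `shrink_to_curve` and the midpoint rule are illustrative assumptions; the source does not specify how the labels are produced.

```python
def shrink_to_curve(mask, width=3):
    """Collapse each column of a binary text mask to a band `width` pixels
    tall, centred on that column's vertical midpoint. Columns (the
    horizontal direction) are left uncompressed."""
    rows, cols = len(mask), len(mask[0])
    curve = [[0] * cols for _ in range(rows)]
    half = width // 2
    for x in range(cols):
        ys = [y for y in range(rows) if mask[y][x]]
        if not ys:
            continue  # this column contains no text pixels
        mid = (ys[0] + ys[-1]) // 2
        for y in range(max(0, mid - half), min(rows, mid + half + 1)):
            curve[y][x] = 1
    return curve


# A 7-row by 5-column mask whose text occupies rows 1..5 of columns 1..3.
mask = [[1 if 1 <= y <= 5 and 1 <= x <= 3 else 0 for x in range(5)]
        for y in range(7)]
curve = shrink_to_curve(mask)
```

In this toy case the five-row region becomes a three-row band in the same three columns, so the horizontal extent is preserved while the vertical extent shrinks.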
Step S102: acquire a starting point score map of the text image to be processed and the characteristic value of each corresponding starting point in the starting point score map, where the starting point score map indicates whether a pixel in the text image to be processed belongs to the starting point of a contracted text curve. Specifically, the starting point score map can represent the probability that a pixel belongs to the starting point of a contracted text curve, and the characteristic value of a starting point can be used as the radius value for that starting point, which lays a foundation for accurately obtaining the text regions later.
Step S103: and determining a plurality of text regions corresponding to the text image to be processed based on each starting point corresponding to the starting point score map, the characteristic value of each starting point and a plurality of contracted text curves corresponding to the contracted text curve probability map.
Note that, in practical applications, the execution order of step S101 and step S102 is not limited: the contracted text curve probability map may be obtained first and the starting point score map and characteristic values afterwards, or the starting point score map and characteristic values may be obtained first and the contracted text curve probability map afterwards. The present application does not limit this.
Therefore, according to the scheme, based on the starting points and the characteristic values corresponding to the starting point score maps and the plurality of inner contracted text curves corresponding to the inner contracted text curve probability maps, the text image, especially the dense text containing a plurality of text areas in the text image can be quickly detected, the text detection efficiency is improved, the requirements of practical application scenes are met, and the user experience is improved.
In a specific example of the scheme of the present application, the method further includes: inputting the text image to be processed into a preset model, and outputting the contracted text curve probability map from a first branch of the preset model. That is, in practical applications the contracted text curve probability map can be obtained from a first branch of a preset model; for example, the first branch uses a Pixel Aggregation Network (PAN) structure to predict the contracted text curve probability map. Following PAN, the first branch can be trained with the loss function that PAN uses, such as Dice loss, computed between the real text region and the contracted text obtained from the contracted text curve probability map, so as to improve the accuracy of the resulting contracted text curve probability map.
In a specific example of the scheme of the present application, the method further includes: inputting the text image to be processed into a preset model, and outputting the starting point score map and the characteristic value corresponding to each starting point from a second branch of the preset model, where the starting point score map and the characteristic values are output from different channels of the second branch. That is, in practical applications the starting point score map and the characteristic value of each starting point (which can be used as the radius of a circle) can be obtained from a second branch of the preset model; for example, the second branch uses a Differentiable Binarization (DB, from "Real-time Scene Text Detection with Differentiable Binarization") structure together with CenterNet. The output of the second branch has 2 channels. The first channel represents the starting point score map, i.e. whether a pixel belongs to a starting point: the score is 1 if it does and 0 if it does not. When training the first channel of the DB structure, the relationship between the real starting point score map and the predicted starting point score map can be used, so that the trained first channel yields a more accurate starting point score map.
The second channel represents the radius of the circle centered on the starting point, referred to as the radius value. In practical applications, the second channel of the DB structure can be trained with the smooth L1 loss function on the difference between the real radius value and the predicted radius value, so that the trained second channel yields the characteristic value of each starting point with a more accurate predicted radius. This also lays a foundation for the subsequent accurate and fast detection of text regions.
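As a rough illustration of the radius regression described above, the smooth L1 loss can be sketched in plain Python. The transition point `beta=1.0` and the averaging over starting points are assumptions; the source names the loss but gives no hyperparameters.

```python
def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss for one value: quadratic for small errors,
    linear for large ones. `beta` (assumed 1.0) sets the switch point."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


def radius_loss(pred_radii, true_radii):
    """Mean smooth L1 loss over the radii predicted at the starting points."""
    losses = [smooth_l1(p, t) for p, t in zip(pred_radii, true_radii)]
    return sum(losses) / len(losses)


# Hypothetical predicted and ground-truth radii for two starting points.
loss = radius_loss([1.5, 4.0], [1.0, 1.0])
```

The quadratic region keeps gradients small near the true radius, while the linear region limits the influence of badly wrong predictions.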
In a specific example of the scheme of the present application, the method further includes: determining coordinate information of each starting point and determining coordinate information of the contracted text curves; and determining the correspondence between the starting points and the contracted text curves based on the coordinate information. Determining a plurality of text regions corresponding to the text image to be processed based on the starting points in the starting point score map, the characteristic values of the starting points and the contracted text curves then includes: determining the plurality of text regions based on the correspondence between the starting points and the contracted text curves and the characteristic values of the starting points. That is, the correspondence between starting points and contracted text curves is determined from the coordinate information, which in turn enables fast detection on the text image, especially for dense text where the image contains many text regions. This improves text detection efficiency, meets the requirements of practical application scenarios, and improves the user experience.
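The source does not say which rule turns coordinate information into a correspondence. One plausible sketch, shown purely as an assumption, assigns each starting point to the curve that has a pixel nearest to it. The name `match_starts_to_curves` and the nearest-pixel rule are hypothetical.

```python
def match_starts_to_curves(start_points, curves):
    """Map each starting point (x, y) to the index of the curve whose
    nearest pixel is closest to it (squared Euclidean distance)."""
    matches = {}
    for sx, sy in start_points:
        best_idx, best_d2 = None, None
        for idx, pixels in enumerate(curves):
            d2 = min((sx - px) ** 2 + (sy - py) ** 2 for px, py in pixels)
            if best_d2 is None or d2 < best_d2:
                best_idx, best_d2 = idx, d2
        matches[(sx, sy)] = best_idx
    return matches


# Two hypothetical curves given as lists of (x, y) pixel coordinates.
curves = [[(0, 0), (1, 0), (2, 0)], [(0, 5), (1, 5)]]
matches = match_starts_to_curves([(0, 0), (0, 6)], curves)
```

A starting point that lies on a curve has distance zero to it, so it is always matched to that curve; the nearest-pixel fallback only matters when binarization leaves a small gap between the two maps.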
In a specific example of the solution of the present application, the coordinate information of the starting points may be obtained as follows. Determining the coordinate information of each starting point includes: binarizing the starting point score map to obtain a starting point binary map; and computing the connected domains of the starting point binary map to obtain the coordinate information of each starting point in the starting point binary map. This lays a foundation for subsequently determining the correspondence between the starting points and the contracted text curves based on the coordinate information, and for accurately obtaining the text regions.
In a specific example of the solution of the present application, the coordinate information of the contracted text curves may be obtained as follows. Determining the coordinate information of the contracted text curves includes: binarizing the contracted text curve probability map to obtain a contracted text curve binary map; and computing the connected domains of the contracted text curve binary map to obtain the coordinate information of each contracted text curve in the binary map. This likewise lays a foundation for subsequently determining the correspondence between the starting points and the contracted text curves based on the coordinate information, and for accurately obtaining the text regions.
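Both coordinate-extraction steps follow the same pattern: threshold, then label connected domains. A minimal dependency-free sketch is given below. The 8-connectivity, the threshold value and the helper names are assumptions; a production system would more likely call an image-processing library.

```python
from collections import deque


def binarize(prob_map, threshold):
    """Turn a probability map into a 0/1 map using a fixed threshold."""
    return [[1 if p > threshold else 0 for p in row] for row in prob_map]


def connected_components(binary):
    """Return one list of (x, y) pixel coordinates per 8-connected component."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for y in range(rows):
        for x in range(cols):
            if not binary[y][x] or seen[y][x]:
                continue
            queue, pixels = deque([(y, x)]), []
            seen[y][x] = True
            while queue:  # breadth-first flood fill from the seed pixel
                cy, cx = queue.popleft()
                pixels.append((cx, cy))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


# Hypothetical 3x4 probability map with two separate blobs.
prob = [[0.9, 0.8, 0.1, 0.0],
        [0.1, 0.0, 0.0, 0.7],
        [0.0, 0.0, 0.0, 0.9]]
components = connected_components(binarize(prob, 0.5))
```

Each returned component is the coordinate information of one starting point blob or one contracted text curve, depending on which map was passed in.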
In a specific example of the present application, determining a plurality of text regions corresponding to the text image to be processed based on the correspondence between the starting points and the contracted text curves and the characteristic values of the starting points includes: determining the contracted text curve corresponding to each starting point based on the correspondence between the starting points and the contracted text curves; drawing circles along the contracted text curve corresponding to a starting point, taking the starting point as the first circle center and its characteristic value as the radius, until the end point of that contracted text curve is reached; and determining the text region corresponding to each contracted text curve based on the area covered by the circles drawn, so as to obtain the plurality of text regions corresponding to the text image to be processed. That is, the real text region is obtained by rolling a circle along the curve, which enhances the detection effect and improves the overall speed of dense text detection.
Specifically, after the correspondence between each starting point and each contracted text curve is obtained from the coordinate information, a first circle is drawn from the starting point using the predicted radius value (i.e. the characteristic value corresponding to the starting point); the horizontal radius of this circle falls on the contracted text curve. A further circle is then drawn at the end point of the horizontal radius of the first circle, using the characteristic value corresponding to that end point as the radius, so that the two circles intersect. This operation is repeated until the last point on the contracted text curve is reached. In practical applications, different contracted text curves can be processed in parallel, for example on a Graphics Processing Unit (GPU), which improves detection efficiency while completing dense text detection.
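The circle-rolling procedure can be sketched as follows. The sketch assumes the curve is given as centerline points sorted by x, that every curve point has a radius of at least one pixel available in a lookup table, and that "the end point of the horizontal radius" means the furthest curve point within one radius horizontally. These simplifications, and the names `roll_circles` and `in_text_region`, are assumptions rather than details from the source.

```python
def roll_circles(curve, radius_at):
    """Roll circles along a curve.

    curve: centreline points (x, y) sorted by x; curve[0] is the starting point.
    radius_at: dict mapping each curve point to its radius value.
    Returns the list of circles as (cx, cy, r)."""
    circles = []
    i = 0
    while True:
        cx, cy = curve[i]
        r = radius_at[(cx, cy)]
        circles.append((cx, cy, r))
        if i == len(curve) - 1:
            break  # reached the last point of the curve
        # Advance to the furthest curve point within one radius horizontally,
        # so that consecutive circles intersect.
        j = i
        while j + 1 < len(curve) and curve[j + 1][0] - cx <= r:
            j += 1
        i = j if j > i else i + 1  # always make progress
    return circles


def in_text_region(x, y, circles):
    """A pixel belongs to the text region if any circle covers it."""
    return any((x - cx) ** 2 + (y - cy) ** 2 <= r * r for cx, cy, r in circles)


# Hypothetical straight curve at y = 10 with a constant radius of 4 pixels.
curve = [(x, 10) for x in range(11)]
circles = roll_circles(curve, {p: 4 for p in curve})
```

Because each curve is rolled independently of the others, a call like this per curve is what would be distributed across parallel workers.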
Therefore, the text image, especially the dense text containing a plurality of text areas in the text image can be quickly detected, the text detection efficiency is improved, the requirement of an actual application scene is met, and the user experience is improved.
The present solution is described in further detail below with reference to a specific example.
The method combines the advantages of the PAN structure, the DB (Differentiable Binarization) structure and the center network (CenterNet), and combines segmentation with regression. It uses a new text detection idea: dense text is located by obtaining the contracted curve of each text region, and the real text region is then recovered by rolling a circle along that curve, which enhances the detection effect and improves the overall speed of dense text detection.
The preset model used in this example draws on the PAN structure, the DB structure and CenterNet. Specifically, in the model training phase, as in PAN and DB, a ResNet18 network is used as the backbone. The ResNet18 network used in this example consists of 4 blocks connected in series, each containing several convolution layers. The feature map output by the first block is 1/4 the size of the original image, that of the second block 1/8, that of the third block 1/16, and that of the fourth block 1/32. The four groups of feature maps output by the 4 blocks are then all interpolated to 1/4 of the original size and concatenated into one group of feature maps with 512 channels.
Further, as in DB, features are extracted again: a convolution operation followed by two deconvolution operations outputs a 2-channel feature map. In this process, each block outputs 128 feature maps.
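The shape bookkeeping in the two paragraphs above can be checked with a few lines of arithmetic. The 640x640 input size is a hypothetical example; the strides and the 128 feature maps per block come from the text.

```python
def backbone_shapes(height, width, channels_per_block=128,
                    strides=(4, 8, 16, 32)):
    """Return the (channels, H, W) shape of each block's output and of the
    fused map obtained by resizing every output to 1/4 scale and
    concatenating along the channel axis."""
    per_block = [(channels_per_block, height // s, width // s)
                 for s in strides]
    fused = (channels_per_block * len(strides),
             height // strides[0], width // strides[0])
    return per_block, fused


# Hypothetical 640x640 input image.
per_block, fused = backbone_shapes(640, 640)
```

Four blocks of 128 feature maps each give the 512-channel fused map stated in the text, at one quarter of the input resolution.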
The network then splits into two branches, namely:
the first branch is: the PAN structure is used for obtaining a probability map of a predicted contracted text curve, specifically, a convolution operation is performed on the feature mapping of 512 channels, a deconvolution operation is performed twice, and a feature mapping of 1 channel with the same size as an original image is output, wherein the output result of the channel represents the probability map of the contracted text curve, here, the contracted text curve refers to that a text area is compressed into a line along the vertical direction, certainly not a true line, the width of the line is 3 pixels, and the compression is not performed in the horizontal direction, because the text adhesion is generally up and down, almost no horizontal adhesion exists, and especially for long-curved texts. Here, by integrating the experience of PAN, a Loss function obtained by PAN on the relationship between the real text region and the contracted text obtained based on the contracted text curve probability map may be used, and the accuracy of the obtained contracted text curve probability map is improved by training through Dice pass, which is expressed as follows:
Figure DEST_PATH_IMAGE001
where P_tex(i) denotes the predicted value of the i-th pixel point and G_tex(i) denotes the true value of the i-th pixel point; the label for the contracted text curve probability map is a manually annotated binary map of the contracted text curve.
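The formula image is not reproduced in this text. As a minimal sketch, assuming the loss has the form published for PAN, namely 1 - 2·Σ P_tex(i)·G_tex(i) / (Σ P_tex(i)² + Σ G_tex(i)²), it could be computed as below. The function name and the small `eps` term (added only to avoid division by zero) are assumptions, not part of the description.

```python
def dice_loss(pred, target, eps=1e-6):
    """Dice loss over flattened pixel lists.

    pred   : predicted values P_tex(i), each in [0, 1]
    target : ground-truth values G_tex(i), each 0 or 1
    eps    : hypothetical stabiliser, not taken from the description
    """
    intersection = sum(p * g for p, g in zip(pred, target))
    denominator = sum(p * p for p in pred) + sum(g * g for g in target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


# A perfect prediction gives a loss near 0; a fully wrong one gives a loss near 1.
print(dice_loss([1.0, 0.0, 1.0, 0.0], [1, 0, 1, 0]))
print(dice_loss([0.0, 1.0, 0.0, 1.0], [1, 0, 1, 0]))
```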
The second branch is as follows: the DB structure and CenterNet are used to obtain the starting point score map and the radius of the circle corresponding to each starting point, where a starting point is the starting point of a contracted text curve. Specifically, the output of this branch also has 2 channels. The first channel represents the starting point score map, i.e., whether a pixel belongs to a starting point (for example, the score is 1 if it does and 0 if it does not). The relationship between the real starting point score map and the predicted starting point score map can be trained using the Focal Loss of CenterNet, so as to improve the accuracy of the obtained starting point score map. Here, the Focal Loss function is expressed as follows:
Figure DEST_PATH_IMAGE002
where x and y denote the coordinate position and c denotes the channel; N denotes the number of all channels;
Figure DEST_PATH_IMAGE003
denotes the predicted value at coordinates (x, y) in channel c;
Figure DEST_PATH_IMAGE004
denotes the true value at coordinates (x, y) in channel c. Here, the number of all pixel points is equal to
Figure DEST_PATH_IMAGE005
The second channel represents the radius of the circle around the starting point, which may be referred to as the radius value, and here, the difference between the true radius value and the predicted radius value may be trained using the smooth L1 loss function to improve the accuracy of the predicted radius value.
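The two losses of the second branch can be sketched as follows. Because the formula images are not reproduced here, the focal loss below follows the form published for CenterNet (with its exponents alpha = 2 and beta = 4, and normalisation by the number of positive points); that form, the normaliser, the function names, and the `eps` clamp are all assumptions relative to this description. The smooth L1 loss uses its standard definition with a transition point `beta` of 1.

```python
import math


def centernet_focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-12):
    """Penalty-reduced focal loss over flattened score maps (assumed CenterNet form)."""
    total = 0.0
    num_pos = 0
    for p, y in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp so that log() stays finite
        if y == 1:
            total += (1.0 - p) ** alpha * math.log(p)
            num_pos += 1
        else:
            total += (1.0 - y) ** beta * p ** alpha * math.log(1.0 - p)
    return -total / max(num_pos, 1)


def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss for one predicted radius value."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


good = centernet_focal_loss([0.9, 0.1], [1, 0])
bad = centernet_focal_loss([0.1, 0.9], [1, 0])
print(good, bad)
print(smooth_l1(3.0, 3.5), smooth_l1(10.0, 7.0))
```

The focal loss is small when starting points receive high scores and background pixels receive low scores, while smooth L1 is quadratic for small radius errors and linear for large ones, which makes the radius regression less sensitive to outliers.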
Further, in the model using stage, after the outputs of the two branches are obtained for the input image to be processed, the inner contracted text curve probability map and the starting point score map are each binarized according to a specified threshold, where the threshold for the inner contracted text curve probability map is set relatively low and the threshold for the starting point score map is set relatively high, so as to obtain an inner contracted text curve binary map and a starting point binary map. Connected domains are then obtained for each of the two binary maps, and the correspondence between each starting point and each inner contracted text curve is obtained from the resulting coordinate information. Next, a first circle is drawn at the starting point using the predicted radius value (i.e., the radius value corresponding to the starting point), such that the horizontal radius of the circle falls on the inner contracted text curve. A further circle is then drawn at the end point of the horizontal radius of the first circle, so that the two circles intersect, and this operation is repeated until the last point of the contracted text curve is reached. Macroscopically, it is as if the center of a circle rolls along the contracted text curve, and the area covered by all the circles is finally taken as the real text area.
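The binarization, connected-domain, and matching steps can be sketched in plain Python as below. The thresholds (0.3 and 0.8), the toy maps, the use of 8-connectivity, and the rule that a starting point belongs to the curve whose pixels it overlaps are all illustrative assumptions; the description only states that the correspondence is obtained from the coordinate information.

```python
from collections import deque


def binarize(score_map, threshold):
    """Turn a 2D score map into a 0/1 map."""
    return [[1 if v >= threshold else 0 for v in row] for row in score_map]


def connected_components(binary):
    """Return a list of components, each a list of (row, col) pixels (8-connectivity)."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for r in range(h):
        for c in range(w):
            if not binary[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, comp = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(comp)
    return components


def match_starts_to_curves(start_comps, curve_comps):
    """Map each starting-point component index to the index of the curve it overlaps."""
    pixel_to_curve = {px: i for i, comp in enumerate(curve_comps) for px in comp}
    matches = {}
    for s, comp in enumerate(start_comps):
        for px in comp:
            if px in pixel_to_curve:
                matches[s] = pixel_to_curve[px]
                break
    return matches


# Toy maps: two horizontal curves on rows 1 and 3, each with a starting point at column 1.
zero_row = [0.0] * 6
curve_prob = [zero_row, [0.1, 0.6, 0.7, 0.8, 0.6, 0.1], zero_row,
              [0.1, 0.5, 0.9, 0.7, 0.1, 0.1], zero_row]
start_score = [zero_row, [0.0, 0.95, 0.4, 0.0, 0.0, 0.0], zero_row,
               [0.0, 0.9, 0.0, 0.0, 0.0, 0.0], zero_row]

curves = connected_components(binarize(curve_prob, 0.3))   # relatively low threshold
starts = connected_components(binarize(start_score, 0.8))  # relatively high threshold
matches = match_starts_to_curves(starts, curves)
print(len(curves), len(starts), matches)
```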
As shown in fig. 2, the specific steps include:
firstly, inputting the dense text image into a Resnet18 network in a preset model, and performing feature extraction to obtain feature 1, wherein the feature 1 may specifically include 4 sets of multichannel feature maps with different sizes, that is, 4 sets of features.
Here, as in PAN and DB, the Resnet18 network model is used as the basic network model to extract features such as texture, edges, corners, and semantic information from the input text image, and these features are represented by 4 groups of multi-channel feature maps of different sizes. Then, features such as texture, edges, corners, and semantic information are extracted again by using 2 Feature Pyramid Enhancement Modules (FPEM). Specifically, the Resnet18 network used in this example is constructed from 4 blocks connected in series, each block including several layers of convolution operations, where the feature map output by the first block is 1/4 the size of the original image, that of the second block is 1/8, that of the third block is 1/16, and that of the fourth block is 1/32. Further, in practical application, the four groups of feature maps output by the 4 blocks may all be interpolated to 1/4 of the original image size and concatenated to obtain one group of feature maps (i.e., feature 1) with 512 channels.
And secondly, performing upsampling and series connection on the features 1 extracted in the first step to obtain a group of feature maps, wherein the group of feature maps pass through two FPEM modules in a preset model, and extracting again to obtain features 2, and the features 2 comprise 4 groups of feature maps.
Here, 2 FPEM modules are chosen because 2 achieved the best results in the experiments. Each FPEM module performs the same processing, the details of which are as follows. The 4 groups of multi-channel feature maps of different sizes obtained in the previous step (namely, the outputs of the four blocks) are called, in order from largest to smallest, the forward first, forward second, forward third, and forward fourth groups of feature maps. The forward fourth group of feature maps is up-sampled by a factor of 2, i.e., its size is enlarged 2 times, and is then added point by point, channel by channel, to the forward third group of feature maps; the result undergoes a depthwise separable convolution operation followed by one more convolution, batch normalization, and activation function operation, and the result obtained is called the reverse second group of feature maps. The same operation is applied to the reverse second group and the forward second group to obtain the reverse third group of feature maps, and then to the reverse third group and the forward first group to obtain the reverse fourth group of feature maps; meanwhile, the forward fourth group is regarded as the reverse first group, so that 4 groups of reverse feature maps are obtained. Next, the reverse fourth group of feature maps is taken as the target first group of feature maps. The target first group is down-sampled by a factor of 2, i.e., its size is reduced 2 times, and is then added point by point, channel by channel, to the reverse third group of feature maps; the result undergoes a depthwise separable convolution operation followed by one more convolution, batch normalization, and activation function operation, and the result obtained is called the target second group of feature maps. The same operation is applied to the target second group and the reverse second group to obtain the target third group of feature maps, and then to the target third group and the reverse first group to obtain the target fourth group of feature maps. The target first, target second, target third, and target fourth groups of feature maps are the output of the 1st FPEM module; the 2nd FPEM module takes the output of the 1st FPEM module as input and performs the same operation to obtain its output.
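The order in which an FPEM module combines the feature maps can be sketched as follows. This is a data-flow illustration only, under several stated assumptions: each group is reduced to a single-channel 2D list, up-sampling is nearest-neighbour, down-sampling keeps every second row and column, and the depthwise separable convolution plus convolution, batch normalization, and activation step is replaced by an identity placeholder named `refine`. All function names are hypothetical.

```python
def upsample2(fm):
    """Nearest-neighbour 2x up-sampling of a 2D map (list of rows)."""
    out = []
    for row in fm:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out


def downsample2(fm):
    """2x down-sampling that keeps every second row and column."""
    return [row[::2] for row in fm[::2]]


def add(a, b):
    """Point-by-point addition of two maps of equal size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def refine(fm):
    """Placeholder for the convolution steps (identity in this sketch)."""
    return fm


def fpem(forward):
    """forward: four maps ordered from largest to smallest (forward first..fourth)."""
    f1, f2, f3, f4 = forward
    # Up-scale pass: build the reverse groups, starting from the smallest map.
    r1 = f4
    r2 = refine(add(upsample2(r1), f3))
    r3 = refine(add(upsample2(r2), f2))
    r4 = refine(add(upsample2(r3), f1))
    # Down-scale pass: build the target groups, starting from the largest map.
    t1 = r4
    t2 = refine(add(downsample2(t1), r3))
    t3 = refine(add(downsample2(t2), r2))
    t4 = refine(add(downsample2(t3), r1))
    return [t1, t2, t3, t4]


def ones(n):
    return [[1.0] * n for _ in range(n)]


first = fpem([ones(8), ones(4), ones(2), ones(1)])
second = fpem(first)  # the 2nd module consumes the output of the 1st
print([len(m) for m in first], [len(m) for m in second])
```

Because the output groups have the same sizes and ordering as the input groups, two modules can be chained directly, as the text describes.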
And thirdly, the size of the feature maps of the 4 groups of feature maps obtained in the second step is up-sampled to the size of 1/4 of the original image and connected in series to obtain a group of feature maps, wherein the number of channels is 512.
And fourthly, performing one convolution operation and two deconvolution operations on the feature map obtained in the third step, and outputting a 1-channel feature map whose size is consistent with that of the original image, wherein this feature map represents the probability map of the contracted text curve.
And fifthly, training the output of the fourth step by using the Dice Loss that PAN uses to train on the real text area and the contracted text.
And sixthly, performing convolution operation and deconvolution operation on the feature mapping obtained in the third step, wherein the output feature mapping channels are 2, the size of the feature mapping output in each channel is consistent with that of the original image, the output result of the first channel represents a score map of the starting point, and the output result of the second channel represents a radius value corresponding to the starting point.
And seventhly, training the output of the first channel in the sixth step by using the Focal Loss that CenterNet uses to train center points, and training the output of the second channel in the sixth step by using a smooth L1 loss function.
Based on the steps, namely the training step of the preset model, after the training is finished, the dense text detection can be carried out by using the preset model. The specific detection steps are as follows:
eighthly, in the prediction stage, obtaining the inner contracted text curve probability map corresponding to the image to be predicted (namely the text image to be processed) in the manner of the fourth step, carrying out binarization processing on the inner contracted text curve probability map according to a specified threshold value to obtain an inner contracted text curve binary map, and obtaining coordinate information of the inner contracted text curve points after solving connected domains;
a ninth step, similarly, for the image to be predicted, obtaining a starting point score map of the image to be predicted and a radius value corresponding to each starting point in a sixth step, performing binarization processing on the starting point score map according to a specified threshold, and obtaining coordinate information corresponding to each starting point after solving a connected domain;
tenth, obtaining the corresponding relation between each initial point and each contracted text curve according to the coordinate information;
and eleventh, obtaining a first circle by using the predicted radius value from the starting point, wherein the horizontal radius of the circle is to fall on the contracted text curve, then continuing drawing the circle according to the end point of the horizontal radius of the first circle (the characteristic value corresponding to the end point is taken as the radius), intersecting the two circles, sequentially circulating the operations until the last point on the contracted text curve, macroscopically, rolling the circle as if the center of the circle rolls along the contracted text curve, and finally taking the area corresponding to all the circles as a real text area.
And twelfthly, performing parallel processing for different contracted text curves to complete dense text detection.
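The circle-drawing procedure of the eleventh step can be sketched as below. The sketch assumes a curve given as a list of (x, y) points sorted by x, a dictionary `radius_at` holding the predicted radius value at each curve point, and the simplification that the end point of the horizontal radius is the first curve point whose x coordinate is at least the center's x plus the radius. These names and simplifications are illustrative and are not taken from the description.

```python
def roll_circles(curve, radius_at, start_index=0):
    """Return (cx, cy, r) circles placed along an x-sorted curve."""
    circles = []
    i = start_index
    while i < len(curve):
        cx, cy = curve[i]
        r = radius_at[(cx, cy)]
        circles.append((cx, cy, r))
        # Next center: first curve point at or beyond the end of the horizontal radius.
        j = i + 1
        while j < len(curve) and curve[j][0] < cx + r:
            j += 1
        i = j
    return circles


def in_text_region(px, py, circles):
    """A pixel belongs to the text region if any circle covers it."""
    return any((px - cx) ** 2 + (py - cy) ** 2 <= r * r for cx, cy, r in circles)


# Toy curve: a horizontal line at y = 10 with a constant predicted radius of 4.
curve = [(x, 10) for x in range(0, 21)]
radius_at = {point: 4 for point in curve}
circles = roll_circles(curve, radius_at)
print(circles)
print(in_text_region(10, 13, circles), in_text_region(10, 20, circles))
```

Since each curve is processed from its own starting point without reference to the others, calls to such a function for different curves are independent and could be run in parallel, in line with the twelfth step.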
In conclusion, the text image, especially the dense text containing a plurality of text areas in the text image, can be rapidly detected, so that the text detection efficiency is improved, the requirements of practical application scenes are met, and the user experience is improved.
Fig. 3 shows a schematic structural diagram of a text recognition device according to an embodiment of the present application. As shown in fig. 3, the apparatus may include:
the first obtaining unit 301 is configured to obtain an inner contracted text curve probability map of a text image to be processed, where the inner contracted text curve probability map can represent probability values of pixels in the text image to be processed belonging to an inner contracted text curve;
a second obtaining unit 302, configured to obtain a starting point score map of the to-be-processed text image and a feature value of a corresponding starting point in the starting point score map, where the starting point score map can represent whether a pixel point in the to-be-processed text image belongs to a contracted text curve;
a text detecting unit 303, configured to determine a plurality of text regions corresponding to the to-be-processed text image based on the starting points corresponding to the starting point score maps, the feature values of the starting points, and a plurality of contracted text curves corresponding to the contracted text curve probability maps.
In a specific example of the scheme of the application, the first obtaining unit is further configured to input the text image to be processed into a preset model, and output the inner contracted text curve probability map from a first branch of the preset model.
In a specific example of the scheme of the application, the second obtaining unit is further configured to input the text image to be processed into a preset model, and output the starting point score map and the feature values corresponding to the starting points from a second branch of the preset model, where the starting point score map and the feature values corresponding to the starting points are output from different channels of the second branch.
In a specific example of the present application, a coordinate information determination unit is further included, wherein,
the coordinate information determining unit is used for determining the coordinate information of each starting point and determining the coordinate information of the contracted text curve; determining the corresponding relation between the starting point and the contracted text curve based on the coordinate information;
the text detection unit is further configured to determine a plurality of text regions corresponding to the text image to be processed based on the correspondence between the starting point and the contracted text curve and the characteristic value of the starting point.
In a specific example of the solution of the present application, the coordinate information determining unit is further configured to:
carrying out binarization processing on the starting point score map to obtain a starting point binary map;
and solving connected domains of the starting point binary map to obtain the coordinate information of each starting point corresponding to the starting point binary map.
In a specific example of the solution of the present application, the coordinate information determining unit is further configured to:
carrying out binarization processing on the contracted text curve probability map to obtain a contracted text curve binary map;
and solving connected domains of the contracted text curve binary map to obtain the coordinate information of the contracted text curve corresponding to the contracted text curve binary map.
In a specific example of the scheme of the present application, the text detection unit is further configured to:
determining an inner contracted text curve corresponding to each starting point based on the corresponding relation between the starting point and the inner contracted text curve;
drawing circles along the contracted text curve corresponding to the starting point, taking the starting point corresponding to the contracted text curve as a circle center and the characteristic value corresponding to the starting point as a radius, until the end point of the contracted text curve corresponding to the starting point is reached;
and determining text regions corresponding to the contracted text curves based on the regions where the circles determined by circle drawing processing are located, so as to obtain a plurality of text regions corresponding to the text image to be processed.
The functions of each module in each apparatus in the embodiments of the present invention may refer to the corresponding description in the above method, and are not described herein again.
Fig. 4 shows a block diagram of a structure of a text detection apparatus according to an embodiment of the present invention. As shown in fig. 4, the text detection apparatus includes: a memory 410 and a processor 420, the memory 410 having stored therein a computer program operable on the processor 420. The processor 420, when executing the computer program, implements the text detection method in the above-described embodiments. The number of the memory 410 and the processor 420 may be one or more.
The text detection device further includes:
and a communication interface 430, configured to communicate with an external device, and perform data interactive transmission.
If the memory 410, the processor 420 and the communication interface 430 are implemented independently, the memory 410, the processor 420 and the communication interface 430 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 4, but this does not indicate only one bus or one type of bus.
Optionally, in an implementation, if the memory 410, the processor 420, and the communication interface 430 are integrated on a chip, the memory 410, the processor 420, and the communication interface 430 may complete communication with each other through an internal interface.
Embodiments of the present invention provide a computer-readable storage medium, which stores a computer program, and when the program is executed by a processor, the computer program implements the method provided in the embodiments of the present application.
The embodiment of the present application further provides a chip, where the chip includes a processor, and is configured to call and execute the instruction stored in the memory from the memory, so that the communication device in which the chip is installed executes the method provided in the embodiment of the present application.
An embodiment of the present application further provides a chip, including: the system comprises an input interface, an output interface, a processor and a memory, wherein the input interface, the output interface, the processor and the memory are connected through an internal connection path, the processor is used for executing codes in the memory, and when the codes are executed, the processor is used for executing the method provided by the embodiment of the application.
It should be understood that the processor may be a Central Processing Unit (CPU), other general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or any conventional processor or the like. It is noted that the processor may be an advanced reduced instruction set machine (ARM) architecture supported processor.
Further, optionally, the memory may include a read-only memory and a random access memory, and may further include a nonvolatile random access memory. The memory may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The non-volatile memory may include a read-only memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an electrically Erasable EPROM (EEPROM), or a flash memory. Volatile memory can include Random Access Memory (RAM), which acts as external cache memory. By way of example, and not limitation, many forms of RAM are available. For example, Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), Enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct memory bus RAM (DR RAM).
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions according to the present application are generated in whole or in part when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium.
In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process. And the scope of the preferred embodiments of the present application includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. All or part of the steps of the method of the above embodiments may be implemented by hardware that is configured to be instructed to perform the relevant steps by a program, which may be stored in a computer-readable storage medium, and which, when executed, includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module may also be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. The storage medium may be a read-only memory, a magnetic or optical disk, or the like.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various changes or substitutions within the technical scope of the present application, and these should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (16)

1. A text detection method, comprising:
acquiring an inner contracted text curve probability map of a text image to be processed, wherein the inner contracted text curve probability map can represent the probability value that a pixel point in the text image to be processed belongs to an inner contracted text curve;
acquiring a starting point score map of the text image to be processed and a characteristic value of a corresponding starting point in the starting point score map, wherein the starting point score map can represent whether a pixel point in the text image to be processed belongs to a contracted text curve or not;
and determining a plurality of text regions corresponding to the text image to be processed based on each starting point corresponding to the starting point score map, the characteristic value of each starting point and a plurality of contracted text curves corresponding to the contracted text curve probability map.
2. The method of claim 1, further comprising:
and inputting the text image to be processed into a preset model, and outputting the inner contracted text curve probability map from a first branch of the preset model.
3. The method of claim 1, further comprising:
inputting the text image to be processed into a preset model, and outputting the starting point score map and the characteristic values corresponding to the starting points from a second branch of the preset model, wherein the starting point score map and the characteristic values corresponding to the starting points are output from different channels of the second branch.
4. The method of claim 1, 2 or 3, further comprising:
determining coordinate information of each starting point and determining coordinate information of the contracted text curve;
determining the corresponding relation between the starting point and the contracted text curve based on the coordinate information;
determining a plurality of text regions corresponding to the text image to be processed based on the starting points corresponding to the starting point score maps, the feature values of the starting points and the contracted text curves, wherein the determining comprises:
and determining a plurality of text regions corresponding to the text image to be processed based on the corresponding relation between the starting point and the contracted text curve and the characteristic value of the starting point.
5. The method of claim 4, wherein the determining the coordinate information of each starting point comprises:
carrying out binarization processing on the starting point score map to obtain a starting point binary map;
and solving connected domains of the starting point binary map to obtain the coordinate information of each starting point corresponding to the starting point binary map.
6. The method of claim 4, wherein the determining the coordinate information of the contracted text curve comprises:
carrying out binarization processing on the contracted text curve probability map to obtain a contracted text curve binary map;
and solving connected domains of the contracted text curve binary map to obtain the coordinate information of the contracted text curve corresponding to the contracted text curve binary map.
7. The method according to claim 4, wherein the determining a plurality of text regions corresponding to the text image to be processed based on the correspondence between the starting point and the contracted text curve and the characteristic value of the starting point comprises:
determining an inner contracted text curve corresponding to each starting point based on the corresponding relation between the starting point and the inner contracted text curve;
drawing circles along the contracted text curve corresponding to the starting point, taking the starting point corresponding to the contracted text curve as a circle center and the characteristic value corresponding to the starting point as a radius, until the end point of the contracted text curve corresponding to the starting point is reached;
and determining text regions corresponding to the contracted text curves based on the regions where the circles determined by circle drawing processing are located, so as to obtain a plurality of text regions corresponding to the text image to be processed.
8. A text detection apparatus, comprising:
a first obtaining unit, configured to obtain an inner contracted text curve probability map of a text image to be processed, wherein the inner contracted text curve probability map can represent the probability value that a pixel point in the text image to be processed belongs to an inner contracted text curve;
a second obtaining unit, configured to obtain a starting point score map of the text image to be processed and a characteristic value of a corresponding starting point in the starting point score map, wherein the starting point score map can represent whether a pixel point in the text image to be processed belongs to a contracted text curve or not;
and a text detection unit, configured to determine a plurality of text regions corresponding to the text image to be processed based on the starting points corresponding to the starting point score map, the characteristic values of the starting points and a plurality of contracted text curves corresponding to the contracted text curve probability map.
9. The apparatus according to claim 8, wherein the first obtaining unit is further configured to input the text image to be processed into a preset model, and output the inner contracted text curve probability map from a first branch of the preset model.
10. The apparatus according to claim 8, wherein the second obtaining unit is further configured to input the text image to be processed into a preset model, and output the starting point score map and the feature values corresponding to the starting points from a second branch of the preset model, where the starting point score map and the feature values corresponding to the starting points are output from different channels of the second branch.
11. The apparatus according to claim 8, 9 or 10, further comprising a coordinate information determination unit, wherein,
the coordinate information determining unit is used for determining the coordinate information of each starting point and determining the coordinate information of the contracted text curve; determining the corresponding relation between the starting point and the contracted text curve based on the coordinate information;
the text detection unit is further configured to determine a plurality of text regions corresponding to the text image to be processed based on the correspondence between the starting point and the contracted text curve and the characteristic value of the starting point.
12. The apparatus of claim 11, wherein the coordinate information determining unit is further configured to:
binarize the starting point score map to obtain a starting point binary map;
and compute the connected components of the starting point binary map to obtain the coordinate information of each starting point corresponding to the starting point binary map.
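As a non-limiting illustration of the binarization and connected-component steps in claim 12, the following minimal sketch thresholds a score map and reports one coordinate per connected component. The threshold of 0.5, the use of 4-connectivity, and the choice of the component centroid as the starting point coordinate are assumptions for this example and are not specified in the claim.

```python
from collections import deque


def binarize(score_map, threshold=0.5):
    """Convert a 2-D score map (list of rows) into a 0/1 binary map."""
    return [[1 if v >= threshold else 0 for v in row] for row in score_map]


def connected_components(binary_map):
    """Return the components of a binary map as lists of (row, col) pixels."""
    height = len(binary_map)
    width = len(binary_map[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if binary_map[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first search over 4-connected foreground neighbours.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary_map[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def starting_point_coordinates(score_map, threshold=0.5):
    """Return one (row, col) centroid per connected component."""
    coords = []
    for pixels in connected_components(binarize(score_map, threshold)):
        mean_row = sum(y for y, _ in pixels) / len(pixels)
        mean_col = sum(x for _, x in pixels) / len(pixels)
        coords.append((mean_row, mean_col))
    return coords


toy_scores = [
    [0.9, 0.8, 0.1, 0.0],
    [0.1, 0.0, 0.0, 0.7],
]
coords = starting_point_coordinates(toy_scores)
```

The same two helper functions can be applied unchanged to a contracted text curve probability map, in which case each component's pixel list, rather than its centroid, serves as the curve's coordinate information.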
13. The apparatus of claim 11, wherein the coordinate information determining unit is further configured to:
binarize the contracted text curve probability map to obtain a contracted text curve binary map;
and compute the connected components of the contracted text curve binary map to obtain the coordinate information of each contracted text curve corresponding to the contracted text curve binary map.
14. The apparatus of claim 11, wherein the text detection unit is further configured to:
determine the contracted text curve corresponding to each starting point based on the correspondence between the starting points and the contracted text curves;
draw circles along the contracted text curve corresponding to each starting point, beginning with the starting point as the circle center and using the characteristic value of the starting point as the radius, until the end point of that contracted text curve is reached;
and determine the text region corresponding to each contracted text curve based on the regions covered by the circles drawn, so as to obtain the plurality of text regions corresponding to the text image to be processed.
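As a non-limiting illustration of the circle-drawing step in claim 14, the sketch below stamps a filled disc at every pixel of a contracted text curve and takes the union of the discs as the text region. It assumes the curve is given as a list of integer (row, col) pixels and that the characteristic value is a radius in pixels; the function name `draw_text_region` and the mask representation are hypothetical. Because the region is a union of discs, the order in which the curve pixels are visited does not change the result.

```python
def draw_text_region(curve_pixels, radius, height, width):
    """Return a 0/1 mask covering all discs of `radius` along the curve.

    curve_pixels: list of integer (row, col) coordinates on the curve,
        from the starting point to the end point.
    radius: characteristic value of the starting point, in pixels.
    """
    mask = [[0] * width for _ in range(height)]
    radius_sq = radius * radius
    reach = int(radius)  # bounding-box half-size for each disc
    for cy, cx in curve_pixels:
        for y in range(max(0, cy - reach), min(height, cy + reach + 1)):
            for x in range(max(0, cx - reach), min(width, cx + reach + 1)):
                if (y - cy) ** 2 + (x - cx) ** 2 <= radius_sq:
                    mask[y][x] = 1
    return mask


# A short horizontal curve in a 5 x 5 image, expanded with radius 1.
region = draw_text_region([(2, 1), (2, 2), (2, 3)], 1.0, 5, 5)
```

The outline of the resulting mask can then be taken as the boundary of the text region for that contracted text curve.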
15. Text detection equipment, comprising a processor and a memory, the memory storing instructions that are loaded and executed by the processor to implement the method of any one of claims 1 to 7.
16. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 7.
CN202110020727.6A 2021-01-08 2021-01-08 Text detection method, device, equipment and storage medium Pending CN112348021A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110020727.6A CN112348021A (en) 2021-01-08 2021-01-08 Text detection method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110020727.6A CN112348021A (en) 2021-01-08 2021-01-08 Text detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112348021A (en) 2021-02-09

Family

ID=74427474

Family Applications (1)

Application Number Priority Date Filing Date Title
CN202110020727.6A Pending CN112348021A (en) 2021-01-08 2021-01-08 Text detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112348021A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112990204A (en) * 2021-05-11 2021-06-18 北京世纪好未来教育科技有限公司 Target detection method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009122649A1 (en) * 2008-04-03 2009-10-08 富士フイルム株式会社 Equipment, method, and program for detecting three-dimensional peritoneal cavity region
CN108805131A (en) * 2018-05-22 2018-11-13 北京旷视科技有限公司 Text line detection method, apparatus and system
CN110598708A (en) * 2019-08-08 2019-12-20 广东工业大学 Streetscape text target identification and detection method
CN112101355A (en) * 2020-09-25 2020-12-18 北京百度网讯科技有限公司 Method and device for detecting text in image, electronic equipment and computer medium

Similar Documents

Publication Publication Date Title
CN110738207B (en) Character detection method for fusing character area edge information in character image
CN111814794B (en) Text detection method and device, electronic equipment and storage medium
CN108446694B (en) Target detection method and device
CN112767418B (en) Mirror image segmentation method based on depth perception
CN108764039B (en) Neural network, building extraction method of remote sensing image, medium and computing equipment
CN110991560B (en) Target detection method and system combining context information
CN110503682B (en) Rectangular control identification method and device, terminal and storage medium
CN108765315B (en) Image completion method and device, computer equipment and storage medium
CN112329761A (en) Text detection method, device, equipment and storage medium
WO2019209751A1 (en) Superpixel merging
CN112132164B (en) Target detection method, system, computer device and storage medium
CN111414823B (en) Human body characteristic point detection method and device, electronic equipment and storage medium
CN110796130A (en) Method, device and computer storage medium for character recognition
CN110046623B (en) Image feature point extraction method and camera
CN113269280B (en) Text detection method and device, electronic equipment and computer readable storage medium
CN112348021A (en) Text detection method, device, equipment and storage medium
CN111095295B (en) Object detection method and device
CN111160358A (en) Image binarization method, device, equipment and medium
CN113129298A (en) Definition recognition method of text image
CN113033593A (en) Text detection training method and device based on deep learning
CN110544256B (en) Deep learning image segmentation method and device based on sparse features
KR20150094108A (en) Method for generating saliency map based background location and medium for recording the same
CN116128792A (en) Image processing method and related equipment
CN114648751A (en) Method, device, terminal and storage medium for processing video subtitles
CN112668582B (en) Image recognition method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210209