CN102831579B - Text enhancement method and device, text extraction method and device - Google Patents
- Publication number
- CN102831579B (application CN201110172095.1A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Image Analysis (AREA)
- Character Input (AREA)
Abstract
The embodiment of the invention discloses a text enhancement method, a text enhancement device, a text extraction method and a text extraction device. The text enhancement method comprises the following steps: acquiring an original image including a line of text; according to the direct difference degree and indirect difference degree between any original pixel in the original image and each neighborhood pixel in a neighborhood set of the original image, carrying out stroke two-dimensional filtering on the original brightness value or/and color value of each original pixel so as to obtain an updated brightness value or/and color value of the filtered original image, wherein the range of the neighborhood set is a square with a side length of w and the original pixel as a center, and w is less than the height of the original image; and respectively replacing corresponding original brightness value or/and color value with the updated brightness value or/and color value after filtering so as to generate a text enhancement image corresponding to the original image. Through the embodiment of the invention, texts in original images can be enhanced, so that the subsequent text extraction on the text enhancement image is more precise and accurate.
Description
Technical Field
The present invention generally relates to the field of image processing technologies, and in particular, to a text enhancement method and apparatus, and a text extraction method and apparatus.
Background
Videos and images often contain text, for example a caption giving the time and place at which a video clip was recorded, or a description of an image. Because this text is closely related to the video or image content, extracting the text from a video or image has become an important technology.
In the prior art, text can be extracted from an image or a video on the basis of binarization, edge color clustering, and detection technologies.
However, the effect of prior-art text extraction suffers when the video or image contains excessive noise and is therefore blurred, when illumination changes within a section of video blur the boundary between the text and the background, or when the text content itself is not clear enough.
Therefore, how to enhance the text in an original image or video so as to improve the effect of subsequent text extraction has become a problem to be solved.
Disclosure of Invention
In view of this, embodiments of the present invention provide a text enhancement method and apparatus, and a text extraction method and apparatus, which can perform enhancement processing on a text in an original image including a line of text, so that the text in the original image is more obvious, and further, the effect of text extraction can be optimized.
According to an aspect of an embodiment of the present invention, there is provided a text enhancement method, including: acquiring an original image comprising a line of text; according to the direct difference and the indirect difference from any original pixel point in the original image to each neighborhood pixel point in a neighborhood set of the original image, performing stroke two-dimensional filtering on the original brightness value or/and the color value of each original pixel point to obtain a filtered updated brightness value or/and color value of the original image, wherein the range of the neighborhood set is a square with the original pixel point as a center and the side length of the square as w, and the w is smaller than the height of the original image; and replacing the corresponding original brightness value or/and color value with the filtered updated brightness value or/and color value respectively to generate a text enhanced image corresponding to the original image.
According to another aspect of the embodiments of the present invention, there is provided a text enhancement apparatus including: an acquisition module, configured to acquire an original image comprising a line of text; a filtering module, configured to perform stroke two-dimensional filtering on an original brightness value or/and a color value of each original pixel point according to a direct difference and an indirect difference from any original pixel point in the original image to each neighborhood pixel point in its neighborhood set, to obtain a filtered updated brightness value or/and color value of the original image, wherein the range of the neighborhood set is a square centered on the original pixel point with side length w, and w is smaller than the height of the original image; and a replacing module, configured to replace the corresponding original brightness value or/and color value with the filtered updated brightness value or/and color value, respectively, so as to generate a text enhanced image corresponding to the original image.
According to another aspect of the embodiments of the present invention, there is provided a text extraction method, including: acquiring an original image comprising a line of text; according to the direct difference and the indirect difference from any original pixel point in the original image to each neighborhood pixel point in the neighborhood set, performing stroke two-dimensional filtering on the original brightness value or/and the color value of each original pixel point to obtain the filtered updated brightness value or/and the filtered color value of the original image; the range of the neighborhood set is a square which takes the original pixel point as the center and has the side length of w; the w is less than the height of the original image; replacing the corresponding original brightness value or/and color value with the filtered updated brightness value or/and color value respectively to generate a text enhanced image corresponding to the original image; and extracting the text in the text enhanced image.
According to still another aspect of the embodiments of the present invention, there is provided a text extraction apparatus including: an acquisition module, configured to acquire an original image comprising a line of text; a filtering module, configured to perform stroke two-dimensional filtering on an original brightness value or/and a color value of each original pixel point according to a direct difference and an indirect difference from any original pixel point in the original image to each neighborhood pixel point in its neighborhood set, to obtain a filtered updated brightness value or/and color value of the original image, wherein the range of the neighborhood set is a square centered on the original pixel point with side length w, and w is smaller than the height of the original image; a replacing module, configured to replace the corresponding original brightness value or/and color value with the filtered updated brightness value or/and color value, respectively, so as to generate a text enhanced image corresponding to the original image; and an extraction module, configured to extract the text in the text enhanced image.
In addition, according to another aspect of the present invention, there is also provided a storage medium. The storage medium includes a program code readable by a machine, which, when executed on an information processing apparatus, causes the information processing apparatus to execute the above-described text enhancement method and text extraction method according to the present invention.
Further, according to still another aspect of the present invention, there is provided a program product. The program product comprises machine-executable instructions that, when executed on an information processing apparatus, cause the information processing apparatus to perform the text enhancement method and text extraction method described above in accordance with the present invention.
According to the text enhancement method provided by the embodiment of the invention, the text strokes in the obtained text enhanced image are enhanced: the consistency of the pixels inside the strokes is increased and the difference between the text and the background is deepened. In other words, the text included in the original image is enhanced, so that subsequent text extraction from the text enhanced image is more precise and accurate.
According to the text extraction method provided by the embodiment of the invention, text extraction is performed on the obtained text enhanced image, so the extraction result is more precise and accurate, and extraction efficiency improves because the complexity of text extraction is reduced.
Additional aspects of embodiments of the present invention are set forth in the description section that follows, wherein the detailed description is presented to fully disclose preferred embodiments of the present invention and not to limit it.
Drawings
The above and other objects and advantages of embodiments of the present invention will be further described with reference to the accompanying drawings in conjunction with the specific embodiments. In the drawings, the same or corresponding technical features or components will be denoted by the same or corresponding reference numerals.
FIG. 1 is a flowchart illustrating a first text enhancement method provided as an embodiment of the present invention;
FIG. 2 is a flowchart illustrating step S102 in the first text enhancement method;
FIG. 3 is a flowchart illustrating a second text enhancement method provided as an embodiment of the present invention;
FIG. 4 is a flowchart illustrating step S302 in the second text enhancement method;
FIG. 5 is another flowchart illustrating step S302 in the second text enhancement method;
FIG. 6 is still another flowchart illustrating step S302 in the second text enhancement method;
FIG. 7 is yet another flowchart illustrating step S302 in the second text enhancement method;
FIG. 8 is a flowchart illustrating step S304 in the second text enhancement method;
FIG. 9 is a schematic diagram illustrating a first text enhancement apparatus provided as an embodiment of the present invention;
FIG. 10 is a schematic diagram illustrating the filtering module 902 in the first text enhancement apparatus;
FIG. 11 is a schematic diagram illustrating a second text enhancement apparatus provided as an embodiment of the present invention;
FIG. 12 is a schematic diagram illustrating the stroke polarity estimation module 1101 in the second text enhancement apparatus;
FIG. 13 is another schematic diagram illustrating the stroke polarity estimation module 1101 in the second text enhancement apparatus;
FIG. 14 is a further schematic diagram illustrating the stroke polarity estimation module 1101 in the second text enhancement apparatus;
FIG. 15 is yet another schematic diagram illustrating the stroke polarity estimation module 1101 in the second text enhancement apparatus;
FIG. 16 is a schematic diagram illustrating the judgment module 1102 in the second text enhancement apparatus;
FIG. 17 is a flowchart illustrating a text extraction method provided as an embodiment of the present invention;
FIG. 18 is a schematic diagram illustrating a text extraction apparatus provided as an embodiment of the present invention;
FIG. 19 is a block diagram illustrating an exemplary configuration of a personal computer as an information processing apparatus employed in the embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described below with reference to the drawings.
Specifically, referring to fig. 1, an embodiment of the present invention provides a first text enhancement method, which specifically includes:
S101: An original image comprising a line of text is acquired.
In the embodiment of the present invention, the text enhancement refers to performing enhancement processing on text in an original image including a line of text. Enhancement here can be understood as deepening the edges of the text, or highlighting the text from the background, etc. When the embodiment of the invention is applied, the stroke appearance (such as brightness or color and the like) and the shape (such as the text is in a stripe shape) information of the text are considered, so that the effects of enhancing the consistency of pixels inside the strokes and deepening the difference between the text and the background are achieved.
S102: according to the direct difference and the indirect difference from any original pixel point in the original image to each neighborhood pixel point in a neighborhood set of the original image, performing stroke two-dimensional filtering on the original brightness value or/and the color value of each original pixel point to obtain the filtered updated brightness value or/and color value of the original image, wherein the range of the neighborhood set is a square with the original pixel point as the center and the side length of the square as w, and the w is smaller than the height of the original image.
The direct difference in this step represents the direct difference in appearance, such as color or brightness, between an original pixel point and each neighborhood pixel point in its neighborhood set; the indirect difference represents the gradient modulus over the pixel points passed on the way from the original pixel point to each neighborhood pixel point. Stroke two-dimensional filtering is performed on the original brightness value or/and color value of each original pixel point using the direct difference and the indirect difference, to obtain the filtered updated brightness value or/and color value of the original image. The neighborhood set may be obtained by selecting a square of side length w centered on the original pixel point, where w is smaller than the height of the original image and is preferably one eighth of that height.
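The neighborhood set described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `neighborhood`, the integer rounding of w, and the clipping of the window at the image border are all assumptions that the text does not specify.

```python
def neighborhood(row, col, height, width, w=None):
    """Return the (r, c) coordinates of the square window centred on (row, col).

    w defaults to one eighth of the image height, the preferred value in the
    text. Clipping at the image border is an assumption of this sketch.
    """
    if w is None:
        w = max(1, height // 8)
    half = w // 2  # for even w the window spans w + 1 pixels so it stays centred
    coords = []
    for r in range(max(0, row - half), min(height, row + half + 1)):
        for c in range(max(0, col - half), min(width, col + half + 1)):
            coords.append((r, c))
    return coords
```

For a text line image 24 pixels high, the default gives w = 3, that is, a 3 x 3 window around each interior pixel.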
In practical applications, referring to fig. 2, the S102 may specifically include:
S201: Perform algebraic subtraction between the original brightness value or/and color value of the original pixel point and that of each neighborhood pixel point to obtain the direct difference.
In this example, $D_1(i, j)$ denotes the direct difference between pixels i and j. The direct difference of the luminance values can be calculated using the following formula (1):
wherein f' (i) represents the mean value of the target pixel neighborhood brightness, namely, the neighborhood pixel mean value is used for replacing the target pixel value to calculate the direct difference; σ (i) represents the local standard deviation of luminance around pixel i, which can serve as a normalization.
The direct difference in luminance values of pixels i and j can also be calculated using the following formula (2):
where f (i) represents the target pixel neighborhood luminance.
It should be noted that, when calculating the direct difference degree of the color values of the pixels i and j, the direct difference degree may be calculated by using equation (3) or equation (4):
n in the formulas (3) and (4) represents R, G, and B channels of color. It should be noted that the above formula for calculating the direct difference degree is only an example, and those skilled in the art can make adaptive modifications to the above formula.
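Because formulas (1) to (4) are not reproduced in this text, the following is only a hedged sketch of one plausible luminance direct difference. It uses the ingredients the description names (the neighborhood mean f'(i), the local standard deviation σ(i) as normalization, and the preset parameter a), but the exact combination, the absolute value, and the small `eps` guard against division by zero are assumptions of this sketch.

```python
import math


def local_mean_std(image, row, col, w):
    """Mean and standard deviation of luminance in the w-sized window around (row, col)."""
    height, width = len(image), len(image[0])
    half = w // 2
    values = [
        image[r][c]
        for r in range(max(0, row - half), min(height, row + half + 1))
        for c in range(max(0, col - half), min(width, col + half + 1))
    ]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


def direct_difference(image, i, j, w, a=1.0, eps=1e-6):
    """Assumed form: |f'(i) - f(j)| / (a * sigma(i)); not the patent's exact formula."""
    mean_i, sigma_i = local_mean_std(image, i[0], i[1], w)
    return abs(mean_i - image[j[0]][j[1]]) / (a * (sigma_i + eps))
```

In this sketch a larger a lowers the difference and therefore raises the filter weight, which smooths more strongly.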
S202: Acquire the indirect difference according to the gradient modulus from the original pixel point to each neighborhood pixel point of the neighborhood set.
In the present embodiment, $D_2(i, j)$ denotes the indirect difference between the luminance values of pixels i and j, which can be calculated using formula (5):
where the gradient term in formula (5) represents the gradient modulus at pixel i along the direction from i to j. The parameters b in formula (5) and a in formula (1) are preset; their monotonicity is consistent, and they control the degree of filtering smoothness.
Of course, in practical applications, the difference between the maximum and minimum luminance gradient values over the pixels passed from i to j can also be used in place of the maximum gradient value in formula (5); the calculation is then as shown in formula (6):
where Max represents the maximum value of the gradient values and min represents the minimum value of the gradient values.
And the indirect difference degree of the color values of the pixels i and j can be calculated by adopting the formula (7) and the formula (8), respectively:
where n represents the R, G and B channels of a color. It should be noted that the above formula for calculating the indirect difference is only an example, and those skilled in the art can make adaptive modifications to the above formula.
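The idea behind the luminance indirect difference, taking the largest gradient met on the way from i to j, or the spread between the largest and smallest, can be sketched as below. Formulas (5) to (8) are not reproduced in this text, so the straight-line sampling in `path_pixels`, the finite-difference gradient, and the way the parameter b enters (as a divisor) are all assumptions of this sketch.

```python
def path_pixels(i, j):
    """Pixels on the straight line from i to j (simple linear interpolation)."""
    (r0, c0), (r1, c1) = i, j
    steps = max(abs(r1 - r0), abs(c1 - c0))
    if steps == 0:
        return [i]
    return [
        (round(r0 + (r1 - r0) * t / steps), round(c0 + (c1 - c0) * t / steps))
        for t in range(steps + 1)
    ]


def indirect_difference(image, i, j, b=1.0, use_range=False):
    """Max gradient along the path (or max minus min if use_range), scaled by 1/b."""
    pts = path_pixels(i, j)
    grads = [
        abs(image[p[0]][p[1]] - image[q[0]][q[1]])
        for p, q in zip(pts, pts[1:])
    ]
    if not grads:
        return 0.0
    value = max(grads) - min(grads) if use_range else max(grads)
    return value / b
```

Two pixels on the same side of a stroke edge get a small indirect difference even when they are far apart, while any path that crosses an edge is penalized by the edge's gradient.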
S203: Calculate the weight value of each neighborhood pixel point with respect to the brightness value or/and the color value of the original pixel point according to the direct difference and the indirect difference.
After the direct and indirect differences are obtained, the weight value can be calculated using formula (9):
$$w(i,j)\ \text{or}\ w_n(i,j) = \exp\{-[D_1(i,j) + D_2(i,j)]\} \tag{9}$$
where $D_1(i, j)$ represents the direct difference in brightness or/and color values of pixels i and j, $D_2(i, j)$ represents the indirect difference in brightness or/and color values of pixels i and j, $w(i, j)$ represents the luminance weight value, and $w_n(i, j)$ represents the color weight value on channel n.
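Formula (9) translates directly into code. The function name `weight` is chosen here for illustration only.

```python
import math


def weight(d1, d2):
    """Weight of neighbor j on pixel i per formula (9): exp(-(D1 + D2))."""
    return math.exp(-(d1 + d2))
```

For non-negative differences the weight lies in (0, 1]: a neighbor identical to the target pixel and not separated from it by any edge gets weight 1, and the weight decays exponentially as either difference grows.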
S204: Calculate the updated brightness value of the original pixel point using the stroke two-dimensional filtering formula (10).
where $N(i)$ represents the neighborhood set of pixel point i; $w(i, j)$ represents the weight of neighborhood pixel point j on the brightness value of original pixel point i; and $f(j)$ is the brightness value of pixel point j in the neighborhood set.
S205: Calculate the updated color value of the original pixel point using the stroke two-dimensional filtering formula (11).
where $w_n(i, j)$ represents the weight of neighborhood pixel point j on the color value of original pixel point i on channel n, and $f_n(j)$ is the color value of pixel point j in the neighborhood set on channel n.
It should be noted that, since steps S204 and S205 calculate the updated luminance value and the updated color value respectively, in practical applications either step may be executed alone or both may be executed together; the embodiment of the present invention can be implemented in either case.
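The luminance filtering step can be sketched as a weighted average over the neighborhood set. Formula (10) is not reproduced in this text, so the normalized form used here (the sum of w(i, j)·f(j) divided by the sum of w(i, j)) is an assumption, as are the function name `stroke_filter` and the convention that the caller supplies the two difference measures `d1` and `d2` as functions of `(image, i, j)`.

```python
import math


def stroke_filter(image, w, d1, d2):
    """Return a filtered copy of a 2D luminance image (list of lists).

    Each output pixel is the weighted mean of its w-sized neighborhood, with
    weights exp(-(D1 + D2)) as in formula (9). The normalization by the sum
    of weights is an assumption of this sketch.
    """
    height, width = len(image), len(image[0])
    half = w // 2
    out = [row[:] for row in image]
    for r in range(height):
        for c in range(width):
            num = 0.0
            den = 0.0
            for rr in range(max(0, r - half), min(height, r + half + 1)):
                for cc in range(max(0, c - half), min(width, c + half + 1)):
                    wt = math.exp(-(d1(image, (r, c), (rr, cc))
                                    + d2(image, (r, c), (rr, cc))))
                    num += wt * image[rr][cc]
                    den += wt
            out[r][c] = num / den  # den > 0: the pixel itself is in the window
    return out
```

Writing the result into a separate image rather than in place keeps every weight based on original values, and the returned image then stands in for the original values. The same loop could be run per channel for color values.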
S103: and replacing the corresponding original brightness value or/and color value with the filtered updated brightness value or/and color value respectively to generate a text enhanced image corresponding to the original image.
After the filtered updated brightness value or/and color value is obtained, the original brightness value or/and color value is replaced by the updated brightness value or/and color value respectively, so that text strokes in pixel points in the original image are enhanced after replacement, consistency of pixels inside the strokes can be enhanced, and the difference between the text and the background is deepened.
In view of the above problem, an embodiment of the present invention also provides a corresponding solution, and specifically, referring to fig. 3, an embodiment of the present invention provides another text enhancement method, which specifically may include:
S301: An original image comprising a line of text is acquired.
S302: and estimating the stroke polarity of the text in the original image, wherein the polarity represents the size relation of the brightness value or/and the color value between the pixel point positioned in the stroke area and the pixel point positioned outside the stroke area.
In practical application, because stroke enhancement of a text is mainly based on a filtering technology, that is, a target pixel value in a stroke is enhanced by using a surrounding pixel value outside the stroke, noise pixel points surrounding the target pixel in the stroke have a negative influence on the stroke enhancement effect, and the influence is particularly obvious when thinner strokes or stroke intervals are processed. To prevent this degradation, a stroke polarity estimation scheme is introduced in this embodiment. The stroke polarity estimated in this step may represent a magnitude relationship of a luminance value or/and a color value between a pixel point located inside the stroke region and a pixel point located outside the stroke region.
Specifically, in a case where the polarity represents a relationship between luminance values of pixel points inside the stroke region and pixel points outside the stroke region, referring to fig. 4, the step of estimating the stroke polarity of the text in the original image includes:
S401: Calculate the stroke response strength in the horizontal direction, the vertical direction and the two diagonal directions using formula (12):
wherein w is one eighth of the height of the original image, and f (i) represents the brightness value of the pixel point i. The four stroke response strengths in the horizontal direction, the vertical direction and the two diagonal directions can be obtained in the step.
S402: Judge whether the maximum of the four calculated stroke response strengths satisfies the following two conditions: $[f(i) - f(l)]$ has the same polarity as $[f(i) - f(k)]$, and the stroke response strength is greater than a preset threshold. If so, go to step S403; if not, go to step S404.
In a specific application, because the brightness or color values of the pixel points inside a text stroke usually contrast with those of the background pixel points, if $[f(i) - f(l)]$ and $[f(i) - f(k)]$ have the same polarity, pixel point i is most likely a pixel point inside a stroke; this step finds, among all pixel points of the original image, those likely to lie inside a stroke. Having the same polarity means that the brightness value or/and color value of pixel point i is greater than those of both pixel l and pixel k, or is less than or equal to those of both; that is, $[f(i) - f(l)]$ and $[f(i) - f(k)]$ are both greater than zero, or both less than or equal to zero. The threshold of the stroke response strength can be adjusted according to actual requirements, so the invention does not limit the selection of the threshold.
S403: Determine the estimated stroke polarity of the line of text according to the polarity of $[f(i) - f(l)]$ or $[f(i) - f(k)]$.
When pixel point i satisfies both conditions, the estimated stroke polarity of the text is determined as $p(i)$ according to the polarity of $[f(i) - f(l)]$ or $[f(i) - f(k)]$. The value used to encode the polarity can be chosen arbitrarily, as long as pixel points inside and outside the stroke can be distinguished. For example, $p(i)$ is set to 0 if the luminance value of the strokes in the text is lower than that of the background pixel points; correspondingly, $p(i) = 1$ indicates that the luminance value of the strokes is higher than or equal to that of the background pixel points.
S404: Select the calculated stroke response strengths in descending order and execute step S402 on each in turn.
When the maximum stroke response strength does not satisfy the two conditions, the second-largest stroke response strength is selected and the judgment of step S402 is executed again, and so on in descending order of stroke response strength, until a stroke response strength satisfying both conditions is found; if none of the four stroke response strengths satisfies both conditions, pixel point i is treated as a non-stroke pixel point.
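The selection logic of steps S402 to S404 can be sketched as follows. Formula (12) is not reproduced in this text, so the stroke response strengths and the luminance values f(l) and f(k) of the reference pixels are taken as given inputs; the function names and the tuple layout are hypothetical. Ties at zero difference are treated as "less than or equal", following the polarity definition in step S402.

```python
def same_polarity(f_i, f_l, f_k):
    """True if f(i)-f(l) and f(i)-f(k) are both > 0 or both <= 0."""
    return ((f_i - f_l) > 0) == ((f_i - f_k) > 0)


def estimate_polarity(candidates, threshold):
    """Pick the stroke polarity for one pixel.

    candidates: one (response, f_i, f_l, f_k) tuple per direction.
    Returns 1 (stroke brighter than background), 0 (stroke darker),
    or None when no direction qualifies (non-stroke pixel).
    """
    # Try the directions from the strongest response to the weakest.
    for response, f_i, f_l, f_k in sorted(candidates, key=lambda c: c[0], reverse=True):
        if response > threshold and same_polarity(f_i, f_l, f_k):
            return 1 if (f_i - f_l) > 0 else 0
    return None
```

For example, with a threshold of 20, a pixel whose strongest direction has neighbors on opposite sides of its own luminance is rejected in that direction, and the next-strongest direction decides the polarity.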
Specifically, in a case where the polarity represents a relationship between luminance values of pixel points inside the stroke region and pixel points outside the stroke region, another embodiment also exists, and referring to fig. 5, the step of estimating the polarity of the stroke of the text in the original image may specifically include:
S501: Calculate the stroke response strength of each original pixel point in the original image in one direction using formula (12); the one direction is any one of the horizontal direction, the vertical direction, and the two diagonal directions.
In the step, the stroke response strength is calculated in the horizontal direction, the vertical direction and any one direction of two diagonal directions.
S502: Judge whether the stroke response strength simultaneously satisfies the following two conditions: $[f(i) - f(l)]$ has the same polarity as $[f(i) - f(k)]$, and the stroke response strength is greater than a preset threshold. If so, go to step S503.
It is then judged whether the calculated stroke response strength satisfies both conditions. If not, no processing is performed on it, and the stroke response strength of each original pixel point in the original image is calculated in another direction that has not yet been calculated.
S503: determining the initial polarity of the original pixel point i according to the polarities of [ f (i) -f (l) ] or [ f (i) -f (k) ].
If the calculated stroke response strength meets the two conditions, determining the initial polarity of the original pixel point i according to the polarities of [ f (i) -f (l) ] or [ f (i) -f (k) ]. The initial polarity determined here should be the same as the polarity of [ f (i) -f (l) ], or [ f (i) -f (k) ].
S504: judging whether the stroke response intensity in the four directions is completely calculated, if so, executing the step S505; if not, the step S501 is executed.
Then, it is determined whether the stroke response strengths in the four directions including the horizontal direction, the vertical direction and the two diagonal directions are all calculated, and if not, any direction in which the stroke response strength is not calculated is selected and the step S501 is executed.
S505: and determining the initial polarity corresponding to the maximum stroke response strength in the four directions as the estimated stroke polarity of the line of text.
If the stroke response strengths in the four directions are all calculated, selecting the initial polarity corresponding to the maximum stroke response strength from the stroke response strengths meeting the two conditions to determine the estimated stroke polarity of the text, namely the estimated stroke polarity of the text in the original image is the same as the initial polarity corresponding to the maximum stroke response strength.
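The direction-by-direction flow of steps S501 to S505 can be sketched as a single pass that records an initial polarity wherever both conditions hold and keeps the one with the largest response. As formula (12) is not reproduced in this text, the responses and reference luminance values are given inputs; the names `initial_polarity` and `estimate_polarity_by_direction` and the dictionary layout are hypothetical.

```python
def initial_polarity(response, f_i, f_l, f_k, threshold):
    """Initial polarity for one direction, or None if the two conditions fail."""
    d_l, d_k = f_i - f_l, f_i - f_k
    if response > threshold and (d_l > 0) == (d_k > 0):
        return 1 if d_l > 0 else 0
    return None


def estimate_polarity_by_direction(measurements, threshold):
    """measurements: {direction: (response, f_i, f_l, f_k)}.

    Returns (polarity, direction) for the qualifying direction with the
    largest response, or None if no direction qualifies.
    """
    best = None
    for direction, (response, f_i, f_l, f_k) in measurements.items():
        polarity = initial_polarity(response, f_i, f_l, f_k, threshold)
        if polarity is None:
            continue  # no initial polarity is set in this direction
        if best is None or response > best[0]:
            best = (response, polarity, direction)
    return None if best is None else (best[1], best[2])
```

This variant visits every direction once instead of sorting first; under the stated conditions it selects the same polarity as the descending-order search of the previous embodiment.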
Specifically, in a case where the polarity represents a color value size relationship between a pixel point inside the stroke region and a pixel point outside the stroke region, the step of estimating the stroke polarity of the text in the original image may specifically include, as shown in fig. 6:
S601: Calculate the stroke response strength of each channel in the horizontal direction, the vertical direction and the two diagonal directions using the following formula:
where w is one eighth of the height of the original image and $f_n(i)$ represents the color value of pixel point i on channel n. For example, if the channels n are the R channel, the G channel, and the B channel, the stroke response strength is the sum of the stroke response strengths on the R, G, and B channels.
S602: Judge whether the maximum of the four stroke response strengths of the channels satisfies the following two conditions: on channel n, $[f_n(i) - f_n(l)]$ and $[f_n(i) - f_n(k)]$ have the same polarity, and the stroke response strength is greater than a preset threshold. If so, go to step S603; if not, go to step S604.
In this step, the condition that $[f_n(i) - f_n(l)]$ and $[f_n(i) - f_n(k)]$ keep the same polarity needs to be satisfied on any one channel.
S603: Determine the estimated stroke polarity of the line of text according to the polarity of $[f_n(i) - f_n(l)]$ or $[f_n(i) - f_n(k)]$.
S604: Select the calculated stroke response strengths in descending order and execute step S602 on each in turn.
When the maximum stroke response strength does not satisfy the two conditions, the second-largest, third-largest and fourth-largest stroke response strengths are selected in turn to execute step S602, until a stroke response strength satisfies both conditions; if none of the four stroke response strengths satisfies both conditions, pixel point i is treated as a non-stroke pixel point. For example, when the second-largest stroke response strength already satisfies both conditions, the stroke polarity estimation flow stops.
In practical application, where the polarity represents a color value size relationship between a pixel point inside the stroke region and a pixel point outside the stroke region, there is another scenario in which, as shown in fig. 7, the step of estimating the stroke polarity of the text in the original image includes:
S701: Calculate the stroke response strength of each channel of each original pixel point in the original image in one direction using formula (13), where the one direction is any one of the horizontal direction, the vertical direction, and the two diagonal directions.
S702: Determine whether the stroke response strength of each channel in the one direction simultaneously satisfies the following two conditions: on channel n, [f_n(i) - f_n(l)] and [f_n(i) - f_n(k)] have the same polarity, and the stroke response strength is greater than a preset threshold. If so, step S703 is executed.
This step directly determines whether the stroke response strength calculated in step S701 satisfies the two conditions; if not, no initial polarity is set in that direction.
S703: Determine the initial polarity of the original pixel point i according to the polarity of [f_n(i) - f_n(l)] or [f_n(i) - f_n(k)].
If the stroke response strength calculated in step S701 satisfies the two conditions, the initial polarity of the original pixel point i in that direction is set to the polarity of [f_n(i) - f_n(l)] or [f_n(i) - f_n(k)].
S704: Determine whether the stroke response strengths in all four directions have been calculated. If so, step S705 is executed; if not, step S701 is executed.
S705: Determine the initial polarity corresponding to the maximum stroke response strength among the four directions as the estimated stroke polarity of the line of text.
If the stroke response strengths in all four directions have been calculated, the initial polarity corresponding to the maximum stroke response strength is selected, from among the stroke response strengths in the four directions that satisfy the two conditions, as the estimated stroke polarity of the text. If not all four directions have been calculated, any direction not yet calculated is selected and step S701 is executed again.
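The per-direction variant of steps S701 to S705 can be sketched as below, again under assumptions: formula (13) is not reproduced, so the response strength and the two differences for each direction are taken as precomputed inputs on a single channel. The function name, the direction keys, and the +1/-1 encoding are hypothetical, and returning None when no direction qualifies is a choice of this sketch that the text does not specify.

```python
from typing import Dict, Optional, Tuple


def sign(x: float) -> int:
    """Return +1, -1, or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def estimate_polarity_per_direction(
        responses: Dict[str, Tuple[float, float, float]],
        threshold: float) -> Optional[int]:
    """responses maps a direction name to (strength, diff_l, diff_k).
    Each direction that passes both conditions receives an initial
    polarity; the polarity of the strongest qualifying direction wins."""
    initial = {}  # direction -> (strength, initial polarity)
    for direction, (strength, diff_l, diff_k) in responses.items():
        s = sign(diff_l)
        if s != 0 and s == sign(diff_k) and strength > threshold:
            initial[direction] = (strength, s)
    if not initial:
        return None  # assumption: no qualifying direction, no polarity
    best = max(initial, key=lambda d: initial[d][0])
    return initial[best][1]
```

Unlike the first flow, every direction is evaluated before a decision is made, so the result does not depend on the order in which the directions are visited.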
S303: according to the direct difference and the indirect difference from any original pixel point in the original image to each neighborhood pixel point in a neighborhood set of the original image, performing stroke two-dimensional filtering on the original brightness value or/and the color value of each original pixel point to obtain the filtered updated brightness value or/and color value of the original image, wherein the range of the neighborhood set is a square with the original pixel point as the center and the side length of the square as w, and the w is smaller than the height of the original image.
For the detailed description of this step, reference may be made to the related contents in embodiment 1, and details are not repeated herein. It should be noted that the step may be executed simultaneously with the step S302, or the step S303 may be executed first and then the step S302 may be executed.
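As a small illustration of the neighborhood set used in step S303, the following sketch enumerates the pixel coordinates of a square of side w centered on a given pixel. Several points are assumptions of the sketch rather than statements from the text: w is taken to be odd so that the square is exactly centered, the center pixel is included in the set, and the square is clipped at the image border. The function name `neighborhood` is hypothetical.

```python
from typing import List, Tuple


def neighborhood(row: int, col: int, w: int,
                 height: int, width: int) -> List[Tuple[int, int]]:
    """Coordinates of the w-by-w square centered on (row, col),
    clipped to an image of the given height and width."""
    half = w // 2
    return [(r, c)
            for r in range(max(0, row - half), min(height, row + half + 1))
            for c in range(max(0, col - half), min(width, col + half + 1))]
```

For an interior pixel and w = 3 this yields nine coordinates; at a corner the clipping reduces the set to four.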
S304: For each pixel point of the original image, determine whether the filtered updated brightness value or/and color value matches the stroke polarity. If so, step S305 is executed; if not, no replacement is performed.
After calculating to obtain the stroke polarity and performing stroke filtering, whether the filtered updated brightness value or/and color value of each pixel point of the original image is matched with the stroke polarity can be sequentially judged, if the filtered updated brightness value or/and color value of a certain pixel point is not matched with the stroke polarity, the original brightness value or/and color value of the pixel point is not replaced, and whether the filtered updated brightness value or/and color value of the next pixel point is matched with the stroke polarity is continuously judged.
Referring to fig. 8, the step S304 may specifically include:
S801: Obtain a first size relationship between the filtered updated brightness value or/and color value and the original brightness value or/and color value.
The first size relationship may be, for example, that the updated brightness value is greater than the original brightness value (that is, the updated pixel is brighter than the original), or that the updated color value is greater than the original color value. The first size relationship may be obtained by algebraically subtracting the updated brightness value or/and color value from the original brightness value or/and color value.
S802: Determine whether the first size relationship matches a second size relationship represented by the stroke polarity.
The stroke polarity represents the size relationship between the brightness value or/and color value of pixel points inside the stroke and that of pixel points outside the stroke. Therefore, when the first size relationship is that the updated color value is greater than the original color value and the second size relationship represented by the stroke polarity is likewise "greater than", or when the first size relationship is that the updated color value is smaller than the original color value and the second size relationship is likewise "smaller than", the first size relationship is considered to match the second size relationship represented by the stroke polarity; otherwise, the two are considered not to match.
S305: Replace the corresponding original brightness value or/and color value with the filtered updated brightness value or/and color value, respectively, to generate a text-enhanced image corresponding to the original image.
When the first size relationship matches the second size relationship, the corresponding original brightness value or/and color value is replaced with the filtered updated brightness value or/and color value to obtain the text-enhanced image corresponding to the original image. In this embodiment, through the step of estimating the stroke polarity, text enhancement is performed only when the stroke polarity matches the filtered updated brightness value or color value. Compared with embodiment 1, the text enhancement effect can be more prominent, so the accuracy of subsequent text extraction can be improved.
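The polarity check of steps S801 and S802 and the conditional replacement of step S305 can be sketched as follows. This is an illustration under assumptions: pixel values are held in flat lists, the stroke polarity is encoded as +1 ("inside greater than outside") or -1 ("inside smaller than outside"), and the first size relationship is taken as the sign of the updated value minus the original value. The function name `replace_if_matching` is hypothetical.

```python
from typing import List


def sign(x: float) -> int:
    """Return +1, -1, or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def replace_if_matching(original: List[float],
                        updated: List[float],
                        polarity: int) -> List[float]:
    """Keep the filtered value only where its change relative to the
    original value agrees with the stroke polarity; otherwise keep
    the original value unchanged."""
    result = []
    for orig_value, new_value in zip(original, updated):
        first_relation = sign(new_value - orig_value)
        result.append(new_value if first_relation == polarity else orig_value)
    return result
```

With a polarity of +1, a pixel whose filtered value increased is replaced, while a pixel whose filtered value decreased keeps its original value.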
Corresponding to the first text enhancement method provided by the embodiment of the present invention, an embodiment of the present invention further provides a text enhancement apparatus, and referring to fig. 9, the apparatus may include:
an obtaining module 901, configured to obtain an original image including a line of text.
The filtering module 902 is configured to perform stroke two-dimensional filtering on an original brightness value or/and a color value of each original pixel according to a direct difference and an indirect difference between any original pixel in the original image and each neighbor pixel in a neighbor set of the original image, so as to obtain a filtered updated brightness value or/and color value of the original image, where the range of the neighbor set is a square with the original pixel as a center and a side length of w, and w is smaller than a height of the original image.
Referring to fig. 10, the filtering module 902 may specifically include:
a first obtaining submodule 1001, configured to perform algebraic subtraction on the original luminance value or/and the color value of the original pixel point and each of the neighborhood pixel points to obtain the direct difference;
a second obtaining submodule 1002, configured to obtain the indirect disparity according to a gradient modulus from the original pixel point to each neighbor pixel point of the neighbor set;
the weight calculation submodule 1003 is configured to calculate, according to the direct disparity and the indirect disparity, a weight value of each neighborhood pixel for a brightness value or/and a color value of the original pixel;
an updated brightness value calculation submodule 1004, configured to calculate an updated brightness value of the original pixel point by using the stroke two-dimensional filtering formula (10);
an updated color value calculation submodule 1005, configured to calculate an updated color value of the original pixel point by using the stroke two-dimensional filtering formula (11).
A replacing module 903, configured to replace the corresponding original luminance value or/and color value with the filtered updated luminance value or/and color value, respectively, so as to generate a text-enhanced image corresponding to the original image.
According to the text enhancement device provided by the embodiment of the invention, after the filtered updated brightness value or/and color value is obtained, the original brightness value or/and color value is replaced by the updated brightness value or/and color value, so that the text strokes in the pixel points in the original image are enhanced after replacement, the consistency of the pixels in the strokes can be enhanced, the difference between the text and the background can be deepened, a better text enhancement image is provided for the subsequent text extraction, and the accuracy and precision of the subsequent text extraction can be improved.
Corresponding to the second text enhancement method provided by the embodiment of the present invention, an embodiment of the present invention further provides a text enhancement apparatus, and referring to fig. 11, the apparatus may include:
an obtaining module 901, configured to obtain an original image including a line of text.
The stroke polarity estimation module 1101 is configured to estimate a stroke polarity of the text in the original image, where the polarity represents a size relationship between a luminance value or/and a color value between a pixel point located inside the stroke region and a pixel point located outside the stroke region.
The stroke polarity estimation module 1101 may be configured in different ways in different application scenarios. In a scenario where the polarity represents a brightness value magnitude relationship between a pixel point inside a stroke region and a pixel point outside the stroke region, referring to fig. 12, the stroke polarity estimation module 1101 may include:
the first calculating submodule 1201 is configured to calculate the stroke response strength in the horizontal direction, the vertical direction, and the two diagonal directions by using the formula (12), respectively.
The first judging submodule 1202 is configured to determine whether the maximum of the four calculated stroke response strengths satisfies the following two conditions: [f(i) - f(l)] and [f(i) - f(k)] have the same polarity, and the stroke response strength is greater than a preset threshold.
A first determining submodule 1203, configured to determine, if the result of the first judging submodule is yes, the estimated stroke polarity of the line of text according to the polarity of [f(i) - f(l)] or [f(i) - f(k)].
A first triggering submodule 1204, configured to, when the result of the first judging submodule is negative, select the calculated stroke response strengths in descending order and trigger the first judging submodule until some stroke response strength satisfies the two conditions, or, when none of the four stroke response strengths satisfies the two conditions, take the pixel point i as a non-stroke pixel point.
In another scenario, in a case where the polarity represents a brightness value magnitude relationship between a pixel point inside the stroke region and a pixel point outside the stroke region, referring to fig. 13, the stroke polarity estimation module 1101 includes:
the second calculating submodule 1301 is configured to calculate the stroke response strength of each original pixel point in the original image by using a formula (12) in one direction, where the one direction is any one of a horizontal direction, a vertical direction, and two diagonal directions.
The second judging submodule 1302 is configured to determine whether the stroke response strength simultaneously satisfies the following two conditions: [f(i) - f(l)] and [f(i) - f(k)] have the same polarity, and the stroke response strength is greater than a preset threshold.
A second determining submodule 1303, configured to determine, if the result of the second judging submodule is yes, an initial polarity of the original pixel point i according to the polarity of [f(i) - f(l)] or [f(i) - f(k)].
A third judging submodule 1304, configured to determine whether the stroke response strengths in all four directions have been calculated.
A third determining submodule 1305, configured to determine, if the result of the third judging submodule is yes, the initial polarity corresponding to the maximum stroke response strength among the four directions as the estimated stroke polarity of the line of text.
A second triggering submodule 1306, configured to trigger the second calculating submodule if the result of the third judging submodule is negative.
In another scenario, in a case that the polarity represents a color value size relationship between a pixel point inside the stroke region and a pixel point outside the stroke region, referring to fig. 14, the stroke polarity estimation module 1101 may specifically include:
and a third computation submodule 1401 for computing the stroke response strength in the horizontal direction, the vertical direction and the two diagonal directions by using the formula (13), respectively.
The fourth judging submodule 1402 is configured to determine whether the maximum of the calculated stroke response strengths satisfies the following two conditions: on channel n, [f_n(i) - f_n(l)] and [f_n(i) - f_n(k)] have the same polarity, and the stroke response strength is greater than a preset threshold.
A fourth determining submodule 1403, configured to determine, if the result of the fourth judging submodule is yes, the estimated stroke polarity of the line of text according to the polarity of [f_n(i) - f_n(l)] or [f_n(i) - f_n(k)].
A third triggering submodule 1404, configured to, when the result of the fourth judging submodule is negative, select the calculated stroke response strengths in descending order and trigger the fourth judging submodule until some stroke response strength satisfies the two conditions, or, when none of the four stroke response strengths satisfies the two conditions, take the pixel point i as a non-stroke pixel point.
In another scenario, in a case that the polarity represents a color value size relationship between a pixel point inside the stroke region and a pixel point outside the stroke region, referring to fig. 15, the stroke polarity estimation module 1101 may specifically include:
the fourth calculating submodule 1501 is configured to calculate the stroke response strength of each original pixel point in the original image by using a formula (13) in one direction, where the one direction is any one of a horizontal direction, a vertical direction, and two diagonal directions.
A fifth judging submodule 1502, configured to determine whether the stroke response strength simultaneously satisfies the following two conditions: on channel n, [f_n(i) - f_n(l)] and [f_n(i) - f_n(k)] have the same polarity, and the stroke response strength is greater than a preset threshold.
A fifth determining submodule 1503, configured to determine, if the result of the fifth judging submodule is yes, the initial polarity of the original pixel point i according to the polarity of [f_n(i) - f_n(l)] or [f_n(i) - f_n(k)].
A sixth judging submodule 1504, configured to determine whether the stroke response strengths in all four directions have been calculated.
A sixth determining submodule 1505, configured to determine, if the result of the sixth judging submodule is yes, the initial polarity corresponding to the maximum stroke response strength among the four directions as the estimated stroke polarity of the line of text.
A fourth triggering submodule 1506, configured to trigger the fourth calculating submodule if the result of the sixth judging submodule is negative.
The filtering module 902 is configured to perform stroke two-dimensional filtering on an original brightness value or/and a color value of each original pixel according to a direct difference and an indirect difference between any original pixel in the original image and each neighbor pixel in a neighbor set of the original image, so as to obtain a filtered updated brightness value or/and color value of the original image, where the range of the neighbor set is a square with the original pixel as a center and a side length of w, and w is smaller than a height of the original image.
The determining module 1102 determines whether the filtered updated brightness value or/and color value is matched with the stroke polarity, and if so, triggers the replacing module 903.
Referring to fig. 16, in practical application, the determining module 1102 may specifically include:
a third obtaining sub-module 1601, configured to obtain a first size relationship between the filtered updated luminance value or/and color value and the original luminance value or/and color value.
A seventh determining submodule 1602, configured to determine whether the first size relationship matches the second size relationship indicated by the stroke polarity.
A replacing module 903, configured to replace the corresponding original luminance value or/and color value with the filtered updated luminance value or/and color value, respectively, so as to generate a text-enhanced image corresponding to the original image.
With the device provided by this embodiment of the invention, stroke polarity estimation is additionally used to verify the filtered updated color value or updated brightness value. Only when the stroke polarity matches the filtered updated color value and/or updated brightness value is the corresponding original brightness value or/and color value replaced with the filtered updated brightness value or/and color value, so that the resulting text-enhanced image is more effective and accurate.
In addition, referring to fig. 17, after the text enhancement, an embodiment of the present invention further provides a text extraction method, which may include:
s1701: an original image comprising a line of text is acquired.
S1702: according to the direct difference and the indirect difference from any original pixel point in the original image to each neighborhood pixel point in the neighborhood set, performing stroke two-dimensional filtering on the original brightness value or/and the color value of each original pixel point to obtain the filtered updated brightness value or/and the filtered color value of the original image; the range of the neighborhood set is a square which takes the original pixel point as the center and has the side length of w; the w is less than the height of the original image.
S1703: replacing the corresponding original brightness value or/and color value with the filtered updated brightness value or/and color value respectively to generate a text enhanced image corresponding to the original image;
s1704: and extracting the text in the text enhanced image.
By adopting the text extraction method, the text extraction can be carried out based on the text enhanced image, so that the extracted text is more accurate and precise, and meanwhile, the complexity of text extraction can be reduced and the efficiency of text extraction can be improved because the text is enhanced during extraction.
Corresponding to the text extraction method, as shown in fig. 18, an embodiment of the present invention further provides a text extraction apparatus, where the apparatus may include:
an obtaining module 901, configured to obtain an original image including a line of text.
The filtering module 902 is configured to perform stroke two-dimensional filtering on an original brightness value or/and a color value of each original pixel according to a direct difference and an indirect difference between any original pixel in the original image and each neighbor pixel in a neighbor set of the original image, so as to obtain a filtered updated brightness value or/and color value of the original image, where the range of the neighbor set is a square with the original pixel as a center and a side length of w, and w is smaller than a height of the original image.
A replacing module 903, configured to replace the corresponding original luminance value or/and color value with the filtered updated luminance value or/and color value, respectively, so as to generate a text-enhanced image corresponding to the original image.
An extracting module 1801, configured to extract a text in the text-enhanced image.
By adopting the text extraction device, the extraction can be carried out based on the text enhanced image, so that the extracted text is more accurate and precise, and meanwhile, the complexity of text extraction can be reduced and the efficiency of text extraction can be improved because the text is enhanced during extraction.
Further, it should be noted that the above series of processes and means may also be implemented by software and/or firmware. In the case of implementation by software and/or firmware, a program constituting the software is installed from a storage medium or a network to a computer having a dedicated hardware structure, such as a general-purpose personal computer 1900 shown in fig. 19, which is capable of executing various functions and the like when various programs are installed.
In fig. 19, a Central Processing Unit (CPU)1901 executes various processes in accordance with a program stored in a Read Only Memory (ROM)1902 or a program loaded from a storage section 1908 to a Random Access Memory (RAM) 1903. The RAM 1903 also stores data necessary when the CPU 1901 executes various processes and the like as necessary.
The CPU 1901, ROM 1902, and RAM 1903 are connected to each other via a bus 1904. An input/output interface 1905 is also connected to the bus 1904.
The following components are connected to the input/output interface 1905: an input section 1906 including a keyboard, mouse, and the like; an output section 1907 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker and the like; a storage section 1908 including a hard disk and the like; and a communication section 1909 including a network interface card such as a LAN card, a modem, and the like. The communication section 1909 performs communication processing via a network such as the internet.
A drive 1910 is also connected to the input/output interface 1905 as needed. A removable medium 1911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1910 as necessary, so that a computer program read out therefrom is installed in the storage section 1908 as necessary.
In the case where the above-described series of processes is realized by software, a program constituting the software is installed from a network such as the internet or a storage medium such as the removable medium 1911.
It should be understood by those skilled in the art that such a storage medium is not limited to the removable medium 1911 shown in fig. 19 in which the program is stored, distributed separately from the apparatus to provide the program to the user. Examples of the removable medium 1911 include a magnetic disk (including a floppy disk (registered trademark)), an optical disk (including a compact disc read only memory (CD-ROM) and a Digital Versatile Disc (DVD)), a magneto-optical disk (including a Mini Disk (MD) (registered trademark)), and a semiconductor memory. Alternatively, the storage medium may be the ROM 1902, a hard disk included in the storage section 1908, or the like, in which programs are stored, and which is distributed to users together with the apparatus including them.
It is also to be noted that the steps of executing the above-described series of processes may naturally be executed chronologically in the order described, but need not necessarily be executed chronologically. Some steps may be performed in parallel or independently of each other.
Although the present invention and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims. Furthermore, the terms "comprises," "comprising," or any other variation thereof, in embodiments of the present invention are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
With respect to the implementation including the above embodiments, the following remarks are also disclosed:
Supplementary note 1. A text enhancement method, comprising:
acquiring an original image comprising a line of text;
according to the direct difference and the indirect difference from any original pixel point in the original image to each neighborhood pixel point in a neighborhood set of the original image, performing stroke two-dimensional filtering on the original brightness value or/and the color value of each original pixel point to obtain a filtered updated brightness value or/and color value of the original image, wherein the range of the neighborhood set is a square with the original pixel point as a center and the side length of the square as w, and the w is smaller than the height of the original image;
and replacing the corresponding original brightness value or/and color value with the filtered updated brightness value or/and color value respectively to generate a text enhanced image corresponding to the original image.
2. The method according to supplementary note 1, wherein the step of performing stroke two-dimensional filtering on the original brightness value or/and the color value of each original pixel point comprises:
algebraically subtracting the original brightness value or/and the color value of the original pixel point and each neighborhood pixel point to obtain the direct difference degree;
acquiring the indirect difference according to the gradient modulus value from the original pixel point to each neighborhood pixel point of the neighborhood set;
calculating the weight value of each neighborhood pixel point to the brightness value or/and the color value of the original pixel point according to the direct difference and the indirect difference;
calculating the updated brightness value of the original pixel point by adopting the following stroke two-dimensional filtering formula;
wherein N(i) represents the neighborhood set of pixel point i; w(i, j) represents the weight value of neighborhood pixel point j for the brightness value of original pixel point i; f(j) is the brightness value of pixel point j in the neighborhood set;
and/or,
calculating the updated color value of the original pixel point by adopting the following stroke two-dimensional filtering formula;
wherein w_n(i, j) represents the weight value, on channel n, of neighborhood pixel point j for the color value of original pixel point i; and f_n(j) is the color value, on channel n, of pixel point j in the neighborhood set.
3. The method according to supplementary note 1, wherein after acquiring the original image including one line of text, the method further includes:
estimating the stroke polarity of the text in the original image, wherein the polarity represents the size relation of the brightness value or/and the color value between the pixel point positioned in the stroke area and the pixel point positioned outside the stroke area;
and after obtaining the filtered updated brightness value or/and color value of the original image, further comprising:
and judging whether the updated brightness value or/and color value after filtering is matched with the stroke polarity, if so, executing the step of replacing the original brightness value or/and color value.
4. The method according to supplementary note 3, wherein the step of estimating the stroke polarity of the text in the original image, in a case where the polarity represents a relationship of a magnitude of a luminance value between a pixel point inside the stroke region and a pixel point outside the stroke region, comprises:
the stroke response intensity is calculated in the horizontal direction, the vertical direction and the two diagonal directions by adopting the following formulas respectively:
wherein w is one eighth of the height of the original image, and f (i) represents the brightness value of pixel point i;
judging whether the maximum stroke response strength among the four calculated stroke response strengths meets the following two conditions: [f(i) - f(l)] has the same polarity as [f(i) - f(k)], and the stroke response strength is greater than a preset threshold; if so, determining the estimated stroke polarity of the line of text according to the polarity of [f(i) - f(l)] or [f(i) - f(k)]; if not, selecting the calculated stroke response strengths in descending order to execute the judging step until some stroke response strength meets the two conditions, or taking the pixel point i as a non-stroke pixel point in the case that none of the four stroke response strengths meets the two conditions.
5. The method according to supplementary note 3, wherein the step of estimating the stroke polarity of the text in the original image, in a case where the polarity represents a relationship of a magnitude of a luminance value between a pixel point inside the stroke region and a pixel point outside the stroke region, comprises:
calculating the stroke response intensity of each original pixel point in the original image in one direction by adopting the following formula; the one direction is any one of a horizontal direction, a vertical direction and two diagonal directions;
wherein w is one eighth of the height of the original image, and f (i) represents the brightness value of pixel point i;
judging whether the stroke response strength simultaneously meets the following two conditions: [f(i) - f(l)] has the same polarity as [f(i) - f(k)], and the stroke response strength is greater than a preset threshold; if so, determining the initial polarity of the original pixel point i according to the polarity of [f(i) - f(l)] or [f(i) - f(k)];
judging whether the stroke response strengths in the four directions are completely calculated, if so, determining the initial polarity corresponding to the maximum stroke response strength in the four directions as the estimated stroke polarity of the line of text; if not, repeating the step of calculating the stroke response intensity of each original pixel point in the original image.
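The direction-by-direction variant of this supplementary note can be sketched as a loop. This is only one possible reading: the text does not say what happens when the direction with the largest strength fails the two conditions, so the sketch assumes the maximum is taken over the directions that produced an initial polarity. The callback `measure` is hypothetical and stands in for the response formula, which is not reproduced here.

```python
from typing import Callable, Hashable, Iterable, Optional, Tuple

# measure(direction) -> (strength, f(i) - f(l), f(i) - f(k)); hypothetical.
Measure = Callable[[Hashable], Tuple[float, float, float]]


def estimate_polarity_by_direction(directions: Iterable[Hashable],
                                   measure: Measure,
                                   threshold: float) -> Optional[int]:
    """Compute one direction at a time, record an initial polarity when
    both conditions hold, and keep the one with the largest strength."""
    best_strength: Optional[float] = None
    best_polarity: Optional[int] = None
    for direction in directions:
        strength, d_l, d_k = measure(direction)
        # d_l * d_k > 0 means both differences share a non-zero sign.
        if d_l * d_k > 0 and strength > threshold:
            initial = 1 if d_l > 0 else -1
            if best_strength is None or strength > best_strength:
                best_strength, best_polarity = strength, initial
    return best_polarity
```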
6. The method according to supplementary note 3, wherein the step of estimating the polarity of the strokes of the text in the original image, in case that the polarity represents a color value size relationship between a pixel point inside the stroke region and a pixel point outside the stroke region, comprises:
the stroke response intensity is calculated in the horizontal direction, the vertical direction and the two diagonal directions by adopting the following formulas respectively:
wherein w is one eighth of the height of the original image, and f_n(i) represents the color value of pixel point i on channel n;
judging whether the maximum of the four calculated stroke response strengths meets the following two conditions: on channel n, [f_n(i) - f_n(l)] and [f_n(i) - f_n(k)] have the same polarity, and the stroke response strength is greater than a preset threshold; if so, determining the estimated stroke polarity of the line of text according to the polarity of [f_n(i) - f_n(l)] or [f_n(i) - f_n(k)]; if not, selecting the remaining calculated stroke response strengths in descending order of magnitude and repeating the judging step until a stroke response strength meets the two conditions, or, if none of the four stroke response strengths meets the two conditions, taking pixel point i as a non-stroke pixel point.
7. The method according to supplementary note 3, wherein the step of estimating the polarity of the strokes of the text in the original image, in case that the polarity represents a color value size relationship between a pixel point inside the stroke region and a pixel point outside the stroke region, comprises:
calculating the stroke response intensity of each original pixel point in the original image by adopting the following formula in one direction, wherein the one direction is any one direction of a horizontal direction, a vertical direction and two diagonal directions;
wherein w is one eighth of the height of the original image, and f_n(i) represents the color value of pixel point i on channel n;
judging whether the stroke response strength simultaneously meets the following two conditions: on channel n, [f_n(i) - f_n(l)] and [f_n(i) - f_n(k)] have the same polarity, and the stroke response strength is greater than a preset threshold; if so, determining the initial polarity of the original pixel point i according to the polarity of [f_n(i) - f_n(l)] or [f_n(i) - f_n(k)];
judging whether the stroke response strengths in the four directions are completely calculated, if so, determining the initial polarity corresponding to the maximum stroke response strength in the four directions as the estimated stroke polarity of the line of text; if not, repeating the step of calculating the stroke response intensity of each original pixel point in the original image.
8. The method according to supplementary note 3, wherein the step of determining whether the filtered updated luminance or/and color values match the stroke polarity comprises:
acquiring a first size relation between the filtered updated brightness value or/and color value and the original brightness value or/and color value;
and judging whether the first size relation is matched with a second size relation represented by the stroke polarity.
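The matching test of this supplementary note can be sketched as a comparison of two signs. The text does not spell out what "matched" means, so the sketch assumes that polarity +1 denotes strokes brighter than their surroundings and that a filtered value matches when the filter moved the pixel in that same direction. The function names and the +1/-1 encoding are hypothetical.

```python
def matches_polarity(original: float, updated: float, polarity: int) -> bool:
    """First size relation: sign of (updated - original).
    Second size relation: the stroke polarity (+1 or -1, assumed encoding).
    The two match when they point the same way."""
    first_relation = (updated > original) - (updated < original)
    return first_relation == polarity


def select_value(original: float, updated: float, polarity: int) -> float:
    """Replace the original value only when the filtered value matches."""
    return updated if matches_polarity(original, updated, polarity) else original
```

Under this reading, an unchanged value never counts as a match, which is harmless because keeping the original then yields the same number.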
9. A text enhancement apparatus comprising:
the acquisition module is used for acquiring an original image comprising a line of text;
the filtering module is used for performing stroke two-dimensional filtering on the original brightness value or/and color value of each original pixel point according to the direct difference and the indirect difference between any original pixel point in the original image and each neighborhood pixel point in its neighborhood set, to obtain filtered updated brightness values or/and color values of the original image, wherein the neighborhood set covers a square centered on the original pixel point with side length w, and w is smaller than the height of the original image;
and a replacing module, configured to replace the corresponding original luminance value or/and color value with the filtered updated luminance value or/and color value, respectively, so as to generate a text-enhanced image corresponding to the original image.
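The neighborhood set used by the filtering module can be sketched as an enumeration of the pixels in a w-by-w square around the original pixel point. The sketch assumes an odd w, clips the square at the image border, and leaves out the center pixel; none of these three choices is stated in the text, and the function name is hypothetical.

```python
from typing import List, Tuple


def neighborhood(row: int, col: int, w: int,
                 height: int, width: int) -> List[Tuple[int, int]]:
    """Return the (row, col) positions in the w-by-w square centered
    on (row, col), clipped to the image and excluding the center."""
    if w >= height:
        # The text requires w to be smaller than the image height.
        raise ValueError("w must be smaller than the image height")
    half = w // 2
    points = []
    for r in range(max(0, row - half), min(height, row + half + 1)):
        for c in range(max(0, col - half), min(width, col + half + 1)):
            if (r, c) != (row, col):
                points.append((r, c))
    return points
```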
10. The apparatus according to supplementary note 9, wherein the filtering module includes:
the first obtaining submodule is used for carrying out algebraic subtraction on the original brightness value or/and the color value of the original pixel point and each neighborhood pixel point to obtain the direct difference degree;
the second obtaining submodule is used for obtaining the indirect difference according to the gradient modulus value from the original pixel point to each neighborhood pixel point of the neighborhood set;
the weight calculation submodule is used for calculating the weight value of each neighborhood pixel point to the brightness value or/and the color value of the original pixel point according to the direct difference degree and the indirect difference degree;
the updated brightness value calculation submodule is used for calculating the updated brightness value of the original pixel point by adopting the following stroke two-dimensional filtering formula;
wherein N(i) represents the neighborhood set of pixel point i, w(i, j) represents the weight of neighborhood pixel point j on the brightness value of original pixel point i, and f(j) is the brightness value of pixel point j in the neighborhood set; and/or
the updated color value calculation submodule is used for calculating the updated color value of the original pixel point by adopting the following stroke two-dimensional filtering formula;
wherein w_n(i, j) represents the weight of neighborhood pixel point j on the color value of original pixel point i on channel n, and f_n(j) is the color value of pixel point j in the neighborhood set on channel n.
11. The apparatus according to supplementary note 9, further comprising:
the stroke polarity estimation module is used for estimating the stroke polarity of the text in the original image, wherein the polarity represents the magnitude relation of the brightness value or/and the color value between the pixel point positioned in the stroke area and the pixel point positioned outside the stroke area; and
and the judging module is used for judging whether the filtered updated brightness value or/and color value is matched with the stroke polarity or not, and if so, triggering the replacing module.
12. The apparatus according to supplementary note 11, wherein, in a case where the polarity represents a brightness value magnitude relationship between a pixel point inside the stroke region and a pixel point outside the stroke region, the stroke polarity estimation module includes:
the first calculation submodule is used for calculating the stroke response intensity in the horizontal direction, the vertical direction and the two diagonal directions by adopting the following formulas respectively:
wherein w is one eighth of the height of the original image, and f (i) represents the brightness value of pixel point i;
the first judgment submodule is used for judging whether the maximum of the four calculated stroke response strengths meets the following two conditions: (f(i) - f(l)) has the same polarity as (f(i) - f(k)), and the stroke response strength is greater than a preset threshold;
a first determining sub-module, configured to determine, if the result of the first judgment submodule is yes, the estimated stroke polarity of the line of text according to the polarity of (f(i) - f(l)) or (f(i) - f(k));
and the first triggering submodule is used for selecting the calculated stroke response intensity according to the size relationship in sequence and triggering the first judging submodule under the condition that the result of the first judging submodule is negative until the response intensity of a certain stroke meets the two conditions, or taking the pixel point i as a non-stroke pixel point under the condition that the response intensities of the four strokes do not meet the two conditions.
13. The apparatus according to supplementary note 11, wherein, in a case where the polarity represents a brightness value magnitude relationship between a pixel point inside the stroke region and a pixel point outside the stroke region, the stroke polarity estimation module includes:
the second calculation submodule is used for calculating the stroke response intensity of each original pixel point in the original image in one direction by adopting the following formula, wherein the one direction is any one of the horizontal direction, the vertical direction and the two diagonal directions;
wherein w is one eighth of the height of the original image, and f (i) represents the brightness value of pixel point i;
the second judgment submodule is used for judging whether the stroke response strength simultaneously meets the following two conditions: (f(i) - f(l)) has the same polarity as (f(i) - f(k)), and the stroke response strength is greater than a preset threshold;
a second determining sub-module, configured to determine, if the result of the second judgment submodule is yes, the initial polarity of the original pixel point i according to the polarity of (f(i) - f(l)) or (f(i) - f(k));
the third judgment submodule is used for judging whether the stroke response strengths in the four directions have all been calculated;
a third determining submodule, configured to determine, when the result of the third judgment submodule is yes, the initial polarity corresponding to the maximum stroke response strength in the four directions as the estimated stroke polarity of the line of text;
and the second triggering submodule is used for triggering the second calculating submodule under the condition that the result of the third judging submodule is negative.
14. The apparatus according to supplementary note 11, wherein, in a case where the polarity represents a color value magnitude relationship between a pixel point inside the stroke region and a pixel point outside the stroke region, the stroke polarity estimation module includes:
the third computation submodule is used for computing the stroke response intensity in the horizontal direction, the vertical direction and the two diagonal directions by adopting the following formulas respectively:
wherein w is one eighth of the height of the original image, and f_n(i) represents the color value of pixel point i on channel n;
a fourth judgment submodule for judging whether the maximum of the calculated stroke response strengths meets the following two conditions: on channel n, [f_n(i) - f_n(l)] and [f_n(i) - f_n(k)] have the same polarity, and the stroke response strength is greater than a preset threshold;
a fourth determination submodule for determining, if the result of the fourth judgment submodule is yes, the estimated stroke polarity of the line of text according to the polarity of [f_n(i) - f_n(l)] or [f_n(i) - f_n(k)];
and the third triggering submodule is used for selecting the calculated stroke response intensity according to the size relationship in sequence and triggering the fourth judging submodule under the condition that the result of the fourth judging submodule is negative until the response intensity of a certain stroke meets the two conditions, or taking the pixel point i as a non-stroke pixel point under the condition that the response intensities of the four strokes do not meet the two conditions.
15. The apparatus according to supplementary note 11, wherein, in a case where the polarity represents a color value magnitude relationship between a pixel point inside the stroke region and a pixel point outside the stroke region, the stroke polarity estimation module includes:
the fourth calculation submodule is used for calculating the stroke response intensity of each original pixel point in the original image in one direction by adopting the following formula, wherein the one direction is any one of the horizontal direction, the vertical direction and the two diagonal directions;
wherein w is one eighth of the height of the original image, and f_n(i) represents the color value of pixel point i on channel n;
a fifth judging submodule, configured to judge whether the stroke response strength satisfies the following two conditions at the same time: on channel n, [f_n(i) - f_n(l)] and [f_n(i) - f_n(k)] have the same polarity, and the stroke response strength is greater than a preset threshold;
a fifth determination submodule for determining, if the result of the fifth judging submodule is yes, the initial polarity of the original pixel point i according to the polarity of [f_n(i) - f_n(l)] or [f_n(i) - f_n(k)];
the sixth judgment submodule is used for judging whether the stroke response strengths in the four directions have all been calculated;
a sixth determining submodule, configured to determine, when the result of the sixth judgment submodule is yes, the initial polarity corresponding to the maximum stroke response strength in the four directions as the estimated stroke polarity of the line of text;
and the fourth triggering submodule is used for triggering the fourth calculating submodule under the condition that the result of the sixth judging submodule is negative.
16. The apparatus according to supplementary note 11, wherein the judging means includes:
a third obtaining sub-module, configured to obtain a first size relationship between the filtered updated luminance value or/and color value and the original luminance value or/and color value;
and the seventh judging submodule is used for judging whether the first size relation is matched with the second size relation represented by the stroke polarity.
17. A text extraction method, comprising:
acquiring an original image comprising a line of text;
according to the direct difference and the indirect difference from any original pixel point in the original image to each neighborhood pixel point in the neighborhood set, performing stroke two-dimensional filtering on the original brightness value or/and the color value of each original pixel point to obtain the filtered updated brightness value or/and the filtered color value of the original image; the range of the neighborhood set is a square which takes the original pixel point as the center and has the side length of w; the w is less than the height of the original image;
replacing the corresponding original brightness value or/and color value with the filtered updated brightness value or/and color value respectively to generate a text enhanced image corresponding to the original image;
and extracting the text in the text enhanced image.
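The final extraction step is not detailed in this text. As a stand-in, the sketch below applies a plain global threshold to the enhanced image, which is a swapped-in technique chosen for illustration and not the extraction method of the source. The function name, the fixed threshold, and the +1/-1 polarity encoding (+1 for strokes brighter than the background) are assumptions.

```python
from typing import List


def extract_text_mask(enhanced: List[List[float]],
                      threshold: float,
                      polarity: int) -> List[List[bool]]:
    """Mark a pixel as text when it lies on the stroke side of the
    threshold: above it for bright strokes (+1), below it for dark
    strokes (-1). Global thresholding is an illustrative stand-in."""
    mask = []
    for row in enhanced:
        if polarity > 0:
            mask.append([value > threshold for value in row])
        else:
            mask.append([value < threshold for value in row])
    return mask
```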
18. A text extraction apparatus comprising:
the acquisition module is used for acquiring an original image comprising a line of text;
the filtering module is used for performing stroke two-dimensional filtering on the original brightness value or/and color value of each original pixel point according to the direct difference and the indirect difference between any original pixel point in the original image and each neighborhood pixel point in its neighborhood set, to obtain filtered updated brightness values or/and color values of the original image, wherein the neighborhood set covers a square centered on the original pixel point with side length w, and w is smaller than the height of the original image;
a replacing module, configured to replace the corresponding original luminance value or/and color value with the filtered updated luminance value or/and color value, respectively, so as to generate a text-enhanced image corresponding to the original image;
and the extraction module is used for extracting the text in the text enhanced image.
Claims (6)
1. A text enhancement method, comprising:
acquiring an original image comprising a line of text;
according to the direct difference and the indirect difference from any original pixel point in the original image to each neighborhood pixel point in a neighborhood set of the original image, performing stroke two-dimensional filtering on the original brightness value or/and the color value of each original pixel point to obtain a filtered updated brightness value or/and color value of the original image, wherein the range of the neighborhood set is a square with the original pixel point as a center and the side length of the square as w, and the w is smaller than the height of the original image;
replacing the corresponding original luminance values or/and color values with the filtered updated luminance values or/and color values, respectively, to generate a text-enhanced image corresponding to the original image,
after the acquiring of the original image including one line of text, the method further includes:
estimating the stroke polarity of the text in the original image, wherein the polarity represents the size relation of the brightness value or/and the color value between the pixel point positioned in the stroke area and the pixel point positioned outside the stroke area;
and after obtaining the filtered updated brightness value or/and color value of the original image, further comprising:
judging whether the updated brightness value or/and color value after filtering is matched with the stroke polarity, if so, executing the step of replacing the original brightness value or/and color value,
wherein, the step of performing stroke two-dimensional filtering on the original brightness value or/and the color value of each original pixel point comprises:
algebraically subtracting the original brightness value or/and the color value of the original pixel point and each neighborhood pixel point to obtain the direct difference degree;
acquiring the indirect difference according to the gradient modulus value from the original pixel point to each neighborhood pixel point of the neighborhood set;
calculating the weight value of each neighborhood pixel point to the brightness value or/and the color value of the original pixel point according to the direct difference and the indirect difference;
calculating the updated brightness value of the original pixel point by adopting a following stroke two-dimensional filtering formula;
wherein N (i) represents a neighborhood set of pixel points i; w (i, j) represents the weight value of the brightness value of the original pixel point i by the neighborhood pixel point j; f (j) is the brightness value of the pixel point j in the neighborhood set;
and/or
Calculating the updated color value of the original pixel point by adopting a following stroke two-dimensional filtering formula;
wherein w_n(i, j) represents the weight of neighborhood pixel point j on the color value of original pixel point i on channel n; and f_n(j) is the color value of pixel point j in the neighborhood set on channel n.
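The filtering step of this claim can be illustrated with a short sketch. The filtering formula itself is not reproduced in this text, so the sketch assumes a normalized weighted average of the neighborhood values, which is a common form for filters defined by N(i), w(i, j) and f(j) but is an assumption here. The weight function is passed in as a hypothetical callback, since how the direct and indirect differences combine into a weight is likewise not reproduced.

```python
from typing import Callable, Hashable, Iterable, Mapping


def stroke_filter_pixel(f: Mapping[Hashable, float],
                        i: Hashable,
                        neighbors: Iterable[Hashable],
                        weight: Callable[[Hashable, Hashable], float]) -> float:
    """Return the updated value of pixel i as the weighted average of
    f(j) over the neighborhood N(i), with weights w(i, j).
    Normalizing by the sum of weights is an assumption."""
    total_weight = 0.0
    weighted_sum = 0.0
    for j in neighbors:
        w_ij = weight(i, j)
        total_weight += w_ij
        weighted_sum += w_ij * f[j]
    if total_weight == 0:
        # No neighbor contributes: keep the original value (assumption).
        return f[i]
    return weighted_sum / total_weight
```

For the color case, the same function would be called once per channel n with that channel's values f_n and weights w_n.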
2. The method of claim 1, wherein in the case that the polarity represents a brightness value magnitude relationship between a pixel point inside the stroke region and a pixel point outside the stroke region, the step of estimating the stroke polarity of the text in the original image comprises:
the stroke response intensity is calculated in the horizontal direction, the vertical direction and the two diagonal directions by adopting the following formulas respectively:
wherein w is one eighth of the height of the original image, and f (i) represents the brightness value of pixel point i;
judging whether the maximum of the four calculated stroke response strengths meets the following two conditions: (f(i) - f(l)) has the same polarity as (f(i) - f(k)), and the stroke response strength is greater than a preset threshold; if so, determining the estimated stroke polarity of the line of text according to the polarity of (f(i) - f(l)) or (f(i) - f(k)); if not, selecting the remaining calculated stroke response strengths in descending order of magnitude and repeating the judging step until a stroke response strength meets the two conditions, or, if none of the four stroke response strengths meets the two conditions, taking pixel point i as a non-stroke pixel point.
3. The method of claim 1, wherein, in the case that the polarity represents a color value size relationship between a pixel point inside a stroke region and a pixel point outside the stroke region, the step of estimating the stroke polarity of the text in the original image comprises:
the stroke response intensity is calculated in the horizontal direction, the vertical direction and the two diagonal directions by adopting the following formulas respectively:
wherein w is one eighth of the height of the original image, and f_n(i) represents the color value of pixel point i on channel n;
judging whether the maximum of the four calculated stroke response strengths meets the following two conditions: on channel n, [f_n(i) - f_n(l)] and [f_n(i) - f_n(k)] have the same polarity, and the stroke response strength is greater than a preset threshold; if so, determining the estimated stroke polarity of the line of text according to the polarity of [f_n(i) - f_n(l)] or [f_n(i) - f_n(k)]; if not, selecting the remaining calculated stroke response strengths in descending order of magnitude and repeating the judging step until a stroke response strength meets the two conditions, or, if none of the four stroke response strengths meets the two conditions, taking pixel point i as a non-stroke pixel point.
4. A text enhancement apparatus comprising:
the acquisition module is used for acquiring an original image comprising a line of text;
the filtering module is used for performing stroke two-dimensional filtering on the original brightness value or/and color value of each original pixel point according to the direct difference and the indirect difference between any original pixel point in the original image and each neighborhood pixel point in its neighborhood set, to obtain filtered updated brightness values or/and color values of the original image, wherein the neighborhood set covers a square centered on the original pixel point with side length w, and w is smaller than the height of the original image;
a replacing module, configured to replace the corresponding original luminance value or/and color value with the filtered updated luminance value or/and color value, respectively, so as to generate a text-enhanced image corresponding to the original image;
the stroke polarity estimation module is used for estimating the stroke polarity of the text in the original image, wherein the polarity represents the magnitude relation of the brightness value or/and the color value between the pixel point positioned in the stroke area and the pixel point positioned outside the stroke area; and
a judging module, configured to judge whether the filtered updated brightness value or/and color value is matched with the stroke polarity, if so, trigger the replacing module,
wherein the filtering module comprises:
the first obtaining submodule is used for carrying out algebraic subtraction on the original brightness value or/and the color value of the original pixel point and each neighborhood pixel point to obtain the direct difference degree;
the second obtaining submodule is used for obtaining the indirect difference according to the gradient modulus value from the original pixel point to each neighborhood pixel point of the neighborhood set;
the weight calculation submodule is used for calculating the weight value of each neighborhood pixel point to the brightness value or/and the color value of the original pixel point according to the direct difference degree and the indirect difference degree;
the updated brightness value calculation submodule is used for calculating the updated brightness value of the original pixel point by adopting the following stroke two-dimensional filtering formula;
wherein N(i) represents the neighborhood set of pixel point i, w(i, j) represents the weight of neighborhood pixel point j on the brightness value of original pixel point i, and f(j) is the brightness value of pixel point j in the neighborhood set; and/or
the updated color value calculation submodule is used for calculating the updated color value of the original pixel point by adopting the following stroke two-dimensional filtering formula;
wherein w_n(i, j) represents the weight of neighborhood pixel point j on the color value of original pixel point i on channel n, and f_n(j) is the color value of pixel point j in the neighborhood set on channel n.
5. The apparatus of claim 4, where the polarity represents a luminance value magnitude relationship between a pixel point inside a stroke region and a pixel point outside the stroke region, the stroke polarity estimation module comprising:
the first calculation submodule is used for calculating the stroke response intensity in the horizontal direction, the vertical direction and the two diagonal directions by adopting the following formulas respectively:
wherein w is one eighth of the height of the original image, and f (i) represents the brightness value of pixel point i;
the first judgment submodule is used for judging whether the maximum of the four calculated stroke response strengths meets the following two conditions: (f(i) - f(l)) has the same polarity as (f(i) - f(k)), and the stroke response strength is greater than a preset threshold;
a first determining sub-module, configured to determine, if the result of the first judgment submodule is yes, the estimated stroke polarity of the line of text according to the polarity of (f(i) - f(l)) or (f(i) - f(k));
and the first triggering submodule is used for selecting the calculated stroke response intensity according to the size relationship in sequence and triggering the first judging submodule under the condition that the result of the first judging submodule is negative until the response intensity of a certain stroke meets the two conditions, or taking the pixel point i as a non-stroke pixel point under the condition that the response intensities of the four strokes do not meet the two conditions.
6. The apparatus of claim 4, where the polarity represents a color value size relationship between a pixel point inside a stroke region and a pixel point outside the stroke region, the stroke polarity estimation module comprising:
and the second calculation submodule is used for calculating the stroke response intensity in the horizontal direction, the vertical direction and the two diagonal directions by adopting the following formulas respectively:
wherein w is one eighth of the height of the original image, and f_n(i) represents the color value of pixel point i on channel n;
the second judgment submodule is used for judging whether the maximum of the four calculated stroke response strengths meets the following two conditions: on channel n, [f_n(i) - f_n(l)] and [f_n(i) - f_n(k)] have the same polarity, and the stroke response strength is greater than a preset threshold;
a second determination submodule for determining, if the result of the second judgment submodule is yes, the estimated stroke polarity of the line of text according to the polarity of [f_n(i) - f_n(l)] or [f_n(i) - f_n(k)];
and the second triggering submodule is used for selecting the calculated stroke response intensity according to the size relationship in sequence and triggering the second judging submodule under the condition that the result of the second judging submodule is negative until the response intensity of a certain stroke meets the two conditions, or taking the pixel point i as a non-stroke pixel point under the condition that the response intensities of the four strokes do not meet the two conditions.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110172095.1A CN102831579B (en) | 2011-06-16 | 2011-06-16 | Text enhancement method and device, text extraction method and device |
JP2012132919A JP5939047B2 (en) | 2011-06-16 | 2012-06-12 | Text enhancement method and apparatus, and text extraction method and apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110172095.1A CN102831579B (en) | 2011-06-16 | 2011-06-16 | Text enhancement method and device, text extraction method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102831579A (en) | 2012-12-19 |
CN102831579B (en) | 2015-06-17 |
Family
ID=47334696
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201110172095.1A Active CN102831579B (en) | 2011-06-16 | 2011-06-16 | Text enhancement method and device, text extraction method and device |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP5939047B2 (en) |
CN (1) | CN102831579B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106485666B (en) * | 2015-08-31 | 2019-11-29 | 中国航天科工集团第四研究院指挥自动化技术研发与应用中心 | A kind of information indicating method and apparatus |
CN109285123B (en) * | 2017-07-20 | 2020-09-11 | 展讯通信(上海)有限公司 | Image smoothing method and device, computer readable storage medium and terminal |
CN107424137B (en) * | 2017-08-01 | 2020-06-19 | 深信服科技股份有限公司 | Text enhancement method and device, computer device and readable storage medium |
CN110263301B (en) * | 2019-06-27 | 2023-12-05 | 北京百度网讯科技有限公司 | Method and device for determining color of text |
CN110738625B (en) * | 2019-10-21 | 2022-03-11 | Oppo广东移动通信有限公司 | Image resampling method, device, terminal and computer readable storage medium |
CN111582290B (en) * | 2020-05-13 | 2023-04-07 | 郑州轻工业大学 | Computer image recognition method |
CN116468640B (en) * | 2023-06-20 | 2023-08-29 | 山东正禾大教育科技有限公司 | Video image enhancement method for Internet teaching |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101021905A (en) * | 2006-02-15 | 2007-08-22 | 中国科学院自动化研究所 | File image binaryzation method |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3501031B2 (en) * | 1999-08-24 | 2004-02-23 | 日本電気株式会社 | Image region determination device, image region determination method, and storage medium storing program thereof |
JP3488678B2 (en) * | 2000-08-10 | 2004-01-19 | シャープ株式会社 | Image classification device |
JP4659789B2 (en) * | 2006-06-30 | 2011-03-30 | キヤノン株式会社 | Image processing apparatus, image processing method, program, and recording medium |
US7856142B2 (en) * | 2007-01-26 | 2010-12-21 | Sharp Laboratories Of America, Inc. | Methods and systems for detecting character content in a digital image |
JP2009278363A (en) * | 2008-05-14 | 2009-11-26 | Canon Inc | Image processor and image processing method |
JP2009302761A (en) * | 2008-06-11 | 2009-12-24 | Canon Inc | Image processor |
- 2011-06-16: CN application CN201110172095.1A, patent CN102831579B (en), status: Active
- 2012-06-12: JP application JP2012132919A, patent JP5939047B2 (en), status: Active
Also Published As
Publication number | Publication date |
---|---|
JP5939047B2 (en) | 2016-06-22 |
CN102831579A (en) | 2012-12-19 |
JP2013004094A (en) | 2013-01-07 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN102831579B (en) | Text enhancement method and device, text extraction method and device | |
EP3540637B1 (en) | Neural network model training method, device and storage medium for image processing | |
US9767387B2 (en) | Predicting accuracy of object recognition in a stitched image | |
JP3944738B2 (en) | Image processing apparatus and method, recording medium, and program | |
JP5900208B2 (en) | Image processing apparatus and image processing method | |
US9374570B2 (en) | Apparatus and method for correcting depth map for three-dimensional image | |
KR20150051711A (en) | Apparatus and method for extracting skin area for blocking harmful content image | |
CN104462381A (en) | Trademark image retrieval method | |
CN102855478B (en) | Image Chinese version area positioning method and device | |
CN103312963A (en) | Image processing device and image processing method | |
CN104574328A (en) | Color image enhancement method based on histogram segmentation | |
US10810462B2 (en) | Object detection with adaptive channel features | |
CN109982012B (en) | Image processing method and device, storage medium and terminal | |
CN112800850A (en) | Video processing method and device, electronic equipment and storage medium | |
CN103985106A (en) | Equipment and method used for multi-frame fusion of strong noise images | |
CN104077765B (en) | Image segmentation device, image partition method | |
JP2008210387A (en) | Noise elimination device and noise elimination program for improving binarization performance of document image | |
CN102447870A (en) | Stationary object detection method and motion compensation device | |
US9311563B2 (en) | Method and apparatus for generating hierarchical saliency images detection with selective refinement | |
CN105139372A (en) | Codebook improvement algorithm for prospect detection | |
CN104301618A (en) | Flicker detecting method and device | |
CN116958113A (en) | Product detection method, device, equipment and storage medium | |
KR20130124659A (en) | System for detaching object and the method thereof | |
Jeon et al. | Rough sets-assisted subfield optimization for alternating current plasma display panel | |
CN108564593A (en) | A kind of image partition method and system based on anomaly particle cluster algorithm |
Legal Events
Code | Title | Description | Date |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |