
CN112348028A - Scene text detection method, correction method, device, electronic equipment and medium - Google Patents


Info

Publication number
CN112348028A
CN112348028A (application CN202011385920.1A)
Authority
CN
China
Prior art keywords
map
target
question
threshold
contour
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011385920.1A
Other languages
Chinese (zh)
Inventor
孙永毫
徐强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Guoli Education Technology Co ltd
Original Assignee
Guangdong Guoli Education Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Guoli Education Technology Co ltd filed Critical Guangdong Guoli Education Technology Co ltd
Priority to CN202011385920.1A priority Critical patent/CN112348028A/en
Publication of CN112348028A publication Critical patent/CN112348028A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/153Segmentation of character regions using recognition of characters or words
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a scene text detection method, a correction method, a device, electronic equipment and a medium, belonging to the technical field of network-based intelligent education. The scene text detection method comprises: acquiring a target picture sent by an intelligent terminal; processing the target picture with a feature pyramid network to generate a feature map F, predicting a probability map P and a threshold map T from the feature map F, and generating an approximate binary map B from the probability map P and the threshold map T; and performing adaptive threshold processing on the approximate binary map B using a differentiable binarization model to obtain a first target result, where the first target result comprises the different regions of the target picture. By embedding the binarization operation in the segmentation network through a differentiable binarization model, the invention achieves joint optimization and makes the threshold adaptive at every position of the probability (heat) map, thereby shortening the inference time of picture and character recognition and improving the accuracy of recognition and correction.

Description

Scene text detection method, correction method, device, electronic equipment and medium
Technical Field
The invention belongs to the technical field of network-based intelligent education, and particularly relates to a scene text detection method, a correction method, a device, electronic equipment and a medium.
Background
With the development of computer technology, online teaching has grown rapidly, and corresponding teaching tool products have emerged, providing technical support and educational guidance for students, teachers and parents; many of these products offer the function of correcting exercises from photographs or screenshots.
For photograph-based correction, the crucial step is recognition, which depends heavily on the quality of the captured picture. Unlike document character recognition, character recognition in natural scenes suffers from complex image backgrounds, low resolution, diverse fonts and varied shapes, so traditional optical character recognition cannot be applied under these conditions. To recognize natural scene text well, the scene text must first be detected accurately.
In recent years, segmentation-based methods have become popular in the field of scene text detection because they are more accurate in detecting scene texts of various shapes (curved, vertical, multi-directional).
Because they produce pixel-level predictions, segmentation-based scene character detection methods can describe characters of different shapes and have recently become popular. However, most segmentation-based methods require complex post-processing to group the pixel-level predictions into detected text instances, resulting in considerable inference time.
Segmentation-based scene text detection converts the probability map (heat map) produced by the segmentation network into bounding boxes and character regions, a process that includes a binarization post-processing step. This binarization step is critical: conventional binarization uses a fixed threshold, and the standard binarization function is not differentiable. A fixed threshold adapts poorly to complex and variable detection scenes, so the detected result has a high distortion rate, low accuracy, and a heavy post-processing burden.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art by providing a scene text detection method, a correction method, a device, an electronic device and a medium, thereby solving the problem that the binarization step in the prior art is not differentiable and overcoming the technical bottleneck of low recognition efficiency.
In order to achieve the above object, in a first aspect, the present invention provides a scene text detection method, where the method includes:
acquiring a target picture, wherein the target picture is sent by an intelligent terminal;
processing the target picture according to the feature pyramid network to generate a feature map F, predicting a probability map P and a threshold map T through the feature map F, and generating an approximate binary map B through the probability map P and the threshold map T;
and carrying out self-adaptive threshold processing on the approximate binary image B by utilizing a differentiable binarization processing model to obtain a first target result, wherein the first target result comprises different regions in the target image.
Further, in the differentiable binarization processing model, an approximate step function is introduced, the differentiable binarization processing is applied to the segmentation network, and the relationship among the probability map P, the threshold map T and the binary map B is established using the following formula:
\hat{B}_{i,j} = \frac{1}{1 + e^{-k\,(P_{i,j} - T_{i,j})}}
where k is the amplification factor.
Further, the first target result includes at least one functional area generated by segmentation, and the functional area is calculated and identified to obtain a first identification contour, where the first identification contour is described by a set of line segments:
G = \{S_k\}_{k=1}^{n}
wherein n represents the number of vertices;
the polygon is reduced by the Vatti clipping algorithm, and the contraction offset D is calculated by the perimeter L and the area A:
D = \frac{A\,(1 - r^2)}{L}
where r is the contraction factor.
Further, the first target result is optimized using a loss function L, computed as a weighted sum of the probability map loss Ls, the binary map loss Lb and the threshold map loss Lt: L = Ls + α·Lb + β·Lt, where α and β are weighting factors, and the probability map loss Ls and the binary map loss Lb use a binary cross-entropy loss function:
L_s = L_b = -\sum_{i \in S_t} \left[ y_i \log x_i + (1 - y_i) \log(1 - x_i) \right]
where S_t denotes the sampled set in which the ratio of positive to negative samples is 1:3;
the threshold map loss Lt uses the L1 distance loss function:
L_t = \sum_{i \in R_d} \left| y_i^* - x_i^* \right|
in a second aspect, the present invention further provides a batching method, which applies the scene text detection method as described above, where the method includes:
applying the scene text detection method according to any one of claims 1 to 4 to obtain a first target result;
and carrying out correction processing on the first target result to obtain a second target result.
Further, a simulation training model is added, and in the training phase, supervision is carried out on the probability map P, the threshold map T and the approximate binary map B, wherein the threshold map T and the approximate binary map B share the same supervision.
Further, in the process of separating the first target result, a test paper contour, text line contours and question-number box contours are determined. The test paper contour encloses the whole target picture, each text line contour encloses one line of text, and each question-number box contour encloses the number of one question. The upper boundary of each question is defined by its question-number box contour and the corresponding text line contour; the left and right end points of this upper boundary are extended until they meet the test paper contour, and the upper boundaries thus divide the test paper contour into at least one test question area.
Further, each test question area is calculated and identified, where the first identification contour comprises a printed-text contour, a graphic contour and a handwritten contour; the printed-text contour and the graphic contour constitute the question information, and the handwritten contour constitutes the answer information.
Further, in the process of performing correction processing on the first target result, the first target result comprises question information and answer information; OCR recognition is performed on the question information to obtain question text recognition information, and OCR recognition is performed on the answer information to obtain answer text recognition information;
keywords are extracted from the question text recognition information in combination with the graphic contour, and a database is queried with the keywords to obtain a group of similar original questions; the graphic region of each similar question is identified and its similarity to the detected graphic contour is judged; when the graphic similarity exceeds a preset similarity, the final question is determined from the similar question group, and the corresponding answer analysis is obtained by querying with the final question.
In a third aspect, the present invention further provides an apparatus applied to the scene text detection method, including:
an acquisition unit configured to acquire a target picture, the target picture being transmitted by an intelligent terminal;
a generating unit configured to process the target picture according to a feature pyramid network, generate a feature map F, predict a probability map P and a threshold map T from the feature map F, and generate an approximate binary map B from the probability map P and the threshold map T;
a binarization unit configured to perform adaptive threshold processing on the approximate binary map B using a differentiable binarization processing model to obtain a first target result, where the first target result comprises the different regions generated from the target picture.
In a fourth aspect, the present invention further provides an apparatus applied to the above correction method, comprising the scene text detection apparatus described above and a correction unit, where the correction unit is configured to perform correction processing on the first target result to obtain a second target result.
In a fifth aspect, the present invention further provides an electronic device, including a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, at least one program, a code set, or an instruction set is loaded and executed by the processor to implement the scene text detection method as described above, or the modification method as described above.
In a sixth aspect, the present invention further provides a computer readable storage medium, on which computer instructions are stored, which when executed by a processor implement the steps of the scene text detection method or the steps of the correction method described above.
The invention has the beneficial effects that:
the invention utilizes a differentiable binarization processing model to implement binarization operation in a segmentation network so as to achieve the effect of combination optimization and realize the self-adaptation of the threshold value in each position of the thermodynamic diagram, thereby shortening the reasoning and calculating time of picture and character recognition, improving the recognition correction rate, having high accuracy of detection and recognition and reducing the requirement on post-processing.
Drawings
The invention is further illustrated by means of the attached drawings, but the embodiments in the drawings do not constitute any limitation to the invention, and for a person skilled in the art, other drawings can be obtained on the basis of the following drawings without inventive effort.
Fig. 1 is a schematic flowchart of a scene text detection method provided in this embodiment 1.
Fig. 2 is a schematic flow chart of a correction method provided in this embodiment 2.
Fig. 3 is a schematic diagram of a framework of a scene text detection device provided in embodiment 3.
Fig. 4 is a schematic diagram of a frame of a modification device provided in this embodiment 4.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Example 1:
referring to fig. 1, this embodiment 1 provides a scene text detection method, where the method includes:
acquiring a target picture, wherein the target picture is sent by an intelligent terminal and can be obtained by shooting and sending by the intelligent terminal;
processing the target picture according to the feature pyramid network to generate a feature map F, predicting a probability map P and a threshold map T through the feature map F, and generating an approximate binary map B through the probability map P and the threshold map T;
and carrying out self-adaptive threshold processing on the approximate binary image B by utilizing a differentiable binarization processing model to obtain a first target result, wherein the first target result comprises different regions in the target image.
After a target picture is received, a backbone with a feature pyramid structure extracts features; the feature pyramid outputs are upsampled to the same size and concatenated (cascaded) to generate the feature map F. In the conventional binarization operation, where P denotes the probability map and t denotes a fixed segmentation threshold, the probability map P output by the network is binarized with the fixed threshold t, specifically using the following formula:
B_{i,j} = \begin{cases} 1, & P_{i,j} \ge t \\ 0, & \text{otherwise} \end{cases}
since this binarization calculation method is not differentiable, this method cannot be optimized with the segmentation network in the training phase.
In this embodiment, an approximate step function is introduced into the differentiable binarization model; the differentiable binarization is applied within the segmentation network, and the relationship among the probability map P, the threshold map T and the approximate binary map B is established with the following formula:
\hat{B}_{i,j} = \frac{1}{1 + e^{-k\,(P_{i,j} - T_{i,j})}}
where k is an amplification factor and is empirically set to 50.
With this formula the binarization becomes differentiable, satisfying the condition for gradient back-propagation; differentiable binarization with an adaptive threshold not only helps separate the different regions, but also separates instance regions that lie close together.
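As a sketch mirroring the formula above (not the patent's actual implementation), the differentiable binarization is a steep sigmoid of P − T; with k = 50 it saturates to 0/1 away from the per-pixel threshold while keeping a nonzero gradient near it:

```python
import numpy as np

def differentiable_binarize(P, T, k=50.0):
    """Approximate binary map: B_hat = 1 / (1 + exp(-k * (P - T)))."""
    return 1.0 / (1.0 + np.exp(-k * (P - T)))

# Far above / below the per-pixel threshold the output saturates, but
# the gradient k * B_hat * (1 - B_hat) never vanishes exactly, so the
# threshold map T can be learned end to end with the network.
b_hi = differentiable_binarize(0.9, 0.3)   # close to 1.0
b_lo = differentiable_binarize(0.1, 0.3)   # close to 0.0
b_eq = differentiable_binarize(0.3, 0.3)   # exactly 0.5 when P == T
```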
In this embodiment, the first target result includes at least one functional area generated by segmentation, and the functional area is calculated and identified to obtain a first identification contour. The first identification contour is a polygon whose sides may be curved, vertical or multi-directional at different angles; that is, the first identification contour can be described by a set of line segments:
G = \{S_k\}_{k=1}^{n}
wherein n represents the number of vertices;
the polygon is reduced by the Vatti clipping algorithm, and the contraction offset D is calculated by the perimeter L and the area A:
D = \frac{A\,(1 - r^2)}{L}
where r is the contraction factor.
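The offset computation can be sketched as follows (a minimal NumPy illustration of D = A(1 − r²)/L; the polygon shrinking itself would still be done with a Vatti-style clipping library such as pyclipper, which is not shown here):

```python
import numpy as np

def shrink_offset(vertices, r=0.4):
    """Contraction offset D = A * (1 - r**2) / L for a simple polygon
    given as an (n, 2) sequence of vertices in order."""
    pts = np.asarray(vertices, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    # Shoelace formula for the area A.
    A = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    # Perimeter L as the sum of edge lengths.
    L = np.sum(np.linalg.norm(pts - np.roll(pts, -1, axis=0), axis=1))
    return A * (1.0 - r ** 2) / L

# 10x10 axis-aligned square: A = 100, L = 40, so D = 100 * 0.84 / 40 = 2.1
D = shrink_offset([(0, 0), (10, 0), (10, 10), (0, 10)])
```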
In this embodiment, the first target result is optimized using a loss function L, computed as a weighted sum of the probability map loss Ls, the binary map loss Lb and the threshold map loss Lt: L = Ls + α·Lb + β·Lt, where α and β are weighting factors, set here to α = 1.0 and β = 10.
The probability map P penalty Ls and the binary map B penalty Lb use a binary cross entropy penalty function:
L_s = L_b = -\sum_{i \in S_t} \left[ y_i \log x_i + (1 - y_i) \log(1 - x_i) \right]
where S_t denotes the sampled set in which the ratio of positive to negative samples is 1:3;
the threshold map loss Lt uses the L1 distance loss function:
L_t = \sum_{i \in R_d} \left| y_i^* - x_i^* \right|
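Putting the three terms together, a minimal sketch of the weighted loss L = Ls + α·Lb + β·Lt (mean-reduced for readability; the reduction is an assumption, as the patent does not specify it):

```python
import numpy as np

def bce_loss(y, x, eps=1e-7):
    """Binary cross-entropy, used for both Ls (probability map) and
    Lb (approximate binary map); x is clipped to avoid log(0)."""
    x = np.clip(x, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(x) + (1.0 - y) * np.log(1.0 - x)))

def l1_loss(y, x):
    """L1 distance loss Lt for the threshold map."""
    return float(np.mean(np.abs(y - x)))

def total_loss(Ls, Lb, Lt, alpha=1.0, beta=10.0):
    """Weighted sum L = Ls + alpha * Lb + beta * Lt."""
    return Ls + alpha * Lb + beta * Lt

L = total_loss(Ls=0.50, Lb=0.20, Lt=0.05)  # 0.50 + 1.0*0.20 + 10*0.05 = 1.20
```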
example 2:
referring to fig. 2, this embodiment 2 provides a batching method to which the scene text detection method in embodiment 1 is applied, and the method includes:
applying the scene text detection method in embodiment 1 to obtain a first target result;
and carrying out correction processing on the first target result to obtain a second target result.
It should be noted that the scene text detection method in embodiment 1 is here applied to a correction method in the educational field. Specifically, a target picture is obtained by photographing; after processing by the scene text detection method of embodiment 1, a first target result is obtained, in which regions with different meanings in the target picture (such as questions, answers, graphics and formulas) are separated according to the set requirements. Correction processing is then performed on the first target result, and the second target result is pushed to the user.
In this embodiment, a simulation training model is added, which leads to two phases: a training phase and a correction phase. In the training phase, supervision is applied to the probability map P, the threshold map T and the approximate binary map B, where the threshold map T and the approximate binary map B share the same supervision; in this way, the bounding boxes can be obtained easily and quickly from the threshold map T and the approximate binary map B in the correction phase.
Additionally, training data can be randomly generated with the simulation training model; with this training data and the detection-then-correction pipeline, the model can be refined effectively and its response speed improved.
As a preferable mode, in the process of separating the first target result, a test paper contour, text line contours and question-number box contours are determined. The test paper contour encloses the whole target picture, each text line contour encloses one line of text, and each question-number box contour encloses the number of one question. The upper boundary of each question is defined by its question-number box contour and the corresponding text line contour; after the left and right end points of the upper boundary are extended, the upper boundary joins the test paper contour on both sides, and the upper boundaries divide the test paper contour into at least one test question area.
It should be noted that the upper edge of a question-number box contour is the upper boundary of that question; the upper boundary of each question can be defined by the question-number box contour and the text line contour of its first line. After the left and right end points of the upper boundary are extended, it merges with the test paper contour, so the upper boundaries divide the test paper contour into at least one test question area: one test question area lies between every two adjacent upper boundaries, and the last question of the page lies between the final upper boundary and the bottom edge of the test paper contour.
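The region-splitting logic described above can be sketched as follows (a hypothetical illustration; the function name, the representation of each extended upper boundary as a y-coordinate, and the bottom-edge handling are all assumptions, not taken from the patent):

```python
def split_into_question_regions(upper_boundary_ys, paper_bottom_y):
    """Each test question area spans from one extended upper boundary
    to the next; the last question runs from the final upper boundary
    to the bottom edge of the test paper contour."""
    ys = sorted(upper_boundary_ys) + [paper_bottom_y]
    return [(ys[i], ys[i + 1]) for i in range(len(ys) - 1)]

# Three detected upper boundaries on a page whose contour ends at y=100:
regions = split_into_question_regions([10, 40, 70], 100)
# → [(10, 40), (40, 70), (70, 100)]
```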
As a preferred mode, each test question area is calculated and identified to obtain a first identification contour, where the first identification contour comprises a printed-text contour, a graphic contour and a handwritten contour; the printed-text contour and the graphic contour constitute the question information, and the handwritten contour constitutes the answer information. A formula contour may also be included: besides printed characters, the question information contains formulas and figures, and these three elements together form the complete question information. The handwritten contour may likewise include characters and formulas, but since all of them are handwritten, they all belong to the answer information.
As a preferred mode, in the course of performing the correction processing on the first target result, the first target result has already been calculated and identified, forming corresponding question information and answer information, each carrying a label. OCR recognition is then performed on the question information to obtain question text recognition information, and on the answer information to obtain answer text recognition information;
keywords are extracted from the question text recognition information in combination with the graphic contour (and, where present, the formula contour), and a database is queried with the keywords to obtain a group of similar original questions; the graphic region of each similar question is identified and its similarity to the detected graphic contour is judged; when the graphic similarity exceeds a preset similarity, the final question is determined from the similar question group, and the corresponding answer analysis is obtained by querying with the final question.
It should be noted that, in the first step, a similar question group containing at least one question is found from the keywords in the question text recognition information; the final question is then located by combining the graphic similarity between each candidate question's graphic region and the detected graphic contour. Optionally, the final question and its corresponding answer analysis are pushed to the user. Depending on the actual situation, after a question is corrected, at least one of the score/loss, the correction mark and the score ranking for that question can also be pushed, and the correction can either be drawn directly onto the target picture or the correction result can be sent to the intelligent terminal.
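The two-step lookup can be sketched as follows. This is a hypothetical outline only: `query_bank` and `graphic_similarity` stand in for the database query and the graph-matching routine, neither of which is specified in the patent.

```python
def find_final_question(keywords, detected_graphic, query_bank,
                        graphic_similarity, preset_similarity=0.8):
    """Step 1: query the question bank by keywords -> similar question
    group. Step 2: among candidates whose figure region is more similar
    to the detected graphic contour than the preset similarity, return
    the best match; return None when no candidate qualifies."""
    candidates = query_bank(keywords)
    best, best_sim = None, preset_similarity
    for question in candidates:
        sim = graphic_similarity(detected_graphic, question["figure"])
        if sim > best_sim:
            best, best_sim = question, sim
    return best

# Stub bank and similarity measure, for illustration only:
bank = [{"id": 1, "figure": "triangle"}, {"id": 2, "figure": "circle"}]
match = find_final_question(
    ["area", "radius"], "circle",
    query_bank=lambda kw: bank,
    graphic_similarity=lambda a, b: 0.9 if a == b else 0.1,
)
```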
Example 3:
referring to fig. 3, this embodiment 3 provides an apparatus applied to the scene text detection method in embodiment 1, including:
an acquisition unit configured to acquire a target picture, the target picture being transmitted by an intelligent terminal;
a generating unit configured to process the target picture according to a feature pyramid network, generate a feature map F, predict a probability map P and a threshold map T from the feature map F, and generate an approximate binary map B from the probability map P and the threshold map T;
a binarization unit configured to perform adaptive threshold processing on the approximate binary map B using a differentiable binarization processing model to obtain a first target result, where the first target result comprises the different regions generated from the target picture.
A target picture, which may contain test questions, answers, examinee information, examination time, subject and grade, is obtained by the acquisition unit; an approximate binary map B is generated for the target picture, and adaptive threshold processing is performed on the approximate binary map B with the differentiable binarization model, so that the different text regions in the scene are segmented accurately.
Example 4:
referring to fig. 4, this embodiment 4 provides an apparatus applied to the correction method in embodiment 2, including the scene text detection apparatus in embodiment 3 and a correction unit, where the correction unit is configured to perform correction processing on the first target result to obtain a second target result.
It should be noted that, on the basis of embodiment 3, a correction unit is added, and different text areas are identified, judged and corrected in combination with the photographing correction requirement in the education field.
Example 5:
this embodiment 5 provides an electronic device, including a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or a set of instructions, and the at least one instruction, at least one program, a code set, or a set of instructions is loaded and executed by the processor to implement the scene text detection method in embodiment 1 or the modification method in embodiment 2.
Example 6:
this embodiment 6 provides a computer-readable storage medium on which computer instructions are stored, which when executed by a processor implement the steps of the scene text detection method as in embodiment 1, or the steps of the correction method as in embodiment 2.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
In a typical configuration, a computer includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-persistent storage in computer readable media, in the form of random access memory (RAM), and/or non-volatile memory, such as read only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage media or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
Compared with the prior art, the method and the device adopt a model detection algorithm aiming at the target image to generate the question information detection result and the answer information detection result, perform OCR model recognition on the two detection results respectively to recognize the character line recognition result and the formula recognition result, improve the detection and recognition efficiency of the chart and the formula in the test question and the answer, and further improve the correction efficiency.
The invention utilizes a differentiable binarization processing model to implement binarization operation in a segmentation network so as to achieve the effect of combination optimization and realize the self-adaptation of the threshold value in each position of the thermodynamic diagram, thereby shortening the reasoning and calculating time of picture and character recognition, improving the recognition correction rate, having high accuracy of detection and recognition and reducing the requirement on post-processing.
Finally, it should be emphasized that only the preferred embodiments of the invention have been described above; the invention is not limited to these embodiments, and any modifications, equivalent substitutions or improvements made within the spirit and principle of the present invention shall fall within its protection scope.

Claims (13)

1. A scene text detection method is characterized by comprising the following steps:
acquiring a target picture, wherein the target picture is sent by an intelligent terminal;
processing the target picture according to the feature pyramid network to generate a feature map F, predicting a probability map P and a threshold map T through the feature map F, and generating an approximate binary map B through the probability map P and the threshold map T;
and performing adaptive threshold processing on the approximate binary map B by means of a differentiable binarization processing model to obtain a first target result, wherein the first target result comprises different regions of the target picture.
2. The scene text detection method according to claim 1, wherein an approximate step function is introduced into the differentiable binarization processing model, differentiable binarization is applied within the segmentation network, and the relationship between the probability map P, the threshold map T and the binary map B is established by the following formula:
\hat{B}_{i,j} = \frac{1}{1 + e^{-k\,(P_{i,j} - T_{i,j})}}
where k is the amplification factor.
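As a minimal sketch (not part of the claim), the approximate step function of claim 2 can be written in Python; k = 50 is the amplification factor value reported in the cited DBNet paper, and the function name is illustrative:

```python
import numpy as np

def differentiable_binarization(P, T, k=50.0):
    """Approximate binary map from probability map P and threshold map T.

    Implements B_hat = 1 / (1 + exp(-k * (P - T))): a sigmoid-shaped
    approximate step function that is differentiable everywhere, so the
    binarization step can be optimized jointly with the segmentation
    network. k is the amplification factor (k = 50 in the DBNet paper).
    """
    return 1.0 / (1.0 + np.exp(-k * (P - T)))

# Where P exceeds T the output saturates toward 1, otherwise toward 0.
P = np.array([[0.9, 0.2], [0.6, 0.4]])
T = np.array([[0.3, 0.3], [0.5, 0.5]])
B = differentiable_binarization(P, T)
```

Because the function is smooth, gradients flow through the binarization step during training, while at inference a large k makes the output nearly binary.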
3. The scene text detection method according to claim 2, wherein the first target result comprises at least one functional area generated by segmentation; the functional area is computationally identified to obtain a first identification contour, and the first identification contour is described by a set of line segments:
G = \{ S_k \}_{k=1}^{n}
wherein n represents the number of vertices;
the polygon is reduced by the Vatti clipping algorithm, and the contraction offset D is calculated by the perimeter L and the area A:
D = \frac{A\,(1 - r^{2})}{L}
where r is the contraction factor.
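A minimal numeric sketch of the contraction offset of claim 3, assuming a simple polygon given as an ordered vertex list; the shoelace/perimeter helpers and the default r = 0.4 (the shrink ratio used in the cited DBNet paper) are illustrative assumptions:

```python
import numpy as np

def shrink_offset(vertices, r=0.4):
    """Contraction offset D = A * (1 - r**2) / L for an ordered polygon.

    A is the polygon area (shoelace formula), L its perimeter and r the
    contraction factor. D is the distance by which the Vatti clipping
    algorithm shrinks the annotated polygon when generating the
    probability-map label.
    """
    v = np.asarray(vertices, dtype=float)
    x, y = v[:, 0], v[:, 1]
    # Shoelace formula for the area A.
    A = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    # Perimeter L as the sum of consecutive edge lengths.
    L = np.sum(np.linalg.norm(v - np.roll(v, -1, axis=0), axis=1))
    return A * (1.0 - r ** 2) / L

# Example: a 100 x 20 axis-aligned text box (A = 2000, L = 240).
D = shrink_offset([(0, 0), (100, 0), (100, 20), (0, 20)])
```

In practice the actual polygon offsetting is delegated to a clipping library (e.g. a Vatti-style offsetter); the formula only supplies the offset distance.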
4. The scene text detection method according to claim 3, wherein the first target result is optimized by a loss function L obtained as a weighted sum of the probability map P loss Ls, the binary map B loss Lb and the threshold map T loss Lt: L = Ls + α·Lb + β·Lt, where α and β are weighting factors; the probability map P loss Ls and the binary map B loss Lb use a binary cross-entropy loss function:
L_s = L_b = -\sum_{i \in S_t} \left( y_i \log x_i + (1 - y_i) \log(1 - x_i) \right)
wherein S_t denotes the sampled set in which the ratio of positive to negative samples is 1:3;
Lt uses the L1 distance loss function:
L_t = \sum_{i \in R_d} \left| y_i^{*} - x_i^{*} \right|
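The weighted loss of claim 4 can be sketched as follows; the weight defaults α = 1 and β = 10 are taken from the cited DBNet paper, not from the claim, and all function names are illustrative:

```python
import numpy as np

def bce_loss(pred, target, eps=1e-6):
    """Binary cross-entropy, used for both Ls (probability map) and Lb
    (approximate binary map); in practice it is evaluated over a sampled
    pixel set with a 1:3 positive-to-negative ratio (hard negative mining)."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return -np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))

def l1_loss(pred, target):
    """L1 distance loss Lt for the threshold map."""
    return np.mean(np.abs(pred - target))

def total_loss(Ls, Lb, Lt, alpha=1.0, beta=10.0):
    """Weighted sum L = Ls + alpha * Lb + beta * Lt."""
    return Ls + alpha * Lb + beta * Lt
```

Any differentiable framework's built-in BCE/L1 losses would serve the same role; the sketch only makes the weighting explicit.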
5. A correction method applying the scene text detection method according to any one of claims 1 to 4, characterized in that the method comprises:
applying the scene text detection method according to any one of claims 1 to 4 to obtain a first target result;
and carrying out correction processing on the first target result to obtain a second target result.
6. The correction method according to claim 5, wherein a simulated training model is added; in the training phase, supervision is applied on the probability map P, the threshold map T and the approximate binary map B, wherein the threshold map T and the approximate binary map B share the same supervision.
7. The correction method according to claim 6, wherein, in dividing the first target result, a test paper contour, a text line contour and a question number frame contour are determined, the test paper contour covering the whole target picture, the text line contour covering each line of text, and the question number frame contour covering the question number of each question; an upper boundary of each question is defined by the question number frame contour and the text line contour, the left and right end points of the upper boundary are extended until the upper boundary meets the test paper contour, and the upper boundaries divide the test paper contour into at least one test question area.
8. The correction method according to claim 7, wherein the test question area is computationally identified; the first identification contour comprises a block contour, a figure contour and a handwriting contour, the block contour and the figure contour constituting the question information, and the handwriting contour constituting the answer information.
9. The correction method according to claim 8, wherein, in the correction processing of the first target result, the first target result comprises question information and answer information; OCR recognition is performed on the question information to obtain question text recognition information, and OCR recognition is performed on the answer information to obtain answer text recognition information;
keywords in the question text recognition information are extracted according to the question text recognition information and the figure contour, and a database is queried with the keywords to obtain a group of similar original questions; a figure area in the similar question group is identified, the figure similarity between the figure area and the figure contour is judged, and when the figure similarity is greater than a preset similarity, a final question is determined from the similar question group and the corresponding answer analysis is obtained by querying according to the final question.
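The keyword-and-figure lookup of claim 9 can be sketched as follows; the database layout, record fields and the figure_similarity callable are illustrative assumptions, not part of the patent:

```python
def find_answer(keywords, figure, database, figure_similarity,
                sim_threshold=0.8):
    """Query the question bank by keyword, then confirm the match by figure
    similarity before returning the stored answer analysis.

    Assumed record shape: each entry of `database` is a dict with "text",
    "figure" and "answer_analysis" keys; figure_similarity(a, b) returns
    a similarity score in [0, 1].
    """
    # Step 1: keyword query yields the group of similar original questions.
    candidates = [q for q in database
                  if any(kw in q["text"] for kw in keywords)]
    # Step 2: confirm by figure similarity against the preset threshold.
    for q in candidates:
        if figure_similarity(figure, q["figure"]) > sim_threshold:
            return q["answer_analysis"]  # final question's stored analysis
    return None  # no sufficiently similar original question found
```

The two-stage filter (cheap text match first, more expensive figure comparison second) keeps the figure-similarity computation restricted to a small candidate set.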
10. A scene text detection apparatus applying the method according to any one of claims 1 to 4, comprising:
an acquisition unit configured to acquire a target picture, the target picture being transmitted by an intelligent terminal;
a generating unit configured to process the target picture according to a feature pyramid network, generate a feature map F, predict a probability map P and a threshold map T from the feature map F, and generate an approximate binary map B from the probability map P and the threshold map T;
a binarization unit configured to perform adaptive threshold processing on the approximate binary map B by means of a differentiable binarization processing model to obtain a first target result, wherein the first target result comprises different regions of the target picture.
11. An apparatus applying the correction method according to any one of claims 5 to 9, characterized by comprising:
the scene text detection apparatus as claimed in claim 10, and further comprising a correction unit configured to perform correction processing on the first target result to obtain a second target result.
12. An electronic device comprising a processor and a memory, wherein the memory has stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by the processor to implement the scene text detection method according to any one of claims 1 to 4, or to implement the correction method according to any one of claims 5 to 9.
13. A computer readable storage medium having stored thereon computer instructions, which, when executed by a processor, carry out the steps of the scene text detection method according to any one of claims 1 to 4, or carry out the steps of the correction method according to any one of claims 5 to 9.
CN202011385920.1A 2020-11-30 2020-11-30 Scene text detection method, correction method, device, electronic equipment and medium Pending CN112348028A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011385920.1A CN112348028A (en) 2020-11-30 2020-11-30 Scene text detection method, correction method, device, electronic equipment and medium


Publications (1)

Publication Number Publication Date
CN112348028A (en)

Family

ID=74427864

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011385920.1A Pending CN112348028A (en) 2020-11-30 2020-11-30 Scene text detection method, correction method, device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN112348028A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113496008A (en) * 2021-09-06 2021-10-12 北京壁仞科技开发有限公司 Method, computing device, and computer storage medium for performing matrix computations
CN113569838A (en) * 2021-08-30 2021-10-29 平安医疗健康管理股份有限公司 Text recognition method and device based on text detection algorithm
CN113610068A (en) * 2021-10-11 2021-11-05 江西风向标教育科技有限公司 Test question disassembling method, system, storage medium and equipment based on test paper image
CN113657213A (en) * 2021-07-30 2021-11-16 五邑大学 Text recognition method, text recognition device and computer-readable storage medium
CN114283431A (en) * 2022-03-04 2022-04-05 南京安元科技有限公司 Text detection method based on differentiable binarization

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781967A (en) * 2019-10-29 2020-02-11 华中科技大学 Real-time text detection method based on differentiable binarization
CN111428724A (en) * 2020-04-13 2020-07-17 北京星网锐捷网络技术有限公司 Test paper handwriting statistical method, device and storage medium
CN111652217A (en) * 2020-06-03 2020-09-11 北京易真学思教育科技有限公司 Text detection method and device, electronic equipment and computer storage medium
CN111753767A (en) * 2020-06-29 2020-10-09 广东小天才科技有限公司 Method and device for automatically correcting operation, electronic equipment and storage medium
CN111753120A (en) * 2020-06-29 2020-10-09 广东小天才科技有限公司 Method and device for searching questions, electronic equipment and storage medium
CN111860443A (en) * 2020-07-31 2020-10-30 上海掌学教育科技有限公司 Chinese work topic character recognition method, search method, server and system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MINGHUI LIAO et al.: "Real-Time Scene Text Detection with Differentiable Binarization", Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 7, pages 11475-11477 *


Similar Documents

Publication Publication Date Title
US11790641B2 (en) Answer evaluation method, answer evaluation system, electronic device, and medium
US20220230420A1 (en) Artificial intelligence-based object detection method and apparatus, device, and storage medium
CN112348028A (en) Scene text detection method, correction method, device, electronic equipment and medium
CN110738207B (en) Character detection method for fusing character area edge information in character image
CN112818975B (en) Text detection model training method and device, text detection method and device
CN106846362B (en) Target detection tracking method and device
CN111488770A (en) Traffic sign recognition method, and training method and device of neural network model
CN108108731B (en) Text detection method and device based on synthetic data
CN111950528B (en) Graph recognition model training method and device
CN111507250B (en) Image recognition method, device and storage medium
CN115131797A (en) Scene text detection method based on feature enhancement pyramid network
CN110443235B (en) Intelligent paper test paper total score identification method and system
CN113591746B (en) Document table structure detection method and device
CN112307919A (en) Improved YOLOv 3-based digital information area identification method in document image
CN112347997A (en) Test question detection and identification method and device, electronic equipment and medium
CN114463770A (en) Intelligent question-cutting method for general test paper questions
CN113436222A (en) Image processing method, image processing apparatus, electronic device, and storage medium
CN110210480A (en) Character recognition method, device, electronic equipment and computer readable storage medium
CN116189162A (en) Ship plate detection and identification method and device, electronic equipment and storage medium
CN114882204A (en) Automatic ship name recognition method
CN110287981A (en) Conspicuousness detection method and system based on biological enlightening representative learning
CN116259050B (en) Method, device, equipment and detection method for positioning and identifying label characters of filling barrel
CN113065548A (en) Feature-based text detection method and device
CN115049851B (en) Target detection method, device and equipment terminal based on YOLOv5 network
CN114399626B (en) Image processing method, apparatus, computer device, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination