CN108764228A - Text object detection method in an image - Google Patents
Text object detection method in an image
- Publication number
- CN108764228A CN108764228A CN201810520329.9A CN201810520329A CN108764228A CN 108764228 A CN108764228 A CN 108764228A CN 201810520329 A CN201810520329 A CN 201810520329A CN 108764228 A CN108764228 A CN 108764228A
- Authority
- CN
- China
- Prior art keywords
- layer
- frame
- bounding box
- feature
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Image Analysis (AREA)
Abstract
The present invention provides a text object detection method in an image, belonging to the technical fields of pattern recognition and image processing. The method comprises the following steps. Step 1: construct an end-to-end convolutional neural network based on feature-layer fusion to predict targets of different scales in an image. Step 2: from the candidate boxes output by the feature-layer fusion network, obtain the final text detection results in the image using a bounding-box fusion algorithm. The image target detection method of the present invention extracts the regions of text targets from natural scene images, improving the efficiency and accuracy of subsequent target recognition. A feature-layer fusion neural network based on deep learning is proposed to predict the bounding boxes of targets, and the predicted bounding boxes are merged with a bounding-box fusion algorithm, so that the regions of text targets in an image can be detected effectively.
Description
Technical field
The invention belongs to the technical fields of pattern recognition and image processing, and in particular relates to a text object detection method in an image.
Background art
With the development of the Internet and multimedia technology, more and more information carriers exist in the form of images. Images contain rich visual information such as text, color, shape, pattern and position, which helps humans analyse the meaning of a scene. Image-based text target detection is already widely applied in areas such as license plate recognition and traffic sign analysis. However, because images are captured under uncontrolled conditions, the text in an image may be deformed, incomplete, blurred or broken, and such objective factors interfere with the detection of text regions. In addition, the background in a scene image is usually complex, and text and background may share similar textures, which further increases the difficulty of text target detection. Traditional text target detection methods require hand-crafted feature selection for text targets and locate text with a large number of heuristic rules, with limited effect.
Summary of the invention
The object of the present invention is to provide a deep-learning-based text object detection method in an image, so as to solve the problem of locating text targets in images. The present invention predicts the positions and confidences of text target objects with a neural network, then merges all output candidate boxes with a candidate-box fusion algorithm to obtain the final bounding boxes of the image targets, i.e. the image target detection results.
The technical solution adopted by the present invention to solve the technical problem is a text target detection method in an image that combines a feature fusion network with a bounding-box fusion algorithm, the method comprising the following steps:
First, an end-to-end convolutional neural network with multiple output layers of strong representational power is designed. Different output layers of the network predict target objects of different scales: high-level output layers predict large-scale targets, and low-level output layers predict small-scale targets. The output layers of the network output the positions and confidences of target objects, yielding a series of candidate bounding boxes.
Then the candidate text boxes output by the neural network are post-processed: multiple candidate bounding boxes are merged to obtain the optimal detection position of each target object.
Further, constructing a convolutional neural network based on feature-layer fusion for detecting the positions of text targets comprises the following steps:
(1) Build a forward-propagation convolutional neural network whose base network is VGG-16, in which the last two fully connected layers are replaced with convolutional layers; additional convolutional layers and pooling layers are appended after the base network.
(2) Add a deconvolution layer between the top feature layer and each of the other feature layers. The deconvolution operation in the deconvolution layer is similar to bilinear interpolation and can selectively enlarge a feature map, so that the feature map of the top feature layer is scaled up to the same size as the low-level feature map. The size o of the feature map output by a deconvolution layer is computed as:
o = s·(i − 1) + k − 2p
where i is the size of the input feature map of the deconvolution layer, k is the kernel size, s is the stride, and p is the padding. Given the sizes of the input feature map and of the desired output feature map, the high-level feature layer can be brought to the same size as the low-level feature map by setting the corresponding parameters of the deconvolution layer.
(3) Fuse the deconvolved feature map with the feature map of the low-level feature layer by element-wise product to obtain a new feature layer. The new feature layer serves as an output layer for outputting the positions and confidences of target objects. The element-wise product of two feature maps is equivalent to the element-wise product of two matrices, i.e. corresponding elements of the two matrices are multiplied (an illustrative sketch of steps (2) and (3) is given after step (4) below).
(4) Define a series of fixed-size default boxes on the output layer; the output layer outputs the text confidence and the offset coordinates relative to each default box. Suppose the sizes of the image and of the feature map are (w_im, h_im) and (w_map, h_map) respectively, that position (i, j) in the feature map corresponds to a default box b0 = (x0, y0, w0, h0), and that the output of the output layer is (Δx, Δy, Δw, Δh, c), where (Δx, Δy, Δw, Δh) are the offset coordinates of the predicted text bounding box relative to the default box and c is the text confidence. The predicted text bounding box is b = (x, y, w, h), where:
x = x0 + w0·Δx
y = y0 + h0·Δy
w = w0·exp(Δw)
h = h0·exp(Δh)
Here x and y are the horizontal and vertical coordinates of the upper-left corner of the predicted text box, and w and h are its width and height.
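As an illustration of steps (2) and (3), the following sketch upsamples a top-level feature map with a transposed convolution whose parameters satisfy o = s·(i − 1) + k − 2p and fuses it with a low-level feature map by element-wise product. It is a minimal sketch assuming PyTorch; the channel count and feature-map sizes are hypothetical and not taken from the patent.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: top feature map 5x5, low-level feature map 10x10, 256 channels.
top_feat = torch.randn(1, 256, 5, 5)
low_feat = torch.randn(1, 256, 10, 10)

# Transposed convolution: with i = 5, k = 2, s = 2, p = 0 the output size is
# o = s*(i - 1) + k - 2p = 2*4 + 2 - 0 = 10, matching the low-level feature map.
deconv = nn.ConvTranspose2d(in_channels=256, out_channels=256,
                            kernel_size=2, stride=2, padding=0)

upsampled = deconv(top_feat)     # shape (1, 256, 10, 10)
fused = upsampled * low_feat     # element-wise (Hadamard) product fusion
print(fused.shape)               # torch.Size([1, 256, 10, 10])
```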
For the feature-layer fusion neural network, a sampling strategy is set to select positive and negative samples for network training. The specific steps include:
(1) Generate default boxes on the feature map of each output layer in a sliding-window fashion. A feature map of size N × N has N × N feature points, and, according to the aspect ratios of the target objects, each feature point corresponds to default boxes of six different aspect ratios:
ar = {a1, a2, a3, a4, a5, a6}
(2) Establish the relationship between the ground-truth boxes of the target objects in the image and the default boxes, and label the default boxes. Default boxes are labelled using the Jaccard overlap as the matching criterion: a higher Jaccard overlap indicates a higher sample similarity, i.e. a better match. Given a default box A and a ground-truth box B, the Jaccard overlap of the default box and the ground-truth box is the ratio of the intersection area of A and B to their union area:
J(A, B) = area(A ∩ B) / area(A ∪ B)
Default boxes whose Jaccard overlap is greater than or equal to 0.5 are treated as matched default boxes, and default boxes whose Jaccard overlap is less than 0.5 are treated as unmatched default boxes; matched default boxes serve as positive samples and unmatched default boxes serve as negative samples (a sketch of this matching rule is given after step (3) below).
(3) After the samples are labelled, the negative default boxes are sorted by confidence loss, and the default boxes with the highest confidence loss are selected as the negative samples for network training, so that the ratio of positive to negative training samples is kept at 1:3.
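The matching rule of step (2) can be sketched as follows for axis-aligned boxes given as (x, y, w, h); this is a minimal illustration, and the helper names are not from the patent:

```python
def jaccard(box_a, box_b):
    """Jaccard overlap (intersection over union) of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def label_default_boxes(default_boxes, gt_boxes, threshold=0.5):
    """Return one flag per default box: True = positive sample, False = negative."""
    return [any(jaccard(d, g) >= threshold for g in gt_boxes)
            for d in default_boxes]
```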
For the feature-layer fusion network, the objective function of the network is set. The specific steps include:
(1) Set the target loss function as the weighted sum of the localization loss and the confidence loss:
L(x, c, l, g) = (1/N)·(L_conf(x, c) + α·L_loc(x, l, g))
where x is the matching result matrix, c is the confidence, l is the predicted position, g is the ground-truth position of the target, and N is the number of default boxes matched to ground-truth boxes; the weight coefficient α is set to 1.
(2) Set the localization loss L_loc as the L2 loss between the predicted position and the ground-truth position of the target, and set the confidence loss L_conf as the softmax loss of the two-class (text / non-text) classification.
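A training-loss sketch under the objective above (L2 localization loss plus softmax confidence loss, α = 1) might look like the following; this is an assumption written with PyTorch, and the tensor shapes and names are illustrative rather than the patent's reference implementation:

```python
import torch
import torch.nn.functional as F

def detection_loss(pred_offsets, pred_logits, gt_offsets, gt_labels, alpha=1.0):
    """Weighted sum of localization and confidence losses.

    pred_offsets: (B, D, 4) predicted offsets; pred_logits: (B, D, 2) text/background scores.
    gt_offsets:   (B, D, 4) target offsets;    gt_labels:  (B, D) 1 = matched, 0 = negative.
    """
    pos = gt_labels > 0                               # matched (positive) default boxes
    num_matched = pos.sum().clamp(min=1).float()      # N in the formula

    # Localization loss: L2 over the positive boxes only.
    loc_loss = F.mse_loss(pred_offsets[pos], gt_offsets[pos], reduction='sum')

    # Confidence loss: softmax cross-entropy over the two classes.
    conf_loss = F.cross_entropy(pred_logits.view(-1, 2),
                                gt_labels.view(-1).long(), reduction='sum')

    return (conf_loss + alpha * loc_loss) / num_matched
```

In practice the confidence term would be restricted to the positive boxes plus the hard negatives kept by the 1:3 sampling strategy described above.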
For the feature fusion network, multiple output layers predict target bounding boxes at different scales, and the scale of the bounding boxes output by each output layer is set. The specific steps include:
(1) Select the top feature layer and the feature layers formed by fusing the top feature layer with other feature layers as the output layers of the network.
(2) Set the size of the default boxes on each output layer; the output layers output the offset coordinates of the target bounding boxes relative to the default boxes and their confidences, yielding candidate target bounding boxes. Suppose the network has m output layers and each output layer corresponds to one feature map; the scale of the default boxes on the k-th feature map is:
s_k = s_min + (s_max − s_min)·(k − 1)/(m − 1), k ∈ [1, m]
and the width and height of each default box with aspect ratio a_r are:
w_k = s_k·√a_r, h_k = s_k/√a_r
where s_min and s_max are the scales of the default boxes on the lowest and on the highest layer respectively. Low-level output layers predict small-scale targets and high-level output layers predict large-scale targets. The default boxes of the output layers therefore have different scales on different feature maps and different aspect ratios within the same feature map, so that the whole network can predict targets of different scales and shapes through its multiple output layers.
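A sketch of how the per-layer scales and default-box sizes could be computed under the formulas above; the values of s_min, s_max and the six aspect ratios are example assumptions, not values fixed by the patent:

```python
import math

def default_box_sizes(m, s_min=0.2, s_max=0.9,
                      aspect_ratios=(1.0, 2.0, 3.0, 1.0/2, 1.0/3, 5.0)):
    """Return, for each of the m output layers, the (width, height) of its default boxes."""
    sizes = []
    for k in range(1, m + 1):
        s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1)       # scale of layer k
        sizes.append([(s_k * math.sqrt(a), s_k / math.sqrt(a))  # w = s_k*sqrt(a), h = s_k/sqrt(a)
                      for a in aspect_ratios])
    return sizes

for k, boxes in enumerate(default_box_sizes(m=4), start=1):
    print(f"layer {k}:", [(round(w, 2), round(h, 2)) for w, h in boxes])
```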
Further, the multiple candidate target bounding boxes output by the feature-layer fusion network are post-processed with a bounding-box fusion algorithm to obtain the final positions of the image targets. The specific steps of the bounding-box fusion algorithm are:
(1) Sort the candidate bounding boxes of the targets from high to low by confidence, and take the first candidate bounding box as the current fusing bounding box;
(2) Treat each of the other candidate bounding boxes as a bounding box to be fused, and compare the confidences of the current fusing bounding box and of the bounding box to be fused. If the confidences of both text boxes are greater than threshold α, compute the area overlap rate of the current fusing bounding box and the bounding box to be fused; otherwise, execute step (3). The area overlap rate is the ratio of the overlapping area of the two bounding boxes to their union area:
IOU(C, G) = area(C ∩ G) / area(C ∪ G)
where area(C) and area(G) are the areas of text boxes C and G respectively.
(3) If the area overlap rate of the two candidate bounding boxes is greater than threshold β, merge the two bounding boxes; the fused bounding box is the circumscribed rectangle of the two bounding boxes and its confidence is the confidence of the fusing bounding box.
(4) If the area overlap rate of the two candidate bounding boxes is less than threshold β, compute the inclusion overlap rate of the two bounding boxes; if the inclusion overlap rate of the two bounding boxes is greater than threshold γ, remove that bounding box, otherwise execute step (5). The inclusion overlap rate is the ratio of the overlapping area of the two bounding boxes to the area of one of the bounding boxes:
I_i(t_i, t_j) = area(t_i ∩ t_j) / area(t_i)
where area(t_i) is the area of rectangle t_i, area(t_j) is the area of rectangle t_j, and I_i(t_i, t_j) is the inclusion overlap rate of rectangle t_i relative to rectangle t_j.
(5) If only the last text box remains, the algorithm terminates and the text boxes whose confidence is higher than threshold δ are selected as the final target detection results; otherwise, the candidate bounding boxes of the image targets are updated and, following the previously established order, the next bounding box that has not been fused is taken as the fusing text box and step (2) is executed.
The feature fusion network outputs the candidate bounding boxes of the targets, the bounding-box fusion algorithm processes the candidate bounding boxes, and finally the detection results of the image targets are obtained.
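The geometric quantities used by the fusion algorithm — the area overlap rate (IOU), the inclusion overlap rate and the circumscribed rectangle of two boxes — can be sketched as follows for boxes given as (x, y, w, h); this is a minimal illustration with assumed helper names:

```python
def intersection_area(a, b):
    """Overlapping area of two (x, y, w, h) boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    return iw * ih

def area_overlap_rate(a, b):
    """IOU: intersection area divided by union area."""
    inter = intersection_area(a, b)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def inclusion_overlap_rate(a, b):
    """I_i(a, b): intersection area divided by the area of box a."""
    return intersection_area(a, b) / (a[2] * a[3])

def circumscribed_rectangle(a, b):
    """Smallest axis-aligned rectangle containing both boxes, used as the fused box."""
    x1, y1 = min(a[0], b[0]), min(a[1], b[1])
    x2 = max(a[0] + a[2], b[0] + b[2])
    y2 = max(a[1] + a[3], b[1] + b[3])
    return (x1, y1, x2 - x1, y2 - y1)
```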
Compared with the prior art, the present invention has the following advantages and effects. The image target detection method proposed by the present invention locates the regions of target objects in natural scenes. The method uses multiple output layers of a single neural network to predict the regions of target objects directly, which makes recognition efficient, and only one post-processing algorithm is needed to merge all candidate bounding boxes into the final image target detection results.
Description of the drawings
Fig. 1 is the flow chart of the text target detection method in an image according to the technical solution of the present invention.
Fig. 2 is the network structure of the feature-layer fusion network according to the technical solution of the present invention.
Fig. 3 shows the output layers of the feature-layer fusion network according to the technical solution of the present invention.
Fig. 4 shows the sampling mode of the feature-layer fusion network according to the technical solution of the present invention.
Fig. 5 shows the candidate bounding boxes of text targets output by the feature-layer fusion network according to the technical solution of the present invention.
Fig. 6 is the flow chart of the bounding-box fusion algorithm according to the technical solution of the present invention.
Fig. 7 shows the detection results after processing with the bounding-box fusion algorithm according to the technical solution of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments; the following embodiments are an explanation of the present invention, and the present invention is not limited to the following embodiments.
The present invention detects text targets by combining a feature-layer fusion network with a bounding-box fusion algorithm and is broadly divided into two steps: (1) predict the regions of image targets with the feature-layer fusion network to obtain candidate bounding boxes of the text targets; (2) obtain the final detection results with the bounding-box fusion algorithm. Fig. 1 shows the flow chart of the text target detection of the present invention.
With the development of the Internet and multimedia technology, more and more information carriers exist in the form of images, and image target detection is widely applied in real life. Traditional text detection algorithms need a large number of heuristic rules to screen text regions and their effect is limited. The method of the present patent, based on deep learning, builds an end-to-end feature-layer fusion network that directly predicts the positions and confidences of text targets in an image.
A neural network based on feature-layer fusion is constructed; Fig. 2 shows the network structure of the feature-layer fusion network. As the network deepens, the scale of the feature maps in the feature layers becomes smaller while their representational power becomes stronger. Fusing a high-level feature layer with a low-level feature layer into a new feature layer used as an output layer enhances the representational power of the output layer. As shown in Fig. 3, the feature fusion network has two kinds of connections in its overall structure: bottom-up connections and top-down connections. The bottom-up path is the forward propagation of the network, in which the size of the feature maps shrinks after convolutional and pooling layers, so the whole network has a pyramid-like hierarchy. The top-down connections use deconvolution to fuse high-level features of the network into low-level feature layers and build new output layers. As shown in Fig. 3, the output layers of the feature fusion network are A, B', C' and D', where feature layers A and B are fused into the new feature layer B', feature layers A and C are fused into the new feature layer C', and feature layers A and D are fused into the new feature layer D'; since feature layer A is the top feature layer, it is still used as an output layer of the network.
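As an illustration of this topology, the sketch below builds output layers A, B', C' and D' by upsampling the top feature layer A and fusing it element-wise with the lower layers B, C and D. It is a minimal sketch assuming PyTorch; the channel count, spatial sizes and deconvolution parameters are hypothetical and not taken from the patent:

```python
import torch
import torch.nn as nn

class FeatureFusionHead(nn.Module):
    """Fuse the top feature layer A with lower layers B, C, D into B', C', D'."""
    def __init__(self, channels=256):
        super().__init__()
        # One transposed convolution per lower layer, sized so that the upsampled A
        # matches that layer's spatial resolution (2x, 4x and 8x in this example).
        self.up_b = nn.ConvTranspose2d(channels, channels, kernel_size=2, stride=2)
        self.up_c = nn.ConvTranspose2d(channels, channels, kernel_size=4, stride=4)
        self.up_d = nn.ConvTranspose2d(channels, channels, kernel_size=8, stride=8)

    def forward(self, feat_a, feat_b, feat_c, feat_d):
        b_fused = self.up_b(feat_a) * feat_b   # B' = deconv(A) element-wise product B
        c_fused = self.up_c(feat_a) * feat_c   # C' = deconv(A) element-wise product C
        d_fused = self.up_d(feat_a) * feat_d   # D' = deconv(A) element-wise product D
        return feat_a, b_fused, c_fused, d_fused   # output layers A, B', C', D'

# Example: A is 4x4, B is 8x8, C is 16x16, D is 32x32 (all with 256 channels).
head = FeatureFusionHead()
outputs = head(torch.randn(1, 256, 4, 4), torch.randn(1, 256, 8, 8),
               torch.randn(1, 256, 16, 16), torch.randn(1, 256, 32, 32))
print([o.shape for o in outputs])
```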
The construction steps of the feature-layer fusion network are as follows:
Step (1): Build a forward-propagation convolutional neural network whose base network is VGG-16, in which the last two fully connected layers are replaced with convolutional layers; additional convolutional layers and pooling layers are appended after the base network.
Step (2): On the basis of the forward-propagation network, add a deconvolution layer between the top feature layer and each of the other feature layers, so that the scale of the deconvolved feature map is consistent with the scale of the feature map in the low-level feature layer. The deconvolution operation in the deconvolution layer is similar to bilinear interpolation and can selectively enlarge a feature map, so that the feature map of the top feature layer is scaled up to the same size as the low-level feature map. The size o of the feature map output by a deconvolution layer is computed as:
o = s·(i − 1) + k − 2p
where i is the size of the input feature map of the deconvolution layer, k is the kernel size, s is the stride, and p is the padding. Given the sizes of the input feature map and of the desired output feature map, the high-level feature layer can be brought to the same size as the low-level feature map by setting the corresponding parameters of the deconvolution layer.
Step (3): Fuse the deconvolved feature map with the feature map of the low-level feature layer by element-wise product to obtain a new feature layer. The new feature layer serves as an output layer for outputting the positions and confidences of target objects. The element-wise product of two feature maps is equivalent to the element-wise product of two matrices, i.e. corresponding elements of the two matrices are multiplied.
Step (4): Define a series of fixed-size default boxes on the output layer; the output layer outputs the text confidence and the offset coordinates relative to each default box. Suppose the sizes of the image and of the feature map are (w_im, h_im) and (w_map, h_map) respectively, that position (i, j) in the feature map corresponds to a default box b0 = (x0, y0, w0, h0), and that the output of the output layer is (Δx, Δy, Δw, Δh, c), where (Δx, Δy, Δw, Δh) are the offset coordinates of the predicted text bounding box relative to the default box and c is the text confidence. The predicted text bounding box is b = (x, y, w, h), where:
x = x0 + w0·Δx
y = y0 + h0·Δy
w = w0·exp(Δw)
h = h0·exp(Δh)
Here x and y are the horizontal and vertical coordinates of the upper-left corner of the predicted text box, and w and h are its width and height.
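A minimal sketch of the decoding in step (4), assuming the offset convention reconstructed above; the function and variable names are illustrative:

```python
import math

def decode_box(default_box, offsets):
    """Decode a predicted text box from a default box (x0, y0, w0, h0)
    and predicted offsets (dx, dy, dw, dh); returns (x, y, w, h)."""
    x0, y0, w0, h0 = default_box
    dx, dy, dw, dh = offsets
    x = x0 + w0 * dx
    y = y0 + h0 * dy
    w = w0 * math.exp(dw)
    h = h0 * math.exp(dh)
    return x, y, w, h

# Example: default box with upper-left corner (0.30, 0.40) and size 0.20 x 0.10.
print(decode_box((0.30, 0.40, 0.20, 0.10), (0.1, -0.2, 0.05, 0.0)))
```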
A sampling strategy is set for the feature-layer fusion network to obtain positive and negative samples: default boxes are defined on the feature maps of the output layers, the relationship between the ground-truth boxes of the target objects in the image and the default boxes is established, and positive and negative samples are selected. This specifically includes the following steps:
Step (1): Generate default boxes on the feature map of each output layer in a sliding-window fashion. A feature map of size N × N has N × N feature points, and, according to the aspect ratios of the target objects, each feature point corresponds to default boxes of six different aspect ratios:
ar = {a1, a2, a3, a4, a5, a6}
Step (2): Establish the relationship between the ground-truth boxes of the target objects in the image and the default boxes, and label the default boxes. Default boxes are labelled using the Jaccard overlap as the matching criterion: a higher Jaccard overlap indicates a higher sample similarity, i.e. a better match. Given a default box A and a ground-truth box B, the Jaccard overlap of the default box and the ground-truth box is the ratio of the intersection area of A and B to their union area:
J(A, B) = area(A ∩ B) / area(A ∪ B)
Default boxes whose Jaccard overlap is greater than or equal to 0.5 are treated as matched default boxes, and default boxes whose Jaccard overlap is less than 0.5 are treated as unmatched default boxes; matched default boxes serve as positive samples and unmatched default boxes serve as negative samples.
To detect text targets in an image, the feature fusion network needs positive and negative samples, which requires establishing the relationship between the ground-truth boxes of the image and the default boxes, as shown in Fig. 4. In Fig. 4(a), the ground-truth box of the text target "Marlboro" is the upper solid box, and the ground-truth box of the text "LIGHTS" is the lower solid box. The dashed boxes in Fig. 4(b) and Fig. 4(c) represent the default boxes on the 8 × 8 feature map and on the 4 × 4 feature map respectively. The text "LIGHTS" matches two dashed boxes and the text "Marlboro" matches one dashed box; the matched default boxes are labelled as positive samples, and the unmatched default boxes serve as negative samples.
Step (3): After the samples are labelled, the negative default boxes are sorted by confidence loss, and the default boxes with the highest confidence loss are selected as the negative samples for network training, so that the ratio of positive to negative training samples is kept at 1:3.
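The 1:3 ratio of step (3) can be kept with a hard-negative-mining step such as the sketch below, which works on per-box confidence losses; it is a plain-Python illustration with assumed names, not the patent's implementation:

```python
def hard_negative_mining(conf_losses, is_positive, neg_pos_ratio=3):
    """Keep all positive samples and the negatives with the largest confidence loss.

    conf_losses: per-default-box confidence loss values.
    is_positive: per-default-box flags from the matching step.
    Returns the indices of the default boxes used for training.
    """
    pos_idx = [i for i, p in enumerate(is_positive) if p]
    neg_idx = [i for i, p in enumerate(is_positive) if not p]
    # Sort negatives by confidence loss, descending, and keep at most 3x the positives.
    neg_idx.sort(key=lambda i: conf_losses[i], reverse=True)
    kept_neg = neg_idx[:neg_pos_ratio * max(len(pos_idx), 1)]
    return pos_idx + kept_neg
```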
For the feature-layer fusion network, the objective function of the network is set, which specifically includes the following steps:
(1) Set the target loss function as the weighted sum of the localization loss and the confidence loss:
L(x, c, l, g) = (1/N)·(L_conf(x, c) + α·L_loc(x, l, g))
where x is the matching result matrix, c is the confidence, l is the predicted position, g is the ground-truth position of the target, and N is the number of default boxes matched to ground-truth boxes; the weight coefficient α is set to 1.
(2) Set the localization loss L_loc as the L2 loss between the predicted position and the ground-truth position of the target, and set the confidence loss L_conf as the softmax loss of the two-class (text / non-text) classification.
Since different output layers in the network correspond to feature maps of different scales, different output layers predict targets of different scales: high-level output layers predict large-scale targets and low-level output layers predict small-scale targets. The scale of the target bounding boxes output by the output layers of the feature-layer fusion network is set; the candidate bounding boxes of the feature fusion network are shown in Fig. 5. This specifically includes the following steps:
(1) Select the top feature layer and the feature layers formed by fusing the top feature layer with other feature layers as the output layers of the network.
(2) Different output layers in the network correspond to feature maps of different scales. Suppose the network has m output layers and each output layer corresponds to one feature map; the scale of the default boxes on the k-th feature map is:
s_k = s_min + (s_max − s_min)·(k − 1)/(m − 1), k ∈ [1, m]
and the width and height of each default box with aspect ratio a_r are:
w_k = s_k·√a_r, h_k = s_k/√a_r
where s_min and s_max are the scales of the default boxes on the lowest and on the highest layer respectively. Low-level output layers predict small-scale targets and high-level output layers predict large-scale targets. The default boxes of the output layers therefore have different scales on different feature maps and different aspect ratios within the same feature map, so that the whole network can predict text of different scales and shapes through its multiple output layers.
The feature-layer fusion network directly predicts the bounding boxes of target objects with its multiple output layers, and each bounding box obtains a confidence score. The bounding boxes predicted by the output layers may overlap one another; the bounding-box fusion algorithm selects the bounding boxes with higher confidence within a neighbourhood and merges overlapping candidate bounding boxes to obtain the optimal target positions, which specifically includes the following steps:
(1) Sort the candidate bounding boxes of the text targets from high to low by confidence, and take the first candidate bounding box as the current fusing bounding box;
(2) Treat each of the other candidate bounding boxes as a bounding box to be fused, and compare the confidences of the current fusing bounding box and of the bounding box to be fused. If the confidences of both text boxes are greater than threshold α, compute the area overlap rate of the current fusing bounding box and the bounding box to be fused; otherwise, execute step (3). The area overlap rate is the ratio of the overlapping area of the two bounding boxes to their union area:
IOU(C, G) = area(C ∩ G) / area(C ∪ G)
where area(C) and area(G) are the areas of text boxes C and G respectively.
(3) If the area overlap rate of the two candidate bounding boxes is greater than threshold β, merge the two bounding boxes; the fused bounding box is the circumscribed rectangle of the two bounding boxes and its confidence is the confidence of the fusing bounding box.
(4) If the area overlap rate of the two candidate bounding boxes is less than threshold β, compute the inclusion overlap rate of the two bounding boxes; if the inclusion overlap rate of the two bounding boxes is greater than threshold γ, remove that bounding box, otherwise execute step (5). The inclusion overlap rate is the ratio of the overlapping area of the two bounding boxes to the area of one of the bounding boxes:
I_i(t_i, t_j) = area(t_i ∩ t_j) / area(t_i)
where area(t_i) is the area of rectangle t_i, area(t_j) is the area of rectangle t_j, and I_i(t_i, t_j) is the inclusion overlap rate of rectangle t_i relative to rectangle t_j.
(5) If only the last text box remains, the algorithm terminates and the text boxes whose confidence is higher than threshold δ are selected as the final target detection results; otherwise, the candidate bounding boxes of the image targets are updated and, following the previously established order, the next bounding box that has not been fused is taken as the fusing text box and step (2) is executed.
Two bounding boxes are merged with the above bounding-box fusion algorithm; the flow chart of the algorithm is shown in Fig. 6, where IOU(t_i, t_j) is the IOU overlap rate of bounding boxes t_i and t_j, Fusion(t_i, t_j) is the bounding box obtained after merging t_i and t_j, i.e. the circumscribed rectangle of the two bounding boxes, and I_i(t_i, t_j) and I_j(t_i, t_j) are the inclusion overlap rates of bounding boxes t_i and t_j respectively. The bounding-box fusion algorithm involves three thresholds: the confidence threshold α, the IOU overlap rate threshold β, and the inclusion overlap rate threshold γ. The confidence threshold determines whether two bounding boxes are merged: only when the confidences of both bounding boxes are greater than α are the two bounding boxes fused.
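Putting the steps above together, one reading of the greedy fusion loop is sketched below; the threshold values are placeholders, since the patent does not fix them, and the helper and function names are illustrative:

```python
def _inter(a, b):
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    return iw * ih

def _iou(a, b):
    union = a[2] * a[3] + b[2] * b[3] - _inter(a, b)
    return _inter(a, b) / union if union > 0 else 0.0

def _fuse(a, b):
    # Circumscribed rectangle of two (x, y, w, h) boxes.
    x1, y1 = min(a[0], b[0]), min(a[1], b[1])
    x2 = max(a[0] + a[2], b[0] + b[2])
    y2 = max(a[1] + a[3], b[1] + b[3])
    return (x1, y1, x2 - x1, y2 - y1)

def fuse_boxes(boxes, alpha=0.4, beta=0.5, gamma=0.8, delta=0.6):
    """Greedy bounding-box fusion.

    boxes: list of (x, y, w, h, confidence).
    alpha: confidence threshold, beta: IOU threshold,
    gamma: inclusion-overlap threshold, delta: final confidence threshold.
    """
    boxes = sorted(boxes, key=lambda b: b[4], reverse=True)
    results = []
    while boxes:
        current = boxes.pop(0)            # current fusing box
        remaining = []
        for other in boxes:
            if current[4] > alpha and other[4] > alpha:
                a, b = current[:4], other[:4]
                if _iou(a, b) > beta:
                    current = (*_fuse(a, b), current[4])   # merge into circumscribed rectangle
                    continue
                if _inter(a, b) / (b[2] * b[3]) > gamma:   # 'other' largely contained in 'current'
                    continue                                # drop the contained box
            remaining.append(other)
        results.append(current)
        boxes = remaining
    return [b for b in results if b[4] > delta]
```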
The final text target detection results obtained with the bounding-box fusion algorithm are shown in Fig. 7. The bounding-box fusion algorithm exploits the positional relationship and the confidence of neighbouring candidate bounding boxes, merges the candidate bounding boxes, and obtains the final image target detection results. The content described above in this specification is only an illustration of the present invention. Those skilled in the art can make various modifications or supplements to the described specific embodiments or substitute them in a similar way; as long as such changes do not depart from the content of the description of the invention or go beyond the scope defined by the claims, they fall within the scope of protection of the present invention.
Claims (4)
1. A text target detection method in an image, characterized in that it comprises the following steps:
Step 1: constructing an end-to-end convolutional neural network based on feature-layer fusion for predicting text targets of different scales in an image;
Step 2: from the candidate boxes output by the feature-layer fusion network, obtaining the final text target detection results of the image using a bounding-box fusion algorithm.
2. The text target detection method in an image according to claim 1, characterized in that constructing an end-to-end convolutional neural network based on feature-layer fusion for detecting the positions of the text targets in the image specifically comprises the following steps:
(1) building a forward-propagation convolutional neural network whose base network is VGG-16, in which the last two fully connected layers are replaced with convolutional layers, and appending additional convolutional layers and pooling layers after the base network;
(2) on the basis of the forward-propagation network, adding a deconvolution layer between the top feature layer and each of the other feature layers, so that the scale of the deconvolved feature map is consistent with the scale of the feature map in the low-level feature layer;
(3) fusing the deconvolved feature map with the feature map of the low-level feature layer by element-wise product to obtain a new feature layer, the new feature layer serving as an output layer for outputting the positions and confidences of target objects;
(4) defining a series of fixed-size default boxes on the output layers, the output layers outputting the text confidence and the offset coordinates relative to the default boxes.
3. The text target detection method in an image according to claim 2, characterized in that, for the convolutional neural network based on feature-layer fusion, setting the scale of the target bounding boxes output by the output layers of the feature-layer fusion network specifically comprises:
(1) selecting the top feature layer and the feature layers formed by fusing the top feature layer with other feature layers as the output layers of the network;
(2) setting the size of the default boxes on each output layer, the output layers outputting the offset coordinates of the target bounding boxes relative to the default boxes and their confidences to obtain candidate target bounding boxes, with low-level output layers set to predict small-scale targets and high-level output layers set to predict large-scale text target objects.
4. The text target detection method in an image according to claim 1, characterized in that the candidate bounding boxes output by the feature-layer fusion network are processed with a bounding-box fusion algorithm to obtain the final positions of the text targets, specifically comprising the following steps:
(1) sorting the candidate bounding boxes of the text targets from high to low by confidence, and taking the first candidate bounding box as the current fusing bounding box;
(2) treating each of the other candidate bounding boxes as a bounding box to be fused and comparing the confidences of the current fusing bounding box and of the bounding box to be fused; if the confidences of both text boxes are greater than threshold α, computing the area overlap rate of the current fusing bounding box and the bounding box to be fused; otherwise, executing step (3);
(3) if the area overlap rate of the two candidate bounding boxes is greater than threshold β, merging the two bounding boxes, the fused bounding box being the circumscribed rectangle of the two bounding boxes and its confidence being the confidence of the fusing bounding box;
(4) if the area overlap rate of the two candidate bounding boxes is less than threshold β, computing the inclusion overlap rate of the two bounding boxes; if the inclusion overlap rate of the two bounding boxes is greater than threshold γ, removing that bounding box, otherwise executing step (5);
(5) if only the last text box remains, terminating the algorithm and selecting the text boxes whose confidence is higher than threshold δ as the final target detection results;
otherwise, updating the candidate bounding boxes of the text targets, taking, in the previously established order, the next bounding box that has not been fused as the fusing text box, and executing step (2).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810520329.9A CN108764228A (en) | 2018-05-28 | 2018-05-28 | Text object detection method in an image |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810520329.9A CN108764228A (en) | 2018-05-28 | 2018-05-28 | Text object detection method in an image |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108764228A true CN108764228A (en) | 2018-11-06 |
Family
ID=64005915
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810520329.9A Pending CN108764228A (en) | 2018-05-28 | 2018-05-28 | Text object detection method in an image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108764228A (en) |
-
2018
- 2018-05-28 CN CN201810520329.9A patent/CN108764228A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106570497A (en) * | 2016-10-08 | 2017-04-19 | 中国科学院深圳先进技术研究院 | Text detection method and device for scene image |
CN106650725A (en) * | 2016-11-29 | 2017-05-10 | 华南理工大学 | Full convolutional neural network-based candidate text box generation and text detection method |
CN107688808A (en) * | 2017-08-07 | 2018-02-13 | 电子科技大学 | A kind of quickly natural scene Method for text detection |
CN107563381A (en) * | 2017-09-12 | 2018-01-09 | 国家新闻出版广电总局广播科学研究院 | The object detection method of multiple features fusion based on full convolutional network |
Non-Patent Citations (3)
Title |
---|
CHENG-YANG FU et al.: "DSSD: Deconvolutional Single Shot Detector", arXiv *
MINGHUI LIAO et al.: "TextBoxes: A Fast Text Detector with a Single Deep Neural Network", AAAI *
WEI LIU et al.: "SSD: Single Shot MultiBox Detector", Springer *
Cited By (44)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109299274A (en) * | 2018-11-07 | 2019-02-01 | 南京大学 | A kind of natural scene Method for text detection based on full convolutional neural networks |
CN109458978A (en) * | 2018-11-07 | 2019-03-12 | 五邑大学 | A kind of Downtilt measurement method based on multiple scale detecting algorithm |
CN109299274B (en) * | 2018-11-07 | 2021-12-17 | 南京大学 | Natural scene text detection method based on full convolution neural network |
TWI706336B (en) * | 2018-11-19 | 2020-10-01 | 中華電信股份有限公司 | Image processing device and method for detecting and filtering text object |
CN111222368B (en) * | 2018-11-26 | 2023-09-19 | 北京金山办公软件股份有限公司 | Method and device for identifying document paragraphs and electronic equipment |
CN111222368A (en) * | 2018-11-26 | 2020-06-02 | 北京金山办公软件股份有限公司 | Method and device for identifying document paragraph and electronic equipment |
CN109918951A (en) * | 2019-03-12 | 2019-06-21 | 中国科学院信息工程研究所 | A kind of artificial intelligence process device side channel system of defense based on interlayer fusion |
CN110163081A (en) * | 2019-04-02 | 2019-08-23 | 宜通世纪物联网研究院(广州)有限公司 | SSD-based real-time regional intrusion detection method, system and storage medium |
CN110110722A (en) * | 2019-04-30 | 2019-08-09 | 广州华工邦元信息技术有限公司 | A kind of region detection modification method based on deep learning model recognition result |
WO2020221298A1 (en) * | 2019-04-30 | 2020-11-05 | 北京金山云网络技术有限公司 | Text detection model training method and apparatus, text region determination method and apparatus, and text content determination method and apparatus |
CN110135423A (en) * | 2019-05-23 | 2019-08-16 | 北京阿丘机器人科技有限公司 | The training method and optical character recognition method of text identification network |
CN113850264A (en) * | 2019-06-10 | 2021-12-28 | 创新先进技术有限公司 | Method and system for evaluating target detection model |
CN110263877A (en) * | 2019-06-27 | 2019-09-20 | 中国科学技术大学 | Scene character detecting method |
CN110263877B (en) * | 2019-06-27 | 2022-07-08 | 中国科学技术大学 | Scene character detection method |
CN110414417A (en) * | 2019-07-25 | 2019-11-05 | 电子科技大学 | A kind of traffic mark board recognition methods based on multi-level Fusion multi-scale prediction |
CN110458170A (en) * | 2019-08-06 | 2019-11-15 | 汕头大学 | Chinese character positioning and recognition methods in a kind of very noisy complex background image |
CN112487848A (en) * | 2019-09-12 | 2021-03-12 | 京东方科技集团股份有限公司 | Character recognition method and terminal equipment |
CN112487848B (en) * | 2019-09-12 | 2024-04-26 | 京东方科技集团股份有限公司 | Character recognition method and terminal equipment |
CN110674804A (en) * | 2019-09-24 | 2020-01-10 | 上海眼控科技股份有限公司 | Text image detection method and device, computer equipment and storage medium |
CN110796640A (en) * | 2019-09-29 | 2020-02-14 | 郑州金惠计算机系统工程有限公司 | Small target defect detection method and device, electronic equipment and storage medium |
US11710077B2 (en) * | 2019-11-15 | 2023-07-25 | Salesforce, Inc. | Image augmentation and object detection |
US20220083819A1 (en) * | 2019-11-15 | 2022-03-17 | Salesforce.Com, Inc. | Image augmentation and object detection |
CN111046923A (en) * | 2019-11-26 | 2020-04-21 | 佛山科学技术学院 | Image target detection method and device based on bounding box and storage medium |
CN111046923B (en) * | 2019-11-26 | 2023-02-28 | 佛山科学技术学院 | Image target detection method and device based on bounding box and storage medium |
CN111598082A (en) * | 2020-04-24 | 2020-08-28 | 云南电网有限责任公司电力科学研究院 | Electric power nameplate text detection method based on full convolution network and instance segmentation network |
CN111598082B (en) * | 2020-04-24 | 2023-10-17 | 云南电网有限责任公司电力科学研究院 | Electric power nameplate text detection method based on full convolution network and instance segmentation network |
CN111783685A (en) * | 2020-05-08 | 2020-10-16 | 西安建筑科技大学 | Target detection improved algorithm based on single-stage network model |
CN111680628A (en) * | 2020-06-09 | 2020-09-18 | 北京百度网讯科技有限公司 | Text box fusion method, device, equipment and storage medium |
CN111680628B (en) * | 2020-06-09 | 2023-04-28 | 北京百度网讯科技有限公司 | Text frame fusion method, device, equipment and storage medium |
CN111986252A (en) * | 2020-07-16 | 2020-11-24 | 浙江工业大学 | Method for accurately positioning candidate bounding box in target segmentation network |
CN111986252B (en) * | 2020-07-16 | 2024-03-29 | 浙江工业大学 | Method for accurately positioning candidate bounding boxes in target segmentation network |
CN111844101A (en) * | 2020-07-31 | 2020-10-30 | 中国科学技术大学 | Multi-finger dexterous hand sorting planning method |
CN111985465A (en) * | 2020-08-17 | 2020-11-24 | 中移(杭州)信息技术有限公司 | Text recognition method, device, equipment and storage medium |
CN112419310B (en) * | 2020-12-08 | 2023-07-07 | 中国电子科技集团公司第二十研究所 | Target detection method based on cross fusion frame optimization |
CN112419310A (en) * | 2020-12-08 | 2021-02-26 | 中国电子科技集团公司第二十研究所 | Target detection method based on intersection and fusion frame optimization |
CN112906699A (en) * | 2020-12-23 | 2021-06-04 | 深圳市信义科技有限公司 | Method for detecting and identifying enlarged number of license plate |
WO2022150978A1 (en) * | 2021-01-12 | 2022-07-21 | Nvidia Corporation | Neighboring bounding box aggregation for neural networks |
CN113269049A (en) * | 2021-04-30 | 2021-08-17 | 天津科技大学 | Method for detecting handwritten Chinese character area |
CN114359889A (en) * | 2022-03-14 | 2022-04-15 | 北京智源人工智能研究院 | Text recognition method for long text data |
CN114898171B (en) * | 2022-04-07 | 2023-09-22 | 中国科学院光电技术研究所 | Real-time target detection method suitable for embedded platform |
CN114898171A (en) * | 2022-04-07 | 2022-08-12 | 中国科学院光电技术研究所 | Real-time target detection method suitable for embedded platform |
CN115080051A (en) * | 2022-05-31 | 2022-09-20 | 武汉大学 | GUI code automatic generation method based on computer vision |
CN117048773A (en) * | 2023-08-01 | 2023-11-14 | 黄岛检验认证有限公司 | Automatic tracking water gauge light supplementing double-shaft camera and water gauge observation method |
CN117048773B (en) * | 2023-08-01 | 2024-09-10 | 黄岛检验认证有限公司 | Automatic tracking water gauge light supplementing double-shaft camera and water gauge observation method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108764228A (en) | Text object detection method in an image | |
CN108876780B (en) | Bridge crack image crack detection method under complex background | |
Xu et al. | Scale-aware feature pyramid architecture for marine object detection | |
CN109784203B (en) | Method for inspecting contraband in weak supervision X-ray image based on layered propagation and activation | |
CN113158862B (en) | Multitasking-based lightweight real-time face detection method | |
CN110046572A (en) | A kind of identification of landmark object and detection method based on deep learning | |
CN109583425A (en) | A kind of integrated recognition methods of the remote sensing images ship based on deep learning | |
CN108830188A (en) | Vehicle checking method based on deep learning | |
CN111275688A (en) | Small target detection method based on context feature fusion screening of attention mechanism | |
CN109977918A (en) | A kind of target detection and localization optimization method adapted to based on unsupervised domain | |
CN110097568A (en) | A kind of the video object detection and dividing method based on the double branching networks of space-time | |
CN111079602A (en) | Vehicle fine granularity identification method and device based on multi-scale regional feature constraint | |
CN110097044A (en) | Stage car plate detection recognition methods based on deep learning | |
CN108182454A (en) | Safety check identifying system and its control method | |
CN107134144A (en) | A kind of vehicle checking method for traffic monitoring | |
CN107871119A (en) | A kind of object detection method learnt based on object space knowledge and two-stage forecasting | |
CN107862261A (en) | Image people counting method based on multiple dimensioned convolutional neural networks | |
CN107729801A (en) | A kind of vehicle color identifying system based on multitask depth convolutional neural networks | |
CN109753949B (en) | Multi-window traffic sign detection method based on deep learning | |
CN107945153A (en) | A kind of road surface crack detection method based on deep learning | |
CN112560675B (en) | Bird visual target detection method combining YOLO and rotation-fusion strategy | |
CN105005794A (en) | Image pixel semantic annotation method with combination of multi-granularity context information | |
CN105528575A (en) | Sky detection algorithm based on context inference | |
Yin et al. | G2Grad-CAMRL: an object detection and interpretation model based on gradient-weighted class activation mapping and reinforcement learning in remote sensing images | |
CN113971764B (en) | Remote sensing image small target detection method based on improvement YOLOv3 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20181106 |