
CN113139539A - Method and device for detecting characters of arbitrary-shaped scene with asymptotic regression boundary - Google Patents


Info

Publication number
CN113139539A
Authority
CN
China
Prior art keywords
boundary
character
expression
feature
sampling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110280975.4A
Other languages
Chinese (zh)
Other versions
CN113139539B (en)
Inventor
操晓春
代朋纹
张三义
张华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS filed Critical Institute of Information Engineering of CAS
Priority to CN202110280975.4A priority Critical patent/CN113139539B/en
Publication of CN113139539A publication Critical patent/CN113139539A/en
Application granted granted Critical
Publication of CN113139539B publication Critical patent/CN113139539B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/60 - Type of objects
    • G06V20/62 - Text, e.g. of license plates, overlay texts or captions on TV images
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 - Validation; Performance evaluation; Active pattern learning techniques
    • G06F18/2193 - Validation; Performance evaluation; Active pattern learning techniques based on specific statistical tests
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a device for detecting arbitrary-shaped scene characters with an asymptotic regression boundary. The method comprises the following steps: extracting visual features of an image to be detected, and performing feature fusion on the visual features to obtain a feature expression; inputting the feature expression into a horizontal suggestion box generation network to generate horizontal character candidate boxes; inputting the feature expression and the horizontal character candidate boxes into a directional suggestion box generation network to generate directional character suggestion boxes; and inputting the feature expression and the directional character suggestion boxes into an arbitrary-shaped character boundary generation network to obtain the scene character detection result. By asymptotic regression the method generates more accurate and smoother character boundaries, and by exploiting the geometric topological and semantic relations among boundary sampling points it obtains more accurate point positions, so that the model has better generalization, faster execution and stronger detection capability.

Description

Method and device for detecting characters of arbitrary-shaped scene with asymptotic regression boundary
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a method and a device for detecting characters of an arbitrary-shape scene with an asymptotic regression boundary.
Background
In modern social life, images are a popular information carrier and are widely present in cyberspace to deliver rich information. Characters have served as a direct information carrier since ancient times and contain rich, accurate high-level semantic information. When characters take images as their carrier, they not only convey textual information directly but also help in understanding the deeper meaning of the image. Therefore, detecting and recognizing the characters in an image has very important application value in real life, mainly embodied in four aspects. (1) Intelligent visual question answering or description systems. For a given image, a machine can respond intelligently or describe deeper meanings by combining the textual information in the image. For example, if a bus image is captured in a natural scene, an intelligent system can understand the deeper semantics of the image from the visual elements containing characters, such as the license plate, the bus's starting station and destination, and the advertising poster on the side of the bus. (2) Human-computer interaction systems. When people go shopping or visit shopping malls, they often encounter billboards, posters, store signs, menus, product information and so on, and this information is often presented in different languages. Collecting images with a mobile device and recognizing the character elements in them can therefore bring convenience to people's lives. (3) Image retrieval based on text content. Character information in an image can effectively resolve the ambiguity of image content, and retrieving images by the characters they contain can complement image retrieval based on visual cues. In addition, many lawbreakers use images as carriers, embedding vulgar text in them to spread it across cyberspace. Recognizing harmful character information in images helps prevent their dissemination and protects the physical and mental health of minors. (4) Intelligent transportation systems. In outdoor environments, accurately recognizing license plates and traffic signs has a positive effect on the intelligent management of traffic.
To effectively recognize the characters in an image, accurately locating the position of the characters is the most important prerequisite step. In addition, detecting characters in natural scenes plays an important role in the field of image editing: accurately locating the characters helps to better remove or replace the character content in an image, thereby achieving privacy protection. However, detecting text in natural scene images is extremely challenging. First, in uncontrolled natural scenes, factors such as uneven illumination, weather changes, shooting angle or camera shake give scene characters low resolution, heavy noise, blur, shadows or occlusion, which increases the difficulty of character detection. In addition, characteristics of the scene characters themselves, such as arbitrarily shaped layouts, the diversity of font color/type/size, and the similarity between character textures and background elements (bricks, fences, etc.), cause characters in images to be missed, falsely detected, or localized with incomplete boundaries. In summary, text detection in natural scenes is a very challenging task in the field of computer vision.
In recent years, deep-learning-based natural scene character detection methods have mainly fallen into three categories: methods based on boundary point regression, methods based on pixel segmentation, and methods based on a mixture of regression and segmentation. Methods based on boundary point regression regress key points or a number of sampling points on the boundary of arbitrarily shaped characters. They mainly either regress an accurate character boundary for arbitrarily shaped characters within a candidate region, or directly regress the points on the boundary with a one-stage model. Methods based on pixel segmentation treat arbitrarily shaped character regions in an image as a semantic segmentation problem, estimate the geometric attributes or connection relations of each pixel in the character region, and finally aggregate the pixels into different character instances according to the auxiliary information predicted for each pixel. In addition, some researchers aggregate pixels into locally connected regions according to the predicted per-pixel attribute information, and then predict or infer the connection relations between connected regions so as to aggregate them into different text instances. Methods based on a mixture of regression and segmentation obtain horizontal candidate boxes through regression and then perform pixel-level semantic segmentation inside the candidate boxes. However, the current mainstream methods each have their own disadvantages. Regression-based methods usually obtain candidate boxes through a region proposal network, which requires manually designed prior boxes and relies on careful positive and negative sample selection, limiting the generalization of the model. In addition, such methods regress the points on the text boundary independently, ignoring the geometric topological or semantic relations between the boundary points. Pixel-segmentation-based methods are typically extremely sensitive to noise: because of background interference they easily produce misjudgments, such as classifying character pixels as background or background as characters. Moreover, such methods often have difficulty generating smooth boundaries, which can negatively impact some practical applications, and they typically involve complex post-aggregation processing over a large number of pixels, slowing down the overall algorithm. Methods mixing regression and segmentation also generate text candidate boxes with a region proposal network, and since the segmentation is limited to the candidate boxes, it is affected when the candidate boxes are not precisely located. In addition, such methods involve segmentation at multiple scales and over a large number of candidate boxes, which also seriously affects their execution speed.
Therefore, the invention generates a small number of candidate frames through a network without prior frame design, then samples a plurality of dense points on the boundary of the candidate frames, considers the geometric topology and semantic relationship among the sampling points, and gradually iterates regression to obtain the accurate boundary of characters with any shape.
Disclosure of Invention
The invention provides a method and a device for detecting characters in any shape with asymptotic regression character boundaries aiming at natural scene images. The method gradually evolves sampling points on the boundary of the bounding box on the basis of the candidate box so as to accurately position the position of characters with any shape in the scene picture. In the process of generating the candidate frame, the invention avoids regression on the basis of a prior frame designed manually by regressing the center point of the character and the width and height of the horizontal external bounding frame. In the evolution process, the invention captures the topological relation and the semantic relation among the sampling points on the boundary, thereby enhancing the characteristic expression of the sampling points on the boundary and obtaining more accurate position of the boundary sampling point by regression.
In order to achieve the purpose, the technical scheme of the invention comprises the following steps:
a method for detecting characters of an arbitrary-shaped scene with an asymptotic regression boundary comprises the following steps:
1) extracting visual features of an image to be detected, performing multi-scale feature fusion on the visual features, and acquiring a feature expression F_e of the image to be detected;
2) generating a horizontal character candidate box B_h according to the feature expression F_e;
3) generating an offset prediction value according to the feature expression F_e and a first boundary sampling point set obtained by sampling on the boundary of the horizontal character candidate box B_h, and generating the corner points of a directional character suggestion box B_o by combining the sampling points in the first boundary sampling point set, thereby obtaining the directional character suggestion box B_o;
4) using a second boundary sampling point set obtained by sampling on the boundary of the directional character suggestion box B_o, evolving the coordinate positions of the sampling points according to the feature expression F_e to generate new sampling point coordinates, obtaining the precise boundary position coordinates from the new coordinate positions of the sampling points, and estimating the score s that the region enclosed by the boundary belongs to characters, thereby obtaining the scene character detection result.
Further, the method for extracting the visual features comprises: utilizing a backbone network pre-trained on ImageNet.
Further, the backbone network includes: DLA34 network or ResNet50 network.
Further, the method for performing multi-scale feature fusion on the visual features comprises the following steps: and fusing the multi-scale features from shallow to deep.
Further, a horizontal character suggestion box generation network is used to obtain the horizontal character candidate box B_h through the following steps:
1) performing convolution and linear rectification on the feature expression F_e, and inputting the linear rectification result into a first convolution layer to generate a character center response map;
2) performing convolution and linear rectification on the feature expression F_e, and inputting the linear rectification result into a second convolution layer to generate a character circumscribed rectangle scale estimation map, wherein the number of convolution kernels of the first convolution layer is different from that of the second convolution layer;
3) performing a maximum pooling operation on the character center response map and filtering out low-score center points with a set threshold τ_c to obtain the filtered center points;
4) generating the horizontal character candidate box B_h according to the filtered center points and the character circumscribed rectangle scale estimation map.
Further, the horizontal character suggestion box generation network is trained with a loss function composed of a character center loss and a scale loss, wherein N_t represents the number of character instances in the sample image, i represents the position index on the character center response map, P and Q respectively represent the ground truth of the character center response map and of the character circumscribed rectangle scale estimation map, the scale loss uses the Smooth-L1 loss function, and α and β represent a first penalty factor and a second penalty factor, respectively.
Further, the offset prediction value is generated through the following steps:
1) uniformly sampling N_o points on the boundary of the horizontal character candidate box B_h to obtain the coordinates x of each sampling point in the first boundary sampling point set;
2) extracting the boundary original feature expression F_c of each sampling point in the first boundary sampling point set according to the feature expression F_e;
3) performing boundary information aggregation on each boundary original feature F_c to obtain the enhanced feature expression F_cia;
4) performing offset prediction according to the enhanced feature expression F_cia to obtain the offset prediction value o of each sampling point in the first boundary sampling point set.
Further, the boundary original feature expression F_c is obtained through the following steps:
1) respectively extracting the semantic feature F_sem and the position feature F_loc of each sampling point in the first boundary sampling point set according to the feature expression F_e;
2) splicing the semantic feature F_sem and the position feature F_loc together to obtain the boundary original feature expression F_c.
Further, the enhanced feature expression F_cia is obtained through the following steps:
1) performing a one-dimensional cyclic convolution operation and linear rectification on the boundary original feature expression F_c, and inputting the linear rectification result into a BN layer to capture the geometric topological structure of the closed boundary loop;
2) acquiring t features by boundary information aggregation units using the geometric topological structure and t expansion rates r_t, and splicing the geometric topological structure with each of the t features to generate a multi-scale fused boundary feature;
3) reducing the dimension of the multi-scale fused boundary feature with a one-dimensional convolution operation to obtain a reduced feature;
4) performing a maximum pooling operation on the reduced feature to generate a boundary global feature;
5) distributing the boundary global feature to each sampling point in the first boundary sampling point set and splicing to obtain the enhanced feature expression F_cia.
Further, the features of each boundary information aggregation unit are obtained through the following steps:
1) passing the input feature through 3 dilated one-dimensional cyclic convolutions with expansion rate r_t to produce three feature expressions;
2) based on two of these feature expressions respectively, collecting information along the character boundary with N_g sink nodes to capture the semantic relations between sampling points, and combining the boundary global relation captured by an added virtual sink node to obtain the sink node features;
3) calculating the relation between the boundary sampling points and the sink nodes, wherein i is the index of a boundary sampling point, j is the index of a sink node, and D_u is the dimension of the feature expressions;
4) assigning the features on the sink nodes to each boundary sampling point, with element-by-element addition, to produce the output features of the unit.
Further, the corner points of the directional character suggestion box B_o are generated through a corner point generation network:
1) obtaining updated sampling point coordinates x' = x + o from the coordinates x of each sampling point and the offset prediction value o;
2) selecting 4 points at equal intervals from the coordinates x' of the sampling points as the corner points of the directional character suggestion box, thereby generating the corner points of the directional character suggestion box B_o.
Further, the corner point generation network is trained with a loss function in which the predicted corner coordinates and the ground-truth corner coordinates of the jth sampling point of the ith character instance in the sample image are compared with the Smooth-L1 loss function, N_t representing the number of character instances in the sample image.
Further, the new coordinate positions of the sampling points are generated through a boundary positioning network, wherein the objective function used to train the boundary positioning network is a Smooth-L1 loss between the predicted value and the true value of the jth sampling point of the ith character instance in the sample image, N_t representing the number of character instances in the sample image and N_a the number of sampling points of the ith character instance.
Further, the precise boundary position coordinates and the score s that the region enclosed by the predicted boundary belongs to characters are obtained through a reliable boundary positioning network, wherein the objective function used to train the reliable boundary positioning network is defined over the confidence that the region enclosed by the ith predicted boundary in the sample image belongs to characters or to the background, N_t representing the number of character instances in the sample image.
A storage medium having a computer program stored therein, wherein the computer program is arranged to perform the above-mentioned method when executed.
An electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the method described above.
The invention has the beneficial effects that:
1. the invention gradually regresses the sampling points on the character boundary by adopting an asymptotic regression mode, the regression form is consistent with the human visual system, and the asymptotic regression can generate more accurate and smooth character boundary for the character layout with a complex form.
2. The invention models the geometric topological relation and semantic relation between the boundary sampling points to interact the information on the boundary, thereby enhancing the boundary characteristic expression and obtaining the position of a more accurate point by regression.
3. The invention does not need to design a prior frame, thereby leading the model to have better generalization.
4. The number of the candidate frames generated by the method is obviously less than that of the candidate frames generated by the prior frame regression, so that the execution speed of the model is effectively improved.
5. The invention has strong detection capability and excellent performance for characters in any shapes, such as horizontal characters, multidirectional characters, curved characters and the like.
Drawings
FIG. 1 is a flow chart of detecting characters in an arbitrary shape scene.
Fig. 2 is a schematic diagram of a boundary offset prediction network.
Fig. 3 is a schematic diagram of a boundary information aggregation unit.
Fig. 4 is a schematic diagram of a reliable boundary positioning mechanism.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The method for detecting characters in scenes of any shape, as shown in Fig. 1, includes:
1) extracting the feature expression of an input image;
2) using the horizontal character suggestion box generation network to predict character center points and the scale of the circumscribed rectangle, generating horizontal character candidate boxes;
3) sampling the boundary of each horizontal candidate box, extracting the features of the boundary sampling points, enhancing the feature expression of the boundary sampling points with a boundary information aggregation (CIA) network, and evolving the positions of the boundary sampling points to generate directional character candidate boxes;
4) sampling the boundary of each directional candidate box and gradually evolving the positions of the boundary sampling points through several boundary positioning mechanism (CLM) modules to approach the boundary of characters of any shape; finally, a reliable boundary positioning mechanism (RCLM) determines the confidence of whether the region enclosed by the located boundary points belongs to characters (a skeleton of this pipeline is sketched below).
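For illustration only, the following is a minimal sketch of how these four stages might be composed in PyTorch. All class names and the helper sample_boundary_points are hypothetical placeholders for the networks described in the embodiments below, not the actual implementation of the invention.

```python
import torch.nn as nn

class ArbitraryShapeTextDetector(nn.Module):
    """Hypothetical end-to-end composition of the four stages described above."""
    def __init__(self, backbone, fusion, horizontal_head, offset_net, clm_modules, rclm):
        super().__init__()
        self.backbone = backbone                 # e.g. DLA34 / ResNet50, pre-trained on ImageNet
        self.fusion = fusion                     # multi-scale feature fusion -> F_e
        self.horizontal_head = horizontal_head   # center / scale prediction -> B_h
        self.offset_net = offset_net             # CFE + CIA + OPH -> directional box B_o
        self.clms = nn.ModuleList(clm_modules)   # K boundary positioning mechanism modules
        self.rclm = rclm                         # reliable boundary positioning mechanism

    def forward(self, image):
        feats = self.backbone(image)
        f_e = self.fusion(feats)                 # step 1: feature expression F_e
        boxes_h = self.horizontal_head(f_e)      # step 2: horizontal candidate boxes
        boxes_o = self.offset_net(f_e, boxes_h)  # step 3: directional suggestion boxes
        points = sample_boundary_points(boxes_o) # N_a points on each B_o (assumed helper)
        for clm in self.clms:                    # step 4: progressive boundary evolution
            points = clm(f_e, points)
        boundary, score = self.rclm(f_e, points) # precise boundary + text/background score
        return boundary, score
```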
In one embodiment of the present invention, the input is an RGB image of size H × W. Images are randomly cropped to 640 × 640 during training. During testing, the shortest side of the input image is set to different values according to the dataset, the aspect ratio is kept fixed, and the longest side changes correspondingly. The image input to the network then undergoes feature extraction, with the following specific steps:
i) the visual features of the input image are extracted with a backbone network (i.e., a network pre-trained on ImageNet, such as DLA34 or ResNet50), whose outputs are denoted C_2, C_3, C_4 and C_5 with channel dimensions D_2, D_3, D_4 and D_5. For DLA34, D_2, D_3, D_4 and D_5 are 64, 128, 256 and 512, respectively; for ResNet50, they are 256, 512, 1024 and 2048, respectively.
ii) C_2, C_3, C_4 and C_5 are input into the feature enhancement module. C_3 first has its feature dimension reduced by a convolution operation; the feature map is then upsampled by a factor of 2 and spliced with C_2; the spliced feature passes through a deformable convolution to produce a new feature C'_3. Then C_4 repeats the above process with C'_3: after splicing, a deformable convolution generates C'_4. Finally, C_5 repeats the above process with C'_4: after splicing, a deformable convolution generates the scale-robust feature expression F_e of size (H/σ) × (W/σ) with D_e channels, where σ = 4 and D_e is 64 or 256 for DLA34 or ResNet50, respectively.
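As a rough illustration of the shallow-to-deep fusion just described, the sketch below reduces, upsamples and splices C_2 through C_5. The deformable convolutions of the patent are replaced here by ordinary 3 × 3 convolutions purely for simplicity, and the channel counts are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShallowToDeepFusion(nn.Module):
    """Illustrative fusion of backbone features C2..C5 into one feature map F_e.
    Ordinary convolutions stand in for the deformable convolutions of the patent."""
    def __init__(self, dims=(64, 128, 256, 512), out_dim=64):
        super().__init__()
        self.reduce = nn.ModuleList(nn.Conv2d(d, out_dim, 1) for d in dims[1:])
        self.fuse = nn.ModuleList(
            nn.Conv2d(out_dim + c, out_dim, 3, padding=1)
            for c in (dims[0], out_dim, out_dim))

    def forward(self, c2, c3, c4, c5):
        prev, feats = c2, (c3, c4, c5)
        for reduce, fuse, c in zip(self.reduce, self.fuse, feats):
            x = reduce(c)                                     # reduce channel dimension
            x = F.interpolate(x, size=prev.shape[-2:], mode='bilinear', align_corners=False)
            prev = F.relu(fuse(torch.cat([x, prev], dim=1)))  # splice and fuse
        return prev                                           # F_e at 1/4 resolution (sigma = 4)
```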
In one embodiment of the invention, the horizontal text candidate box generation network consists of two branches. The first consists of one 3 × 3 convolution layer (256 convolution kernels), one linear rectification unit (ReLU) and one 1 × 1 convolution layer (1 convolution kernel), and generates the character center response map. The second consists of one 3 × 3 convolution layer (256 convolution kernels), one ReLU and one 1 × 1 convolution layer (2 convolution kernels), and generates the character circumscribed rectangle scale estimation map. During training, the loss function of this network contains two parts, a character center loss and a scale loss, where N_t denotes the number of character instances in the image, i denotes the position index on the response map, P and Q denote the ground truth of the character center response map and of the circumscribed rectangle scale respectively, the scale loss uses the Smooth-L1 loss function, and the penalty factors α and β are set to 2 and 4, respectively, during training.
In the test process, the invention first highlights the center points of the obtained character center response map with a 3 × 3 maximum pooling operation, and then filters out center points whose score is below the threshold τ_c. The ith horizontal text suggestion box is then constructed from the abscissa and ordinate of the point with the ith maximal response together with the corresponding scale estimate.
In one embodiment of the present invention, to generate directional text suggestion boxes, the invention first uniformly samples N_o points on the boundary of each horizontal text suggestion box and then extracts the boundary original feature expression F_c with the boundary feature extractor (CFE) of the boundary offset prediction network. The specific process is as follows:
i) extracting the semantic features F_sem of the sampling points from F_e;
ii) computing the position features of the boundary sampling points as F_loc = x - x_min;
iii) splicing the semantic features F_sem and position features F_loc of each sampling point together to form the original feature expression F_c of each sampling point.
After passing through a boundary information aggregation (CIA) module, the enhanced feature expression F_cia is generated. F_cia is then input to an offset prediction head (OPH) to generate an offset o for each sampling point. The boundary feature extractor CFE, the boundary information aggregation module CIA and the offset prediction head form the offset prediction network (as shown in Fig. 2). The updated coordinates of the sampling points are thus expressed as x' = x + o. From x', 4 points are selected at equal intervals as the corner points of the directional character suggestion box. All predicted corner coordinates in the image are stacked together to form a coordinate matrix. During training, the loss function for corner learning is a Smooth-L1 loss between the predicted corner coordinates and the true corner coordinates.
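A sketch of how boundary-point features could be extracted and how the offset prediction head could be built: semantic features are bilinearly sampled from F_e at the boundary points (here via grid_sample), position features are computed as x - x_min, and the two are concatenated. The coordinate normalization for grid_sample and the exact feature dimensions are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def boundary_point_features(f_e, points):
    """points: (B, N_o, 2) absolute (x, y) coordinates on the feature map f_e (B, C, H, W).
    Returns concatenated semantic + position features of shape (B, C + 2, N_o)."""
    B, C, H, W = f_e.shape
    # normalise to [-1, 1] for grid_sample (assumption about the coordinate convention)
    grid = torch.stack([points[..., 0] / (W - 1) * 2 - 1,
                        points[..., 1] / (H - 1) * 2 - 1], dim=-1).unsqueeze(2)  # (B, N_o, 1, 2)
    f_sem = F.grid_sample(f_e, grid, align_corners=True).squeeze(-1)             # (B, C, N_o)
    f_loc = (points - points.min(dim=1, keepdim=True).values).transpose(1, 2)    # x - x_min
    return torch.cat([f_sem, f_loc], dim=1)

class OffsetPredictionHead(nn.Module):
    """Three 1x1 one-dimensional convolutions (256, 64, 2 kernels) with two ReLUs."""
    def __init__(self, in_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Conv1d(in_dim, 256, 1), nn.ReLU(),
                                 nn.Conv1d(256, 64, 1), nn.ReLU(),
                                 nn.Conv1d(64, 2, 1))

    def forward(self, f_cia):           # f_cia: (B, D_cia, N_o)
        return self.net(f_cia)          # per-point (dx, dy) offsets
```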
In an embodiment of the present invention, the boundary information aggregation (CIA) module executes the following steps:
i) given the input boundary original features F_c, capture the geometric topology of the closed boundary loop with one 9 × 9 one-dimensional cyclic convolution operation (D convolution kernels), one ReLU and one batch normalization (BN) layer;
ii) input the resulting feature into 6 CIA units with expansion rates r of 1, 1, 2, 2, 4 and 4 respectively, to model the semantic relations between boundary sampling points and generate six features;
iii) splice these features to generate a multi-scale fused boundary feature;
iv) reduce the dimension of the fused feature with a standard 3 × 3 one-dimensional convolution operation (D convolution kernels) to obtain a reduced feature;
v) generate the boundary global feature from the reduced feature with a max pooling operation;
vi) distribute the global feature to each sampling point and splice, obtaining the output feature F_cia of the CIA module, whose dimension is D_cia = 8*D.
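A sketch of this pipeline follows, with the CIAUnit class assumed to be defined as in the next sketch. Whether the six units run in parallel on the topology feature or are chained is not fully specified in the translated text; here they all read the topology feature as a simplification, and the channel bookkeeping that yields 8*D channels is likewise an assumption.

```python
import torch
import torch.nn as nn

class CIAModule(nn.Module):
    """Boundary information aggregation over N points sampled on a closed contour.
    Input/output tensors have shape (B, D, N); channel bookkeeping is illustrative."""
    def __init__(self, d=128):
        super().__init__()
        self.topology = nn.Sequential(
            nn.Conv1d(d, d, 9, padding=4, padding_mode='circular'),  # 9x9 cyclic 1-D conv
            nn.ReLU(), nn.BatchNorm1d(d))
        self.units = nn.ModuleList(CIAUnit(d, rate) for rate in (1, 1, 2, 2, 4, 4))
        self.reduce = nn.Conv1d(7 * d, 4 * d, 3, padding=1)  # fuse the 7 concatenated pieces

    def forward(self, f_c):
        g = self.topology(f_c)                                   # geometric topology feature
        pieces = [g] + [unit(g) for unit in self.units]          # multi-scale boundary features
        fused = self.reduce(torch.cat(pieces, dim=1))
        global_feat = fused.max(dim=2, keepdim=True).values      # boundary global feature
        global_feat = global_feat.expand(-1, -1, fused.size(2))  # distribute to every point
        return torch.cat([fused, global_feat], dim=1)            # F_cia with 8*d channels
```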
For each CIA unit, as shown in Fig. 3, the specific steps are as follows:
i) the input feature F_u is first passed through 3 dilated one-dimensional cyclic convolutions with dilation rate r, which encode the periodicity of the sampling points on the closed boundary and produce three feature representations of dimension D_u;
ii) based on the first of these representations, information is collected along the character boundary by N_g sink nodes, capturing the semantic relations between sampling points and reducing the interference of redundant and noisy sampling points; in addition, a virtual sink node is added to capture the global relationship of the boundary; the sink node features are obtained with a maximum pooling operation and a feature aggregation operation φ, where N_g is a hyper-parameter set to 64 in the directional suggestion box generation module and to 128 in the arbitrary-shaped character boundary generation module;
iii) in the same way, aggregated features are obtained from the second representation;
iv) the relation between the ith boundary sampling point and the jth sink node is then computed from these features;
v) the features on the sink nodes are assigned back to each boundary sampling point through element-by-element addition, producing the aggregated output feature of the unit.
In one embodiment of the invention, the input F_u is different for each of the 6 CIA units.
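The relation between boundary points and sink nodes appears only as an image in the filing; the sketch below therefore assumes a scaled dot-product form softmax(q·kᵀ/√D_u), with adaptive max pooling used to build the N_g sink nodes plus one virtual global node, and element-wise addition to return the aggregated feature to each sampling point. It should be read as one plausible reading of the described mechanism, not the published formula.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CIAUnit(nn.Module):
    """One boundary information aggregation unit (illustrative reconstruction)."""
    def __init__(self, d, rate, n_sink=64, d_u=128):
        super().__init__()
        conv = lambda: nn.Conv1d(d, d_u, 3, padding=rate, dilation=rate, padding_mode='circular')
        self.q, self.k, self.v = conv(), conv(), conv()   # 3 dilated cyclic 1-D convolutions
        self.out = nn.Conv1d(d_u, d, 1)
        self.n_sink = n_sink

    def forward(self, f_u):                               # f_u: (B, D, N)
        q, k, v = self.q(f_u), self.k(f_u), self.v(f_u)   # (B, D_u, N)
        # N_g sink nodes: max-pool segments of the boundary, plus one virtual global node
        sink_k = torch.cat([F.adaptive_max_pool1d(k, self.n_sink), k.max(2, keepdim=True).values], 2)
        sink_v = torch.cat([F.adaptive_max_pool1d(v, self.n_sink), v.max(2, keepdim=True).values], 2)
        # relation between the i-th boundary point and the j-th sink node (assumed form)
        rel = torch.softmax(q.transpose(1, 2) @ sink_k / sink_k.size(1) ** 0.5, dim=-1)  # (B, N, N_g+1)
        agg = (rel @ sink_v.transpose(1, 2)).transpose(1, 2)                             # (B, D_u, N)
        return self.out(agg) + f_u                        # element-wise addition back to each point
```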
In an embodiment of the present invention, the Offset Prediction Head (OPH) is composed of 3 1 × 1 one-dimensional convolutions (the number of convolution kernels is 256,64,2, respectively) and 2 ReLU units.
In one embodiment of the present invention, for the generation of arbitrary-shaped character boundaries, N_a points are first sampled on the boundary of the directional character suggestion box B_o; the sampling points are then gradually evolved by K boundary positioning mechanism (CLM) modules to approach the boundary of characters of any shape. Finally, as shown in Fig. 4, the new coordinates obtained by the CLM evolution are input into a reliable boundary positioning mechanism (RCLM) to generate the precise boundary localization and to predict the score that the region enclosed by the boundary belongs to characters.
In an embodiment of the invention, the CLM is composed of a boundary feature extractor CFE, a boundary information aggregation CIA and an offset prediction head OPH module; adding the position offset output by the OPH to the input coordinate position to obtain a new coordinate position; the new coordinate position is then entered into the next CLM module for further evolution, and so on.
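A sketch of one CLM step and of its iterative application, reusing the hypothetical boundary_point_features, CIAModule and OffsetPredictionHead from the earlier sketches and assuming their dimensions line up:

```python
import torch.nn as nn

class CLM(nn.Module):
    """One evolution step: CFE -> CIA -> OPH, then add the predicted offsets to the points."""
    def __init__(self, cia, oph):
        super().__init__()
        self.cia, self.oph = cia, oph

    def forward(self, f_e, points):                       # points: (B, N_a, 2)
        f_c = boundary_point_features(f_e, points)        # CFE (see earlier sketch)
        offsets = self.oph(self.cia(f_c))                 # (B, 2, N_a)
        return points + offsets.transpose(1, 2)           # x' = x + o

def evolve_boundary(clms, f_e, points):
    """Apply K CLM modules in sequence to progressively approach the text boundary."""
    for clm in clms:
        points = clm(f_e, points)
    return points
```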
In an embodiment of the invention, the RCLM is similar in structure to the CLM, except that the RCLM not only inputs the output F_cia of the boundary information aggregation (CIA) module into the offset prediction head to further adjust the boundary position and generate the final position coordinates, but also inputs F_cia into a boundary scoring mechanism (CSM) module to predict the confidence of whether the region enclosed by the boundary is a character. In the training process, the evolution of the whole process is learned through an objective function defined as a Smooth-L1 loss between the predicted value and the true value of the jth sampling point of the ith character instance, and the CSM module is updated through an objective function defined over the confidence that the region enclosed by the ith predicted boundary belongs to characters or to the background, where l = 1 denotes a character and l = 0 denotes the background.
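The evolution objective is described as a Smooth-L1 loss over the sampling points; the scoring objective is a binary text/background classification whose exact form appears only as an image, so binary cross-entropy is assumed in the sketch below.

```python
import torch
import torch.nn.functional as F

def evolution_loss(pred_points, gt_points):
    """Smooth-L1 loss between predicted and ground-truth boundary sampling points.
    pred_points, gt_points: (N_t, N_a, 2) for the text instances of one sample image."""
    return F.smooth_l1_loss(pred_points, gt_points)

def scoring_loss(pred_conf, labels):
    """Binary cross-entropy over the confidence that each predicted boundary encloses
    text (l = 1) or background (l = 0); the BCE form is an assumption."""
    return F.binary_cross_entropy_with_logits(pred_conf, labels.float())
```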
In an embodiment of the present invention, D, N_o, N_a, D_u, τ_c and K are set to 128, 64, 128, 0.35, and 2, respectively.
The invention provides a method for detecting characters of any shape with an asymptotic regression boundary; the test environment and experimental results are as follows:
(1) Testing environment:
System environment: Ubuntu 16.04.
Hardware environment: memory: 15 GB; GPU: NVIDIA RTX 2080Ti; CPU: 4.00 GHz Intel(R) Xeon(R) W-2125; hard disk: 2 TB.
(2) Experimental data:
the invention has performed experiments on three data sets, CTW1500(1000 training pictures, 500 test pictures), Total-Text (1255 training pictures, 300 test pictures) and ArT (5603 training pictures, 4563 test pictures). During the evaluation, for CTW1500, Total-Text and ArT, the shortest edges of their test pictures were set to 416,512 and 640, respectively.
(3) Optimization method:
Optimization is performed with the Adam optimizer. The models are trained for 250, 300 and 300 epochs on CTW1500, Total-Text and ArT, respectively. The initial learning rate is 0.0001 and is multiplied by 0.1 after the 80th, 120th, 160th, 180th and 260th epochs. For the backbone networks DLA34 and ResNet-50, the batch size during training is set to 6 and 3, respectively.
(4) The experimental results are as follows:
1) ablation experiment:
the experiment was performed on the CTW1500 dataset and the results are shown in tables 1 and 2. In the experiment, the baseline model carries out position evolution once from the sampling point on the horizontal candidate frame to the boundary of the character with any shape. As shown in the first row of Table 1, baseball can obtain 73.9% Recall, 80.9% Precision and 77.2% F-measure. If a directional character candidate generation network is added into the baseline model, 2.6% of F-measure can be improved. Further, when a CIA module is added into OPTG and ATPG, the F-measure of the model is improved by 2.6 percent. If RCLM is added on the basis of the baselene model added with OTPG, Precision (83.4% vs. 81.5%) of the model is obviously improved compared with Recall (77.8% vs. 78.2%). When the baseline is added with OTPG, CIA and RCLM, the optimal Recall (81.3%), Precision (86.1%) and F-measure (83.7%) can be achieved. It can be seen from table 2 that as the number of CLM modules increases, the F-measure tends to be stable, but the computation speed of the model is significantly affected.
Table 1: validation of extracted modules
Figure BDA0002978369010000111
Table 2: impact of CLM Module number
#CLM Recall(%) Precision(%) F-measure(%) FPS
K=0 80.3 86.6 83.0 16.3
K=1 81.3 86.1 83.7 12.8
K=2 81.1 87.1 84.0 11.8
K=3 81.3 86.5 83.8 10.7
2) Performance comparison:
As can be seen from Tables 3, 4 and 5, the method of the present invention achieves state-of-the-art performance.
Table 3: performance comparison over CTW1500
Table 4: performance comparison on Total-Text
Table 5: comparison of Performance at ArT
the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and a person skilled in the art may make modifications or equivalent substitutions to the technical solutions of the present invention without departing from the scope of the present invention, and the scope of the present invention should be determined by the claims.

Claims (10)

1. A method for detecting characters of an arbitrary-shaped scene with an asymptotic regression boundary, comprising the following steps:
1) extracting visual features of an image to be detected, performing multi-scale feature fusion on the visual features, and acquiring a feature expression F_e of the image to be detected;
2) generating a horizontal character candidate box B_h according to the feature expression F_e;
3) generating an offset prediction value according to the feature expression F_e and a first boundary sampling point set obtained by sampling on the boundary of the horizontal character candidate box B_h, and generating the corner points of a directional character suggestion box B_o by combining the sampling points in the first boundary sampling point set, thereby obtaining the directional character suggestion box B_o;
4) using a second boundary sampling point set obtained by sampling on the boundary of the directional character suggestion box B_o, evolving the coordinate positions of the sampling points according to the feature expression F_e to generate new sampling point coordinates, obtaining the precise boundary position coordinates from the new coordinate positions of the sampling points, and estimating the score s that the region enclosed by the boundary belongs to characters, thereby obtaining the scene character detection result.
2. The method of claim 1, wherein the method of extracting visual features comprises: utilizing a backbone network pre-trained on ImageNet; the backbone network includes: DLA34 network or ResNet50 network.
3. The method of claim 1, wherein the horizontal character candidate box B_h is obtained through the following steps:
1) performing convolution and linear rectification on the feature expression F_e, and inputting the linear rectification result into a first convolution layer to generate a character center response map;
2) performing convolution and linear rectification on the feature expression F_e, and inputting the linear rectification result into a second convolution layer to generate a character circumscribed rectangle scale estimation map, wherein the number of convolution kernels of the first convolution layer is different from that of the second convolution layer;
3) performing a maximum pooling operation on the character center response map and filtering out low-score center points with a set threshold τ_c to obtain the filtered center points;
4) generating the horizontal character candidate box B_h according to the filtered center points and the character circumscribed rectangle scale estimation map.
4. The method of claim 1, wherein the offset prediction value is generated through the following steps:
1) uniformly sampling N_o points on the boundary of the horizontal character candidate box B_h to obtain the coordinates x of each sampling point in the first boundary sampling point set;
2) extracting the boundary original feature expression F_c of each sampling point in the first boundary sampling point set according to the feature expression F_e;
3) performing boundary information aggregation on each boundary original feature F_c to obtain the enhanced feature expression F_cia;
4) performing offset prediction according to the enhanced feature expression F_cia to obtain the offset prediction value o of each sampling point in the first boundary sampling point set.
5. The method of claim 4, wherein the boundary original feature expression F_c is obtained through the following steps:
1) respectively extracting the semantic feature F_sem and the position feature F_loc of each sampling point in the first boundary sampling point set according to the feature expression F_e;
2) splicing the semantic feature F_sem and the position feature F_loc together to obtain the boundary original feature expression F_c.
6. The method of claim 4, wherein the enhanced feature expression F_cia is obtained through the following steps:
1) performing a one-dimensional cyclic convolution operation and linear rectification on the boundary original feature expression F_c, and inputting the linear rectification result into a BN layer to capture the geometric topological structure of the closed boundary loop;
2) acquiring t features by boundary information aggregation units using the geometric topological structure and t expansion rates r_t, and splicing the geometric topological structure with each of the t features to generate a multi-scale fused boundary feature;
3) reducing the dimension of the multi-scale fused boundary feature with a one-dimensional convolution operation to obtain a reduced feature;
4) performing a maximum pooling operation on the reduced feature to generate a boundary global feature;
5) distributing the boundary global feature to each sampling point in the first boundary sampling point set and splicing to obtain the enhanced feature expression F_cia.
7. The method of claim 6, wherein the features of each boundary information aggregation unit are obtained through the following steps:
1) passing the input feature through 3 dilated one-dimensional cyclic convolutions with expansion rate r_t to produce three feature expressions;
2) based on two of these feature expressions respectively, collecting information along the character boundary with N_g sink nodes to capture the semantic relations between sampling points, and combining the boundary global relation captured by an added virtual sink node to obtain the sink node features;
3) calculating the relation between the boundary sampling points and the sink nodes, wherein i is the index of a boundary sampling point, j is the index of a sink node, and D_u is the dimension of the feature expressions;
4) assigning the features on the sink nodes to each boundary sampling point, with element-by-element addition, to produce the output features of the unit.
8. The method of claim 4, wherein the corner points of the directional character suggestion box B_o are generated through the following steps:
1) obtaining updated sampling point coordinates x' = x + o from the coordinates x of each sampling point and the offset prediction value o;
2) selecting 4 points at equal intervals from the coordinates x' of the sampling points as the corner points of the directional character suggestion box, thereby generating the corner points of the directional character suggestion box B_o.
9. A storage medium having a computer program stored thereon, wherein the computer program is arranged to, when run, perform the method of any of claims 1-8.
10. An electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the method according to any of claims 1-8.
CN202110280975.4A 2021-03-16 2021-03-16 Method and device for detecting characters of arbitrary-shaped scene with asymptotic regression boundary Active CN113139539B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110280975.4A CN113139539B (en) 2021-03-16 2021-03-16 Method and device for detecting characters of arbitrary-shaped scene with asymptotic regression boundary

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110280975.4A CN113139539B (en) 2021-03-16 2021-03-16 Method and device for detecting characters of arbitrary-shaped scene with asymptotic regression boundary

Publications (2)

Publication Number Publication Date
CN113139539A true CN113139539A (en) 2021-07-20
CN113139539B CN113139539B (en) 2023-01-13

Family

ID=76811104

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110280975.4A Active CN113139539B (en) 2021-03-16 2021-03-16 Method and device for detecting characters of arbitrary-shaped scene with asymptotic regression boundary

Country Status (1)

Country Link
CN (1) CN113139539B (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004334913A (en) * 2004-08-19 2004-11-25 Matsushita Electric Ind Co Ltd Document recognition device and document recognition method
CN107346420A (en) * 2017-06-19 2017-11-14 中国科学院信息工程研究所 Text detection localization method under a kind of natural scene based on deep learning
CN108960229A (en) * 2018-04-23 2018-12-07 中国科学院信息工程研究所 One kind is towards multidirectional character detecting method and device
CN109117836A (en) * 2018-07-05 2019-01-01 中国科学院信息工程研究所 Text detection localization method and device under a kind of natural scene based on focal loss function
CN110245545A (en) * 2018-09-26 2019-09-17 浙江大华技术股份有限公司 A kind of character recognition method and device
CN110287960A (en) * 2019-07-02 2019-09-27 中国科学院信息工程研究所 The detection recognition method of curve text in natural scene image
CN111738055A (en) * 2020-04-24 2020-10-02 浙江大学城市学院 Multi-class text detection system and bill form detection method based on same

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
PENGWEN DAI et al.: "Deep Multi-Scale Context Aware Feature Aggregation for Curved Scene Text Detection", IEEE Transactions on Multimedia *
朱盈盈 et al.: "Candidate box extraction algorithm suitable for text detection", Journal of Data Acquisition and Processing *
陈泽瀛: "A text detection algorithm based on adaptive non-maximum suppression", Digital Technology & Application *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113822278A (en) * 2021-11-22 2021-12-21 松立控股集团股份有限公司 License plate recognition method for unlimited scene

Also Published As

Publication number Publication date
CN113139539B (en) 2023-01-13

Similar Documents

Publication Publication Date Title
Zhou et al. Violence detection in surveillance video using low-level features
Chen et al. Show, match and segment: Joint weakly supervised learning of semantic matching and object co-segmentation
US11062123B2 (en) Method, terminal, and storage medium for tracking facial critical area
WO2018103608A1 (en) Text detection method, device and storage medium
He et al. Multi-scale FCN with cascaded instance aware segmentation for arbitrary oriented word spotting in the wild
Xiao et al. A weakly supervised semantic segmentation network by aggregating seed cues: the multi-object proposal generation perspective
Wang et al. Video co-saliency guided co-segmentation
Shivakumara et al. Multioriented video scene text detection through Bayesian classification and boundary growing
Ma et al. A saliency prior context model for real-time object tracking
Li et al. Hierarchical feature fusion network for salient object detection
Xu et al. Video saliency detection via graph clustering with motion energy and spatiotemporal objectness
US9626585B2 (en) Composition modeling for photo retrieval through geometric image segmentation
Ni et al. Learning to photograph: A compositional perspective
CN106203423B (en) Weak structure perception visual target tracking method fusing context detection
CN104952083B (en) A kind of saliency detection method based on the modeling of conspicuousness target background
Lee et al. Unsupervised video object segmentation via prototype memory network
Zheng et al. A feature-adaptive semi-supervised framework for co-saliency detection
CN112752158B (en) Video display method and device, electronic equipment and storage medium
Mei et al. Large-field contextual feature learning for glass detection
CN113139544A (en) Saliency target detection method based on multi-scale feature dynamic fusion
Tang et al. CLASS: cross-level attention and supervision for salient objects detection
Bak et al. Two-stream convolutional networks for dynamic saliency prediction
Zhang et al. Deep salient object detection by integrating multi-level cues
Zhang et al. Detecting and removing visual distractors for video aesthetic enhancement
Wang et al. End-to-end trainable network for superpixel and image segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant