CN106650725B - Candidate text box generation and text detection method based on full convolution neural network
- Publication number
- CN106650725B (application CN201611070587.9A)
- Authority
- CN
- China
- Prior art keywords
- text
- candidate
- box
- network
- boxes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a candidate text box generation and text detection method based on a fully convolutional neural network, comprising the following steps: generating text region candidate boxes, wherein an Inception-RPN takes a natural scene picture and a set of ground-truth bounding boxes marking text regions as input, generates a controllable number of word region candidate boxes, slides an Inception network over the convolutional feature response map of a VGG16 model, and attaches a set of text characteristic prior boxes at each sliding position; incorporating ambiguity-prone text category supervision information and multi-level region downsampling information, and performing text detection; training the Inception candidate box generation network and the text detection network in an end-to-end manner through back-propagation and stochastic gradient descent; and iteratively voting on candidate boxes to achieve a higher text recall rate in a complementary manner, using a candidate box filtering algorithm to remove redundant detection boxes. The invention achieves accuracies of 0.83 and 0.85 on the ICDAR 2011 and ICDAR 2013 robust text detection benchmark databases, respectively, surpassing the previous best results.
Description
Technical Field
The invention relates to techniques for generating text candidate boxes and detecting text in natural scene pictures, and in particular to a candidate text box generation and text detection method based on a fully convolutional neural network.
Background
Text in images provides rich and accurate high-level semantic information that is critical to a large number of potential applications, such as scene understanding, image and video retrieval, and content-based recommendation systems. Text detection in natural scene pictures has therefore attracted a great deal of attention in the computer vision and image understanding communities. However, text detection in natural scenes remains a challenging and unsolved problem. First, the background of a text picture is complex, and regions composed of symbols, logos, bricks, and grass can be very difficult to distinguish from text. In addition, compounding factors such as non-uniform lighting, strong exposure, low contrast, blur, and low resolution add significant challenges to the text detection task.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a candidate text box generation and text detection method based on a fully convolutional neural network.
The technical scheme of the invention is realized as follows:
The candidate text box generation and text detection method based on a fully convolutional neural network includes the following steps:
S1: generating text region candidate boxes, wherein an Inception-RPN takes a natural scene picture and a set of ground-truth bounding boxes marking text regions as input, generates a controllable number of word region candidate boxes, slides an Inception network over the convolutional feature response map of a VGG16 model, and attaches a set of text characteristic prior boxes at each sliding position;
S2: incorporating ambiguity-prone text category supervision information and multi-level region downsampling information, and performing text detection;
S3: training the Inception candidate box generation network and the text detection network in an end-to-end manner through back-propagation and stochastic gradient descent;
S4: iteratively voting on candidate boxes to achieve a higher text recall rate in a complementary manner, and using a candidate box filtering algorithm to remove redundant detection boxes.
Further, step S1 includes the following steps:
S11: designing text characteristic prior boxes;
S12: constructing the Inception candidate box generation network.
Further, in step S11 there are 24 text characteristic prior boxes, where the sliding-window widths at each sliding position are set to 32, 48, 64 and 80, and the aspect ratios are 0.2, 0.5, 0.8, 1.0, 1.2 and 1.5.
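By way of illustration, the following minimal Python sketch enumerates these 24 prior boxes at a single sliding position. Interpreting the ratio as height/width and centering the boxes on the sliding position are assumptions, since the patent does not fix these details.

```python
import itertools

WIDTHS = (32, 48, 64, 80)                 # prior box widths from step S11
RATIOS = (0.2, 0.5, 0.8, 1.0, 1.2, 1.5)   # assumed height/width ratios

def prior_boxes(cx, cy):
    """Enumerate the 24 text prior boxes centered at (cx, cy).

    Returns boxes as (x_min, y_min, x_max, y_max) tuples.
    """
    boxes = []
    for w, r in itertools.product(WIDTHS, RATIOS):
        h = w * r
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

print(len(prior_boxes(0, 0)))  # -> 24
```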
Further, the Inception candidate box generation network in step S12 is formed by connecting a 3×3 convolutional layer, a 5×5 convolutional layer and a 3×3 max-pooling layer to the corresponding spatial receptive field of the Conv5_3 feature response map, which serves as input.
Further, the text category supervision information in step S2 is: a candidate box with an IoU overlap of 0.5 or more is designated as containing text; a candidate box with an IoU overlap of at least 0.2 and less than 0.5 is designated as "fuzzy text"; and all others are designated as containing no text information.
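A minimal sketch of this three-way labeling rule follows; the integer class encoding is an illustrative assumption.

```python
def label_candidate(iou):
    """Three-way label from step S2 (0 = no text, 1 = "fuzzy text", 2 = text)."""
    if iou >= 0.5:
        return 2   # text present
    if iou >= 0.2:
        return 1   # "fuzzy text"
    return 0       # no text information
```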
Further, the multi-level region downsampling information in step S2 is: the convolutional feature response maps of Conv4_3 and Conv5_3 in the VGG16 network are both subjected to multi-level region downsampling, yielding two 512-channel sampled features, which are then concatenated and decoded with a 1×1 convolutional layer with 512 output channels that joins the features together.
Compared with the prior art, the invention provides an Inception candidate box generation network that applies sliding windows of different sizes to the convolutional feature map and attaches a set of text characteristic prior boxes at each sliding position to generate word region candidate boxes. The sliding windows of different sizes retain local information at the corresponding positions while also taking context information into account, which helps filter out candidate boxes containing no text; the Inception candidate box generation network of the invention achieves a high recall rate using only hundreds of word candidate boxes. The invention also introduces additional ambiguity-prone text category supervision information and multi-level region downsampling information, which are fused into the text detection network and help it learn more discriminative information for distinguishing text from complex backgrounds. In addition, to make better use of the models produced during training, the invention provides a candidate box iterative voting scheme that obtains a higher word recall rate in a complementary manner.
Drawings
FIG. 1 is a flow chart of a candidate text box generation and text detection method based on a full convolution neural network according to the present invention.
FIG. 2 is an exemplary diagram of word region candidate boxes whose IoU overlap lies in a particular interval, according to one embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the method for generating a candidate text box and detecting a text based on a full convolution neural network of the present invention includes four steps: s1, generating a text region candidate box; s2, text detection; s3, end-to-end learning optimization; and S4, heuristic processing.
The function of component S1 is as follows: the Inception-RPN takes a natural scene picture and a set of ground-truth bounding boxes marking text regions as input and generates a controllable number of word region candidate boxes. To search for word region candidate boxes, we slide an Inception network over the convolutional feature response map of the VGG16 model and attach a set of text characteristic prior boxes at each sliding position. The method comprises two parts: (1) designing the text characteristic prior boxes, and (2) constructing the Inception candidate box generation network. Four different scales (32, 48, 64 and 80) and six different aspect ratios (0.2, 0.5, 0.8, 1.0, 1.2 and 1.5) are set at each sliding position, for a total of 24 prior sliding windows. In the learning phase, a prior box whose intersection over union with a ground-truth text box is greater than 0.5 is assigned a text label, whereas one whose intersection over union is less than 0.3 is assigned a background label. The designed Inception candidate box generation network connects a 3×3 convolutional layer, a 5×5 convolutional layer and a 3×3 max-pooling layer to the corresponding spatial receptive field of the Conv5_3 feature response map, which serves as input. In addition, to reduce dimensionality, a 1×1 convolution is applied on top of the 3×3 max-pooling layer. We then concatenate the features of each part along the channel dimension, and the resulting 640-dimensional feature vector is fed into two output layers: the classification layer predicts a score for whether text exists in the region, and the regression layer refines the text region position for the various prior windows at each sliding position. A sketch of this module appears after this paragraph.
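The following PyTorch-style sketch shows one way such a module could be wired up. The per-branch channel counts (256 + 256 + 128 = 640) are assumptions chosen only so that the concatenated feature is 640-dimensional as stated above; the patent does not specify them.

```python
import torch
import torch.nn as nn

class InceptionRPNHead(nn.Module):
    """Sketch of the Inception candidate box generation module.

    Branch channel counts are assumptions chosen to sum to the
    640-dimensional feature described in the text; k = 24 prior boxes.
    """
    def __init__(self, in_ch=512, k=24):
        super().__init__()
        self.branch3x3 = nn.Conv2d(in_ch, 256, kernel_size=3, padding=1)
        self.branch5x5 = nn.Conv2d(in_ch, 256, kernel_size=5, padding=2)
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 128, kernel_size=1),  # 1x1 conv reduces dimensionality
        )
        self.cls = nn.Conv2d(640, 2 * k, kernel_size=1)  # 2k text/non-text scores
        self.reg = nn.Conv2d(640, 4 * k, kernel_size=1)  # 4k box offsets

    def forward(self, conv5_3):
        feats = torch.cat([
            self.branch3x3(conv5_3),
            self.branch5x5(conv5_3),
            self.branch_pool(conv5_3),
        ], dim=1)  # concatenate along the channel dimension -> 640 channels
        return self.cls(feats), self.reg(feats)
```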
Step S2 comprises: (1) incorporating ambiguity-prone text category supervision information, which adds more reasonable supervision, helps the classifier learn more discriminative features, distinguishes text regions from complex and diverse backgrounds, and filters out candidate boxes containing no text; and (2) incorporating multi-level region downsampling information, whose effect is to make better use of the convolutional features of multiple layers and enrich the discriminative information of each sliding window.
Much previous work on detection networks designates candidate boxes with an IoU overlap greater than 0.5 as containing text and all others as background. However, this way of deciding whether text is present in a candidate box is not reasonable, because a candidate box with an IoU overlap in the interval 0.2 to 0.5 may still contain partial or extensive text information, as shown in FIG. 2. Such confusing labels can mislead the classification learning of text and non-text candidate boxes. To this end, we propose to designate candidate boxes with an IoU overlap of 0.5 or more as containing text, candidate boxes with an IoU overlap of at least 0.2 and less than 0.5 as "fuzzy text", and all others as containing no text information. This strategy provides more reasonable supervision information that helps the classifier learn more discriminative features, discriminate text from complex and diverse backgrounds, and filter out candidate boxes that do not contain text.
In order to make better use of multi-level convolutional features and enrich the discriminative information of each candidate box, the invention performs multi-level region downsampling on the convolutional feature response maps of Conv4_3 and Conv5_3 of the VGG16 network, obtaining two 512-channel sampled features. The concatenated features are then decoded with a 1×1 convolutional layer with 512 output channels. The effect of this 1×1 convolutional layer is (1) to combine the multi-level sampled features and fuse them with learned weights during training, and (2) to reduce the dimensionality to match the first fully connected layer of VGG16.
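A minimal sketch of this fusion step is given below. The 7×7 pooled output size and the VGG16 feature strides of 8 and 16 used for spatial_scale are assumptions, since the patent does not state them.

```python
import torch
import torch.nn as nn
import torchvision.ops as ops

class MultiLevelRegionPool(nn.Module):
    """Sketch: ROI-pool Conv4_3 and Conv5_3, concatenate, fuse with a 1x1 conv."""
    def __init__(self, pooled=7):
        super().__init__()
        self.pooled = pooled
        # The 1x1 conv fuses the two 512-channel pooled features and
        # reduces 1024 channels back to 512 to match VGG16's first fc layer.
        self.fuse = nn.Conv2d(1024, 512, kernel_size=1)

    def forward(self, conv4_3, conv5_3, rois):
        # rois: (N, 5) tensor of (batch_idx, x1, y1, x2, y2) in image coordinates;
        # the spatial_scale values assume VGG16 strides of 8 and 16.
        f4 = ops.roi_pool(conv4_3, rois, self.pooled, spatial_scale=1 / 8)
        f5 = ops.roi_pool(conv5_3, rois, self.pooled, spatial_scale=1 / 16)
        return self.fuse(torch.cat([f4, f5], dim=1))
```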
Component S3 differs from the previously proposed four-step training strategy that alternates between the RPN and Fast R-CNN: the invention trains the Inception candidate box generation network and the text detection network in an end-to-end manner through back-propagation and stochastic gradient descent. The shared convolutional network is initialized with a network pre-trained on ImageNet classification. The weights of the new layers are initialized from a Gaussian distribution with mean 0 and variance 0.01. The base learning rate is 0.001 and is reduced to one tenth of its value every 40,000 iterations. The momentum and weight decay are set to 0.9 and 0.0005, respectively.
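As a sketch, these hyperparameters map directly onto a standard PyTorch SGD configuration; the placeholder model below stands in for the actual proposal and detection networks.

```python
import torch

# Placeholder module for illustration; in practice this would be the
# combined Inception proposal network and text detection network.
model = torch.nn.Linear(4, 2)

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.001,            # base learning rate
    momentum=0.9,
    weight_decay=0.0005,
)
# Reduce the learning rate to one tenth every 40,000 iterations
# (scheduler.step() called once per training iteration).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=40000, gamma=0.1)
```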
The Inception candidate box generation network and the text detection network each have two sibling output layers: a classification layer and a regression layer. The differences between the output layers of the two networks are as follows: (1) in the Inception candidate box generation network each prior box must be parameterized independently, so we need to predict all k = 24 prior candidate boxes simultaneously; the classification layer outputs 2k scores judging whether each candidate box contains text, and the regression layer outputs 4k values describing the offsets of the refined candidate boxes from the original candidate boxes. (2) The text detection network outputs three scores for each candidate box, corresponding respectively to background, "fuzzy text", and text, and its regression layer outputs 4 regression offset values for each text candidate box. We minimize the following multi-task loss function during training:
L(p, p*, t, t*) = L_cls(p, p*) + λ·L_reg(t, t*),    (0.1)

where the classification loss L_cls is the softmax loss function, and p and p* are the predicted label and the ground-truth label, respectively. The regression loss L_reg applies the smooth-L1 loss function. In addition, t = {t_x, t_y, t_w, t_h} and t* = {t*_x, t*_y, t*_w, t*_h} denote the regression offset vectors of the predicted and ground-truth candidate boxes, respectively, where t* is obtained from the standard bounding box parameterization:

t*_x = (G_x − P_x) / P_w,  t*_y = (G_y − P_y) / P_h,  t*_w = log(G_w / P_w),  t*_h = log(G_h / P_h).

Here, P = {P_x, P_y, P_w, P_h} and G = {G_x, G_y, G_w, G_h} denote the center coordinates, width and height of the corresponding candidate box P and ground-truth text box G, respectively. λ denotes the loss balance parameter; in the Inception candidate box generation network we set λ = 3 to bias the model toward better candidate box positions, and in the text detection network we set λ = 1.
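A minimal sketch of this multi-task loss under the above definitions; restricting the regression term to positive (text) samples is an assumption consistent with common practice, not something stated in the patent.

```python
import torch
import torch.nn.functional as F

def multitask_loss(cls_logits, labels, reg_pred, reg_target, lam=1.0):
    """L = L_cls(softmax) + lam * L_reg(smooth-L1), per equation (0.1).

    cls_logits: (N, C) class scores; labels: (N,) ground-truth labels;
    reg_pred / reg_target: (N, 4) box offset vectors (t and t*).
    """
    l_cls = F.cross_entropy(cls_logits, labels)  # softmax loss
    positive = labels > 0                        # regress only text boxes (assumption)
    if positive.any():
        l_reg = F.smooth_l1_loss(reg_pred[positive], reg_target[positive])
    else:
        l_reg = reg_pred.sum() * 0.0             # keep the graph, contribute zero
    return l_cls + lam * l_reg
```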
Component S4 includes a candidate box iterative voting mechanism and a filtering algorithm. The candidate box iterative voting mechanism enables the invention to obtain a higher text recall rate in a complementary manner and improves the performance of the text detection system. The filtering algorithm allows the invention to remove redundant detection boxes and improve precision.
The invention first feeds a natural scene picture and a set of ground-truth text box data into the Inception candidate box generation network to generate a certain number of word region candidate boxes. The obtained word region candidate boxes are then sent to the text detection network for text/non-text classification and text localization; during training, this network incorporates the ambiguity-prone text category supervision information and the fused multi-level region downsampling information. The entire system is trained in an end-to-end manner through back-propagation and gradient descent. To make full use of the intermediate models produced during training, the invention adopts the candidate box iterative voting mechanism to obtain a high recall rate on text instances in a complementary manner, improving the performance of the whole text detection system. Finally, the invention applies a filtering algorithm that, for each text instance, finds the candidate boxes lying inside and outside it according to coordinate position, retains the high-score candidate boxes, and removes the low-score candidate boxes; a sketch of such a filter follows.
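The patent does not give pseudocode for this filter, so the following is only an illustrative sketch of the idea: whenever one detection box contains another, keep the higher-scoring of the two.

```python
def contains(outer, inner):
    """True if box `inner` lies inside box `outer`; boxes are (x1, y1, x2, y2)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def filter_nested(boxes, scores):
    """Keep the high-score box whenever one detection box contains another.

    Illustrative sketch of the coordinate-position filtering idea; the
    exact rule used by the patent is not specified.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if not any(contains(boxes[j], boxes[i]) or contains(boxes[i], boxes[j])
                   for j in kept):
            kept.append(i)
    return [boxes[i] for i in kept]
```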
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.
Claims (6)
1. A candidate text box generation and text detection method based on a fully convolutional neural network, characterized by comprising the following steps:
S1: generating text region candidate boxes, wherein an Inception-RPN takes a natural scene picture and a set of ground-truth bounding boxes marking text regions as input, generates a controllable number of word region candidate boxes, slides an Inception network over the convolutional feature response map of a VGG16 model, and attaches a set of text characteristic prior boxes at each sliding position;
S2: incorporating ambiguity-prone text category supervision information and multi-level region downsampling information, and performing text detection;
S3: training the Inception candidate box generation network and the text detection network in an end-to-end manner through back-propagation and stochastic gradient descent;
S4: iteratively voting on candidate boxes to obtain a higher text recall rate in a complementary manner, and removing redundant detection boxes using a candidate box filtering algorithm;
wherein a multi-task loss function is minimized during training, with the following formula:

L(p, p*, t, t*) = L_cls(p, p*) + λ·L_reg(t, t*),

wherein the classification loss L_cls is the softmax loss function, and p and p* are the predicted label and the ground-truth label, respectively; the regression loss L_reg applies the smooth-L1 loss function; t = {t_x, t_y, t_w, t_h} and t* = {t*_x, t*_y, t*_w, t*_h} denote the regression offset vectors of the predicted and ground-truth candidate boxes, respectively, where t* is obtained from:

t*_x = (G_x − P_x) / P_w,  t*_y = (G_y − P_y) / P_h,  t*_w = log(G_w / P_w),  t*_h = log(G_h / P_h),

wherein P = {P_x, P_y, P_w, P_h} and G = {G_x, G_y, G_w, G_h} denote the center coordinates, width and height of the corresponding candidate box P and ground-truth text box G, respectively, and λ denotes the loss balance parameter.
2. The candidate text box generation and text detection method based on a fully convolutional neural network as claimed in claim 1, wherein step S1 includes the following steps:
S11: designing text characteristic prior boxes;
S12: constructing the Inception candidate box generation network.
3. The candidate text box generation and text detection method based on a fully convolutional neural network as claimed in claim 2, wherein there are 24 text characteristic prior boxes in step S11, the sliding-window widths at each sliding position being set to 32, 48, 64 and 80 and the aspect ratios being 0.2, 0.5, 0.8, 1.0, 1.2 and 1.5.
4. The candidate text box generation and text detection method based on a fully convolutional neural network as claimed in claim 2, wherein the Inception candidate box generation network in step S12 is formed by connecting a 3×3 convolutional layer, a 5×5 convolutional layer and a 3×3 max-pooling layer to the corresponding spatial receptive field of the Conv5_3 feature response map, which serves as input.
5. The candidate text box generation and text detection method based on a fully convolutional neural network as claimed in claim 1, wherein the text category supervision information in step S2 is: a candidate box with an IoU overlap of 0.5 or more is designated as containing text; a candidate box with an IoU overlap of at least 0.2 and less than 0.5 is designated as "fuzzy text"; and all others are designated as containing no text information.
6. The candidate text box generation and text detection method based on a fully convolutional neural network as claimed in claim 1, wherein the multi-level region downsampling information in step S2 is: the convolutional feature response maps of Conv4_3 and Conv5_3 in the VGG16 network are both subjected to multi-level region downsampling, yielding two 512-channel sampled features, which are then concatenated and decoded with a 1×1 convolutional layer with 512 output channels that joins the features together.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611070587.9A CN106650725B (en) | 2016-11-29 | 2016-11-29 | Candidate text box generation and text detection method based on full convolution neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106650725A (en) | 2017-05-10
CN106650725B (en) | 2020-06-26
Family
ID=58813359
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611070587.9A Active CN106650725B (en) | 2016-11-29 | 2016-11-29 | Candidate text box generation and text detection method based on full convolution neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106650725B (en) |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015132665A2 (en) * | 2014-03-07 | 2015-09-11 | Wolf, Lior | System and method for the detection and counting of repetitions of repetitive activity via a trained network |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104915386A (en) * | 2015-05-25 | 2015-09-16 | 中国科学院自动化研究所 | Short text clustering method based on deep semantic feature learning |
CN105740892A (en) * | 2016-01-27 | 2016-07-06 | 北京工业大学 | High-accuracy human body multi-position identification method based on convolutional neural network |
CN105912611A (en) * | 2016-04-05 | 2016-08-31 | 中国科学技术大学 | CNN based quick image search method |
Non-Patent Citations (2)
Title |
---|
Dictionary Pair Classifier Driven Convolutional Neural Networks for Object Detection; Keze Wang et al.; 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016-06-30; pp. 2138-2146 *
A Survey of Deep Learning Applications in Handwritten Chinese Character Recognition; Lianwen Jin et al.; Acta Automatica Sinica; 2016-08-31; pp. 1125-1142 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |