Long-Range Dependence Involutional Network for Logo Detection
Figure 1. Three logo challenges. (a) Logo with a large aspect ratio; (b) logos with multiple scales in an image; (c) logo deformation caused by angle change, reflection, and other factors.
Figure 2. Overview of the proposed LDI-Net for logo detection. MRNAS: multilevel representation neural architecture search; LD involution: long-range dependence involution; ARM: adaptive RoI pooling module.
Figure 3. Feature map transformation process based on involution.
Figure 4. Construction of the involutional kernel.
Figure 5. Examples of LDI-Net test results. Each orange box marks the location of a detected logo object; the label above the box gives the category and confidence score.
Figure 6. Comparison of Dynamic R-CNN and LDI-Net with an increasing number of iterations.
Figure 7. Comparison of visualization results of Dynamic R-CNN and LDI-Net on the large-aspect-ratio problem. Blue boxes: ground-truth boxes; orange boxes: correct detection boxes.
Figure 8. Comparison of visualization results of Dynamic R-CNN and LDI-Net on multiscale logo images. Blue boxes: ground-truth boxes; orange boxes: correct detection boxes.
Figure 9. Comparison of visualization results of Dynamic R-CNN and LDI-Net on logo deformation images. Blue boxes: ground-truth boxes; orange boxes: correct detection boxes.
Abstract
1. Introduction
- Large aspect ratio. Logos with large aspect ratios usually span a wide area of an image, as shown in Figure 1a. To the best of our knowledge, there has been little research on large aspect ratios in logo detection. Two-stage approaches based on fixed-size anchors fail to detect logos with flexible aspect ratios [5,6,7,8], and optimized anchoring strategies [9,10,11] have only a limited effect on logos with large aspect ratios. The method in [12] can generate anchors of any shape but cannot capture long-range dependence. Likewise, traditional convolution cannot fully exploit long-range interactions or the locations of diverse spatial features, which severely restricts its ability to handle logos with large aspect ratios.
- Multiscale logo objects in an image. As seen in Figure 1b, 'adidas' appears in both the foreground and the background, but at very different scales. Scale diversity can be addressed with feature pyramid networks (FPNs) [13], but the semantic information of small objects may be lost after repeated downsampling. PANet [14] adds a bottom-up information path, yet the fused information remains concentrated in adjacent layers. Although SEPC [15] can extract multilevel features, its topology is too simple to capture richer cross-level information.
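The multiscale fusion idea that FPN [13] contributes, and that the challenges above build on, can be sketched as follows. This is a minimal NumPy illustration of the top-down pathway only (coarse, semantically strong maps are upsampled and added into finer maps); it is our simplification, not the paper's MRNAS neck, and it omits the lateral 1x1 and output 3x3 convolutions of a real FPN.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbor 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def topdown_fuse(pyramid):
    """FPN-style top-down pathway: starting from the coarsest level,
    upsample each fused map and add it into the next finer level.
    `pyramid` is a list of (C, H, W) arrays, finest level first."""
    fused = [None] * len(pyramid)
    fused[-1] = pyramid[-1]                    # coarsest level passes through
    for i in range(len(pyramid) - 2, -1, -1):  # walk toward the finest level
        fused[i] = pyramid[i] + upsample2x(fused[i + 1])
    return fused

# Three pyramid levels: 16x16, 8x8, 4x4, each with 8 channels
levels = [np.ones((8, 16, 16)), np.ones((8, 8, 8)), np.ones((8, 4, 4))]
out = topdown_fuse(levels)
print([f.shape for f in out])  # [(8, 16, 16), (8, 8, 8), (8, 4, 4)]
print(out[0][0, 0, 0])         # 3.0: the finest cell accumulates all three levels
```

The limitation discussed above is visible even in this sketch: information flows along one fixed path, so each level mixes mainly with its neighbors, which motivates searching over more diverse fusion topologies.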
- We developed a network with LD Involution for logo detection that establishes long-range information dependence and ranks the significance of visual information through a new operator combined with a self-attention mechanism, addressing the large-aspect-ratio problem.
- We constructed a diverse multipath topology based on neural architecture search theory, in which each path performs a specific type of feature fusion.
- We conducted extensive experiments and evaluated our approach on four benchmark logo datasets: FlickrLogos-32, QMUL-OpenLogo, LogoDet-3K-1000, and LogoDet-3K. The experimental results demonstrate the effectiveness of the proposed model.
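The base operator underlying the first contribution can be made concrete with a small sketch. The following NumPy code illustrates plain involution in the spirit of Li et al. (single group, two-layer kernel-generation function); it is our simplified illustration, not the paper's LD Involution, and the weight matrices `w_reduce` and `w_span` are hypothetical stand-ins for the learned kernel-generation layers. Unlike convolution, the K x K spatial kernel is generated from the feature vector at each pixel and shared across channels, so the spatial weighting adapts to image content.

```python
import numpy as np

def involution(x, w_reduce, w_span, K=3):
    """Single-group involution over a (C, H, W) feature map.
    At each pixel, a K*K spatial kernel is generated from that pixel's
    C-dim feature via two linear maps (with a ReLU in between), then
    applied to the K x K neighborhood, shared across all channels."""
    C, H, W = x.shape
    pad = K // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero-pad spatial dims
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            hidden = np.maximum(w_reduce @ x[:, i, j], 0.0)   # bottleneck, (C//r,)
            kernel = (w_span @ hidden).reshape(K, K)          # content-adaptive kernel
            patch = xp[:, i:i + K, j:j + K]                   # (C, K, K) neighborhood
            out[:, i, j] = (patch * kernel).sum(axis=(1, 2))  # kernel shared over C
    return out

rng = np.random.default_rng(0)
C, H, W, r, K = 8, 5, 5, 2, 3
x = rng.standard_normal((C, H, W))
w_reduce = rng.standard_normal((C // r, C)) * 0.1  # hypothetical learned weights
w_span = rng.standard_normal((K * K, C // r)) * 0.1
y = involution(x, w_reduce, w_span, K)
print(y.shape)  # (8, 5, 5)
```

Because the kernel is a function of the local feature, stacking such operators lets distant positions influence each other through content rather than through a fixed weight grid, which is the property LD Involution extends for long-range dependence.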
2. Related Work
2.1. Object Detection
2.2. Logo Detection
3. Our Approach
3.1. Multilevel Representation Neural Architecture Search
3.2. Long-Range Dependence Involution
3.3. Adaptive RoI Pooling Module
3.4. Loss Function
4. Experiments
4.1. Experimental Setting
4.1.1. Datasets
4.1.2. Implementation Details
4.2. Experiments on LogoDet-3K
4.2.1. Comparisons with State of the Art
4.2.2. Qualitative Analysis
4.3. Experiments on Other Benchmarks
4.3.1. Results on LogoDet-3K-1000
4.3.2. Results on QMUL-OpenLogo
4.3.3. Results on FlickrLogos-32
4.4. Ablation Study
4.4.1. LD Involution
4.4.2. MRNAS
4.4.3. ARM
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Yang, L.; Luo, P.; Change Loy, C.; Tang, X. A large-scale car dataset for fine-grained categorization and verification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3973–3981.
- Ke, X.; Du, P. Vehicle logo recognition with small sample problem in complex scene based on data augmentation. Math. Probl. Eng. 2020, 2020, 6591873.
- Gao, Y.; Wang, F.; Luan, H.; Chua, T.S. Brand data gathering from live social media streams. In Proceedings of the International Conference on Multimedia Retrieval, Glasgow, UK, 1–4 April 2014; pp. 169–176.
- Zhu, G.; Doermann, D. Automatic document logo detection. In Proceedings of the Ninth International Conference on Document Analysis and Recognition, Curitiba, Brazil, 23–26 September 2007; pp. 864–868.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Zhong, Y.; Wang, J.; Peng, J.; Zhang, L. Anchor box optimization for object detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA, 1–5 March 2020; pp. 1286–1294.
- Yang, T.; Zhang, X.; Li, Z.; Zhang, W.; Sun, J. MetaAnchor: Learning to detect objects with customized anchors. Adv. Neural Inf. Process. Syst. 2018, 31, 318–328.
- Kong, T.; Sun, F.; Liu, H.; Jiang, Y.; Shi, J. Consistent optimization for single-shot object detection. arXiv 2019, arXiv:1901.06563.
- Wang, J.; Chen, K.; Yang, S.; Loy, C.C.; Lin, D. Region proposal by guided anchoring. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2965–2974.
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
- Wang, X.; Zhang, S.; Yu, Z.; Feng, L.; Zhang, W. Scale-equalizing pyramid convolution for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 13359–13368.
- Dewi, C.; Chen, R.C.; Zhuang, Y.C.; Christanto, H.J. YOLOv5 series algorithm for road marking sign identification. Big Data Cogn. Comput. 2022, 6, 149.
- El Morabit, S.; Rivenq, A.; Zighem, M.E.n.; Hadid, A.; Ouahabi, A.; Taleb-Ahmed, A. Automatic pain estimation from facial expressions: A comparative analysis using off-the-shelf CNN architectures. Electronics 2021, 10, 1926.
- Chen, W.; Gao, L.; Li, X.; Shen, W. Lightweight convolutional neural network with knowledge distillation for cervical cells classification. Biomed. Signal Process. Control 2022, 71, 103177.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790.
- Hou, S.; Li, J.; Min, W.; Hou, Q.; Zhao, Y.; Zheng, Y.; Jiang, S. Deep learning for logo detection: A survey. arXiv 2022, arXiv:2210.04399.
- Wang, J.; Min, W.; Hou, S.; Ma, S.; Zheng, Y.; Jiang, S. LogoDet-3K: A large-scale image dataset for logo detection. ACM Trans. Multimed. Comput. Commun. Appl. 2022, 18, 1–19.
- Hou, Q.; Min, W.; Wang, J.; Hou, S.; Zheng, Y.; Jiang, S. FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual Event, China, 20–24 October 2021; pp. 4670–4679.
- Xu, W.; Liu, Y.; Lin, D. A simple and effective baseline for robust logo detection. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual Event, China, 20–24 October 2021; pp. 4784–4788.
- Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Kauai, HI, USA, 8–14 December 2001; pp. I–I.
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–26 June 2005; pp. 886–893.
- Felzenszwalb, P.; McAllester, D.; Ramanan, D. A discriminatively trained, multiscale, deformable part model. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8.
- Yan, W.Q.; Wang, J.; Kankanhalli, M.S. Automatic video logo detection and removal. Multimed. Syst. 2005, 10, 379–391.
- Wang, Y.; Liu, Z.; Xiao, F. A fast coarse-to-fine vehicle logo detection and recognition method. In Proceedings of the 2007 IEEE International Conference on Robotics and Biomimetics, Sanya, China, 15–18 December 2007; pp. 691–696.
- Bao, Y.; Li, H.; Fan, X.; Liu, R.; Jia, Q. Region-based CNN for logo detection. In Proceedings of the International Conference on Internet Multimedia Computing and Service, Xi’an, China, 19–21 August 2016; pp. 319–322.
- Velazquez, D.A.; Gonfaus, J.M.; Rodriguez, P.; Roca, F.X.; Ozawa, S.; Gonzàlez, J. Logo detection with no priors. IEEE Access 2021, 9, 106998–107011.
- Wang, J.; Zheng, Y.; Song, J.; Hou, S. Cross-view representation learning for multi-view logo classification with information bottleneck. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual Event, China, 20–24 October 2021; pp. 4680–4688.
- Liang, T.; Wang, Y.; Tang, Z.; Hu, G.; Ling, H. OPANAS: One-shot path aggregation network architecture search for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 10195–10203.
- Li, D.; Hu, J.; Wang, C.; Li, X.; She, Q.; Zhu, L.; Zhang, T.; Chen, Q. Involution: Inverting the inherence of convolution for visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 12321–12330.
- Srinivas, A.; Lin, T.Y.; Parmar, N.; Shlens, J.; Abbeel, P.; Vaswani, A. Bottleneck transformers for visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 16519–16529.
- Zhu, X.; Hu, H.; Lin, S.; Dai, J. Deformable ConvNets v2: More deformable, better results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9308–9316.
- Su, H.; Zhu, X.; Gong, S. Open logo detection challenge. arXiv 2018, arXiv:1807.01964.
- Romberg, S.; Pueyo, L.G.; Lienhart, R.; Van Zwol, R. Scalable logo recognition in real-world images. In Proceedings of the 1st ACM International Conference on Multimedia Retrieval, Trento, Italy, 18–20 April 2011; pp. 1–8.
- Chen, K.; Wang, J.; Pang, J.; Cao, Y.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Xu, J.; et al. MMDetection: OpenMMLab detection toolbox and benchmark. arXiv 2019, arXiv:1906.07155.
- Zhang, H.; Chang, H.; Ma, B.; Wang, N.; Chen, X. Dynamic R-CNN: Towards high quality object detection via dynamic training. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 260–275.
- Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
- Zhu, C.; He, Y.; Savvides, M. Feature selective anchor-free module for single-shot object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 840–849.
- Zhang, S.; Chi, C.; Yao, Y.; Lei, Z.; Li, S.Z. Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9759–9768.
- Li, X.; Wang, W.; Wu, L.; Chen, S.; Hu, X.; Li, J.; Tang, J.; Yang, J. Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection. Adv. Neural Inf. Process. Syst. 2020, 33, 21002–21012.
- Bodla, N.; Singh, B.; Chellappa, R.; Davis, L.S. Soft-NMS: Improving object detection with one line of code. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5561–5569.
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 658–666.
- Pang, J.; Chen, K.; Shi, J.; Feng, H.; Ouyang, W.; Lin, D. Libra R-CNN: Towards balanced learning for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 821–830.
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000.
- Wang, J.; Zhang, W.; Cao, Y.; Chen, K.; Pang, J.; Gong, T.; Shi, J.; Loy, C.C.; Lin, D. Side-aware boundary localization for more precise object detection. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 403–419.
- Sun, P.; Zhang, R.; Jiang, Y.; Kong, T.; Xu, C.; Zhan, W.; Tomizuka, M.; Li, L.; Yuan, Z.; Wang, C.; et al. Sparse R-CNN: End-to-end object detection with learnable proposals. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 14454–14463.
- Kong, T.; Sun, F.; Liu, H.; Jiang, Y.; Li, L.; Shi, J. FoveaBox: Beyound anchor-based object detection. IEEE Trans. Image Process. 2020, 29, 7389–7398.
- Wu, Y.; Chen, Y.; Yuan, L.; Liu, Z.; Wang, L.; Li, H.; Fu, Y. Rethinking classification and localization for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10186–10195.
- Iandola, F.N.; Shen, A.; Gao, P.; Keutzer, K. DeepLogo: Hitting logo recognition with the deep neural network hammer. arXiv 2015, arXiv:1510.02131.
- Oliveira, G.; Frazão, X.; Pimentel, A.; Ribeiro, B. Automatic graphic logo detection via fast region-based convolutional networks. In Proceedings of the 2016 International Joint Conference on Neural Networks, Vancouver, BC, Canada, 24–29 July 2016; pp. 985–991.
Datasets | #Classes | #Images | #Objects | #Trainval | #Test |
---|---|---|---|---|---|
FlickrLogos-32 [41] | 32 | 2240 | 3405 | 1478 | 762 |
QMUL-OpenLogo [40] | 352 | 27,083 | 51,207 | 18,752 | 8331 |
LogoDet-3K-1000 [25] | 1000 | 85,344 | 101,345 | 75,785 | 9559 |
LogoDet-3K [25] | 3000 | 158,652 | 194,261 | 142,142 | 16,510 |
Methods | Backbone | mAP(%) | Processing Time (days) | Size (KB/epoch) | Params (M) | FLOPs (G) |
---|---|---|---|---|---|---|
One-stage: | ||||||
FSAF [45] | ResNet-50-FPN | 78.3 | 5 | 336,668 | 42.92 | 349.84 |
ATSS [46] | ResNet-50-FPN | 79.9 | 7 | 304,457 | 38.8 | 348.86 |
GFL [47] | ResNet-50-FPN | 81.2 | 5 | 305,590 | 38.95 | 351.96 |
Two-stage: | ||||||
Faster R-CNN [5] | ResNet-50-FPN | 83.8 | 4 | 442,663 | 56.49 | 222.02 |
Soft-NMS [48] | ResNet-50-FPN | 82.1 | - | - | 56.28 | 177.46 |
PANet [14] | ResNet-50-PAFPN | 83.1 | 5 | 470,332 | 60.03 | 246.8 |
Cascade R-CNN [20] | ResNet-50-FPN | 85.6 | 8 | 611,864 | 78.15 | 243.68 |
Generalized IoU [49] | ResNet-50-FPN | 84.4 | 7 | 442,663 | 56.49 | 222.02 |
Libra R-CNN [50] | ResNet-50-BFP | 82.4 | 5 | 444,726 | 56.76 | 223.07 |
Guided Anchoring [12] | ResNet-50-FPN | 86.3 | - | - | 57.08 | 221.79 |
Distance IoU [51] | ResNet-50-FPN | 83.5 | 4 | 442,663 | 56.49 | 222.02 |
Complete IoU [51] | ResNet-50-FPN | 82.7 | 4 | 442,663 | 56.49 | 222.02 |
Dynamic R-CNN [43] | ResNet-50-FPN | 87.1 | 8 | 442,664 | 56.49 | 222.02 |
SABL [52] | ResNet-50-FPN | 85.7 | 6 | 352,738 | 44.98 | 269.34 |
Sparse R-CNN [53] | ResNet-50-FPN | 74.3 | 5 | 1,297,338 | 110.57 | 150.36 |
LDI-Net(ours) | ResNet-50-MRNAS | 88.7 | 10 | 1,443,658 | 183.66 | 152.1 |
Methods | Backbone | mAP(%) |
---|---|---|
One-stage: | ||
FSAF [45] | ResNet-50-FPN | 87.3 |
ATSS [46] | ResNet-50-FPN | 87.8 |
GFL [47] | ResNet-50-FPN | 87.7 |
Two-stage: | ||
Faster R-CNN [5] | ResNet-50-FPN | 88.2 |
Soft-NMS [48] | ResNet-50-FPN | 89.1 |
PANet [14] | ResNet-50-PAFPN | 89.1 |
Cascade R-CNN [20] | ResNet-50-FPN | 89.1 |
Generalized IoU [49] | ResNet-50-FPN | 88.2 |
Libra R-CNN [50] | ResNet-50-BFP | 88.4 |
Guided anchoring [12] | ResNet-50-FPN | 89.1 |
Distance IoU [51] | ResNet-50-FPN | 88.7 |
Complete IoU [51] | ResNet-50-FPN | 88.9 |
Dynamic R-CNN [43] | ResNet-50-FPN | 88.5 |
SABL [52] | ResNet-50-FPN | 88.8 |
Sparse R-CNN [53] | ResNet-50-FPN | 86.8 |
LDI-Net(ours) | ResNet-50-MRNAS | 90.4 |
Methods | Backbone | mAP(%) |
---|---|---|
One-stage: | ||
SSD [6] | VGG-16 | 41.6 |
FSAF [45] | ResNet-50-FPN | 44.6 |
ATSS [46] | ResNet-50-FPN | 48.4 |
GFL [47] | ResNet-50-FPN | 47.3 |
FoveaBox [54] | ResNet-50-FPN | 35.6 |
Two-stage: | ||
Faster R-CNN [5] | ResNet-50-FPN | 53.8 |
Soft-NMS [48] | ResNet-50-FPN | 54.1 |
PANet [14] | ResNet-50-PAFPN | 54.5 |
Cascade R-CNN [20] | ResNet-50-FPN | 54.2 |
Generalized IoU [49] | ResNet-50-FPN | 54.2 |
Libra R-CNN [50] | ResNet-50-BFP | 54.6 |
Guided Anchoring [12] | ResNet-50-FPN | 52.2 |
Distance IoU [51] | ResNet-50-FPN | 54.4 |
Complete IoU [51] | ResNet-50-FPN | 53.7 |
Dynamic R-CNN [43] | ResNet-50-FPN | 54.6 |
Double-head R-CNN [55] | ResNet-50-FPN | 54.2 |
SABL [52] | ResNet-50-FPN | 53.4 |
Sparse R-CNN [53] | ResNet-50-FPN | 50.5 |
LDI-Net(ours) | ResNet-50-MRNAS | 56.3 |
Methods | Backbone | mAP(%) |
---|---|---|
One-stage: | ||
SSD [6] | VGG-16 | 80.2 |
RetinaNet [7] | ResNet-50-FPN | 78.4 |
FSAF [45] | ResNet-50-FPN | 86.3 |
ATSS [46] | ResNet-50-FPN | 86.4 |
GFL [47] | ResNet-50-FPN | 87.2 |
FoveaBox [54] | ResNet-50-FPN | 85.5 |
Two-stage: | ||
Deep Logo [56] | VGG-16 | 74.4 |
Faster R-CNN [5] | ResNet-50-FPN | 88.2 |
BD-FRCN-M [57] | VGG-16 | 73.5 |
Soft-NMS [48] | ResNet-50-FPN | 88.8 |
PANet [14] | ResNet-50-PAFPN | 89.2 |
Cascade R-CNN [20] | ResNet-50-FPN | 89.2 |
Generalized IoU [49] | ResNet-50-FPN | 88.7 |
Libra R-CNN [50] | ResNet-50-BFP | 89.5 |
Guided Anchoring [12] | ResNet-50-FPN | 88.5 |
Distance IoU [51] | ResNet-50-FPN | 88.7 |
Complete IoU [51] | ResNet-50-FPN | 89.0 |
Dynamic R-CNN [43] | ResNet-50-FPN | 88.9 |
Double-head R-CNN [55] | ResNet-50-FPN | 89.2 |
SABL [52] | ResNet-50-FPN | 88.4 |
Sparse R-CNN [53] | ResNet-50-FPN | 81.6 |
LDI-Net(ours) | ResNet-50-MRNAS | 89.8 |
Involution | LD Involution | MRNAS | ARM | mAP(%)
---|---|---|---|---
 | | | | 87.1
✓ | | | | 87.2
 | ✓ | | | 87.3
 | | ✓ | | 87.5
 | | | ✓ | 88.2
 | ✓ | ✓ | | 88.2
 | ✓ | ✓ | ✓ | 88.7
Involution | LD Involution | MRNAS | ARM | mAP(%)
---|---|---|---|---
 | | | | 88.5
✓ | | | | 89.6
 | ✓ | | | 89.7
 | | ✓ | | 89.5
 | | | ✓ | 88.9
 | ✓ | ✓ | | 89.9
 | ✓ | ✓ | ✓ | 90.4
Involution | LD Involution | MRNAS | ARM | mAP(%)
---|---|---|---|---
 | | | | 54.6
✓ | | | | 55.1
 | ✓ | | | 56.1
 | | ✓ | | 56.0
 | | | ✓ | 54.8
 | ✓ | ✓ | | 56.2
 | ✓ | ✓ | ✓ | 56.3
Involution | LD Involution | MRNAS | ARM | mAP(%)
---|---|---|---|---
 | | | | 88.9
✓ | | | | 89.2
 | ✓ | | | 89.6
 | | ✓ | | 89.7
 | | | ✓ | 89.2
 | ✓ | ✓ | | 89.8
 | ✓ | ✓ | ✓ | 89.8
Cite as: Li, X.; Hou, S.; Zhang, B.; Wang, J.; Jia, W.; Zheng, Y. Long-Range Dependence Involutional Network for Logo Detection. Entropy 2023, 25, 174. https://doi.org/10.3390/e25010174