Research Article

Visual and Semantic Knowledge Transfer for Large Scale Semi-Supervised Object Detection

Published: 01 December 2018

Abstract

Deep CNN-based object detection systems have achieved remarkable success on several large-scale object detection benchmarks. However, training such detectors requires a large number of labeled bounding boxes, which are more difficult to obtain than image-level annotations. Previous work addresses this issue by transforming image-level classifiers into object detectors: the differences between the two are modeled on categories with both image-level and bounding-box annotations, and this information is transferred to convert classifiers into detectors for categories without bounding-box annotations. We improve on this work by incorporating knowledge about object similarities from the visual and semantic domains during the transfer process. The intuition behind our method is that visually and semantically similar categories should share more transferable properties than dissimilar ones; e.g., a better cat detector results from transferring the differences between a dog classifier and a dog detector than from transferring those of the violin class. Experimental results on the challenging ILSVRC2013 detection dataset demonstrate that each of our proposed object-similarity-based knowledge transfer methods outperforms the baselines. We find strong evidence that visual similarity and semantic relatedness are complementary for this task and, when combined, notably improve detection, achieving state-of-the-art performance in a semi-supervised setting.
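The transfer scheme the abstract describes can be sketched in a few lines: estimate the classifier-to-detector difference for categories with bounding-box annotations, then give a weak category a detector by adding a similarity-weighted combination of those differences to its classifier. This is an illustrative reconstruction only, not the paper's implementation; the category names, feature dimension, and similarity scores below are all made up for the example (the paper derives similarities from appearance features and semantic resources such as word embeddings and WordNet).

```python
import numpy as np

# Hypothetical toy setup: categories with both image-level and bounding-box
# annotations ("strong"), plus one category ("cat") with image-level labels only.
rng = np.random.default_rng(0)
D = 8  # illustrative feature dimension

strong = ["dog", "horse", "violin"]
clf = {c: rng.normal(size=D) for c in strong + ["cat"]}            # classifier weights
det = {c: clf[c] + rng.normal(scale=0.1, size=D) for c in strong}  # detector weights

# Classifier-to-detector differences observed on the strong categories.
diff = {c: det[c] - clf[c] for c in strong}

# Hand-set visual and semantic similarities of "cat" to each strong category.
visual = {"dog": 0.80, "horse": 0.50, "violin": 0.05}
semantic = {"dog": 0.90, "horse": 0.60, "violin": 0.02}
combined = {c: 0.5 * visual[c] + 0.5 * semantic[c] for c in strong}

# Similarity-weighted transfer: similar categories contribute more.
w = np.array([combined[c] for c in strong])
w /= w.sum()  # normalize weights to sum to 1
transfer = sum(wi * diff[c] for wi, c in zip(w, strong))

# Estimated detector for the weakly annotated category.
det_cat = clf["cat"] + transfer
```

Under this weighting the dog class dominates the transferred difference while the violin class contributes almost nothing, which is exactly the intuition stated above; averaging the visual and semantic scores is just one simple way to combine the two cues.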


Cited By

  • (2024) "Dual-View Data Hallucination With Semantic Relation Guidance for Few-Shot Image Recognition," IEEE Transactions on Multimedia, vol. 26, pp. 11302–11315.
  • (2023) "Image Defogging Based on Regional Gradient Constrained Prior," ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 20, no. 3, pp. 1–17.
  • (2023) "Recent Few-shot Object Detection Algorithms: A Survey with Performance Comparison," ACM Transactions on Intelligent Systems and Technology, vol. 14, no. 4, pp. 1–36.
  • (2023) "A Knowledge Transfer-Based Semi-Supervised Federated Learning for IoT Malware Detection," IEEE Transactions on Dependable and Secure Computing, vol. 20, no. 3, pp. 2127–2143.
  • (2023) "SPL-Net: Spatial-Semantic Patch Learning Network for Facial Attribute Recognition with Limited Labeled Data," International Journal of Computer Vision, vol. 131, no. 8, pp. 2097–2121.
  • (2022) "A Survey on Visual Transfer Learning Using Knowledge Graphs," Semantic Web, vol. 13, no. 3, pp. 477–510.
  • (2022) "UnseenNet: Fast Training Detector for Unseen Concepts with No Bounding Boxes," Image and Vision Computing, pp. 18–32.
  • (2022) "Robust Object Detection with Inaccurate Bounding Boxes," Computer Vision – ECCV 2022, pp. 53–69.
  • (2020) "Boosting Weakly Supervised Object Detection with Progressive Knowledge Transfer," Computer Vision – ECCV 2020, pp. 615–631.


Published In

IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 40, Issue 12, Dec. 2018, 276 pages.

Publisher: IEEE Computer Society, United States.
