Article

On learning to localize objects with minimal supervision

Authors:

Stefanie Jegelka,

Zaid Harchaoui,

Trevor Darrell

ICML'14: Proceedings of the 31st International Conference on Machine Learning - Volume 32

Pages II-1611 - II-1619

Published: 21 June 2014

Abstract

Learning to localize objects with minimal supervision is an important problem in computer vision, since large fully annotated datasets are extremely costly to obtain. In this paper, we propose a new method that achieves this goal with only image-level labels of whether the objects are present or not. Our approach combines a discriminative submodular cover problem for automatically discovering a set of positive object windows with a smoothed latent SVM formulation. The latter allows us to leverage efficient quasi-Newton optimization techniques. Our experiments demonstrate that the proposed approach provides a 50% relative improvement in mean average precision over the current state-of-the-art on PASCAL VOC 2007 detection.
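The two ingredients named in the abstract can be illustrated with a toy sketch. It is not the authors' implementation, and every name in it (`greedy_cover`, `smoothed_max`, the `windows` data) is hypothetical. The first function runs the standard greedy heuristic on plain set cover, a simple special case of submodular cover; the paper's discriminative objective is not reproduced here. The second shows log-sum-exp, one common way to replace the non-smooth max over latent windows with a differentiable surrogate so that gradient-based solvers such as quasi-Newton methods apply; the abstract does not state which smoothing the paper uses.

```python
import math


def greedy_cover(candidates, universe, target_fraction=1.0):
    """Greedy heuristic for set cover, a special case of submodular cover.

    candidates: dict mapping a candidate name to the set of items it covers.
    universe: set of items that should be covered.
    """
    needed = math.ceil(target_fraction * len(universe))
    covered, chosen = set(), []
    while len(covered) < needed:
        # Pick the candidate with the largest marginal gain in coverage.
        best = max(candidates,
                   key=lambda name: len((candidates[name] & universe) - covered))
        gain = (candidates[best] & universe) - covered
        if not gain:
            break  # no candidate adds coverage, so the target is unreachable
        chosen.append(best)
        covered |= gain
    return chosen, covered


def smoothed_max(scores, mu=0.1):
    """Log-sum-exp surrogate for max(scores): mu * log(sum(exp(s / mu))).

    Subtracting the maximum first keeps the exponentials numerically stable.
    The value lies between max(scores) and max(scores) + mu * log(len(scores)).
    """
    m = max(scores)
    return m + mu * math.log(sum(math.exp((s - m) / mu) for s in scores))


# Toy data (assumption): each hypothetical window "covers" some positive images.
windows = {
    "w1": {0, 1, 2},
    "w2": {2, 3},
    "w3": {3, 4, 5},
    "w4": {0, 5},
}
images = set(range(6))

chosen, covered = greedy_cover(windows, images)
print("selected windows:", chosen)
print("smoothed max:", smoothed_max([1.0, 2.0, 3.0], mu=0.1))
```

Smaller values of `mu` make the surrogate track the true max more closely at the price of larger curvature, which is the usual trade-off when pairing a smoothed objective with a quasi-Newton solver.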

Index Terms

Computing methodologies

Information & Contributors

Information

Published In

ICML'14: Proceedings of the 31st International Conference on Machine Learning - Volume 32

June 2014

2786 pages

Publisher

JMLR.org

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total citations: 12
Total downloads: 0 (last 12 months: 0; last 6 weeks: 0)

Reflects downloads up to 12 Jan 2025

Citations

Cited By

Chen N, Pan X, Chen R, Yang L, Lin Z, Ren Y, Yuan H, Guo X, Huang F, Wang W, Shen H, Zhuang Y, Smith J, Yang Y, Cesar P, Metze F, Prabhakaran B (2021). Distributed Attention for Grounded Image Captioning. Proceedings of the 29th ACM International Conference on Multimedia, pp. 1966-1975. doi:10.1145/3474085.3475354. Online publication date: 17-Oct-2021
https://dl.acm.org/doi/10.1145/3474085.3475354
Li Y, Zhang J, Huang K, Zhang J (2019). Mixed Supervised Object Detection with Robust Objectness Transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41:3, pp. 639-653. doi:10.1109/TPAMI.2018.2810288. Online publication date: 1-Mar-2019
https://dl.acm.org/doi/10.1109/TPAMI.2018.2810288
Sangineto E, Nabi M, Culibrk D, Sebe N (2019). Self Paced Deep Learning for Weakly Supervised Object Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41:3, pp. 712-725. doi:10.1109/TPAMI.2018.2804907. Online publication date: 1-Mar-2019
https://dl.acm.org/doi/10.1109/TPAMI.2018.2804907
Liu X, Xu Q, Wang N (2019). A survey on deep neural network-based image captioning. The Visual Computer: International Journal of Computer Graphics, 35:3, pp. 445-470. doi:10.1007/s00371-018-1566-y. Online publication date: 1-Mar-2019
https://dl.acm.org/doi/10.1007/s00371-018-1566-y
Pillutla K, Roulet V, Kakade S, Harchaoui Z (2018). A smoother way to train structured prediction models. Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 4771-4783. doi:10.5555/3327345.3327386. Online publication date: 3-Dec-2018
https://dl.acm.org/doi/10.5555/3327345.3327386
Iyer R, Li Y, Li H, Lewis M, Sundar R, Sycara K, Furman J, Marchant G, Price H, Rossi F (2018). Transparency and Explanation in Deep Reinforcement Learning Neural Networks. Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, pp. 144-150. doi:10.1145/3278721.3278776. Online publication date: 27-Dec-2018
https://dl.acm.org/doi/10.1145/3278721.3278776
Jiang W, Zhao Z, Su F (2018). Weakly supervised detection with decoupled attention-based deep representation. Multimedia Tools and Applications, 77:3, pp. 3261-3277. doi:10.1007/s11042-017-5087-x. Online publication date: 1-Feb-2018
https://dl.acm.org/doi/10.1007/s11042-017-5087-x
Lai B, Gong X (2017). Saliency guided end-to-end learning for weakly supervised object detection. Proceedings of the 26th International Joint Conference on Artificial Intelligence, pp. 2053-2059. doi:10.5555/3172077.3172173. Online publication date: 19-Aug-2017
https://dl.acm.org/doi/10.5555/3172077.3172173
Dong X, Meng D, Ma F, Yang Y, Liu Q, Lienhart R, Wang H, Chen S, Boll S, Chen P, Friedland G, Li J, Yan S (2017). A Dual-Network Progressive Approach to Weakly Supervised Object Detection. Proceedings of the 25th ACM international conference on Multimedia, pp. 279-287. doi:10.1145/3123266.3123455. Online publication date: 23-Oct-2017
https://dl.acm.org/doi/10.1145/3123266.3123455
Horel T, Singer Y (2016). Maximization of approximately submodular functions. Proceedings of the 30th International Conference on Neural Information Processing Systems, pp. 3053-3061. doi:10.5555/3157382.3157439. Online publication date: 5-Dec-2016
https://dl.acm.org/doi/10.5555/3157382.3157439