On learning to localize objects with minimal supervision

Published: 21 June 2014

Abstract

Learning to localize objects with minimal supervision is an important problem in computer vision, since large fully annotated datasets are extremely costly to obtain. In this paper, we propose a new method that achieves this goal with only image-level labels of whether the objects are present or not. Our approach combines a discriminative submodular cover problem for automatically discovering a set of positive object windows with a smoothed latent SVM formulation. The latter allows us to leverage efficient quasi-Newton optimization techniques. Our experiments demonstrate that the proposed approach provides a 50% relative improvement in mean average precision over the current state-of-the-art on PASCAL VOC 2007 detection.
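
To make the "cover" idea in the abstract concrete, here is a minimal sketch of the classical greedy heuristic for a set-cover style problem. It is not the paper's discriminative objective: it assumes a simplified setting in which each candidate window is described only by the set of positive images it matches, and it picks windows until a chosen fraction of the positive images is covered. The function name `greedy_cover`, the `coverage` parameter, and the toy window and image identifiers are all hypothetical.

```python
from typing import Dict, Hashable, List, Set


def greedy_cover(candidates: Dict[Hashable, Set[Hashable]],
                 universe: Set[Hashable],
                 coverage: float = 1.0) -> List[Hashable]:
    """Greedily select candidates until `coverage` of `universe` is covered.

    `candidates` maps a (hypothetical) window id to the set of positive
    image ids that the window matches. This is an illustrative stand-in,
    not the objective used in the paper.
    """
    target = coverage * len(universe)
    covered: Set[Hashable] = set()
    chosen: List[Hashable] = []
    remaining = dict(candidates)

    while len(covered) < target and remaining:
        # Choose the candidate with the largest marginal gain in coverage.
        best = max(remaining,
                   key=lambda k: len((remaining[k] & universe) - covered))
        gain = (remaining[best] & universe) - covered
        if not gain:
            break  # No remaining candidate adds anything new.
        covered |= gain
        chosen.append(best)
        del remaining[best]

    return chosen


# Toy data: four candidate windows and five positive images.
windows = {"w1": {1, 2, 3}, "w2": {3, 4}, "w3": {4, 5}, "w4": {1}}
images = {1, 2, 3, 4, 5}
print(greedy_cover(windows, images))                # ['w1', 'w3']
print(greedy_cover(windows, images, coverage=0.6))  # ['w1']
```

Selecting by marginal gain is what makes the greedy rule suit coverage functions: a window that only matches already-covered images contributes nothing and is never chosen.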

Published In

ICML'14: Proceedings of the 31st International Conference on Machine Learning - Volume 32
June 2014
2786 pages

Publisher

JMLR.org
