Using the forest to see the trees: exploiting context for visual object detection and localization

Published: 01 March 2010

Abstract

Recognizing objects in images is an active area of research in computer vision. In the last two decades, there has been much progress, and object recognition systems are already operating in commercial products. However, most object detection algorithms perform an exhaustive search across all locations and scales in the image, comparing local image regions with an object model. That approach ignores the semantic structure of scenes and tries to solve the recognition problem by brute force. In the real world, objects tend to covary with other objects, providing a rich collection of contextual associations. These contextual associations can be used to reduce the search space by looking only in places where the object is expected to be; they also improve performance by rejecting patterns that look like the target but appear in unlikely places.
Most modeling attempts so far have defined the context of an object in terms of other previously recognized objects. The drawback of this approach is that inferring the context becomes as difficult as detecting each object. An alternative view of context relies on using the entire scene information holistically. This approach is algorithmically attractive since it dispenses with the need for a prior step of individual object recognition. In this paper, we use a probabilistic framework for encoding the relationships between context and object properties and we show how an integrated system provides improved performance. We view this as a significant step toward general purpose machine vision systems.
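
The idea of using context both to shrink the search space and to reject unlikely detections can be illustrated with a minimal sketch. This is not the paper's model: the Gaussian prior over vertical position, the `min_prior` threshold, and the names `context_prior`, `detect_with_context`, and `local_detector` are all hypothetical choices made for illustration. The sketch assumes some scene-level predictor has already supplied an expected vertical position `expected_y` for the object.

```python
import math

def context_prior(y, expected_y, sigma):
    """Unnormalized Gaussian prior over vertical position (an assumption,
    standing in for whatever the scene context predicts)."""
    return math.exp(-0.5 * ((y - expected_y) / sigma) ** 2)

def detect_with_context(candidates, local_detector, expected_y, sigma,
                        min_prior=0.05):
    """Score candidate (x, y) locations using context.

    Locations whose prior falls below min_prior are skipped without
    running the local detector (reduced search). For the rest, the local
    score is multiplied by the prior, so target-like patterns in
    unlikely places are down-weighted.
    """
    results = []
    for (x, y) in candidates:
        prior = context_prior(y, expected_y, sigma)
        if prior < min_prior:
            continue  # unlikely place: do not evaluate the detector here
        score = local_detector(x, y) * prior
        results.append(((x, y), score))
    # Highest combined score first.
    return sorted(results, key=lambda r: r[1], reverse=True)
```

A caller would pass its own detector function and candidate grid; the threshold trades the amount of search skipped against the risk of missing objects that appear in unusual places.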

Published In

Communications of the ACM, Volume 53, Issue 3
March 2010
152 pages
ISSN:0001-0782
EISSN:1557-7317
DOI:10.1145/1666420
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Popular
  • Refereed
