Abstract
We introduce an approach to image retrieval and auto-tagging that leverages the implicit information about object importance conveyed by the list of keyword tags a person supplies for an image. We propose an unsupervised learning procedure based on Kernel Canonical Correlation Analysis that discovers the relationship between how humans tag images (e.g., the order in which words are mentioned) and the relative importance of objects and their layout in the scene. Using this discovered connection, we show how to boost accuracy for novel queries, such that the search results better preserve the aspects a human may find most worth mentioning. We evaluate our approach on three datasets using either keyword tags or natural language descriptions, and quantify results with both ground truth parameters and direct tests with human subjects. Our results show clear improvements over approaches that rely on image features alone or that use words and image features but ignore the implied importance cues. Overall, our work provides a novel way to incorporate high-level human perception of scenes into visual representations for enhanced image search.
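For readers who want a concrete sense of the machinery the abstract refers to: Kernel Canonical Correlation Analysis (KCCA) learns paired projections of two data views, here visual features on one side and tag-based features that encode importance cues such as word order on the other, so that the projected views are maximally correlated in a common semantic space. The sketch below is an illustration only, not the authors' implementation; it follows a standard regularized dual formulation of KCCA (in the style of Hardoon et al., 2004), and the linear kernels, feature dimensions, regularizer `reg`, and toy data are all assumptions made for the example.

```python
import numpy as np

def center_kernel(K):
    """Double-center a kernel matrix (zero mean in feature space)."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kcca(Kx, Ky, reg=0.1, n_components=2):
    """Regularized kernel CCA in the dual: solves
    (Kx + reg*I)^-1 Ky (Ky + reg*I)^-1 Kx a = rho^2 a,
    a standard simplification of the two-view eigenproblem."""
    n = Kx.shape[0]
    Kx, Ky = center_kernel(Kx), center_kernel(Ky)
    I = np.eye(n)
    M = np.linalg.solve(Kx + reg * I, Ky) @ np.linalg.solve(Ky + reg * I, Kx)
    evals, evecs = np.linalg.eig(M)
    top = np.argsort(-evals.real)[:n_components]
    rho = np.sqrt(np.clip(evals.real[top], 1e-12, 1.0))     # canonical correlations
    alpha = evecs[:, top].real                              # image-side dual weights
    beta = np.linalg.solve(Ky + reg * I, Kx @ alpha) / rho  # tag-side dual weights
    return alpha, beta, rho

# Toy demo with paired "visual" and "tag" features sharing a latent scene.
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 4))              # hidden scene content
X = latent @ rng.normal(size=(4, 12))          # stand-in visual descriptors
Y = latent @ rng.normal(size=(4, 9))           # stand-in tag-order descriptors
Kx, Ky = X @ X.T, Y @ Y.T                      # linear kernels, for simplicity
alpha, beta, rho = kcca(Kx, Ky, reg=1.0, n_components=3)

# Project both modalities into the learned common space; cross-modal
# retrieval ranks database items by similarity to the query there.
img_space = center_kernel(Kx) @ alpha
tag_space = center_kernel(Ky) @ beta
```

In such a shared space, an image query can retrieve tag lists (or vice versa), and because the tag view carries word-order cues, nearest neighbors tend to agree on which objects a person would mention first.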
Cite this article
Hwang, S.J., Grauman, K. Learning the Relative Importance of Objects from Tagged Images for Retrieval and Cross-Modal Search. Int J Comput Vis 100, 134–153 (2012). https://doi.org/10.1007/s11263-011-0494-3