Abstract
We study the task of learning to rank images given a text query, a problem complicated by the fact that a query can have multiple senses. Here, the senses of interest are the visually distinct concepts that a user may wish to retrieve. We propose to learn a ranking function that optimizes the ranking cost of interest while simultaneously discovering the disambiguated senses of the query that are optimal for the supervised task. No supervised information about the senses is given. Experiments on web images and the ImageNet dataset show that our approach yields a clear gain in performance.
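The abstract does not spell out the model, so the following is only a minimal sketch of one way such joint learning could look, assuming one linear scorer per latent sense, a max over senses as the image score, and a pairwise hinge ranking loss minimized by subgradient steps. The names `score` and `train`, the toy feature vectors, and all hyperparameters are hypothetical and are not taken from the paper.

```python
import random


def dot(w, x):
    """Plain dot product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def score(senses, x):
    """Score image x for one query: the best-matching sense wins.

    Returns (score, index of the winning sense). The winning index is the
    latent sense assignment; it is never supervised.
    """
    scores = [dot(w, x) for w in senses]
    best = max(range(len(scores)), key=lambda s: scores[s])
    return scores[best], best


def train(positives, negatives, n_senses=2, lr=0.1, steps=2000, seed=0):
    """Assumed training loop: pairwise hinge loss with subgradient updates."""
    rng = random.Random(seed)
    dim = len(positives[0])
    # Small random initialization, one weight vector per hypothetical sense.
    senses = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
              for _ in range(n_senses)]
    for _ in range(steps):
        x_pos = rng.choice(positives)  # image relevant to the query
        x_neg = rng.choice(negatives)  # image not relevant to the query
        s_pos, k_pos = score(senses, x_pos)
        s_neg, k_neg = score(senses, x_neg)
        # Update only when the relevant image fails to beat the
        # irrelevant one by a margin of 1.
        if 1.0 - s_pos + s_neg > 0.0:
            for i in range(dim):
                senses[k_pos][i] += lr * x_pos[i]
                senses[k_neg][i] -= lr * x_neg[i]
    return senses


if __name__ == "__main__":
    # Toy query with two visually distinct groups of relevant images.
    positives = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    negatives = [[0.0, 0.0, 1.0]]
    learned = train(positives, negatives)
    for x in positives + negatives:
        print(x, score(learned, x))
```

In this sketch the sense assignment is simply the argmax inside `score`, so the senses are shaped only by the ranking objective, which mirrors the abstract's statement that no sense labels are supplied.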
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lucchi, A., Weston, J. (2012). Joint Image and Word Sense Discrimination for Image Retrieval. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds) Computer Vision – ECCV 2012. ECCV 2012. Lecture Notes in Computer Science, vol 7572. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33718-5_10
DOI: https://doi.org/10.1007/978-3-642-33718-5_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33717-8
Online ISBN: 978-3-642-33718-5
eBook Packages: Computer Science, Computer Science (R0)