Article, DOI: 10.5555/645318.649254

Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary

Published: 28 May 2002

Abstract

We describe a model of object recognition as machine translation. In this model, recognition is a process of annotating image regions with words. First, images are segmented into regions, which are classified into region types using a variety of features. A mapping between region types and the keywords supplied with the images is then learned, using a method based on EM. This process is analogous to learning a lexicon from an aligned bitext. For the implementation we describe, these words are nouns taken from a large vocabulary. On a large test set, the method can predict numerous words with high accuracy. Simple methods identify words that cannot be predicted well. We show how to cluster words that are individually difficult to predict into clusters that can be predicted well. For example, we cannot predict the distinction between train and locomotive using the current set of features, but we can predict the underlying concept. The method is trained on a substantial collection of images. Extensive experimental results illustrate the strengths and weaknesses of the approach.
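
The lexicon-learning step can be illustrated with a minimal sketch. It assumes an IBM Model 1 style formulation, which the abstract does not confirm: every keyword attached to an image is treated as generated by one of that image's region types, and EM estimates p(word | region type) from co-occurrence alone. The function names, the region-type labels, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def learn_lexicon(pairs, iterations=20):
    """Estimate t[word][region_type] = p(word | region_type) with EM.

    pairs: list of (region_types, words) tuples, one tuple per image.
    Assumption: IBM Model 1 style alignment, used here for illustration only.
    """
    vocabulary = {w for _, words in pairs for w in words}
    uniform = 1.0 / len(vocabulary)
    # Start from a uniform lexicon so every co-occurring pair is possible.
    t = defaultdict(lambda: defaultdict(lambda: uniform))

    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))
        total = defaultdict(float)

        # E-step: share each word among the region types of its own image,
        # in proportion to the current lexicon probabilities.
        for regions, words in pairs:
            for w in words:
                norm = sum(t[w][r] for r in regions)
                for r in regions:
                    frac = t[w][r] / norm
                    count[w][r] += frac
                    total[r] += frac

        # M-step: renormalise expected counts per region type.
        new_t = defaultdict(lambda: defaultdict(float))
        for w, by_region in count.items():
            for r, c in by_region.items():
                new_t[w][r] = c / total[r]
        t = new_t

    return t


def best_word(t, region):
    """Return the word with the highest estimated p(word | region)."""
    return max(t, key=lambda w: t[w].get(region, 0.0))


# Hypothetical toy corpus: region types (left) and image keywords (right).
toy_pairs = [
    (["blue", "green"], ["sky", "grass"]),
    (["blue", "grey"], ["sky", "building"]),
    (["green", "orange"], ["grass", "tiger"]),
]

lexicon = learn_lexicon(toy_pairs)
for region in ["blue", "green", "grey", "orange"]:
    print(region, "->", best_word(lexicon, region))
```

No image states which keyword belongs to which region, yet the estimates sharpen because a region type that recurs across images keeps co-occurring with the same word. A real system would first have to produce the region types by segmenting images and quantising region features, which this sketch takes as given.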


      Published In

      ECCV '02: Proceedings of the 7th European Conference on Computer Vision-Part IV
      May 2002
      834 pages

      Publisher

      Springer-Verlag

      Berlin, Heidelberg

      Author Tags

      1. EM algorithm
      2. correspondence
      3. object recognition

      Qualifiers

      • Article

      Cited By

      • (2023) Masked two-channel decoupling framework for incomplete multi-view weak multi-label learning. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 32387-32400. DOI: 10.5555/3666122.3667528. Online publication date: 10-Dec-2023
      • (2023) Generative Multi-Label Correlation Learning. ACM Transactions on Knowledge Discovery from Data, 17:2, pp. 1-19. DOI: 10.1145/3538708. Online publication date: 20-Feb-2023
      • (2023) Distance-Preserving Embedding Adaptive Bipartite Graph Multi-View Learning with Application to Multi-Label Classification. ACM Transactions on Knowledge Discovery from Data, 17:2, pp. 1-21. DOI: 10.1145/3537900. Online publication date: 20-Feb-2023
      • (2019) Multi-view multi-label learning with view-specific information extraction. Proceedings of the 28th International Joint Conference on Artificial Intelligence, pp. 3884-3890. DOI: 10.5555/3367471.3367581. Online publication date: 10-Aug-2019
      • (2019) Image Captioning by Asking Questions. ACM Transactions on Multimedia Computing, Communications, and Applications, 15:2s, pp. 1-19. DOI: 10.1145/3313873. Online publication date: 19-Jul-2019
      • (2019) Laplacian Eigenmaps Regularized Feature Mapping for Image Annotation. 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC), pp. 3901-3906. DOI: 10.1109/SMC.2019.8913981. Online publication date: 6-Oct-2019
      • (2019) A hybrid automatic image annotation approach. Multimedia Tools and Applications, 78:9, pp. 11815-11834. DOI: 10.1007/s11042-018-6742-6. Online publication date: 1-May-2019
      • (2019) CNN-feature based automatic image annotation method. Multimedia Tools and Applications, 78:3, pp. 3767-3780. DOI: 10.1007/s11042-018-6038-x. Online publication date: 1-Feb-2019
      • (2019) Image annotation refinement via 2P-KNN based group sparse reconstruction. Multimedia Tools and Applications, 78:10, pp. 13213-13225. DOI: 10.1007/s11042-018-5925-5. Online publication date: 1-May-2019
      • (2019) Local and global feature selection for multilabel classification with binary relevance. Artificial Intelligence Review, 51:1, pp. 33-60. DOI: 10.1007/s10462-017-9556-4. Online publication date: 1-Jan-2019
