
Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary

Authors: P. Duygulu, Kobus Barnard, J. F. G. de Freitas, David A. Forsyth

ECCV '02: Proceedings of the 7th European Conference on Computer Vision-Part IV

Pages 97 - 112

Published: 28 May 2002

Abstract

We describe a model of object recognition as machine translation. In this model, recognition is a process of annotating image regions with words. First, images are segmented into regions, which are classified into region types using a variety of features. A mapping between region types and the keywords supplied with the images is then learned, using a method based on EM. This process is analogous to learning a lexicon from an aligned bitext. For the implementation we describe, these words are nouns taken from a large vocabulary. On a large test set, the method can predict numerous words with high accuracy. Simple methods identify words that cannot be predicted well. We show how to group words that are individually difficult to predict into clusters that can be predicted well. For example, we cannot predict the distinction between train and locomotive using the current set of features, but we can predict the underlying concept. The method is trained on a substantial collection of images. Extensive experimental results illustrate the strengths and weaknesses of the approach.
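The lexicon-learning step can be illustrated with a small sketch. The abstract says only that the mapping is learned with "a method based around EM"; the code below assumes an IBM Model 1 style formulation, in which every keyword of an image is explained by one of that image's region types, and it estimates a table t(word | region type). The function names `learn_lexicon` and `annotate`, the region-type labels, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def learn_lexicon(pairs, iterations=20):
    """Estimate t(word | region_type) by EM from (region_types, words) pairs.

    Assumption: an IBM Model 1 style model with a uniform alignment prior.
    This is an illustrative sketch, not the authors' implementation.
    """
    words = sorted({w for _, ws in pairs for w in ws})
    types = sorted({b for bs, _ in pairs for b in bs})
    # Start from a uniform translation table.
    t = {b: {w: 1.0 / len(words) for w in words} for b in types}
    for _ in range(iterations):
        counts = {b: defaultdict(float) for b in types}
        for region_types, image_words in pairs:
            for w in image_words:
                # E-step: posterior that each region of this image produced w.
                norm = sum(t[b][w] for b in region_types)
                for b in region_types:
                    counts[b][w] += t[b][w] / norm
        # M-step: renormalise the expected counts for each region type.
        for b in types:
            total = sum(counts[b].values())
            if total > 0:
                t[b] = {w: counts[b][w] / total for w in words}
    return t


def annotate(t, region_type):
    """Return the most probable word for a region type."""
    return max(t[region_type], key=t[region_type].get)


# Toy "aligned bitext": each image is a bag of region types plus its keywords.
pairs = [
    (["blue_smooth", "green_rough"], ["sky", "grass"]),
    (["blue_smooth", "orange_striped"], ["sky", "tiger"]),
    (["green_rough", "orange_striped"], ["grass", "tiger"]),
]
lexicon = learn_lexicon(pairs)
for region_type in sorted(lexicon):
    print(region_type, "->", annotate(lexicon, region_type))
```

No image in the toy data says which region carries which keyword; the co-occurrence pattern across images is what lets EM attach each word to a single region type, which is the sense in which the problem resembles learning a lexicon from an aligned bitext.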






Published In

ECCV '02: Proceedings of the 7th European Conference on Computer Vision-Part IV

May 2002, 834 pages

ISBN: 3540437487

Publisher: Springer-Verlag, Berlin, Heidelberg


