Article

DeViSE: a deep visual-semantic embedding model

Authors:

Greg S. Corrado,

Jonathon Shlens,

Marc'Aurelio Ranzato,

Tomas Mikolov

NIPS'13: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2

Pages 2121 - 2129

Published: 05 December 2013

Abstract

Modern visual recognition systems are often limited in their ability to scale to large numbers of object categories. This limitation is in part due to the increasing difficulty of acquiring sufficient training data in the form of labeled images as the number of object categories grows. One remedy is to leverage data from other sources, such as text data, both to train visual models and to constrain their predictions. In this paper we present a new deep visual-semantic embedding model trained to identify visual objects using both labeled image data and semantic information gleaned from unannotated text. We demonstrate that this model matches state-of-the-art performance on the 1000-class ImageNet object recognition challenge while making more semantically reasonable errors, and also show that the semantic information can be exploited to make predictions about tens of thousands of image labels not observed during training. Semantic knowledge improves such zero-shot predictions, achieving hit rates of up to 18% across thousands of novel labels never seen by the visual model.
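The abstract does not spell out how a zero-shot prediction is made, so the following is only a minimal sketch of the general idea under stated assumptions: image features are mapped by a linear projection into the same vector space as text-derived label embeddings, labels are ranked by cosine similarity, and a "hit rate" is read as the fraction of images whose true label appears in the top k. The function names, the projection matrix, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def project(image_features: Sequence[float],
            weights: Sequence[Sequence[float]]) -> List[float]:
    """Assumed linear map from visual features into the label embedding space."""
    return [dot(row, image_features) for row in weights]


def rank_labels(image_embedding: Sequence[float],
                label_embeddings: Dict[str, Sequence[float]]) -> List[str]:
    """Rank every label, seen or unseen, by cosine similarity to the image."""
    query = normalize(image_embedding)
    scores = {label: dot(query, normalize(vec))
              for label, vec in label_embeddings.items()}
    return sorted(scores, key=scores.get, reverse=True)


def hit_at_k(ranked_lists: Sequence[Sequence[str]],
             true_labels: Sequence[str], k: int) -> float:
    """Fraction of images whose true label is among the top-k ranked labels."""
    hits = sum(1 for ranked, truth in zip(ranked_lists, true_labels)
               if truth in ranked[:k])
    return hits / len(true_labels)


# Toy, made-up label embeddings standing in for vectors learned from text.
# "tiger" plays the role of a label the visual model never saw in training.
LABELS = {
    "cat": [1.0, 0.1],
    "tiger": [0.9, 0.3],
    "truck": [-0.2, 1.0],
}

# Hypothetical 2x3 projection from 3-D visual features to the 2-D label space.
W = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]

ranking = rank_labels(project([0.8, 0.3, 0.5], W), LABELS)
```

Because the labels live in a space learned from text alone, a label with no training images can still be ranked, which is what makes zero-shot prediction possible in this kind of setup. Calling `hit_at_k([ranking], ["tiger"], k=1)` scores the single toy image above.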

Information

Published In

NIPS'13: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2

December 2013

3236 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States

Cited By

Park J, Hessel J, Chandu K, Liang P, Lu X, West P, Yu Y, Huang Q, Gao J, Farhadi A, Choi Y, Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S (2023). Localized symbolic knowledge distillation for visual commonsense models. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 11338-11352. Online publication date: 10-Dec-2023.
https://dl.acm.org/doi/10.5555/3666122.3666623
Li X, Zhang Y, Bian S, Qu Y, Xie Y, Shi Z, Fan J, Elkind E (2023). VS-Boost. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pp. 1107-1115. Online publication date: 19-Aug-2023.
https://dl.acm.org/doi/10.24963/ijcai.2023/123
Chen Z, Huang Y, Chen J, Geng Y, Zhang W, Fang Y, Pan J, Chen H, Williams B, Chen Y, Neville J (2023). DUET. Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence, pp. 405-413. Online publication date: 7-Feb-2023.
https://dl.acm.org/doi/10.1609/aaai.v37i1.25114
Neptune N, Mothe J (2023). Annotating Satellite Images of Forests with Keywords from a Specialized Corpus in the Context of Change Detection. Proceedings of the 20th International Conference on Content-based Multimedia Indexing, pp. 14-20. Online publication date: 20-Sep-2023.
https://dl.acm.org/doi/10.1145/3617233.3617242
Liang P, Morency L (2023). Tutorial on Multimodal Machine Learning: Principles, Challenges, and Open Questions. Companion Publication of the 25th International Conference on Multimodal Interaction, pp. 101-104. Online publication date: 9-Oct-2023.
https://dl.acm.org/doi/10.1145/3610661.3617602
Zhang H, Yang Y, Qi F, Qian S, Xu C, El Saddik A, Mei T, Cucchiara R, Bertini M, Tobon Vallejo D, Atrey P, Hossain M (2023). C2MR: Continual Cross-Modal Retrieval for Streaming Multi-modal Data. Proceedings of the 31st ACM International Conference on Multimedia, pp. 8963-8974. Online publication date: 26-Oct-2023.
https://dl.acm.org/doi/10.1145/3581783.3611919
Liang W, Zhang Y, Kwon Y, Yeung S, Zou J, Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A (2022). Mind the gap. Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 17612-17625. Online publication date: 28-Nov-2022.
https://dl.acm.org/doi/10.5555/3600270.3601550
Naeem M, Xian Y, Van Gool L, Tombari F, Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A (2022). I2DFormer. Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 12283-12294. Online publication date: 28-Nov-2022.
https://dl.acm.org/doi/10.5555/3600270.3601162
Zheng Y (2022). Improved Feature Generating Networks for Zero-Shot Learning. Proceedings of the 2022 4th International Conference on Image, Video and Signal Processing, pp. 211-217. Online publication date: 18-Mar-2022.
https://dl.acm.org/doi/10.1145/3531232.3531263
Xie Z, Li L, Zhong L, Liu J, Liu L, Oria V, Sapino M, Satoh S, Kerhervé B, Cheng W, Ide I, Singh V (2022). Cross-Modal Retrieval between Event-Dense Text and Image. Proceedings of the 2022 International Conference on Multimedia Retrieval, pp. 229-238. Online publication date: 27-Jun-2022.
https://dl.acm.org/doi/10.1145/3512527.3531374