research-article

Nonparametric Bayesian upstream supervised multi-modal topic models

Authors:

Zengchang Qin

WSDM '14: Proceedings of the 7th ACM international conference on Web search and data mining

Pages 493 - 502

https://doi.org/10.1145/2556195.2556238

Published: 24 February 2014

Abstract

Learning with multi-modal data is at the core of many multimedia applications, such as cross-modal retrieval and image annotation. In this paper, we present a nonparametric Bayesian approach to learning upstream supervised topic models for analyzing multi-modal data. Our model develops a compound nonparametric Bayesian multi-modal prior to describe the correlation structure of data both within each individual modality and between different modalities. It extends the hierarchical Dirichlet process (HDP) by incorporating upstream supervised response variables and values of latent functions under a Gaussian process (GP). Upstream responses shared by data from multiple modalities are beneficial for discriminative training, and the GP allows flexible structure learning of correlations. Hence, our model inherits the automatic determination of the number of topics from the HDP, structure learning from the GP, and enhanced predictive capacity from upstream supervision. We also provide efficient variational inference and prediction algorithms. Empirical studies demonstrate superior performance on several benchmark datasets compared with previous competitors.
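As a rough illustration of the HDP building block named in the abstract (not the paper's full model, which also involves upstream responses and GP latent functions), the Python sketch below draws truncated stick-breaking weights for a shared global topic distribution and then document-level weights centered on it. The function names, the truncation level, and the concentration values are assumptions chosen for illustration; none of them come from the paper.

```python
import random


def stick_breaking(concentration, truncation, rng):
    """Truncated GEM(concentration) weights via stick-breaking."""
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to an atom
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, concentration)  # fraction broken off
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)  # last atom absorbs the leftover stick
    return weights


def document_weights(global_weights, alpha, rng):
    """Draw pi ~ Dirichlet(alpha * beta), the finite-truncation
    analogue of a document-level DP(alpha, beta) draw."""
    # A Dirichlet draw is a vector of independent Gamma draws, normalized.
    # The small floor keeps the Gamma shape strictly positive (assumption
    # made here for numerical safety only).
    draws = [rng.gammavariate(max(alpha * b, 1e-12), 1.0)
             for b in global_weights]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
beta = stick_breaking(concentration=1.0, truncation=20, rng=rng)
pi = document_weights(beta, alpha=5.0, rng=rng)
```

With a small concentration, most of the mass tends to fall on the first few atoms, which is the mechanism that lets an HDP-style prior infer the effective number of topics rather than fix it in advance.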

Cited By

Okui T (2019). Bayesian Nonparametric Topic Model Using an Outcome Variable For Microbial Data (アウトカム情報を利用した細菌データに対するノンパラメトリックベイズトピックモデル). Ouyou toukeigaku 48:1-2 (1-16). Online publication date: 2019.
https://doi.org/10.5023/jappstat.48.1
Xu M, Zhu Z, Zhao Y (2019). Towards learning a semantic-consistent subspace for cross-modal retrieval. Multimedia Tools and Applications 78:1 (389-412). Online publication date: 1-Jan-2019.
https://dl.acm.org/doi/10.1007/s11042-018-6578-0
Zhuang Y, Wang H, Xiao J, Wu F, Yang Y, Lu W, Zhang Z (2017). Bag-of-Discriminative-Words (BoDW) Representation via Topic Modeling. IEEE Transactions on Knowledge and Data Engineering 29:5 (977-990). Online publication date: 1-May-2017.
https://dl.acm.org/doi/10.1109/TKDE.2017.2658571

Index Terms

Nonparametric Bayesian upstream supervised multi-modal topic models
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
2. Mathematics of computing
  1. Probability and statistics
    1. Nonparametric statistics

Published In

WSDM '14: Proceedings of the 7th ACM international conference on Web search and data mining

February 2014

712 pages

ISBN: 9781450323512

DOI: 10.1145/2556195

General Chairs: Ben Carterette (University of Delaware, USA), Fernando Diaz (Microsoft Research, USA)

Program Chairs: Carlos Castillo (Qatar Computing Research Institute, Qatar), Donald Metzler (Google, USA)

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

WSDM 2014: Seventh ACM International Conference on Web Search and Data Mining

February 24 - 28, 2014

New York, New York, USA

Acceptance Rates

WSDM '14 Paper Acceptance Rate: 64 of 355 submissions, 18%

Overall Acceptance Rate: 498 of 2,863 submissions, 17%

Article Metrics

Total Citations: 7
Total Downloads: 349
Downloads (Last 12 months): 3
Downloads (Last 6 weeks): 0

Reflects downloads up to 27 Jan 2025

Citations

Cited By

Zou J, Ye Q, Cui Y, Wan F, Fu K, Jiao J (2016). Collective motion pattern inference via Locally Consistent Latent Dirichlet Allocation. Neurocomputing 184 (221-231). Online publication date: May-2016.
https://doi.org/10.1016/j.neucom.2015.08.108
Wang C, Yang H, Meinel C (2016). A deep semantic framework for multimodal representation learning. Multimedia Tools and Applications 75:15 (9255-9276). Online publication date: 1-Aug-2016.
https://dl.acm.org/doi/10.1007/s11042-016-3380-8
Jameel S, Lam W, Bing L (2015). Supervised topic models with word order structure for document classification and retrieval learning. Information Retrieval Journal 18:4 (283-330). Online publication date: 4-Jun-2015.
https://doi.org/10.1007/s10791-015-9254-2
Wang Y, Wu F, Song J, Li X, Zhuang Y, Hua K, Rui Y, Steinmetz R, Hanjalic A, Natsev A, Zhu W (2014). Multi-modal Mutual Topic Reinforce Modeling for Cross-media Retrieval. Proceedings of the 22nd ACM international conference on Multimedia (307-316). Online publication date: 3-Nov-2014.
https://dl.acm.org/doi/10.1145/2647868.2654901
