DOI: 10.1145/2556195.2556238
research-article

Nonparametric Bayesian upstream supervised multi-modal topic models

Published: 24 February 2014

Abstract

Learning with multi-modal data is at the core of many multimedia applications, such as cross-modal retrieval and image annotation. In this paper, we present a nonparametric Bayesian approach to learning upstream supervised topic models for analyzing multi-modal data. Our model develops a compound nonparametric Bayesian multi-modal prior that describes the correlation structure of the data both within each individual modality and between different modalities. It extends the hierarchical Dirichlet process (HDP) by incorporating upstream supervised response variables and the values of latent functions under a Gaussian process (GP). Upstream responses shared by data from multiple modalities benefit discriminative training, and the GP allows flexible learning of the correlation structure. Hence, our model inherits automatic determination of the number of topics from the HDP, structure learning from the GP, and enhanced predictive capacity from upstream supervision. We also provide efficient variational inference and prediction algorithms. Empirical studies on several benchmark datasets demonstrate superior performance compared with previous competing methods.
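The abstract credits the HDP with determining the number of topics automatically. The sketch below illustrates where that property comes from. It uses the standard truncated stick-breaking construction (Sethuraman), not the paper's compound multi-modal prior.

- **Global weights.** The weights β ~ GEM(γ) concentrate most of their mass on a few topics.
- **Document proportions.** Each document's proportions π_j ~ DP(α, β) reuse those same shared topics.

The following are illustrative assumptions, not taken from the paper: the truncation level, the finite Dirichlet(αβ) approximation, the 0.01 cutoff, and the helper names (`stick_breaking_weights`, `document_topic_proportions`, `effective_topics`). The sketch also leaves out the upstream responses, the GP component, and the paper's variational inference.

```python
import random


def stick_breaking_weights(gamma, truncation, rng):
    """Global topic weights beta ~ GEM(gamma), truncated to `truncation` topics.

    v_k ~ Beta(1, gamma) and beta_k = v_k * prod_{j<k} (1 - v_j).
    The last entry takes all remaining mass so the weights sum to 1
    (a common truncation choice; assumed here, not taken from the paper).
    """
    weights = []
    remaining = 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, gamma)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


def document_topic_proportions(alpha, beta, rng):
    """Finite approximation of pi_j ~ DP(alpha, beta): pi_j ~ Dirichlet(alpha * beta).

    A Dirichlet draw is obtained by normalizing independent Gamma draws.
    The tiny floor keeps gammavariate's shape argument strictly positive.
    """
    draws = [rng.gammavariate(max(alpha * b, 1e-12), 1.0) for b in beta]
    total = sum(draws)
    return [d / total for d in draws]


def effective_topics(weights, threshold=0.01):
    """Count topics whose weight exceeds an illustrative (hypothetical) cutoff."""
    return sum(1 for w in weights if w > threshold)


if __name__ == "__main__":
    rng = random.Random(0)
    for gamma in (1.0, 5.0):
        beta = stick_breaking_weights(gamma, truncation=50, rng=rng)
        # Every document draws its own proportions over the SAME shared topics.
        docs = [document_topic_proportions(10.0, beta, rng) for _ in range(3)]
        print(f"gamma={gamma}: topics above cutoff = {effective_topics(beta)}")
        for pi in docs:
            top = sorted(range(len(pi)), key=pi.__getitem__, reverse=True)[:3]
            print("  top topics:", top)
```

Raising γ makes each v_k smaller on average (E[v_k] = 1/(1+γ)), so mass spreads over more sticks and more topics clear the cutoff. In a fitted model, the posterior over these weights decides how many topics are effectively used, rather than a fixed K chosen in advance.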




Published In

WSDM '14: Proceedings of the 7th ACM International Conference on Web Search and Data Mining
February 2014
712 pages
ISBN: 9781450323512
DOI: 10.1145/2556195
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cross-modal retrieval
  2. multi-modal learning
  3. nonparametric Bayesian
  4. topic model

Conference

WSDM 2014

Acceptance Rates

WSDM '14 paper acceptance rate: 64 of 355 submissions (18%)
Overall acceptance rate: 498 of 2,863 submissions (17%)

Article Metrics

  • Downloads (last 12 months): 3
  • Downloads (last 6 weeks): 0
Reflects downloads up to 27 Jan 2025

Cited By

  • (2019) Bayesian Nonparametric Topic Model Using an Outcome Variable for Microbial Data. Ouyou toukeigaku, 48(1-2):1-16. DOI: 10.5023/jappstat.48.1. Online publication date: 2019.
  • (2019) Towards learning a semantic-consistent subspace for cross-modal retrieval. Multimedia Tools and Applications, 78(1):389-412. DOI: 10.1007/s11042-018-6578-0. Online publication date: 1-Jan-2019.
  • (2017) Bag-of-Discriminative-Words (BoDW) Representation via Topic Modeling. IEEE Transactions on Knowledge and Data Engineering, 29(5):977-990. DOI: 10.1109/TKDE.2017.2658571. Online publication date: 1-May-2017.
  • (2016) Collective motion pattern inference via Locally Consistent Latent Dirichlet Allocation. Neurocomputing, 184:221-231. DOI: 10.1016/j.neucom.2015.08.108. Online publication date: May-2016.
  • (2016) A deep semantic framework for multimodal representation learning. Multimedia Tools and Applications, 75(15):9255-9276. DOI: 10.1007/s11042-016-3380-8. Online publication date: 1-Aug-2016.
  • (2015) Supervised topic models with word order structure for document classification and retrieval learning. Information Retrieval Journal, 18(4):283-330. DOI: 10.1007/s10791-015-9254-2. Online publication date: 4-Jun-2015.
  • (2014) Multi-modal Mutual Topic Reinforce Modeling for Cross-media Retrieval. Proceedings of the 22nd ACM International Conference on Multimedia, pages 307-316. DOI: 10.1145/2647868.2654901. Online publication date: 3-Nov-2014.
