DOI: 10.1145/2020408.2020600
poster

Probabilistic topic models with biased propagation on heterogeneous information networks

Published: 21 August 2011

Abstract

With the development of Web applications, textual documents are not only getting richer but are also ubiquitously interconnected with users and other objects in various ways, which gives rise to text-rich heterogeneous information networks. Topic models have been proposed and shown to be useful for document analysis, and the interactions among multi-typed objects play a key role in disclosing the rich semantics of the network. However, most topic models consider only the textual information and ignore the network structure, or can be integrated only with homogeneous networks. None of them handles heterogeneous information networks well. In this paper, we propose a novel topic model with biased propagation (TMBP) algorithm that directly incorporates a heterogeneous information network into topic modeling in a unified way. The underlying intuition is that multi-typed objects should be treated differently, according to their inherent textual information and the rich semantics of the heterogeneous information network; a simple, unbiased topic propagation across such a network does not make much sense. Consequently, we investigate and develop two biased propagation frameworks for the TMBP algorithm from different perspectives, the biased random walk framework and the biased regularization framework, which can discover latent topics and identify clusters of multi-typed objects simultaneously. We extensively evaluate the proposed approach and compare it with state-of-the-art techniques on several datasets. Experimental results demonstrate that the improvement achieved by our approach is consistent and promising.
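
The abstract gives no equations, so the following is only a toy sketch of what a single biased propagation step might look like, not the TMBP algorithm itself. It assumes that each non-text object (an author or venue, say) takes the average topic distribution of its linked documents, and that each document then mixes its own text-based distribution with that of its linked objects. The mixing weight `xi`, the function names, and the example data are all hypothetical.

```python
# Toy sketch of one biased propagation step.
# Assumption: this mixing rule and the weight `xi` are illustrative only;
# they are not taken from the abstract.

def normalize(vec):
    """Scale a non-negative vector so that it sums to 1."""
    total = sum(vec)
    return [v / total for v in vec]

def object_topics(doc_topics, object_docs):
    """Give each non-text object the average topic distribution of its documents."""
    n_topics = len(next(iter(doc_topics.values())))
    result = {}
    for obj, docs in object_docs.items():
        sums = [0.0] * n_topics
        for d in docs:
            for k, p in enumerate(doc_topics[d]):
                sums[k] += p
        result[obj] = normalize(sums)
    return result

def biased_step(doc_topics, object_docs, xi):
    """Mix each document's own topics (weight xi) with its linked objects' topics (weight 1 - xi)."""
    obj_topics = object_topics(doc_topics, object_docs)
    # Invert the object -> documents map into document -> objects.
    doc_objects = {d: [] for d in doc_topics}
    for obj, docs in object_docs.items():
        for d in docs:
            doc_objects[d].append(obj)
    updated = {}
    for d, own in doc_topics.items():
        linked = doc_objects[d]
        if not linked:
            # A document with no linked objects keeps its own distribution.
            updated[d] = list(own)
            continue
        neighbor = [
            sum(obj_topics[o][k] for o in linked) / len(linked)
            for k in range(len(own))
        ]
        updated[d] = [xi * own[k] + (1 - xi) * neighbor[k] for k in range(len(own))]
    return updated

# Hypothetical data: three documents over two topics, one author and one venue.
doc_topics = {"d1": [0.9, 0.1], "d2": [0.6, 0.4], "d3": [0.1, 0.9]}
object_docs = {"author_a": ["d1", "d2"], "venue_v": ["d2", "d3"]}
smoothed = biased_step(doc_topics, object_docs, xi=0.8)
```

In this sketch, `xi = 1` leaves every document unchanged (no propagation), while smaller values pull each document toward the topics of its authors and venues. Different object types get different treatment because documents keep a share of their own text-based distribution and the other objects inherit theirs entirely from linked documents.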

      Published In

      KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2011
      1446 pages
      ISBN:9781450308137
      DOI:10.1145/2020408
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. biased propagation
      2. clustering
      3. heterogeneous information network
      4. topic modeling

      Qualifiers

      • Poster

      Conference

      KDD '11

      Acceptance Rates

      Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

      Article Metrics

      • Downloads (Last 12 months): 13
      • Downloads (Last 6 weeks): 2
      Reflects downloads up to 12 Dec 2024

      Cited By

      • (2024) Revisiting Probabilistic Latent Semantic Analysis: Extensions, Challenges and Insights. Technologies 12:1 (5). DOI: 10.3390/technologies12010005. Online publication date: 3-Jan-2024
      • (2023) A review on semi-supervised clustering. Information Sciences 632 (164-200). DOI: 10.1016/j.ins.2023.02.088. Online publication date: Jun-2023
      • (2023) An ablation study on the use of publication venue quality to rank computer science departments. Scientometrics 128:8 (4197-4218). DOI: 10.1007/s11192-023-04733-2. Online publication date: 3-Jul-2023
      • (2022) VFDP: Visual Analysis of Flight Delay and Propagation on a Geographical Map. IEEE Transactions on Intelligent Transportation Systems 23:4 (3510-3521). DOI: 10.1109/TITS.2020.3037191. Online publication date: Apr-2022
      • (2022) Joint Text Mining with Heterogeneous Data. Machine Learning for Text (233-256). DOI: 10.1007/978-3-030-96623-2_8. Online publication date: 10-Feb-2022
      • (2021) Random Walks in Hypergraph. International Journal of Education and Information Technologies 15 (13-20). DOI: 10.46300/9109.2021.15.2. Online publication date: 10-Mar-2021
      • (2020) Framework for Inferring Following Strategies from Time Series of Movement Data. ACM Transactions on Knowledge Discovery from Data 14:3 (1-22). DOI: 10.1145/3385730. Online publication date: 13-May-2020
      • (2020) Better Classifier Calibration for Small Datasets. ACM Transactions on Knowledge Discovery from Data 14:3 (1-19). DOI: 10.1145/3385656. Online publication date: 13-May-2020
      • (2020) MP2SDA. ACM Transactions on Knowledge Discovery from Data 14:3 (1-22). DOI: 10.1145/3374919. Online publication date: 13-Mar-2020
      • (2020) Efficient Ridesharing Framework for Ride-matching via Heterogeneous Network Embedding. ACM Transactions on Knowledge Discovery from Data 14:3 (1-24). DOI: 10.1145/3373839. Online publication date: 13-Mar-2020