research-article

The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies

Published: 08 February 2010

Abstract

We present the nested Chinese restaurant process (nCRP), a stochastic process that assigns probability distributions to ensembles of infinitely deep, infinitely branching trees. We show how this stochastic process can be used as a prior distribution in a Bayesian nonparametric model of document collections. Specifically, we present an application to information retrieval in which documents are modeled as paths down a random tree, and the preferential attachment dynamics of the nCRP leads to clustering of documents according to sharing of topics at multiple levels of abstraction. Given a corpus of documents, a posterior inference algorithm finds an approximation to a posterior distribution over trees, topics and allocations of words to levels of the tree. We demonstrate this algorithm on collections of scientific abstracts from several journals. This model exemplifies a recent trend in statistical machine learning—the use of Bayesian nonparametric methods to infer distributions on flexible data structures.
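
The generative view sketched in the abstract can be made concrete with a few lines of code. The sketch below, assembled from the description above rather than from the authors' implementation, draws document paths from a nested CRP prior: at each node a document either follows an existing child (with probability proportional to how many earlier documents already did so, the preferential-attachment effect) or opens a new child (with probability proportional to a concentration parameter). The function name sample_ncrp_paths, the fixed path depth, and the parameter names gamma and seed are illustrative assumptions, not notation from the article.

```python
"""Minimal generative sketch of the nested Chinese restaurant process (nCRP).

Illustrative only: paths are truncated at a fixed depth, and node labels are
arbitrary integers. This is not the authors' inference code.
"""
import random
from collections import defaultdict


def sample_ncrp_paths(num_documents, depth=3, gamma=1.0, seed=0):
    """Draw one root-to-leaf path per document from an nCRP prior."""
    rng = random.Random(seed)
    # counts[node][child] = number of documents that moved from `node` to `child`
    counts = defaultdict(lambda: defaultdict(int))
    next_id = 1  # node 0 is the root; fresh ids label newly created children
    paths = []
    for _ in range(num_documents):
        node, path = 0, [0]
        for _ in range(depth - 1):
            children = list(counts[node].items())
            total = sum(c for _, c in children) + gamma
            r = rng.uniform(0, total)
            acc, chosen = 0.0, None
            for child, c in children:
                acc += c
                if r < acc:          # follow an existing child (popular branches win)
                    chosen = child
                    break
            if chosen is None:       # the draw fell in the gamma mass: open a new child
                chosen = next_id
                next_id += 1
            counts[node][chosen] += 1
            path.append(chosen)
            node = chosen
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Documents that share prefixes of their paths share higher-level topics.
    for p in sample_ncrp_paths(5):
        print(p)
```

In the full model each node carries a topic (a distribution over words), and a document's words are allocated among the topics on its path; posterior inference then recovers the tree, the topics, and the level allocations jointly.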


Index Terms

  1. The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies

      Published In

      Journal of the ACM, Volume 57, Issue 2
      January 2010
      248 pages
      ISSN: 0004-5411
      EISSN: 1557-735X
      DOI: 10.1145/1667053
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 08 February 2010
      Accepted: 01 August 2009
      Revised: 01 February 2008
      Received: 01 June 2007
      Published in JACM Volume 57, Issue 2

      Author Tags

      1. Bayesian nonparametric statistics
      2. unsupervised learning

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Article Metrics

      • Downloads (Last 12 months): 129
      • Downloads (Last 6 weeks): 16
      Reflects downloads up to 02 Mar 2025

      Cited By

      • (2025) Online Review-Assisted Product Improvement Attribute Extraction and Prioritization Method for Small- and Medium-Sized Enterprises. Systems 13(3), 149. DOI: 10.3390/systems13030149. Online publication date: 22-Feb-2025
      • (2025) Two-Stage Adaptive Robust Charging Scheduling of Electric Vehicle Station Based on Hybrid Demand Response. IEEE Transactions on Transportation Electrification 11(1), 1442-1454. DOI: 10.1109/TTE.2024.3406144. Online publication date: Feb-2025
      • (2025) Hierarchical Deep Document Model. IEEE Transactions on Knowledge and Data Engineering 37(1), 351-364. DOI: 10.1109/TKDE.2024.3487523. Online publication date: 1-Jan-2025
      • (2025) Next-Generation Text Summarization: A T5-LSTM FusionNet Hybrid Approach for Psychological Data. IEEE Access 13, 37557-37571. DOI: 10.1109/ACCESS.2025.3540590. Online publication date: 2025
      • (2025) Bridging insight gaps in topic dependency discovery with a knowledge-inspired topic model. Information Processing & Management 62(1), 103911. DOI: 10.1016/j.ipm.2024.103911. Online publication date: Jan-2025
      • (2025) A multifacet hierarchical sentiment-topic model with application to multi-brand online review analysis. Statistics and Computing 35(3). DOI: 10.1007/s11222-025-10593-y. Online publication date: 5-Mar-2025
      • (2024) Multi-Dimensional Clustering Based on Restricted Distance-Dependent Mixture Dirichlet Process for Diffusion Tensor Imaging. Journal of Data Science, 537-557. DOI: 10.6339/24-JDS1125. Online publication date: 2-Apr-2024
      • (2024) A Hybrid Semantic Representation Method Based on Fusion Conceptual Knowledge and Weighted Word Embeddings for English Texts. Information 15(11), 708. DOI: 10.3390/info15110708. Online publication date: 5-Nov-2024
      • (2024) Dimensions of Job Demands Among New-Generation Employees Based on Online Reviews by Employees. Behavioral Sciences 14(11), 990. DOI: 10.3390/bs14110990. Online publication date: 24-Oct-2024
      • (2024) On the affinity, rationality, and diversity of hierarchical topic modeling. Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence, 19261-19269. DOI: 10.1609/aaai.v38i17.29895. Online publication date: 20-Feb-2024
