research-article

The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies

Published: 08 February 2010

Abstract

We present the nested Chinese restaurant process (nCRP), a stochastic process that assigns probability distributions to ensembles of infinitely deep, infinitely branching trees. We show how this stochastic process can be used as a prior distribution in a Bayesian nonparametric model of document collections. Specifically, we present an application to information retrieval in which documents are modeled as paths down a random tree, and the preferential attachment dynamics of the nCRP leads to clustering of documents according to sharing of topics at multiple levels of abstraction. Given a corpus of documents, a posterior inference algorithm finds an approximation to a posterior distribution over trees, topics and allocations of words to levels of the tree. We demonstrate this algorithm on collections of scientific abstracts from several journals. This model exemplifies a recent trend in statistical machine learning—the use of Bayesian nonparametric methods to infer distributions on flexible data structures.
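
The generative view sketched in the abstract can be made concrete with a few lines of code. The sketch below, assembled from the description above rather than from the authors' implementation, draws document paths from a nested CRP prior: at each node a document either follows an existing child (with probability proportional to how many earlier documents already did so, the preferential-attachment effect) or opens a new child (with probability proportional to a concentration parameter). The function name sample_ncrp_paths, the fixed path depth, and the parameter names gamma and seed are illustrative assumptions, not notation from the article.

```python
"""Minimal generative sketch of the nested Chinese restaurant process (nCRP).

Illustrative only: paths are truncated at a fixed depth, and node labels are
arbitrary integers. This is not the authors' inference code.
"""
import random
from collections import defaultdict


def sample_ncrp_paths(num_documents, depth=3, gamma=1.0, seed=0):
    """Draw one root-to-leaf path per document from an nCRP prior."""
    rng = random.Random(seed)
    # counts[node][child] = number of documents that moved from `node` to `child`
    counts = defaultdict(lambda: defaultdict(int))
    next_id = 1  # node 0 is the root; fresh ids label newly created children
    paths = []
    for _ in range(num_documents):
        node, path = 0, [0]
        for _ in range(depth - 1):
            children = list(counts[node].items())
            total = sum(c for _, c in children) + gamma
            r = rng.uniform(0, total)
            acc, chosen = 0.0, None
            for child, c in children:
                acc += c
                if r < acc:          # follow an existing child (popular branches win)
                    chosen = child
                    break
            if chosen is None:       # the draw fell in the gamma mass: open a new child
                chosen = next_id
                next_id += 1
            counts[node][chosen] += 1
            path.append(chosen)
            node = chosen
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Documents that share prefixes of their paths share higher-level topics.
    for p in sample_ncrp_paths(5):
        print(p)
```

In the full model each node carries a topic (a distribution over words), and a document's words are allocated among the topics on its path; posterior inference then recovers the tree, the topics, and the level allocations jointly.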


Index Terms

  1. The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies

      Published In

      Journal of the ACM, Volume 57, Issue 2
      January 2010
      248 pages
      ISSN: 0004-5411
      EISSN: 1557-735X
      DOI: 10.1145/1667053
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 08 February 2010
      Accepted: 01 August 2009
      Revised: 01 February 2008
      Received: 01 June 2007
      Published in JACM Volume 57, Issue 2

      Author Tags

      1. Bayesian nonparametric statistics
      2. unsupervised learning

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Article Metrics

      • Downloads (Last 12 months): 129
      • Downloads (Last 6 weeks): 16
      Reflects downloads up to 02 Mar 2025

      Cited By

      • (2025) Online Review-Assisted Product Improvement Attribute Extraction and Prioritization Method for Small- and Medium-Sized Enterprises. Systems 13(3), 149. DOI: 10.3390/systems13030149. Online publication date: 22-Feb-2025
      • (2025) Two-Stage Adaptive Robust Charging Scheduling of Electric Vehicle Station Based on Hybrid Demand Response. IEEE Transactions on Transportation Electrification 11(1), 1442-1454. DOI: 10.1109/TTE.2024.3406144. Online publication date: Feb-2025
      • (2025) Hierarchical Deep Document Model. IEEE Transactions on Knowledge and Data Engineering 37(1), 351-364. DOI: 10.1109/TKDE.2024.3487523. Online publication date: 1-Jan-2025
      • (2025) Next-Generation Text Summarization: A T5-LSTM FusionNet Hybrid Approach for Psychological Data. IEEE Access 13, 37557-37571. DOI: 10.1109/ACCESS.2025.3540590. Online publication date: 2025
      • (2025) Bridging insight gaps in topic dependency discovery with a knowledge-inspired topic model. Information Processing & Management 62(1), 103911. DOI: 10.1016/j.ipm.2024.103911. Online publication date: Jan-2025
      • (2025) A multifacet hierarchical sentiment-topic model with application to multi-brand online review analysis. Statistics and Computing 35(3). DOI: 10.1007/s11222-025-10593-y. Online publication date: 5-Mar-2025
      • (2024) Multi-Dimensional Clustering Based on Restricted Distance-Dependent Mixture Dirichlet Process for Diffusion Tensor Imaging. Journal of Data Science, 537-557. DOI: 10.6339/24-JDS1125. Online publication date: 2-Apr-2024
      • (2024) A Hybrid Semantic Representation Method Based on Fusion Conceptual Knowledge and Weighted Word Embeddings for English Texts. Information 15(11), 708. DOI: 10.3390/info15110708. Online publication date: 5-Nov-2024
      • (2024) Dimensions of Job Demands Among New-Generation Employees Based on Online Reviews by Employees. Behavioral Sciences 14(11), 990. DOI: 10.3390/bs14110990. Online publication date: 24-Oct-2024
      • (2024) On the affinity, rationality, and diversity of hierarchical topic modeling. Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence, 19261-19269. DOI: 10.1609/aaai.v38i17.29895. Online publication date: 20-Feb-2024
