
A stochastic memoizer for sequence data

Published: 14 June 2009
DOI: 10.1145/1553374.1553518

Abstract

We propose an unbounded-depth, hierarchical, Bayesian nonparametric model for discrete sequence data. This model can be estimated from a single training sequence, yet shares statistical strength between subsequent symbol predictive distributions in such a way that predictive performance generalizes well. The model builds on a specific parameterization of an unbounded-depth hierarchical Pitman-Yor process. We introduce analytic marginalization steps (using coagulation operators) to reduce this model to one that can be represented in time and space linear in the length of the training sequence. We show how to perform inference in such a model without truncation approximation and introduce fragmentation operators necessary to do predictive inference. We demonstrate the sequence memoizer by using it as a language model, achieving state-of-the-art results.
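To make the back-off structure concrete, the following is a minimal, illustrative Python sketch (not the authors' implementation) of the hierarchical Pitman-Yor recursion the sequence memoizer builds on: each context backs off to the same context with its oldest symbol dropped, and every preceding symbol of the single training sequence is used as context. It assumes a single shared discount, zero concentration, and the crude one-table-per-distinct-symbol approximation; the names PYLanguageModelSketch, prob, and observe are hypothetical.

from collections import defaultdict

class PYLanguageModelSketch:
    """Illustrative hierarchical Pitman-Yor back-off model (sketch only)."""

    def __init__(self, alphabet_size, discount=0.5):
        self.V = alphabet_size          # size of the symbol alphabet
        self.d = discount               # Pitman-Yor discount (assumed shared by all contexts)
        # customers[context][symbol]: observation counts within a context
        self.customers = defaultdict(lambda: defaultdict(int))
        # tables[context][symbol]: CRP table counts, approximated here as
        # one table per distinct symbol seen in a context
        self.tables = defaultdict(lambda: defaultdict(int))

    def prob(self, context, symbol):
        # Back off by dropping the oldest symbol of the context; the empty
        # context backs off to a uniform base distribution over the alphabet.
        base = 1.0 / self.V if not context else self.prob(context[1:], symbol)
        c = sum(self.customers[context].values())
        if c == 0:
            return base
        t = sum(self.tables[context].values())
        cw = self.customers[context][symbol]
        tw = self.tables[context][symbol]
        # Pitman-Yor predictive rule with concentration 0: discounted count
        # mass plus back-off mass proportional to the table count.
        return (max(cw - self.d * tw, 0.0) + self.d * t * base) / c

    def observe(self, context, symbol):
        # A symbol unseen in this context opens a table and also sends the
        # observation up to the parent (shorter) context.
        if self.customers[context][symbol] == 0:
            self.tables[context][symbol] = 1
            if context:
                self.observe(context[1:], symbol)
        self.customers[context][symbol] += 1

# Train on a single sequence, using the entire preceding string as the
# context for each symbol, i.e. unbounded context depth.
seq = "abracadabra"
model = PYLanguageModelSketch(alphabet_size=len(set(seq)))
for i, ch in enumerate(seq):
    model.observe(tuple(seq[:i]), ch)

print(model.prob(tuple(seq), "a"))   # probability that 'a' follows the full sequence

Note that this naive sketch instantiates every distinct context explicitly, which is quadratic in the sequence length; the coagulation-based marginalization described in the abstract is what reduces the model to a representation linear in the length of the training sequence.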


Published In

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009
1331 pages
ISBN: 9781605585161
DOI: 10.1145/1553374

Sponsors

  • NSF
  • Microsoft Research
  • MITACS

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 June 2009


Qualifiers

  • Research-article

Conference

ICML '09
Sponsor: Microsoft Research

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%
