
A stochastic memoizer for sequence data

Published: 14 June 2009
DOI: 10.1145/1553374.1553518

Abstract

We propose an unbounded-depth, hierarchical, Bayesian nonparametric model for discrete sequence data. This model can be estimated from a single training sequence, yet shares statistical strength between subsequent symbol predictive distributions in such a way that predictive performance generalizes well. The model builds on a specific parameterization of an unbounded-depth hierarchical Pitman-Yor process. We introduce analytic marginalization steps (using coagulation operators) to reduce this model to one that can be represented in time and space linear in the length of the training sequence. We show how to perform inference in such a model without truncation approximation and introduce fragmentation operators necessary to do predictive inference. We demonstrate the sequence memoizer by using it as a language model, achieving state-of-the-art results.
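To make the back-off structure concrete, the following is a minimal, illustrative Python sketch (not the authors' implementation) of the hierarchical Pitman-Yor recursion the sequence memoizer builds on: each context backs off to the same context with its oldest symbol dropped, and every preceding symbol of the single training sequence is used as context. It assumes a single shared discount, zero concentration, and the crude one-table-per-distinct-symbol approximation; the names PYLanguageModelSketch, prob, and observe are hypothetical.

from collections import defaultdict

class PYLanguageModelSketch:
    """Illustrative hierarchical Pitman-Yor back-off model (sketch only)."""

    def __init__(self, alphabet_size, discount=0.5):
        self.V = alphabet_size          # size of the symbol alphabet
        self.d = discount               # Pitman-Yor discount (assumed shared by all contexts)
        # customers[context][symbol]: observation counts within a context
        self.customers = defaultdict(lambda: defaultdict(int))
        # tables[context][symbol]: CRP table counts, approximated here as
        # one table per distinct symbol seen in a context
        self.tables = defaultdict(lambda: defaultdict(int))

    def prob(self, context, symbol):
        # Back off by dropping the oldest symbol of the context; the empty
        # context backs off to a uniform base distribution over the alphabet.
        base = 1.0 / self.V if not context else self.prob(context[1:], symbol)
        c = sum(self.customers[context].values())
        if c == 0:
            return base
        t = sum(self.tables[context].values())
        cw = self.customers[context][symbol]
        tw = self.tables[context][symbol]
        # Pitman-Yor predictive rule with concentration 0: discounted count
        # mass plus back-off mass proportional to the table count.
        return (max(cw - self.d * tw, 0.0) + self.d * t * base) / c

    def observe(self, context, symbol):
        # A symbol unseen in this context opens a table and also sends the
        # observation up to the parent (shorter) context.
        if self.customers[context][symbol] == 0:
            self.tables[context][symbol] = 1
            if context:
                self.observe(context[1:], symbol)
        self.customers[context][symbol] += 1

# Train on a single sequence, using the entire preceding string as the
# context for each symbol, i.e. unbounded context depth.
seq = "abracadabra"
model = PYLanguageModelSketch(alphabet_size=len(set(seq)))
for i, ch in enumerate(seq):
    model.observe(tuple(seq[:i]), ch)

print(model.prob(tuple(seq), "a"))   # probability that 'a' follows the full sequence

Note that this naive sketch instantiates every distinct context explicitly, which is quadratic in the sequence length; the coagulation-based marginalization described in the abstract is what reduces the model to a representation linear in the length of the training sequence.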


Published In

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009
1331 pages
ISBN: 9781605585161
DOI: 10.1145/1553374

Sponsors

  • NSF
  • Microsoft Research
  • MITACS

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 June 2009


Qualifiers

  • Research-article

Conference

ICML '09
Sponsor: Microsoft Research

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%
