DOI: 10.5555/3009055.3009114
Article

Efficient Bayesian parameter estimation in large discrete domains

Published: 01 December 1998

Abstract

We examine the problem of estimating the parameters of a multinomial distribution over a large number of discrete outcomes, most of which do not appear in the training data. We analyze this problem from a Bayesian perspective and develop a hierarchical prior that incorporates the assumption that the observed outcomes constitute only a small subset of the possible outcomes. We show how to efficiently perform exact inference with this form of hierarchical prior and compare it to standard approaches.
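The contrast the abstract draws can be made concrete with a small sketch. The code below is not the paper's inference procedure; it compares a standard Dirichlet (Laplace-style) posterior mean taken over the full domain, whose pseudo-counts dilute probability across the many never-observed outcomes, against an illustrative two-level estimator that smooths only within the observed outcomes and reserves a small Witten-Bell-style escape mass for the rest. The function names and the escape heuristic are assumptions for illustration.

```python
from collections import Counter

def laplace_estimate(counts, domain_size, alpha=1.0):
    # Dirichlet(alpha) posterior mean over the FULL domain: when the
    # domain is huge, the alpha * domain_size pseudo-counts swamp the data.
    n = sum(counts.values())
    denom = n + alpha * domain_size
    observed = {x: (c + alpha) / denom for x, c in counts.items()}
    p_unseen = alpha / denom  # probability of each single unseen outcome
    return observed, p_unseen

def two_level_estimate(counts, domain_size, alpha=1.0):
    # Illustrative two-level estimator (NOT the paper's exact method):
    # treat the observed outcomes as the likely support, smooth within it,
    # and reserve a Witten-Bell-style escape mass k/(n+k) for the rest.
    n = sum(counts.values())
    k = len(counts)                        # distinct observed outcomes
    escape = k / (n + k)                   # mass reserved for unseen outcomes
    denom = n + alpha * k
    observed = {x: (1 - escape) * (c + alpha) / denom
                for x, c in counts.items()}
    p_unseen = escape / (domain_size - k)  # shared among unseen outcomes
    return observed, p_unseen

# Two outcomes observed out of a domain of a million possibilities.
counts = Counter({"a": 5, "b": 3})
N = 10**6
obs_laplace, unseen_laplace = laplace_estimate(counts, N)
obs_two, unseen_two = two_level_estimate(counts, N)
print(obs_laplace["a"], obs_two["a"])  # the two-level estimate is far larger
```

The paper itself goes further: rather than fixing the support to the observed outcomes, it places a hierarchical prior over the unknown set of feasible outcomes and performs exact Bayesian marginalization over it; the sketch above only mimics that effect with a fixed escape heuristic.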


Cited By

  • (2017) Fast and Highly Scalable Bayesian MDP on a GPU Platform. Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, pp. 158-167. DOI: 10.1145/3107411.3107440. Online publication date: 20-Aug-2017.

Published In

NIPS'98: Proceedings of the 12th International Conference on Neural Information Processing Systems
December 1998
1080 pages

Publisher

MIT Press, Cambridge, MA, United States
