Abstract
We analyze the Dawid-Rissanen prequential maximum likelihood codes relative to one-parameter exponential family models \({\mathcal M}\). If data are i.i.d. according to an (essentially) arbitraryP, then the redundancy grows at rate \({\frac{1}{2}} {\rm c} {\rm ln} n\). We show that c = σ \(_{\rm 1}^{\rm 2}\)/ σ \(_{\rm 2}^{\rm 2}\), where σ \(_{\rm 1}^{\rm 2}\) is the variance of P, and σ \(_{\rm 2}^{\rm 2}\) is the variance of the distribution \(M^{*} \in {\mathcal M}\) that is closest to P in KL divergence. This shows that prequential codes behave quite differently from other important universal codes such as the 2-part MDL, Shtarkov and Bayes codes, for which c = 1. This behavior is undesirable in an MDL model selection setting.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Azoury, K., Warmuth, M.: Relative loss bounds for on-line density estimation with the exponential family of distributions. Machine Learning 43(3) (2001)
Barron, A., Rissanen, J., Yu, B.: The minimum description length principle in coding and modeling. IEEE Trans. Inf. Theory 44(6), 2743–2760 (1998)
Cesa-Bianchi, N., Lugosi, G.: Worst-case bounds for the logarithmic loss of predictors. Journal of Machine Learning 43(3), 247–264 (2001)
Clarke, B.S., Barron, A.R.: Information-theoretic asymptotics of Bayes methods. IEEE Trans. Inf. Theory IT-36(3), 453–471 (1990)
Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, Chichester (1991)
Dawid, A.P.: Present position and potential developments: Some personal views, statistical theory, the prequential approach. Journal of the Royal Statistical Society, Series A 147(2), 278–292 (1984)
de Rooij, S., Grünwald, P.: An empirical study of MDL model selection with infinite parametric complexity. Available at the CoRR arXiv (2005), at http://xxx.lanl.gov/abs/cs.LG/0501028abs.cs.LG/0501028
Freund, Y.: Predicting a binary sequence almost as well as the optimal biased coin. In: Proc. Ninth Annual Conf. on Comp. Learning Theory, COLT 1996 (1996)
Grünwald, P.: MDL tutorial. In: Grünwald, P., Myung, J., Pitt, M. (eds.) Advances in Minimum Description Length, MIT Press, Cambridge (2005)
Grünwald, P., de Rooij, S.: Asymptotic log–loss of prequential maximum likelihood codes. Available at the CoRR arXiv (2005), at http://xxx.lanl.gov/
Hemerly, E.M., Davis, M.H.A.: Strong consistency of the PLS criterion for order determination of autoregressive processes. Ann. Statist. 17(2), 941–946 (1989)
Kass, R., Vos, P.: Geometric Foundations of Asymptotic Inference. Wiley, Chichester (1997)
Li, L., Yu, B.: Iterated logarithmic expansions of the pathwise code lengths for exponential families. IEEE Trans. Inf. Theory 46(7), 2683–2689 (2000)
Rissanen, J.: Universal coding, information, prediction and estimation. IEEE Trans. Inf. Theory 30, 629–636 (1984)
Rissanen, J.: A predictive least squares principle. IMA Journal of Mathematical Control and Information 3, 211–222 (1986)
Rissanen, J.: Stochastic Complexity in Statistical Inquiry. World Scientific, Singapore (1989)
Wei, C.Z.: On predictive least squares principles. Ann. Statist 20(1), 1–42 (1990)
Whittle, P.: Bounds for the moments of linear and quadratic forms in independent variables. Theory of Probability and its Applications (3) (1960)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Grünwald, P., de Rooij, S. (2005). Asymptotic Log-Loss of Prequential Maximum Likelihood Codes. In: Auer, P., Meir, R. (eds) Learning Theory. COLT 2005. Lecture Notes in Computer Science(), vol 3559. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11503415_44
Download citation
DOI: https://doi.org/10.1007/11503415_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26556-6
Online ISBN: 978-3-540-31892-7
eBook Packages: Computer ScienceComputer Science (R0)