Abstract
In this paper we unify divergence minimization and statistical inference by means of convex duality. In the process we prove that maximum a posteriori estimation arises as a special case of the dual of approximate maximum entropy estimation. Moreover, our treatment yields stability and convergence bounds for many statistical learning problems. Finally, we show how an algorithm due to Zhang (2003) can be used to solve this class of optimization problems efficiently.
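To make the announced duality concrete, here is a schematic sketch reconstructed from the abstract alone, not the paper's exact theorem statement; the base measure p_0, feature map \phi, empirical moments \hat{\mu}, norm \|\cdot\|, and slack \varepsilon > 0 are generic placeholders. Approximate maximum entropy estimation relaxes exact moment matching:

\min_{p} \ \mathrm{KL}(p \,\|\, p_0) \quad \text{subject to} \quad \bigl\| \mathbb{E}_{p}[\phi(x)] - \hat{\mu} \bigr\| \le \varepsilon .

Fenchel duality turns this into norm-penalized maximum likelihood in the induced exponential family,

\max_{\lambda} \ \langle \lambda, \hat{\mu} \rangle \;-\; \log \int p_0(x) \, e^{\langle \lambda, \phi(x) \rangle} \, dx \;-\; \varepsilon \, \|\lambda\|_{*} ,

where \|\cdot\|_{*} is the dual norm. Since the penalty \varepsilon \|\lambda\|_{*} equals, up to an additive constant, the negative log of the prior density p(\lambda) \propto e^{-\varepsilon \|\lambda\|_{*}}, the dual problem is exactly maximum a posteriori estimation under that prior.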
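The final sentence of the abstract refers to Zhang's sequential greedy approximation (Zhang 2003, listed in the references below), which minimizes a convex function over the convex hull of a set of candidates by repeatedly moving to a convex combination of the current iterate and a greedily chosen candidate, with an O(1/k) rate for smooth objectives. The following Python sketch specializes it to the probability simplex; it is an illustration under stated assumptions, not the paper's implementation. The names sequential_greedy, f, and x0 are mine, and the coarse grid search over the step size stands in for the exact one-dimensional minimization Zhang analyzes.

import numpy as np

def sequential_greedy(f, x0, steps=50, grid=101):
    # Sequential greedy approximation over the probability simplex:
    # at each step, jointly choose a vertex e_i and a step size alpha
    # in [0, 1] minimizing f((1 - alpha) * x + alpha * e_i).
    x = np.asarray(x0, dtype=float).copy()
    n = x.size
    alphas = np.linspace(0.0, 1.0, grid)
    for _ in range(steps):
        best_val, best_x = f(x), x           # alpha = 0 keeps the iterate
        for i in range(n):
            v = np.zeros(n)
            v[i] = 1.0                       # i-th vertex of the simplex
            for a in alphas[1:]:
                y = (1.0 - a) * x + a * v    # convex combination, stays feasible
                val = f(y)
                if val < best_val:
                    best_val, best_x = val, y
        x = best_x
    return x

# Toy usage: Euclidean projection of a point onto the simplex.
target = np.array([0.6, 0.3, 0.5])
f = lambda x: 0.5 * np.sum((x - target) ** 2)
x_star = sequential_greedy(f, np.ones(3) / 3.0)

Because every update is a convex combination of feasible points, the iterate never leaves the simplex, which is what makes such greedy schemes attractive for density-estimation duals of the kind sketched above.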
References
Altun, Y., Hofmann, T., Smola, A.J.: Exponential families for conditional random fields. In: Uncertainty in Artificial Intelligence (UAI), pp. 2–9 (2004)
Borgwardt, K., Gretton, A., Smola, A.J.: Kernel discrepancy estimation. Technical report, NICTA, Canberra (2006)
Borwein, J., Zhu, Q.J.: Techniques of Variational Analysis. Springer, Heidelberg (2005)
Borwein, J.M.: Semi-infinite programming: How special is it? In: Fiacco, A.V., Kortanek, K.O. (eds.) Semi-Infinite Programming and Applications. Springer, Heidelberg (1983)
Bousquet, O., Boucheron, S., Lugosi, G.: Theory of classification: a survey of recent advances. ESAIM: Probability and Statistics (submitted, 2004)
Bousquet, O., Elisseeff, A.: Stability and generalization. Journal of Machine Learning Research 2, 499–526 (2002)
Chen, S., Rosenfeld, R.: A Gaussian prior for smoothing maximum entropy models. Technical Report CMU-CS-99-108, Carnegie Mellon University (1999)
Collins, M., Schapire, R.E., Singer, Y.: Logistic regression, AdaBoost and Bregman distances. In: COLT 2000, pp. 158–169 (2000)
Cortes, C., Vapnik, V.: Support-vector networks. Machine Learning 20, 273–297 (1995)
Dudík, M., Phillips, S.J., Schapire, R.E.: Performance guarantees for regularized maximum entropy density estimation. In: Shawe-Taylor, J., Singer, Y. (eds.) COLT 2004. LNCS, vol. 3120, pp. 472–486. Springer, Heidelberg (2004)
Dudík, M., Schapire, R.E.: Maximum entropy distribution estimation with generalized regularization. In: Lugosi, G., Simon, H.U. (eds.) COLT 2006. LNCS, vol. 4005, pp. 123–138. Springer, Heidelberg (2006)
Friedlander, M.P., Gupta, M.R.: On minimizing distortion and relative entropy. IEEE Transactions on Information Theory 52(1) (2006)
Kivinen, J., Warmuth, M.: Boosting as entropy projection. In: COLT 1999 (1999)
Lafferty, J.: Additive models, boosting, and inference for generalized divergences. In: COLT 1999, pp. 125–133. ACM Press, New York (1999)
Le, Q.V., Smola, A.J., Canu, S.: Heteroscedastic Gaussian process regression. In: International Conference on Machine Learning (ICML) (2005)
Morozov, V.A.: Methods for Solving Incorrectly Posed Problems. Springer, Heidelberg (1984)
Neal, R.: Priors for infinite networks. Technical report, U. Toronto (1994)
Nemenman, I., Bialek, W.: Occam factors and model independent Bayesian learning of continuous distributions. Physical Review E 65(2), 026137 (2002)
Rätsch, G., Mika, S., Warmuth, M.K.: On the convergence of leveraging. In: Advances in Neural Information Processing Systems (NIPS) (2002)
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
Ruderman, D.L., Bialek, W.: Statistics of natural images: Scaling in the woods. Physical Review Letters 73, 814–817 (1994)
Schölkopf, B., Smola, A.: Learning with Kernels. MIT Press, Cambridge (2002)
Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004)
Tikhonov, A.N., Arsenin, V.Y.: Solutions of Ill-Posed Problems. Wiley, Chichester (1977)
Wainwright, M.J., Jordan, M.I.: Graphical models, exponential families, and variational inference. Technical Report 649, UC Berkeley (September 2003)
Zhang, T.: Sequential greedy approximation for certain convex optimization problems. IEEE Transactions on Information Theory 49(3), 682–691 (2003)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Altun, Y., Smola, A. (2006). Unifying Divergence Minimization and Statistical Inference Via Convex Duality. In: Lugosi, G., Simon, H.U. (eds.) Learning Theory. COLT 2006. Lecture Notes in Computer Science, vol. 4005. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11776420_13
DOI: https://doi.org/10.1007/11776420_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35294-5
Online ISBN: 978-3-540-35296-9