Abstract
Divergences (distances) that measure the similarity or proximity between two probability distributions have proved very useful for a range of tasks in statistics, machine learning, information theory, and related fields. Prominent examples are the Kullback-Leibler information, the Csiszar-Ali-Silvey \(\phi\)-divergences (CASD) for convex functions \(\phi\), the “classical” (i.e., unscaled) Bregman distances, and the more general scaled Bregman distances (SBD) of [26, 27]. By means of 3D plots we show several properties and pitfalls of the geometries of SBDs, also for non-probability distributions; the robustness of the corresponding minimum-distance concepts is also covered. For these investigations, we construct a special SBD subclass which covers both the often-used power divergences (of CASD type) and their robustness-enhanced extensions with non-convex, non-concave \(\phi\).
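As a rough illustration of how the divergence families named in the abstract relate on a finite state space, the following sketch evaluates a scaled Bregman distance in the form \(\sum_x m(x)\,[\phi(p(x)/m(x)) - \phi(q(x)/m(x)) - \phi'(q(x)/m(x))\,(p(x)/m(x) - q(x)/m(x))]\). The function name `scaled_bregman`, the two generators, and the example vectors are illustrative assumptions and are not taken from the chapter.

```python
import math

def scaled_bregman(p, q, m, phi, dphi):
    """Scaled Bregman distance between p and q, scaled by m, on a finite state space.

    Assumes all entries of p, q, m are strictly positive (illustrative sketch only).
    """
    total = 0.0
    for pi, qi, mi in zip(p, q, m):
        s, t = pi / mi, qi / mi
        total += mi * (phi(s) - phi(t) - dphi(t) * (s - t))
    return total

# Assumed generator for the Kullback-Leibler case: phi(t) = t log t + 1 - t.
def phi_kl(t):
    return t * math.log(t) + 1.0 - t

def dphi_kl(t):
    return math.log(t)

# Assumed power-type generator (power 2): phi(t) = (t - 1)^2 / 2.
def phi_sq(t):
    return 0.5 * (t - 1.0) ** 2

def dphi_sq(t):
    return t - 1.0

# Hypothetical probability vectors on a 3-element state space.
p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]
ones = [1.0, 1.0, 1.0]

# Scaling by m = q gives the CASD-type divergence sum_x q * phi(p / q).
kl_direct = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
kl_via_sbd = scaled_bregman(p, q, q, phi_kl, dphi_kl)

# For the power-2 generator, m = q gives half the Pearson chi-square divergence.
half_pearson = scaled_bregman(p, q, q, phi_sq, dphi_sq)

# Scaling by m = 1 gives the classical (unscaled) Bregman distance,
# here half the squared Euclidean distance.
half_sq_euclid = scaled_bregman(p, q, ones, phi_sq, dphi_sq)

print(kl_direct, kl_via_sbd, half_pearson, half_sq_euclid)
```

With the scaling \(m = q\) the sketch reproduces a CASD-type divergence, and with \(m \equiv 1\) it reproduces the unscaled Bregman distance, which is the sense in which SBDs cover both families.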
Notes
1. Which is equal to \(D_{{\check{\phi }}_{1}}\left( P,Q \right) \) with \({\check{\phi }}_{1}(t) := t\log t \in [-\mathrm {e}^{-1}, \infty [\), but generally \(D_{\phi _{1}}\left( \mu ,\nu \right) \ne D_{{\check{\phi }}_{1}}\left( \mu ,\nu \right) \), where the latter can be negative and thus is not a distance.
2. Also notice that the HD together with \(\theta _0 = 0.5\) does not exhibit such an effect for our smaller 3-element state space, due to the lack of outliers.
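The point of the first note can be checked numerically. The sketch below assumes \(\phi _{1}(t) = t\log t + 1 - t\), which is the usual Kullback-Leibler generator but is not defined in this excerpt, and uses hypothetical non-probability vectors \(\mu \) and \(\nu \). The helper name `phi_divergence` is likewise an assumption.

```python
import math

def phi_divergence(mu, nu, phi):
    """CASD-type divergence sum_x nu(x) * phi(mu(x) / nu(x)) for strictly positive vectors."""
    return sum(n * phi(m / n) for m, n in zip(mu, nu))

def phi_1(t):
    # Assumed form of phi_1: nonnegative, and zero only at t = 1.
    return t * math.log(t) + 1.0 - t

def phi_1_check(t):
    # Generator from the note: t log t, whose minimum value is -1/e.
    return t * math.log(t)

# Probability vectors: both generators give the same value.
P = [0.2, 0.5, 0.3]
Q = [0.4, 0.4, 0.2]
d_prob = phi_divergence(P, Q, phi_1)
d_prob_check = phi_divergence(P, Q, phi_1_check)

# Hypothetical non-probability vectors (total masses 0.3 and 0.9).
mu = [0.1, 0.1, 0.1]
nu = [0.3, 0.3, 0.3]
d_general = phi_divergence(mu, nu, phi_1)              # stays nonnegative
d_general_check = phi_divergence(mu, nu, phi_1_check)  # becomes negative

print(d_prob, d_prob_check, d_general, d_general_check)
```

The two generators differ by the term \(1 - t\), which sums to the difference of the total masses, so the values coincide for probability vectors and separate otherwise.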
References
Ali, M.S., Silvey, D.: A general class of coefficients of divergence of one distribution from another. J. Roy. Statist. Soc. B–28, 131–140 (1966)
Basu, A., Lindsay, B.G.: Minimum disparity estimation for continuous models: efficiency, distributions and robustness. Ann. Inst. Statist. Math. 46(4), 683–705 (1994)
Basu, A., Harris, I.R., Hjort, N.L., Jones, M.C.: Robust and efficient estimation by minimising a density power divergence. Biometrika 85, 549–559 (1998)
Basu, A., Shioya, H., Park, C.: Statistical Inference: The Minimum Distance Approach. CRC Press, Boca Raton (2011)
Banerjee, A., Merugu, S., Dhillon, I.S., Ghosh, J.: Clustering with Bregman divergences. J. Mach. Learn. Res. 6, 1705–1749 (2005)
Broniatowski, M.: A weighted bootstrap procedure for divergence minimization problems. In: Antoch, J., Jureckova, J., Maciak, M., Pešta, M. (eds.) AMISTAT 2015, pp. 1–22. Springer, Cham (2017)
Cerone, P., Dragomir, S.S.: Approximation of the integral mean divergence and \(f\)-divergence via mean results. Math. Comp. Model. 42, 207–219 (2005)
Cesa-Bianchi, N., Lugosi, G.: Prediction, Learning & Games. Cambridge UP, New York (2006)
Csiszar, I.: Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Publ. Math. Inst. Hungar. Acad. Sci. A–8, 85–108 (1963)
Csiszar, I., Breuer, T.: Measuring distribution model risk. Math. Finance 26(2), 395–411 (2016)
Collins, M., Schapire, R.E., Singer, Y.: Logistic regression, AdaBoost and Bregman distances. Mach. Learn. 48, 253–285 (2002)
Kißlinger, A.-L., Stummer, W.: Some decision procedures based on scaled Bregman distance surfaces. In: Nielsen, F., Barbaresco, F. (eds.) GSI 2013. LNCS, vol. 8085, pp. 479–486. Springer, Heidelberg (2013). doi:10.1007/978-3-642-40020-9_52
Kißlinger, A.-L., Stummer, W.: New model search for nonlinear recursive models, regressions and autoregressions. In: Nielsen, F., Barbaresco, F. (eds.) GSI 2015. LNCS, vol. 9389, pp. 693–701. Springer, Cham (2015). doi:10.1007/978-3-319-25040-3_74
Kißlinger, A.-L., Stummer, W.: A New Information-Geometric Method of Change Detection. (2015, Preprint)
Kißlinger, A.-L., Stummer, W.: Robust statistical engineering by means of scaled Bregman distances. In: Agostinelli, C., Basu, A., Filzmoser, P., Mukherjee, D. (eds.) Recent Advances in Robust Statistics: Theory and Applications, pp. 81–113. Springer, New Delhi (2016). doi:10.1007/978-81-322-3643-6_5
Liese, F., Vajda, I.: Convex Statistical Distances. Teubner, Leipzig (1987)
Lindsay, B.G.: Efficiency versus robustness: the case for minimum Hellinger distance and related methods. Ann. Statist. 22(2), 1081–1114 (1994)
Murata, N., Takenouchi, T., Kanamori, T., Eguchi, S.: Information geometry of U-boost and Bregman divergence. Neural Comput. 16(7), 1437–1481 (2004)
Nock, R., Menon, A.K., Ong, C.S.: A scaled Bregman theorem with applications. In: Advances in Neural Information Processing Systems 29 (NIPS 2016), pp. 19–27 (2016)
Nock, R., Nielsen, F., Amari, S.-I.: On conformal divergences and their population minimizers. IEEE Trans. Inf. Theory 62(1), 527–538 (2016)
Nock, R., Nielsen, F.: Bregman divergences and surrogates for learning. IEEE Trans. Pattern Anal. Mach. Intell. 31(11), 2048–2059 (2009)
Pardo, L.: Statistical Inference Based on Divergence Measures. Chapman & Hall, Boca Raton (2006)
Pardo, M.C., Vajda, I.: On asymptotic properties of information-theoretic divergences. IEEE Trans. Inf. Theory 49(7), 1860–1868 (2003)
Read, T.R.C., Cressie, N.A.C.: Goodness-of-Fit Statistics for Discrete Multivariate Data. Springer, New York (1988)
Shioya, H., Da-te, T.: A generalisation of Lin divergence and the derivation of a new information divergence measure. Electr. Commun. Japan 78(7), 34–40 (1995)
Stummer, W.: Some Bregman distances between financial diffusion processes. Proc. Appl. Math. Mech. 7(1), 1050503–1050504 (2007)
Stummer, W., Vajda, I.: On Bregman distances and divergences of probability measures. IEEE Trans. Inf. Theory 58(3), 1277–1288 (2012)
Stummer, W., Vajda, I.: On divergences of finite measures and their applicability in statistics and information theory. Statistics 44, 169–187 (2010)
Sugiyama, M., Suzuki, T., Kanamori, T.: Density-ratio matching under the Bregman divergence: a unified framework of density-ratio estimation. Ann. Inst. Stat. Math. 64, 1009–1044 (2012)
Tsuda, K., Rätsch, G., Warmuth, M.: Matrix exponentiated gradient updates for on-line learning and Bregman projection. J. Mach. Learn. Res. 6, 995–1018 (2005)
Wu, L., Hoi, S.C.H., Jin, R., Zhu, J., Yu, N.: Learning Bregman distance functions for semi-supervised clustering. IEEE Trans. Knowl. Data Eng. 24(3), 478–491 (2012)
Acknowledgement
We are grateful to all three referees for their useful suggestions. W. Stummer thanks A.L. Kißlinger for valuable discussions.
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Roensch, B., Stummer, W. (2017). 3D Insights to Some Divergences for Robust Statistics and Machine Learning. In: Nielsen, F., Barbaresco, F. (eds) Geometric Science of Information. GSI 2017. Lecture Notes in Computer Science, vol 10589. Springer, Cham. https://doi.org/10.1007/978-3-319-68445-1_54
DOI: https://doi.org/10.1007/978-3-319-68445-1_54
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68444-4
Online ISBN: 978-3-319-68445-1
eBook Packages: Computer Science (R0)