3D Insights to Some Divergences for Robust Statistics and Machine Learning

Birgit Roensch & Wolfgang Stummer

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10589)

Included in the following conference series:

International Conference on Geometric Science of Information

Abstract

Divergences (distances), which measure the similarity or proximity between two probability distributions, have turned out to be very useful for a variety of tasks in statistics, machine learning, information theory, and related fields. Prominent examples are the Kullback-Leibler information, the Csiszár-Ali-Silvey \(\phi\)-divergences (CASD) for convex functions \(\phi\), the "classical" (i.e., unscaled) Bregman distances, and the more general scaled Bregman distances (SBD) of [26, 27]. By means of 3D plots we show several properties and pitfalls of the geometries of SBDs, also for non-probability distributions; the robustness of the corresponding minimum-distance concepts is covered as well. For these investigations, we construct a special SBD subclass which covers both the often-used power divergences (of CASD type) and their robustness-enhanced extensions with non-convex, non-concave \(\phi\).
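
To make the objects named in the abstract concrete, here is a minimal Python sketch of a scaled Bregman distance on a finite state space. It assumes the discrete form \(B_\phi(P,Q\,|\,M) = \sum_x m(x)\,[\phi(p(x)/m(x)) - \phi(q(x)/m(x)) - \phi'(q(x)/m(x))\,(p(x)/m(x) - q(x)/m(x))]\) for a differentiable convex \(\phi\) and strictly positive masses, together with the normalized power generator \(\phi_\alpha(t) = (t^\alpha - 1 - \alpha(t-1))/(\alpha(\alpha-1))\) and its limits at \(\alpha = 0, 1\); both forms are assumptions of this sketch. The names `phi_power`, `scaled_bregman` and `csd` are hypothetical, and this is not the authors' special SBD subclass, only an illustration of the general definition. Choosing the scaling measure \(M = Q\) should recover the CASD power divergence \(\sum_x q(x)\,\phi_\alpha(p(x)/q(x))\), because \(\phi_\alpha(1) = \phi_\alpha'(1) = 0\).

```python
import math

def phi_power(alpha):
    """Return (phi, dphi) for the assumed normalized power generator
    phi_alpha(t) = (t**alpha - 1 - alpha*(t - 1)) / (alpha*(alpha - 1)),
    with the limiting cases alpha = 1 and alpha = 0. Each phi_alpha is
    strictly convex on t > 0 and satisfies phi(1) = dphi(1) = 0."""
    if alpha == 1:
        return (lambda t: t * math.log(t) - t + 1,
                lambda t: math.log(t))
    if alpha == 0:
        return (lambda t: -math.log(t) + t - 1,
                lambda t: 1 - 1 / t)
    c = alpha * (alpha - 1)
    return (lambda t: (t ** alpha - 1 - alpha * (t - 1)) / c,
            lambda t: (t ** (alpha - 1) - 1) / (alpha - 1))

def scaled_bregman(p, q, m, phi, dphi):
    """Discrete scaled Bregman distance B_phi(P, Q | M) for strictly positive
    masses p, q, m on the same finite state space (they need not sum to 1)."""
    total = 0.0
    for px, qx, mx in zip(p, q, m):
        s, t = px / mx, qx / mx
        # Gap between phi(s) and the tangent of phi at t, weighted by m(x)
        total += mx * (phi(s) - phi(t) - dphi(t) * (s - t))
    return total

def csd(p, q, phi):
    """CASD phi-divergence sum_x q(x) * phi(p(x) / q(x)); q strictly positive."""
    return sum(qx * phi(px / qx) for px, qx in zip(p, q))

P = [0.2, 0.5, 0.3]
Q = [0.4, 0.4, 0.2]
phi, dphi = phi_power(0.5)
print(scaled_bregman(P, Q, Q, phi, dphi), csd(P, Q, phi))  # agree (M = Q)
print(scaled_bregman(P, Q, [1.0, 1.0, 1.0], phi, dphi))   # non-probability scaling
```

With a scaling such as `[1.0, 1.0, 1.0]`, which is not a probability vector, the value changes but stays nonnegative, since each \(\phi_\alpha\) is convex.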

Notes

1.
This is equal to \(D_{\check{\phi}_1}(P,Q)\) with \(\check{\phi}_1(t) := t\log t \in [-\mathrm{e}^{-1}, \infty[\); in general, however, \(D_{\phi_1}(\mu,\nu) \ne D_{\check{\phi}_1}(\mu,\nu)\), and the latter can be negative and is thus not a distance.
2.
Note also that the HD with \(\theta_0 = 0.5\) does not exhibit such an effect on our smaller 3-element state space, due to the absence of outliers.
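
The first note can be checked numerically. The sketch below assumes that \(\phi_1\) denotes the normalized generator \(\phi_1(t) = t\log t - t + 1\) (the \(\alpha = 1\) member of the power family); its definition is not part of this excerpt, so this is an assumption. The hypothetical helper `casd` evaluates \(D_\phi(\mu,\nu) = \sum_x \nu(x)\,\phi(\mu(x)/\nu(x))\) on a finite space. For probability vectors the two generators give the same value, because the extra terms \(-\sum_x \mu(x) + \sum_x \nu(x)\) cancel; for finite measures with different total masses they do not, and \(\check{\phi}_1\) can yield a negative value.

```python
import math

def phi1(t: float) -> float:
    # Assumed normalized generator t*log(t) - t + 1 >= 0, with phi1(1) = 0;
    # its continuous extension at t = 0 is 1.
    return t * math.log(t) - t + 1.0 if t > 0 else 1.0

def phi1_check(t: float) -> float:
    # Unnormalized generator t*log(t), with values in [-1/e, inf).
    return t * math.log(t) if t > 0 else 0.0

def casd(phi, mu, nu):
    """D_phi(mu, nu) = sum_x nu(x) * phi(mu(x) / nu(x)); requires nu(x) > 0."""
    return sum(n * phi(m / n) for m, n in zip(mu, nu))

# Probability vectors: both generators give the Kullback-Leibler information.
P = [0.2, 0.5, 0.3]
Q = [0.4, 0.4, 0.2]
print(casd(phi1, P, Q), casd(phi1_check, P, Q))

# Finite (non-probability) measures with different total masses.
mu = [0.1, 0.1, 0.1]
nu = [1.0, 1.0, 1.0]
print(casd(phi1, mu, nu))        # positive, about 2.009
print(casd(phi1_check, mu, nu))  # negative, about -0.691 (= 0.3 * log 0.1)
```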

References

[1] Ali, S.M., Silvey, S.D.: A general class of coefficients of divergence of one distribution from another. J. Roy. Statist. Soc. B 28, 131–140 (1966)
[2] Basu, A., Lindsay, B.G.: Minimum disparity estimation for continuous models: efficiency, distributions and robustness. Ann. Inst. Statist. Math. 46(4), 683–705 (1994)
[3] Basu, A., Harris, I.R., Hjort, N.L., Jones, M.C.: Robust and efficient estimation by minimising a density power divergence. Biometrika 85, 549–559 (1998)
[4] Basu, A., Shioya, H., Park, C.: Statistical Inference: The Minimum Distance Approach. CRC Press, Boca Raton (2011)
[5] Banerjee, A., Merugu, S., Dhillon, I.S., Ghosh, J.: Clustering with Bregman divergences. J. Mach. Learn. Res. 6, 1705–1749 (2005)
[6] Broniatowski, M.: A weighted bootstrap procedure for divergence minimization problems. In: Antoch, J., Jurečková, J., Maciak, M., Pešta, M. (eds.) AMISTAT 2015, pp. 1–22. Springer, Cham (2017)
[7] Cerone, P., Dragomir, S.S.: Approximation of the integral mean divergence and \(f\)-divergence via mean results. Math. Comp. Model. 42, 207–219 (2005)
[8] Cesa-Bianchi, N., Lugosi, G.: Prediction, Learning, and Games. Cambridge University Press, New York (2006)
[9] Csiszár, I.: Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten [An information-theoretic inequality and its application to the proof of the ergodicity of Markov chains]. Publ. Math. Inst. Hungar. Acad. Sci. A 8, 85–108 (1963)
[10] Csiszár, I., Breuer, T.: Measuring distribution model risk. Math. Finance 26(2), 395–411 (2016)
[11] Collins, M., Schapire, R.E., Singer, Y.: Logistic regression, AdaBoost and Bregman distances. Mach. Learn. 48, 253–285 (2002)
[12] Kißlinger, A.-L., Stummer, W.: Some decision procedures based on scaled Bregman distance surfaces. In: Nielsen, F., Barbaresco, F. (eds.) GSI 2013. LNCS, vol. 8085, pp. 479–486. Springer, Heidelberg (2013). doi:10.1007/978-3-642-40020-9_52
[13] Kißlinger, A.-L., Stummer, W.: New model search for nonlinear recursive models, regressions and autoregressions. In: Nielsen, F., Barbaresco, F. (eds.) GSI 2015. LNCS, vol. 9389, pp. 693–701. Springer, Cham (2015). doi:10.1007/978-3-319-25040-3_74
[14] Kißlinger, A.-L., Stummer, W.: A New Information-Geometric Method of Change Detection. Preprint (2015)
[15] Kißlinger, A.-L., Stummer, W.: Robust statistical engineering by means of scaled Bregman distances. In: Agostinelli, C., Basu, A., Filzmoser, P., Mukherjee, D. (eds.) Recent Advances in Robust Statistics: Theory and Applications, pp. 81–113. Springer, New Delhi (2016). doi:10.1007/978-81-322-3643-6_5
[16] Liese, F., Vajda, I.: Convex Statistical Distances. Teubner, Leipzig (1987)
[17] Lindsay, B.G.: Efficiency versus robustness: the case for minimum Hellinger distance and related methods. Ann. Statist. 22(2), 1081–1114 (1994)
[18] Murata, N., Takenouchi, T., Kanamori, T., Eguchi, S.: Information geometry of U-boost and Bregman divergence. Neural Comput. 16(7), 1437–1481 (2004)
[19] Nock, R., Menon, A.K., Ong, C.S.: A scaled Bregman theorem with applications. In: Advances in Neural Information Processing Systems 29 (NIPS 2016), pp. 19–27 (2016)
[20] Nock, R., Nielsen, F., Amari, S.-I.: On conformal divergences and their population minimizers. IEEE Trans. Inf. Theory 62(1), 527–538 (2016)
[21] Nock, R., Nielsen, F.: Bregman divergences and surrogates for learning. IEEE Trans. Pattern Anal. Mach. Intell. 31(11), 2048–2059 (2009)
[22] Pardo, L.: Statistical Inference Based on Divergence Measures. Chapman & Hall/CRC, Boca Raton (2006)
[23] Pardo, M.C., Vajda, I.: On asymptotic properties of information-theoretic divergences. IEEE Trans. Inf. Theory 49(7), 1860–1868 (2003)
[24] Read, T.R.C., Cressie, N.A.C.: Goodness-of-Fit Statistics for Discrete Multivariate Data. Springer, New York (1988)
[25] Shioya, H., Da-te, T.: A generalisation of Lin divergence and the derivation of a new information divergence measure. Electr. Commun. Japan 78(7), 34–40 (1995)
[26] Stummer, W.: Some Bregman distances between financial diffusion processes. Proc. Appl. Math. Mech. 7(1), 1050503–1050504 (2007)
[27] Stummer, W., Vajda, I.: On Bregman distances and divergences of probability measures. IEEE Trans. Inf. Theory 58(3), 1277–1288 (2012)
[28] Stummer, W., Vajda, I.: On divergences of finite measures and their applicability in statistics and information theory. Statistics 44, 169–187 (2010)
[29] Sugiyama, M., Suzuki, T., Kanamori, T.: Density-ratio matching under the Bregman divergence: a unified framework of density-ratio estimation. Ann. Inst. Stat. Math. 64, 1009–1044 (2012)
[30] Tsuda, K., Rätsch, G., Warmuth, M.: Matrix exponentiated gradient updates for on-line learning and Bregman projection. J. Mach. Learn. Res. 6, 995–1018 (2005)
[31] Wu, L., Hoi, S.C.H., Jin, R., Zhu, J., Yu, N.: Learning Bregman distance functions for semi-supervised clustering. IEEE Trans. Knowl. Data Eng. 24(3), 478–491 (2012)

Acknowledgement

We are grateful to all three referees for their useful suggestions. W. Stummer thanks A.L. Kißlinger for valuable discussions.

Author information

Authors and Affiliations

Department of Mathematics, University of Erlangen–Nürnberg, Cauerstrasse 11, 91058, Erlangen, Germany
Birgit Roensch & Wolfgang Stummer
Affiliated Faculty Member of the School of Business and Economics, University of Erlangen–Nürnberg, Lange Gasse 20, 90403, Nürnberg, Germany
Wolfgang Stummer

Corresponding author

Correspondence to Wolfgang Stummer.

Editor information

Editors and Affiliations

Ecole Polytechnique, Palaiseau, France
Frank Nielsen
Thales Land and Air Systems, Limours, France
Frédéric Barbaresco

About this paper

Cite this paper

Roensch, B., Stummer, W. (2017). 3D Insights to Some Divergences for Robust Statistics and Machine Learning. In: Nielsen, F., Barbaresco, F. (eds) Geometric Science of Information. GSI 2017. Lecture Notes in Computer Science, vol 10589. Springer, Cham. https://doi.org/10.1007/978-3-319-68445-1_54

DOI: https://doi.org/10.1007/978-3-319-68445-1_54
Published: 24 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68444-4
Online ISBN: 978-3-319-68445-1
eBook Packages: Computer Science, Computer Science (R0)
