Matrix versions of the Hellinger distance

Rajendra Bhatia¹,
Stephane Gaubert² &
Tanvi Jain³


A Correction to this article was published on 11 October 2019

This article has been updated

Abstract

On the space of positive definite matrices, we consider distance functions of the form $d(A,B)=\left[ \mathrm{tr}\mathcal {A}(A,B)-\mathrm{tr}\mathcal {G}(A,B)\right] ^{1/2},$ where $\mathcal {A}(A,B)=\frac{A+B}{2}$ is the arithmetic mean and $\mathcal {G}(A,B)$ is one of the different versions of the geometric mean. When $\mathcal {G}(A,B)=A^{1/2}B^{1/2}$ this distance is $\frac{1}{\sqrt{2}}\Vert A^{1/2}-B^{1/2}\Vert _2,$ and when $\mathcal {G}(A,B)=(A^{1/2}BA^{1/2})^{1/2}$ it is, up to the same factor $\frac{1}{\sqrt{2}}$, the Bures–Wasserstein metric. We study two other cases: $\mathcal {G}(A,B)=A^{1/2}(A^{-1/2}BA^{-1/2})^{1/2}A^{1/2},$ the Pusz–Woronowicz geometric mean, and $\mathcal {G}(A,B)=\exp \big (\frac{\log A+\log B}{2}\big ),$ the log Euclidean mean. With these choices, $d(A,B)$ is no longer a metric, but it turns out that $d^2(A,B)$ is a divergence. We establish some (strict) convexity properties of these divergences. We obtain characterisations of barycentres of $m$ positive definite matrices with respect to these distance measures. One of these leads to a new interpretation of a power mean introduced by Lim and Palfia, as a barycentre. The other uncovers interesting relations between the log Euclidean mean and relative entropy.
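The four choices of $\mathcal {G}$ can be compared numerically. The following is a minimal NumPy sketch, not code from the paper: it assumes real symmetric positive definite inputs, computes every matrix function through an eigendecomposition, and uses illustrative function names (`hfun`, `geometric_means`, `d2`, `random_pd`) that are our own. It evaluates $d^2(A,B)=\mathrm{tr}\frac{A+B}{2}-\mathrm{tr}\mathcal{G}(A,B)$ for each mean and checks the identity $d^2=\frac12\Vert A^{1/2}-B^{1/2}\Vert_2^2$ in the first case.

```python
import numpy as np

def hfun(M, f):
    """Apply a scalar function f to a Hermitian matrix M via its eigendecomposition."""
    M = (M + M.conj().T) / 2                    # symmetrise to remove rounding noise
    w, V = np.linalg.eigh(M)
    return (V * f(w)) @ V.conj().T

def geometric_means(A, B):
    """The four choices of G(A, B) named in the abstract."""
    Ah = hfun(A, np.sqrt)                       # A^{1/2}
    Aih = hfun(A, lambda w: 1.0 / np.sqrt(w))   # A^{-1/2}
    return {
        "A^1/2 B^1/2": Ah @ hfun(B, np.sqrt),
        "Bures-Wasserstein": hfun(Ah @ B @ Ah, np.sqrt),
        "Pusz-Woronowicz": Ah @ hfun(Aih @ B @ Aih, np.sqrt) @ Ah,
        "log-Euclidean": hfun((hfun(A, np.log) + hfun(B, np.log)) / 2, np.exp),
    }

def d2(A, B, G):
    """Squared distance tr (A+B)/2 - tr G(A, B)."""
    return float(np.real(np.trace((A + B) / 2) - np.trace(G)))

def random_pd(n, rng):
    X = rng.standard_normal((n, n))
    return X @ X.T + n * np.eye(n)              # well-conditioned positive definite matrix

rng = np.random.default_rng(0)
A, B = random_pd(4, rng), random_pd(4, rng)
vals = {name: d2(A, B, G) for name, G in geometric_means(A, B).items()}
for name, v in vals.items():
    print(f"{name:>18}: d^2 = {v:.6f}")

# First case: d^2 = ||A^{1/2} - B^{1/2}||_2^2 / 2 (Frobenius norm).
half_norm_sq = 0.5 * np.linalg.norm(hfun(A, np.sqrt) - hfun(B, np.sqrt)) ** 2
print(np.isclose(vals["A^1/2 B^1/2"], half_norm_sq))
```

All four values of $d^2$ come out nonnegative, and each vanishes when $A=B$, as expected of a divergence; the eigendecomposition route is chosen so that every intermediate matrix function is applied only to Hermitian arguments.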


Change history

11 October 2019
Theorem 9 in our paper [1] is wrong. The statement should be replaced by the following.

References

1. Abatzoglou, T.J.: Norm derivatives on spaces of operators. Math. Ann. 239, 129–135 (1979)
2. Agueh, M., Carlier, G.: Barycenters in the Wasserstein space. SIAM J. Math. Anal. 43, 904–924 (2011)
3. Aiken, J.G., Erdos, J.A., Goldstein, J.A.: Unitary approximation of positive operators. Illinois J. Math. 24, 61–72 (1980)
4. Amari, S.: Information Geometry and its Applications. Springer, Tokyo (2016)
5. Ando, T.: Concavity of certain maps on positive definite matrices and applications to Hadamard products. Linear Algebra Appl. 26, 203–241 (1979)
6. Ando, T., Li, C.-K., Mathias, R.: Geometric means. Linear Algebra Appl. 385, 305–334 (2004)
7. Arsigny, V., Fillard, P., Pennec, X., Ayache, N.: Geometric means in a novel vector space structure on symmetric positive-definite matrices. SIAM J. Matrix Anal. Appl. 29, 328–347 (2007)
8. Banerjee, A., Merugu, S., Dhillon, I.S., Ghosh, J.: Clustering with Bregman divergences. J. Mach. Learn. Res. 6, 1705–1749 (2005)
9. Barbaresco, F.: Innovative tools for radar signal processing based on Cartan’s geometry of SPD matrices and information geometry. In: IEEE Radar Conference, Rome (2008)
10. Bauschke, H.H., Borwein, J.M.: Legendre functions and the method of random Bregman projections. J. Convex Anal. 4, 27–67 (1997)
11. Bauschke, H.H., Borwein, J.M.: Joint and separate convexity of the Bregman distance. Stud. Comput. Math. 8, 23–36 (2001)
12. Bengtsson, I., Zyczkowski, K.: Geometry of Quantum States: An Introduction to Quantum Entanglement. Cambridge University Press, Cambridge (2006)
13. Bhagwat, K.V., Subramanian, R.: Inequalities between means of positive operators. Math. Proc. Camb. Philos. Soc. 83, 393–401 (1978)
14. Bhatia, R.: Matrix Analysis. Springer, New York (1997)
15. Bhatia, R.: Positive Definite Matrices. Princeton University Press, Princeton (2007)
16. Bhatia, R.: The Riemannian mean of positive matrices. In: Nielsen, F., Bhatia, R. (eds.) Matrix Information Geometry, pp. 35–51. Springer, Berlin (2013)
17. Bhatia, R., Grover, P.: Norm inequalities related to the matrix geometric mean. Linear Algebra Appl. 437, 726–733 (2012)
18. Bhatia, R., Jain, T., Lim, Y.: On the Bures-Wasserstein distance between positive definite matrices. Expos. Math. (2018). https://doi.org/10.1016/j.exmath.2018.01.002
19. Bhatia, R., Jain, T., Lim, Y.: Strong convexity of sandwiched entropies and related optimization problems. Rev. Math. Phys. 30, 1850014 (2018)
20. Carlen, E.A., Lieb, E.H.: A Minkowski type trace inequality and strong subadditivity of quantum entropy. Adv. Math. Sci. AMS Transl. 180, 59–68 (1999)
21. Carlen, E.A., Lieb, E.H.: A Minkowski type trace inequality and strong subadditivity of quantum entropy. II. Convexity and concavity. Lett. Math. Phys. 83, 107–126 (2008)
22. Chebbi, Z., Moakher, M.: Means of Hermitian positive-definite matrices based on the log-determinant $\alpha $-divergence function. Linear Algebra Appl. 436, 1872–1889 (2012)
23. Dhillon, I.S., Tropp, J.A.: Matrix nearness problems with Bregman divergences. SIAM J. Matrix Anal. Appl. 29, 1120–1146 (2004)
24. Fletcher, P., Joshi, S.: Riemannian geometry for the statistical analysis of diffusion tensor data. Signal Process. 87, 250–262 (2007)
25. Hiai, F., Mosonyi, M., Petz, D., Beny, C.: Quantum f-divergences and error correction. Rev. Math. Phys. 23, 691–747 (2011)
26. Jencová, A.: Geodesic distances on density matrices. J. Math. Phys. 45, 1787–1794 (2004)
27. Jencova, A., Ruskai, M.B.: A unified treatment of convexity of relative entropy and related trace functions with conditions for equality. Rev. Math. Phys. 22, 1099–1121 (2010)
28. Lim, Y., Palfia, M.: Matrix power means and the Karcher mean. J. Funct. Anal. 262, 1498–1514 (2012)
29. Modin, K.: Geometry of matrix decompositions seen through optimal transport and information geometry. J. Geom. Mech. 9, 335–390 (2017)
30. Nielsen, F., Bhatia, R. (eds.): Matrix Information Geometry. Springer, Berlin (2013)
31. Nielsen, F., Boltz, S.: The Burbea-Rao and Bhattacharyya centroids. IEEE Trans. Inf. Theory 57, 5455–5466 (2011)
32. Pitrik, J., Virosztek, D.: On the joint convexity of the Bregman divergence of matrices. Lett. Math. Phys. 105, 675–692 (2015)
33. Pusz, W., Woronowicz, S.L.: Functional calculus for sesquilinear forms and the purification map. Rep. Math. Phys. 8, 159–170 (1975)
34. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
35. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, Berlin (1998)
36. Sra, S.: Positive definite matrices and the $S$-divergence. Proc. Am. Math. Soc. 144, 2787–2797 (2016)
37. Takatsu, A.: Wasserstein geometry of Gaussian measures. Osaka J. Math. 48, 1005–1026 (2011)


Acknowledgements

The authors thank F. Hiai and S. Sra for helpful comments and references, and the anonymous referee for a careful reading of the manuscript. The first author is grateful to INRIA and Ecole polytechnique, Palaiseau for visits that facilitated this work, and to CSIR (India) for the award of a Bhatnagar Fellowship.

Author information

Authors and Affiliations

Ashoka University, Sonepat, Haryana, 131029, India
Rajendra Bhatia
INRIA and CMAP, Ecole Polytechnique, CNRS, 91128, Palaiseau, France
Stephane Gaubert
Indian Statistical Institute, New Delhi, 110016, India
Tanvi Jain


Corresponding author

Correspondence to Tanvi Jain.

Appendices

Appendix A. Proof of Lemma 5

We adapt the proof of Theorem 3.12 in [10], which deals with a related problem (the minimisation of $\Phi $ over a closed convex set).

Since $\varphi $ is of Legendre type, Theorem 3.7(iii) of [10] shows that for all $a\in {\text {int}}{\text {dom}}\varphi $, the map $x\mapsto \Phi (x,a)$ is coercive, meaning that $\lim _{\Vert x\Vert \rightarrow \infty } \Phi (x,a)=+\infty $. A sum of coercive functions is coercive, and so the map

$$\begin{aligned} \Psi (x):= \sum _{j=1}^m\frac{1}{m}\Phi (x,a_j) \end{aligned}$$

is coercive. The infimum of a coercive lower-semicontinuous function on a closed non-empty set is attained, so there is an element $\bar{x}\in {\text {clo}}{\text {int}}{\text {dom}}\varphi $ such that $\inf _{x\in {\text {clo}}{\text {int}}{\text {dom}}\varphi } \Psi (x)=\Psi (\bar{x})<+\infty $. Suppose that $\bar{x}$ belongs to the boundary of ${\text {int}}{\text {dom}}\varphi $. Let us fix an arbitrary $z\in {\text {int}}{\text {dom}}\varphi $, and let $g(t) :=\Psi ((1-t)\bar{x}+t z)$, defined for $t\in [0,1)$. We have

$$\begin{aligned} g'(t)=\langle \nabla \varphi ((1-t)\bar{x}+t z)-\sum _{j=1}^m \frac{1}{m} \nabla \varphi (a_j), z-\bar{x}\rangle . \end{aligned}$$

Using property (iv) of the definition of Legendre type functions, we obtain that $\lim _{t\rightarrow 0^+}g'(t)=-\infty $, which entails that $g(t)<g(0)=\Psi ({\bar{x}})$ for $t$ small enough. Since $(1-t)\bar{x}+t z\in {\text {int}}{\text {dom}}\varphi $ for all $t\in (0,1)$, this contradicts the optimality of ${\bar{x}}$. So ${\bar{x}}\in {\text {int}}{\text {dom}}\varphi $, which proves Lemma 5.
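The mechanism of the proof can be watched on a concrete Legendre function. The sketch below is an illustration of our own, not taken from the paper: it assumes the Boltzmann–Shannon entropy $\varphi(x)=\sum_i (x_i\log x_i-x_i)$ on $\mathbb{R}^2_+$, a standard function of Legendre type with $\nabla\varphi(x)=\log x$, and three hypothetical points $a_j$. It evaluates the displayed formula for $g'(t)$ at a boundary point $\bar x$, shows that $g'(t)$ becomes arbitrarily negative as $t\to 0^+$, checks the formula by finite differences, and confirms that the minimiser, which solves $\nabla\varphi(x)=\frac1m\sum_j\nabla\varphi(a_j)$, is the componentwise geometric mean, an interior point.

```python
import numpy as np

def phi(x):
    """Boltzmann-Shannon entropy sum(x log x - x) on R^n_+, with 0 log 0 := 0."""
    x = np.asarray(x, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        xlogx = np.where(x > 0, x * np.log(x), 0.0)
    return float(np.sum(xlogx - x))

def grad_phi(x):
    return np.log(x)                      # finite only on the interior x > 0

def bregman(x, a):
    """Phi(x, a) = phi(x) - phi(a) - <grad phi(a), x - a>, for a in the interior."""
    return phi(x) - phi(a) - grad_phi(a) @ (x - a)

def Psi(x, points):
    return float(np.mean([bregman(x, a) for a in points]))

def g_prime(t, xbar, z, points):
    """g'(t) for g(t) = Psi((1-t) xbar + t z), following the displayed formula."""
    y = (1 - t) * xbar + t * z
    mean_grad = np.mean([grad_phi(a) for a in points], axis=0)
    return float((grad_phi(y) - mean_grad) @ (z - xbar))

points = [np.array([1.0, 4.0]), np.array([4.0, 1.0]), np.array([2.0, 2.0])]  # hypothetical a_j
xbar = np.array([0.0, 1.5])               # hypothetical boundary point (first coordinate 0)
z = np.array([1.0, 1.0])                  # an interior point

# g'(t) decreases without bound as t -> 0+.
ts = [1e-2, 1e-4, 1e-8, 1e-12]
slopes = [g_prime(t, xbar, z, points) for t in ts]
print(slopes)

# Finite-difference check of the derivative formula at an interior value of t.
g = lambda t: Psi((1 - t) * xbar + t * z, points)
h = 1e-6
fd = (g(0.3 + h) - g(0.3 - h)) / (2 * h)

# The minimiser solves log x = mean_j log a_j: the componentwise geometric mean.
xstar = np.exp(np.mean([np.log(a) for a in points], axis=0))
print(xstar)
```

Moving from $\bar x$ a little towards $z$ strictly decreases $\Psi$, exactly as in the argument above, so no boundary point can be optimal for this $\varphi$.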

Appendix B. Examples

In the last statement of Theorem 6, dealing with tracial convex functions, we required $\varphi $ to be differentiable and strictly convex on $\mathbb {P}$. In the second statement, dealing with the non-tracial case, we made a stronger assumption, requiring $\varphi $ to be of Legendre type. We now give an example showing that the Legendre condition cannot be dispensed with. To this end, it is convenient first to construct an example showing that Lemma 5 is tight.

B.1 Need for the Legendre condition in Lemma 5

Let us fix $N>3$, let $e=(1,1)^\top \in \mathbb {R}^2$,

$$\begin{aligned} L=\left( \begin{array}{cc} N-1 & -2 \\ -2 & N-1 \end{array}\right) \end{aligned}\tag{59}$$

and consider the affine transformation $g(x)=e+Lx$. Let $ a = (N,0)^\top $, $ b = (0 , N)^\top $, and

$$\begin{aligned}&{\bar{a}}:= g^{-1}( a)= \frac{1}{N^2-2N-3}\left( \begin{array}{c} N^2-2N-1 \\ N-1 \end{array} \right) , \\&{\bar{b}}:=g^{-1}(b)= \frac{1}{N^2-2N-3} \left( \begin{array}{c}N-1 \\ N^2-2N-1 \end{array} \right) . \end{aligned}$$

Observe that ${\bar{a}}, {\bar{b}}\in \mathbb {R}_{++}^2$ since $N>3$.

Consider now, for $p>1$, the map $\varphi (x):=\Vert x\Vert _p^p=|x_1|^p+|x_2|^p$ defined on $\mathbb {R}^2$ and ${\bar{\varphi }}(x)=\varphi (g(x))$. Observe that $\varphi $ is strictly convex and differentiable. Let ${\bar{\Phi }}$ denote the Bregman divergence associated with ${\bar{\varphi }}$, and let ${\bar{\Psi }}(x):= \frac{1}{2}({\bar{\Phi }}(x,{\bar{a}})+{\bar{\Phi }}(x,{\bar{b}}))$. We claim that 0 is the unique point of minimum of ${\bar{\Psi }}$ over $\mathbb {R}_{+}^2$. Indeed,

$$\begin{aligned} \nabla {\bar{\Psi }}(x)&=L^\top (\nabla \varphi (g(x)))-\frac{1}{2} \Big (L^\top (\nabla \varphi ( a))+ L^\top (\nabla \varphi ( b))\Big ), \end{aligned}$$

from which we obtain

$$\begin{aligned} \nabla {\bar{\Psi }}(0)&= L(p(1 -N^{p-1}/2) e)=(N-3)p(1-N^{p-1}/2)e. \end{aligned}$$

It follows that $\nabla {\bar{\Psi }}(0)\in \mathbb {R}_{++}^2$ if $p>1$ is chosen close enough to 1 that $1-N^{p-1}/2>0$, that is, $1<p<1+\log 2/\log N$. Then, since ${\bar{\Psi }}$ is convex, we have

$$\begin{aligned} {\bar{\Psi }}(x)-{\bar{\Psi }}(0)\geqslant \langle \nabla {\bar{\Psi }}(0),x\rangle >0, \quad \text { for all } x\in \mathbb {R}_{+}^2{\setminus }\{0\} \end{aligned}\tag{60}$$

showing the claim.

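Consider now the modification $\hat{\varphi }$ of $\bar{\varphi }$ defined by $\hat{\varphi }(x)=\bar{\varphi }(x)$ for $x\in \mathbb {R}_{+}^2$ and $\hat{\varphi }(x)=+\infty $ otherwise. The function $\hat{\varphi }$ is strictly convex, lower-semicontinuous, and differentiable on the interior of its domain, but it is not of Legendre type: since $\bar{\varphi }$ is differentiable on all of $\mathbb {R}^2$, its gradient remains bounded as $x$ approaches the boundary point 0. Accordingly, the conclusion of Lemma 5 fails for $\hat{\varphi }$: by the claim above, the minimum of the corresponding sum of Bregman divergences is attained only at the boundary point 0.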
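The claim can be checked numerically. This NumPy sketch assumes the sample values $N=5$ and $p=1.2$ (the latter matches the figure; both satisfy $N>3$ and $N^{p-1}<2$), and all function names are hypothetical. It recomputes $\bar a,\bar b$, compares $\nabla\bar\Psi(0)$ with $(N-3)p(1-N^{p-1}/2)e$, samples the closed quadrant to see that $\bar\Psi$ exceeds $\bar\Psi(0)$ there, and locates the unconstrained minimiser $\bar u$, for which $g(\bar u)=c\,e$ with $c=(N^{p-1}/2)^{1/(p-1)}$.

```python
import numpy as np

N, p = 5.0, 1.2          # assumed sample values: N > 3 and 1 < p < 1 + log 2 / log N
e = np.ones(2)
L = np.array([[N - 1, -2.0], [-2.0, N - 1]])

def g(x):
    return e + L @ x                          # affine change of variables

def phi(y):
    return float(np.sum(np.abs(y) ** p))      # ||y||_p^p

def grad_phi(y):
    # Entries with |y_i| < 1e-12 are set to 0 so that rounding noise in g(abar)
    # is not amplified by |y|^(p-1) (a numerical safeguard, not part of the example).
    y = np.where(np.abs(y) < 1e-12, 0.0, y)
    return p * np.sign(y) * np.abs(y) ** (p - 1)

def phibar(x):
    return phi(g(x))

def grad_phibar(x):
    return L.T @ grad_phi(g(x))               # chain rule for phi o g

def bregman_bar(x, c):
    return phibar(x) - phibar(c) - grad_phibar(c) @ (x - c)

a, b = np.array([N, 0.0]), np.array([0.0, N])
abar = np.linalg.solve(L, a - e)              # g^{-1}(a)
bbar = np.linalg.solve(L, b - e)              # g^{-1}(b)

def Psibar(x):
    return 0.5 * (bregman_bar(x, abar) + bregman_bar(x, bbar))

zero = np.zeros(2)
grad0 = grad_phibar(zero) - 0.5 * (grad_phibar(abar) + grad_phibar(bbar))
print(grad0, (N - 3) * p * (1 - N ** (p - 1) / 2))

# Psibar exceeds Psibar(0) at random points of the closed quadrant.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 2.0, size=(1000, 2))
print(all(Psibar(x) > Psibar(zero) for x in X))

# Unconstrained minimiser: grad phibar(u_bar) equals the average of the two gradients.
c = (N ** (p - 1) / 2) ** (1 / (p - 1))       # g(u_bar) = c * e with 0 < c < 1
u_bar = np.linalg.solve(L, (c - 1) * e)
print(u_bar)                                  # both coordinates negative: outside R_+^2
```

Since $c<1$, the unconstrained minimiser $\bar u=\frac{c-1}{N-3}\,e$ lies outside $\mathbb{R}^2_+$, which is why the constrained minimum is pushed to the corner 0.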

The geometric intuition leading to this example is described in the figure.

The example illustrated. The point u is the unconstrained minimum of the sum of Bregman divergences $\Psi (x):=\Phi (x,a)+\Phi (x,b)$ associated with $\varphi (x)=x_1^p+x_2^p$, here with $p=1.2$. Level curves of $\Psi $ are shown. The minimum of $\Psi $ on the simplicial cone $C=g(\mathbb {R}_{+}^2)$ is attained at its apex, the vector $e=(1,1)^\top $. The affine change of variables $g^{-1}$, which sends C to the standard quadrant, followed by a lift to the cone of positive semidefinite matrices, leads to Proposition 11.

B.2 Need for the Legendre condition in Theorem 6

We next construct an example showing that the Legendre condition in the second statement of Theorem 6 cannot be dispensed with. Observe that the inverse of the linear operator L in (59) is given by

$$\begin{aligned} L^{-1} = \frac{1}{N^2-2N-3}\left( \begin{array}{c@{\quad }c} N-1 &{} 2 \\ 2 &{} N-1 \end{array}\right) . \end{aligned}$$

In particular, it is a nonnegative matrix.
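For the reader checking the scalar computations, the inverse formula and the two eigenvector identities used with it follow directly from the determinant (a short worked derivation, added for convenience):

$$\begin{aligned} \det L&=(N-1)^2-4=N^2-2N-3=(N-3)(N+1)>0 \quad (N>3),\\ L^{-1}&=\frac{1}{\det L}\left( \begin{array}{cc} N-1 & 2\\ 2 & N-1\end{array}\right) ,\\ Le&=(N-1-2)\,e=(N-3)\,e, \qquad L^{-1}e=\frac{N+1}{(N-3)(N+1)}\,e=\frac{1}{N-3}\,e. \end{aligned}$$

The identity $Le=(N-3)e$ is what turns $L\big (p(1-N^{p-1}/2)e\big )$ into $(N-3)p(1-N^{p-1}/2)e$ in the computation of $\nabla {\bar{\Psi }}(0)$, and all entries of $L^{-1}$ are positive because $N>3$.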

We set $\tau =\left( {\begin{matrix}0&{}1\\ 1&{} 0\end{matrix}}\right) $, and consider the “quantum” analogue of L, i.e.

$$\begin{aligned} T(X)=(N-1)X- 2 \tau X \tau . \end{aligned}$$

Then,

$$\begin{aligned} T^{-1}(X)= \frac{1}{N^2-2N-3}\big ((N-1)X+ 2\tau X\tau \big ) \end{aligned}$$

is a completely positive map leaving $\mathbb {P}$ invariant. The analogue of the map g is

$$\begin{aligned} G(X) =I + T(X) \end{aligned}$$

where I denotes the identity matrix.
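The properties of $T$, $T^{-1}$ and $G$ claimed above can be tested in a few lines. The sketch below assumes the sample value $N=5$ and uses hypothetical helper names; complete positivity is checked through the Choi matrix $\sum_{ij}E_{ij}\otimes \Phi(E_{ij})$, which is positive semidefinite exactly when a linear map $\Phi$ is completely positive. It also confirms that $G$ sends $\mathrm{diag}(\bar a)$ to $\mathrm{diag}(a)=\mathrm{diag}(N,0)$, mirroring $g(\bar a)=a$.

```python
import numpy as np

N = 5.0                                   # assumed sample value; any N > 3 works
D = N**2 - 2*N - 3                        # (N-3)(N+1) > 0
tau = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)

def T(X):
    """Quantum analogue of L: T(X) = (N-1) X - 2 tau X tau."""
    return (N - 1) * X - 2 * tau @ X @ tau

def T_inv(X):
    """Claimed inverse: ((N-1) X + 2 tau X tau) / (N^2 - 2N - 3)."""
    return ((N - 1) * X + 2 * tau @ X @ tau) / D

def G(X):
    return I2 + T(X)

def choi(Phi, n=2):
    """Choi matrix sum_ij E_ij (x) Phi(E_ij); Phi is completely positive iff it is PSD."""
    C = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            E = np.zeros((n, n))
            E[i, j] = 1.0
            C += np.kron(E, Phi(E))
    return C

abar = np.array([N**2 - 2*N - 1, N - 1]) / D      # the vector abar of the scalar example
L = np.array([[N - 1, -2.0], [-2.0, N - 1]])

print(np.linalg.eigvalsh(choi(T_inv)).min())  # nonnegative up to rounding: T^{-1} is CP
print(np.linalg.eigvalsh(choi(T)).min())      # negative: T itself is not even positive
print(G(np.diag(abar)))                       # diag(N, 0), i.e. diag(a)
print(np.allclose(T(np.diag(abar)), np.diag(L @ abar)))  # T acts as L on diagonals
```

On diagonal matrices $T$ reduces to $L$, which is why the scalar example lifts to the matrix setting.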

We now consider the map $\varphi (X):= \Vert X\Vert _p^p={\text {tr}}(|X|^p)$ defined on the space of Hermitian matrices. The function $\varphi $ is differentiable and strictly convex, still assuming that $p>1$. We set ${\bar{A}}:= {\text {diag}}({\bar{a}})\in \mathbb {P}$, ${\bar{B}}:={\text {diag}}({\bar{b}})\in \mathbb {P}$, and now define ${\bar{\Phi }}$ to be the Bregman divergence associated with ${\bar{\varphi }}:= \varphi \circ G$. Let

$$\begin{aligned} {\bar{\Psi }}(X):= \frac{1}{2}\Big ( {\bar{\Phi }}(X,{\bar{A}})+ {\bar{\Phi }}(X,{\bar{B}}) \Big ). \end{aligned}$$

We then have the following result.

Proposition 11

The minimum of the function ${\bar{\Psi }}$ on the closure of $\mathbb {P}$ is attained at the point 0. Moreover, the equation

$$\begin{aligned} \nabla \bar{\varphi }(X)=\frac{1}{2}(\nabla \bar{\varphi }(\bar{A}) + \nabla \bar{\varphi }(\bar{B})) \end{aligned}\tag{61}$$

has no solution X in $\mathbb {P}$. $\square $

Proof

From [3] (Theorem 2.1) or [1] (Theorem 2.3), we have

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t}\Big |_{t=0} {\text {tr}}|X+tY|^p = p\, {\text {Re}}\, {\text {tr}}\, |X|^{p-1}U^*Y \end{aligned}$$

where $X=U|X|$ is the polar decomposition of X. In particular, if X is diagonal and positive semidefinite,

$$\begin{aligned} \nabla \varphi (X) = pX^{p-1}. \end{aligned}$$

Then, by a computation similar to the one in the scalar case above, we obtain

$$\begin{aligned} \nabla {\bar{\Psi }}(0) = (N-3)p(1-N^{p-1}/2)I \in \mathbb {P}. \end{aligned}$$

We conclude, as in (60), that

$$\begin{aligned} {\bar{\Psi }}(X)-{\bar{\Psi }}(0)\geqslant \langle \nabla {\bar{\Psi }}(0),X\rangle >0, \quad \text { for all } X\in {\text {clo}}\mathbb {P}{\setminus }\{0\}, \end{aligned}$$

where now $\langle \cdot ,\cdot \rangle $ is the Frobenius scalar product on the space of Hermitian matrices. It follows that 0 is the unique point of minimum of ${\bar{\Psi }}$ on ${\text {clo}}\mathbb {P}$.

Moreover, if Eq. (61) had a solution $X\in \mathbb {P}$, then $\nabla {\bar{\Psi }}(X)=0$, so the first-order optimality condition for the minimisation of the convex function ${\bar{\Psi }}$ would be satisfied at $X$. This would give ${\bar{\Psi }}(Y)\geqslant {\bar{\Psi }}(X)$ for all $Y\in \mathbb {P}$ and hence, by continuity and density of $\mathbb {P}$ in ${\text {clo}}\mathbb {P}$, ${\bar{\Psi }}(0)\geqslant {\bar{\Psi }}(X)$ with $X\neq 0$, contradicting the fact that 0 is the unique point of minimum of ${\bar{\Psi }}$ over ${\text {clo}}\mathbb {P}$. $\square $
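The two computational steps of the proof, the gradient formula $\nabla\varphi(X)=pX^{p-1}$ and the value of $\nabla\bar\Psi(0)$, can be checked numerically. This sketch assumes the sample values $N=5$, $p=1.2$ and uses hypothetical function names; the Frobenius product of real symmetric matrices is taken as $\langle X,Y\rangle=\mathrm{tr}(XY)$, for which $T$ is self-adjoint, so that $\nabla\bar\varphi(X)=T(\nabla\varphi(G(X)))$. It also samples nonzero positive semidefinite matrices, including rank-one ones, to see that $\bar\Psi$ stays above $\bar\Psi(0)$.

```python
import numpy as np

N, p = 5.0, 1.2                  # assumed sample values: N > 3 and N**(p-1) < 2
D = N**2 - 2*N - 3
tau = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)

def T(X):
    return (N - 1) * X - 2 * tau @ X @ tau     # self-adjoint for <X, Y> = tr(XY)

def G(X):
    return I2 + T(X)

def phi(X):
    return float(np.sum(np.abs(np.linalg.eigvalsh(X)) ** p))   # tr |X|^p

def grad_phi(X):
    # p sign(X) |X|^{p-1}; eigenvalues below 1e-12 in absolute value are treated as 0
    # (numerical safeguard against rounding noise being amplified by |w|^(p-1)).
    w, V = np.linalg.eigh(X)
    w = np.where(np.abs(w) < 1e-12, 0.0, w)
    return (V * (p * np.sign(w) * np.abs(w) ** (p - 1))) @ V.T

def phibar(X):
    return phi(G(X))

def grad_phibar(X):
    return T(grad_phi(G(X)))

def inner(X, Y):
    return float(np.trace(X @ Y))               # Frobenius product (real symmetric case)

Abar = np.diag([N**2 - 2*N - 1, N - 1]) / D
Bbar = np.diag([N - 1, N**2 - 2*N - 1]) / D

def bregman_bar(X, C):
    return phibar(X) - phibar(C) - inner(grad_phibar(C), X - C)

def Psibar(X):
    return 0.5 * (bregman_bar(X, Abar) + bregman_bar(X, Bbar))

Z = np.zeros((2, 2))
grad0 = grad_phibar(Z) - 0.5 * (grad_phibar(Abar) + grad_phibar(Bbar))
expected = (N - 3) * p * (1 - N ** (p - 1) / 2) * I2
print(grad0)

rng = np.random.default_rng(3)
# Finite-difference check of grad phi at a positive definite X.
M = rng.standard_normal((2, 2)); X = M @ M.T + 0.5 * I2
Y = rng.standard_normal((2, 2)); Y = Y + Y.T
h = 1e-6
fd = (phi(X + h * Y) - phi(X - h * Y)) / (2 * h)
an = inner(grad_phi(X), Y)

# Psibar(S) > Psibar(0) for random nonzero positive semidefinite S of rank 1 or 2.
samples = []
for _ in range(500):
    M = rng.standard_normal((2, int(rng.integers(1, 3))))
    samples.append(M @ M.T)
all_above = all(Psibar(S) > Psibar(Z) for S in samples)
print(all_above)
```

The gradient at 0 comes out as a positive multiple of the identity, so $\langle \nabla\bar\Psi(0),X\rangle = c\,\mathrm{tr}\,X>0$ for every nonzero $X\in\mathrm{clo}\,\mathbb{P}$, which is the inequality used in the proof.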


About this article

Cite this article

Bhatia, R., Gaubert, S. & Jain, T. Matrix versions of the Hellinger distance. Lett Math Phys 109, 1777–1804 (2019). https://doi.org/10.1007/s11005-019-01156-0


Received: 04 August 2018
Revised: 23 November 2018
Accepted: 04 January 2019
Published: 14 January 2019
Issue Date: 01 August 2019
DOI: https://doi.org/10.1007/s11005-019-01156-0
