Abstract
On the space of positive definite matrices, we consider distance functions of the form \(d(A,B)=\left[ \mathrm{tr}\mathcal {A}(A,B)-\mathrm{tr}\mathcal {G}(A,B)\right] ^{1/2},\) where \(\mathcal {A}(A,B)\) is the arithmetic mean and \(\mathcal {G}(A,B)\) is one of the different versions of the geometric mean. When \(\mathcal {G}(A,B)=A^{1/2}B^{1/2}\) this distance is \(\Vert A^{1/2}-B^{1/2}\Vert _2,\) and when \(\mathcal {G}(A,B)=(A^{1/2}BA^{1/2})^{1/2}\) it is the Bures–Wasserstein metric. We study two other cases: \(\mathcal {G}(A,B)=A^{1/2}(A^{-1/2}BA^{-1/2})^{1/2}A^{1/2},\) the Pusz–Woronowicz geometric mean, and \(\mathcal {G}(A,B)=\exp \big (\frac{\log A+\log B}{2}\big ),\) the log Euclidean mean. With these choices, d(A, B) is no longer a metric, but it turns out that \(d^2(A,B)\) is a divergence. We establish some (strict) convexity properties of these divergences. We obtain characterisations of barycentres of m positive definite matrices with respect to these distance measures. One of these leads to a new interpretation of a power mean introduced by Lim and Palfia, as a barycentre. The other uncovers interesting relations between the log Euclidean mean and relative entropy.
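As a minimal numerical sketch of the four quantities named above, the following Python code evaluates \(d(A,B)\) for each choice of \(\mathcal {G}(A,B)\). It assumes the normalisation \(\mathcal {A}(A,B)=(A+B)/2\); under that assumption the first case equals \(\Vert A^{1/2}-B^{1/2}\Vert _2\) only up to the constant factor \(1/\sqrt{2}\). The helper names (`sym_fun`, `geometric_means`, `distances`) and the test matrices are illustrative and not taken from the paper.

```python
import numpy as np

def sym_fun(A, f):
    """Apply a scalar function f to a Hermitian matrix A via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T

def geometric_means(A, B):
    """Return the four candidates for G(A, B) listed in the abstract."""
    A_half = sym_fun(A, np.sqrt)
    B_half = sym_fun(B, np.sqrt)
    A_inv_half = sym_fun(A, lambda w: 1.0 / np.sqrt(w))
    return {
        "product": A_half @ B_half,
        "bures_wasserstein": sym_fun(A_half @ B @ A_half, np.sqrt),
        "pusz_woronowicz": A_half @ sym_fun(A_inv_half @ B @ A_inv_half, np.sqrt) @ A_half,
        "log_euclidean": sym_fun(0.5 * (sym_fun(A, np.log) + sym_fun(B, np.log)), np.exp),
    }

def distances(A, B):
    """d(A, B) = [tr (A + B)/2 - tr G(A, B)]^(1/2) for each choice of G."""
    arithmetic_trace = 0.5 * np.trace(A + B).real
    out = {}
    for name, G in geometric_means(A, B).items():
        gap = arithmetic_trace - np.trace(G).real
        out[name] = np.sqrt(max(gap, 0.0))  # clip tiny negative rounding errors
    return out

# Illustrative positive definite inputs (assumed, not from the paper).
A = np.array([[2.0, 1.0], [1.0, 3.0]])
B = np.array([[1.0, 0.0], [0.0, 4.0]])
d = distances(A, B)
```

When A and B commute, the four geometric means coincide, so the four values of `distances(A, B)` agree; for non-commuting inputs they generally differ.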
Change history
11 October 2019
Theorem 9 in our paper [1] is wrong. The statement should be replaced by the following.
References
Abatzoglou, T.J.: Norm derivatives on spaces of operators. Math. Ann. 239, 129–135 (1979)
Agueh, M., Carlier, G.: Barycenters in the Wasserstein space. SIAM J. Math. Anal. 43, 904–924 (2011)
Aiken, J.G., Erdos, J.A., Goldstein, J.A.: Unitary approximation of positive operators. Illinois J. Math. 24, 61–72 (1980)
Amari, S.: Information Geometry and its Applications. Springer, Tokyo (2016)
Ando, T.: Concavity of certain maps on positive definite matrices and applications to Hadamard products. Linear Algebra Appl. 26, 203–241 (1979)
Ando, T., Li, C.-K., Mathias, R.: Geometric means. Linear Algebra Appl. 385, 305–334 (2004)
Arsigny, V., Fillard, P., Pennec, X., Ayache, N.: Geometric means in a novel vector space structure on symmetric positive-definite matrices. SIAM J. Matrix Anal. Appl. 29, 328–347 (2007)
Banerjee, A., Merugu, S., Dhillon, I.S., Ghosh, J.: Clustering with Bregman divergences. J. Mach. Learn. Res. 6, 1705–1749 (2005)
Barbaresco, F.: Innovative tools for radar signal processing based on Cartan’s geometry of SPD matrices and information geometry. In: IEEE Radar Conference, Rome (2008)
Bauschke, H.H., Borwein, J.M.: Legendre functions and the method of random Bregman projections. J. Convex Anal. 4, 27–67 (1997)
Bauschke, H.H., Borwein, J.M.: Joint and separate convexity of the Bregman distance. Stud. Comput. Math. 8, 23–36 (2001)
Bengtsson, I., Zyczkowski, K.: Geometry of Quantum States: An Introduction to Quantum Entanglement. Cambridge University Press, Cambridge (2006)
Bhagwat, K.V., Subramanian, R.: Inequalities between means of positive operators. Math. Proc. Camb. Philos. Soc. 83, 393–401 (1978)
Bhatia, R.: Matrix Analysis. Springer, Tokyo (1997)
Bhatia, R.: Positive Definite Matrices. Princeton University Press, Princeton (2007)
Bhatia, R.: The Riemannian mean of positive matrices. In: Nielsen, F., Bhatia, R. (eds.) Matrix Information Geometry, pp. 35–51. Springer, Tokyo (2013)
Bhatia, R., Grover, P.: Norm inequalities related to the matrix geometric mean. Linear Algebra Appl. 437, 726–733 (2012)
Bhatia, R., Jain, T., Lim, Y.: On the Bures-Wasserstein distance between positive definite matrices. Expos. Math. (2018). https://doi.org/10.1016/j.exmath.2018.01.002
Bhatia, R., Jain, T., Lim, Y.: Strong convexity of sandwiched entropies and related optimization problems. Rev. Math. Phys. 30, 1850014 (2018)
Carlen, E.A., Lieb, E.H.: A Minkowski type trace inequality and strong subadditivity of quantum entropy. Adv. Math. Sci. AMS Transl. 180, 59–68 (1999)
Carlen, E.A., Lieb, E.H.: A Minkowski type trace inequality and strong subadditivity of quantum entropy. II. Convexity and concavity. Lett. Math. Phys. 83, 107–126 (2008)
Chebbi, Z., Moakher, M.: Means of Hermitian positive-definite matrices based on the log-determinant \(\alpha \)-divergence function. Linear Algebra Appl. 436, 1872–1889 (2012)
Dhillon, I.S., Tropp, J.A.: Matrix nearness problems with Bregman divergences. SIAM J. Matrix Anal. Appl. 29, 1120–1146 (2004)
Fletcher, P., Joshi, S.: Riemannian geometry for the statistical analysis of diffusion tensor data. Signal Process. 87, 250–262 (2007)
Hiai, F., Mosonyi, M., Petz, D., Beny, C.: Quantum f-divergences and error correction. Rev. Math. Phys. 23, 691–747 (2011)
Jencová, A.: Geodesic distances on density matrices. J. Math. Phys. 45, 1787–1794 (2004)
Jencová, A., Ruskai, M.B.: A unified treatment of convexity of relative entropy and related trace functions with conditions for equality. Rev. Math. Phys. 22, 1099–1121 (2010)
Lim, Y., Palfia, M.: Matrix power means and the Karcher mean. J. Funct. Anal. 262, 1498–1514 (2012)
Modin, K.: Geometry of matrix decompositions seen through optimal transport and information geometry. J. Geom. Mech. 9, 335–390 (2017)
Nielsen, F., Bhatia, R. (eds.): Matrix Information Geometry. Springer, Tokyo (2013)
Nielsen, F., Boltz, S.: The Burbea-Rao and Bhattacharyya centroids. IEEE Trans. Inf. Theory 57, 5455–5466 (2011)
Pitrik, J., Virosztek, D.: On the joint convexity of the Bregman divergence of matrices. Lett. Math. Phys. 105, 675–692 (2015)
Pusz, W., Woronowicz, S.L.: Functional calculus for sesquilinear forms and the purification map. Rep. Math. Phys. 8, 159–170 (1975)
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, Tokyo (1998)
Sra, S.: Positive definite matrices and the \(S\)-divergence. Proc. Am. Math. Soc. 144, 2787–2797 (2016)
Takatsu, A.: Wasserstein geometry of Gaussian measures. Osaka J. Math. 48, 1005–1026 (2011)
Acknowledgements
The authors thank F. Hiai and S. Sra for helpful comments and references, and the anonymous referee for a careful reading of the manuscript. The first author is grateful to INRIA and Ecole polytechnique, Palaiseau for visits that facilitated this work, and to CSIR(India) for the award of a Bhatnagar Fellowship.
Appendices
Appendix A. Proof of Lemma 5
We adapt the proof of Theorem 3.12 in [10], which deals with a related problem (the minimisation of \(\Phi \) over a closed convex set).
Since \(\varphi \) is of Legendre type, Theorem 3.7(iii) of [10] shows that for all \(a\in {\text {int}}{\text {dom}}\varphi \), the map \(x\mapsto \Phi (x,a)\) is coercive, meaning that \(\mathrm{lim}_{\Vert x\Vert \rightarrow \infty } \Phi (x,a)=+\infty \). A sum of coercive functions is coercive, and so the map \(\Psi \), being a sum of functions of this form, is coercive.
The infimum of a coercive lower-semicontinuous function on a closed non-empty set is attained, so there is an element \(\bar{x}\in {\text {clo}}{\text {int}}{\text {dom}}\varphi \) such that \(\inf _{x\in {\text {clo}}{\text {int}}{\text {dom}}\varphi } \Psi (x)=\Psi (\bar{x})<+\infty \). Suppose that \(\bar{x}\) belongs to the boundary of \({\text {int}}{\text {dom}}\varphi \). Let us fix an arbitrary \(z\in {\text {int}}{\text {dom}}\varphi \), and let \(g(t) :=\Psi ((1-t)\bar{x}+t z)\), defined for \(t\in [0,1)\). By the chain rule, for \(t\in (0,1)\) we have \(g'(t)=\langle \nabla \Psi ((1-t)\bar{x}+tz),\, z-\bar{x}\rangle \).
Using property (iv) of the definition of Legendre type functions, we obtain that \(\mathrm{lim}_{t\rightarrow 0^+}g'(t)=-\infty \), which entails that \(g(t)<g(0)=\Psi ({\bar{x}})\) for t small enough. Since \((1-t)\bar{x}+t z\in {\text {int}}{\text {dom}}\varphi \) for all \(t\in (0,1)\), this contradicts the optimality of \({\bar{x}}\). So \({\bar{x}}\in {\text {int}}{\text {dom}}\varphi \), which proves Lemma 5.
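The mechanism of this proof can be seen in a one-dimensional sketch. The code below assumes the Legendre-type function \(\varphi (x)=x\log x-x\) on \([0,\infty )\) and the convention \(\Phi (x,a)=\varphi (x)-\varphi (a)-\varphi '(a)(x-a)\); the anchor points and equal weights are illustrative choices, not taken from the paper. In this setting the minimiser of \(\Psi \) is the geometric mean of the anchors, which lies in the interior, and the slope of \(\Psi \) tends to \(-\infty \) as the boundary point 0 is approached.

```python
import math

def bregman(x, a):
    """Phi(x, a) = phi(x) - phi(a) - phi'(a) (x - a) for phi(x) = x log x - x, x, a > 0."""
    return x * math.log(x / a) - x + a

def psi(x, anchors):
    """Equal-weight average of the Bregman divergences from x to the anchors."""
    return sum(bregman(x, a) for a in anchors) / len(anchors)

def psi_prime(x, anchors):
    """Derivative of psi in x: log x minus the mean of log a_j."""
    return math.log(x) - sum(math.log(a) for a in anchors) / len(anchors)

# Illustrative anchors in the interior of the domain (assumed values).
anchors = [0.5, 2.0, 8.0]

# Setting psi_prime to zero gives the geometric mean of the anchors.
x_star = math.exp(sum(math.log(a) for a in anchors) / len(anchors))

# Slopes along the segment from the boundary point 0 towards z = 1:
# they decrease without bound as t -> 0+, so 0 cannot be a minimiser.
slopes = [psi_prime(t, anchors) for t in (1e-2, 1e-4, 1e-8)]
```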
Appendix B. Examples
In the last statement of Theorem 6, dealing with tracial convex functions, we required \(\varphi \) to be differentiable and strictly convex on \(\mathbb {P}\). In the second statement, dealing with the non tracial case, we made a stronger assumption, requiring \(\varphi \) to be of Legendre type. We now give an example showing that the Legendre condition cannot be dispensed with. To this end, it is convenient to construct first an example showing the tightness of Lemma 5.
1.1 Need for the Legendre condition in Lemma 5
Let us fix \(N>3\), let \(e=(1,1)^\top \in \mathbb {R}^2\),
and consider the affine transformation \(g(x)=e+Lx\). Let \( a = (N,0)^\top \), \( b = (0 , N)^\top \), and
Observe that \({\bar{a}}, {\bar{b}}\in \mathbb {R}_{++}^2\) since \(N>3\).
Consider now, for \(p>1\), the map \(\varphi (x):=\Vert x\Vert _p^p=|x_1|^p+|x_2|^p\) defined on \(\mathbb {R}^2\) and \({\bar{\varphi }}(x)=\varphi (g(x))\). Observe that \(\varphi \) is strictly convex and differentiable. Let \({\bar{\Phi }}\) denote the Bregman divergence associated with \({\bar{\varphi }}\), and let \({\bar{\Psi }}(x):= \frac{1}{2}({\bar{\Phi }}(x,{\bar{a}})+{\bar{\Phi }}(x,{\bar{b}}))\). We claim that 0 is the unique point of minimum of \({\bar{\Psi }}\) over \(\mathbb {R}_{+}^2\). Indeed,
from which we obtain
It follows that \(\nabla {\bar{\Psi }}(0)\in \mathbb {R}_{++}^2\) if \(p>1\) is chosen close enough to 1, so that \(1-N^{p-1}/2>0\). Then, since \({\bar{\Psi }}\) is convex, we have
showing the claim.
Consider now the modification \(\hat{\varphi }\) of \(\bar{\varphi }\), so that \(\hat{\varphi }(x)=\bar{\varphi }(x)\) for \(x\in \mathbb {R}_{+}^2\), and \(\hat{\varphi }(x)=+\infty \) otherwise. The function \(\hat{\varphi }\) is strictly convex, lower-semicontinuous, and differentiable on the interior of its domain, but it is not of Legendre type, and the conclusion of Lemma 5 fails for it: the associated sum of Bregman divergences coincides with \({\bar{\Psi }}\) on \(\mathbb {R}_{+}^2\), so its minimum is attained only at the boundary point 0.
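The claim of this subsection can be checked numerically. The sketch below rests on assumptions that are not recoverable from the text here: the matrix `L` is a hypothetical choice (it has a nonnegative inverse and makes \({\bar{a}},{\bar{b}}\) positive exactly when \(N>3\)), and \({\bar{a}}, {\bar{b}}\) are assumed to be the preimages of a and b under g. The values \(N=4\) and \(p=1.2\) are illustrative. Since g is affine, the Bregman divergence of \({\bar{\varphi }}\) at \((x,{\bar{a}})\) equals that of \(\varphi \) at \((g(x),a)\), which is what the code uses.

```python
import numpy as np

N, p = 4.0, 1.2                              # illustrative values with 1 - N**(p-1)/2 > 0
e = np.array([1.0, 1.0])
L = np.array([[2.0, -1.0], [-1.0, 2.0]])     # ASSUMED matrix, not taken from the paper
a = np.array([N, 0.0])
b = np.array([0.0, N])

def g(x):
    """Affine map g(x) = e + L x."""
    return e + L @ x

def phi(u):
    """phi(u) = |u_1|^p + |u_2|^p."""
    return np.sum(np.abs(u) ** p)

def grad_phi(u):
    return p * np.sign(u) * np.abs(u) ** (p - 1)

def bregman(u, v):
    """Bregman divergence of phi: phi(u) - phi(v) - <grad phi(v), u - v>."""
    return phi(u) - phi(v) - grad_phi(v) @ (u - v)

def bar_psi(x):
    """Average of the two divergences, written in the variable u = g(x)."""
    u = g(x)
    return 0.5 * (bregman(u, a) + bregman(u, b))

# Assumed definition of the anchors: preimages of a and b under g.
a_bar = np.linalg.solve(L, a - e)
b_bar = np.linalg.solve(L, b - e)

# Gradient of bar_psi at 0 by the chain rule.
grad_at_zero = L.T @ (grad_phi(e) - 0.5 * (grad_phi(a) + grad_phi(b)))

# Values of bar_psi on a grid in the nonnegative quadrant.
grid = np.linspace(0.0, 3.0, 31)
values = np.array([[bar_psi(np.array([s, t])) for t in grid] for s in grid])
```

With these assumed data the gradient at 0 has positive entries, and the smallest grid value sits at the corner \(x=0\), in line with the convexity argument in the text.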
The geometric intuition leading to this example is described in the figure.
Figure. The example illustrated. The point u is the unconstrained minimum of the sum of Bregman divergences \(\Psi (x):=\Phi (x,a)+\Phi (x,b)\) associated with \(\varphi (x)=x_1^p+x_2^p\), here \(p=1.2\). Level curves of \(\Psi \) are shown. The minimum of \(\Psi \) on the simplicial cone C is at the unit vector e. An affine change of variables sending C to the standard quadrant, and a lift to the cone of positive semidefinite matrices leads to Proposition 11.
1.2 Need for the Legendre condition in Theorem 6
We next construct an example showing that the Legendre condition in the second statement of Theorem 6 cannot be dispensed with. Observe that the inverse of the linear operator L in (59) is given by
In particular, it is a nonnegative matrix.
We set \(\tau =\left( {\begin{matrix}0&{}1\\ 1&{} 0\end{matrix}}\right) \), and consider the “quantum” analogue of L, i.e.
Then,
is a completely positive map leaving \(\mathbb {P}\) invariant. The analogue of the map g is
where I denotes the identity matrix.
We now consider the map \(\varphi (X):= \Vert X\Vert _p^p={\text {tr}}(|X|^p)\) defined on the space of Hermitian matrices. The function \(\varphi \) is differentiable and strictly convex, still assuming that \(p>1\). We set \({\bar{A}}:= {\text {diag}}({\bar{a}})\in \mathbb {P}\), \({\bar{B}}:={\text {diag}}({\bar{b}})\in \mathbb {P}\), and now define \({\bar{\Phi }}\) to be the Bregman divergence associated with \({\bar{\varphi }}:= \varphi \circ G\). Let
We then have the following result.
Proposition 11
The minimum of the function \({\bar{\Psi }}\) on the closure of \(\mathbb {P}\) is attained at the point 0. Moreover, the equation
has no solution X in \(\mathbb {P}\). \(\square \)
Proof
From [3] (Theorem 2.1) or [1] (Theorem 2.3), we have
where \(X=U|X|\) is the polar decomposition of X. In particular, if X is diagonal and positive semidefinite,
Then, by a computation similar to the one in the scalar case above, we obtain
We conclude, as in (60), that
where now \(\langle \cdot ,\cdot \rangle \) is the Frobenius scalar product on the space of Hermitian matrices. It follows that 0 is the unique point of minimum of \({\bar{\Psi }}\) on \({\text {clo}}\mathbb {P}\).
Moreover, if Eq. (61) had a solution \(X\in \mathbb {P}\), the first-order optimality condition for the minimisation of the convex function \({\bar{\Psi }}\) over \(\mathbb {P}\) would be satisfied at X, showing that \({\bar{\Psi }}(Y)\geqslant {\bar{\Psi }}(X)\) for all \(Y\in \mathbb {P}\), and, by density of \(\mathbb {P}\) in \({\text {clo}}\mathbb {P}\) and continuity of \({\bar{\Psi }}\), \({\bar{\Psi }}(0)\geqslant {\bar{\Psi }}(X)\), contradicting the fact that 0 is the unique point of minimum of \({\bar{\Psi }}\) over \({\text {clo}}\mathbb {P}\). \(\square \)
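The derivative formula on which this proof relies can be sanity-checked numerically. The sketch below assumes the Hermitian case, where the polar factor U of X is the matrix sign of X, so that the gradient of \(\varphi (X)={\text {tr}}(|X|^p)\) for the Frobenius scalar product is \(p\,{\text {sign}}(X)|X|^{p-1}\). The matrices X and H, the exponent \(p=1.5\) and the step size are illustrative choices.

```python
import numpy as np

def trace_abs_power(X, p):
    """phi(X) = tr |X|^p for a Hermitian matrix X."""
    w = np.linalg.eigvalsh(X)
    return np.sum(np.abs(w) ** p)

def grad_trace_abs_power(X, p):
    """Candidate gradient p * sign(X) |X|^(p-1), computed by spectral calculus."""
    w, V = np.linalg.eigh(X)
    return (V * (p * np.sign(w) * np.abs(w) ** (p - 1))) @ V.conj().T

p = 1.5
# Illustrative symmetric matrix with eigenvalues away from 0, and a symmetric direction.
X = np.array([[2.0, 1.0, 0.0], [1.0, -1.0, 0.5], [0.0, 0.5, 0.3]])
H = np.array([[0.3, -0.2, 0.1], [-0.2, 0.5, 0.4], [0.1, 0.4, -0.7]])

# Central finite difference of phi at X in the direction H.
h = 1e-6
finite_difference = (trace_abs_power(X + h * H, p) - trace_abs_power(X - h * H, p)) / (2 * h)

# Directional derivative predicted by the gradient formula: <grad, H> = tr(grad H).
predicted = np.trace(grad_trace_abs_power(X, p) @ H)

# For a diagonal positive semidefinite X the gradient reduces to p X^(p-1).
D = np.diag([1.0, 4.0])
grad_D = grad_trace_abs_power(D, p)
```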
Cite this article
Bhatia, R., Gaubert, S. & Jain, T. Matrix versions of the Hellinger distance. Lett Math Phys 109, 1777–1804 (2019). https://doi.org/10.1007/s11005-019-01156-0
DOI: https://doi.org/10.1007/s11005-019-01156-0