On the space of positive definite matrices, we consider distance functions of the form \(d(A,B)=\left[ \mathrm{tr}\mathcal {A}(A,B)-\mathrm{tr}\mathcal {G}(A,B)\right] ^{1/2},\) where \(\mathcal {A}(A,B)\) is the arithmetic mean and \(\mathcal {G}(A,B)\) is one of the different versions of the geometric mean. When \(\mathcal {G}(A,B)=A^{1/2}B^{1/2}\) this distance is \(\Vert A^{1/2}-B^{1/2}\Vert _2,\) and when \(\mathcal {G}(A,B)=(A^{1/2}BA^{1/2})^{1/2}\) it is the Bures–Wasserstein metric. We study two other cases: \(\mathcal {G}(A,B)=A^{1/2}(A^{-1/2}BA^{-1/2})^{1/2}A^{1/2},\) the Pusz–Woronowicz geometric mean, and \(\mathcal {G}(A,B)=\exp \big (\frac{\log A+\log B}{2}\big ),\) the log Euclidean mean. With these choices, \(d(A,B)\) is no longer a metric, but it turns out that \(d^2(A,B)\) is a divergence. We establish some (strict) convexity properties of these divergences. We obtain characterisations of barycentres of \(m\) positive definite matrices with respect to these distance measures. One of these leads to a new interpretation of a power mean introduced by Lim and Palfia, as a barycentre. The other uncovers interesting relations between the log Euclidean mean and relative entropy.
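The four quantities \(d^2(A,B)\) named in the abstract can be evaluated numerically. The following is a minimal sketch, assuming real symmetric positive definite inputs and the normalisation \(\mathcal {A}(A,B)=(A+B)/2\); the function names `apply_fun`, `geometric_mean` and `d_squared` and the labels for the four means are hypothetical and not taken from the paper. Under this normalisation the first case gives \(d^2(A,B)=\frac{1}{2}\Vert A^{1/2}-B^{1/2}\Vert _2^2\), so the identity in the abstract holds up to a constant factor.

```python
import numpy as np

def apply_fun(A, f):
    """Apply the scalar function f to a symmetric matrix A through its eigenvalues."""
    S = (A + A.T) / 2                      # symmetrise to absorb rounding errors
    w, V = np.linalg.eigh(S)
    return (V * f(w)) @ V.T                # V diag(f(w)) V^T

def geometric_mean(A, B, kind):
    """One of the four versions of the geometric mean listed in the abstract."""
    rA = apply_fun(A, np.sqrt)             # A^{1/2}
    if kind == "product":                  # A^{1/2} B^{1/2}
        return rA @ apply_fun(B, np.sqrt)
    if kind == "bures_wasserstein":        # (A^{1/2} B A^{1/2})^{1/2}
        return apply_fun(rA @ B @ rA, np.sqrt)
    if kind == "pusz_woronowicz":          # A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}
        rA_inv = apply_fun(A, lambda w: 1.0 / np.sqrt(w))
        return rA @ apply_fun(rA_inv @ B @ rA_inv, np.sqrt) @ rA
    if kind == "log_euclidean":            # exp((log A + log B) / 2)
        half_sum = (apply_fun(A, np.log) + apply_fun(B, np.log)) / 2
        return apply_fun(half_sum, np.exp)
    raise ValueError(f"unknown kind: {kind}")

def d_squared(A, B, kind):
    """tr A(A,B) - tr G(A,B), with A(A,B) = (A + B) / 2 (assumed normalisation)."""
    return np.trace((A + B) / 2) - np.trace(geometric_mean(A, B, kind))

KINDS = ("product", "bures_wasserstein", "pusz_woronowicz", "log_euclidean")

# Example with two non-commuting positive definite matrices.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, -0.3], [-0.3, 3.0]])
values = {kind: d_squared(A, B, kind) for kind in KINDS}
```

When \(A\) and \(B\) commute, the four versions of the geometric mean coincide, so the four values agree; for non-commuting inputs such as the pair above they generally differ.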
Change history
11 October 2019
Theorem 9 in our paper [1] is wrong. The statement should be replaced by the following.
The authors thank F. Hiai and S. Sra for helpful comments and references, and the anonymous referee for a careful reading of the manuscript. The first author is grateful to INRIA and Ecole polytechnique, Palaiseau for visits that facilitated this work, and to CSIR(India) for the award of a Bhatnagar Fellowship.
Appendix A. Proof of Lemma 5
We make a variation of the proof of Theorem 3.12 in [10], dealing with a related problem (the minimisation of \(\Phi \) over a closed convex set).
Since \(\varphi \) is of Legendre type, Theorem 3.7(iii) of [10] shows that for all \(a\in {\text {int}}{\text {dom}}\varphi \), the map \(x\mapsto \Phi (x,a)\) is coercive, meaning that \(\mathrm{lim}_{\Vert x\Vert \rightarrow \infty } \Phi (x,a)=+\infty \). A sum of coercive functions is coercive, and so the map \(\Psi \), which is a sum of maps of this form,
is coercive. The infimum of a coercive lower-semicontinuous function on a closed non-empty set is attained, so there is an element \(\bar{x}\in {\text {clo}}{\text {int}}{\text {dom}}\varphi \) such that \(\inf _{x\in {\text {clo}}{\text {int}}{\text {dom}}\varphi } \Psi (x)=\Psi (\bar{x})<+\infty \). Suppose that \(\bar{x}\) belongs to the boundary of \({\text {int}}{\text {dom}}\varphi \). Let us fix an arbitrary \(z\in {\text {int}}{\text {dom}}\varphi \), and let \(g(t) :=\Psi ((1-t)\bar{x}+t z)\), defined for \(t\in [0,1)\). We have
\[ g'(t)=\langle \nabla \Psi ((1-t)\bar{x}+t z),\, z-\bar{x}\rangle \quad \text{for } t\in (0,1). \]
Using property (iv) of the definition of Legendre type functions, we obtain that \(\mathrm{lim}_{t\rightarrow 0^+}g'(t)=-\infty \), which entails that \(g(t)<g(0)=\Psi ({\bar{x}})\) for t small enough. Since \((1-t)\bar{x}+t z\in {\text {int}}{\text {dom}}\varphi \) for all \(t\in (0,1)\), this contradicts the optimality of \({\bar{x}}\). So \({\bar{x}}\in {\text {int}}{\text {dom}}\varphi \), which proves Lemma 5.
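The limit of \(g'(t)\) used in this proof can be spelled out as follows. This is only a sketch under assumptions that are not stated in the text above: that \(\Psi \) is a sum \(\sum _j w_j\Phi (\cdot ,a_j)\) with weights \(w_j>0\) and points \(a_j\in {\text {int}}{\text {dom}}\varphi \), that \(\Phi (x,a)=\varphi (x)-\varphi (a)-\langle \nabla \varphi (a),x-a\rangle \) is the Bregman divergence, and that property (iv) is the directional form of essential smoothness.

```latex
\begin{align*}
x_t &:= (1-t)\bar{x} + t z, \qquad W := \sum_j w_j > 0, \\
\nabla\Psi(x) &= \sum_j w_j \bigl( \nabla\varphi(x) - \nabla\varphi(a_j) \bigr), \\
g'(t) &= \langle \nabla\Psi(x_t),\, z - \bar{x} \rangle
       = W \,\langle \nabla\varphi(x_t),\, z - \bar{x} \rangle
         - \sum_j w_j \langle \nabla\varphi(a_j),\, z - \bar{x} \rangle .
\end{align*}
```

The last sum does not depend on \(t\). If property (iv) is read as \(\mathrm{lim}_{t\rightarrow 0^+}\langle \nabla \varphi (x_t),z-\bar{x}\rangle =-\infty \) for \(\bar{x}\) on the boundary and \(z\) in the interior, the first term tends to \(-\infty \), and hence so does \(g'(t)\).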
Appendix B. Examples
In the last statement of Theorem 6, dealing with tracial convex functions, we required \(\varphi \) to be differentiable and strictly convex on \(\mathbb {P}\). In the second statement, dealing with the non-tracial case, we made a stronger assumption, requiring \(\varphi \) to be of Legendre type. We now give an example showing that the Legendre condition cannot be dispensed with. To this end, it is convenient to construct first an example showing the tightness of Lemma 5.
1.1 Need for the Legendre condition in Lemma 5
Let us fix \(N>3\), let \(e=(1,1)^\top \in \mathbb {R}^2\),
and consider the affine transformation \(g(x)=e+Lx\). Let \( a = (N,0)^\top \), \( b = (0 , N)^\top \), and
Observe that \({\bar{a}}, {\bar{b}}\in \mathbb {R}_{++}^2\) since \(N>3\).
Consider now, for \(p>1\), the map \(\varphi (x):=\Vert x\Vert _p^p=|x_1|^p+|x_2|^p\) defined on \(\mathbb {R}^2\) and \({\bar{\varphi }}(x)=\varphi (g(x))\). Observe that \(\varphi \) is strictly convex and differentiable. Let \({\bar{\Phi }}\) denote the Bregman divergence associated with \({\bar{\varphi }}\), and let \({\bar{\Psi }}(x):= \frac{1}{2}({\bar{\Phi }}(x,{\bar{a}})+{\bar{\Phi }}(x,{\bar{b}}))\). We claim that 0 is the unique point of minimum of \({\bar{\Psi }}\) over \(\mathbb {R}_{+}^2\). Indeed,
from which we obtain
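It follows that \(\nabla {\bar{\Psi }}(0)\in \mathbb {R}_{++}^2\) if \(p>1\) is chosen close enough to 1, so that \(1-N^{p-1}/2>0\). Then, since \({\bar{\Psi }}\) is convex, we have
\[ {\bar{\Psi }}(x)\geqslant {\bar{\Psi }}(0)+\langle \nabla {\bar{\Psi }}(0),x\rangle >{\bar{\Psi }}(0)\quad \text{for all } x\in \mathbb {R}_{+}^2,\ x\ne 0, \]
showing the claim.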
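The claim can be checked numerically. The sketch below rests on assumptions that the text above does not confirm: the matrix `L` is a hypothetical choice (it has a nonnegative inverse and makes \({\bar{a}},{\bar{b}}\) positive exactly when \(N>3\)), the points are taken to be \({\bar{a}}=g^{-1}(a)\) and \({\bar{b}}=g^{-1}(b)\), and the values \(N=4\), \(p=1.2\) are illustrative. All function names are hypothetical.

```python
import numpy as np

N, p = 4.0, 1.2                            # N > 3, and p close to 1 so that 1 - N**(p-1)/2 > 0
e = np.array([1.0, 1.0])
L = np.array([[2.0, -1.0], [-1.0, 2.0]])   # ASSUMED matrix, chosen for illustration only
a = np.array([N, 0.0])
b = np.array([0.0, N])

def g(x):
    """Affine map g(x) = e + L x."""
    return e + L @ x

def phi(y):
    """phi(y) = |y_1|^p + |y_2|^p."""
    return np.sum(np.abs(y) ** p)

def grad_phi(y):
    """Gradient of phi, valid for p > 1."""
    return p * np.sign(y) * np.abs(y) ** (p - 1)

# ASSUMED definitions: a_bar and b_bar are the preimages of a and b under g.
a_bar = np.linalg.solve(L, a - e)
b_bar = np.linalg.solve(L, b - e)

def bregman_bar(x, y_bar, y):
    """Bregman divergence of phi_bar = phi o g between x and y_bar, where y = g(y_bar)."""
    return phi(g(x)) - phi(y) - (L.T @ grad_phi(y)) @ (x - y_bar)

def psi_bar(x):
    """Half the sum of the two Bregman divergences to a_bar and b_bar."""
    return 0.5 * (bregman_bar(x, a_bar, a) + bregman_bar(x, b_bar, b))

def grad_psi_bar(x):
    """Gradient of psi_bar, obtained from the chain rule."""
    return L.T @ (grad_phi(g(x)) - 0.5 * (grad_phi(a) + grad_phi(b)))

grad_at_zero = grad_psi_bar(np.zeros(2))   # expected to have positive entries
```

With this choice of `L`, the chain rule gives \(\nabla {\bar{\Psi }}(0)=p\,(1-N^{p-1}/2)\,L^\top e\), which has positive entries precisely when \(1-N^{p-1}/2>0\). Convexity then forces \({\bar{\Psi }}(x)>{\bar{\Psi }}(0)\) at every nonzero point of the nonnegative quadrant, and a grid search can be used to confirm this.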
Consider now the modification \(\hat{\varphi }\) of \(\bar{\varphi }\), so that \(\hat{\varphi }(x)=\bar{\varphi }(x)\) for \(x\in \mathbb {R}_{+}^2\), and \(\hat{\varphi }(x)=+\infty \) otherwise. The function \(\hat{\varphi }\) is strictly convex, lower-semicontinuous, and differentiable on the interior of its domain, but not of Legendre type, and the conclusion of Lemma 5 does not apply to it.
The geometric intuition leading to this example is described in the figure.
The example illustrated. The point u is the unconstrained minimum of the sum of Bregman divergences \(\Psi (x):=\Phi (x,a)+\Phi (x,b)\) associated with \(\varphi (x)=x_1^p+x_2^p\), with \(p=1.2\). Level curves of \(\Psi \) are shown. The minimum of \(\Psi \) on the simplicial cone C is at the unit vector e. An affine change of variables sending C to the standard quadrant, and a lift to the cone of positive semidefinite matrices, lead to Proposition 11.
1.2 Need for the Legendre condition in Theorem 6
We next construct an example showing that the Legendre condition in the second statement of Theorem 6 cannot be dispensed with. Observe that the inverse of the linear operator L in (59) is given by
In particular, it is a nonnegative matrix.
We set \(\tau =\left( {\begin{matrix}0&{}1\\ 1&{} 0\end{matrix}}\right) \), and consider the “quantum” analogue of L, i.e.
is a completely positive map leaving \(\mathbb {P}\) invariant. The analogue of the map g is
where I denotes the identity matrix.
We now consider the map \(\varphi (X):= \Vert X\Vert _p^p={\text {tr}}(|X|^p)\) defined on the space of Hermitian matrices. The function \(\varphi \) is differentiable and strictly convex, still assuming that \(p>1\). We set \({\bar{A}}:= {\text {diag}}({\bar{a}})\in \mathbb {P}\), \({\bar{B}}:={\text {diag}}({\bar{b}})\in \mathbb {P}\), and now define \({\bar{\Phi }}\) to be the Bregman divergence associated with \({\bar{\varphi }}:= \varphi \circ G\). Let
\[ {\bar{\Psi }}(X):=\frac{1}{2}\big ({\bar{\Phi }}(X,{\bar{A}})+{\bar{\Phi }}(X,{\bar{B}})\big ). \]
We then have the following result.
Proposition 11
The minimum of the function \({\bar{\Psi }}\) on the closure of \(\mathbb {P}\) is attained at the point 0. Moreover, the equation
has no solution X in \(\mathbb {P}\). \(\square \)
From [3] (Theorem 2.1) or [1] (Theorem 2.3), we have
where \(X=U|X|\) is the polar decomposition of X. In particular, if X is diagonal and positive semidefinite,
Then, by a computation similar to the one in the scalar case above, we obtain
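We conclude, as in (60), that
\[ {\bar{\Psi }}(X)\geqslant {\bar{\Psi }}(0)+\langle \nabla {\bar{\Psi }}(0),X\rangle >{\bar{\Psi }}(0)\quad \text{for all } X\in {\text {clo}}\mathbb {P},\ X\ne 0, \]
where now \(\langle \cdot ,\cdot \rangle \) is the Frobenius scalar product on the space of Hermitian matrices. It follows that 0 is the unique point of minimum of \({\bar{\Psi }}\) on \({\text {clo}}\mathbb {P}\).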
Moreover, if Eq. (61) had a solution \(X\in \mathbb {P}\), the first-order optimality condition for the minimisation of the function \({\bar{\Psi }}\) over \(\mathbb {P}\) would be satisfied, showing that \({\bar{\Psi }}(Y)\geqslant {\bar{\Psi }}(X)\) for all \(Y\in \mathbb {P}\), and by density, \({\bar{\Psi }}(0)\geqslant {\bar{\Psi }}(X)\), contradicting the fact that 0 is the unique point of minimum of \({\bar{\Psi }}\) over \({\text {clo}}\mathbb {P}\). \(\square \)
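The proof relies on the derivative of \(\varphi (X)={\text {tr}}(|X|^p)\). As a sanity check, the sketch below compares a finite-difference directional derivative with the candidate gradient \(p\,\mathrm{sign}(X)|X|^{p-1}\), which reduces to \(pX^{p-1}\) for diagonal positive semidefinite \(X\). That this is the formula quoted from [3] and [1] is an assumption; it is the standard derivative of \(X\mapsto {\text {tr}}f(X)\) with \(f(x)=|x|^p\). The check is restricted to real symmetric matrices, and all function names are hypothetical.

```python
import numpy as np

def phi(X, p):
    """phi(X) = tr |X|^p for a real symmetric matrix X."""
    w = np.linalg.eigvalsh(X)
    return np.sum(np.abs(w) ** p)

def grad_phi(X, p):
    """Candidate gradient p * sign(X) |X|^(p-1), computed on the eigenvalues of X."""
    w, V = np.linalg.eigh(X)
    return (V * (p * np.sign(w) * np.abs(w) ** (p - 1))) @ V.T

def directional_derivative(X, Y, p, h=1e-6):
    """Central finite difference of t -> phi(X + t Y) at t = 0."""
    return (phi(X + h * Y, p) - phi(X - h * Y, p)) / (2 * h)

p = 1.2
X = np.array([[2.0, 1.0], [1.0, -1.0]])    # indefinite, eigenvalues away from 0
Y = np.array([[0.3, -0.7], [-0.7, 0.5]])   # arbitrary symmetric direction

numeric = directional_derivative(X, Y, p)
exact = np.trace(grad_phi(X, p) @ Y)       # Frobenius scalar product <grad, Y>
```

The two numbers `numeric` and `exact` should agree up to the finite-difference error, provided no eigenvalue of `X` is close to zero, where the second derivative of \(|x|^p\) blows up for \(p<2\).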
Bhatia, R., Gaubert, S. & Jain, T. Matrix versions of the Hellinger distance. Lett Math Phys 109, 1777–1804 (2019). https://doi.org/10.1007/s11005-019-01156-0