1. Introduction
Methods of Information Geometry (IG) [1] are ubiquitous in the physical sciences and encompass both classical and quantum systems [2]. The natural object of study in IG is a quadruple $(M, g, \nabla, \nabla^*)$ given by a smooth manifold $M$, a Riemannian metric $g$, and a pair of affine connections $\nabla$, $\nabla^*$ on $M$, which are dual with respect to $g$,
$$X\,g(Y,Z) = g(\nabla_X Y, Z) + g(Y, \nabla^*_X Z), \tag{1}$$
for all sections $X, Y, Z \in \mathcal{T}(M)$. The quadruple $(M, g, \nabla, \nabla^*)$ is called a statistical manifold whenever the dual connections are both torsion-free [3]. Actually, the notion of a statistical manifold, introduced by Lauritzen [4], usually refers to the triple $(M, g, T)$, where $T$ is a totally symmetric 3-tensor. However, when $\nabla$ and $\nabla^*$ are both torsion-free connections, the structures $(M, g, \nabla, \nabla^*)$ and $(M, g, T)$ are equivalent [3]. When $M$ is a manifold of probability distributions, $g$ is the Fisher metric, and $\nabla$ and $\nabla^*$ are the exponential and mixture connections [5], IG has been successfully applied to many fields, such as statistical inference, control systems theory, and neural networks (see [6] and the references therein for a comprehensive list of applications of IG).
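The geometric structure of a statistical manifold is encoded by a smooth function $D: M \times M \to \mathbb{R}$ such that:
$$D(p,q) \ge 0, \qquad D(p,q) = 0 \iff p = q, \tag{2}$$
for all $p, q \in M$ [7]. The dualistic structure $(g, \nabla, \nabla^*)$ of $M$ is then recovered in the following way:
$$g_{ij}(p) = -\,\partial_i \partial'_j D(\xi_p, \xi_q)\big|_{p=q}, \tag{3}$$
$$\Gamma_{ijk}(p) = -\,\partial_i \partial_j \partial'_k D(\xi_p, \xi_q)\big|_{p=q}, \qquad \Gamma^*_{ijk}(p) = -\,\partial'_i \partial'_j \partial_k D(\xi_p, \xi_q)\big|_{p=q}. \tag{4}$$
Here, $\partial_i = \partial/\partial \xi_p^i$ and $\partial'_i = \partial/\partial \xi_q^i$, and $\{\xi_p^i\}$ and $\{\xi_q^i\}$ are local coordinate systems of $p$ and $q$, respectively. Moreover, $\Gamma_{ijk} = g(\nabla_{\partial_i}\partial_j, \partial_k)$ and $\Gamma^*_{ijk} = g(\nabla^*_{\partial_i}\partial_j, \partial_k)$ are the symbols of the dual connections $\nabla$ and $\nabla^*$, respectively. The function $D$ is called a divergence or contrast function of the statistical manifold $(M, g, \nabla, \nabla^*)$ [8].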
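As an illustration of how a divergence encodes the metric, the following minimal sketch checks numerically that the mixed second derivative of a divergence at the diagonal returns the Fisher metric. It assumes the sign convention $g_{ij} = -\partial_i\partial'_j D|_{p=q}$ and uses the Kullback-Leibler divergence extended to positive measures; the function names `kl` and `metric_from_divergence`, the step size, and the test point are illustrative choices, not part of the original text.

```python
import numpy as np

def kl(mu, nu):
    """Kullback-Leibler divergence extended to positive measures mu, nu."""
    return float(np.sum(nu - mu + mu * np.log(mu / nu)))

def metric_from_divergence(div, p, h=1e-4):
    """Approximate g_ij(p) = -d/dp_i d/dq_j D(p, q) at q = p by central differences."""
    n = p.size
    g = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = h  # perturbation of the first argument
            ej[j] = h  # perturbation of the second argument
            mixed = (div(p + ei, p + ej) - div(p + ei, p - ej)
                     - div(p - ei, p + ej) + div(p - ei, p - ej)) / (4 * h * h)
            g[i, j] = -mixed
    return g

p = np.array([0.5, 1.2, 2.0])       # an arbitrary positive measure
g = metric_from_divergence(kl, p)
fisher = np.diag(1.0 / p)           # Fisher metric in the coordinates mu_i
print(np.round(g, 4))
```

The recovered matrix agrees with $\mathrm{diag}(1/\mu_i)$ up to discretization error, which is the Fisher metric on positive measures in these coordinates.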
The function $D$ is called a flat divergence if the dualistic structure $(g, \nabla, \nabla^*)$ introduced on a smooth manifold $M$ by Equations (3) and (4) is flat, namely the curvature tensors of $\nabla$ and $\nabla^*$ are zero. In this particular case, an attempt to connect IG with physics was made in [9]. There, the connection between IG and integrable dynamical systems was bridged by a divergence, which is canonical in some sense. More precisely, for $p \in M$, the gradient flows $\dot q = -\operatorname{grad}_q D(p, q)$ and $\dot q = -\operatorname{grad}_q D(q, p)$ converge to the point $p$ along the $\nabla$-geodesic and the $\nabla^*$-geodesic, respectively. In this context, when $M$ is the manifold of Gaussian distributions with mean $\mu$ and variance $\sigma^2$, the Kullback-Leibler (KL) divergence induces a dualistic structure given by the Fisher metric and the exponential and mixture connections. In this case, the dynamics of the above-mentioned gradient flows, given now in terms of the KL divergence, turns out to be the Uhlenbeck-Ornstein process, characterized by its drift and diffusion coefficients [9]. Further connections between IG and dynamical systems can be found in [10], where certain gradient flows on Gaussian and multinomial distributions are characterized as completely integrable Hamiltonian systems.
The Kullback-Leibler divergence has been effectively employed to quantify the complexity of a system described by a probability distribution $p$ in terms of its deviation from an exponential family of probability distributions [11]. The quantum version of the KL divergence, namely the quantum relative entropy, induces on the manifold of quantum states a dually flat structure given by a quantum version of the Fisher metric tensor and two flat connections, also called the mixture and the exponential connections, which are dual in the sense of Equation (1) [12]. Moreover, the quantum relative entropy has been used to quantify the many-party correlations of a composite quantum state $\rho$ as its deviation from a Gibbs family of quantum states [13]. The utility of the quantum relative entropy as a measure of the complexity of quantum states was shown in [14], where algorithms for its evaluation were studied. Furthermore, in that context, the many-party correlations were related to the entanglement of quantum systems as defined in [15].
A generalization of the flat structures induced by the Kullback-Leibler divergence on finite classical systems and by the quantum relative entropy on finite quantum states is provided by the classical $\alpha$-divergence and the quantum $\alpha$-divergence, respectively. Both generate a one-parameter family of connections, the $\alpha$-connections, which are dual with respect to the Fisher metric in the classical case [1], whereas in the quantum case, they are dual with respect to the Wigner-Yanase-Dyson metric [16].
From a physical viewpoint, it is worth remarking that the Boltzmann-Gibbs distribution in statistical physics is an exponential family such that an invariant flat structure is given to the underlying manifold in terms of the $(\pm 1)$-connections [6]. Tsallis generalized the concept of the Boltzmann-Gibbs distribution by introducing a generalized entropy, called the $q$-entropy, for studying various phenomena not included in the conventional Boltzmann-Gibbs framework [17, 18]. Actually, the $\alpha$-geometry, which is induced by the $\alpha$-divergence, covers the geometry of $q$-entropy physics [19]. Therefore, the $\alpha$-divergence can be understood as a generalization of the classical KL divergence also from a physical standpoint. In the quantum case, the geometry of $q$-entropy physics has been successfully employed to derive a criterion that detects the critical frontier, which has separable states on one side and quantum entangled ones on the other [20].
The purpose of the present article is to consider a recent canonical divergence, introduced by Ay and Amari in [21], as a tool for unifying classical and quantum information geometry. In particular, we aim to prove that this canonical divergence is the classical $\alpha$-divergence when computed on the space of positive measures, as well as the quantum $\alpha$-divergence if evaluated on the manifold of positive definite Hermitian operators.
2. Canonical Divergence and the Inverse Problem in Information Geometry
The inverse problem within information geometry concerns the search for a divergence function $D$ that recovers a given dual structure $(g, \nabla, \nabla^*)$ of a smooth manifold $M$ according to Equations (3) and (4). The solution to this problem was provided by Matumoto, who showed that such a divergence always exists for any statistical manifold [22]. Nonetheless, this solution is not unique, and infinitely many divergences can be defined on $M$ that give the same dualistic structure. For this reason, seeking a divergence that can be considered the most natural one is of utmost importance. To this end, Amari and Nagaoka defined a Bregman-type divergence on dually flat manifolds [1]. It has relevant properties, such as the generalized Pythagorean theorem and the geodesic projection theorem, and it is commonly regarded as the natural solution of the inverse problem in information geometry for dually flat manifolds. This is exactly why the Amari and Nagaoka divergence is referred to as the canonical divergence of dually flat statistical manifolds. However, the need for a general canonical divergence, which applies to any dualistic structure, is a crucial issue, as pointed out in [23]. According to the theory developed in [3], a divergence function $D$ of a statistical manifold $(M, g, \nabla, \nabla^*)$ is called canonical if:
$D$ generates the dualistic structure $(g, \nabla, \nabla^*)$ based on Equations (3) and (4);
$D$ is one half of the squared Riemannian distance, i.e., $D(p,q) = \frac{1}{2} d(p,q)^2$, when the statistical manifold is self-dual, namely when $\nabla = \nabla^*$ coincides with the Levi-Civita connection of $g$;
$D$ is the canonical divergence of the Bregman type when $(M, g, \nabla, \nabla^*)$ is dually flat.
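Ay and Amari recently defined a divergence for a general dualistic structure in terms of geodesic integration of the inverse exponential map [21]. It turns out that such a divergence satisfies all the above-mentioned requirements. Therefore, it can be viewed as a canonical divergence for a general dualistic structure. For $p, q \in M$, consider the $\nabla$-geodesic $\sigma(t)$ $(0 \le t \le 1)$ connecting $p$ with $q$; the canonical divergence introduced in [21] is then defined by:
$$D(p,q) = \int_0^1 \langle X_t(p), \dot\sigma(t) \rangle_{\sigma(t)}\, dt. \tag{5}$$
The $\nabla$-exponential map, $\exp_p : T_pM \to M$, is defined by $\exp_p(X) = \sigma_X(1)$ whenever the $\nabla$-geodesic $\sigma_X(t)$, satisfying $\sigma_X(0) = p$ and $\dot\sigma_X(0) = X$, exists on an interval of $t$ containing $[0,1]$. Therefore, if $\sigma$ is the $\nabla$-geodesic such that $\sigma(0) = p$ and $\sigma(1) = q$, the inverse at $p$ of the exponential map is given by $\exp_p^{-1}(q) = \dot\sigma(0)$. According to this definition, for every $t \in [0,1]$, we can consider the $\nabla$-geodesic $\sigma_t(s) = \sigma(ts)$ $(0 \le s \le 1)$ such that $\sigma_t(0) = p$ and $\sigma_t(1) = \sigma(t)$ and then define the $\nabla$-velocity vector at $\sigma(t)$ by $X_t(p) := \dot\sigma_t(1)$. In this way, the vector field $X_t(p)$ of Equation (5) turns out to be given by $X_t(p) = \mathrm{P}\big(\exp_p^{-1}(\sigma(t))\big) = t\,\dot\sigma(t)$. Here, $\mathrm{P}$ is the $\nabla$-parallel transport from $p$ to $\sigma(t)$ along $\sigma$. In light of all this, the divergence $D$ assumes the following useful expression:
$$D(p,q) = \int_0^1 t\, \|\dot\sigma(t)\|^2\, dt. \tag{6}$$
Analogously, the dual function of $D$ is defined as the $\nabla^*$-geodesic integration of the inverse of the $\nabla^*$-exponential map [21]. Therefore, we have for the dual divergence $D^*$ an expression similar to Equation (6) for the canonical divergence $D$:
$$D^*(p,q) = \int_0^1 t\, \|\dot\sigma^*(t)\|^2\, dt, \tag{7}$$
where $\sigma^*$ is the $\nabla^*$-geodesic connecting $p$ with $q$.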
The canonical divergence given by Equation (6) has been recently proposed as a tool for unifying classical and quantum information geometry [24]. In particular, it has been considered on the simplex of probability measures:
$$\mathcal{S}_n = \Big\{ p = (p_1, \ldots, p_n) \in \mathbb{R}^n : p_i > 0, \ \sum_{i=1}^n p_i = 1 \Big\}, \tag{8}$$
as well as on the manifold of quantum finite states,
$$\mathcal{S}(\mathcal{H}) = \{ \rho : \rho > 0, \ \mathrm{Tr}\,\rho = 1 \}, \tag{9}$$
where $\mathcal{H}$ denotes a finite-dimensional Hilbert space and $\rho$ is any Hermitian operator on $\mathcal{H}$. The natural dualistic structure on the simplex $\mathcal{S}_n$ is given, in classical information geometry, in terms of the Fisher metric $g$ and two flat connections, the mixture $\nabla^{(m)}$ and the exponential $\nabla^{(e)}$ ones, which are dual with respect to $g$ in the sense of Equation (1) [5]. Very remarkably, the Fisher metric is the only monotone Riemannian metric (up to a positive factor) on the class of finite probability simplices [25]. The quantum version of a monotone Riemannian metric on the manifold of quantum finite states $\mathcal{S}(\mathcal{H})$ is given in terms of the notion of stochastic mapping [26]. More precisely, let $B_H(\mathcal{H})$ be the set of all Hermitian operators on the Hilbert space $\mathcal{H}$; a linear mapping $T : B_H(\mathcal{H}_1) \to B_H(\mathcal{H}_2)$ is said to be stochastic if $T(\mathcal{S}(\mathcal{H}_1)) \subset \mathcal{S}(\mathcal{H}_2)$ and $T$ is completely positive [27]. Furthermore, a family $\{\langle \cdot, \cdot \rangle_\rho\}$ of inner products on $B_H(\mathcal{H})$ is said to be monotone if $\langle T(A), T(A) \rangle_{T(\rho)} \le \langle A, A \rangle_\rho$ for any stochastic map $T$ and for every $\rho \in \mathcal{S}(\mathcal{H})$ and $A \in B_H(\mathcal{H})$ [26]. As shown by Petz, there are infinitely many monotone inner products on $B_H(\mathcal{H})$. Therefore, the quantum analogue of the Fisher metric is not unique [28]. However, when the flat connections are required to be torsion-free, a natural dualistic structure on the manifold of quantum states $\mathcal{S}(\mathcal{H})$ is the one induced by the Bogoliubov-Kubo-Mori (BKM) inner product [12]. Furthermore, it turns out that the only monotone metrics that make the mixture connection $\nabla^{(m)}$ and the exponential connection $\nabla^{(e)}$ dual are the scalar multiples of the BKM metric [29]. When the canonical divergence (6) is computed on the simplex $\mathcal{S}_n$, it is shown to be the Kullback-Leibler divergence,
$$D(p,q) = \sum_{i=1}^n p_i \log\frac{p_i}{q_i}, \tag{10}$$
which proves that $D$ recovers the natural dualistic structure on $\mathcal{S}_n$ given by the Fisher metric $g$ and the dually flat connections $\nabla^{(m)}$ and $\nabla^{(e)}$ [24]. Analogously, when $D$ is considered on the manifold of quantum states $\mathcal{S}(\mathcal{H})$, it has been proven that:
$$D(\rho_1, \rho_2) = \mathrm{Tr}\,\rho_1 \left( \log\rho_1 - \log\rho_2 \right), \tag{11}$$
where $\mathrm{Tr}$ denotes the trace operator on the finite-dimensional Hilbert space of density matrices [24]. The function on the right-hand side of Equation (11) is called the quantum relative entropy, and it recovers the dual structure of $\mathcal{S}(\mathcal{H})$ given by the metric induced by the BKM inner product and the flat connections $\nabla^{(m)}$ and $\nabla^{(e)}$ [1].
In this article, we aim to investigate the canonical divergence (6) on the manifold of positive measures, as well as on the space of positive definite Hermitian operators, for the more general $\alpha$-connections. In classical information geometry, the one-parameter family of the $\alpha$-connections is defined on the manifold of positive measures by the linear combination of the mixture and exponential connections [6],
$$\nabla^{(\alpha)} = \frac{1-\alpha}{2}\, \nabla^{(m)} + \frac{1+\alpha}{2}\, \nabla^{(e)}. \tag{12}$$
It turns out that $\nabla^{(\alpha)}$ and $\nabla^{(-\alpha)}$ are dual with respect to the Fisher metric $g$ in the sense of Equation (1). In quantum information geometry, $\alpha$-connections also appeared in terms of the Amari $\alpha$-embeddings [30]. While in classical information geometry the two definitions coincide, this is no longer true in quantum information geometry [31]. In the present paper, we consider the definition of the quantum $\alpha$-connections by means of the Amari $\alpha$-embedding, which excludes that they can be obtained by the convex mixture of the $\nabla^{(m)}$ and $\nabla^{(e)}$ connections. The natural inner product that makes the quantum $\alpha$-connections dual in the sense of Equation (1) is the Wigner-Yanase-Dyson (WYD) metric, which appeared for the first time in the context of quantum information geometry in the work of Hasegawa [32]. Actually, it turns out that this is the only monotone metric (up to a scalar multiple) that makes the quantum $\nabla^{(\alpha)}$ and $\nabla^{(-\alpha)}$ dual [16].
3. Classical Flat Alpha-Divergence
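We represent measures on the set $I = \{1, \ldots, n\}$ as elements of $\mathbb{R}^n$. In this representation, the Dirac measures $\delta^i$, $i \in I$, form the canonical basis of $\mathbb{R}^n$. The $n$-dimensional cone of positive measures on the set $I$ is then defined by:
$$\mathcal{M}_+ = \Big\{ \mu = \sum_{i=1}^n \mu_i\, \delta^i : \mu_i > 0 \ \text{for all } i \in I \Big\}. \tag{13}$$
The natural Riemannian metric on $\mathcal{M}_+$ is the Fisher metric [6], which is defined by:
$$g_\mu(A, B) = \sum_{i=1}^n \frac{A_i B_i}{\mu_i}, \tag{14}$$
for all $\mu \in \mathcal{M}_+$ and $A, B \in T_\mu\mathcal{M}_+$. Here, $T_\mu\mathcal{M}_+$ denotes the tangent space to $\mathcal{M}_+$ at $\mu$. Given the mixture and the exponential connections, we can define the $\alpha$-connection on $\mathcal{M}_+$ by using Equation (12). Recalling that $T_\mu\mathcal{M}_+ \cong \mathbb{R}^n$, we can write any $A = \sum_i A_i\, \delta^i \in T_\mu\mathcal{M}_+$ with respect to the canonical basis of $\mathbb{R}^n$. Hence, the $m$-connection on $\mathcal{M}_+$ reads as follows:
$$\nabla^{(m)}_A B \big|_\mu = \sum_{i=1}^n \frac{\partial B_i}{\partial A}(\mu)\, \delta^i, \tag{15}$$
for all vector fields $A$ and $B$ on $\mathcal{M}_+$. Here, $\partial/\partial A$ denotes the derivative in the direction $A$. On the contrary, the $e$-connection is given by:
$$\nabla^{(e)}_A B \big|_\mu = \sum_{i=1}^n \left( \frac{\partial B_i}{\partial A}(\mu) - \frac{A_i B_i}{\mu_i} \right) \delta^i. \tag{16}$$
Therefore, by applying Equation (12), we can describe the $\alpha$-connection for the manifold $\mathcal{M}_+$ as follows,
$$\nabla^{(\alpha)}_A B \big|_\mu = \sum_{i=1}^n \left( \frac{\partial B_i}{\partial A}(\mu) - \frac{1+\alpha}{2}\, \frac{A_i B_i}{\mu_i} \right) \delta^i, \tag{17}$$
for all vector fields $A, B$ on $\mathcal{M}_+$. It turns out that the dualistic structure $(g, \nabla^{(\alpha)}, \nabla^{(-\alpha)})$ is dually flat, i.e., the Riemannian curvature tensors of $\nabla^{(\alpha)}$ and $\nabla^{(-\alpha)}$ are zero [6]. Furthermore, such an $\alpha$-flat dualistic structure is induced on $\mathcal{M}_+$ by the $\alpha$-divergence $D^{(\alpha)}$, which is commonly regarded as the canonical solution of the inverse problem for recovering the dually flat $\alpha$-structure $(g, \nabla^{(\alpha)}, \nabla^{(-\alpha)})$ on $\mathcal{M}_+$ [6]. Given $\mu, \nu \in \mathcal{M}_+$, the $\alpha$-divergence between $\mu$ and $\nu$ is given by:
$$D^{(\alpha)}(\mu, \nu) = \frac{2}{1+\alpha} \sum_{i=1}^n \mu_i + \frac{2}{1-\alpha} \sum_{i=1}^n \nu_i - \frac{4}{1-\alpha^2} \sum_{i=1}^n \mu_i^{\frac{1-\alpha}{2}}\, \nu_i^{\frac{1+\alpha}{2}}. \tag{18}$$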
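The flat $\alpha$-divergence (18) has been deeply studied, and its features have been widely discussed in the literature (see [3, 6] for more details). In particular, we point out that $D^{(\alpha)}$ is a continuous function of the parameter $\alpha$, and the limit $\alpha \to -1$ gives the well-known Kullback-Leibler divergence on the manifold of positive measures,
$$\lim_{\alpha \to -1} D^{(\alpha)}(\mu, \nu) = \sum_{i=1}^n \left( \nu_i - \mu_i + \mu_i \log\frac{\mu_i}{\nu_i} \right), \tag{19}$$
whereas the limit $\alpha \to 1$ gives:
$$\lim_{\alpha \to 1} D^{(\alpha)}(\mu, \nu) = \sum_{i=1}^n \left( \mu_i - \nu_i + \nu_i \log\frac{\nu_i}{\mu_i} \right).$$
Let us observe that, if we restrict (19) to the simplex of probability distributions, namely when $\sum_i \mu_i = \sum_i \nu_i = 1$, then we obtain the function (10). At this point, it is worth mentioning the close connection between the $\alpha$-divergence (18) and the Tsallis relative entropy, or $q$-divergence, on the manifold of positive measures. The $q$-divergence on $\mathcal{M}_+$ is defined by (see [33] for more details):
$$D_q(\mu, \nu) = \frac{1}{1-q} \sum_{i=1}^n \left( q\, \mu_i + (1-q)\, \nu_i - \mu_i^{q}\, \nu_i^{1-q} \right). \tag{20}$$
By setting $q = \frac{1-\alpha}{2}$, we can easily verify that the flat $\alpha$-divergence (18) and the Tsallis $q$-divergence (20) coincide up to a scaling factor, namely $D^{(\alpha)} = \frac{1}{q} D_q$. Further, the limit $q \to 1$, that is $\alpha \to -1$, recovers the Kullback-Leibler divergence,
$$\lim_{q \to 1} D_q(\mu, \nu) = \sum_{i=1}^n \left( \nu_i - \mu_i + \mu_i \log\frac{\mu_i}{\nu_i} \right).$$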
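The relations just described can be checked numerically. The sketch below is only an illustration: it assumes the convention in which $\alpha = -1$ corresponds to the mixture connection, writes the $\alpha$-divergence as $\frac{4}{1-\alpha^2}\sum_i\big(\frac{1-\alpha}{2}\mu_i + \frac{1+\alpha}{2}\nu_i - \mu_i^{(1-\alpha)/2}\nu_i^{(1+\alpha)/2}\big)$, and takes the Tsallis $q$-divergence on positive measures in the form $\frac{1}{1-q}\sum_i\big(q\mu_i + (1-q)\nu_i - \mu_i^q\nu_i^{1-q}\big)$. The function names and sample measures are hypothetical.

```python
import numpy as np

def alpha_divergence(mu, nu, alpha):
    """Classical alpha-divergence on positive measures (assumed convention)."""
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    # 1 / (a * b) equals 4 / (1 - alpha**2)
    return float(np.sum(a * mu + b * nu - mu**a * nu**b) / (a * b))

def tsallis_divergence(mu, nu, q):
    """Tsallis q-divergence extended to positive measures (assumed form)."""
    return float(np.sum(q * mu + (1 - q) * nu - mu**q * nu**(1 - q)) / (1 - q))

def kl(mu, nu):
    """Kullback-Leibler divergence extended to positive measures."""
    return float(np.sum(nu - mu + mu * np.log(mu / nu)))

mu = np.array([0.5, 1.2, 2.0])
nu = np.array([1.0, 0.7, 3.0])

alpha = 0.3
q = (1 - alpha) / 2
scaled_tsallis = tsallis_divergence(mu, nu, q) / q   # scaling factor 1/q
near_kl = alpha_divergence(mu, nu, -1 + 1e-6)        # alpha close to -1

print(alpha_divergence(mu, nu, alpha), scaled_tsallis)
print(near_kl, kl(mu, nu))
```

With $q = (1-\alpha)/2$ the two printed pairs agree, the first exactly up to rounding and the second up to an error of order $1+\alpha$.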
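In this section, we aim to compute the canonical divergence (6) on $\mathcal{M}_+$ for the $\alpha$-connections given by (17). In order to achieve this result, we need to consider the geodesic with respect to the $\alpha$-connection. Let $\mu, \nu \in \mathcal{M}_+$; a curve $\sigma : [0,1] \to \mathcal{M}_+$ from $\mu$ to $\nu$ is an $\alpha$-geodesic iff $\nabla^{(\alpha)}_{\dot\sigma}\dot\sigma = 0$ and $\sigma(0) = \mu$, $\sigma(1) = \nu$. From Equation (17), we then obtain the following geodesic equations:
$$\ddot\sigma_i(t) - \frac{1+\alpha}{2}\, \frac{\dot\sigma_i(t)^2}{\sigma_i(t)} = 0, \tag{21}$$
where $i = 1, \ldots, n$. Hence, the solution of (21) is the $\alpha$-geodesic from $\mu$ to $\nu$, and it is given by:
$$\sigma_i(t) = \Big( \mu_i^{\frac{1-\alpha}{2}} + t \big( \nu_i^{\frac{1-\alpha}{2}} - \mu_i^{\frac{1-\alpha}{2}} \big) \Big)^{\frac{2}{1-\alpha}}. \tag{22}$$
At this point, we can apply Equation (6) to compute the canonical divergence $D$ on the manifold $\mathcal{M}_+$ of positive measures. Writing $c_i = \nu_i^{\frac{1-\alpha}{2}} - \mu_i^{\frac{1-\alpha}{2}}$, from Equation (14), we obtain:
$$\|\dot\sigma(t)\|^2 = \sum_{i=1}^n \frac{\dot\sigma_i(t)^2}{\sigma_i(t)} = \frac{4}{(1-\alpha)^2} \sum_{i=1}^n c_i^2\, \sigma_i(t)^{\alpha} = \frac{4}{1-\alpha^2} \sum_{i=1}^n c_i\, \frac{d}{dt}\, \sigma_i(t)^{\frac{1+\alpha}{2}}.$$
Now, we can compute the integral in (6) by performing an integration by parts and using $c_i \int_0^1 \sigma_i(t)^{\frac{1+\alpha}{2}}\, dt = \frac{1-\alpha}{2} (\nu_i - \mu_i)$:
$$\int_0^1 t\, \|\dot\sigma(t)\|^2\, dt = \frac{4}{1-\alpha^2} \sum_{i=1}^n \Big( c_i\, \nu_i^{\frac{1+\alpha}{2}} - \frac{1-\alpha}{2} (\nu_i - \mu_i) \Big) = \frac{4}{1-\alpha^2} \sum_{i=1}^n \Big( \frac{1-\alpha}{2}\, \mu_i + \frac{1+\alpha}{2}\, \nu_i - \mu_i^{\frac{1-\alpha}{2}}\, \nu_i^{\frac{1+\alpha}{2}} \Big).$$
This proves that:
$$D(\mu, \nu) = D^{(\alpha)}(\mu, \nu), \tag{23}$$
that is, the canonical divergence (6) coincides with the $\alpha$-divergence (18) on the manifold $\mathcal{M}_+$ of positive measures.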
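The result of this section can be sanity-checked by evaluating the geodesic integral numerically. The following sketch assumes the Fisher metric $\sum_i A_iB_i/\mu_i$, the $\alpha$-geodesic along which $\sigma_i^{(1-\alpha)/2}$ is affine in $t$, and the $\alpha$-divergence convention in which $\alpha=-1$ yields the Kullback-Leibler divergence; the Gauss-Legendre quadrature, the function names, and the sample measures are illustrative choices rather than part of the derivation.

```python
import numpy as np

def alpha_divergence(mu, nu, alpha):
    """Closed-form alpha-divergence on positive measures (assumed convention)."""
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    return float(np.sum(a * mu + b * nu - mu**a * nu**b) / (a * b))

def alpha_geodesic(mu, nu, alpha, t):
    """Point at time t on the alpha-geodesic from mu (t=0) to nu (t=1)."""
    a = (1 - alpha) / 2
    return (mu**a + t * (nu**a - mu**a)) ** (1 / a)

def canonical_divergence(mu, nu, alpha, order=60):
    """Integral of t * |sigma'(t)|^2 over [0, 1] by Gauss-Legendre quadrature."""
    a = (1 - alpha) / 2
    c = nu**a - mu**a
    x, w = np.polynomial.legendre.leggauss(order)
    t, w = 0.5 * (x + 1), 0.5 * w          # map nodes from [-1, 1] to [0, 1]
    total = 0.0
    for tk, wk in zip(t, w):
        sigma = alpha_geodesic(mu, nu, alpha, tk)
        velocity = (c / a) * sigma ** (1 - a)       # d sigma_i / dt
        total += wk * tk * np.sum(velocity**2 / sigma)  # Fisher squared norm
    return float(total)

mu = np.array([0.5, 1.2, 2.0])
nu = np.array([1.0, 0.7, 3.0])
alpha = 0.3
numeric = canonical_divergence(mu, nu, alpha)
closed = alpha_divergence(mu, nu, alpha)
print(numeric, closed)
```

The two numbers agree to quadrature accuracy for any $\alpha \in (-1,1)$ and any pair of positive measures, which is what the integration by parts above predicts.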
4. Quantum Flat Alpha-Divergence
The goal of this section is to compute the canonical divergence of Equation (6) on the manifold of positive definite Hermitian operators endowed with the quantum $\alpha$-connections. In the classical case, the definition of the $\alpha$-connection $\nabla^{(\alpha)}$ can be equivalently given by Equation (12) or by means of the well-known Amari $\alpha$-embedding [6]. In the quantum setting, by exploiting the linear structure of the manifold, we can introduce the quantum mixture connection $\nabla^{(m)}$ on the space of positive definite Hermitian operators. On the other hand, the quantum exponential connection $\nabla^{(e)}$ is introduced by relying on the linear structure of the logarithms of the manifold. Both connections, $\nabla^{(m)}$ and $\nabla^{(e)}$, are flat. Furthermore, they turn out to be dual in the sense of Equation (1) with respect to the metric induced by the BKM inner product [12]. Actually, this metric is (up to a scalar multiple) the only one that makes $\nabla^{(m)}$ and $\nabla^{(e)}$ dual on the manifold of positive definite Hermitian operators. A generalization of these two natural connections is provided by the quantum $\alpha$-connections, which, on the manifold under consideration, are flat as well. In our approach, these $\alpha$-connections are introduced in terms of the Amari $\alpha$-embeddings, and therefore they cannot be obtained by the convex combination of the mixture and the exponential connections as in the classical setting [16]. It turns out that, for a suitable range of $\alpha$, the connections $\nabla^{(\alpha)}$ and $\nabla^{(-\alpha)}$ are torsion-free and dual with respect to the WYD metric [34]. However, the target spaces of our approach are the $L^p$ spaces, with $p = \frac{2}{1-\alpha}$, and therefore we restrict our discussion to the range $\alpha \in (-1, 1)$.
Recall that in order to compute the canonical divergence (6), we need to get the $\alpha$-geodesic $\rho(t)$ between any two positive Hermitian operators $\rho_1$ and $\rho_2$. To do this, we shall describe the $\alpha$-connections in terms of the $\alpha$-parallel transport, which is introduced very naturally on the manifold of positive Hermitian operators. Given $B(\mathcal{H})$, the algebra of linear operators on an $n$-dimensional complex Hilbert space $\mathcal{H}$, the subspace of Hermitian operators is an $n^2$-dimensional real vector space defined by:
$$B_H = \{ A \in B(\mathcal{H}) : A = A^\dagger \}, \tag{24}$$
where $A^\dagger = \bar{A}^{\,t}$ and $t$, here, denotes the transpose matrix. Therefore, the manifold of all positive definite Hermitian operators, or more simply, quantum operators, is given by:
$$\mathcal{P} = \{ \rho \in B_H : \rho > 0 \}. \tag{25}$$
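The $\alpha$-embedding:
$$\ell_\alpha : \mathcal{P} \to B_H, \qquad \ell_\alpha(\rho) = \frac{2}{1-\alpha}\, \rho^{\frac{1-\alpha}{2}}, \tag{26}$$
maps the manifold of quantum operators into the vector space $B_H$ for all $\alpha \neq 1$ [1]. Furthermore, for $\alpha = 1$, we have that $\ell_1(\rho) = \log\rho$. Hence, the $\alpha$-embedding supplies a useful representation of the tangent bundle of $\mathcal{P}$. In fact, by considering the subspace of $B_H$ given by:
$$T^{(\alpha)}_\rho\mathcal{P} = \Big\{ \frac{d}{dt}\, \ell_\alpha(\rho(t)) \Big|_{t=0} : \rho(t) \subset \mathcal{P}, \ \rho(0) = \rho \Big\}, \tag{27}$$
we can then define the isomorphism:
$$T_\rho\mathcal{P} \ni \dot\rho(0) \longmapsto \frac{d}{dt}\, \ell_\alpha(\rho(t)) \Big|_{t=0} \in T^{(\alpha)}_\rho\mathcal{P}, \tag{28}$$
where $\rho(t)$ is a curve such that $\rho(0) = \rho$. This isomorphism provides the $\alpha$-representation of the tangent space $T_\rho\mathcal{P}$ [16]. In particular, if $(\xi^1, \ldots, \xi^{n^2})$ is a coordinate system for $\mathcal{P}$, then the $\alpha$-representation of the basis $\{\partial_i\}$ of $T_\rho\mathcal{P}$ is $\{\partial_i \ell_\alpha(\rho)\}$, where $\partial_i = \partial/\partial\xi^i$. Finally, for any vector field $X = \sum_i X^i \partial_i$, we have that its $\alpha$-representation is defined by:
$$X^{(\alpha)} = X\, \ell_\alpha(\rho) = \sum_i X^i\, \partial_i \ell_\alpha(\rho). \tag{29}$$
From Equation (26), we may observe that $T^{(\alpha)}_\rho\mathcal{P} = B_H$ for all $\rho \in \mathcal{P}$. Now, since $T^{(\alpha)}_{\rho_1}\mathcal{P} = T^{(\alpha)}_{\rho_2}\mathcal{P}$ for any $\rho_1, \rho_2 \in \mathcal{P}$, we can simply define the $\alpha$-parallel transport on $\mathcal{P}$ by:
$$\Pi^{(\alpha)}_{\rho_1, \rho_2} : T_{\rho_1}\mathcal{P} \to T_{\rho_2}\mathcal{P}, \qquad \big( \Pi^{(\alpha)}_{\rho_1, \rho_2} X \big)^{(\alpha)} = X^{(\alpha)}. \tag{30}$$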
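Therefore, we find that the $\alpha$-representation of the covariant derivative $\nabla^{(\alpha)}$ associated with $\Pi^{(\alpha)}$ is:
$$\big( \nabla^{(\alpha)}_{\partial_i} \partial_j \big)^{(\alpha)} = \partial_i \partial_j\, \ell_\alpha(\rho), \qquad i, j = 1, \ldots, n^2, \tag{31}$$
where $(\xi^i)$ is an arbitrary coordinate system for $\mathcal{P}$ and $n^2$ is the topological dimension of the tangent space $T_\rho\mathcal{P}$. In order to show that $\mathcal{P}$ is $\alpha$-flat, consider a basis $\{E_i\}$ of $B_H$. Then, there exist real numbers $\theta^i(\rho)$ such that we can write $\ell_\alpha(\rho)$ as follows:
$$\ell_\alpha(\rho) = \sum_{i=1}^{n^2} \theta^i(\rho)\, E_i,$$
for every $\rho \in \mathcal{P}$. We can see from Equation (31) that, in the coordinates $(\theta^i)$:
$$\big( \nabla^{(\alpha)}_{\partial_i} \partial_j \big)^{(\alpha)} = \partial_i \partial_j\, \ell_\alpha(\rho) = 0.$$
This proves that $(\theta^i)$ is a $\nabla^{(\alpha)}$-affine coordinate system for $\mathcal{P}$, and therefore $\mathcal{P}$ is $\alpha$-flat [16]. As a consequence, for any pair of points $\rho_1, \rho_2 \in \mathcal{P}$, we can write the $\alpha$-geodesic from $\rho_1$ to $\rho_2$ in these $\alpha$-affine coordinates as follows:
$$\ell_\alpha(\rho(t)) = \ell_\alpha(\rho_1) + t\, \big( \ell_\alpha(\rho_2) - \ell_\alpha(\rho_1) \big), \qquad t \in [0,1]. \tag{32}$$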
In order to perform the computation of the integral (6), we need to calculate the norm $\|\dot\rho(t)\|$. To do this, we have to specify a suitable metric on the tangent space $T_\rho\mathcal{P}$. As discussed above, we select the metric induced by the WYD inner product because this metric is the only one (up to a scalar multiple) that makes the quantum flat $\alpha$-connections dual in the sense of Equation (1) [29]. For any $X, Y$ vector fields on $\mathcal{P}$, this metric turns out to be defined by:
$$g_\rho(X, Y) = \mathrm{Tr}\big( X^{(\alpha)}\, Y^{(-\alpha)} \big), \tag{33}$$
where $X^{(\alpha)}$ denotes the $\alpha$-representation of the tangent vector $X$, and it is given by Equation (29) [35]. It is worth noticing that the limit $\alpha \to \pm 1$ gives the metric induced by the BKM inner product on the manifold of density operators [12].
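In order to write the WYD metric with respect to the $\alpha$-affine coordinate system $(\theta^i)$, we may observe that:
$$\partial_i\, \ell_\alpha(\rho) = E_i, \tag{34}$$
where $E_i$ is a vector of the basis $\{E_i\}$. This implies that, with this coordinate system, the $\alpha$-representation $X^{(\alpha)}$ of a vector field $X$ is the vector field $X = \sum_i X^i E_i$ itself. In addition, we also have that:
$$\ell_{-\alpha}(\rho) = \frac{2}{1+\alpha}\, \rho^{\frac{1+\alpha}{2}} = \frac{2}{1+\alpha} \Big( \frac{1-\alpha}{2} \sum_i \theta^i E_i \Big)^{\frac{1+\alpha}{1-\alpha}}. \tag{35}$$
Therefore, in the $\alpha$-affine coordinate system $(\theta^i)$, the components of the metric tensor (33) are given by:
$$g_{ij}(\rho) = \mathrm{Tr}\big( E_i\, \partial_j \ell_{-\alpha}(\rho) \big).$$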
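The quantum dually flat structure $(g, \nabla^{(\alpha)}, \nabla^{(-\alpha)})$ for the manifold $\mathcal{P}$ of positive definite Hermitian operators (or quantum operators) described so far can also be obtained through Equations (3) and (4) when the following divergence is considered,
$$D^{(\alpha)}(\rho_1, \rho_2) = \frac{2}{1+\alpha}\, \mathrm{Tr}\,\rho_1 + \frac{2}{1-\alpha}\, \mathrm{Tr}\,\rho_2 - \frac{4}{1-\alpha^2}\, \mathrm{Tr}\Big( \rho_1^{\frac{1-\alpha}{2}}\, \rho_2^{\frac{1+\alpha}{2}} \Big). \tag{36}$$
This function is called the quantum $\alpha$-divergence, and it was introduced on $\mathcal{P}$ as the generalization of the $\alpha$-divergence (18) of the positive measures [36]. Carrying on this line of reasoning, we can introduce a $q$-divergence on the manifold of positive Hermitian operators as well. In analogy with the classical case, we can set $q = \frac{1-\alpha}{2}$ in the argument of the trace $\mathrm{Tr}$ in Equation (36), and then we can write the following expression,
$$D_q(\rho_1, \rho_2) = \frac{1}{1-q}\, \mathrm{Tr}\Big( q\, \rho_1 + (1-q)\, \rho_2 - \rho_1^{q}\, \rho_2^{1-q} \Big), \tag{37}$$
which corresponds to the quantum $\alpha$-divergence up to the same scalar factor as in the classical case. It is worth noting that this function is different from the extensions of the Tsallis relative entropy to the positive operators in the literature. For example, the quantum $q$-divergence introduced in [37] for all $\rho_1, \rho_2 \in \mathcal{P}$ is a different function, which we denote by $\tilde{D}_q$. However, both functions, $D_q$ and $\tilde{D}_q$, reduce to the Tsallis relative entropy when restricted to the manifold $\mathcal{S}(\mathcal{H})$ of density operators,
$$D_q(\rho_1, \rho_2) = \frac{1}{1-q} \Big( 1 - \mathrm{Tr}\big( \rho_1^{q}\, \rho_2^{1-q} \big) \Big).$$
Analogously, we can restrict the quantum $\alpha$-divergence (36) to the set of density operators, and since $\mathrm{Tr}\,\rho_1 = \mathrm{Tr}\,\rho_2 = 1$, we can easily verify that (36) becomes:
$$D^{(\alpha)}(\rho_1, \rho_2) = \frac{4}{1-\alpha^2} \Big( 1 - \mathrm{Tr}\big( \rho_1^{\frac{1-\alpha}{2}}\, \rho_2^{\frac{1+\alpha}{2}} \big) \Big). \tag{39}$$
Again, we can see that, on the manifold of density operators, the quantum $\alpha$-divergence coincides (up to a scalar factor) with the Tsallis relative entropy when we set $q = \frac{1-\alpha}{2}$. The quantum $\alpha$-divergence (39) was introduced and studied by Hasegawa in [32], and it turns out to be a continuous function of the parameter $\alpha$. In particular, we have that:
$$\lim_{\alpha \to -1} D^{(\alpha)}(\rho_1, \rho_2) = \mathrm{Tr}\,\rho_1 \left( \log\rho_1 - \log\rho_2 \right),$$
that is, the quantum relative entropy $D(\rho_1, \rho_2)$ as previously given in Equation (11). Furthermore, the limit $\alpha \to 1$ gives $\mathrm{Tr}\,\rho_2 \left( \log\rho_2 - \log\rho_1 \right)$.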
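The limiting behaviour of the quantum $\alpha$-divergence on density operators can be illustrated numerically. The sketch below assumes the form $\frac{4}{1-\alpha^2}\mathrm{Tr}\big(\frac{1-\alpha}{2}\rho_1 + \frac{1+\alpha}{2}\rho_2 - \rho_1^{(1-\alpha)/2}\rho_2^{(1+\alpha)/2}\big)$ and the convention in which $\alpha \to -1$ yields $\mathrm{Tr}\,\rho_1(\log\rho_1 - \log\rho_2)$. The helper names (`mpow`, `mlog`) and the two sample matrices are hypothetical; matrix functions are computed through the spectral decomposition.

```python
import numpy as np

def mpow(rho, s):
    """Real power of a positive definite Hermitian matrix via eigendecomposition."""
    w, v = np.linalg.eigh(rho)
    return (v * w**s) @ v.conj().T

def mlog(rho):
    """Logarithm of a positive definite Hermitian matrix."""
    w, v = np.linalg.eigh(rho)
    return (v * np.log(w)) @ v.conj().T

def quantum_alpha_divergence(r1, r2, alpha):
    """Quantum alpha-divergence on positive definite operators (assumed convention)."""
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    val = np.trace(a * r1 + b * r2 - mpow(r1, a) @ mpow(r2, b)) / (a * b)
    return float(val.real)

def relative_entropy(r1, r2):
    """Quantum relative entropy Tr r1 (log r1 - log r2)."""
    return float(np.trace(r1 @ (mlog(r1) - mlog(r2))).real)

# Two Hermitian, strictly diagonally dominant (hence positive definite) matrices
r1 = np.array([[2.0, 0.5 - 0.2j, 0.0],
               [0.5 + 0.2j, 1.5, 0.3j],
               [0.0, -0.3j, 1.0]])
r2 = np.array([[1.0, 0.1j, 0.2],
               [-0.1j, 2.0, 0.4],
               [0.2, 0.4, 1.8]])
r1 = r1 / np.trace(r1).real   # normalize to density operators
r2 = r2 / np.trace(r2).real

near_limit = quantum_alpha_divergence(r1, r2, -1 + 1e-6)
print(near_limit, relative_entropy(r1, r2))
```

For $\alpha$ close to $-1$ the printed values agree up to an error of order $1+\alpha$; replacing $-1 + 10^{-6}$ with $1 - 10^{-6}$ approximates the relative entropy with the arguments exchanged.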
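At this point, we have all the ingredients necessary to compute the canonical divergence defined in (6) between $\rho_1$ and $\rho_2$ on the manifold $\mathcal{P}$ of positive definite Hermitian operators,
$$D(\rho_1, \rho_2) = \int_0^1 t\, \|\dot\rho(t)\|^2_{\rho(t)}\, dt, \tag{40}$$
where $\rho(t)$ is the $\alpha$-geodesic from $\rho_1$ to $\rho_2$. In the $\alpha$-affine coordinates $(\theta^i)$, this reads as in Equation (32). Here, the norm $\|\cdot\|$ is induced by the WYD metric given in Equation (33). Hence, in this case, the canonical divergence (40) can be written as:
$$D(\rho_1, \rho_2) = \int_0^1 t\, \mathrm{Tr}\big( \dot\rho^{(\alpha)}(t)\, \dot\rho^{(-\alpha)}(t) \big)\, dt, \tag{41}$$
where $\dot\rho^{(\alpha)}$ and $\dot\rho^{(-\alpha)}$ denote the $\alpha$ and the $-\alpha$ representations of $\dot\rho(t)$. In order to compute the $\pm\alpha$ representations of $\dot\rho(t)$, we consider Equation (29) together with Equations (34) and (35). This yields:
$$\dot\rho^{(\alpha)}(t) = \ell_\alpha(\rho_2) - \ell_\alpha(\rho_1), \qquad \dot\rho^{(-\alpha)}(t) = \frac{d}{dt}\, \ell_{-\alpha}(\rho(t)).$$
We can plug these expressions into Equation (41). Hence, by performing an integration by parts, we get:
$$D(\rho_1, \rho_2) = \mathrm{Tr}\Big[ \big( \ell_\alpha(\rho_2) - \ell_\alpha(\rho_1) \big) \Big( \ell_{-\alpha}(\rho_2) - \int_0^1 \ell_{-\alpha}(\rho(t))\, dt \Big) \Big].$$
At this point, we can use Equation (32) to obtain:
$$\mathrm{Tr}\Big[ \big( \ell_\alpha(\rho_2) - \ell_\alpha(\rho_1) \big) \int_0^1 \ell_{-\alpha}(\rho(t))\, dt \Big] = \frac{2}{1+\alpha}\, \mathrm{Tr}\,( \rho_2 - \rho_1 ).$$
Finally, we can conclude that:
$$D(\rho_1, \rho_2) = \frac{2}{1+\alpha}\, \mathrm{Tr}\,\rho_1 + \frac{2}{1-\alpha}\, \mathrm{Tr}\,\rho_2 - \frac{4}{1-\alpha^2}\, \mathrm{Tr}\Big( \rho_1^{\frac{1-\alpha}{2}}\, \rho_2^{\frac{1+\alpha}{2}} \Big), \tag{42}$$
which corresponds to the quantum $\alpha$-divergence (36) on the manifold of positive definite matrices.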
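The quantum computation can also be cross-checked numerically. The sketch below is an illustration under stated assumptions: the $\alpha$-embedding is taken as $\ell_\alpha(\rho) = \frac{2}{1-\alpha}\rho^{(1-\alpha)/2}$, the $\alpha$-geodesic is the curve along which $\ell_\alpha(\rho(t))$ is affine in $t$, and the WYD squared norm of the velocity is $\mathrm{Tr}\big(\dot\rho^{(\alpha)}\dot\rho^{(-\alpha)}\big)$. The $-\alpha$ representation is approximated by a central finite difference, and the function names, step size, quadrature order, and test matrices are hypothetical.

```python
import numpy as np

def mpow(rho, s):
    """Real power of a positive definite Hermitian matrix via eigendecomposition."""
    w, v = np.linalg.eigh(rho)
    return (v * w**s) @ v.conj().T

def quantum_alpha_divergence(r1, r2, alpha):
    """Closed-form quantum alpha-divergence (assumed convention)."""
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    val = np.trace(a * r1 + b * r2 - mpow(r1, a) @ mpow(r2, b)) / (a * b)
    return float(val.real)

def geodesic(r1, r2, alpha, t):
    """Alpha-geodesic: rho(t)^((1-alpha)/2) interpolates linearly between endpoints."""
    a = (1 - alpha) / 2
    return mpow(mpow(r1, a) + t * (mpow(r2, a) - mpow(r1, a)), 1 / a)

def canonical_divergence(r1, r2, alpha, order=40, h=1e-5):
    """Integral of t * Tr(alpha-rep * (-alpha)-rep of the velocity) over [0, 1]."""
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    c = mpow(r2, a) - mpow(r1, a)           # a times the alpha-representation
    x, w = np.polynomial.legendre.leggauss(order)
    t, w = 0.5 * (x + 1), 0.5 * w           # map nodes from [-1, 1] to [0, 1]
    total = 0.0
    for tk, wk in zip(t, w):
        # central difference for d/dt rho(t)^((1+alpha)/2), i.e. b times the (-alpha)-rep
        d = (mpow(geodesic(r1, r2, alpha, tk + h), b)
             - mpow(geodesic(r1, r2, alpha, tk - h), b)) / (2 * h)
        total += wk * tk * np.trace(c @ d).real / (a * b)
    return float(total)

# Hermitian, strictly diagonally dominant (hence positive definite) test matrices
r1 = np.array([[2.0, 0.5 - 0.2j, 0.0],
               [0.5 + 0.2j, 1.5, 0.3j],
               [0.0, -0.3j, 1.0]])
r2 = np.array([[1.0, 0.1j, 0.2],
               [-0.1j, 2.0, 0.4],
               [0.2, 0.4, 1.8]])
alpha = 0.3
numeric = canonical_divergence(r1, r2, alpha)
closed = quantum_alpha_divergence(r1, r2, alpha)
print(numeric, closed)
```

The matrices here are not normalized, so the check covers the full cone of positive definite operators rather than only density operators; the agreement is limited by the finite-difference step rather than by the quadrature.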
5. Conclusions
The present article is a follow-up of the recently published paper [24]. There, the authors showed that the canonical divergence defined in Equation (6) provides a powerful tool for unifying classical and quantum information geometry. In particular, such a divergence was proven to coincide with the Kullback-Leibler divergence on the simplex of probability distributions and with the quantum relative entropy on the manifold of quantum states. In the context of complex systems, the Kullback-Leibler divergence has proven effective for quantifying how much a probability measure deviates from the non-interacting states that are modeled by exponential families of probabilities [11]. On the other hand, the quantum relative entropy turned out to be relevant for providing a measure of the many-party correlations of a quantum state as its deviation from a Gibbs family [13], which in turn was related to the entanglement of quantum systems as defined in [15].
The $\alpha$-divergence (18) is a generalization of the relative entropy (10). Moreover, the flat $\alpha$-geometry induced by the $\alpha$-divergence on the manifold of positive measures constitutes a generalization of the dually flat structure given by the Fisher metric and the mixture and exponential connections. Very remarkably, the $\alpha$-geometry covers the geometry of $q$-entropy physics [19]. This establishes a connection between the $\alpha$-geometry and the generalized statistical mechanics established by Tsallis [17, 18]. On the quantum side, the $\alpha$-divergence (36) was introduced as the generalization of the $\alpha$-divergence (18) of the positive measures [36]. The quantum $\alpha$-geometry of the manifold of positive definite matrices is flat as well, and turns out to be a generalization of the quantum geometry given by the quantum Fisher metric and the quantum mixture and exponential connections induced on the manifold of quantum states by the BKM inner product [16]. In the quantum setting as well, the connection between the $\alpha$-geometry and the $q$-entropy physics by Tsallis provides a physical interpretation of the $\alpha$-divergence. Indeed, a conditional form of the $q$-entropy was employed for the exact calculation, on some systems, of the separable-entangled separatrix [17].
In this article, we computed the canonical divergence (6) for flat $\alpha$-connections on the manifold of positive measures, as well as on the manifold of positive definite Hermitian operators. We thereby proved that the divergence introduced by Ay and Amari in [21] reduces to the classical $\alpha$-divergence (18) and to the quantum $\alpha$-divergence (36). Actually, the equivalence between the canonical divergence (6) and the classical $\alpha$-divergence was first shown in [21]. There, the derivation of $D^{(\alpha)}$ was grounded on the idea of a squared distance function associated to the $\alpha$-connections through the vector field $X_t(p)$ given in Equation (5). However, the $\alpha$-divergence $D^{(\alpha)}$ does not share all the properties of a squared distance function, unless $\alpha = 0$. For this reason, in the present paper, we obtained the equivalence between $D$ and $D^{(\alpha)}$ in a different way, by considering the $\alpha$-divergence as a function of a more general structure.
The present paper is conceived within a project that aims to characterize a general canonical divergence for a given dualistic structure $(g, \nabla, \nabla^*)$ of a smooth manifold $M$. This project started with the work by Ay and Amari [21] and was later developed in [3], where the concept of the "canonical divergence" was clearly delineated. A considerable effort towards the definition of a general canonical divergence was put forward in [38], where a divergence function was introduced through an extensive investigation of the geodesic geometry of the dualistic structure $(g, \nabla, \nabla^*)$. Actually, this latter divergence turns out to coincide with the divergence (6) when the dualistic structure $(g, \nabla, \nabla^*)$ is flat. Further work on this topic was presented in [39], where the divergence of [38] was compared with other divergence functions present in the literature. It is commonly accepted that the $\alpha$-divergence on the simplex is obtained by restricting (18) to the set of normalized positive measures (see [1, 6] for more details). Then, the $\alpha$-divergence of probability distributions in the simplex reads as follows,
$$D^{(\alpha)}(p, q) = \frac{4}{1-\alpha^2} \Big( 1 - \sum_{i=1}^n p_i^{\frac{1-\alpha}{2}}\, q_i^{\frac{1+\alpha}{2}} \Big). \tag{43}$$
One can easily verify that for $\alpha = 0$, this divergence is closely related to the Hellinger distance, which is the distance in the ambient space $\mathcal{M}_+$ [3]. However, when $\alpha = 0$, the $0$-connection is the Levi-Civita connection of the Fisher metric. Therefore, in this case, the canonical divergence has to be one half the square of the Fisher distance, as discussed in Section 2, which is different from $D^{(0)}$ [3]. It would be interesting to evaluate the divergence introduced in [38] on the simplex of probability distributions, which is not $\alpha$-flat, and to compare it with the $\alpha$-divergence (43).
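The gap between the restricted $\alpha$-divergence at $\alpha = 0$ and one half of the squared Fisher distance can be made concrete with a small numerical sketch. It assumes the simplex form $D^{(0)}(p,q) = 4\big(1 - \sum_i \sqrt{p_i q_i}\big)$ and the standard closed form $2\arccos\big(\sum_i\sqrt{p_i q_i}\big)$ for the Fisher (geodesic) distance on the simplex when the metric is $\sum_i dp_i^2/p_i$; the sample distributions and variable names are hypothetical.

```python
import numpy as np

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.6, 0.3, 0.1])

bc = float(np.sum(np.sqrt(p * q)))                 # Bhattacharyya coefficient

alpha0 = 4.0 * (1.0 - bc)                          # alpha-divergence at alpha = 0
hellinger_form = 2.0 * float(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

fisher_distance = 2.0 * np.arccos(bc)              # geodesic distance on the simplex
half_squared_fisher = 0.5 * fisher_distance ** 2

print(alpha0, hellinger_form, half_squared_fisher)
```

The first two numbers coincide, showing that $D^{(0)}$ is twice the squared Euclidean distance between $\sqrt{p}$ and $\sqrt{q}$, while the third is strictly larger whenever $p \neq q$, since $4(1-\cos\theta) < 2\theta^2$ for $\theta \neq 0$.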
On the quantum side, an intriguing task within the above-mentioned project is to consider the divergence function introduced in [38] on the manifold of pure quantum states, where a dually flat structure does not exist [23]. This will be the object of a forthcoming investigation.