1. Introduction
In the early days of quantum information theory, the term “quantum communication” would typically have been understood to refer to the transmission of classical information via quantum mechanical signals. Such communication can be done in a sophisticated way, with the receiver making joint measurements on several successive signal particles [1,2], or it can be done in a relatively straightforward way, with the receiver performing a separate measurement on each individual signal particle. In both cases, but especially in the latter case, a particularly interesting quantity, given an ensemble of quantum states to be used as an alphabet, is the ensemble’s accessible information. This is the maximum amount of information that one can obtain about the identity of the state, on average, by making a measurement on the system described by the specified ensemble. The average here is over the outcomes of the measurement, and the maximization is over all possible measurements. In general, accessible information can be defined for ensembles consisting of pure and mixed states, but in this paper, we consider only pure-state ensembles.
Any ensemble of pure quantum states with their probabilities has a unique density matrix. However, for any given density matrix $\rho$ representing more than a single pure state, there are infinitely many ensembles—“$\rho$-ensembles”—described by that density matrix. Thus, it is natural to ask the following question: for a given density matrix $\rho$, what pure-state $\rho$-ensemble has the greatest value of the accessible information and what pure-state $\rho$-ensemble has the lowest value? The former question was answered by an early (1973) result in quantum information theory [3]—the pure-state $\rho$-ensemble with the greatest accessible information is the one consisting of the eigenstates of $\rho$ with weights given by the eigenvalues. The latter question was answered in a 1994 paper [4], in which the $\rho$-ensemble minimizing the accessible information was called the Scrooge ensemble, or Scrooge distribution, since it is the ensemble that is most stingy with its information.
To see a simple example, consider a spin-1/2 particle whose density matrix $\rho$ has the $|{\uparrow}\rangle$ and $|{\downarrow}\rangle$ states as its eigenvectors, with eigenvalues $p$ and $1-p$. The eigenstate ensemble for $\rho$, that is, the $\rho$-ensemble from which one can extract the most information, is the two-state ensemble consisting of the $|{\uparrow}\rangle$ state with probability $p$ and the $|{\downarrow}\rangle$ state with probability $1-p$. The optimal measurement in this case—the measurement that provides the most information—is the up-down measurement, and the amount of information it provides is equal to the von Neumann entropy of the density matrix:
$$I = S(\rho) = -p\log p - (1-p)\log(1-p). \qquad (1)$$
On the other hand, the Scrooge ensemble for this density matrix is represented by a continuous probability distribution over the whole surface of the Bloch sphere. If $p$ is larger than $1-p$, then this continuous distribution is weighted more heavily towards the top of the sphere. We can write the Scrooge distribution explicitly in terms of the variable $x = \cos^2(\theta/2)$, where $\theta$ is the angle measured from the north pole:
$$P_{\mathrm{Sc}}(x) = \frac{2p^2(1-p)^2}{\left[(1-p)x + p(1-x)\right]^3}. \qquad (2)$$
The probability density $P_{\mathrm{Sc}}(x)$ is normalized in the sense that $\int_0^1 P_{\mathrm{Sc}}(x)\,dx = 1$ (the distribution is uniform over the azimuthal angle). Again, this is the ensemble of pure states from which one can extract the least information about the identity of the pure state, among all ensembles with the density matrix $\rho$. Somewhat remarkably, the average amount of information one gains by measuring this particular ensemble is entirely independent of the choice of measurement, as long as the measurement is complete—that is, as long as each outcome is associated with a definite pure state. This amount of information comes out to be a quantity called the subentropy $Q$ of the density matrix:
$$Q(\rho) = -\,\frac{p^2\log p - (1-p)^2\log(1-p)}{p - (1-p)}. \qquad (3)$$
We give more general expressions for both the Scrooge ensemble and the subentropy in Section 2 below.
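As a quick numerical companion to Equations (1)–(3) (our own illustration, not part of the original derivation; the value $p = 0.7$ and all function names are ours, and logarithms are natural, so information is in nats), the following Python sketch verifies that the qubit Scrooge density integrates to one and that the up-down measurement on this ensemble extracts exactly the subentropy $Q$:

```python
import numpy as np
from scipy.integrate import quad

p = 0.7                 # eigenvalue of the "up" eigenstate (example value)
q = 1.0 - p

def scrooge_density(x):
    """Qubit Scrooge density over x = cos^2(theta/2); Equation (2)."""
    return 2.0 * p**2 * q**2 / (q * x + p * (1.0 - x))**3

# Normalization: the density integrates to 1 over x in [0, 1].
norm, _ = quad(scrooge_density, 0.0, 1.0)
print(norm)             # ~1.0

# Subentropy, Equation (3), in nats.
Q = -(p**2 * np.log(p) - q**2 * np.log(q)) / (p - q)

# Mutual information of the up-down measurement: a state at x yields "up"
# with probability x, and the ensemble-averaged probability of "up" is p.
def integrand(x):
    return scrooge_density(x) * (x * np.log(x / p)
                                 + (1 - x) * np.log((1 - x) / q))

I, _ = quad(integrand, 0.0, 1.0)
print(I, Q)             # the two agree: accessible information equals Q
```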
In recent years, the Scrooge distribution has made other appearances in the physics literature. Of particular interest is the fact that this distribution has emerged from an entirely different line of investigation, in which the system under consideration is entangled with a large environment and the whole system is in a pure state. In that case, if one looks at the conditional pure states of the original system relative to the elements of an orthogonal basis of the environment, one typically finds that these conditional states are distributed by a Scrooge distribution [5,6,7,8]. In this context, the distribution is usually called a GAP measure (Gaussian adjusted projected measure, the three adjectives corresponding to the three steps by which the measure can be constructed). On another front, the Scrooge distribution has been used to address the difficult problem of bounding the locally accessible information when there is more than one receiver [9].
Meanwhile, the concept of subentropy, which originally arose (though without a name) in connection with the outcome entropy of random measurements [10,11], has appeared not only in problems concerning the acquisition of classical information [12,13,14], but also in the quantification of entanglement [15] and the study of quantum coherence [16,17,18,19]. Many detailed properties of subentropy have now been worked out, especially concerning its relation to the Shannon entropy [20,21,22,23,24].
Though it is possible to devise a strictly classical situation in which subentropy arises [22], the Scrooge distribution has generally been regarded as a purely quantum mechanical concept. It is, after all, a probability distribution over pure quantum states. The aim of this paper is to provide a classical interpretation of the Scrooge distribution, and in this way, to provide a new window into the relation between quantum mechanics and classical probability theory.
We find that it is much easier to make the connection if we begin by considering not the standard Scrooge distribution, but rather the analogous distribution one obtains for the case of quantum theory with real amplitudes. In that case, the dimension of the set of pure states is the same as the dimension of the associated probability simplex, and we find that there is a fairly natural distribution within classical probability theory that is essentially identical to the real-amplitude version of the Scrooge distribution. This distribution arises as the solution to a certain classical communication problem that we describe in Section 4.
With this interpretation of the real-amplitude Scrooge distribution in hand, we ask how the classical communication scenario might be modified to arrive at the original Scrooge ensemble for standard, complex-amplitude quantum theory. As we will see, the necessary modification is not particularly natural, but it is simple.
Thus, we begin in Section 2 and Section 3 by reviewing the derivation of the Scrooge distribution and by working out the analogous distribution for the case of real amplitudes. Then, in Section 4, we set up and analyze the classical communication problem that, as we show in Section 5, gives rise to a distribution that is equivalent to the real-amplitude Scrooge distribution. In Section 6, we modify the classical communication scenario to produce the standard, complex-amplitude Scrooge distribution. Finally, we summarize and discuss our results in Section 7.
2. The Scrooge Distribution
There are several ways in which one can generate the Scrooge distribution. In this section, we review the main steps of the derivation given in Ref. [4], which applies to a Hilbert space of finite dimension. (The distribution can also be defined for an infinite-dimensional Hilbert space [5,6,7,8].) We begin by setting up the problem.
We imagine the following scenario. One participant, Alice, prepares a quantum system with an $n$-dimensional Hilbert space in a pure state and sends it to Bob. Bob then tries to gain information about the identity of this pure state. Initially, Bob’s state of knowledge is represented by a probability density $P(x)$ over the set of pure states. (The symbol $x$ represents a multi-dimensional parameterization of the set of pure states.) Bob makes a measurement on the system and thereby gains information. The amount of information he gains may depend on the outcome he obtains, so we are interested in the average amount of information he gains about $x$, the average being over all outcomes.
The standard quantification of Bob’s average gain in information is the Shannon mutual information between the identity of the pure state and the outcome of the measurement. We can express this mutual information in terms of two probability functions: (i) the probability $p(j|x)$ of the outcome $j$ when the state is $x$, and (ii) the overall probability $p_j$ of the outcome $j$ averaged over the whole ensemble. In terms of these functions, the mutual information is
$$I = \int P(x)\,\sum_j p(j|x)\,\log\frac{p(j|x)}{p_j}\,dx. \qquad (4)$$
The accessible information of the ensemble defined by $P(x)$ is the maximum value of the mutual information $I$, where the maximum is taken over all possible measurements.
Again, for a given density matrix $\rho$, the Scrooge distribution is defined to be the pure-state $\rho$-ensemble with the lowest value of the accessible information. One can obtain the Scrooge distribution via the following algorithm [4].
We start by recalling the concept of “$\rho$ distortion.” Consider for now a finite ensemble $\{(|\psi_i\rangle, r_i)\}$ of pure states ($i = 1, \ldots, m$) whose density matrix is the completely mixed state:
$$\sum_i r_i\,|\psi_i\rangle\langle\psi_i| = \frac{1}{n}\,I. \qquad (5)$$
Let $|\tilde\psi_i\rangle$ be the subnormalized state vector $\sqrt{n r_i}\,|\psi_i\rangle$, so that
$$\sum_i |\tilde\psi_i\rangle\langle\tilde\psi_i| = I. \qquad (6)$$
Under $\rho$ distortion, each vector $|\tilde\psi_i\rangle$ is mapped to another subnormalized vector $|\tilde\psi'_i\rangle$ defined by
$$|\tilde\psi'_i\rangle = \sqrt{\rho}\;|\tilde\psi_i\rangle. \qquad (7)$$
Note that the density matrix formed by the $|\tilde\psi'_i\rangle$’s is $\rho$:
$$\sum_i |\tilde\psi'_i\rangle\langle\tilde\psi'_i| = \sqrt{\rho}\;I\;\sqrt{\rho} = \rho. \qquad (8)$$
In terms of normalized vectors, the new ensemble is $\{(|\psi'_i\rangle, r'_i)\}$, with the new probabilities $r'_i$ equal to
$$r'_i = \langle\tilde\psi'_i|\tilde\psi'_i\rangle = n\,r_i\,\langle\psi_i|\rho|\psi_i\rangle. \qquad (9)$$
In this way, any ensemble having the completely mixed density matrix can be mapped to a “$\rho$-distorted” ensemble with a density matrix $\rho$.
The Scrooge ensemble is a continuous ensemble, not a discrete one, but the concept of $\rho$ distortion can be immediately extended to the continuous case, and the Scrooge distribution can be easily characterized in those terms; it is the $\rho$ distortion of the uniform distribution over the unit sphere in Hilbert space. The uniform distribution is the unique probability distribution over the set of pure states that is invariant under all unitary transformations.
Let us see how the $\rho$ distortion works out in this case. First, for the uniform distribution, it is convenient to label the parameters of the pure states by $y$ instead of $x$, so that we can reserve $x$ for the Scrooge distribution. Let $P_{\mathrm{unif}}(y)$ be the probability density over $y$ that represents the uniform distribution over the unit sphere (a particular parameterization will be specified shortly). In terms of normalized states, a $\rho$ distortion maps each pure state $|y\rangle$ into the pure state $|x\rangle$ defined by
$$|x\rangle = \frac{\sqrt{\rho}\,|y\rangle}{\sqrt{\langle y|\rho|y\rangle}}. \qquad (10)$$
This mapping defines $x$ as a function of $y$: $x = f(y)$. (We write $f$ explicitly below.) The resulting probability density over $x$ is obtained from the continuous version of Equation (9):
$$P_{\mathrm{Sc}}(x) = n\,\langle y|\rho|y\rangle\,P_{\mathrm{unif}}(y)\,J. \qquad (11)$$
Here, $J$ is the Jacobian of the $y$ variables with respect to the $x$ variables. On the right-hand side of Equation (11), each $y$ is interpreted as $f^{-1}(x)$, so that we get an expression that depends only on $x$.
To get an explicit expression for the Scrooge distribution—that is, an explicit expression for the probability density $P_{\mathrm{Sc}}(x)$—we need to choose a specific set of parameters labeling the pure states. We choose the same set of parameters to label both the uniform distribution (where we call the parameters $y$) and the Scrooge distribution (where we call the parameters $x$). We define our parameters relative to a set of normalized eigenstates $|e_k\rangle$ of the density matrix $\rho$. A general pure state $|\psi\rangle$ can be written as
$$|\psi\rangle = \sum_{k=1}^{n} \sqrt{x_k}\;e^{i\phi_k}\,|e_k\rangle, \qquad (12)$$
where each $\sqrt{x_k}$ is a non-negative real number, and each phase $\phi_k$ runs from zero to $2\pi$. For definiteness, employing the freedom to choose an overall phase, we define $\phi_1$ to be zero. We take $x$ (or $y$) to consist of the following parameters: the squared amplitudes $x_k$ for $k = 1, \ldots, n-1$, and the phases $\phi_k$ for $k = 2, \ldots, n$. This set of $2n-2$ parameters uniquely identifies any pure state. Later, we also use the symbol $x_n = 1 - (x_1 + \cdots + x_{n-1})$. Note that the $x_k$’s are the probabilities of the outcomes of a particular orthogonal measurement associated with the eigenstates of $\rho$.
In terms of these parameters, the uniform distribution over the unit sphere takes a particularly simple form: it is the product of a uniform distribution over the phases and a uniform distribution over the $(n-1)$-dimensional probability simplex whose points are labeled by $(x_1, \ldots, x_{n-1})$ [25]. The Scrooge distribution will likewise be a product and will be uniform over the phases but will typically have a certain bias over the probability simplex. Because the phases are always independent and uniformly distributed in the cases we consider, we omit the phases in our distribution expressions, writing the probability densities as functions of $x$ (or $y$).
Our aim now is to find explicit expressions for each of the factors appearing on the right-hand side of Equation (11). Since the uniform distribution over the unit sphere induces a uniform distribution over the probability simplex, the corresponding probability density $P_{\mathrm{unif}}(y)$ is a constant function, with the value of the constant being $(n-1)!$ as required by normalization:
$$P_{\mathrm{unif}}(y) = (n-1)!. \qquad (13)$$
The function $f$ defined by the $\rho$-distortion map, Equation (10), is given by
$$x_k = f_k(y) = \frac{p_k\,y_k}{\sum_{l=1}^{n} p_l\,y_l}, \qquad (14)$$
where the $p_k$’s are the eigenvalues of the density matrix $\rho$. One finds that the inverse map is
$$y_k = \frac{x_k/p_k}{\sum_{l=1}^{n} x_l/p_l}, \qquad (15)$$
and the Jacobian is
$$J = \left(\prod_{k=1}^{n} p_k\right)^{-1}\left(\sum_{l=1}^{n}\frac{x_l}{p_l}\right)^{-n}. \qquad (16)$$
Meanwhile, the factor $\langle y|\rho|y\rangle$ can be written as
$$\langle y|\rho|y\rangle = \left(\sum_{l=1}^{n}\frac{x_l}{p_l}\right)^{-1}. \qquad (17)$$
By substituting the expressions from Equations (16) and (17) into Equation (11), we finally arrive at the probability density defining the Scrooge distribution:
$$P_{\mathrm{Sc}}(x) = \frac{n!}{p_1 p_2 \cdots p_n}\left(\sum_{l=1}^{n}\frac{x_l}{p_l}\right)^{-(n+1)}. \qquad (18)$$
This probability density is normalized in the sense that the integral over the probability simplex is unity:
$$\int P_{\mathrm{Sc}}(x)\,dx_1 \cdots dx_{n-1} = 1. \qquad (19)$$
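A Monte Carlo spot-check of Equations (18) and (19) is straightforward. In the sketch below (our own, with an arbitrary example spectrum), uniform samples on the simplex are importance-weighted by the Scrooge density, confirming the normalization and the fact that the ensemble averages to $\rho$ (i.e., the mean of $x_k$ is $p_k$):

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(1)
p = np.array([0.5, 0.3, 0.2])       # example eigenvalues of rho
n = len(p)

def scrooge_density(x):
    """Equation (18) as a density over the (n-1)-dimensional simplex."""
    return factorial(n) / np.prod(p) / np.sum(x / p, axis=-1) ** (n + 1)

# Sample x uniformly on the simplex (density (n-1)!) and form the ratio
# P_Sc(x) / (n-1)!; its mean estimates the integral in Equation (19).
x = rng.dirichlet(np.ones(n), size=500_000)
ratio = scrooge_density(x) / factorial(n - 1)
print(ratio.mean())                           # ~1.0

# The ensemble must average to rho: under P_Sc, the mean of x_k is p_k.
print(np.average(x, axis=0, weights=ratio))   # ~[0.5, 0.3, 0.2]
```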
Now, how do we know that the distribution given by Equation (18) minimizes the accessible information? First, one can show that for this distribution the mutual information $I$ is independent of the choice of measurement as long as the measurement is complete [4]. So, one can compute the value of the accessible information by considering any such measurement, and the easiest one to consider is the orthogonal measurement along the eigenstates. The result is
$$I = -\sum_{k=1}^{n}\left(\prod_{l\neq k}\frac{p_k}{p_k - p_l}\right) p_k \log p_k \equiv Q(\rho), \qquad (20)$$
which defines the subentropy $Q$. One can also show that for any $\rho$-ensemble, the average mutual information over all complete orthogonal measurements is equal to $Q$, which implies that $Q$ is always a lower bound on the accessible information. Since the Scrooge distribution achieves the value $Q$, it achieves the minimum possible accessible information among all $\rho$-ensembles.
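The product formula in Equation (20) is simple to evaluate. The sketch below (ours; it assumes a non-degenerate spectrum and uses natural logs) computes $Q$ and compares it with the Shannon entropy of the same spectrum, illustrating the gap between the least and most informative $\rho$-ensembles:

```python
import numpy as np

def shannon_entropy(p):
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p))

def subentropy(p):
    """Equation (20): Q = -sum_k [prod_{l!=k} p_k/(p_k - p_l)] p_k log p_k.
    Natural logs (nats); assumes a non-degenerate spectrum."""
    p = np.asarray(p, dtype=float)
    Q = 0.0
    for k in range(len(p)):
        others = np.delete(p, k)
        Q -= np.prod(p[k] / (p[k] - others)) * p[k] * np.log(p[k])
    return Q

spec = [0.5, 0.3, 0.2]
print(subentropy(spec), shannon_entropy(spec))   # Q lies strictly below S
```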
3. The Real-Amplitude Analog of the Scrooge Distribution
Though our own world is described by standard quantum theory with complex amplitudes, we can also consider an analogous, hypothetical theory with real amplitudes. A pure state in the real-amplitude theory is represented by a real unit vector, and a density matrix is represented by a symmetric real matrix with non-negative eigenvalues and unit trace. Time evolution in this theory is generated by an antisymmetric real operator in place of the antihermitian operator $-iH$ of standard quantum theory.
The question considered in the preceding section can also be asked in regard to the real-amplitude theory. Given a density matrix $\rho$, we ask what $\rho$-ensemble has the smallest value of accessible information. It turns out that essentially all of the methods used in the preceding section continue to work in the real case. Again one begins with the uniform distribution over the unit sphere of pure states, and again, one obtains the Scrooge ensemble (in this case the real-amplitude Scrooge ensemble) via $\rho$ distortion. The arguments leading to the conclusion that the ensemble produced in this way minimizes the accessible information work just as well in the real-amplitude case as in the complex-amplitude case.
The one essential difference between the two cases lies in the form of the initial probability density that is associated with the uniform distribution over the unit sphere in Hilbert space. Whereas in the complex case the induced distribution over the probability simplex is uniform, in the real case, the induced distribution over the probability simplex is more heavily weighted towards the edges and corners.
We can see an example by considering the case with $n = 2$. Instead of starting with a uniform distribution over the surface of the Bloch sphere, one starts with a uniform distribution over the unit circle in a two-dimensional real vector space. Let $\theta$ be the angle around this circle measured from some chosen axis (once a density matrix has been specified, we will take this axis to be along one of the eigenstates of the density matrix). Then, $\theta$ is initially uniformly distributed. The parameter analogous to $y$ of the preceding section is $y = \cos^2\theta$. Note that $y$ runs from 0 to 1 as $\theta$ runs from 0 to $\pi/2$. The initial probability density $P_r(y)$ is therefore obtained from
$$P_r(y) = \frac{4}{2\pi}\left|\frac{d\theta}{dy}\right| \qquad (21)$$
(the factor of four arises because four points on the circle share the same value of $y$), which leads to
$$P_r(y) = \frac{1}{\pi\sqrt{y(1-y)}} \qquad (22)$$
(the subscript $r$ represents “real”). This is in contrast to the constant function $P_{\mathrm{unif}}(y) = 1$ that would apply in the complex-amplitude case. We see that in the real case, $P_r(y)$ is largest around $y = 0$ and $y = 1$.
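This arcsine-type pile-up at the edges is easy to see in simulation (the sketch is our own): points uniform on the unit circle, binned in $y = \cos^2\theta$, reproduce Equation (22):

```python
import numpy as np

rng = np.random.default_rng(2)
theta = rng.uniform(0.0, 2.0 * np.pi, size=1_000_000)
y = np.cos(theta) ** 2

# Histogram of y against P_r(y) = 1 / (pi * sqrt(y (1 - y))), Equation (22).
hist, edges = np.histogram(y, bins=50, range=(0.0, 1.0), density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
pred = 1.0 / (np.pi * np.sqrt(mid * (1.0 - mid)))
print(np.abs(hist[1:-1] - pred[1:-1]).max())   # ~0 away from the endpoint
                                               # bins, where P_r diverges
                                               # (integrably)
```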
For $n$ dimensions, we take as our parameters specifying a pure state (i) the first $n-1$ probabilities ($y_1, \ldots, y_{n-1}$) of the outcomes of a certain orthogonal measurement (which we will choose to be the measurement along the eigenvectors of the given density matrix), and (ii) a set of $n-1$ discrete phase parameters (each of them taking the values $\pm 1$), which will always be independently and uniformly distributed and therefore suppressed in our expressions for the probability densities.
For the uniform distribution over the unit sphere in the $n$-dimensional real Hilbert space, one can show that the induced distribution over the parameters $(y_1, \ldots, y_{n-1})$ is given by [26]
$$P_r(y) = \frac{\Gamma(n/2)}{\pi^{n/2}}\,(y_1 y_2 \cdots y_n)^{-1/2}, \qquad (23)$$
where $y_n = 1 - (y_1 + \cdots + y_{n-1})$. This probability density is normalized over the probability simplex, as in Equation (19):
$$\int P_r(y)\,dy_1 \cdots dy_{n-1} = 1. \qquad (24)$$
The general expression for $P_{\mathrm{Sc}}(x)$ given in Equation (11) remains valid in the real case, as do Equations (15)–(17) for the various factors in Equation (11). Again, the one difference is in the density for the uniform distribution, for which we now use Equation (23). By combining these ingredients, we arrive at our expression for the real-amplitude Scrooge ensemble:
$$P_{r\mathrm{Sc}}(x) = \frac{n\,\Gamma(n/2)}{\pi^{n/2}}\left(\prod_{k=1}^{n} p_k\right)^{-1/2}\left(\prod_{k=1}^{n} x_k\right)^{-1/2}\left(\sum_{l=1}^{n}\frac{x_l}{p_l}\right)^{-(n+2)/2}, \qquad (25)$$
where, as before, the $p_k$’s are the eigenvalues of the density matrix whose Scrooge distribution is being computed.
Though Equation (25) was derived as a distribution over the set of pure states in real-amplitude quantum theory, it reads as a probability distribution over the $(n-1)$-dimensional probability simplex for a classical random variable with $n$ possible values. One can therefore at least imagine that there might be a classical scenario in which this distribution is natural. In the following section, we identify such a scenario.
4. Communicating with Dice
Ref. [26] imagined the following classical communication scenario. Alice is trying to convey to Bob the location of a point in an $(n-1)$-dimensional probability simplex. To do this, she constructs a weighted $n$-sided die that, for Bob, has the probabilities corresponding to the point that Alice is trying to convey. She then sends the die to Bob, who rolls the die many times in order to estimate the probabilities of the various possible outcomes. However, the information transmission is limited in that Bob is allowed only a fixed number of rolls—let us call this number $N$ (perhaps the die automatically self-destructs after $N$ rolls). So, Bob will always have an imperfect estimate of the probabilities that Alice is trying to convey. Alice and Bob are allowed to choose in advance a discrete set of points in the probability simplex—these are the points representing the set of signals Alice might try to send—and they choose this set of points, along with their a priori weights, so as to maximize the mutual information between the identity of the point being conveyed and the result of Bob’s rolls of the die. The main result of that paper was that in the limit of a large $N$, the optimal distribution of points in the probability simplex approximates the continuous distribution over the simplex expressed by the following probability density:
$$\hat{P}(x) = \frac{\Gamma(n/2)}{\pi^{n/2}}\,(x_1 x_2 \cdots x_n)^{-1/2}, \qquad (26)$$
where the $x_j$’s are the probabilities (we use a hat in our labels of probability densities that arise in a classical context). This result is interesting because it is the same probability density as the one induced by the uniform distribution over the unit sphere in real Hilbert space (Equation (23) above). Thus, in a world based on real-amplitude quantum theory as opposed to the complex-amplitude theory, there is a sense in which one could say that nature optimizes the transfer of information.
That paper—and closely related papers [27,28]—deals only with the uniform distribution over the unit sphere, not with non-trivial Scrooge distributions. In the present section, we consider a modification of the above communication scenario, and in the next section, we show that this modified scheme yields the real-amplitude Scrooge distribution.
A natural way to generalize the above communication scheme is this: let the allowed number N of rolls of the die vary from one die to another (that is, some dice last longer than others before they self-destruct). Now, once N is allowed to vary, it makes sense to let N itself be another random variable that conveys information. We are thus led to consider the following scenario.
Alice is trying to convey to Bob an ordered $n$-tuple $(w_1, \ldots, w_n)$ of non-negative real numbers (Alice and Bob agree in advance on a specific set of such ordered $n$-tuples, any one of which Alice might try to convey). Let us refer to such an $n$-tuple as a “signal.” In order to convey her signal, Alice sends Bob an $n$-sided die that Bob then begins to roll over and over, keeping track of the number of times each outcome occurs. Let $\nu_j$ be the number of times that the outcome $j$ occurs. At some point, the die self-destructs. Alice has constructed both the weighting of the die and the self-destruction mechanism so that the average value of $\nu_j$ is $w_j$.
However, both the rolling of the die and its duration are probabilistic, and Alice cannot completely control either the individual numbers $\nu_j$ or their sum. For any given signal $(w_1, \ldots, w_n)$, we assume that each $\nu_j$ is distributed independently according to a Poisson distribution with mean value $w_j$:
$$\Pr(\nu_j) = \frac{w_j^{\nu_j}}{\nu_j!}\,e^{-w_j}. \qquad (27)$$
This is equivalent to assuming that the total number $N = \nu_1 + \cdots + \nu_n$ of rolls of the die is Poisson distributed with a mean value of $w = w_1 + \cdots + w_n$ and that for a given total number of rolls, the numbers of occurrences of the individual outcomes are distributed according to a multinomial distribution with weights $w_j/w$. That is, we are assuming the usual statistics for rolling a die, together with a Poisson distribution for the total number of rolls (another model we could have used is to have Alice send Bob a radioactive sample that can decay in $n$ ways and that Bob is allowed to observe with detectors for a fixed amount of time).
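The equivalence between these two descriptions is the standard splitting property of the Poisson distribution, which the following sketch (our own; the mean values are arbitrary) confirms by comparing the means and covariances of the two models:

```python
import numpy as np

rng = np.random.default_rng(3)
w = np.array([4.0, 2.5, 1.5])     # mean counts for a 3-sided die
trials = 100_000

# Model A: independent Poisson counts nu_j with means w_j, Equation (27).
a = rng.poisson(w, size=(trials, len(w)))

# Model B: Poisson-distributed total N with mean sum(w), split multinomially
# with weights w_j / sum(w).
N = rng.poisson(w.sum(), size=trials)
b = np.array([rng.multinomial(k, w / w.sum()) for k in N])

# Identical statistics: equal means, and diagonal covariance Var(nu_j) = w_j.
print(a.mean(axis=0), b.mean(axis=0))          # both ~[4.0, 2.5, 1.5]
print(np.cov(a.T).round(2), np.cov(b.T).round(2), sep="\n")
```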
To make the problem interesting, and to keep Alice from being able to send Bob an arbitrarily large amount of information in a single die, limits are placed on the sizes of the $w_j$’s. This is done by imposing, for each $j$, an upper bound $\mathcal{M}_j$ on the expectation value of the number of times the $j$ outcome occurs. This expectation value is an average over all the possible signals that Alice might send.
We also need to say in what sense Alice and Bob are optimizing their communication. There are a number of reasonable options for doing this—e.g., we could say they maximize the mutual information, or minimize the probability of error for a fixed number of signals—but it is likely that many of these formulations will be essentially equivalent when the values of $\mathcal{M}_j$ become very large. Here, we take a simple, informal approach. We say that, in order to make the various signals distinguishable from each other, Alice and Bob choose their $n$-tuples $(w_1, \ldots, w_n)$ so that neighboring signals, say $w$ and $w + \delta w$, are at least a certain distance from each other, and we use the Fisher information metric to measure distance. Specifically, we require the Fisher information distance between the probability distributions $\Pr_w$ and $\Pr_{w+\delta w}$ to be greater than or equal to a specified value $\epsilon$ (or, equivalently for small $\epsilon$, we require the Kullback–Leibler divergence to be at least $\epsilon^2/2$). For the Poisson distribution and for small values of the ratios $\delta w_j / w_j$, this condition works out to be
$$\sum_{j=1}^{n}\frac{(\delta w_j)^2}{w_j} \geq \epsilon^2. \qquad (28)$$
For our purposes the exact value of $\epsilon$ is not important. We also assume that the various signals have equal a priori probabilities. This is a natural choice if one wants to convey as much information as possible. Under these assumptions, Alice and Bob’s aim is to maximize the number of distinct signals.
The analysis will be much simpler if we parameterize each die not by $(w_1, \ldots, w_n)$, but rather by the variables
$$z_j = 2\sqrt{w_j}. \qquad (29)$$
Then, for neighboring signals we can write
$$\delta z_j = \frac{\delta w_j}{\sqrt{w_j}}, \qquad (30)$$
so that the condition in Equation (28) becomes
$$\sum_{j=1}^{n} (\delta z_j)^2 \geq \epsilon^2. \qquad (31)$$
That is, in the space parameterized by $z = (z_1, \ldots, z_n)$, we want the points representing Alice’s signals to be evenly separated from each other. Thus Alice’s signals will be roughly uniformly distributed over some region of $z$-space—she wants to pack in as many signals as possible without exceeding the bounds $\mathcal{M}_j$ on the expectation values of the $\nu_j$’s. In what follows, we approximate this discrete but roughly uniform distribution of the values of $z$ by a continuous probability distribution. The probability density is zero outside the region where Alice’s possible signals lie; inside that region, it has a constant value of $1/V$, where $V$ is the volume of the region.
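The variables $z_j = 2\sqrt{w_j}$ are the usual variance-stabilizing coordinates for Poisson counts. As a numerical check (our own; the two nearby signals are arbitrary), the symmetrized Kullback–Leibler divergence between neighboring signals is well approximated by the squared Euclidean distance in $z$-space:

```python
import numpy as np

def sym_kl_poisson(w1, w2):
    """Symmetrized KL divergence between products of Poisson distributions."""
    return np.sum((w1 - w2) * np.log(w1 / w2))

w1 = np.array([4.00, 2.50, 1.50])
w2 = np.array([4.20, 2.45, 1.40])            # a nearby signal

z1, z2 = 2.0 * np.sqrt(w1), 2.0 * np.sqrt(w2)
print(sym_kl_poisson(w1, w2))                # ~0.0177
print(np.sum((z1 - z2) ** 2))                # ~0.0177: the same number
```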
The communication problem then becomes a straightforward geometry problem—within the “positive” section of $z$-space (that is, the section in which each $z_j$ is non-negative), the aim is to find the region $R$ of largest volume that satisfies the constraints
$$\frac{1}{V}\int_R \frac{z_j^2}{4}\,dV \leq \mathcal{M}_j, \qquad j = 1, \ldots, n, \qquad (32)$$
where $V$ is the volume of $R$. (Note that $z_j^2/4 = w_j$, the expected number of occurrences of outcome $j$.) We maximize the volume because Alice’s signals have a fixed packing density within $R$; thus the larger the volume, the more signals Alice has at her disposal.
It is not hard to see that the solution to this geometry problem is to make region $R$ the positive section of a certain ellipsoid centered at the origin. To see this, the conditions (32) can be written as
$$\frac{1}{V}\int_R z_j^2\,dV \leq 4\mathcal{M}_j. \qquad (33)$$
Now, let $u_j = z_j/(2\sqrt{\mathcal{M}_j})$. In terms of the $u_j$’s, the above conditions become
$$\frac{1}{V'}\int_{R'} u_j^2\,dV' \leq 1, \qquad (34)$$
where $R'$ is the region of $u$-space corresponding to the region $R$ of $z$-space. In particular, the equation obtained by summing these $n$ conditions must also be true:
$$\frac{1}{V'}\int_{R'} r^2\,dV' \leq n, \qquad (35)$$
where $r^2 = u_1^2 + \cdots + u_n^2$. That is, the average squared distance from the origin over region $R'$ must be no greater than $n$. The maximum volume region $R'$ satisfying this one condition is the positive section of a sphere, and one can work out that the radius of the sphere must be $\sqrt{n+2}$. Moreover, that region also satisfies all of the conditions (34). So, that same region is the maximum volume region that satisfies those conditions as well. Going back to the $z_j$’s, we see that the maximum volume region satisfying the conditions (32) is the positive section of an ellipsoid, with semi-axis lengths
$$a_j = 2\sqrt{(n+2)\,\mathcal{M}_j}. \qquad (36)$$
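The radius $\sqrt{n+2}$ can be confirmed with a one-line simulation of the radial law in an $n$-ball (the sketch and the choice $n = 4$ are ours):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4                                  # example dimension
R = np.sqrt(n + 2.0)
# In an n-ball the radial density is ~ r**(n-1), so r = R * U**(1/n).
r = R * rng.uniform(size=500_000) ** (1.0 / n)
print(np.mean(r ** 2))                 # ~n: the sphere of radius sqrt(n+2)
                                       # saturates the constraint (35)
```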
Thus, the strategy that Alice and Bob adopt is to choose a set of closely packed signals with some minimum separation in $z$-space that occupies the positive section of an ellipsoid centered at the origin. Again, in this paper, we treat this discrete but roughly uniform distribution of signals as if it were actually uniform. This approximation becomes more and more reasonable as the values of the $\mathcal{M}_j$’s increase.
5. A Distribution over the Probability Simplex
So far, we have not made any connection between our communication problem and the real-amplitude Scrooge distribution. We do this now by seeing how the uniform distribution over the ellipsoid in $z$-space induces a certain probability distribution over the $(n-1)$-dimensional probability simplex for Alice’s $n$-sided die. We define this probability distribution as follows.
Let us imagine many rounds of communication from Alice to Bob: she has sent him many dice for which the expected numbers of occurrences of the various outcomes, $(w_1, \ldots, w_n)$, cover a representative range of values: the corresponding values of $z$ are distributed fairly uniformly over the region $R$ in $z$-space. Bob has rolled each of these dice as many times as it can be rolled. Now consider a small region of the probability simplex, say the region $\Delta$ for which the probability of the $j$th outcome lies between $x_j$ and $x_j + dx_j$ for $j = 1, \ldots, n-1$ (we also use $\Delta$ for the volume $dx_1 \cdots dx_{n-1}$ of this region). Some of the dice Alice has sent to Bob have probabilities lying in this region. The weight we want to attach to the region $\Delta$ is, roughly speaking, the fraction of the total number of rolls that came from dice in this region. Note that for a die at location $z$, the expectation value of the number of times it will be rolled is $w = w_1 + \cdots + w_n = (z_1^2 + \cdots + z_n^2)/4$. So, we multiply the density of signals by the factor $w$ to get the “density of rolls.” These considerations lead us to the following definition of the weight $W(\Delta)$ that we assign to the infinitesimal region $\Delta$:
$$W(\Delta) = \frac{\int_{C_\Delta} w\,dV}{\int_R w\,dV}. \qquad (37)$$
Here, $C_\Delta$ is the cone (within the region $R$) representing dice for which the probabilities of the outcomes lie in $\Delta$:
$$C_\Delta = \left\{\, z \in R : \left(\frac{w_1}{w}, \ldots, \frac{w_{n-1}}{w}\right) \in \Delta \,\right\}. \qquad (38)$$
Our use of the weighting factor $w$ is reminiscent of the “adjustment” stage in the construction of the GAP measure in Refs. [5,6,7,8], and the integration over $C_\Delta$ is reminiscent of the projection stage of that same construction. We can express the corresponding probability density $\hat{P}(x)$, defined by $W(\Delta) = \hat{P}(x)\,\Delta$, more formally as
$$\hat{P}(x) = \frac{\int_R w(z)\,\prod_{j=1}^{n-1}\delta\!\left(x_j - \frac{w_j(z)}{w(z)}\right)dV}{\int_R w(z)\,dV}, \qquad (39)$$
where $\delta$ is the Dirac delta function.
It is not difficult to obtain an explicit expression for $\hat{P}(x)$ starting with Equation (39). For example, in the integral appearing in the numerator of that equation, one can use the integration variables $w$ and $x_j = w_j/w$ for $j = 1, \ldots, n-1$. Then, the delta functions simply fix the values of the $x_j$’s, and the integral becomes straightforward. Here, though, we take a different path to the same answer, starting with Equation (37). This latter approach turns out to be more parallel to our derivation of the Scrooge distribution in the quantum mechanical setting.
First, note that the numerator in Equation (37) can be written as
$$\int_{C_\Delta} w\,dV = \frac{(x_1 x_2 \cdots x_n)^{-1/2}}{2^{\,n+1}(n+2)}\,\lambda_{\max}(x)^{n+2}\,\Delta, \qquad (40)$$
where $\lambda_{\max}(x)$ is the largest value of $\lambda$ over all points in $R$ satisfying $w_j = x_j w$ for $j = 1, \ldots, n$. We get Equation (40) by writing $w$ as $k\lambda^2$, with some constant $k$, for the purpose of integrating over the cone. We can find the value of $\lambda_{\max}(x)$ by finding the point of intersection between (i) the ellipsoid that defines the boundary of $R$, given by
$$\sum_{j=1}^{n}\frac{z_j^2}{4(n+2)\,\mathcal{M}_j} = 1, \qquad (41)$$
and (ii) the line parameterized by $\lambda$ and defined by the equations
$$z_j = \sqrt{x_j}\;\lambda, \qquad j = 1, \ldots, n. \qquad (42)$$
The value of $\lambda$ at this intersection point is
$$\lambda_{\max}(x) = 2\left(\frac{n+2}{\sum_j x_j/\mathcal{M}_j}\right)^{1/2}. \qquad (43)$$
We can therefore rewrite Equation (40) as
$$\int_{C_\Delta} w\,dV = 2\,(n+2)^{n/2}\,(x_1 x_2 \cdots x_n)^{-1/2}\left(\sum_{l=1}^{n}\frac{x_l}{\mathcal{M}_l}\right)^{-(n+2)/2}\Delta. \qquad (44)$$
Meanwhile, it follows from Equation (32) that the denominator in Equation (37) is
$$\int_R w\,dV = V\left(\mathcal{M}_1 + \cdots + \mathcal{M}_n\right) = V\mathcal{M}, \qquad (45)$$
since the optimal region saturates each of the bounds $\mathcal{M}_j$.
Our next step is to compare $\hat{P}(x)$ to the analogous distribution $\hat{P}_{\mathrm{sphere}}(y)$ induced by the uniform distribution of the vector $u$—the same $u$ as in Section 4—over its domain $R'$ (recall that $R'$ is the positive section of a sphere):
$$\hat{P}_{\mathrm{sphere}}(y)\,\Delta = \frac{\int_{C'_\Delta} w'\,dV'}{\int_{R'} w'\,dV'}, \qquad w' = \frac{u_1^2 + \cdots + u_n^2}{4}. \qquad (46)$$
Here, $C'_\Delta$ is the cone in $R'$ for which $(w'_1/w', \ldots, w'_{n-1}/w') \in \Delta$, with $w'_j = u_j^2/4$. We can immediately write down an explicit expression for $\hat{P}_{\mathrm{sphere}}(y)$. It is the same as the distribution (23) on the probability simplex induced by the uniform distribution over the unit sphere in the $n$-dimensional real Hilbert space—the extra radial dimension represented by $r$ has no bearing on the distribution over the probability simplex. Thus,
$$\hat{P}_{\mathrm{sphere}}(y) = \frac{\Gamma(n/2)}{\pi^{n/2}}\,(y_1 y_2 \cdots y_n)^{-1/2}. \qquad (47)$$
The expression for $\hat{P}(x)$ is determined by finding the factors by which the numerator and denominator in Equation (46) change when the sphere in $u$-space is stretched into an ellipsoid in $z$-space. In this transformation (in which $z_j = 2\sqrt{\mathcal{M}_j}\,u_j$), the relation between $y$ (in Equation (46)) and $x$ (in Equation (37)) is given by $x = g(y)$, where $g$ takes the point $(y_1, \ldots, y_{n-1})$ in the probability simplex to the point $(x_1, \ldots, x_{n-1})$ with $x_j = \mathcal{M}_j y_j / \sum_l \mathcal{M}_l y_l$.
Essentially, any appearance of $\mathcal{M}_j$ in our expression (37) for $\hat{P}(x)$ becomes a 1 in Equation (46). Thus, according to Equation (44), when we transform from $C'_\Delta$ to $C_\Delta$, the numerator in Equation (46) is multiplied by
$$\frac{4}{\sum_l x_l/\mathcal{M}_l}\,\prod_{k=1}^{n} 2\sqrt{\mathcal{M}_k}, \qquad (48)$$
and according to Equation (45), in this same transformation, the denominator in Equation (46) is multiplied by
$$\frac{4\mathcal{M}}{n}\,\prod_{k=1}^{n} 2\sqrt{\mathcal{M}_k}. \qquad (49)$$
For both the transitions $R' \to R$ and $C'_\Delta \to C_\Delta$, the volume increases by a factor of $\prod_k 2\sqrt{\mathcal{M}_k}$. So, these volume factors cancel out. By inserting the other factors from Equations (48) and (49), it is found that
$$\hat{P}(x) = \frac{n}{\mathcal{M}\sum_l x_l/\mathcal{M}_l}\,\hat{P}_{\mathrm{sphere}}\!\left(y(x)\right)J, \qquad (50)$$
where $J$ is the Jacobian of $y$ with respect to $x$.
Let us now write $y$ explicitly in terms of $x$:
$$y_k = \frac{x_k/\mathcal{M}_k}{\sum_{l=1}^{n} x_l/\mathcal{M}_l}. \qquad (51)$$
From this, we can get the Jacobian (very much like the one in Equation (16)):
$$J = \left(\prod_{k=1}^{n}\mathcal{M}_k\right)^{-1}\left(\sum_{l=1}^{n}\frac{x_l}{\mathcal{M}_l}\right)^{-n}. \qquad (52)$$
By inserting the results of Equations (51) and (52) into Equation (50), we arrive at
$$\hat{P}(x) = \frac{n\,\Gamma(n/2)}{\pi^{n/2}\,\mathcal{M}}\left(\prod_{k=1}^{n}\mathcal{M}_k\right)^{-1/2}\left(\prod_{k=1}^{n} x_k\right)^{-1/2}\left(\sum_{l=1}^{n}\frac{x_l}{\mathcal{M}_l}\right)^{-(n+2)/2}, \qquad (53)$$
where $\mathcal{M} = \mathcal{M}_1 + \cdots + \mathcal{M}_n$. This is essentially the same as the expression (25) obtained earlier as the real-amplitude Scrooge distribution. The agreement can be made more explicit by defining the ratios $m_k = \mathcal{M}_k/\mathcal{M}$, in which case Equation (53) becomes exactly identical to Equation (25), with these $m_k$’s playing the role of the eigenvalues of the density matrix.
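The entire construction of this section can be checked by simulation. In the following sketch (our own; the bounds $\mathcal{M}_j$ are arbitrary example values), dice are sampled uniformly from the positive section of the ellipsoid, weighted by their expected number of rolls, and compared against an independent importance-sampled estimate under the real-amplitude Scrooge density (25) with eigenvalues $m_k = \mathcal{M}_k/\mathcal{M}$:

```python
import numpy as np

rng = np.random.default_rng(4)
M = np.array([5.0, 2.0, 1.0])      # bounds script-M_j (example values)
n = len(M)
N = 400_000

# Uniform sample of the positive section of the ball of radius sqrt(n+2) in
# u-space, stretched to the ellipsoid in z-space via z_j = 2 sqrt(M_j) u_j.
g = np.abs(rng.normal(size=(N, n)))
u = g / np.linalg.norm(g, axis=1, keepdims=True)
u *= np.sqrt(n + 2.0) * rng.uniform(size=(N, 1)) ** (1.0 / n)
z = 2.0 * np.sqrt(M) * u

# Weight each die by its expected number of rolls w (the "density of rolls")
# and read off the outcome probabilities x_j = w_j / w.
w_j = z ** 2 / 4.0
w = w_j.sum(axis=1)
x = w_j / w[:, None]

# Independent estimate: importance-sample the real-amplitude Scrooge density
# (25) with eigenvalues m = M / M.sum(), starting from Dirichlet(1/2)
# samples distributed according to Equation (23).
m = M / M.sum()
xd = rng.dirichlet(0.5 * np.ones(n), size=N)
wt = n * np.prod(m) ** -0.5 * np.sum(xd / m, axis=1) ** (-(n + 2) / 2)

print(np.average(x[:, 0] ** 2, weights=w),
      np.average(xd[:, 0] ** 2, weights=wt))   # the two estimates agree
```

The two weighted averages agree up to Monte Carlo error, as the equivalence between Equations (53) and (25) requires.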
Note that in the above derivation, we see an analog of $\rho$ distortion. The stretching of the sphere in $u$-space into an ellipsoid in $z$-space is very much like $\rho$ distortion, though in place of the notion of a density matrix, we have a uniform distribution within the sphere or ellipsoid.
It may seem that our communication set-up, in which Alice sends a die equipped with a probabilistic self-destruction mechanism, is rather artificial. However, the mathematics is actually fairly simple and natural. We are considering a set of Poisson-distributed random variables and are basically constructing a measure on the set of values of these variables based on distinguishability (this is the measure derived from the Fisher information metric). That measure then induces a measure on the probability simplex which agrees with the real-amplitude Scrooge distribution.
6. A Classical Interpretation of the Complex-Amplitude Scrooge Distribution
We now show how to modify the above classical communication scenario to arrive at the original, complex-amplitude Scrooge distribution.
Not surprisingly, we begin by doubling the number of sides of Alice’s dice. Let the outcomes be labeled $1a, 1b, 2a, 2b, \ldots, na, nb$. The communication scheme is exactly as it was in Section 4, except that instead of placing an upper bound on the expectation value of the number of times each individual outcome occurs, the $ja$ and $jb$ outcomes are grouped together and an upper bound $\mathcal{M}_j$ is placed on the expectation value of the total number of times the two $j$ outcomes occur. This is done for each $j = 1, \ldots, n$. Again, Alice and Bob are asked to maximize the number of distinguishable signals under this constraint, where “distinguishable” again means having a Fisher-distance separation of at least $\epsilon$.
As before, it is easiest to view the problem in $z$-space; let us label the variables in the space $z_{1a}, z_{1b}, \ldots, z_{na}, z_{nb}$. We now look for the maximum-volume region $R$ of the positive section of $z$-space satisfying the constraints
$$\frac{1}{V}\int_R \frac{z_{ja}^2 + z_{jb}^2}{4}\,dV \leq \mathcal{M}_j, \qquad j = 1, \ldots, n. \qquad (54)$$
In terms of the variables $u_{ja} = z_{ja}/(2\sqrt{\mathcal{M}_j})$ and $u_{jb} = z_{jb}/(2\sqrt{\mathcal{M}_j})$, the constraints become
$$\frac{1}{V'}\int_{R'}\left(u_{ja}^2 + u_{jb}^2\right)dV' \leq 1, \qquad (55)$$
where $R'$ is the region in $u$-space corresponding to $R$. Upon summing these $n$ constraints, the equation
$$\frac{1}{V'}\int_{R'} r^2\,dV' \leq n \qquad (56)$$
is obtained, where $r^2 = \sum_j \left(u_{ja}^2 + u_{jb}^2\right)$. Maximizing the volume under this constraint again gives a sphere in $u$-space, which becomes an ellipsoid in $z$-space (restricted to the positive section).
Continuing as before, one finds that the induced probability distribution over the $(2n-1)$-dimensional probability simplex associated with a $2n$-sided die is the analog of Equation (53), the $n$ values $\mathcal{M}_1, \ldots, \mathcal{M}_n$ now being replaced by the $2n$ values $\mathcal{M}_1/2, \mathcal{M}_1/2, \ldots, \mathcal{M}_n/2, \mathcal{M}_n/2$:
$$\hat{P}(x) = \frac{n!}{\pi^n\,\mathcal{M}\,\prod_j \mathcal{M}_j}\,\prod_{j=1}^{n}\left(x_{ja}x_{jb}\right)^{-1/2}\left(\sum_{l=1}^{n}\frac{x_{la}+x_{lb}}{\mathcal{M}_l}\right)^{-(n+1)}, \qquad (57)$$
where $\mathcal{M} = \mathcal{M}_1 + \cdots + \mathcal{M}_n$. Here, $x_{ja}$ and $x_{jb}$ are the probabilities of the outcomes $ja$ and $jb$, and $x$ refers to the point $(x_{1a}, x_{1b}, \ldots, x_{na}, x_{nb})$ in the $(2n-1)$-dimensional probability simplex (the value of $x_{nb}$ is determined by the requirement that the probabilities sum to unity).
Finally, a distribution over the $(n-1)$-dimensional probability simplex is obtained by ignoring the difference between the outcomes $ja$ and $jb$. We can imagine an observer who, unlike Alice and Bob, cannot see the labels $a$ and $b$. For this “$ab$-blind” observer, the distribution of Equation (57) looks like the following distribution over the $(n-1)$-dimensional probability simplex:
$$\hat{P}_{ab}(x_1, \ldots, x_{n-1}) = \int \hat{P}(x')\,\prod_{j=1}^{n-1}\delta\!\left(x_j - \left(x'_{ja} + x'_{jb}\right)\right)dx'. \qquad (58)$$
Here, $\delta$ is the Dirac delta function and the integral is over the $(2n-1)$-dimensional probability simplex.
The integral in Equation (58) is straightforward, and it can be found that
$$\hat{P}_{ab}(x) = \frac{n!}{\mathcal{M}\,\prod_j \mathcal{M}_j}\left(\sum_{l=1}^{n}\frac{x_l}{\mathcal{M}_l}\right)^{-(n+1)}. \qquad (59)$$
This is the same as the original Scrooge distribution of Equation (18). The role of the eigenvalues of the density matrix is now played by the set of values $m_j = \mathcal{M}_j/\mathcal{M}$, where, again, $\mathcal{M}_j$ is the maximum allowed expectation value of the total number of times that the outcomes $ja$ and $jb$ occur.
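The doubled construction can be simulated in the same way (the sketch and parameter values are our own; summing the constraints as above gives a sphere of radius $\sqrt{n+1}$ in the $2n$-dimensional $u$-space): an $ab$-blind projection of the ellipsoid-sampled dice should match the complex Scrooge distribution of Equation (18) with $p_j = \mathcal{M}_j/\mathcal{M}$:

```python
import numpy as np

rng = np.random.default_rng(5)
M = np.array([5.0, 2.0, 1.0])      # pair bounds script-M_j (example values)
n = len(M)
N = 400_000

# Uniform sample of the positive section of the ball of radius sqrt(n+1) in
# the 2n-dimensional u-space; stretch via z_ja = 2 sqrt(M_j) u_ja, etc.
g = np.abs(rng.normal(size=(N, 2 * n)))
u = g / np.linalg.norm(g, axis=1, keepdims=True)
u *= np.sqrt(n + 1.0) * rng.uniform(size=(N, 1)) ** (1.0 / (2 * n))
z = 2.0 * np.sqrt(np.repeat(M, 2)) * u

# The ab-blind observer sees only the pair sums x_j = (w_ja + w_jb) / w.
w_pairs = (z ** 2 / 4.0).reshape(N, n, 2).sum(axis=2)
w = w_pairs.sum(axis=1)
x = w_pairs / w[:, None]

# Independent estimate: importance-sample the complex Scrooge density (18)
# with p = M / M.sum(), starting from uniform (Dirichlet(1,...,1)) samples.
p = M / M.sum()
xd = rng.dirichlet(np.ones(n), size=N)
wt = n / np.prod(p) / np.sum(xd / p, axis=1) ** (n + 1)

print(np.average(x[:, 0] ** 2, weights=w),
      np.average(xd[:, 0] ** 2, weights=wt))   # the two estimates agree
```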
7. Discussion
In this paper we have shown how the real-amplitude version of the Scrooge distribution emerges naturally from a classical communication scenario in which information is transmitted via the values of several random variables $\nu_1, \ldots, \nu_n$. Essentially, the real-amplitude Scrooge distribution, regarded as a probability distribution over the probability simplex, is derived from an underlying distribution based on distinguishability. Our analysis includes a transformation that plays something like the role of a $\rho$ distortion: in place of a density matrix, what is distorted is a distribution over the space of potential signals.
In order to get the original complex-amplitude Scrooge distribution for dimension n, we needed to consider a case with twice as many random variables, grouped into pairs, and then we imagined an observer for whom only the sum of the variables within each pair was observable.
The reader will probably have noticed that the role played by the concept of information in our classical communication problem seems to be exactly the opposite of the role it plays in the quantum origin of the Scrooge distribution. In quantum theory, the Scrooge distribution is the distribution over pure states that, upon measurement, provides an observer with the least possible amount of information. In contrast, in our classical communication scenario, the Scrooge distribution emerges from a requirement that Alice convey as much information as possible to Bob. What is common to both cases is that the information-based criterion favors a distribution that is highly spread out over the probability simplex. In the quantum case, a distribution spread out over many non-orthogonal states tends to make it difficult for an observer to gain information about the state. In the classical case, Alice and Bob want to spread their signals as widely as possible over the space of possibilities in order to maximize the number of distinguishable signals. Thus, though the two scenarios are quite different, their extremization criteria have similar effects.
An intriguing aspect of our classical scenario is that the probability simplex is not itself taken as the domain in which the problem is formulated. Instead, the problem is formulated in terms of the number of times each outcome occurs. The distribution over the probability simplex is a secondary concept, being derived from a more fundamental distribution over the space of the numbers of occurrences of the outcomes. That is, the expected numbers of occurrences $w_j$ are more fundamental in the problem than the probabilities of the outcomes, which are defined in terms of the $w_j$’s by the equation $x_j = w_j/(w_1 + \cdots + w_n)$. In this specific respect, then, the effort to find a classical interpretation of the Scrooge distribution seems to lead us away from the models studied in Refs. [26,28], in which the set of frequencies of occurrence of the measurement outcomes was the only source of information considered.
It is interesting to ask whether this feature of our scenario is necessary in order to get the Scrooge distribution classically. To address this question, in Appendix A we consider another classical communication problem, in which we impose a separate restriction for each outcome as in Section 4, but now with Alice’s signals consisting purely of probabilities (which are estimated by Bob through the observed frequencies of occurrence). For simplicity, we restrict our attention to the most basic case, in which there are only two possible outcomes—so Alice’s die is now a coin to be tossed—and we are aiming just for the real-amplitude Scrooge distribution as opposed to the complex-amplitude version. We find that the resulting probability distribution over the probability simplex is not of the same form as the real-amplitude Scrooge distribution. This result can be taken as one bit of evidence that it is indeed necessary to go beyond the probability simplex and to work in a space of one additional dimension in order to obtain the Scrooge distribution classically. In this connection, it is worth noting that something very similar has been seen in research on subentropy—certain simple relations between subentropy and the Shannon entropy can be obtained only by lifting the normalization restriction that defines the probability simplex and working in the larger space of unnormalized $n$-tuples [21,23].
Finally, one might wonder about the potential significance of our need to invoke an “$ab$-blind” observer in order to obtain the complex-amplitude Scrooge distribution. It is well known that the number of independent parameters required to specify a pure quantum state (of a system with a finite-dimensional Hilbert space) is exactly twice the number of independent probabilities associated with a complete orthogonal measurement on the system. Here, we are seeing another manifestation of this factor of two: the classical measurement outcomes, corresponding to the sides of a rolled die, have to be grouped into pairs, and we need to imagine an observer incapable of distinguishing between the elements of any pair. In our actual quantum world, one can reasonably ask whether there is any interesting sense in which we ourselves are “$ab$-blind.” This question, though, lies well beyond the scope of the present paper.