Article

Frequentist and Bayesian Quantum Phase Estimation

1
Institute of Theoretical Physics and Department of Physics, State Key Laboratory of Quantum Optics and Quantum Optics Devices, Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan 030006, China
2
QSTAR, INO-CNR and LENS, Largo Enrico Fermi 2, 50125 Firenze, Italy
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Entropy 2018, 20(9), 628; https://doi.org/10.3390/e20090628
Submission received: 7 July 2018 / Revised: 6 August 2018 / Accepted: 10 August 2018 / Published: 23 August 2018
(This article belongs to the Special Issue Advances in Quantum Metrology)
Figure 1
(a) Bias $\langle\theta_{\rm MLE}\rangle_{\boldsymbol{\mu}|\theta_0} - \theta_0$ (green dots) as a function of $m$ with error bars $(\Delta\theta_{\rm MLE})_{\boldsymbol{\mu}|\theta_0}$. The red lines are $\pm\Delta\theta_{\rm CRLB} = \pm\,|d\langle\theta_{\rm MLE}\rangle_{\boldsymbol{\mu}|\theta_0}/d\theta_0|/\sqrt{mF(\theta_0)}$; (b) variance of the maximum likelihood estimator multiplied by the Fisher information, $mF(\theta_0)(\Delta^2\theta_{\rm MLE})_{\boldsymbol{\mu}|\theta_0}$ (red circles), as a function of the sample size $m$. It is compared to the bias $(d\langle\theta_{\rm MLE}\rangle_{\boldsymbol{\mu}|\theta_0}/d\theta_0)^2$ (red dashed line). We recall that $\theta_0 = \pi/4$ and $F(\theta_0) = 4$ here.
Figure 2
(a) Comparison between unbiased frequentist bounds for the example considered in this manuscript, Equation (1): the CRLB $m\Delta^2\theta^{\rm ub}_{\rm CRLB} = 1/F(\theta_0)$ (black line), the Hammersley–Chapman–Robbins bound $m\Delta^2\theta^{\rm ub}_{\rm ChRB}$ (Equation (15), filled triangles) and the extended Hammersley–Chapman–Robbins bound $m\Delta^2\theta^{\rm ub}_{\rm EChRB}$ (Equation (18), empty triangles); (b) values of $\lambda$ achieving the supremum in Equation (15), as a function of $m$.
Figure 3
Comparisons of phase estimation variance as a function of the sample size for Bayesian and frequentist data analysis under different prior distributions, (a) $\alpha = -100$, (b) $\alpha = -10$, (c) $\alpha = 1$, (d) $\alpha = 10$. In all panels, red circles (frequentist) are $m(\Delta^2\theta_{\rm BL})_{\boldsymbol{\mu}|\theta_0}$, and the red dashed line is the Cramér–Rao lower bound $m\Delta^2\theta_{\rm CRLB}$, Equation (8). Blue circles (Bayesian) are $m(\Delta^2\theta_{\rm BL})_{\boldsymbol{\mu},\theta|\theta_0}$, and the blue solid line is the likelihood-averaged Ghosh bound $m\Delta^2\theta_{\rm aGB}$, Equation (25). The inset in each panel is $p_{\rm pri}(\theta)$, Equation (26), for the corresponding values of $\alpha$.
Figure 4
Comparisons of average posterior Bayesian variance, $m(\Delta^2\theta_{\rm BL})_{\boldsymbol{\mu},\theta}$ (dots), as a function of the sample size $m$ under different prior distributions, (a) $\alpha = -100$, (b) $\alpha = -10$, (c) $\alpha = 1$, (d) $\alpha = 10$. This variance is compared to the average Ghosh bound for random parameters $m(\Delta^2\theta_{\rm aGBr})$ (grey line), the Van Trees bound $m(\Delta^2\theta_{\rm VTB})$ (green line), the Ziv–Zakai bound $m(\Delta^2\theta_{\rm ZZB})$ (red line) and $1/F(\theta_0)$ (black horizontal line). The inset in each panel is the prior $p_{\rm pri}(\theta)$, Equation (26), for the corresponding values of $\alpha$.

Abstract

Frequentist and Bayesian phase estimation strategies lead to conceptually different results on the state of knowledge about the true value of an unknown parameter. We compare the two frameworks and their sensitivity bounds to the estimation of an interferometric phase shift limited by quantum noise, considering both the cases of a fixed and a fluctuating parameter. We point out that frequentist precision bounds, such as the Cramér–Rao bound, do not apply to Bayesian strategies and vice versa. In particular, we show that the Bayesian variance can overcome the frequentist Cramér–Rao bound, which appears to be a paradoxical result if the conceptual differences between the two approaches are overlooked. Similarly, bounds for fluctuating parameters make no statement about the estimation of a fixed parameter.

1. Introduction

The estimation of a phase shift using interferometric techniques is at the core of metrology and sensing [1,2,3]. Applications range from the definition of the standard of time [4] to the detection of gravitational waves [5,6]. The general problem can be concisely stated as the search for optimal strategies to minimize the phase estimation uncertainty. The noise that limits the achievable phase sensitivity can have a “classical” or a “quantum” nature. Classical noise originates from the coupling of the interferometer with some external source of disturbance, like seismic vibrations, parasitic magnetic fields or from incoherent interactions within the interferometer. Such noise can, in principle, be arbitrarily reduced, e.g., by shielding the interferometer from external noise or by tuning interaction parameters to ensure a fully coherent time evolution. The second source of uncertainty has an irreducible quantum origin [7,8]. Quantum noise cannot be fully suppressed, even in the idealized case of the creation and manipulation of pure quantum states. Using classically-correlated probe states, it is possible to reach the so-called shot noise or standard quantum limit, which is the limiting factor for the current generation of interferometers and sensors [9,10,11,12]. Strategies involving probe states characterized by squeezed quadratures [13] or entanglement between particles [14,15,16,17,18,19] are able to overcome the shot noise, the ultimate quantum bound being the so-called Heisenberg limit. Quantum noise reduction in phase estimation has been demonstrated in several proof-of-principle experiments with atoms and photons [20,21].
There is a vast amount of literature dealing with the parameter estimation problem that has been mostly developed following two different approaches [22,23,24]: frequentist and Bayesian. Both approaches have been investigated in the context of quantum phase estimation [18,20,25,26,27,28,29,30,31] and implemented/tested experimentally [32,33,34,35,36]. They build on conceptually different meanings attached to the word “probability” and their respective results provide conceptually different information on the estimated parameters and their uncertainties.
In the limit of a large number of repeated measurements, the sensitivities reached by the frequentist and Bayesian methods generally agree: this fact has very often induced the belief that the two paradigms can be used interchangeably in phase estimation theory without acknowledging their irreconcilable nature. Overlooking these differences is not only conceptually inconsistent but can even create paradoxes, such as the existence of ultimate sensitivity bounds proven in one paradigm that can be violated in the other.
In this manuscript, we directly compare the frequentist and the Bayesian parameter estimation theories. We study different sensitivity bounds obtained in the two frameworks and highlight the conceptual differences between the two. Besides the asymptotic regime of many repeated measurements, we also study bounds that are relevant for small samples. In particular, we show that the Bayesian variance can overcome the frequentist Cramér–Rao bound. The Cramér–Rao bound is a mathematical theorem providing the highest possible sensitivity in a phase estimation problem. The fact that the Bayesian sensitivity can be higher than the Cramér–Rao bound therefore appears paradoxical. The paradox is resolved by clarifying the conceptual differences between the frequentist and the Bayesian approaches, which therefore cannot be directly compared. Such differences should be considered when discussing theoretical and experimental figures of merit in interferometric phase estimation.
Our results are illustrated with a simple test model [37,38]. We consider $N$ qubits with basis states $|0\rangle$ and $|1\rangle$, initially prepared in a (generalized) GHZ state $|\mathrm{GHZ}\rangle = (|0\rangle^{\otimes N} + |1\rangle^{\otimes N})/\sqrt{2}$, with all particles being either in $|1\rangle$ or in $|0\rangle$. The phase-encoding is a rotation of each qubit in the Bloch sphere, $|0\rangle \to e^{-i\theta/2}|0\rangle$ and $|1\rangle \to e^{+i\theta/2}|1\rangle$, which transforms the $|\mathrm{GHZ}\rangle$ state into $|\mathrm{GHZ}(\theta)\rangle = (e^{-iN\theta/2}|0\rangle^{\otimes N} + e^{+iN\theta/2}|1\rangle^{\otimes N})/\sqrt{2}$. The phase is estimated by measuring the parity $(-1)^{N_0}$, where $N_0$ is the number of particles in the state $|0\rangle$ [37,39,40,41]. The parity measurement has two possible results $\mu = \pm 1$ that are conditioned by the “true value of the phase shift” $\theta_0$ with probability $p(\pm 1|\theta_0) = (1 \pm \cos N\theta_0)/2$. The probability to observe the sequence of results $\boldsymbol{\mu} = \{\mu_1, \mu_2, \ldots, \mu_m\}$ in $m$ independent repetitions of the experiment (with same probe state and phase encoding transformation) is
$$p(\boldsymbol{\mu}|\theta_0) = \prod_{i=1}^{m} p(\mu_i|\theta_0) = \left(\frac{1 + \cos N\theta_0}{2}\right)^{m_+} \left(\frac{1 - \cos N\theta_0}{2}\right)^{m_-}, \tag{1}$$
where $m_\pm$ is the number of the observed results $\pm 1$, respectively. Notice that $p(\boldsymbol{\mu}|\theta_0)$ is the conditional probability for the measurement outcome $\boldsymbol{\mu}$, given that the true value of the phase shift is $\theta_0$ (which we consider to be unknown in the estimation protocol). Equation (1) provides the probability that will be used in the following sections for the case $N = 2$ and $\theta_0 \in [0, \pi/2]$. Section 2 and Section 3 deal with the case where $\theta_0$ has a fixed value and in Section 4 we discuss precision bounds for a fluctuating phase shift.
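As a minimal sketch of Equation (1), the likelihood of a recorded sequence of parity outcomes can be evaluated as follows. The function names and the example record are our own illustrative choices and do not come from the manuscript; the default $N = 2$ follows the case considered in the text.

```python
import math

def p_single(mu, theta, N=2):
    """Probability of one parity outcome mu = +1 or -1 at phase theta."""
    return (1.0 + mu * math.cos(N * theta)) / 2.0

def sequence_likelihood(outcomes, theta, N=2):
    """Likelihood of m independent outcomes, following Equation (1)."""
    m_plus = sum(1 for mu in outcomes if mu == 1)
    m_minus = len(outcomes) - m_plus
    return p_single(+1, theta, N) ** m_plus * p_single(-1, theta, N) ** m_minus

# Hypothetical measurement record, used only for illustration.
outcomes = [+1, -1, +1, +1]
print(sequence_likelihood(outcomes, math.pi / 4))
```

The likelihood depends on the record only through the counts $m_+$ and $m_-$, so sums over all sequences can be reduced to binomial sums over $m_+$.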

2. Frequentist Approach

In the frequentist paradigm, the phase (assumed to have a fixed but unknown value $\theta_0$) is estimated via an arbitrarily chosen function of the measurement results, $\theta_{\rm est}(\boldsymbol{\mu})$, called the estimator. Typically, $\theta_{\rm est}(\boldsymbol{\mu})$ is chosen by maximizing the likelihood of the observed data (see below). The estimator, being a function of random outcomes, is itself a random variable. It is characterized by a statistical distribution that has an objective, measurable character. The relative frequency with which the event $\theta_{\rm est}$ occurs converges to a probability asymptotically with the number of repeated experimental trials.

2.1. Frequentist Risk Functions

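Statistical fluctuations of the data reflect the statistical uncertainty of the estimation. This is quantified by the variance,
$$\left(\Delta^2\theta_{\rm est}\right)_{\boldsymbol{\mu}|\theta_0} = \sum_{\boldsymbol{\mu}} \left(\theta_{\rm est}(\boldsymbol{\mu}) - \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}\right)^2 p(\boldsymbol{\mu}|\theta_0), \tag{2}$$
around the mean value $\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0} = \sum_{\boldsymbol{\mu}} \theta_{\rm est}(\boldsymbol{\mu})\, p(\boldsymbol{\mu}|\theta_0)$, the sum extending over all possible measurement sequences (for fixed $\theta_0$ and $m$). An important class is that of locally unbiased estimators, namely those satisfying $\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0} = \theta_0$ and $\left.\frac{d\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta}}{d\theta}\right|_{\theta=\theta_0} = 1$ (see, for instance, [42]). An estimator is unbiased if and only if it is locally unbiased at every $\theta_0$.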
The quality of the estimator can also be quantified by the mean square error (MSE) [23]
$$\mathrm{MSE}(\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0} = \sum_{\boldsymbol{\mu}} \left(\theta_{\rm est}(\boldsymbol{\mu}) - \theta_0\right)^2 p(\boldsymbol{\mu}|\theta_0), \tag{3}$$
giving the deviation of $\theta_{\rm est}$ from the true value of the phase shift $\theta_0$. It is related to Equation (2) by the relation
$$\mathrm{MSE}(\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0} = \left(\Delta^2\theta_{\rm est}\right)_{\boldsymbol{\mu}|\theta_0} + \left(\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0} - \theta_0\right)^2. \tag{4}$$
In the frequentist approach, the variance is often not considered a proper way to quantify the goodness of an estimator. For instance, an estimator that always gives the same value independently of the measurement outcomes is strongly biased: it has zero variance but a large MSE that does not scale with the number of repeated measurements. Notice that the MSE cannot be accessed from the experimentally available data since the true value $\theta_0$ is unknown. In this sense, only the fluctuations of $\theta_{\rm est}$ around its mean value, i.e., the variance $(\Delta^2\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0}$, have experimental relevance. For unbiased estimators, Equations (2) and (4) coincide. In general, since the bias term in Equation (4) is never negative, $\mathrm{MSE}(\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0} \geq (\Delta^2\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0}$ and any lower bound on $(\Delta^2\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0}$ automatically provides a lower bound on $\mathrm{MSE}(\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0}$ but not vice versa. In the following section, we therefore limit our attention to bounds on $(\Delta^2\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0}$. The distinction between the two quantities becomes more important in the case of a fluctuating phase shift $\theta_0$, where the bias can affect the corresponding bounds in different ways. We will see this explicitly in Section 4.
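The relation between variance, bias and MSE in Equations (2)–(4) can be checked numerically for the parity model of Equation (1). The sketch below assumes an estimator that depends on the sequence only through $m_+$, so that the sum over sequences becomes a binomial sum; the helper name `exact_moments` and the constant estimator are hypothetical illustrations.

```python
import math

def p_plus(theta, N=2):
    """Probability of the outcome +1 in the parity model of Equation (1)."""
    return (1.0 + math.cos(N * theta)) / 2.0

def exact_moments(estimator, theta0, m, N=2):
    """Exact mean, variance and MSE of estimator(m_plus, m) at fixed theta0.

    Assumes the estimator depends on the sequence only through m_plus,
    so the sum over all sequences reduces to a binomial sum.
    """
    p = p_plus(theta0, N)
    weights = [math.comb(m, k) * p**k * (1.0 - p) ** (m - k) for k in range(m + 1)]
    values = [estimator(k, m) for k in range(m + 1)]
    mean = sum(w * v for w, v in zip(weights, values))
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values))
    mse = sum(w * (v - theta0) ** 2 for w, v in zip(weights, values))
    return mean, var, mse

# A constant estimator ignores the data: zero variance, but a nonzero MSE.
theta0 = math.pi / 4
mean, var, mse = exact_moments(lambda k, m: 0.3, theta0, m=10)
print(var, mse, var + (mean - theta0) ** 2)
```

The last two printed numbers agree, which is the decomposition of Equation (4), while the variance alone gives no hint that the estimator is useless.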

2.2. Frequentist Bounds on Phase Sensitivity

2.2.1. Barankin Bound

The Barankin bound (BB) provides the tightest lower bound to the variance (2) [43]. It can be proven to be always (for any $m$) saturable, in principle, by a specific local (i.e., dependent on $\theta_0$) estimator and measurement observable. Of course, since the estimator that saturates the BB depends on the true value of the parameter (which is unknown), the bound is of little use in practice. Nevertheless, the BB plays a central role, from the theoretical point of view, as it provides a hierarchy of weaker bounds which can be used in practice with estimators that are asymptotically unbiased. The BB can be written as [44]
$$\left(\Delta^2\theta_{\rm est}\right)_{\boldsymbol{\mu}|\theta_0} \geq \Delta^2\theta_{\rm BB} \equiv \sup_{\theta_i, a_i, n} \frac{\left(\sum_{i=1}^{n} a_i \left[\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_i} - \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}\right]\right)^2}{\sum_{\boldsymbol{\mu}} \left(\sum_{i=1}^{n} a_i L(\boldsymbol{\mu}|\theta_i, \theta_0)\right)^2 p(\boldsymbol{\mu}|\theta_0)}, \tag{5}$$
where $L(\boldsymbol{\mu}|\theta_i, \theta) = p(\boldsymbol{\mu}|\theta_i)/p(\boldsymbol{\mu}|\theta)$ is generally indicated as likelihood ratio and the supremum is taken over $n$ parameters $a_i \in \mathbb{R}$, which are arbitrary real numbers, and $\theta_i$, which are arbitrary phase values in the parameter domain. For unbiased estimators, we can replace $\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_i} = \theta_i$ for all $i$ and the BB becomes independent of the estimator:
$$\left(\Delta^2\theta_{\rm est}\right)_{\boldsymbol{\mu}|\theta_0} \geq \Delta^2\theta^{\rm ub}_{\rm BB} \equiv \sup_{\theta_i, a_i, n} \frac{\left(\sum_{i=1}^{n} a_i \left[\theta_i - \theta_0\right]\right)^2}{\sum_{\boldsymbol{\mu}} \left(\sum_{i=1}^{n} a_i L(\boldsymbol{\mu}|\theta_i, \theta_0)\right)^2 p(\boldsymbol{\mu}|\theta_0)}. \tag{6}$$
A derivation of the BB is presented in Appendix A.
The explicit calculation of $\Delta^2\theta_{\rm BB}$ is impractical in most applications due to the number of free variables that must be optimized. However, the BB provides a strict hierarchy of bounds of increasing complexity that can be of great practical importance. Restricting the number of variables in the optimization can provide local lower bounds that are much simpler to determine at the expense of not being saturable in general, namely, for an arbitrary number of measurements. Below, we demonstrate the following hierarchy of bounds:
$$\left(\Delta^2\theta_{\rm est}\right)_{\boldsymbol{\mu}|\theta_0} \geq \Delta^2\theta_{\rm BB} \geq \Delta^2\theta_{\rm EChRB} \geq \Delta^2\theta_{\rm ChRB} \geq \Delta^2\theta_{\rm CRLB}, \tag{7}$$
where $\Delta^2\theta_{\rm CRLB}$ is the Cramér–Rao lower bound (CRLB) [45,46] and $\Delta^2\theta_{\rm ChRB}$ is the Hammersley–Chapman–Robbins bound (ChRB) [47,48]. We will also introduce a novel extended version of the ChRB, indicated as $\Delta^2\theta_{\rm EChRB}$.

2.2.2. Cramér–Rao Lower Bound and Maximum Likelihood Estimator

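The CRLB is the most common frequentist bound in parameter estimation. It is given by [45,46]:
$$\Delta^2\theta_{\rm CRLB} = \frac{\left(\frac{d\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}}{d\theta_0}\right)^2}{m F(\theta_0)}. \tag{8}$$
The inequality $(\Delta^2\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0} \geq \Delta^2\theta_{\rm CRLB}$ is obtained by differentiating $\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}$ with respect to $\theta_0$ and using a Cauchy–Schwarz inequality:
$$\left(\frac{d\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}}{d\theta_0}\right)^2 = \left(\sum_{\boldsymbol{\mu}} \left(\theta_{\rm est}(\boldsymbol{\mu}) - \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}\right) \frac{dp(\boldsymbol{\mu}|\theta_0)}{d\theta_0}\right)^2 \leq m F(\theta_0) \left(\Delta^2\theta_{\rm est}\right)_{\boldsymbol{\mu}|\theta_0}, \tag{9}$$
where we have used $\sum_{\boldsymbol{\mu}} \frac{dp(\boldsymbol{\mu}|\theta_0)}{d\theta_0} = 0$ and $\sum_{\boldsymbol{\mu}} \frac{1}{p(\boldsymbol{\mu}|\theta_0)} \left(\left.\frac{\partial p(\boldsymbol{\mu}|\theta)}{\partial\theta}\right|_{\theta_0}\right)^2 = m \sum_{\mu} \frac{1}{p(\mu|\theta_0)} \left(\left.\frac{\partial p(\mu|\theta)}{\partial\theta}\right|_{\theta_0}\right)^2$, valid for $m$ independent measurements, and
$$F(\theta_0) = \sum_{\mu} \frac{1}{p(\mu|\theta_0)} \left(\left.\frac{\partial p(\mu|\theta)}{\partial\theta}\right|_{\theta_0}\right)^2 \tag{10}$$
is the Fisher information. The equality $(\Delta^2\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0} = \Delta^2\theta_{\rm CRLB}$ is achieved if and only if
$$\theta_{\rm est}(\boldsymbol{\mu}) - \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0} = \lambda_{\theta_0} \frac{d\log p(\boldsymbol{\mu}|\theta_0)}{d\theta_0}, \tag{11}$$
with $\lambda_{\theta_0}$ a parameter independent of $\boldsymbol{\mu}$ (while it may depend on $\theta_0$). Noticing that $\frac{d\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}}{d\theta_0} = \sum_{\boldsymbol{\mu}} \left(\theta_{\rm est}(\boldsymbol{\mu}) - f(\theta_0)\right) \frac{dp(\boldsymbol{\mu}|\theta_0)}{d\theta_0}$, the CRLB can be straightforwardly generalized to any function $f(\theta_0)$ independent of $\boldsymbol{\mu}$. In particular, choosing $f(\theta_0) = \theta_0$, we can directly prove that $\mathrm{MSE}(\theta_{\rm est})_{\boldsymbol{\mu}|\theta_0} \geq \Delta^2\theta_{\rm CRLB}$, which also depends on the bias.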
Asymptotically in $m$, the saturation of Equation (8) is obtained for the maximum likelihood estimator (MLE) [22,23,49]. This is the value $\theta_{\rm MLE}(\boldsymbol{\mu})$ that maximizes the likelihood $p(\boldsymbol{\mu}|\theta_0)$ (as a function of the parameter $\theta_0$) for the observed measurement sequence $\boldsymbol{\mu}$,
$$\theta_{\rm MLE}(\boldsymbol{\mu}) \equiv \arg\max_{\theta_0} \{ p(\boldsymbol{\mu}|\theta_0) \}. \tag{12}$$
For a sufficiently large sample size $m$ (in the central limit), independently of the probability distribution $p(\boldsymbol{\mu}|\theta_0)$, the MLE becomes normally distributed [18,22,23,49]:
$$p(\theta_{\rm MLE}|\theta_0) = \sqrt{\frac{m F(\theta_0)}{2\pi}}\; e^{-\frac{m F(\theta_0)}{2} (\theta_0 - \theta_{\rm MLE})^2} \qquad (m \gg 1), \tag{13}$$
with mean given by the true value $\theta_0$ and variance equal to the inverse of the total Fisher information, $1/(m F(\theta_0))$. The MLE is well defined provided that there is a unique maximum in the considered phase interval. In the case of Equation (1), this condition is fulfilled provided that one restricts the phase domain to $[0, \pi/(2N)]$, for instance.
In Figure 1, we plot the results of a maximum likelihood analysis for the example considered in this manuscript. In this case, the MLE is readily calculated and given by $\theta_{\rm MLE}(\boldsymbol{\mu}) = \frac{1}{2} \arccos\left(\frac{m_+ - m_-}{m_+ + m_-}\right)$, and the Fisher information is $F(\theta_0) = N^2$, independent of $\theta_0$ (we recall that $N = 2$ in our example). In Figure 1a we plot the bias $\langle\theta_{\rm MLE}\rangle_{\boldsymbol{\mu}|\theta_0} - \theta_0$ (dots) as a function of $m$, for $\theta_0 = \pi/4$. Error bars are $\pm\Delta\theta_{\rm CRLB}$. Notice that $\langle\theta_{\rm MLE}\rangle_{\boldsymbol{\mu}|\theta_0} = \theta_0$ for every $m$. This does not mean that the estimator is locally unbiased: indeed, the derivative $d\langle\theta_{\rm MLE}\rangle_{\boldsymbol{\mu}|\theta_0}/d\theta_0$ [see panel (b)] is different from 1 for every value of $m$. We have $d\langle\theta_{\rm MLE}\rangle_{\boldsymbol{\mu}|\theta_0}/d\theta_0 \to 1$ asymptotically in $m$. In Figure 1b, we plot $m F(\theta_0) (\Delta^2\theta_{\rm MLE})_{\boldsymbol{\mu}|\theta_0}$ as a function of the number of independent measurements $m$ (red dots). This quantity is compared to $m F(\theta_0) \Delta^2\theta_{\rm CRLB} = (d\langle\theta_{\rm MLE}\rangle_{\boldsymbol{\mu}|\theta_0}/d\theta_0)^2$ (red line). With increasing sample size $m$, $(\Delta^2\theta_{\rm MLE})_{\boldsymbol{\mu}|\theta_0} \to 1/(m F(\theta_0))$, corresponding to the CRLB for unbiased estimators.
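The quantities discussed for Figure 1 can be recomputed with a short script. This is only a sketch under stated assumptions: it uses the closed-form MLE for $N = 2$ quoted above, evaluates the mean and variance exactly through the binomial distribution of $m_+$, and approximates $d\langle\theta_{\rm MLE}\rangle/d\theta_0$ by a central finite difference (the step `h` is our choice). It is not the code used to produce the figure.

```python
import math

F = 4.0  # Fisher information N**2 for N = 2, as stated in the text

def theta_mle(m_plus, m):
    """MLE for N = 2: half the arccosine of (m_+ - m_-)/(m_+ + m_-)."""
    return 0.5 * math.acos((2 * m_plus - m) / m)

def mle_mean_var(theta0, m):
    """Exact mean and variance of the MLE from the binomial law of m_+."""
    p = (1.0 + math.cos(2.0 * theta0)) / 2.0
    w = [math.comb(m, k) * p**k * (1.0 - p) ** (m - k) for k in range(m + 1)]
    est = [theta_mle(k, m) for k in range(m + 1)]
    mean = sum(wk * ek for wk, ek in zip(w, est))
    var = sum(wk * (ek - mean) ** 2 for wk, ek in zip(w, est))
    return mean, var

def mean_slope(theta0, m, h=1e-5):
    """Central finite difference for d<theta_MLE>/d theta_0."""
    return (mle_mean_var(theta0 + h, m)[0] - mle_mean_var(theta0 - h, m)[0]) / (2 * h)

theta0 = math.pi / 4
for m in (5, 20, 100):
    mean, var = mle_mean_var(theta0, m)
    # columns: m, bias, m*F*variance, squared slope (the CRLB times m*F)
    print(m, mean - theta0, m * F * var, mean_slope(theta0, m) ** 2)
```

The third column should never fall below the fourth, which is the content of Equation (9).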

2.2.3. Hammersley–Chapman–Robbins Bound

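The ChRB is obtained from Equation (5) by taking $n = 2$, $a_1 = 1$, $a_2 = -1$, $\theta_1 = \theta_0 + \lambda$, $\theta_2 = \theta_0$, and can be written as [47,48]
$$\Delta^2\theta_{\rm ChRB} = \sup_{\lambda} \frac{\left(\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0+\lambda} - \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}\right)^2}{\sum_{\boldsymbol{\mu}} \frac{p(\boldsymbol{\mu}|\theta_0+\lambda)^2}{p(\boldsymbol{\mu}|\theta_0)} - 1}. \tag{14}$$
Clearly, restricting the number of parameters in the optimization in Equation (5) leads to a less strict bound. We thus have $\Delta^2\theta_{\rm BB} \geq \Delta^2\theta_{\rm ChRB}$. For unbiased estimators, we obtain
$$\Delta^2\theta^{\rm ub}_{\rm ChRB} = \sup_{\lambda} \frac{\lambda^2}{\sum_{\boldsymbol{\mu}} \frac{p(\boldsymbol{\mu}|\theta_0+\lambda)^2}{p(\boldsymbol{\mu}|\theta_0)} - 1}. \tag{15}$$
Furthermore, the supremum over $\lambda$ on the right side of Equation (14) is always larger than or equal to its limit $\lambda \to 0$:
$$\sup_{\lambda} \frac{\left(\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0+\lambda} - \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}\right)^2}{\sum_{\boldsymbol{\mu}} \frac{p(\boldsymbol{\mu}|\theta_0+\lambda)^2}{p(\boldsymbol{\mu}|\theta_0)} - 1} \geq \lim_{\lambda \to 0} \frac{\left(\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0+\lambda} - \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}\right)^2}{\sum_{\boldsymbol{\mu}} \frac{p(\boldsymbol{\mu}|\theta_0+\lambda)^2}{p(\boldsymbol{\mu}|\theta_0)} - 1} = \frac{\left(\frac{d\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}}{d\theta_0}\right)^2}{m \sum_{\mu} \frac{1}{p(\mu|\theta_0)} \left(\frac{dp(\mu|\theta_0)}{d\theta_0}\right)^2}, \tag{16}$$
provided that the derivatives on the right-hand side exist. We thus recover the CRLB as a limiting case of the ChRB. The ChRB is always stricter than the CRLB and we obtain the last inequality in the chain (7). Notice that the CRLB requires the probability distribution $p(\boldsymbol{\mu}|\theta_0)$ to be differentiable [24], a condition that can be dropped for the ChRB and the more general BB. Even if the distribution is regular, the above derivation shows that the ChRB, and more generally the BB, provide tighter error bounds than the CRLB. With increasing $n$, the BB becomes tighter and tighter and the CRLB represents the weakest bound in this hierarchy, which can be observed in Figure 2a. Next, we determine a stricter bound in this hierarchy.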
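The unbiased ChRB of Equation (15) can be estimated for the parity model by scanning $\lambda$ on a grid. The sketch below makes two assumptions of ours: $\lambda$ is restricted so that $\theta_0 + \lambda$ stays in $[0, \pi/2]$, and the supremum is replaced by a maximum over a finite grid (so the result is a lower estimate of the supremum). It also uses the fact that, for independent repetitions, the sum over sequences factorizes into the $m$-th power of the single-measurement sum.

```python
import math

THETA0, F = math.pi / 4, 4.0  # true phase and Fisher information for N = 2

def p_single(mu, theta):
    """Single-measurement parity probability for N = 2."""
    return (1.0 + mu * math.cos(2.0 * theta)) / 2.0

def chrb_unbiased(m, n_grid=2000):
    """Grid estimate of the supremum in Equation (15) for m repetitions."""
    best = 0.0
    for j in range(1, n_grid + 1):
        for sign in (+1, -1):
            lam = sign * (math.pi / 4) * j / n_grid
            # Sum over one outcome of p(mu|theta0+lam)^2 / p(mu|theta0);
            # the sum over sequences of length m is its m-th power.
            s = sum(p_single(mu, THETA0 + lam) ** 2 / p_single(mu, THETA0)
                    for mu in (+1, -1))
            best = max(best, lam**2 / (s**m - 1.0))
    return best

for m in (1, 5, 50):
    print(m, m * chrb_unbiased(m), 1.0 / F)  # m * ChRB versus m * CRLB = 1/F
```

For small $m$ the maximum is reached at a finite $\lambda$ and exceeds $1/F$; as $\lambda \to 0$ the ratio approaches the CRLB, in line with Equation (16).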

2.2.4. Extended Hammersley–Chapman–Robbins Bound

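We obtain the extended Hammersley–Chapman–Robbins bound (EChRB) as a special case of Equation (5), by taking $n = 3$, $a_1 = 1$, $a_2 = A$, $a_3 = -1$, $\theta_1 = \theta_0 + \lambda_1$, $\theta_2 = \theta_0 + \lambda_2$, and $\theta_3 = \theta_0$, giving
$$\Delta^2\theta_{\rm EChRB} = \sup_{\lambda_1, \lambda_2, A} \frac{\left(\langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0+\lambda_1} + A \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0+\lambda_2} - (1 + A) \langle\theta_{\rm est}\rangle_{\boldsymbol{\mu}|\theta_0}\right)^2}{\sum_{\boldsymbol{\mu}} \frac{\left[p(\boldsymbol{\mu}|\theta_0+\lambda_1) - p(\boldsymbol{\mu}|\theta_0) + A\, p(\boldsymbol{\mu}|\theta_0+\lambda_2)\right]^2}{p(\boldsymbol{\mu}|\theta_0)}}, \tag{17}$$
where the supremum is taken over all possible $\lambda_1, \lambda_2 \in \mathbb{N}$ and $A \in \mathbb{R}$. Since the ChRB is obtained from Equation (17) in the specific case $A = 0$, we have that $\Delta^2\theta_{\rm EChRB} \geq \Delta^2\theta_{\rm ChRB}$. For unbiased estimators, we obtain
$$\Delta^2\theta^{\rm ub}_{\rm EChRB} = \sup_{\lambda_1, \lambda_2, A} \frac{\left(\lambda_1 + A \lambda_2\right)^2}{\sum_{\boldsymbol{\mu}} \frac{\left[p(\boldsymbol{\mu}|\theta_0+\lambda_1) - p(\boldsymbol{\mu}|\theta_0) + A\, p(\boldsymbol{\mu}|\theta_0+\lambda_2)\right]^2}{p(\boldsymbol{\mu}|\theta_0)}}. \tag{18}$$
In Figure 2a, we compare the different bounds for unbiased estimators and for the example considered in the manuscript: the CRLB (black line), the ChRB (filled triangles) and the EChRB (empty triangles), satisfying the chain of inequalities (7). In Figure 2b, we show the values of $\lambda$ in Equation (15) for which the supremum is achieved in our case.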
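A coarse grid search gives a feel for how Equation (18) compares with Equation (15) in the parity model. The following sketch rests on our own assumptions: real-valued shifts $\lambda_1, \lambda_2$ chosen so that the shifted phases remain in $[0, \pi/2]$ (following the definition of the $\theta_i$ in Equation (5)), an amplitude $A$ scanned in $[-2, 2]$, and a maximum over a finite grid in place of the supremum. Setting $A = 0$ on the same grid recovers the ChRB estimate, so the EChRB estimate cannot be smaller.

```python
import math

THETA0, F = math.pi / 4, 4.0  # true phase and Fisher information for N = 2

def seq_prob(k, m, theta):
    """Probability of one specific sequence with k results +1 out of m."""
    p = (1.0 + math.cos(2.0 * theta)) / 2.0
    return p**k * (1.0 - p) ** (m - k)

def ratio(m, lam1, lam2, A):
    """Argument of the supremum in Equation (18) for unbiased estimators."""
    den = 0.0
    for k in range(m + 1):
        p0 = seq_prob(k, m, THETA0)
        d = seq_prob(k, m, THETA0 + lam1) - p0 + A * seq_prob(k, m, THETA0 + lam2)
        den += math.comb(m, k) * d * d / p0  # comb(m, k) sequences share this value
    return (lam1 + A * lam2) ** 2 / den if den > 0.0 else 0.0

def grid_sup(m, amplitudes, n_grid=20):
    """Maximum of the ratio over a grid of nonzero shifts and given amplitudes."""
    lams = [(math.pi / 4) * j / n_grid for j in range(-n_grid, n_grid + 1) if j != 0]
    return max(ratio(m, l1, l2, A) for l1 in lams for l2 in lams for A in amplitudes)

m = 3
echrb = grid_sup(m, [i / 5 for i in range(-10, 11)])  # A scanned in [-2, 2]
chrb = grid_sup(m, [0.0])                             # A = 0 reduces to the ChRB
print(m * chrb, m * echrb, 1.0 / F)
```

Finer grids or a proper optimizer would be needed for quantitative values such as those plotted in Figure 2a.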

3. Bayesian Approach

The Bayesian approach makes use of the Bayes–Laplace theorem, which can be very simply stated and proved. The joint probability of two stochastic variables $\boldsymbol{\mu}$ and $\theta$ is symmetric: $p(\boldsymbol{\mu}, \theta) = p(\boldsymbol{\mu}|\theta)\, p(\theta) = p(\theta|\boldsymbol{\mu})\, p(\boldsymbol{\mu}) = p(\theta, \boldsymbol{\mu})$, where $p(\theta)$ and $p(\boldsymbol{\mu})$ are the marginal distributions, obtained by integrating the joint probability over one of the two variables, while $p(\boldsymbol{\mu}|\theta)$ and $p(\theta|\boldsymbol{\mu})$ are conditional distributions.
We recall that in a phase inference problem, the set of measurement results $\boldsymbol{\mu}$ is generated by a fixed and unknown value $\theta_0$ according to the likelihood $p(\boldsymbol{\mu}|\theta_0)$. In the Bayesian approach to the estimation of $\theta_0$, one introduces a random variable $\theta$ and uses the Bayes–Laplace theorem to define the conditional probability
$$p_{\rm post}(\theta|\boldsymbol{\mu}) = \frac{p(\boldsymbol{\mu}|\theta)\, p_{\rm pri}(\theta)}{p_{\rm mar}(\boldsymbol{\mu})}. \tag{19}$$
The posterior probability $p_{\rm post}(\theta|\boldsymbol{\mu})$ provides a degree of belief, or plausibility, that $\theta_0 = \theta$ (i.e., that $\theta$ is the true value of the phase), in the light of the measurement data $\boldsymbol{\mu}$ [50]. In Equation (19), the prior distribution $p_{\rm pri}(\theta)$ expresses the a priori state of knowledge on $\theta$, $p(\boldsymbol{\mu}|\theta)$ is the likelihood that is determined by the quantum mechanical measurement postulate, e.g., as in Equation (1), and the marginal probability $p_{\rm mar}(\boldsymbol{\mu}) = \int_a^b d\theta\, p(\theta, \boldsymbol{\mu})$ is obtained through the normalization for the posterior, where $a$ and $b$ are boundaries of the phase domain. The posterior probability $p_{\rm post}(\theta|\boldsymbol{\mu})$ describes the current knowledge about the random variable $\theta$ based on the available information, i.e., the measurement results $\boldsymbol{\mu}$.
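Equation (19) can be evaluated numerically on a grid. The sketch below assumes the parity likelihood of Equation (1) with $N = 2$, a phase domain $[a, b] = [0, \pi/2]$, a midpoint integration rule and, purely for illustration, a flat prior and a hypothetical record with $m_+ = 7$ and $m_- = 3$.

```python
import math

def likelihood(m_plus, m_minus, theta, N=2):
    """Likelihood of Equation (1), written in terms of the counts m_+ and m_-."""
    c = math.cos(N * theta)
    return ((1.0 + c) / 2.0) ** m_plus * ((1.0 - c) / 2.0) ** m_minus

def posterior(m_plus, m_minus, prior, a=0.0, b=math.pi / 2, n=2000):
    """Posterior density on a midpoint grid over [a, b], following Equation (19)."""
    h = (b - a) / n
    thetas = [a + (i + 0.5) * h for i in range(n)]
    unnorm = [likelihood(m_plus, m_minus, t) * prior(t) for t in thetas]
    p_mar = sum(unnorm) * h  # marginal probability of the data
    return thetas, [u / p_mar for u in unnorm], h

flat_prior = lambda theta: 2.0 / math.pi  # illustrative choice on [0, pi/2]
thetas, post, h = posterior(7, 3, flat_prior)
i_max = max(range(len(post)), key=post.__getitem__)
print(sum(post) * h, thetas[i_max])  # normalization check and posterior maximum
```

With a flat prior the maximum of the posterior sits at the maximum of the likelihood, here close to $\frac{1}{2}\arccos(0.4)$.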

3.1. Noninformative Prior

In the Bayesian approach, the information on $\theta$ provided by the posterior probability always depends on the prior distribution $p_{\rm pri}(\theta)$. It is possible to account for the available a priori information on $\theta$ by choosing a prior distribution accordingly. However, if no a priori information is available, it is not obvious how to choose a “noninformative” prior [51]. The flat prior $p_{\rm pri}(\theta) = \mathrm{const}$ was first introduced by Laplace to express the absence of information on $\theta$ [51]. However, this prior would not be flat for other functions of $\theta$ and, in the complete absence of a priori information, it seems unreasonable that some information is available for different parametrizations of the problem. To see this, recall that a transformation of variables requires that $p_{\rm pri}(\varphi) = p_{\rm pri}(\theta)\, |df^{-1}(\varphi)/d\varphi|$ for any function $\varphi = f(\theta)$. Hence, if $p_{\rm pri}(\theta)$ is flat, one obtains that $p_{\rm pri}(\varphi) \propto |df^{-1}(\varphi)/d\varphi|$ is, in general, not flat.
Notice that $p_{\rm pri}(\theta) \propto \sqrt{F(\theta)}$, called Jeffreys prior [52,53], where $F(\theta)$ is the Fisher information (10), remains invariant under re-parametrization. For arbitrary transformations $\varphi = f(\theta)$, the Fisher information obeys the transformation property $F(\varphi) = F(\theta) (d\theta/d\varphi)^2 = F(\theta) (df^{-1}(\varphi)/d\varphi)^2$. Therefore, if $p_{\rm pri}(\theta) \propto \sqrt{F(\theta)}$ and we perform the change of variable $\varphi = f(\theta)$, then the transformation property of the Fisher information ensures that $p_{\rm pri}(\varphi) = p_{\rm pri}(\theta)\, |df^{-1}(\varphi)/d\varphi| \propto \sqrt{F(\varphi)}$. Notice that, as in our case, the Fisher information $F(\theta)$ may actually be independent of $\theta$. In this case, the invariance property does not imply that Jeffreys prior is flat for arbitrary re-parametrizations $\varphi = f(\theta)$; instead, $\sqrt{F(\varphi)} \propto |df^{-1}(\varphi)/d\varphi|$.
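As a worked illustration of this transformation rule, consider the re-parametrization $\varphi = f(\theta) = \cos N\theta$ on $\theta \in (0, \pi/N)$, where $f$ is invertible. This choice of $f$ is ours and is meant only as an example; it uses $F(\theta) = N^2$ from the model of Equation (1).

```latex
\begin{align}
\left|\frac{d\theta}{d\varphi}\right|
  &= \frac{1}{N\,|\sin N\theta|} = \frac{1}{N\sqrt{1-\varphi^2}}, \\
F(\varphi)
  &= F(\theta)\left(\frac{d\theta}{d\varphi}\right)^2
   = \frac{N^2}{N^2\,(1-\varphi^2)} = \frac{1}{1-\varphi^2}, \\
p_{\rm pri}(\varphi)
  &= p_{\rm pri}(\theta)\left|\frac{d\theta}{d\varphi}\right|
   \propto \sqrt{F(\theta)}\;\frac{1}{N\sqrt{1-\varphi^2}}
   = \frac{1}{\sqrt{1-\varphi^2}} = \sqrt{F(\varphi)} .
\end{align}
```

The Jeffreys prior is flat in $\theta$ but not in $\varphi$, while keeping the form $\sqrt{F}$ in both parametrizations.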

3.2. Posterior Bounds

From the posterior probability (19), we can provide an estimate $\theta_{\rm BL}(\boldsymbol{\mu})$ of $\theta_0$. This can be the maximum a posteriori, $\theta_{\rm BL}(\boldsymbol{\mu}) = \arg\max_{\theta}\, p_{\rm post}(\theta|\boldsymbol{\mu})$, which coincides with the maximum likelihood Equation (12) when the prior is flat, $p_{\rm pri}(\theta) = \mathrm{const}$, or the mean of the distribution, $\theta_{\rm BL}(\boldsymbol{\mu}) = \int_a^b d\theta\, \theta\, p_{\rm post}(\theta|\boldsymbol{\mu})$.
With the Bayesian approach, it is possible to provide a confidence interval around the estimator, given an arbitrary measurement sequence $\boldsymbol{\mu}$, even with a single measurement. The variance
$$\left(\Delta^2\theta_{\rm BL}(\boldsymbol{\mu})\right)_{\theta|\boldsymbol{\mu}} = \int_a^b d\theta\, p_{\rm post}(\theta|\boldsymbol{\mu}) \left(\theta - \theta_{\rm BL}(\boldsymbol{\mu})\right)^2 \tag{20}$$
can be taken as a measure of fluctuation of our degree of belief around $\theta_{\rm BL}(\boldsymbol{\mu})$. There is no such concept in the frequentist paradigm. The Bayesian posterior variance $(\Delta^2\theta_{\rm BL}(\boldsymbol{\mu}))_{\theta|\boldsymbol{\mu}}$ and the frequentist variance $(\Delta^2\theta_{\rm BL})_{\boldsymbol{\mu}|\theta_0}$ have entirely different operational meanings. Equation (20) provides a degree of plausibility that $\theta_{\rm BL}(\boldsymbol{\mu}) = \theta_0$, given the measurement results $\boldsymbol{\mu}$. There is no notion of bias in this case. On the other hand, the quantity $(\Delta^2\theta_{\rm BL})_{\boldsymbol{\mu}|\theta_0}$ measures the statistical fluctuations of $\theta_{\rm BL}(\boldsymbol{\mu})$ when repeating the sequence of $m$ measurements infinitely many times.

Ghosh Bound

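In the following, we derive a lower bound to Equation (20) first introduced by Ghosh [54]. Using $\int_a^b d\theta\, p_{\mathrm{post}}(\theta|\boldsymbol{\mu}) = 1$ and integrating by parts, we have
$$\int_a^b d\theta \left(\theta - \theta_{\mathrm{BL}}(\boldsymbol{\mu})\right) \frac{d p_{\mathrm{post}}(\theta|\boldsymbol{\mu})}{d\theta} = \Big[ p_{\mathrm{post}}(\theta|\boldsymbol{\mu}) \left(\theta - \theta_{\mathrm{BL}}(\boldsymbol{\mu})\right) \Big]_a^b - \int_a^b d\theta\, p_{\mathrm{post}}(\theta|\boldsymbol{\mu}) = f(\boldsymbol{\mu},a,b) - 1,$$
where $f(\boldsymbol{\mu},a,b) = b\, p_{\mathrm{post}}(b|\boldsymbol{\mu}) - a\, p_{\mathrm{post}}(a|\boldsymbol{\mu}) - \theta_{\mathrm{BL}}(\boldsymbol{\mu}) \left( p_{\mathrm{post}}(b|\boldsymbol{\mu}) - p_{\mathrm{post}}(a|\boldsymbol{\mu}) \right)$ depends on the value of the posterior distribution calculated at the boundaries. If $p_{\mathrm{pri}}(a) = p_{\mathrm{pri}}(b) = 0$, we have $f(\boldsymbol{\mu},a,b) = 0$. Analogously to the derivation of the (frequentist) CRLB, we exploit the Cauchy–Schwarz inequality,
$$\int_a^b d\theta \left( \frac{d p_{\mathrm{post}}(\theta|\boldsymbol{\mu})}{d\theta} \right)^2 \frac{1}{p_{\mathrm{post}}(\theta|\boldsymbol{\mu})} \int_a^b d\theta\, p_{\mathrm{post}}(\theta|\boldsymbol{\mu}) \left(\theta - \theta_{\mathrm{BL}}(\boldsymbol{\mu})\right)^2 \geq \left( f(\boldsymbol{\mu},a,b) - 1 \right)^2,$$
leading to $(\Delta^2\theta_{\mathrm{BL}}(\boldsymbol{\mu}))_{\theta|\boldsymbol{\mu}} \geq \Delta^2\theta_{\mathrm{GB}}(\boldsymbol{\mu})$, where [54]
$$\Delta^2\theta_{\mathrm{GB}}(\boldsymbol{\mu}) = \frac{\left( f(\boldsymbol{\mu},a,b) - 1 \right)^2}{\int_a^b d\theta\, \frac{1}{p_{\mathrm{post}}(\theta|\boldsymbol{\mu})} \left( \frac{d p_{\mathrm{post}}(\theta|\boldsymbol{\mu})}{d\theta} \right)^2}. \tag{22}$$
The above bound is a function of the specific measurement sequence $\boldsymbol{\mu}$ and depends on $\int_a^b d\theta\, \frac{1}{p_{\mathrm{post}}(\theta|\boldsymbol{\mu})} \left( \frac{d p_{\mathrm{post}}(\theta|\boldsymbol{\mu})}{d\theta} \right)^2$, which we can identify as a “Fisher information of the posterior distribution”. The Ghosh bound is saturated if and only if
$$\theta - \theta_{\mathrm{BL}}(\boldsymbol{\mu}) = \lambda_{\boldsymbol{\mu}} \frac{d \log p(\theta|\boldsymbol{\mu})}{d\theta}, \tag{23}$$
where $\lambda_{\boldsymbol{\mu}}$ does not depend on $\theta$ while it may depend on $\boldsymbol{\mu}$.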
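The relation between the posterior variance (20) and the Ghosh bound (22) can be made concrete for a single measurement sequence. The sketch below rests on assumptions that are not fixed in this section: a two-outcome likelihood $p(+|\theta) = \cos^2\theta$, $p(-|\theta) = \sin^2\theta$ on $[0,\pi/2]$, and a prior proportional to $\sin^2(2\theta)$, which vanishes at both boundaries so that $f(\boldsymbol{\mu},a,b) = 0$. The function name and the grid size are illustrative.

```python
import math

def posterior_summary(k, m, n=20000):
    """Posterior mean, posterior variance and Ghosh bound for k '+' outcomes
    out of m, under the assumed likelihood and prior described above."""
    d = (math.pi / 2.0) / n
    grid = [(i + 0.5) * d for i in range(n)]  # midpoints avoid the endpoints
    # Unnormalized posterior: prior times likelihood of the observed sequence.
    weights = [math.sin(2 * t) ** 2 * math.cos(t) ** (2 * k)
               * math.sin(t) ** (2 * (m - k)) for t in grid]
    norm = sum(weights) * d
    post = [w / norm for w in weights]
    mean = sum(t * p for t, p in zip(grid, post)) * d
    var = sum((t - mean) ** 2 * p for t, p in zip(grid, post)) * d

    def dlog(t):
        # Analytic derivative of the log-posterior with respect to theta.
        return (4.0 / math.tan(2 * t) - 2 * k * math.tan(t)
                + 2 * (m - k) / math.tan(t))

    # "Fisher information of the posterior": integral of p_post * (dlog)^2.
    info = sum(p * dlog(t) ** 2 for t, p in zip(grid, post)) * d
    return mean, var, 1.0 / info  # the Ghosh bound with f = 0 is 1 / info

mean, var, ghosh = posterior_summary(k=7, m=10)
print(mean, var, ghosh)
```

For this example the posterior is not Gaussian, so condition (23) is not met and the posterior variance is expected to lie strictly above the bound.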

3.3. Average Posterior Bounds

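While Equation (20) depends on the specific $\boldsymbol{\mu}$, it is natural to consider its average over all possible measurement sequences at fixed $\theta_0$ and $m$, weighted by the likelihood $p(\boldsymbol{\mu}|\theta_0)$:
$$\left(\Delta^2\theta_{\mathrm{BL}}\right)_{\boldsymbol{\mu},\theta|\theta_0} = \sum_{\boldsymbol{\mu}} \left(\Delta^2\theta_{\mathrm{BL}}(\boldsymbol{\mu})\right)_{\theta|\boldsymbol{\mu}}\, p(\boldsymbol{\mu}|\theta_0) = \sum_{\boldsymbol{\mu}} \int_a^b d\theta\, p(\theta,\boldsymbol{\mu}|\theta_0) \left(\theta - \theta_{\mathrm{BL}}(\boldsymbol{\mu})\right)^2, \tag{24}$$
which we indicate as average Bayesian posterior variance, where $p(\theta,\boldsymbol{\mu}|\theta_0) = p_{\mathrm{post}}(\theta|\boldsymbol{\mu})\, p(\boldsymbol{\mu}|\theta_0)$.
It is tempting to compare the average posterior sensitivity $(\Delta^2\theta_{\mathrm{BL}})_{\boldsymbol{\mu},\theta|\theta_0}$ to the frequentist Cramér–Rao bound $\Delta^2\theta_{\mathrm{CRLB}}$. However, because of the different operational meanings of the frequentist and the Bayesian paradigms, there is no reason for Equation (24) to fulfill the Cramér–Rao bound: indeed, it does not, as we show below.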

Likelihood-Averaged Ghosh Bound

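A lower bound to Equation (24) is obtained by averaging the Ghosh bound Equation (22) over the likelihood function. We have $(\Delta^2\theta_{\mathrm{BL}})_{\boldsymbol{\mu},\theta|\theta_0} \geq \Delta^2\theta_{\mathrm{aGB}}$, where [18]
$$\Delta^2\theta_{\mathrm{aGB}} = \sum_{\boldsymbol{\mu}} \frac{\left( f(\boldsymbol{\mu},a,b) - 1 \right)^2}{\int_a^b d\theta\, \frac{1}{p_{\mathrm{post}}(\theta|\boldsymbol{\mu})} \left( \frac{\partial p_{\mathrm{post}}(\theta|\boldsymbol{\mu})}{\partial\theta} \right)^2}\, p(\boldsymbol{\mu}|\theta_0). \tag{25}$$
This likelihood-averaged Ghosh bound is independent of $\boldsymbol{\mu}$ because of the statistical average.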

3.4. Numerical Comparison of Bayesian and Frequentist Phase Estimation

In the numerical calculations shown in Figure 3, we consider a Bayesian estimator given by $\theta_{\mathrm{BL}}(\boldsymbol{\mu}) = \int_a^b d\theta\, \theta\, p_{\mathrm{post}}(\theta|\boldsymbol{\mu})$ with prior distributions
$$p_{\mathrm{pri}}(\theta) = \frac{2}{\pi}\, \frac{e^{\alpha \sin(2\theta)^2} - 1}{e^{\alpha/2} I_0(\alpha/2) - 1}, \tag{26}$$
where $I_0(\alpha)$ is the modified Bessel function of the first kind. This choice of prior distribution can continuously turn from a peaked function to a flat one when changing $\alpha$, while being differentiable in the full phase interval. The more negative $\alpha$ is, the more $p_{\mathrm{pri}}(\theta)$ broadens in $[0,\pi/2]$. In particular, in the limit $\alpha \to -\infty$, the prior approaches the flat distribution, which in our case coincides with Jeffreys prior since the Fisher information is independent of $\theta$. In the limit $\alpha = 0$, the prior is given by $\lim_{\alpha\to 0} p_{\mathrm{pri}}(\theta) = 4\sin(2\theta)^2/\pi$. For positive values of $\alpha$, the larger $\alpha$, the more peaked $p_{\mathrm{pri}}(\theta)$ is around $\theta_0 = \pi/4$. In particular, $p_{\mathrm{pri}}(\theta) \approx e^{-4\alpha(\theta-\pi/4)^2}/\sqrt{\pi/(4\alpha)}$ for $\alpha \gg 1$. Equation (26) is normalized to one for $\theta \in [0,\pi/2]$. In the inset of the different panels of Figure 3, we plot $p_{\mathrm{pri}}(\theta)$ for $\alpha = -100$ [panel (a)], $\alpha = -10$ (b), $\alpha = 1$ (c) and $\alpha = 10$ (d).
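The normalization and the small-$\alpha$ limit of the prior (26) can be verified with a short numerical sketch. It uses only the Python standard library; the helper names (`bessel_i0`, `make_prior`, `integrate`) and the grid sizes are illustrative choices, and $I_0$ is evaluated from its integral representation $I_0(x) = \frac{1}{\pi}\int_0^{\pi} e^{x\cos t}\,dt$ rather than from a library routine.

```python
import math

def bessel_i0(x, n=2000):
    """Modified Bessel function I_0(x) via the midpoint rule applied to
    I_0(x) = (1/pi) * integral_0^pi exp(x cos t) dt."""
    d = math.pi / n
    return sum(math.exp(x * math.cos((i + 0.5) * d)) for i in range(n)) / n

def make_prior(alpha):
    """Return the prior of Equation (26) on [0, pi/2]; alpha must be nonzero."""
    den = math.exp(alpha / 2.0) * bessel_i0(alpha / 2.0) - 1.0

    def prior(theta):
        num = math.exp(alpha * math.sin(2.0 * theta) ** 2) - 1.0
        return (2.0 / math.pi) * num / den

    return prior

def integrate(f, a, b, n=2000):
    """Midpoint-rule integral of f over [a, b]."""
    d = (b - a) / n
    return sum(f(a + (i + 0.5) * d) for i in range(n)) * d

for alpha in (-100.0, -10.0, 1.0, 10.0):
    print(alpha, integrate(make_prior(alpha), 0.0, math.pi / 2.0))
```

Evaluating `make_prior(1e-3)` at $\theta = \pi/4$ gives a value close to $4/\pi$, in line with the stated $\alpha \to 0$ limit.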
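In Figure 3, we plot, as a function of $m$, the posterior variance $(\Delta^2\theta_{\mathrm{BL}})_{\boldsymbol{\mu},\theta|\theta_0}$ (blue circles) that, as expected, is always larger than the likelihood-averaged Ghosh bound Equation (25) (solid blue lines). For comparison, we also plot the frequentist variance $(\Delta^2\theta_{\mathrm{BL}})_{\boldsymbol{\mu}|\theta_0} = \sum_{\boldsymbol{\mu}} \left( \theta_{\mathrm{BL}}(\boldsymbol{\mu}) - \langle\theta_{\mathrm{BL}}\rangle_{\boldsymbol{\mu}|\theta_0} \right)^2 p(\boldsymbol{\mu}|\theta_0)$ (red dots) around the mean value $\langle\theta_{\mathrm{BL}}\rangle_{\boldsymbol{\mu}|\theta_0} = \sum_{\boldsymbol{\mu}} \theta_{\mathrm{BL}}(\boldsymbol{\mu})\, p(\boldsymbol{\mu}|\theta_0)$ of the estimator. This quantity obeys the Cramér–Rao theorem $(\Delta^2\theta_{\mathrm{BL}})_{\boldsymbol{\mu}|\theta_0} \geq \Delta^2\theta_{\mathrm{CRLB}}$ and the more general chain of inequalities (7). This is confirmed in the figure where we show $\Delta^2\theta_{\mathrm{CRLB}} = |d\langle\theta_{\mathrm{BL}}\rangle_{\boldsymbol{\mu}|\theta_0}/d\theta_0|^2 / (m F(\theta_0))$ (red line). Notice that, when the prior narrows around $\theta_0$, the variance $(\Delta^2\theta_{\mathrm{BL}})_{\boldsymbol{\mu}|\theta_0}$ decreases, but, at the same time, the estimator becomes more and more biased, i.e., $|d\langle\theta_{\mathrm{BL}}\rangle_{\boldsymbol{\mu}|\theta_0}/d\theta_0|$ decreases as well (note indeed that the red dashed line is proportional to $|d\langle\theta_{\mathrm{BL}}\rangle_{\boldsymbol{\mu}|\theta_0}/d\theta_0|^2$).
Interestingly, in Figure 3, we clearly see that the Bayesian posterior variance $(\Delta^2\theta_{\mathrm{BL}})_{\boldsymbol{\mu},\theta|\theta_0}$ and the likelihood-averaged Ghosh bound may stay in some cases below the (frequentist) $\Delta^2\theta_{\mathrm{CRLB}}$ [see panels (a) and (b)], even if the prior is almost flat. The discrepancy with the CRLB is remarkable and can be quite large for small values of $m$. Still, there is no contradiction since $(\Delta^2\theta_{\mathrm{BL}})_{\boldsymbol{\mu},\theta|\theta_0}$ and $(\Delta^2\theta_{\mathrm{BL}})_{\boldsymbol{\mu}|\theta_0}$ have different operational meanings and interpretations. They both respect their corresponding sensitivity bounds.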
Asymptotically in the number of measurements $m$, the Ghosh bound as well as its likelihood average converge to the Cramér–Rao bound. Indeed, it is well known that in this limit the posterior probability becomes a Gaussian centered at the true value of the phase shift and with variance given by the inverse of the Fisher information,
$$p_{\mathrm{post}}(\theta|\boldsymbol{\mu}) = \sqrt{\frac{m F(\theta_0)}{2\pi}}\; e^{-\frac{m F(\theta_0)}{2} (\theta - \theta_0)^2}, \qquad (m \gg 1), \tag{27}$$
a result known as the Laplace–Bernstein–von Mises theorem [18,23,55]. By inserting Equation (27) into Equation (22), we recover a posterior variance given by $1/(m F(\theta_0))$.
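The last step can be spelled out in a few lines. This is a sketch that assumes the Gaussian is narrow compared with the interval $[a,b]$, so that the boundary term $f(\boldsymbol{\mu},a,b)$ is negligible and the integrals can be extended to the whole real axis:

```latex
\frac{d \log p_{\mathrm{post}}(\theta|\boldsymbol{\mu})}{d\theta}
  = - m F(\theta_0)\,(\theta - \theta_0),
\qquad
\int d\theta\, \frac{1}{p_{\mathrm{post}}(\theta|\boldsymbol{\mu})}
  \left( \frac{d p_{\mathrm{post}}(\theta|\boldsymbol{\mu})}{d\theta} \right)^2
  = \left[ m F(\theta_0) \right]^2 \int d\theta\, p_{\mathrm{post}}(\theta|\boldsymbol{\mu})\,(\theta - \theta_0)^2
  = m F(\theta_0),
\qquad
\Delta^2\theta_{\mathrm{GB}} = \frac{1}{m F(\theta_0)} .
```

With $\theta_{\mathrm{BL}}(\boldsymbol{\mu}) = \theta_0$, the saturation condition (23) is fulfilled with $\lambda_{\boldsymbol{\mu}} = -1/(m F(\theta_0))$, so in this limit the Ghosh bound coincides with the posterior variance.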

4. Bounds for Random Parameters

In this section, we derive bounds of phase sensitivity obtained when $\theta_0$ is a random variable distributed according to $p(\theta_0)$. Operationally, this corresponds to the situation where $\theta_0$ remains fixed (but unknown) when collecting a single sequence of $m$ measurements $\boldsymbol{\mu}$. In between measurement sequences, $\theta_0$ fluctuates according to $p(\theta_0)$.

4.1. Frequentist Risk Functions for Random Parameters

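Let us first consider the frequentist estimation of a fluctuating parameter $\theta_0$ with the estimator $\theta_{\mathrm{est}}$. The mean sensitivity obtained by averaging $(\Delta^2\theta_{\mathrm{est}})_{\boldsymbol{\mu}|\theta_0}$, Equation (3), over $p(\theta_0)$ is
$$\left(\Delta^2\theta_{\mathrm{est}}\right)_{\boldsymbol{\mu},\theta_0} = \int_a^b d\theta_0 \left(\Delta^2\theta_{\mathrm{est}}\right)_{\boldsymbol{\mu}|\theta_0} p(\theta_0) = \sum_{\boldsymbol{\mu}} \int_a^b d\theta_0\, p(\boldsymbol{\mu}|\theta_0)\, p(\theta_0) \left( \langle\theta_{\mathrm{est}}\rangle_{\boldsymbol{\mu}|\theta_0} - \theta_{\mathrm{est}}(\boldsymbol{\mu}) \right)^2 = \sum_{\boldsymbol{\mu}} \int_a^b d\theta_0\, p(\boldsymbol{\mu},\theta_0) \left( \langle\theta_{\mathrm{est}}\rangle_{\boldsymbol{\mu}|\theta_0} - \theta_{\mathrm{est}}(\boldsymbol{\mu}) \right)^2, \tag{28}$$
where $\boldsymbol{\mu}$ and $\theta_0$ are both random variables and we have used $p(\boldsymbol{\mu}|\theta_0)\, p(\theta_0) = p(\boldsymbol{\mu},\theta_0)$.
An averaged risk function for the efficiency of the estimator is given by averaging the mean square error (3) over $p(\theta_0)$, leading to
$$\mathrm{MSE}(\theta_{\mathrm{est}})_{\boldsymbol{\mu},\theta_0} = \int d\theta_0\, \mathrm{MSE}(\theta_{\mathrm{est}})_{\boldsymbol{\mu}|\theta_0}\, p(\theta_0) = \int d\theta_0 \sum_{\boldsymbol{\mu}} \left( \theta_{\mathrm{est}}(\boldsymbol{\mu}) - \theta_0 \right)^2 p(\boldsymbol{\mu},\theta_0). \tag{29}$$
Analogously to Equation (4), we can write
$$\mathrm{MSE}(\theta_{\mathrm{est}})_{\boldsymbol{\mu},\theta_0} = \left(\Delta^2\theta_{\mathrm{est}}\right)_{\boldsymbol{\mu},\theta_0} + \int d\theta_0 \left( \langle\theta_{\mathrm{est}}\rangle_{\boldsymbol{\mu}|\theta_0} - \theta_0 \right)^2 p(\theta_0).$$
In the following, we derive lower bounds for both $(\Delta^2\theta_{\mathrm{est}})_{\boldsymbol{\mu},\theta_0}$ and $\mathrm{MSE}(\theta_{\mathrm{est}})_{\boldsymbol{\mu},\theta_0}$. Notice that bounds on $(\Delta^2\theta_{\mathrm{est}})_{\boldsymbol{\mu},\theta_0}$ hold also for $\mathrm{MSE}(\theta_{\mathrm{est}})_{\boldsymbol{\mu},\theta_0}$ due to $\mathrm{MSE}(\theta_{\mathrm{est}})_{\boldsymbol{\mu},\theta_0} \geq (\Delta^2\theta_{\mathrm{est}})_{\boldsymbol{\mu},\theta_0}$. Nevertheless, bounds on the average mean square error are widely used (and are often called Bayesian bounds [56]) since they can be expressed independently of the bias.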

4.2. Bounds on the Mean Square Error

We first consider bounds on $\mathrm{MSE}(\theta_{\mathrm{est}})_{\boldsymbol{\mu},\theta_0}$, Equation (29), for arbitrary estimators.

4.2.1. Van Trees Bound

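It is possible to derive a general lower bound on the mean square error (29) based on the following assumptions:
  • $\partial p(\boldsymbol{\mu},\theta_0)/\partial\theta_0$ and $\partial^2 p(\boldsymbol{\mu},\theta_0)/\partial\theta_0^2$ are absolutely integrable with respect to $\boldsymbol{\mu}$ and $\theta_0$;
  • $p(a)\,\xi(a) - p(b)\,\xi(b) = 0$, where $\xi(\theta_0) = \sum_{\boldsymbol{\mu}} \left( \theta_{\mathrm{est}}(\boldsymbol{\mu}) - \theta_0 \right) p(\boldsymbol{\mu}|\theta_0)$.
Multiplying $\xi(\theta_0)$ by $p(\theta_0)$ and differentiating with respect to $\theta_0$, we have
$$\frac{\partial \left[ p(\theta_0)\, \xi(\theta_0) \right]}{\partial\theta_0} = \sum_{\boldsymbol{\mu}} \left( \theta_{\mathrm{est}}(\boldsymbol{\mu}) - \theta_0 \right) \frac{\partial p(\boldsymbol{\mu},\theta_0)}{\partial\theta_0} - p(\theta_0).$$
Integrating over $\theta_0$ in the range $[a,b]$ and considering the above properties, we find
$$\sum_{\boldsymbol{\mu}} \int_a^b d\theta_0 \left( \theta_{\mathrm{est}}(\boldsymbol{\mu}) - \theta_0 \right) \frac{\partial p(\boldsymbol{\mu},\theta_0)}{\partial\theta_0} = 1.$$
Finally, using the Cauchy–Schwarz inequality, we arrive at $\mathrm{MSE}(\theta_{\mathrm{est}})_{\boldsymbol{\mu},\theta_0} \geq \Delta^2\theta_{\mathrm{VTB}}$, where
$$\Delta^2\theta_{\mathrm{VTB}} = \frac{1}{\sum_{\boldsymbol{\mu}} \int_a^b d\theta_0\, \frac{1}{p(\boldsymbol{\mu},\theta_0)} \left( \frac{\partial p(\boldsymbol{\mu},\theta_0)}{\partial\theta_0} \right)^2}$$
is generally indicated as Van Trees bound [24,56,57]. The equality holds if and only if
$$\theta_{\mathrm{est}}(\boldsymbol{\mu}) - \theta_0 = \lambda\, \frac{d \log p(\boldsymbol{\mu},\theta_0)}{d\theta_0},$$
where $\lambda$ does not depend on $\theta_0$ and $\boldsymbol{\mu}$. It is easy to show that
$$\sum_{\boldsymbol{\mu}} \int_a^b d\theta_0\, \frac{1}{p(\boldsymbol{\mu},\theta_0)} \left( \frac{\partial p(\boldsymbol{\mu},\theta_0)}{\partial\theta_0} \right)^2 = m \int_a^b d\theta_0\, p(\theta_0)\, F(\theta_0) + \int_a^b d\theta_0\, \frac{1}{p(\theta_0)} \left( \frac{\partial p(\theta_0)}{\partial\theta_0} \right)^2, \tag{34}$$
where the first term is the Fisher information $F(\theta_0)$, defined by Equation (10), averaged over $p(\theta_0)$, and the second term can be interpreted as a Fisher information of the prior [24]. Asymptotically in the number of measurements $m$ and for regular distributions $p(\theta_0)$, the first term in Equation (34) dominates over the second one.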
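The decomposition (34) can be checked numerically in a simple setting. The sketch below assumes, for illustration only, $m$ independent two-outcome measurements with $p(+|\theta_0) = \cos^2\theta_0$ and $p(-|\theta_0) = \sin^2\theta_0$ (so that $F(\theta_0) = 4$), and the distribution $p(\theta_0) = 4\sin^2(2\theta_0)/\pi$ on $[0,\pi/2]$, which vanishes at the boundaries. Under these assumptions the right-hand side of Equation (34) evaluates to $4m + 16$. The function name is hypothetical.

```python
import math

def joint_information(m, n=20000):
    """Left-hand side of Equation (34) for the assumed model: sum over
    sequences and integral over theta0 of p(mu,theta0) * (d log p / d theta0)^2.
    Sequences with the same number k of '+' outcomes are grouped together."""
    d = (math.pi / 2.0) / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * d  # midpoint grid avoids the endpoints
        prior = 4.0 * math.sin(2.0 * t) ** 2 / math.pi
        c2, s2 = math.cos(t) ** 2, math.sin(t) ** 2
        for k in range(m + 1):
            # Joint probability of theta0 and of all sequences with k '+' outcomes.
            p_joint = math.comb(m, k) * prior * c2 ** k * s2 ** (m - k)
            # Analytic derivative of log p(mu, theta0) with respect to theta0.
            dlog = (4.0 / math.tan(2.0 * t) - 2 * k * math.tan(t)
                    + 2 * (m - k) / math.tan(t))
            total += p_joint * dlog ** 2 * d
    return total

m = 5
info = joint_information(m)
van_trees = 1.0 / info
print(info, 4 * m + 16, van_trees)
```

In this example the prior contributes as much information as four measurements, which illustrates why the second term of Equation (34) matters for small $m$ and becomes negligible for large $m$.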

4.2.2. Ziv–Zakai Bound

A further bound on $\mathrm{MSE}(\theta_{\mathrm{est}})_{\boldsymbol{\mu},\theta_0}$ can be derived by mapping the phase estimation problem to a continuous series of binary hypothesis testing problems. A detailed derivation of the Ziv–Zakai bound [24,58,59] is provided in Appendix B. The final result reads $\mathrm{MSE}(\theta_{\mathrm{est}})_{\boldsymbol{\mu},\theta_0} \geq \Delta^2\theta_{\mathrm{ZZB}}$, where
$$\Delta^2\theta_{\mathrm{ZZB}} = \frac{1}{2} \int dh\, h \int d\theta_0 \left[ p(\theta_0) + p(\theta_0 + h) \right] P_{\min}(\theta_0, \theta_0 + h), \tag{35}$$
and
$$P_{\min}(\theta_0, \theta_0 + h) = \frac{1}{2} \left( 1 - \sum_{\boldsymbol{\mu}} \left| \frac{p(\theta_0)\, p(\boldsymbol{\mu}|\theta_0)}{p(\theta_0) + p(\theta_0 + h)} - \frac{p(\theta_0 + h)\, p(\boldsymbol{\mu}|\theta_0 + h)}{p(\theta_0) + p(\theta_0 + h)} \right| \right)$$
is the minimum error probability of the binary hypothesis testing problem. This bound has been adopted for quantum phase estimation in Ref. [26]. To this end, the probability $P_{\min}(\theta_0, \theta_0 + h)$ can be maximized over all possible quantum measurements, which leads to the trace distance [7]. As the optimal measurement may depend on $\theta_0$ and $h$, the bound (35), which involves integration over all values of $\theta_0$ and $h$, is usually not saturable. We remark that the trace distance also defines a saturable frequentist bound for a different risk function than the variance [60].

4.3. Bounds on the Average Estimator Variance

We now consider bounds on $(\Delta^2\theta_{\mathrm{est}})_{\boldsymbol{\mu},\theta_0}$, Equation (28), for arbitrary estimators.

4.3.1. Average CRLB

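Taking the average over $p(\theta_0)$ of Equation (7), we obtain a chain of bounds for $(\Delta^2\theta_{\mathrm{est}})_{\boldsymbol{\mu},\theta_0}$. In particular, in its simplest form, we have $(\Delta^2\theta_{\mathrm{est}})_{\boldsymbol{\mu},\theta_0} \geq \Delta^2\theta_{\mathrm{aCRLB}}$, where
$$\Delta^2\theta_{\mathrm{aCRLB}} = \int_a^b d\theta_0\, \frac{\left( \frac{d\langle\theta_{\mathrm{est}}\rangle_{\boldsymbol{\mu}|\theta_0}}{d\theta_0} \right)^2}{m F(\theta_0)}\, p(\theta_0) \tag{37}$$
is the average CRLB.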

4.3.2. Van Trees Bound for the Average Estimator Variance

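We can derive a general lower bound for the variance (28) by following the derivation of the Van Trees bound, which was discussed in Section 4.2.1. In contrast to the standard Van Trees bound for the mean square error, here the bias enters explicitly. Defining $\xi(\theta_0) = \sum_{\boldsymbol{\mu}} \left( \theta_{\mathrm{est}}(\boldsymbol{\mu}) - \langle\theta_{\mathrm{est}}\rangle_{\boldsymbol{\mu}|\theta_0} \right) p(\boldsymbol{\mu}|\theta_0)$ and assuming the same requirements as in the derivation of the Van Trees bound for the MSE, we arrive at
$$\sum_{\boldsymbol{\mu}} \int_a^b d\theta_0 \left( \theta_{\mathrm{est}}(\boldsymbol{\mu}) - \langle\theta_{\mathrm{est}}\rangle_{\boldsymbol{\mu}|\theta_0} \right) \frac{\partial p(\boldsymbol{\mu},\theta_0)}{\partial\theta_0} = \int_a^b d\theta_0\, \frac{d\langle\theta_{\mathrm{est}}\rangle_{\boldsymbol{\mu}|\theta_0}}{d\theta_0}\, p(\theta_0).$$
Finally, a Cauchy–Schwarz inequality gives $(\Delta^2\theta_{\mathrm{est}})_{\boldsymbol{\mu},\theta_0} \geq \Delta^2\theta_{\mathrm{fVTB}}$, where
$$\Delta^2\theta_{\mathrm{fVTB}} = \frac{\left( \int_a^b d\theta_0\, \frac{d\langle\theta_{\mathrm{est}}\rangle_{\boldsymbol{\mu}|\theta_0}}{d\theta_0}\, p(\theta_0) \right)^2}{\sum_{\boldsymbol{\mu}} \int_a^b d\theta_0\, \frac{1}{p(\boldsymbol{\mu},\theta_0)} \left( \frac{\partial p(\boldsymbol{\mu},\theta_0)}{\partial\theta_0} \right)^2}, \tag{38}$$
with equality if and only if
$$\theta_{\mathrm{est}}(\boldsymbol{\mu}) - \langle\theta_{\mathrm{est}}\rangle_{\boldsymbol{\mu}|\theta_0} = \lambda\, \frac{d \log p(\boldsymbol{\mu},\theta_0)}{d\theta_0},$$
where $\lambda$ is independent of $\theta_0$ and $\boldsymbol{\mu}$.
We can compare Equation (38) with the average CRLB Equation (37). We find
$$\int_a^b d\theta_0\, \frac{\left( \frac{d\langle\theta_{\mathrm{est}}\rangle_{\boldsymbol{\mu}|\theta_0}}{d\theta_0} \right)^2}{m F(\theta_0)}\, p(\theta_0) \geq \frac{\left( \int_a^b d\theta_0\, \frac{d\langle\theta_{\mathrm{est}}\rangle_{\boldsymbol{\mu}|\theta_0}}{d\theta_0}\, p(\theta_0) \right)^2}{m \int_a^b d\theta_0\, p(\theta_0)\, F(\theta_0)} \geq \frac{\left( \int_a^b d\theta_0\, \frac{d\langle\theta_{\mathrm{est}}\rangle_{\boldsymbol{\mu}|\theta_0}}{d\theta_0}\, p(\theta_0) \right)^2}{\sum_{\boldsymbol{\mu}} \int_a^b d\theta_0\, \frac{1}{p(\boldsymbol{\mu},\theta_0)} \left( \frac{\partial p(\boldsymbol{\mu},\theta_0)}{\partial\theta_0} \right)^2},$$
where in the first step we use Jensen’s inequality, and the second step follows from Equation (34), which implies $m \int_a^b d\theta_0\, p(\theta_0)\, F(\theta_0) \leq \sum_{\boldsymbol{\mu}} \int_a^b d\theta_0\, \frac{1}{p(\boldsymbol{\mu},\theta_0)} \left( \frac{\partial p(\boldsymbol{\mu},\theta_0)}{\partial\theta_0} \right)^2$ since $\int_a^b d\theta_0\, \frac{1}{p(\theta_0)} \left( \frac{d p(\theta_0)}{d\theta_0} \right)^2 \geq 0$.
We thus arrive at
$$\left(\Delta^2\theta_{\mathrm{est}}\right)_{\boldsymbol{\mu},\theta_0} \geq \Delta^2\theta_{\mathrm{aCRLB}} \geq \Delta^2\theta_{\mathrm{fVTB}},$$
which is valid for generic estimators.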

4.4. Bayesian Framework for Random Parameters

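The Bayesian posterior variance, $(\Delta^2\theta_{\mathrm{BL}})_{\boldsymbol{\mu},\theta|\theta_0}$, Equation (24), averaged over $p(\theta_0)$ is
$$\left(\Delta^2\theta_{\mathrm{BL}}\right)_{\boldsymbol{\mu},\theta,\theta_0} = \int_a^b d\theta_0 \left(\Delta^2\theta_{\mathrm{BL}}\right)_{\boldsymbol{\mu},\theta|\theta_0} p(\theta_0) = \sum_{\boldsymbol{\mu}} \int_a^b d\theta \int_a^b d\theta_0\, p_{\mathrm{post}}(\theta|\boldsymbol{\mu})\, p(\boldsymbol{\mu}|\theta_0)\, p(\theta_0) \left( \theta - \theta_{\mathrm{BL}}(\boldsymbol{\mu}) \right)^2 = \sum_{\boldsymbol{\mu}} \int_a^b d\theta\, p_{\mathrm{post}}(\theta|\boldsymbol{\mu})\, p(\boldsymbol{\mu}) \left( \theta - \theta_{\mathrm{BL}}(\boldsymbol{\mu}) \right)^2, \tag{41}$$
where $p(\boldsymbol{\mu}) = \int_a^b d\theta_0\, p(\boldsymbol{\mu}|\theta_0)\, p(\theta_0)$ is the average probability to observe $\boldsymbol{\mu}$ taking into account fluctuations of $\theta_0$.
A bound on Equation (41) can be obtained by averaging Equation (25) over $p(\theta_0)$, or, equivalently, averaging the Ghosh bound, Equation (22), over $p(\boldsymbol{\mu})$. We obtain the average Ghosh bound for random parameters $\theta_0$, $(\Delta^2\theta_{\mathrm{BL}})_{\boldsymbol{\mu},\theta,\theta_0} \geq \Delta^2\theta_{\mathrm{aGBr}}$, where
$$\Delta^2\theta_{\mathrm{aGBr}} = \int_a^b d\theta_0 \sum_{\boldsymbol{\mu}} \frac{\left( f(\boldsymbol{\mu},a,b) - 1 \right)^2}{\int_a^b d\theta\, \frac{1}{p_{\mathrm{post}}(\theta|\boldsymbol{\mu})} \left( \frac{d p_{\mathrm{post}}(\theta|\boldsymbol{\mu})}{d\theta} \right)^2}\, p(\boldsymbol{\mu}|\theta_0)\, p(\theta_0) = \sum_{\boldsymbol{\mu}} \frac{\left( f(\boldsymbol{\mu},a,b) - 1 \right)^2}{\int_a^b d\theta\, \frac{1}{p_{\mathrm{post}}(\theta|\boldsymbol{\mu})} \left( \frac{d p_{\mathrm{post}}(\theta|\boldsymbol{\mu})}{d\theta} \right)^2}\, p(\boldsymbol{\mu}). \tag{42}$$
The bound holds for any prior $p_{\mathrm{pri}}(\theta)$ and is saturated if and only if, for every value of $\boldsymbol{\mu}$, there exists a $\lambda_{\boldsymbol{\mu}}$ such that Equation (23) holds.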

Bayesian Bounds

In Equation (41), the prior used to define the posterior $p_{\mathrm{post}}(\theta|\boldsymbol{\mu})$ via the Bayes–Laplace theorem is arbitrary. In general, such a prior $p_{\mathrm{pri}}(\theta)$ is different from the statistical distribution of $\theta_0$, which can be unknown. If $p(\theta_0)$ is known, then one can use it as a prior in the Bayesian posterior probability, i.e., $p_{\mathrm{pri}}(\theta) = p(\theta_0)$. In this specific case, we have $p_{\mathrm{mar}}(\boldsymbol{\mu}) = p(\boldsymbol{\mu})$, and thus $p_{\mathrm{post}}(\theta|\boldsymbol{\mu})\, p(\boldsymbol{\mu}) = p_{\mathrm{post}}(\theta|\boldsymbol{\mu})\, p_{\mathrm{mar}}(\boldsymbol{\mu}) = p(\boldsymbol{\mu},\theta)$. In other words, for this specific choice of prior, the physical joint probability $p(\boldsymbol{\mu},\theta_0)$ of random variables $\theta_0$ and $\boldsymbol{\mu}$ coincides with the Bayesian $p(\boldsymbol{\mu},\theta)$. Equation (41) thus simplifies to
$$\left(\Delta^2\theta_{\mathrm{BL}}\right)_{\boldsymbol{\mu},\theta} = \sum_{\boldsymbol{\mu}} \int_a^b d\theta\, p(\boldsymbol{\mu},\theta) \left( \theta - \theta_{\mathrm{BL}}(\boldsymbol{\mu}) \right)^2. \tag{43}$$
Notice that this expression is mathematically equivalent to the frequentist average mean square error (29) if we replace $\theta$ with $\theta_0$ and $\theta_{\mathrm{BL}}(\boldsymbol{\mu})$ with $\theta_{\mathrm{est}}(\boldsymbol{\mu})$. This means that precision bounds for Equation (29), e.g., the Van Trees and Ziv–Zakai bounds, can also be applied to Equation (43). These bounds are indeed often referred to as “Bayesian bounds” (see Ref. [24]).
We emphasize that the average over the marginal distribution $p_{\mathrm{mar}}(\boldsymbol{\mu})$, which connects Equations (24) and (43), has operational meaning if we consider that $\theta_0$ is a random variable distributed according to $p(\theta_0)$, and $p(\theta)$ is used as prior in the Bayes–Laplace theorem to define a posterior distribution. In this case, and under the condition $f(\boldsymbol{\mu},a,b) = 0$ (for instance if the prior distribution vanishes at the borders of the phase domain), using Jensen’s inequality, we find
$$\Delta^2\theta_{\mathrm{aGBr}} = \sum_{\boldsymbol{\mu}} \frac{p(\boldsymbol{\mu})}{\int_a^b d\theta\, \frac{1}{p_{\mathrm{post}}(\theta|\boldsymbol{\mu})} \left( \frac{d p_{\mathrm{post}}(\theta|\boldsymbol{\mu})}{d\theta} \right)^2} \geq \frac{1}{\sum_{\boldsymbol{\mu}} p(\boldsymbol{\mu}) \int_a^b d\theta\, \frac{1}{p_{\mathrm{post}}(\theta|\boldsymbol{\mu})} \left( \frac{d p_{\mathrm{post}}(\theta|\boldsymbol{\mu})}{d\theta} \right)^2} = \frac{1}{\sum_{\boldsymbol{\mu}} \int_a^b d\theta\, \frac{1}{p(\theta,\boldsymbol{\mu})} \left( \frac{\partial p(\theta,\boldsymbol{\mu})}{\partial\theta} \right)^2},$$
which coincides with the Van Trees bound discussed above. We thus find that the averaged Ghosh bound for random parameters (42) is sharper than the Van Trees bound (38):
$$\left(\Delta^2\theta_{\mathrm{BL}}\right)_{\boldsymbol{\mu},\theta} \geq \Delta^2\theta_{\mathrm{aGBr}} \geq \Delta^2\theta_{\mathrm{VTB}},$$
which is also confirmed by the numerical data shown in Figure 4.
In Figure 4, we compare $(\Delta^2\theta_{\mathrm{BL}})_{\boldsymbol{\mu},\theta}$ with the various bounds discussed in this section. As $p(\theta_0)$, we consider the same prior (26) used in Figure 3. We observe that all bounds approach the Van Trees bound with increasing sharpness of the prior distribution. Asymptotically in the number of measurements $m$, all bounds converge to the Cramér–Rao bound.

5. Discussion and Conclusions

In this manuscript, we have clarified the differences between frequentist and Bayesian approaches to phase estimation. The two paradigms provide statistical results that have a different conceptual meaning and cannot be compared. We have also reviewed and discussed phase sensitivity bounds in the frequentist and Bayesian frameworks, when the true value of the phase shift θ 0 is fixed or fluctuates. These bounds are summarized in Table 1.
In the frequentist approach, for a fixed $\theta_0$, the phase sensitivity is determined from the width of the probability distribution of the estimator. The physical content of the distribution is that, when repeating the estimation protocol, the obtained $\theta_{\mathrm{est}}(\boldsymbol{\mu})$ will fall, with a certain confidence, in an interval around the mean value $\langle\theta_{\mathrm{est}}\rangle_{\boldsymbol{\mu}|\theta_0}$ (e.g., 68% of the time within an interval of width $2(\Delta\theta_{\mathrm{est}})_{\boldsymbol{\mu}|\theta_0}$ for a Gaussian distribution) that, for unbiased estimators, coincides with the true value of the phase shift.
In the Bayesian case, the posterior p post ( θ | μ ) provides a degree of plausibility that the phase shift θ equals the interferometer phase θ 0 when the data μ was obtained. This allows the Bayesian approach to provide statistical information for any number of measurements, even a single one. To be sure, this is not a sign of failure or superiority of one approach with respect to the other one, since the two frameworks manipulate conceptually different quantities. The experimentalist can choose to use one or both approaches, keeping in mind the necessity to clearly state the nature of the statistical significance of the reported results.
The two predictions converge asymptotically in the limit of a large number of measurements. This does not mean that in this limit the significance of the two approaches is interchangeable (it cannot be stated that, in the limit of many repetitions of the measurement, the frequentist and Bayesian approaches provide the same results). In this respect, it is quite instructive to notice that the Bayesian $2\sigma$ confidence may be below that of the Cramér–Rao bound, as shown in Figure 3. This, at first sight, seems paradoxical, since the CRLB is a theorem about the minimum error achievable in parameter estimation theory. However, the CRLB is a frequentist bound and, again, the paradox is resolved by taking into account that the frequentist and the Bayesian approaches provide information about different quantities.
Finally, a different class of estimation problems with different precision bounds is encountered if θ 0 is itself a random variable. In this case, the frequentist bounds for the mean-square error (Van Trees, Ziv–Zakai) become independent of the bias, while those on the estimator variance are still functions of the bias. The Van Trees and Ziv–Zakai bounds can be applied to the Bayesian paradigm if the average of the posterior variance over the marginal distribution is the relevant risk function. This is only meaningful if the prior p pri ( θ ) that enters the Bayes–Laplace theorem coincides with the actual distribution p ( θ 0 ) of the phase shift θ 0 .
We conclude with a remark regarding the so-called Heisenberg limit, which is a saturable lower bound on the CRLB over arbitrary quantum states with a fixed number of particles. For instance, for a collection of $N$ two-level systems, the CRLB can be further bounded by $\Delta\theta_{\mathrm{est}} \geq 1/\sqrt{m F(\theta_0)} \geq 1/(\sqrt{m}\, N)$ [18,20]. This bound is often called the ultimate precision bound since no quantum state is able to achieve a tighter scaling than $N$. From the discussions presented in this article, it becomes apparent that Bayesian approaches (as discussed in Section 3) or precision bounds for random parameters (Section 4) are expected to lead to entirely different types of ‘ultimate’ lower bounds. Such bounds are interesting within the respective paradigm for which they are derived, but they cannot replace or improve the Heisenberg limit since they address fundamentally different scenarios that cannot be compared in general.

Author Contributions

Y.L., L.P., M.G., W.L. and A.S. conceived the study, performed theoretical calculations and drafted the article. All authors have read and approved the final manuscript.

Acknowledgments

This work was supported by the National Key R & D Program of China (No. 2017YFA0304500 and No. 2017YFA0304203), the National Natural Science Foundation of China (Grant No. 11874247), the 111 plan of China (No. D18001), the Hundred Talent Program of the Shanxi Province (2018), the Program of State Key Laboratory of Quantum Optics and Quantum Optics Devices (No. KF201703), and the QuantEra project Q-Clocks. M.G. acknowledges support by the Alexander von Humboldt Foundation.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Derivation of the Barankin Bound

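Let $\theta_{\mathrm{est}}$ be an arbitrary estimator for $\theta$. Its mean value
$$\langle\theta_{\mathrm{est}}\rangle_{\boldsymbol{\mu}|\theta} = \sum_{\boldsymbol{\mu}} \theta_{\mathrm{est}}(\boldsymbol{\mu})\, p(\boldsymbol{\mu}|\theta) \tag{A1}$$
coincides with $\theta$ if and only if the estimator is unbiased (for arbitrary values of $\theta$). In the following, we make no assumption about the bias of $\theta_{\mathrm{est}}$ and therefore do not replace $\langle\theta_{\mathrm{est}}\rangle_{\boldsymbol{\mu}|\theta}$ by $\theta$.
Introducing the likelihood ratio
$$L(\boldsymbol{\mu}|\theta_i,\theta_0) = \frac{p(\boldsymbol{\mu}|\theta_i)}{p(\boldsymbol{\mu}|\theta_0)} \tag{A2}$$
under the condition $p(\boldsymbol{\mu}|\theta_0) > 0$ for all $\boldsymbol{\mu}$, we obtain with Equation (A1) that
$$\sum_{\boldsymbol{\mu}} \theta_{\mathrm{est}}(\boldsymbol{\mu})\, L(\boldsymbol{\mu}|\theta_i,\theta_0)\, p(\boldsymbol{\mu}|\theta_0) = \langle\theta_{\mathrm{est}}\rangle_{\boldsymbol{\mu}|\theta_i}, \tag{A3}$$
for an arbitrary family of phase values $\theta_1, \ldots, \theta_n$ picked from the parameter domain. Furthermore, we have
$$\sum_{\boldsymbol{\mu}} L(\boldsymbol{\mu}|\theta_i,\theta_0)\, p(\boldsymbol{\mu}|\theta_0) = \sum_{\boldsymbol{\mu}} p(\boldsymbol{\mu}|\theta_i) = 1 \tag{A4}$$
for all $\theta_i$. Multiplying both sides of Equation (A4) with $\langle\theta_{\mathrm{est}}\rangle_{\boldsymbol{\mu}|\theta_0}$ and subtracting it from (A3) yields
$$\sum_{\boldsymbol{\mu}} \left( \theta_{\mathrm{est}}(\boldsymbol{\mu}) - \langle\theta_{\mathrm{est}}\rangle_{\boldsymbol{\mu}|\theta_0} \right) L(\boldsymbol{\mu}|\theta_i,\theta_0)\, p(\boldsymbol{\mu}|\theta_0) = \langle\theta_{\mathrm{est}}\rangle_{\boldsymbol{\mu}|\theta_i} - \langle\theta_{\mathrm{est}}\rangle_{\boldsymbol{\mu}|\theta_0}. \tag{A5}$$
Let us now pick a family of $n$ finite coefficients $a_1, \ldots, a_n$. From Equation (A5), we obtain
$$\sum_{\boldsymbol{\mu}} \left( \theta_{\mathrm{est}}(\boldsymbol{\mu}) - \langle\theta_{\mathrm{est}}\rangle_{\boldsymbol{\mu}|\theta_0} \right) \left( \sum_{i=1}^n a_i\, L(\boldsymbol{\mu}|\theta_i,\theta_0) \right) p(\boldsymbol{\mu}|\theta_0) = \sum_{i=1}^n a_i \left( \langle\theta_{\mathrm{est}}\rangle_{\boldsymbol{\mu}|\theta_i} - \langle\theta_{\mathrm{est}}\rangle_{\boldsymbol{\mu}|\theta_0} \right).$$
The Cauchy–Schwarz inequality now yields
$$\left( \sum_{i=1}^n a_i \left( \langle\theta_{\mathrm{est}}\rangle_{\boldsymbol{\mu}|\theta_i} - \langle\theta_{\mathrm{est}}\rangle_{\boldsymbol{\mu}|\theta_0} \right) \right)^2 \leq \left(\Delta^2\theta_{\mathrm{est}}\right)_{\boldsymbol{\mu}|\theta_0} \sum_{\boldsymbol{\mu}} \left( \sum_{i=1}^n a_i\, L(\boldsymbol{\mu}|\theta_i,\theta_0) \right)^2 p(\boldsymbol{\mu}|\theta_0),$$
where
$$\left(\Delta^2\theta_{\mathrm{est}}\right)_{\boldsymbol{\mu}|\theta_0} = \sum_{\boldsymbol{\mu}} \left( \theta_{\mathrm{est}}(\boldsymbol{\mu}) - \langle\theta_{\mathrm{est}}\rangle_{\boldsymbol{\mu}|\theta_0} \right)^2 p(\boldsymbol{\mu}|\theta_0)$$
is the variance of the estimator $\theta_{\mathrm{est}}$. We thus obtain
$$\left(\Delta^2\theta_{\mathrm{est}}\right)_{\boldsymbol{\mu}|\theta_0} \geq \frac{\left( \sum_{i=1}^n a_i \left( \langle\theta_{\mathrm{est}}\rangle_{\boldsymbol{\mu}|\theta_i} - \langle\theta_{\mathrm{est}}\rangle_{\boldsymbol{\mu}|\theta_0} \right) \right)^2}{\sum_{\boldsymbol{\mu}} \left( \sum_{i=1}^n a_i\, L(\boldsymbol{\mu}|\theta_i,\theta_0) \right)^2 p(\boldsymbol{\mu}|\theta_0)},$$
for all $n$, $a_i$, and $\theta_i$. The Barankin bound then follows by taking the supremum over these variables.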

Appendix B. Derivation of the Ziv–Zakai Bound

Derivations of the Ziv–Zakai bound can be found in the literature (see, for instance, Refs. [24,58,59]). This Appendix follows these derivations closely and provides additional background, which may be useful for readers less familiar with the field of hypothesis testing.
Let $X \in [0,a]$ be a random variable with probability density $p(x)$. We can formally write $p(x) = -dP(X \geq x)/dx$, where $P(X \geq x) \equiv \int_x^a p(y)\, dy$ is the probability that $X$ is larger than or equal to $x$. We obtain from integration by parts
$$\langle X^2 \rangle = \int_0^a x^2 p(x)\, dx = -\Big[ x^2 P(X \geq x) \Big]_0^a + 2 \int_0^a P(X \geq x)\, x\, dx = 2 \int_0^a P(X \geq x)\, x\, dx = \frac{1}{2} \int_0^{2a} P\!\left( X \geq \frac{h}{2} \right) h\, dh,$$
where we assume that $a$ is finite [if $a \to \infty$ the above relation holds when $\lim_{a\to\infty} a^2 P(X \geq a) = 0$]. Finally, we can formally extend the above integral up to $\infty$ since $P(X \geq x) = 0$ for $x > a$:
$$\langle X^2 \rangle = \frac{1}{2} \int_0^{\infty} P\!\left( X \geq \frac{h}{2} \right) h\, dh.$$
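This tail-probability representation of the second moment is easy to test on a case with a known answer. The sketch below assumes, purely for illustration, that $X$ is uniformly distributed on $[0,a]$, so that $P(X \geq x) = 1 - x/a$ and $\langle X^2 \rangle = a^2/3$. The function name and the grid size are arbitrary.

```python
def second_moment_from_tail(tail, a, n=20000):
    """Evaluate (1/2) * integral_0^{2a} P(X >= h/2) * h dh with the midpoint rule.
    `tail(x)` must return P(X >= x)."""
    d = 2.0 * a / n
    total = 0.0
    for i in range(n):
        h = (i + 0.5) * d
        total += tail(h / 2.0) * h * d
    return 0.5 * total

a = 2.0

def uniform_tail(x):
    # P(X >= x) for X uniform on [0, a] (illustrative assumption)
    return max(0.0, 1.0 - x / a)

value = second_moment_from_tail(uniform_tail, a)
print(value, a ** 2 / 3.0)
```

Both printed numbers agree to within the discretization error of the midpoint rule.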
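Following Ref. [59], we now take $\epsilon = \theta_{\mathrm{est}}(\boldsymbol{\mu}) - \theta_0$ and $X = |\epsilon|$. We thus have
$$\mathrm{MSE}(\theta_{\mathrm{est}})_{\boldsymbol{\mu},\theta_0} = \langle |\epsilon|^2 \rangle = \frac{1}{2} \int_0^{\infty} P\!\left( |\epsilon| \geq \frac{h}{2} \right) h\, dh. \tag{A12}$$
We express the probability as
$$\begin{aligned} P\!\left( |\epsilon| \geq \frac{h}{2} \right) &= P\!\left( \epsilon > \frac{h}{2} \right) + P\!\left( \epsilon \leq -\frac{h}{2} \right) = P\!\left( \theta_{\mathrm{est}}(\boldsymbol{\mu}) - \theta_0 > \frac{h}{2} \right) + P\!\left( \theta_{\mathrm{est}}(\boldsymbol{\mu}) - \theta_0 \leq -\frac{h}{2} \right) \\ &= \int P\!\left( \theta_{\mathrm{est}}(\boldsymbol{\mu}) - \theta_0 > \frac{h}{2} \,\Big|\, \theta_0 \right) p(\theta_0)\, d\theta_0 + \int P\!\left( \theta_{\mathrm{est}}(\boldsymbol{\mu}) - \theta_0 \leq -\frac{h}{2} \,\Big|\, \theta_0 \right) p(\theta_0)\, d\theta_0. \end{aligned}$$
Next, we replace $\theta_0$ with $\theta_0 + h$ in the second integral:
$$\begin{aligned} P\!\left( |\epsilon| \geq \frac{h}{2} \right) &= \int P\!\left( \theta_{\mathrm{est}}(x) - \theta_0 > \frac{h}{2} \,\Big|\, \theta_0 \right) p(\theta_0)\, d\theta_0 + \int P\!\left( \theta_{\mathrm{est}}(x) - \theta_0 \leq \frac{h}{2} \,\Big|\, \theta_0 + h \right) p(\theta_0 + h)\, d\theta_0 \\ &= \int \left( p(\varphi) + p(\varphi + h) \right) \Big\langle \frac{p(\varphi)}{p(\varphi) + p(\varphi + h)}\, P\!\left( \theta_{\mathrm{est}}(x) - \varphi > \frac{h}{2} \,\Big|\, \theta_0 = \varphi \right) \\ &\qquad + \frac{p(\varphi + h)}{p(\varphi) + p(\varphi + h)}\, P\!\left( \theta_{\mathrm{est}}(x) - \varphi \leq \frac{h}{2} \,\Big|\, \theta_0 = \varphi + h \right) \Big\rangle\, d\varphi. \end{aligned}$$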
We now take a closer look at the expression within the angular brackets and interpret it in the framework of hypothesis testing. Suppose that we try to discriminate between the two cases $\theta_0 = \varphi$ (hypothesis 1, denoted $H_1$) and $\theta_0 = \varphi + h$ (denoted $H_2$). We decide between the two hypotheses $H_1$ and $H_2$ on the basis of the measurement result $x$ using the estimator $\theta_{\mathrm{est}}(x)$. One possible strategy consists in choosing the hypothesis whose value is closest to the obtained estimator. Hence, if $\theta_{\mathrm{est}}(x) \leq \varphi + h/2$, we assume $H_1$ to be correct and, otherwise, if $\theta_{\mathrm{est}}(x) > \varphi + h/2$, we pick $H_2$.
Let us now determine the probability to make an erroneous decision using this strategy. There are two scenarios that will lead to a mistake. First, our strategy fails whenever $\theta_{\mathrm{est}}(x) \leq \varphi + h/2$ when $\theta_0 = \varphi + h$. In this case, $H_2$ is true, but our strategy leads us to choose $H_1$. The probability for this to happen, given that $\theta_0 = \varphi + h$, is $P(\theta_{\mathrm{est}}(x) - \varphi \leq \frac{h}{2} \,|\, \theta_0 = \varphi + h)$. To obtain the error probability of our strategy, we need to multiply this with the probability with which $\theta_0$ assumes the value $\varphi + h$, which is given by $p(H_2) = \frac{p(\varphi + h)}{p(\varphi) + p(\varphi + h)}$. Second, our strategy also fails if $\theta_{\mathrm{est}}(x) > \varphi + h/2$ for $\theta_0 = \varphi$. This occurs with the conditional probability $P(\theta_{\mathrm{est}}(x) - \varphi > \frac{h}{2} \,|\, \theta_0 = \varphi)$, and $\theta_0 = \varphi$ with probability $p(H_1) = \frac{p(\varphi)}{p(\varphi) + p(\varphi + h)}$. The total probability to make a mistake is consequently given by
$$\begin{aligned} P_{\mathrm{err}}(\varphi, \varphi + h) &= P\!\left( \theta_{\mathrm{est}}(x) - \varphi > \frac{h}{2} \,\Big|\, H_1 \right) p(H_1) + P\!\left( \theta_{\mathrm{est}}(x) - \varphi \leq \frac{h}{2} \,\Big|\, H_2 \right) p(H_2) \\ &= \frac{p(\varphi)}{p(\varphi) + p(\varphi + h)}\, P\!\left( \theta_{\mathrm{est}}(x) - \varphi > \frac{h}{2} \,\Big|\, \theta_0 = \varphi \right) + \frac{p(\varphi + h)}{p(\varphi) + p(\varphi + h)}\, P\!\left( \theta_{\mathrm{est}}(x) - \varphi \leq \frac{h}{2} \,\Big|\, \theta_0 = \varphi + h \right), \end{aligned}$$
and we can rewrite Equation (A13) as
$$P\!\left( |\epsilon| \geq \frac{h}{2} \right) = \int \left( p(\varphi) + p(\varphi + h) \right) P_{\mathrm{err}}(\varphi, \varphi + h)\, d\varphi. \tag{A14}$$
The strategy described above depends on the estimator $\theta_{\mathrm{est}}$ and may not be optimal. In general, a binary hypothesis testing strategy can be characterized in terms of the separation of the possible values of $x$ into the two disjoint subsets $X_1$ and $X_2$ which are used to choose hypothesis $H_1$ or $H_2$, respectively. That is, if $x \in X_1$ we pick $H_1$ and otherwise $H_2$. Since one of the two hypotheses must be true, we have
$$\begin{aligned} 1 = p(H_1) + p(H_2) &= \int_{X_1} dx\, p(x|H_1)\, p(H_1) + \int_{X_2} dx\, p(x|H_1)\, p(H_1) + \int_{X_1} dx\, p(x|H_2)\, p(H_2) + \int_{X_2} dx\, p(x|H_2)\, p(H_2) \\ &= \int_{X_1} dx\, p(x|H_1)\, p(H_1) + \int_{X_2} dx\, p(x|H_2)\, p(H_2) + P_{\mathrm{err}}^{X_1}(H_1, H_2), \end{aligned} \tag{A15}$$
where the error made by such a strategy is given by
$$\begin{aligned} P_{\mathrm{err}}^{X_1}(H_1, H_2) &= P(x \in X_2 | H_1)\, p(H_1) + P(x \in X_1 | H_2)\, p(H_2) \\ &= \int_{X_2} p(x|H_1)\, p(H_1)\, dx + \int_{X_1} p(x|H_2)\, p(H_2)\, dx \\ &= p(H_1) + \int_{X_1} \left[ p(x|H_2)\, p(H_2) - p(x|H_1)\, p(H_1) \right] dx. \end{aligned}$$
This probability is minimized if $p(x|H_2)\, p(H_2) < p(x|H_1)\, p(H_1)$ for $x \in X_1$ and, consequently, $p(x|H_2)\, p(H_2) \geq p(x|H_1)\, p(H_1)$ for $x \in X_2$. This actually identifies an optimal strategy for hypothesis testing, known as the likelihood ratio test: if the likelihood ratio $p(x|H_1)/p(x|H_2)$ is larger than the threshold value $p(H_2)/p(H_1)$, we pick $H_1$, whereas, if it is smaller, we pick $H_2$. With this choice, the error probability is minimal and reads
$$\begin{aligned} P_{\min}(H_1, H_2) &= \int_{X_2} \left[ p(x|H_1)\, p(H_1) - p(x|H_2)\, p(H_2) \right] dx + \int_{X_1} \left[ p(x|H_2)\, p(H_2) - p(x|H_1)\, p(H_1) \right] dx \\ &\quad + \int_{X_1} p(x|H_1)\, p(H_1)\, dx + \int_{X_2} p(x|H_2)\, p(H_2)\, dx \\ &= \frac{1}{2} - \frac{1}{2} \int \left| p(x|H_1)\, p(H_1) - p(x|H_2)\, p(H_2) \right| dx, \end{aligned}$$
where we used Equation (A15).
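For a discrete set of outcomes, the closed form for the minimum error probability can be compared with an exhaustive search over all decision regions $X_1$. The following sketch uses made-up probabilities chosen only for illustration; `joint1[x]` and `joint2[x]` stand for $p(x|H_1)\,p(H_1)$ and $p(x|H_2)\,p(H_2)$, and all function names are hypothetical.

```python
from itertools import combinations

def p_min_formula(joint1, joint2):
    """Closed form: 1/2 - 1/2 * sum_x |p(x|H1) p(H1) - p(x|H2) p(H2)|."""
    return 0.5 * (1.0 - sum(abs(u - v) for u, v in zip(joint1, joint2)))

def p_err(joint1, joint2, region1):
    """Error probability when H1 is chosen for outcomes in region1, H2 otherwise."""
    # Choosing H1 is wrong with weight joint2[x]; choosing H2 with weight joint1[x].
    return sum(joint2[x] if x in region1 else joint1[x]
               for x in range(len(joint1)))

def p_min_brute_force(joint1, joint2):
    """Minimize the error probability over every possible subset X1 of outcomes."""
    outcomes = range(len(joint1))
    return min(p_err(joint1, joint2, set(region))
               for size in range(len(joint1) + 1)
               for region in combinations(outcomes, size))

# Illustrative numbers: p(H1) = 0.4, p(H2) = 0.6 and three possible outcomes.
joint1 = [0.4 * q for q in (0.5, 0.3, 0.2)]
joint2 = [0.6 * q for q in (0.1, 0.3, 0.6)]
print(p_min_formula(joint1, joint2), p_min_brute_force(joint1, joint2))
```

The exhaustive search selects $X_1$ as the set of outcomes with $p(x|H_1)\,p(H_1) > p(x|H_2)\,p(H_2)$, which is the likelihood ratio test described above.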
Applied to our case, we obtain
$$P_{\min}(\varphi, \varphi + h) = \frac{1}{2} \left( 1 - \sum_{\boldsymbol{\mu}} \left| \frac{p(\boldsymbol{\mu}|\theta_0 = \varphi)\, p(\varphi)}{p(\varphi) + p(\varphi + h)} - \frac{p(\boldsymbol{\mu}|\theta_0 = \varphi + h)\, p(\varphi + h)}{p(\varphi) + p(\varphi + h)} \right| \right).$$
This result represents a lower bound on $P_{\mathrm{err}}^{X_1}(\varphi, \varphi + h)$ for arbitrary choices of $X_1$. This includes the case discussed in Equation (A13). Thus, using
$$P_{\mathrm{err}}(\varphi, \varphi + h) \geq P_{\min}(\varphi, \varphi + h)$$
in Equation (A14) and inserting back into Equation (A12), we finally obtain the Ziv–Zakai bound for the mean square error:
$$\mathrm{MSE}(\theta_{\mathrm{est}})_{\boldsymbol{\mu},\theta_0} \geq \frac{1}{2} \int_0^{\infty} h\, dh \int d\theta_0 \left( p(\theta_0) + p(\theta_0 + h) \right) P_{\min}(\theta_0, \theta_0 + h).$$
This bound can be further sharpened by introducing a valley-filling function [61], which is not considered here.

References

  1. Zehnder, L. Ein neuer Interferenzrefraktor. Zeitschrift für Instrumentenkunde 1891, 11, 275. (In German)
  2. Mach, L. Ueber einen Interferenzrefraktor. Zeitschrift für Instrumentenkunde 1892, 12, 89. (In German)
  3. Ramsey, N.F. Molecular Beams; Oxford University Press: London, UK, 1963.
  4. Wynands, R. Atomic Clocks. In Lecture Notes in Physics; Muga, G., Ruschhaupt, A., Campo, A., Eds.; Springer: Berlin/Heidelberg, Germany, 2009; Volume 789.
  5. Barish, B.C.; Weiss, R. LIGO and the Detection of Gravitational Waves. Phys. Today 1999, 52, 44–50.
  6. Pitkin, M.; Reid, S.; Rowan, S.; Hough, J. Gravitational Wave Detection by Interferometry (Ground and Space). Living Rev. Relativ. 2011, 14, 5.
  7. Helstrom, C.W. Quantum detection and estimation theory. J. Stat. Phys. 1969, 1, 231.
  8. Holevo, A.S. Probabilistic and Statistical Aspects of Quantum Theory; North-Holland Publishing Company: Amsterdam, The Netherlands, 1982.
  9. Ludlow, A.D.; Boyd, M.M.; Ye, J.; Peik, E.; Schmidt, P.O. Optical atomic clocks. Rev. Mod. Phys. 2015, 87, 637–701.
  10. Schnabel, R.; Mavalvala, N.; McClelland, D.E.; Lam, P.K. Quantum metrology for gravitational wave astronomy. Nat. Commun. 2010, 1, 121.
  11. Aasi, J.; Abadie, J.; Abbott, B.P.; Abbott, R.; Abbott, T.D.; Abernathy, M.R.; Adams, C.; Adams, T.; Addesso, P.; Adhikari, R.X.; et al. Enhanced sensitivity of the LIGO gravitational wave detector by using squeezed states of light. Nat. Photon. 2013, 7, 613–619.
  12. Cronin, A.D.; Schmiedmayer, J.; Pritchard, D.E. Optics and interferometry with atoms and molecules. Rev. Mod. Phys. 2009, 81, 1051–1129.
  13. Caves, C.M. Quantum-mechanical noise in an interferometer. Phys. Rev. D 1981, 23, 1693–1708.
  14. Giovannetti, V.; Lloyd, S.; Maccone, L. Quantum metrology. Phys. Rev. Lett. 2006, 96, 010401.
  15. Pezzè, L.; Smerzi, A. Entanglement, nonlinear dynamics, and the Heisenberg limit. Phys. Rev. Lett. 2009, 102, 100401.
  16. Hyllus, P.; Laskowski, W.; Krischek, R.; Schwemmer, C.; Wieczorek, W.; Weinfurter, H.; Pezzè, L.; Smerzi, A. Fisher information and multiparticle entanglement. Phys. Rev. A 2012, 85, 022321.
  17. Tóth, G. Multipartite entanglement and high-precision metrology. Phys. Rev. A 2012, 85, 022322.
  18. Pezzè, L.; Smerzi, A. Quantum theory of phase estimation. In Atom Interferometry, Proceedings of the International School of Physics "Enrico Fermi", Italy, 15–20 July 2013; Tino, G.M., Kasevich, M.A., Eds.; IOS Press: Sesto Fiorentino, Italy, 2014; Course 188, 691.
  19. Tóth, G.; Apellaniz, I. Quantum metrology from a quantum information science perspective. J. Phys. A Math. Theor. 2014, 47, 424006.
  20. Giovannetti, V.; Lloyd, S.; Maccone, L. Advances in quantum metrology. Nat. Photon. 2011, 5, 222–229.
  21. Pezzè, L.; Smerzi, A.; Oberthaler, M.K.; Schmied, R.; Treutlein, P. Quantum metrology with nonclassical states of atomic ensembles. Rev. Mod. Phys. 2018, in press.
  22. Kay, S.M. Fundamentals of Statistical Signal Processing: Estimation Theory, Volume I; Prentice Hall: Upper Saddle River, NJ, USA, 1993.
  23. Lehmann, E.L.; Casella, G. Theory of Point Estimation; Springer: Berlin, Germany, 1998.
  24. Van Trees, H.L.; Bell, K.L. Bayesian Bounds for Parameter Estimation and Nonlinear Filtering/Tracking; Wiley: New York, NY, USA, 2007.
  25. Lane, A.S.; Braunstein, S.L.; Caves, C.M. Maximum-likelihood statistics of multiple quantum phase measurements. Phys. Rev. A 1993, 47, 1667.
  26. Tsang, M. Ziv–Zakai error bounds for quantum parameter estimation. Phys. Rev. Lett. 2012, 108, 230401.
  27. Lu, X.M.; Tsang, M. Quantum Weiss-Weinstein bounds for quantum metrology. Quantum Sci. Technol. 2016, 1, 015002.
  28. Hall, M.J.; Wiseman, H.M. Heisenberg-style bounds for arbitrary estimates of shift parameters including prior information. New J. Phys. 2012, 14, 033040.
  29. Giovannetti, V.; Maccone, L. Sub-Heisenberg estimation strategies are ineffective. Phys. Rev. Lett. 2012, 108, 210404.
  30. Pezzè, L. Sub-Heisenberg phase uncertainties. Phys. Rev. A 2013, 88, 060101(R).
  31. Pezzè, L.; Hyllus, P.; Smerzi, A. Phase-sensitivity bounds for two-mode interferometers. Phys. Rev. A 2015, 91, 032103.
  32. Hradil, Z.; Myška, R.; Peřina, J.; Zawisky, M.; Hasegawa, Y.; Rauch, H. Quantum phase in interferometry. Phys. Rev. Lett. 1996, 76, 4295.
  33. Pezzè, L.; Smerzi, A.; Khoury, G.; Hodelin, J.F.; Bouwmeester, D. Phase detection at the quantum limit with multiphoton Mach-Zehnder interferometry. Phys. Rev. Lett. 2007, 99, 223602.
  34. Kacprowicz, M.; Demkowicz-Dobrzanski, R.; Wasilewski, W.; Banaszek, K.; Walmsley, I.A. Experimental quantum-enhanced estimation of a lossy phase shift. Nat. Photon. 2010, 4, 357.
  35. Krischek, R.; Schwemmer, C.; Wieczorek, W.; Weinfurter, H.; Hyllus, P.; Pezzè, L.; Smerzi, A. Useful multiparticle entanglement and sub-shot-noise sensitivity in experimental phase estimation. Phys. Rev. Lett. 2011, 107, 080504.
  36. Xiang, G.Y.; Higgins, B.L.; Berry, D.W.; Wiseman, H.M.; Pryde, G.J. Entanglement-enhanced measurement of a completely unknown optical phase. Nat. Photon. 2011, 5, 43–47.
  37. Bollinger, J.J.; Itano, W.M.; Wineland, D.J.; Heinzen, D.J. Optimal frequency measurements with maximally correlated states. Phys. Rev. A 1996, 54, R4649–R4652.
  38. Pezzè, L.; Smerzi, A. Sub shot-noise interferometric phase sensitivity with beryllium ions Schrödinger cat states. Europhys. Lett. 2007, 78, 30004.
  39. Gerry, C.C.; Mimih, J. The parity operator in quantum optical metrology. Contemp. Phys. 2010, 51, 497.
  40. Sackett, C.A.; Kielpinski, D.; King, B.E.; Langer, C.; Meyer, V.; Myatt, C.J.; Rowe, M.; Turchette, Q.A.; Itano, W.M.; et al. Experimental entanglement of four particles. Nature 2000, 404, 256–259.
  41. Monz, T.; Schindler, P.; Barreiro, J.T.; Chwalla, M.; Nigg, D.; Coish, W.; Harlander, M.; Hänsel, W.; Hennrich, M.; Blatt, R. 14-Qubit Entanglement: Creation and Coherence. Phys. Rev. Lett. 2011, 106, 130506.
  42. Hayashi, M. Asymptotic Theory of Quantum Statistical Inference, Selected Papers; World Scientific Publishing: Singapore, 2005.
  43. Barankin, E.W. Locally best unbiased estimates. Ann. Math. Stat. 1949, 20, 477.
  44. McAulay, R.J.; Hofstetter, E.M. Barankin bounds on parameter estimation. IEEE Trans. Inf. Theory 1971, 17, 669–676.
  45. Cramér, H. Mathematical Methods of Statistics; Princeton University Press: Princeton, NJ, USA, 1946. [Google Scholar]
  46. Rao, C.R. Information and the Accuracy Attainable in the Estimation of Statistical Parameters. Bull. Calcutta Math. Soc. 1945, 37, 81–91.
  47. Hammersley, J.M. On estimating restricted parameters. J. R. Stat. Soc. Ser. B 1950, 12, 192.
  48. Chapman, D.G.; Robbins, H. Minimum variance estimation without regularity assumptions. Ann. Math. Stat. 1951, 22, 581.
  49. Pfanzagl, J.; Hamböker, R. Parametric Statistical Theory; De Gruyter: Berlin, Germany, 1994.
  50. Sivia, D.S.; Skilling, J. Data Analysis: A Bayesian Tutorial; Oxford University Press: London, UK, 2006.
  51. Robert, C.P. The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation; Springer: New York, NY, USA, 2007.
  52. Jeffreys, H. An invariant form for the prior probability in estimation problems. Proc. R. Soc. Lond. A 1946, 186, 453.
  53. Jeffreys, H. Theory of Probability; Oxford University Press: London, UK, 1961.
  54. Ghosh, M. Cramér–Rao bounds for posterior variances. Stat. Probabil. Lett. 1993, 17, 173.
  55. Le Cam, L. Asymptotic Methods in Statistical Decision Theory; Springer: New York, NY, USA, 1986.
  56. Van Trees, H.L. Detection, Estimation, and Modulation Theory, Part I; Wiley: New York, NY, USA, 1968.
  57. Schutzenberger, M.P. A generalization of the Fréchet-Cramér inequality to the case of Bayes estimation. Bull. Am. Math. Soc. 1957, 63, 142.
  58. Ziv, J.; Zakai, M. Some lower bounds on signal parameter estimation. IEEE Trans. Inf. Theory 1969, 15, 386–391.
  59. Bell, K.L.; Steinberg, Y.; Ephraim, Y.; Van Trees, H.L. Extended Ziv–Zakai lower bound for vector parameter estimation. IEEE Trans. Inf. Theory 1997, 43, 624–637.
  60. Gessner, M.; Smerzi, A. Statistical speed of quantum states: Generalized quantum Fisher information and Schatten speed. Phys. Rev. A 2018, 97, 022109.
  61. Bellini, S.; Tartara, G. Bounds on error in signal parameter estimation. IEEE Trans. Commun. 1974, 22, 340–342.
Figure 1. (a) Bias $\langle \theta_{\rm MLE} \rangle_{\boldsymbol{\mu}|\theta_0} - \theta_0$ (green dots) as a function of $m$, with error bars $(\Delta \theta_{\rm MLE})_{\boldsymbol{\mu}|\theta_0}$. The red lines are $\pm \Delta \theta_{\rm CRLB} = \pm |d \langle \theta_{\rm MLE} \rangle_{\boldsymbol{\mu}|\theta_0} / d\theta_0| / \sqrt{m F(\theta_0)}$; (b) variance of the maximum likelihood estimator multiplied by the Fisher information, $m F(\theta_0) (\Delta^2 \theta_{\rm MLE})_{\boldsymbol{\mu}|\theta_0}$ (red circles), as a function of the sample size $m$. It is compared to the bias $(d \langle \theta_{\rm MLE} \rangle_{\boldsymbol{\mu}|\theta_0} / d\theta_0)^2$ (red dashed line). We recall that $\theta_0 = \pi/4$ and $F(\theta_0) = 4$ here.
Figure 2. (a) Comparison between unbiased frequentist bounds for the example considered in this manuscript, Equation (1): the CRLB $m \Delta^2 \theta_{\rm CRLB}^{\rm ub} = 1/F(\theta_0)$ (black line), the Hammersley–Chapman–Robbins bound $m \Delta^2 \theta_{\rm ChRB}^{\rm ub}$ (Equation (15), filled triangles) and the extended Hammersley–Chapman–Robbins bound $m \Delta^2 \theta_{\rm EChRB}^{\rm ub}$ (Equation (18), empty triangles); (b) values of $\lambda$ achieving the supremum in Equation (15), as a function of $m$.
Figure 3. Comparison of the phase estimation variance as a function of the sample size for Bayesian and frequentist data analysis under different prior distributions: (a) $\alpha = 100$, (b) $\alpha = 10$, (c) $\alpha = 1$, (d) $\alpha = 10$. In all panels, red circles (frequentist) are $m (\Delta^2 \theta_{\rm BL})_{\boldsymbol{\mu}|\theta_0}$, and the red dashed line is the Cramér-Rao lower bound $m \Delta^2 \theta_{\rm CRLB}$, Equation (8). Blue circles (Bayesian) are $m (\Delta^2 \theta_{\rm BL})_{\boldsymbol{\mu},\theta|\theta_0}$, and the blue solid line is the likelihood-averaged Ghosh bound $m \Delta^2 \theta_{\rm aGB}$, Equation (25). The inset in each panel is $p_{\rm pri}(\theta)$, Equation (26), for the corresponding value of $\alpha$.
Figure 4. Comparison of the average posterior Bayesian variance, $m (\Delta^2 \theta_{\rm BL})_{\boldsymbol{\mu},\theta}$ (dots), as a function of the sample size $m$ under different prior distributions: (a) $\alpha = 100$, (b) $\alpha = 10$, (c) $\alpha = 1$, (d) $\alpha = 10$. This variance is compared to the average Ghosh bound for random parameters $m (\Delta^2 \theta_{\rm aGBr})$ (grey line), the Van Trees bound $m (\Delta^2 \theta_{\rm VTB})$ (green line), the Ziv–Zakai bound $m (\Delta^2 \theta_{\rm ZZB})$ (red line) and $1/F(\theta_0)$ (black horizontal line). The inset in each panel is the prior $p_{\rm pri}(\theta)$, Equation (26), for the corresponding value of $\alpha$.
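Table 1. Frequentist vs Bayesian bounds for fixed and random parameters.

| | Paradigm | Risk Function | Bounds | Remarks |
|---|---|---|---|---|
| $\theta_0$ fixed | Frequentist | $(\Delta^2 \theta_{\rm est})_{\boldsymbol{\mu} \vert \theta_0}$ | BB, Equation (5) | hierarchy of bounds, Equation (7) |
| | | | EChRB, Equation (17) | |
| | | ${\rm MSE}(\theta_{\rm est})_{\boldsymbol{\mu} \vert \theta_0}$ | ChRB, Equation (14) | |
| | | | CRLB, Equation (8) | |
| | Bayesian | $(\Delta^2 \theta_{\rm BL})_{\boldsymbol{\mu} \vert \theta_0}$ | GB, Equation (22) | function of $\boldsymbol{\mu}$ |
| | | $(\Delta^2 \theta_{\rm BL})_{\boldsymbol{\mu},\theta \vert \theta_0}$ | aGB, Equation (25) | average over likelihood $p(\boldsymbol{\mu} \vert \theta_0)$ |
| $\theta_0$ random | Frequentist | $(\Delta^2 \theta_{\rm est})_{\boldsymbol{\mu},\theta_0}$ | aCRLB, Equation (37) | hierarchy of bounds, Equation (40) |
| | | | fVTB, Equation (38) | |
| | | ${\rm MSE}(\theta_{\rm est})_{\boldsymbol{\mu},\theta_0}$ | VTB, Equation (32) | bounds are independent of the bias |
| | | | ZZB, Equation (35) | |
| | Bayesian | $(\Delta^2 \theta_{\rm BL})_{\boldsymbol{\mu},\theta,\theta_0}$ | aGBr, Equation (42) | prior $p_{\rm pri}(\theta)$ and fluctuations $p(\theta_0)$ arbitrary |
| | | $(\Delta^2 \theta_{\rm BL})_{\boldsymbol{\mu},\theta}$ | VTB, Equation (32) | prior $p_{\rm pri}(\theta)$ and fluctuations $p(\theta_0)$ coincide |
| | | | ZZB, Equation (35) | hierarchy of bounds, Equation (45) |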

Share and Cite

MDPI and ACS Style

Li, Y.; Pezzè, L.; Gessner, M.; Ren, Z.; Li, W.; Smerzi, A. Frequentist and Bayesian Quantum Phase Estimation. Entropy 2018, 20, 628. https://doi.org/10.3390/e20090628

AMA Style

Li Y, Pezzè L, Gessner M, Ren Z, Li W, Smerzi A. Frequentist and Bayesian Quantum Phase Estimation. Entropy. 2018; 20(9):628. https://doi.org/10.3390/e20090628

Chicago/Turabian Style

Li, Yan, Luca Pezzè, Manuel Gessner, Zhihong Ren, Weidong Li, and Augusto Smerzi. 2018. "Frequentist and Bayesian Quantum Phase Estimation" Entropy 20, no. 9: 628. https://doi.org/10.3390/e20090628

APA Style

Li, Y., Pezzè, L., Gessner, M., Ren, Z., Li, W., & Smerzi, A. (2018). Frequentist and Bayesian Quantum Phase Estimation. Entropy, 20(9), 628. https://doi.org/10.3390/e20090628

