Abstract
In a regression with independent and identically distributed normal residuals, the log-likelihood function yields an empirical form of the \(\mathcal{L}^2\)-norm, whereas the normal distribution can be obtained as a solution of differential entropy maximization subject to a constraint on the \(\mathcal{L}^2\)-norm of a random variable. The \(\mathcal{L}^1\)-norm and the double exponential (Laplace) distribution are related in a similar way. These are examples of an “inter-regenerative” relationship. In fact, the \(\mathcal{L}^2\)-norm and \(\mathcal{L}^1\)-norm are just particular cases of general error measures introduced by Rockafellar et al. (Finance Stoch 10(1):51–74, 2006) on a space of random variables. General error measures are not necessarily symmetric with respect to ups and downs of a random variable, which is a desired property in finance applications, where gains and losses should be treated differently. This work identifies a set of all error measures, denoted by \(\mathscr{E}\), and a set of all probability density functions (PDFs) that form “inter-regenerative” relationships (through log-likelihood and entropy maximization). It also shows that M-estimators, which arise in robust regression but, in general, are not error measures, form “inter-regenerative” relationships with all PDFs. In fact, the set of M-estimators that are error measures coincides with \(\mathscr{E}\). On the other hand, M-estimators are a particular case of L-estimators that also arise in robust regression. A set of L-estimators that are error measures is also identified; it contains \(\mathscr{E}\) and the so-called trimmed \(\mathcal{L}^p\)-norms.
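To fix ideas, here is the standard computation behind the first sentence (a sketch; the normalizing constants are textbook facts rather than results of this paper). For i.i.d. residuals \(\varepsilon_1,\dots,\varepsilon_n\) with the normal PDF \(f(t)=\frac{1}{\sigma\sqrt{2\pi}}e^{-t^2/(2\sigma^2)}\),

\[-\sum_{i=1}^n\ln f(\varepsilon_i)=\frac{1}{2\sigma^2}\sum_{i=1}^n\varepsilon_i^2+n\ln\bigl(\sigma\sqrt{2\pi}\bigr),\]

so maximizing the likelihood is equivalent to minimizing the empirical \(\mathcal{L}^2\)-norm \(\bigl(\frac{1}{n}\sum_{i=1}^n\varepsilon_i^2\bigr)^{1/2}\) (least squares). Conversely, among all r.v.'s with \(\mathbb{E}[Z^2]\leqslant\sigma^2\), differential entropy is maximized by the normal density above, and replacing the constraint with \(\mathbb{E}[|Z|]\leqslant s\) yields the Laplace density \(f(t)=\frac{1}{2s}e^{-|t|/s}\) and, on the likelihood side, the empirical \(\mathcal{L}^1\)-norm (least absolute deviations).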
Notes
Rockafellar et al. [38, 39] proposed a unifying axiomatic framework for general measures of error, deviation and risk, all of which are positively homogeneous convex functionals defined on a space of r.v.’s; see also [34, 37]. Recently, Grechuk and Zabarankin [15] analyzed the sensitivity of optimal values of positively homogeneous convex functionals to noise in the data in various optimization problems, including linear regression.
We assume that \(0\ln 0 = 0\).
A deviation measure is a functional \(\mathcal{D}:\mathcal{L}^r(\Theta )\rightarrow [0,\infty ]\) satisfying axioms E2–E4 and such that \(\mathcal{D}(Z) = 0\) for constant Z, and \(\mathcal{D}(Z) > 0\) otherwise [38]. A deviation measure is called law-invariant if \(\mathcal{D}(X) = \mathcal{D}(Y)\) whenever r.v.’s X and Y have the same distribution [12].
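For instance (a standard example from [38], added here for orientation): the standard deviation \(\sigma(Z)=\bigl(\mathbb{E}[(Z-\mathbb{E}[Z])^2]\bigr)^{1/2}\) satisfies E2–E4, vanishes exactly on constants, and depends on Z only through its distribution, so it is a law-invariant deviation measure; the same holds for the mean absolute deviation \(\mathbb{E}\bigl[|Z-\mathbb{E}[Z]|\bigr]\).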
References
Alfons, A., Croux, C., Gelper, S.: Sparse least trimmed squares regression for analyzing high-dimensional large data sets. Ann. Appl. Stat. 7(1), 226–248 (2013)
Bartolucci, F., Scaccia, L.: The use of mixtures for dealing with non-normal regression errors. Comput. Stat. Data Anal. 48(4), 821–834 (2005)
Bernholt, T.: Computing the least median of squares estimator in time O(\(n^d\)). In: International Conference on Computational Science and Its Applications, pp. 697–706. Springer (2005)
Boscovich, R.J.: De litteraria expeditione per pontificiam ditionem, et synopsis amplioris operis, ac habentur plura ejus ex exemplaria etiam sensorum impressa. Bononiensi Scientarum et Artum Instituto Atque Academia Commentarii 4, 353–396 (1757)
Box, G.: Non-normality and tests on variances. Biometrika 40, 318–335 (1953)
Cover, T., Thomas, J.: Elements of Information Theory. Wiley, New York (2012)
Edgeworth, F.: On observations relating to several quantities. Hermathena 6(13), 279–285 (1887)
Efron, B.: Regression percentiles using asymmetric squared error loss. Stat. Sin. 1(1), 93–125 (1991)
Föllmer, H., Schied, A.: Stochastic Finance, 3rd edn. de Gruyter, Berlin (2011)
Gauss, C.F.: Theoria motus corporum coelestium in sectionibus conicis solem ambientium. sumtibus Frid. Perthes et IH Besser (1809)
Grechuk, B., Molyboha, A., Zabarankin, M.: Maximum entropy principle with general deviation measures. Math. Oper. Res. 34(2), 445–467 (2009)
Grechuk, B., Molyboha, A., Zabarankin, M.: Chebyshev inequalities with law-invariant deviation measures. Probab. Eng. Inf. Sci. 24(1), 145–170 (2010)
Grechuk, B., Zabarankin, M.: Schur convex functionals: Fatou property and representation. Math. Finance 22(2), 411–418 (2012)
Grechuk, B., Zabarankin, M.: Inverse portfolio problem with mean-deviation model. Eur. J. Oper. Res. 234(2), 481–490 (2014)
Grechuk, B., Zabarankin, M.: Sensitivity analysis in applications with deviation, risk, regret, and error measures. SIAM J. Optim. 27(4), 2481–2507 (2017)
Gu, Y., Zou, H.: High-dimensional generalizations of asymmetric least squares regression and their applications. Ann. Stat. 44(6), 2661–2694 (2016)
Harter, L.: The method of least squares and some alternatives: Part I. International Statistical Review/Revue Internationale de Statistique, 147–174 (1974)
Hosking, J., Balakrishnan, N.: A uniqueness result for L-estimators, with applications to L-moments. Stat. Methodol. 24, 69–80 (2015)
Huber, P.: Robust estimation of a location parameter. Ann. Math. Stat. 35(1), 73–101 (1964)
Huber, P.: Robust Statistics. Wiley, New York (1981)
Jaynes, E.T.: Information theory and statistical mechanics (notes by the lecturer). Stat. Phys. 3, 181 (1963)
Jouini, E., Schachermayer, W., Touzi, N.: Law invariant risk measures have the Fatou property. Adv. Math. Econ. 9, 49–71 (2006)
Koenker, R., Bassett Jr., G.: Regression quantiles. Econometrica 46(1), 33–50 (1978)
Krokhmal, P.: Higher moment coherent risk measures. Quant. Finance 7(4), 373–387 (2007)
Kullback, S., Leibler, R.: On information and sufficiency. Ann. Math. Stat. 22(1), 79–86 (1951)
Laplace, P.S.: Traité de mécanique céleste, vol. 2. J. B. M. Duprat, Paris (1799)
Lee, W.M., Hsu, Y.C., Kuan, C.M.: Robust hypothesis tests for M-estimators with possibly non-differentiable estimating functions. Econom. J. 18(1), 95–116 (2015)
Legendre, A.M.: Nouvelles méthodes pour la détermination des orbites des comètes. 1. F. Didot, Paris (1805)
Lisman, J., Van Zuylen, M.: Note on the generation of most probable frequency distributions. Stat. Neerl. 26(1), 19–23 (1972)
Loh, P.L.: Statistical consistency and asymptotic normality for high-dimensional robust \(M\)-estimators. Ann. Stat. 45(2), 866–896 (2017)
Mafusalov, A., Uryasev, S.: CVaR (superquantile) norm: stochastic case. Eur. J. Oper. Res. 249(1), 200–208 (2016)
Morales-Jimenez, D., Couillet, R., McKay, M.: Large dimensional analysis of robust M-estimators of covariance with outliers. IEEE Trans. Signal Process. 63(21), 5784–5797 (2015)
Mount, D., Netanyahu, N., Piatko, C., Silverman, R., Wu, A.: On the least trimmed squares estimator. Algorithmica 69(1), 148–183 (2014)
Rockafellar, R.T., Royset, J.: Measures of residual risk with connections to regression, risk tracking, surrogate models, and ambiguity. SIAM J. Optim. 25(2), 1179–1208 (2015)
Rockafellar, R.T., Royset, J.: Random variables, monotone relations, and convex analysis. Math. Program. 148(1–2), 297–331 (2014)
Rockafellar, R.T., Uryasev, S.: Conditional value-at-risk for general loss distributions. J. Bank. Finance 26(7), 1443–1471 (2002)
Rockafellar, R.T., Uryasev, S.: The fundamental risk quadrangle in risk management, optimization and statistical estimation. Surv. Oper. Res. Manag. Sci. 18(1), 33–53 (2013)
Rockafellar, R.T., Uryasev, S., Zabarankin, M.: Generalized deviations in risk analysis. Finance Stoch. 10(1), 51–74 (2006)
Rockafellar, R.T., Uryasev, S., Zabarankin, M.: Risk tuning with generalized linear regression. Math. Oper. Res. 33(3), 712–729 (2008)
Rousseeuw, P., Leroy, A.: Robust Regression and Outlier Detection, vol. 589. Wiley, New York (2005)
Rousseeuw, P., Van Driessen, K.: Computing LTS regression for large data sets. Data Min. Knowl. Disc. 12(1), 29–45 (2006)
Rousseeuw, P.J.: Least median of squares regression. J. Am. Stat. Assoc. 79, 871–880 (1984)
Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423, 623–656 (1948)
Xie, S., Zhou, Y., Wan, A.: A varying-coefficient expectile model for estimating value at risk. J. Bus. Econ. Stat. 32(4), 576–592 (2014)
Zabarankin, M., Uryasev, S.: Statistical Decision Problems: Selected Concepts and Portfolio Safeguard Case Studies. Springer, Berlin (2014)
Acknowledgements
We are grateful to the referees for their comments and suggestions, which helped improve the quality of the paper. The first author thanks the University of Leicester for granting him academic study leave to carry out this research.
Appendix A: Proofs of Propositions 1–6
Appendix A.1: Proof of Proposition 1
Since \(\mathcal{E}(Z)\) assumes all values in \([0,+\infty )\), the range of h is \([0,+\infty )\); hence h is continuous and \(h(0)=0\). This implies that h has a strictly increasing continuous inverse function \(h^{-1}:\mathbb {R}^+\rightarrow \mathbb {R}^+\), and

\[\mathbb{E}[\rho(Z)]=h^{-1}\bigl(\mathcal{E}(Z)\bigr).\]
For constant \(Z=t\geqslant 0\), positive homogeneity of \(\mathcal{E}\) yields

\[\rho(t)=h^{-1}\bigl(\mathcal{E}(t)\bigr)=h^{-1}\bigl(t\,\mathcal{E}(1)\bigr).\]
Similarly, \(\rho (t)=h^{-1}(|t|\mathcal{E}(-1))\) for \(t\leqslant 0\). Consequently, in general,

\[\rho(t)=h^{-1}\bigl(t_{a,b}\bigr),\]

where \(a=\mathcal{E}(1)>0\) and \(b=\mathcal{E}(-1)>0\). Thus,

\[\mathcal{E}(Z)=h\bigl(\mathbb{E}[\varphi(Z_{a,b})]\bigr),\qquad(28)\]

where \(\varphi =h^{-1}\).
Since \(\Theta =(\Omega , \mathcal{M}, \mathbb {P})\) is non-trivial, there exists an event \(A\in \mathcal{M}\) such that \(p=\mathbb {P}[A]\in (0,1)\). For any non-negative constants c and d, let Z be an r.v. assuming values \(Z(\omega )=c/a\geqslant 0\) and \(Z(\omega )=d/a\geqslant 0\) for \(\omega \in A\) and \(\omega \not \in A\), respectively. Then

\[h\bigl(p\,\varphi(\lambda c)+(1-p)\,\varphi(\lambda d)\bigr)=\lambda\,h\bigl(p\,\varphi(c)+(1-p)\,\varphi(d)\bigr)\qquad(29)\]
for any \(\lambda \geqslant 0\). Replacing c and d by \(\varphi ^{-1}(c)\) and \(\varphi ^{-1}(d)\), respectively, and applying \(\varphi (\cdot )\) to the left-hand and right-hand parts of (29), we obtain

\[p\,\varphi\bigl(\lambda\varphi^{-1}(c)\bigr)+(1-p)\,\varphi\bigl(\lambda\varphi^{-1}(d)\bigr)=\varphi\bigl(\lambda\,h\bigl(p\,c+(1-p)\,d\bigr)\bigr).\]
Consequently, the function \(g(x)=\varphi (\lambda \varphi ^{-1}(x))\) satisfies

\[g\bigl(p\,c+(1-p)\,d\bigr)=p\,g(c)+(1-p)\,g(d)\quad\forall\,c,d\geqslant 0.\qquad(30)\]
Let

\[\mathcal{A}=\bigl\{x\in[0,1]\,\big|\,g(x)=x\,g(1)\bigr\}.\]
By definition, \(0\in \mathcal{A}\) and \(1\in \mathcal{A}\). Also, (30) implies that \(px + (1-p)y \in \mathcal{A}\) whenever \(x,y\in \mathcal{A}\), hence \(\mathcal{A}\) is a dense subset of [0, 1]. Finally, \(\mathcal{A}\) is closed due to continuity of g, so that \(\mathcal{A}=[0,1]\), and g is a linear function. Since \(g(0)=\varphi (\lambda \varphi ^{-1}(0))=0\), there exists a constant \(C(\lambda )\) such that

\[g(x)=C(\lambda )\,x.\qquad(31)\]
Setting \(x=\varphi (y)\) in (31), we obtain

\[\varphi (\lambda y)=C(\lambda )\,\varphi (y).\qquad(32)\]
Then setting \(y=1\) in (32), we obtain \(\varphi (\lambda )=C(\lambda )\varphi (1)\). Consequently, \(C(\lambda )=\varphi (\lambda )/\varphi (1)\), and (32) takes the form \(\varphi (\lambda y)=\varphi (\lambda )\varphi (y)/\varphi (1)\quad \forall y, \lambda \geqslant 0\). For the function

\[g(x)=\ln\bigl(\varphi(e^x)/\varphi(1)\bigr),\]
this implies that

\[g(x+y)=g(x)+g(y)\quad\forall\,x,y\in\mathbb{R}.\]
Since g is additive, continuous, and \(g(0)=0\), it is linear, i.e., \(g(x)=px\) for some constant p. Consequently, \(e^{px}=e^{g(x)}=\varphi (e^x)/\varphi (1)\). Finally, with \(e^x=y\), we obtain \(\varphi (y)=\varphi (1)y^p\), and (28) simplifies to

\[\mathcal{E}(Z)=h\bigl(\varphi(1)\,\mathbb{E}[Z_{a,b}^p]\bigr).\]
The condition \(p\geqslant 1\) follows from sub-additivity of \(\mathcal{E}\).
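To see this last step concretely, consider the symmetric special case \(h(x)=x^{1/p}\), \(a=b=1\), \(\varphi(1)=1\), so that \(\mathcal{E}(Z)=(\mathbb{E}[|Z|^p])^{1/p}\) (a sketch; the general case is analogous). For disjoint events A and B with \(\mathbb{P}[A]=\mathbb{P}[B]=1/2\) and \(X=I_A\), \(Y=I_B\),

\[\mathcal{E}(X+Y)=\mathcal{E}(1)=1,\qquad \mathcal{E}(X)+\mathcal{E}(Y)=2\bigl(\tfrac{1}{2}\bigr)^{1/p}=2^{1-1/p},\]

and \(2^{1-1/p}<1\) for \(p<1\), so sub-additivity fails; for \(p\geqslant1\) it holds by Minkowski's inequality.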
Appendix A.2: Proof of Proposition 2
Proposition 4.7 (b) in [11] implies that if \(Z^*\in \mathcal{C}^1(\Theta )\) has a log-concave PDF, then it is a solution to

\[\max_{Z\in\mathcal{C}^1(\Theta)}\,S(Z)\quad\text{subject to}\quad\mathbb{E}[Z]=\mu,\;\;\mathcal{D}(Z)\leqslant 1\qquad(33)\]
for \(\mu =\mathbb {E}[Z^{*}]\) and some law-invariant deviation measure \(\mathcal{D}\) (see the Notes above for the definition). Hence \(Z^*\) is a solution to (13) with \(\mathcal{X}=\{Z\in \mathcal{C}^1(\Theta )\,|\,\mathbb {E}[Z]=\mu ,\,\mathcal{D}(Z)\leqslant 1\}\).
Conversely, let \(Z^*\in \mathcal{C}^1(\Theta )\) be a solution to (13) for some convex closed law-invariant set \(\mathcal{X}\). Then it is a solution to (33) for the deviation measure
where
\(\mathrm{CVaR}_{0}^\Delta (Z)=\mathbb {E}[Z]-\inf Z\) and \(\mathrm{CVaR}_{1}^\Delta (Z)=\sup Z - \mathbb {E}[Z]\), see [14]. Indeed, if an r.v. Z satisfies the constraints in (33) with \(\mathcal{D}\) given by (34), then \( \mathbb {E}[Z]=\mu =\mathbb {E}[Z^*]\), and \(\mathrm{CVaR}_\alpha ^\Delta (Z)\leqslant \mathrm{CVaR}_\alpha ^\Delta (Z^*)\) for all \(\alpha \in [0,1]\), so that Z dominates \(Z^*\) with respect to concave ordering, see Proposition 1 in [14]. Since \(Z^*\) has a PDF, the underlying probability space \(\Theta \) is, by definition, atomless, and part “(a) to (d)” of Corollary 2.61 in [9] along with Lemma 4.2 in [22] implies that \(Z \in \mathcal{X}\). Since \(Z^*\in \mathcal{C}^1(\Theta )\) is a solution to (13), this yields \(S(Z^*)\geqslant S(Z)\), and consequently, \(Z^*\) is a solution to (33). Thus, \(Z^*\) has a log-concave PDF by Proposition 4.11 in [11].
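For orientation (standard examples, not part of the proof): the normal PDF is log-concave since \(\ln f(t)=-\frac{(t-\mu)^2}{2\sigma^2}-\ln(\sigma\sqrt{2\pi})\) is concave in t, and so is the Laplace PDF with \(\ln f(t)=-\frac{|t-\mu|}{s}-\ln(2s)\); the Cauchy PDF \(f(t)=\frac{1}{\pi(1+t^2)}\) is not, since \(\ln f(t)=-\ln\pi-\ln(1+t^2)\) is convex for \(|t|>1\).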
Appendix A.3: Proof of Proposition 3
If \(Z^*\in \mathcal{C}^1(\Theta )\) has a log-concave PDF, then it is a solution to (33) for some law-invariant deviation measure \(\mathcal{D}\). On the other hand, Proposition 5.1 in [45] shows that problem (33) is equivalent to (14) with an error measure \(\mathcal{E}\) such that \(\mathcal{D}(Z)=\inf _{C\in \mathbb {R}} \mathcal{E}(Z-C)\), i.e., \(\mathcal{D}\) is the deviation measure projected from \(\mathcal{E}\). In general, for a given deviation measure \(\mathcal{D}\), such an error measure is non-unique and can be determined by
which is called inverse projection of \(\mathcal{D}\), see [39]. Thus, \(Z^*\) is a solution to (14) with (35).
Conversely, let \(Z^*\in \mathcal{C}^1(\Theta )\) be a solution to (14) for some law-invariant error measure \(\mathcal{E}\). Then positive homogeneity of \(\mathcal{E}\) and the relation \(S(kZ)=S(Z)+\ln k\), \(k>0\), imply that \(Z^*\) is also a solution to

\[\max_{Z\in\mathcal{C}^1(\Theta)}\,S(Z)\quad\text{subject to}\quad\mathcal{E}(Z)\leqslant 1.\]
Since \(\{Z\,|\, \mathcal{E}(Z)\leqslant 1\}\) is a convex closed law-invariant set, \(Z^*\) has a log-concave PDF by Proposition 2.
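A standard illustration of the projection and its non-uniqueness (in the spirit of [38], not part of the proof): for \(\mathcal{E}(Z)=\|Z\|_2=(\mathbb{E}[Z^2])^{1/2}\), the infimum \(\inf_{C\in\mathbb{R}}\mathcal{E}(Z-C)\) is attained at \(C=\mathbb{E}[Z]\), so the projected deviation measure is the standard deviation \(\mathcal{D}(Z)=\sigma(Z)\); the different error measure \(\sigma(Z)+|\mathbb{E}[Z]|\) projects onto the same \(\mathcal{D}\).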
Appendix A.4: Proof of Proposition 4
If \(\mathcal{E}\) and f satisfy the conditions of Proposition 4, then \(\mathcal{E}\) and \(\rho (t) = -\ln (f(t))\) satisfy the conditions of Proposition 1. Consequently, \(\rho \) has the form in (12), which implies that \(f(t)=e^{-\rho (t)}\) has the form of (2b).
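For example (an illustration of the correspondence, with additive constants absorbed into normalization): the Laplace density \(f(t)=\frac{1}{2}e^{-|t|}\) gives \(\rho(t)=-\ln f(t)=|t|+\ln 2\), the \(\mathcal{L}^1\) case \(p=1\), \(a=b\), while the normal density gives \(\rho(t)=\frac{t^2}{2\sigma^2}\) up to an additive constant, the \(\mathcal{L}^2\) case \(p=2\).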
Appendix A.5: Proof of Proposition 5
Since h is strictly increasing, problem (8) with \(\mathcal{E}^*\) is equivalent to minimizing \(\mathbb {E}[\rho ^*(Z)]\) or to maximizing \(\mathbb {E}[\ln (f^*(Z))]\). For an r.v. Z such that \(\mathbb {P}[Z=z_i]=1/n\), \(i=1,\dots ,n\), it reduces to (6).
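Explicitly (a one-line restatement of the reduction): for the empirical r.v. Z above,

\[\mathbb{E}[\rho^*(Z)]=\frac{1}{n}\sum_{i=1}^n\rho^*(z_i)=-\frac{1}{n}\sum_{i=1}^n\ln f^*(z_i),\]

so minimizing \(\mathbb{E}[\rho^*(Z)]\) is, up to the positive factor 1/n and a sign change, maximization of the log-likelihood \(\sum_{i=1}^n\ln f^*(z_i)\).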
With \(c=h\left( - \int _{-\infty }^\infty f^*(t)\ln f^*(t)\,dt\right) \), the constraint \(\mathcal{E}^*(Z)= c\) in (19) simplifies to the following condition on the PDF f of Z:

\[-\int_{-\infty}^\infty f(t)\ln f^*(t)\,dt=-\int_{-\infty}^\infty f^*(t)\ln f^*(t)\,dt,\]
which holds for \(f=f^*\) and for any \(f \ne f^*\) implies that

\[S(Z)=-\int_{-\infty}^\infty f(t)\ln f(t)\,dt\leqslant-\int_{-\infty}^\infty f(t)\ln f^*(t)\,dt=-\int_{-\infty}^\infty f^*(t)\ln f^*(t)\,dt=S(Z^*),\]
where the first inequality follows from the non-negativity of relative entropy (Kullback-Leibler divergence between f and \(f^*\)), defined as \(D_{KL}(f||f^*)=\int _{-\infty }^\infty f(t)\ln \frac{f(t)}{f^*(t)}\,dt \geqslant 0\), see [25].
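For completeness, the non-negativity of \(D_{KL}\) follows from Jensen's inequality applied to the concave logarithm (the standard argument; see also [6]):

\[-D_{KL}(f||f^*)=\int_{-\infty}^\infty f(t)\ln\frac{f^*(t)}{f(t)}\,dt\leqslant\ln\int_{-\infty}^\infty f(t)\,\frac{f^*(t)}{f(t)}\,dt=\ln\int_{-\infty}^\infty f^*(t)\,dt=\ln 1=0.\]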
Appendix A.6: Proof of Proposition 6
We first prove the “if” part in (a) and (b). If \(\mathcal{E}\) is a particular case of (2a), it is an error measure that can be represented in the form of (11), which is (21) with M being the Lebesgue measure on (0, 1), and the “if” part in (a) follows. If \(\mathcal{E}\) is a particular case of (25), then it can be represented in the form of (23) with \(M(c,d)=\int _c^d w(\alpha ) \, d\alpha \), \(0\leqslant c<d\leqslant 1\), \(\rho (t)=t_{a,b}^p\), and \(h(x)=x^{1/p}\). For \(Z\ne 0\), \(q_{Z_{a,b}}^p(\alpha )\) is a non-negative non-decreasing function with \(\int _0^1 q_{Z_{a,b}}^p(\alpha ) \,d\alpha > 0\), so that \(L=\lim \limits _{\alpha \rightarrow 1} q_{Z_{a,b}}^p(\alpha ) > 0\), and we claim that

\[I:=\int_0^1 w(\alpha)\,q_{Z_{a,b}}^p(\alpha)\,d\alpha>0.\qquad(36)\]
Indeed, if \(w(\alpha )\) is a delta function at 1, (36) reduces to \(I=L>0\). Otherwise \(\lim \limits _{\alpha \rightarrow 1} w(\alpha ) > 0\), hence \(w(\alpha ^*)>0\) and \(q_{Z_{a,b}}^p(\alpha ^*)>0\) for some \(\alpha ^*<1\), and \(I \geqslant \int _{\alpha ^*}^1 w(\alpha ^*)\,q_{Z_{a,b}}^p(\alpha ^*)\,d\alpha = (1-\alpha ^*)\,w(\alpha ^*)\,q_{Z_{a,b}}^p(\alpha ^*) > 0\).
Inequality \(I>0\) implies that \(\mathcal{E}(Z)\) is well-defined and satisfies E1. Property E2 is obvious, whereas E4 is proved for \(w(\alpha )=1\) in [38, Proposition 6], and the general case holds by a similar argument. Next, we claim that

\[\Bigl(\int_0^1 w(\alpha)\,q_{(X+Y)_{a,b}}^p(\alpha)\,d\alpha\Bigr)^{1/p}\leqslant\Bigl(\int_0^1 w(\alpha)\bigl(q_{X_{a,b}}(\alpha)+q_{Y_{a,b}}(\alpha)\bigr)^p\,d\alpha\Bigr)^{1/p}\leqslant\Bigl(\int_0^1 w(\alpha)\,q_{X_{a,b}}^p(\alpha)\,d\alpha\Bigr)^{1/p}+\Bigl(\int_0^1 w(\alpha)\,q_{Y_{a,b}}^p(\alpha)\,d\alpha\Bigr)^{1/p}\qquad(37)\]
holds for all \(X,Y \in \mathcal{L}^r(\Theta )\). Indeed, the second inequality in (37) is a triangle inequality for the \(\mathcal{L}^p[0,1]\)-norm, and the first one states that

\[\int_0^1 w(\alpha)\,f(\alpha)\,d\alpha\leqslant\int_0^1 w(\alpha)\,g(\alpha)\,d\alpha\qquad(38)\]
for \(f(\alpha )=q_{(X+Y)_{a,b}}^p(\alpha )\) and \(g(\alpha )=(q_{X_{a,b}}(\alpha )+q_{Y_{a,b}}(\alpha ))^p\).
If \(f, g \in \mathcal{L}^r[0,1]\) are such that (38) holds for any non-negative non-decreasing \(w\in \mathcal{L}^1[0,1]\), we write \(g \succcurlyeq f\). The relation \(\succcurlyeq \) is
(i) transitive;

(ii) monotone, in the sense that \(f_1(\alpha ) \geqslant f_2(\alpha )\) \(\forall \alpha \in [0,1]\) implies that \(f_1 \succcurlyeq f_2\);

(iii) such that \(q_{X}(\alpha ) + q_{Y}(\alpha ) \succcurlyeq q_{X+Y}(\alpha )\) for any r.v.’s \(X,Y \in \mathcal{L}^r(\Theta )\), due to sub-additivity of the functional \(\mathcal{F}(Z) = \int _0^1 w(\alpha ) \, q_Z(\alpha ) \, d\alpha \), see [13, Proposition 4.3];

(iv) such that \(f_1 \succcurlyeq f_2\) is equivalent to \(\int _c^1 f_1(\alpha )\,d\alpha \geqslant \int _c^1 f_2(\alpha )\,d\alpha \) for all \(c\in (0,1)\), which, in turn, is equivalent to \(\int _0^1 u(f_1(\alpha ))\,d\alpha \geqslant \int _0^1 u(f_2(\alpha ))\,d\alpha \) for all convex increasing u, see [35, Theorem 8]; and

(v) such that \(f_1 \succcurlyeq f_2\) implies \(u(f_1) \succcurlyeq u(f_2)\) for any convex increasing function u, which follows from (iv) and the fact that a superposition of two convex increasing functions is convex increasing.
Properties (i)–(iii) imply that

\[q_{X_{a,b}}(\alpha)+q_{Y_{a,b}}(\alpha)\succcurlyeq q_{(X+Y)_{a,b}}(\alpha),\]
and since the function \(\xi (z)=z^p\) is convex increasing for \(z\geqslant 0\), (38) follows from (v). This finishes the proof of the “if” part in (b).
Now we prove the “only if” part. Let \(\mathcal{E}\) be an error measure that can be represented in the form of either (21) or (23). Since \(\mathcal{E}(Z)\) assumes all values in \([0,+\infty )\), h is a strictly increasing continuous function with \(h(0)=0\) and has a strictly increasing continuous inverse function \(h^{-1}:\mathbb {R}^+\rightarrow \mathbb {R}^+\). Applying \(h^{-1}\) to both parts of either (21) or (23) and setting \(Z=t\), we obtain

\[h^{-1}\bigl(\mathcal{E}(t)\bigr)=\rho(t)\,M(0,1).\]
Consequently, \(M(0,1)\ne 0\) and \(\rho (t) = \frac{1}{M(0,1)}h^{-1}(\mathcal{E}(t))\). If M and \(\rho \) are replaced by \(-M\) and \(-\rho \), respectively, then \(\mathcal{E}\) in (21) remains unchanged. Consequently, without loss of generality, we may assume that \(M(0,1)>0\). Positive homogeneity of \(\mathcal{E}\) implies that

\[\rho(t)=\frac{1}{M(0,1)}\,\varphi(t_{a,b}),\]
where \(\varphi =h^{-1}\), \(t_{a,b}\) is given by (3), \(a=\mathcal{E}(1)>0\) and \(b=\mathcal{E}(-1)>0\). In particular, both (21) and (23) imply that

\[\mathcal{E}(Z)=h\Bigl(\frac{1}{M(0,1)}\int_0^1\varphi\bigl(q_{aZ}(\alpha)\bigr)\,M(d\alpha)\Bigr)\quad\text{for } Z\geqslant 0,\qquad(39)\]
where we used \(q_{\varphi \left( aZ\right) }(\alpha )=\varphi (q_{aZ}(\alpha ))\).
If \(M(0,\alpha )=0\) for all \(\alpha <1\), (21) reduces to \(\mathcal{E}(Z)= a\,[\sup \, Z]_+ +b\,[\sup \, Z]_-\), which is not an error measure (property E1 fails), whereas (23) simplifies to \(\mathcal{E}(Z)=\sup (Z_{a,b})\), which is a particular case of (25) with w being the Dirac delta function at 1. Otherwise there exists \(\alpha \in (0,1)\) such that \(q=M(0,\alpha )/M(0,1)>0\). Since \(\Theta \) is atomless, there exists an event \(A\in \mathcal{M}\) with \(\mathbb {P}[A]=\alpha \). Let \(0 \leqslant c \leqslant d\), and let Z be an r.v. such that \(Z(\omega )=c/a\) for \(\omega \in A\) and \(Z(\omega )=d/a\) for \(\omega \not \in A\). Then (39) implies that

\[h\bigl(q\,\varphi(\lambda c)+(1-q)\,\varphi(\lambda d)\bigr)=\lambda\,h\bigl(q\,\varphi(c)+(1-q)\,\varphi(d)\bigr)\qquad(40)\]
for any \(\lambda \geqslant 0\). Expression (40) coincides with (29), and the proof of Proposition 1 implies that \(\varphi \) should be in the form of \(\varphi (y)=\varphi (1)y^p\), \(p>0\). Consequently,

\[\rho(t)=\frac{\varphi(1)}{M(0,1)}\,t_{a,b}^p\qquad(41)\]
and

\[h(x)=\Bigl(\frac{x}{\varphi(1)}\Bigr)^{1/p}.\qquad(42)\]
In particular, (39) simplifies to

\[\mathcal{E}(Z)=\Bigl(\frac{1}{M(0,1)}\int_0^1 q_{aZ}^p(\alpha)\,M(d\alpha)\Bigr)^{1/p}\quad\text{for } Z\geqslant 0.\qquad(43)\]
Let \(0=\alpha _0\leqslant \alpha _1<\alpha _2<\alpha _3\leqslant \alpha _4=1\) be such that \(\alpha _2-\alpha _1=\alpha _3-\alpha _2\), and let \(M_i=M(\alpha_{i-1},\alpha_i)\), \(i=1,\dots,4\).
Since \(\Theta \) is atomless, there exist events \(A,B \in \mathcal{M}\) such that \(\mathbb {P}[A]=\mathbb {P}[B]=\alpha _2\) and \(\mathbb {P}[A \cap B]=\alpha _1\). Subadditivity of \(\mathcal{E}\) implies that
where I is an indicator function. With (43), this yields
which simplifies to

\[M_3\bigl[(2+2\epsilon)^p-(2+\epsilon)^p\bigr]\geqslant M_2\bigl[(2+\epsilon)^p-2^p\bigr].\qquad(44)\]
Dividing both parts of (44) by \(\epsilon >0\) and taking the limit \(\epsilon \rightarrow 0^+\), we obtain \(p2^{p-1}M_3\geqslant p2^{p-1}M_2\), or \(M_3\geqslant M_2\). This implies that the measure \(M(d\alpha )\) has a non-decreasing density w on [0, 1], which may also include Dirac delta functions at the endpoints of the interval.
By selecting \(\alpha _1=\alpha _2-\delta \) and \(\alpha _3=\alpha _2+\delta \) and by taking \(\delta \rightarrow 0^+\), we can make \(M_3\) arbitrarily close to \(M_2\). Consequently, (44) may hold only if \((2+2\epsilon )^p - (2+\epsilon )^p\geqslant (2+\epsilon )^p-2^p\). With \(\epsilon =1\), this inequality reduces to \(4^p - 2\cdot 3^p + 2^p\geqslant 0\) and implies that \(p\geqslant 1\). If \(\mathcal{E}\) can be represented in the form of (23), inequality \(p \geqslant 1\) along with (41) and (42) yields (25). Moreover, \(\int _0^1 w(\alpha )\,d\alpha =M(0,1)>0\). To prove (b), it remains to verify that w is non-negative.
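The last implication is midpoint convexity of \(\xi(z)=z^p\) at the points 2 and 4, since \(3=(2+4)/2\) (a worked check, added for clarity): for \(p\geqslant1\), \(\xi\) is convex on \([0,\infty)\) and \(2\cdot3^p\leqslant2^p+4^p\) holds, whereas for \(p<1\), \(\xi\) is strictly concave and the inequality reverses; e.g., for \(p=\frac{1}{2}\),

\[4^{1/2}-2\cdot3^{1/2}+2^{1/2}\approx 2-3.4641+1.4142=-0.0499<0.\]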
Let \(a\geqslant b\) in (25)—the case \(a \leqslant b\) is treated similarly. Since \(\Theta \) is atomless, for every \(\alpha \in (0,1/2]\), there exist events \(A,B \in \mathcal{M}\) such that \(\mathbb {P}[A]=\mathbb {P}[B]=\alpha \) and \(\mathbb {P}[A \cap B]=0\). Subadditivity of \(\mathcal{E}\) implies that
With (25), this yields
which simplifies to

\[M(0,2\alpha)\leqslant 2^p\,M(0,\alpha).\qquad(45)\]
Let \(\alpha ^*=\sup \{\alpha : w(\alpha )<0\}\). Since \(w(\alpha )\) is non-decreasing, (45) fails for \(\alpha =\alpha ^*/2\), and consequently, \(\alpha ^*=0\). Then \(\lim \limits _{\alpha \rightarrow 0} M(\alpha , 2\alpha ) \leqslant \lim \limits _{\alpha \rightarrow 0} \alpha w(2\alpha ) = 0\), so that \(\lim \limits _{\alpha \rightarrow 0} M(0, \alpha ) \geqslant 0\) by (45), which implies that w has no negative delta function at 0 as well. This finishes the proof of (b).
Finally, suppose that \(\mathcal{E}\) has the form of (21). Then an analogue of (43) for negative r.v.’s is given by

\[\mathcal{E}(Z)=\Bigl(\frac{1}{M(0,1)}\int_0^1\bigl(-b\,q_{Z}(\alpha)\bigr)^p\,M(d\alpha)\Bigr)^{1/p}\quad\text{for } Z\leqslant 0.\qquad(46)\]
Since \(q_{-Z}(\alpha )=-q_{Z}(1-\alpha )\) for almost all \(\alpha \in (0,1)\), (46) can be written as

\[\mathcal{E}(Z)=\Bigl(\frac{1}{M(0,1)}\int_0^1 q_{bZ'}^p(\alpha)\,M'(d\alpha)\Bigr)^{1/p},\]
where \(Z'=-Z\) and \(M'\) is a measure such that \(M'(c,d)=M(1-d,1-c)\) for any interval (c, d). The last expression coincides with (43), and the same argument implies that \(M'(d\alpha )\) has a non-decreasing density \(w'\) on (0, 1). Since \(w'(\alpha )=w(1-\alpha )\), \(\alpha \in (0,1)\), both w and \(w'\) may be non-decreasing only if w is constant, which along with (41) and (42) yields (2a) and proves (a).
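As a concrete member of the family in (b) (a hedged illustration consistent with the trimmed \(\mathcal{L}^p\)-norms mentioned in the abstract): taking \(a=b=1\) and the non-negative non-decreasing weight \(w(\alpha)=\frac{1}{1-\beta}I_{[\beta,1]}(\alpha)\) for a trimming level \(\beta\in[0,1)\) gives

\[\mathcal{E}(Z)=\Bigl(\frac{1}{1-\beta}\int_\beta^1 q_{|Z|}^p(\alpha)\,d\alpha\Bigr)^{1/p},\]

which averages only the upper \((1-\beta)\)-part of the quantiles of |Z|; \(\beta=0\) recovers the usual \(\mathcal{L}^p\)-norm, and \(p=1\) gives the CVaR norm studied in [31].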
Cite this article
Grechuk, B., Zabarankin, M. Regression analysis: likelihood, error and entropy. Math. Program. 174, 145–166 (2019). https://doi.org/10.1007/s10107-018-1256-6