Abstract
The widespread use of quantile regression methods depends crucially on the existence of fast algorithms. Despite numerous algorithmic improvements, computation time remains non-negligible because researchers often estimate many quantile regressions and use the bootstrap for inference. We suggest two new fast algorithms for estimating a sequence of quantile regressions at many quantile indexes. The first algorithm applies the preprocessing idea of Portnoy and Koenker (Stat Sci 12(4):279–300, 1997) but exploits a previously estimated quantile regression to guess the sign of the residuals. This step reduces the effective sample size. The second algorithm starts from a previously estimated quantile regression at a similar quantile index and updates it using a single Newton–Raphson iteration. The first algorithm is exact, while the second is only asymptotically equivalent to the traditional quantile regression estimator. We also apply the preprocessing idea to the bootstrap by using the sample estimates to guess the sign of the residuals in the bootstrap sample. Simulations show that our new algorithms provide very large improvements in computation time without significant (if any) cost in the quality of the estimates. For instance, we reduce by a factor of 100 the time required to estimate 99 quantile regressions with 20 regressors and 50,000 observations.
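The preprocessing idea can be illustrated in the simplest special case, an intercept-only model, where the quantile regression estimate is just a sample quantile. The sketch below is not the authors' algorithm for the general regression case: the function name `preprocessed_quantile`, the arguments `guess` and `half_width`, and the rule of doubling the band after a wrong guess are all assumptions made for illustration. It shows the three steps named in the abstract: use a preliminary estimate to guess the sign of most residuals, solve the problem on the remaining small subsample, and verify that the guesses were right so that the result stays exact.

```python
import math
import random


def preprocessed_quantile(y, tau, guess, half_width):
    """Return (exact tau-quantile of y, number of observations sorted).

    Intercept-only illustration (an assumption, not the paper's code).
    Observations outside [guess - w, guess + w] are only counted: the sign
    of their residual is guessed from the preliminary estimate `guess`.
    Only observations inside the band are sorted. If the target order
    statistic is not inside the band, some sign guesses were wrong, so the
    band is doubled and the step is repeated.
    """
    if not y:
        raise ValueError("y must not be empty")
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    if half_width <= 0.0:
        raise ValueError("half_width must be positive")
    n = len(y)
    k = max(1, math.ceil(n * tau))  # rank of the target order statistic
    w = half_width
    while True:
        lo, hi = guess - w, guess + w
        # Guessed negative residuals: observations strictly below the band.
        n_below = sum(1 for v in y if v < lo)
        # Effective sample: only these observations are sorted.
        inside = sorted(v for v in y if lo <= v <= hi)
        # Verification: the k-th order statistic must lie inside the band.
        if n_below < k <= n_below + len(inside):
            return inside[k - n_below - 1], len(inside)
        w *= 2.0  # wrong guesses: enlarge the band and retry


if __name__ == "__main__":
    random.seed(0)
    y = [random.gauss(0.0, 1.0) for _ in range(5000)]
    # The guess stands in for an estimate at a nearby quantile index.
    q, n_sorted = preprocessed_quantile(y, tau=0.3, guess=-0.5, half_width=0.2)
    print(q, n_sorted, "of", len(y), "observations sorted")
```

In the regression case, the observations outside the band cannot simply be counted; in the Portnoy and Koenker approach they are aggregated into pseudo-observations before the reduced problem is solved. The verification step plays the same role in both settings.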
Notes
Among others, see Chapter 1 in Stigler (1986).
See Koenker and Bassett (1978).
On the other hand, note that quantile regression is not robust to outliers in the x-direction.
Parametric programming is a technique for investigating the effects of a change in the parameters (here of the quantile index \(\tau \)) of the objective function.
See, e.g., Chernozhukov et al. (2013).
It is actually possible to use the estimates from the previous quantile regression as starting values for the next quantile regression. These better starting values allow for reducing the computing time and are, therefore, used by all our algorithms.
Schmidt and Zhu (2016) have suggested a different iterative estimation strategy. They also start from one quantile regression, but they add or subtract sums of nonnegative functions to it to calculate other quantiles. Their procedure has a different objective (monotonicity of the estimated conditional quantile function), and their estimator is not asymptotically equivalent to the traditional quantile regression estimator.
A similar idea could be applied to adjust the constant m in Algorithm 2. The additional difficulty is that the optimal constant probably depends on the quantile index \(\tau \), which is not the case for the bootstrap.
Algorithm 2 can be slightly improved by using preprocessing with \(\hat{\beta }(\tau _1)\) as a preliminary estimate of \(\hat{\beta }^{*b}(\tau _1)\) instead of computing it completely from scratch.
We provide the results for the median regression, but the ranking was similar at other quantile indexes.
To make the estimates comparable across quantiles and regressors, we first normalize them such that they have unit variance in the specification with \(n=50,000\). Then, we calculate the measures of performance separately for each parameter and average them over all quantile indices and regressors. The reported relative MSE and MAE are the averaged relative MSE and MAE. Alternatively, it is possible to calculate the ratio of the averaged MSE and MAE with the results in Table 2. These ratios of averages and averages of ratios are very similar.
In small samples, they are both sensitive to the exact choice of the bandwidth. Pouliot (2020) gives simulation evidence of this sensitivity.
See the supplementary appendix SA to Chernozhukov et al. (2013) for the construction and the validity of the uniform bands.
The largest p-value is 0.02 for high school.
Pouliot (2020) suggests a preprocessing algorithm for instrumental variable quantile regression.
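The note on relative MSE and MAE distinguishes averages of ratios from ratios of averages. A minimal sketch of the difference follows, using made-up per-parameter MSE values that are purely illustrative and not taken from the paper; the function names are hypothetical. It also shows one reason the two summaries can be close after normalization: when the benchmark MSEs are equal across parameters, the two measures coincide.

```python
from math import isclose
from statistics import mean


def average_of_ratios(mse_new, mse_ref):
    """Mean over parameters of the per-parameter relative MSE."""
    return mean(a / b for a, b in zip(mse_new, mse_ref))


def ratio_of_averages(mse_new, mse_ref):
    """Averaged MSE of the new estimator over averaged benchmark MSE."""
    return mean(mse_new) / mean(mse_ref)


# Hypothetical per-parameter MSEs (illustrative numbers, not from the paper).
mse_ref = [1.0, 4.0]
mse_new = [1.5, 4.0]
aor_raw = average_of_ratios(mse_new, mse_ref)  # (1.5 + 1.0) / 2
roa_raw = ratio_of_averages(mse_new, mse_ref)  # 2.75 / 2.5

# Rescale each parameter so that the benchmark MSE equals one, a stand-in
# (assumption) for the unit-variance normalization described in the note.
mse_new_norm = [a / b for a, b in zip(mse_new, mse_ref)]
mse_ref_norm = [1.0 for _ in mse_ref]
aor_norm = average_of_ratios(mse_new_norm, mse_ref_norm)
roa_norm = ratio_of_averages(mse_new_norm, mse_ref_norm)

print(aor_raw, roa_raw)    # the two summaries differ on the raw scale
print(aor_norm, roa_norm)  # and coincide once benchmark MSEs are equal
```

The average of ratios does not depend on the scale of each parameter, whereas the ratio of averages does, which is why a normalization step matters for the latter.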
References
Abrevaya J (2001) The effects of demographics and maternal behavior on the distribution of birth outcomes. Empir Econ 26(1):247–257
Angrist J, Chernozhukov V, Fernández-Val I (2006) Quantile regression under misspecification, with an application to the US wage structure. Econometrica 74:539–563
Baidal JAW, Locks LM, Cheng ER, Blake-Lamb TL, Perkins ME, Taveras EM (2016) Risk factors for childhood obesity in the first 1,000 days: a systematic review. Am J Prev Med 50(6):761–779
Barrodale I, Roberts F (1974) Solution of an overdetermined system of equations in the l1 norm [F4]. Commun ACM 17(6):319–320
Belloni A, Chernozhukov V, Fernández-Val I, Hansen C (2017) Program evaluation and causal inference with high-dimensional data. Econometrica 85(1):233–298
Black SE, Devereux PJ, Salvanes KG (2007) From the cradle to the labor market? The effect of birth weight on adult outcomes. Q J Econ 122(1):409–439
Chernozhukov V, Fernández-Val I (2005) Subsampling inference on quantile regression processes. Sankhya Indian J Stat 67:253–276
Chernozhukov V, Fernández-Val I (2011) Inference for extremal conditional quantile models, with an application to market and birthweight risks. Rev Econ Stud 78(2):559–589
Chernozhukov V, Hansen C (2006) Instrumental quantile regression inference for structural and treatment effect models. J Econom 132:491–525
Chernozhukov V, Fernández-Val I, Melly B (2013) Inference on counterfactual distributions. Econometrica 81(6):2205–2268
Fortin N, Lemieux T, Firpo S (2011) Decomposition methods in economics. In: Handbook of labor economics, vol 4. Elsevier, pp 1–102
Giné E, Zinn J (1984) Some limit theorems for empirical processes. Ann Probab 12(4):929–989
Hagemann A (2017) Cluster-robust bootstrap inference in quantile regression models. J Am Stat Assoc 112(517):446–456
Hahn J (1997) Bayesian bootstrap of the quantile regression estimator: a large sample study. Int Econ Rev 38:795–808
Hall P, Sheather SJ (1988) On the distribution of a studentized quantile. J R Stat Soc Ser B 50:381–391
He F, Cheng Y, Tong T (2016) Estimation of extreme conditional quantiles through an extrapolation of intermediate regression quantiles. Stat Probab Lett 113:30–37
Kline P, Santos A (2012) A score based approach to wild bootstrap inference. J Econom Methods 1(1):23–41
Koenker R (2000) Galton, Edgeworth, Frisch, and prospects for quantile regression in econometrics. J Econom 95(2):347–374
Koenker R (2005) Quantile regression. Cambridge University Press, Cambridge
Koenker R (2017) Computational methods for quantile regression. In: Handbook of quantile regression. Chapman and Hall/CRC, pp 55–67
Koenker R, Bassett G (1978) Regression quantiles. Econometrica 46:33–50
Koenker R, d’Orey V (1994) Remark AS R92: a remark on Algorithm AS 229: computing dual regression quantiles and regression rank scores. J R Stat Soc Ser C (Appl Stat) 43(2):410–414
Koenker R, Hallock KF (2001) Quantile regression. J Econ Perspect 15:143–156
Koenker R, Portnoy S (1987) L-estimation for linear models. J Am Stat Assoc 82(399):851–857
Koenker R, Xiao Z (2002) Inference on the quantile regression process. Econometrica 70:1583–1612
Koenker R, Chernozhukov V, He X, Peng L (2017) Handbook of quantile regression. CRC Press, Boca Raton
Koenker RW, d’Orey V (1987) Algorithm AS 229: computing regression quantiles. J R Stat Soc Ser C (Appl Stat) 36(3):383–393
Kordas G (2006) Smoothed binary regression quantiles. J Appl Econom 21(3):387–407
Le Cam L (1956) On the asymptotic theory of estimation and testing hypotheses. In: Proceedings of the third Berkeley symposium on mathematical statistics and probability, vol 1: contributions to the theory of statistics. University of California Press, Berkeley, pp 129–156
Machado J, Mata J (2005) Counterfactual decomposition of changes in wage distributions using quantile regression. J Appl Econom 20:445–465
Mammen E (1993) Bootstrap and wild bootstrap for high dimensional linear models. Ann Stat 21(1):255–285
Manski CF (1975) Maximum score estimation of the stochastic utility model of choice. J Econom 3:205–228
Neocleous T, Portnoy S (2008) On monotonicity of regression quantile functions. Stat Probab Lett 78(10):1226–1229
Portnoy S (1991) Asymptotic behavior of the number of regression quantile breakpoints. SIAM J Sci Stat Comput 12(4):867–883
Portnoy S, Koenker R (1997) The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators. Stat Sci 12(4):279–300
Pouliot GA (2020) Instrumental variables quantile regression with multivariate endogenous variable. Unpublished manuscript
Powell JL (1987) Semiparametric estimation of bivariate latent variable models. Unpublished manuscript, University of Wisconsin-Madison
Powell JL (1991) Estimation of monotonic regression models under quantile restrictions. In: Nonparametric and semiparametric methods in econometrics. Cambridge University Press, New York, pp 357–384
Schmidt L, Zhu Y (2016) Quantile spacings: a simple method for the joint estimation of multiple quantiles without crossing. Available at SSRN 2220901
Stigler SM (1986) The history of statistics: the measurement of uncertainty before 1900. Harvard University Press, Cambridge
Thisted RA (1997) The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators: comment. Stat Sci 12(4):296–298
van der Vaart A (1998) Asymptotic statistics. Cambridge University Press, Cambridge
Volgushev S, Chao SK, Cheng G (2019) Distributed inference for quantile regression processes. Ann Stat 47(3):1634–1662
Wang H, Li D, He X (2012) Estimation of high conditional quantiles for heavy-tailed distributions. J Am Stat Assoc 107:1453–1464
Acknowledgements
We would like to thank the associate editor Roger Koenker, two anonymous referees, and the participants in the conference “Economic Applications of Quantile Regressions 2.0”, held at the Nova School of Business and Economics, for useful comments.
Cite this article
Chernozhukov, V., Fernández-Val, I. & Melly, B. Fast algorithms for the quantile regression process. Empir Econ 62, 7–33 (2022). https://doi.org/10.1007/s00181-020-01898-0
DOI: https://doi.org/10.1007/s00181-020-01898-0