Abstract
Longitudinal data, measurements taken from the same subjects over time, appear routinely in many scientific fields, such as biomedical science, public health, ecology, and environmental science. With the rapid development of information technology, modern longitudinal data are becoming massive in volume and high-dimensional, and hence often require distributed analysis in real-world applications. Standard divide-and-conquer techniques do not apply directly to longitudinal big data because of within-subject dependence. In this paper, we develop a distributed algorithm for quantile regression (QR) analysis of longitudinal big data, which remains an open and challenging problem. We employ weighted quantile regression (WQR) to accommodate the correlation in longitudinal big data, and parallelize WQR estimation with a two-stage algorithm to support distributed computing. The first stage estimates the weights by the Newton–Raphson algorithm; given these weights, the second stage solves the WQR problem using the multi-block alternating direction method of multipliers (ADMM). Simulation studies show that, compared to traditional non-distributed algorithms, our proposed method has favorable estimation accuracy and is computationally more efficient in both non-distributed and distributed environments. We also analyze an air quality data set to illustrate the practical performance of the method.
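To make the second-stage idea concrete, the following is a minimal single-machine sketch of solving a weighted quantile regression problem with a plain two-block ADMM. It is not the multi-block distributed algorithm of the paper. The weights `w` are taken as given (the paper estimates them in a first stage, which is not reproduced here), and the function names, the splitting `X @ beta + r = y`, the penalty parameter `sigma`, and the fixed iteration count are all assumptions made for illustration.

```python
import numpy as np


def check_loss(r, tau):
    """Quantile check loss rho_tau(r) = r * (tau - 1{r < 0}), elementwise."""
    return r * (tau - (r < 0))


def wqr_objective(beta, X, y, w, tau):
    """Weighted quantile regression objective: sum_i w_i * rho_tau(y_i - x_i' beta)."""
    return float(np.sum(w * check_loss(y - X @ beta, tau)))


def prox_weighted_check(v, w, tau, sigma):
    """Closed-form minimizer over r of w * rho_tau(r) + (sigma / 2) * (r - v)**2."""
    upper = np.maximum(v - w * tau / sigma, 0.0)          # branch with r > 0
    lower = np.minimum(v + w * (1.0 - tau) / sigma, 0.0)  # branch with r < 0
    return upper + lower                                  # both are zero in between


def wqr_admm(X, y, w, tau=0.5, sigma=1.0, n_iter=2000):
    """Two-block ADMM for min sum_i w_i * rho_tau(r_i) subject to X @ beta + r = y.

    Illustrative sketch only: the weights w are assumed to be supplied,
    and a fixed number of iterations replaces a proper stopping rule.
    """
    n, p = X.shape
    r = np.zeros(n)  # residual variable
    u = np.zeros(n)  # scaled dual variable
    # Precompute the least-squares operator (X'X)^{-1} X' used in every beta update.
    ls_op = np.linalg.solve(X.T @ X, X.T)
    beta = np.zeros(p)
    for _ in range(n_iter):
        beta = ls_op @ (y - r - u)                                   # beta update
        r = prox_weighted_check(y - X @ beta - u, w, tau, sigma)     # r update
        u = u + X @ beta + r - y                                     # dual update
    return beta


if __name__ == "__main__":
    # Intercept-only design: with equal weights and tau = 0.5 the
    # estimate should approach the sample median of y.
    y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
    X = np.ones((5, 1))
    beta_hat = wqr_admm(X, y, np.ones(5), tau=0.5)
    print(beta_hat, wqr_objective(beta_hat, X, y, np.ones(5), 0.5))
```

Only the `r` update depends on the weights and on `tau`; the `beta` update is an ordinary least-squares solve. This separation is one reason ADMM-type splittings are attractive for quantile regression on large data.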
Funding
Nan Lin’s work is supported by the NVIDIA GPU Grant Program. Ye Fan’s work is supported by the Initial Scientific Research Fund of Young Teachers in Capital University of Economics and Business [Grant No. XRZ2022062], and partially supported by the Special Fund for Basic Scientific Research of Beijing Municipal Colleges in Capital University of Economics and Business [Grant No. QNTD202207].
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this work.
About this article
Cite this article
Fan, Y., Lin, N. & Yu, L. Distributed quantile regression for longitudinal big data. Comput Stat 39, 751–779 (2024). https://doi.org/10.1007/s00180-022-01318-0
DOI: https://doi.org/10.1007/s00180-022-01318-0