Distributed quantile regression for longitudinal big data

  • Original paper
  • Computational Statistics

Abstract

Longitudinal data, measurements taken from the same subjects over time, appear routinely in many scientific fields, such as biomedical science, public health, ecology, and environmental science. With the rapid development of information technology, modern longitudinal data sets have become massive in volume and high-dimensional, and hence often require distributed analysis in real-world applications. Standard divide-and-conquer techniques do not apply directly to longitudinal big data because of within-subject dependence. In this paper, we develop a distributed algorithm for quantile regression (QR) analysis of longitudinal big data, which remains an open and challenging problem. We employ weighted quantile regression (WQR) to accommodate the correlation in longitudinal big data, and parallelize the WQR estimation process with a two-stage algorithm to support distributed computing. Based on weights estimated in the first stage by the Newton–Raphson algorithm, the second stage solves the WQR problem using the multi-block alternating direction method of multipliers (ADMM). Simulation studies show that, compared to traditional non-distributed algorithms, our proposed method has favorable estimation accuracy and is computationally more efficient in both non-distributed and distributed environments. We also analyze an air quality data set to illustrate the practical performance of the method.
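
As a rough illustration of the kind of computation the second stage involves, the sketch below solves a weighted quantile regression problem with a plain two-block ADMM on a single machine. It is not the authors' multi-block distributed algorithm. The weights are taken as given rather than estimated by Newton–Raphson, and the names `check_loss`, `prox_check`, `wqr_admm` and the penalty parameter `sigma` are assumptions introduced here for illustration.

```python
import numpy as np


def check_loss(r, tau):
    """Quantile check loss rho_tau(r) = r * (tau - 1{r < 0}), elementwise."""
    return r * (tau - (r < 0))


def prox_check(v, tau, c):
    """Proximal operator of c * rho_tau at v (elementwise, c >= 0).

    Shrinks v toward zero by c * tau on the positive side and by
    c * (1 - tau) on the negative side.
    """
    return np.maximum(v - c * tau, 0.0) + np.minimum(v + c * (1.0 - tau), 0.0)


def wqr_admm(X, y, tau, weights=None, sigma=1.0, n_iter=2000):
    """Minimize sum_i weights[i] * rho_tau(y[i] - X[i] @ beta).

    Two-block ADMM on the split r = y - X @ beta, in scaled dual form.
    Assumes X has full column rank and weights are nonnegative.
    """
    n, _ = X.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    r = np.zeros(n)   # residual block
    u = np.zeros(n)   # scaled dual variable
    XtX = X.T @ X
    for _ in range(n_iter):
        # beta-step: least squares fit to the shifted response y - r + u
        beta = np.linalg.solve(XtX, X.T @ (y - r + u))
        # r-step: weighted check-loss prox applied to the shifted residual
        v = y - X @ beta + u
        r = prox_check(v, tau, w / sigma)
        # dual step: accumulate the constraint violation y - X beta - r
        u = v - r
    return beta


# Intercept-only median regression: the estimate approaches the sample median.
X = np.ones((5, 1))
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
beta_hat = wqr_admm(X, y, tau=0.5)
```

With equal weights, `tau=0.5`, and an intercept-only design, the problem reduces to finding the sample median, which gives a simple sanity check. In a distributed setting the residual block would be partitioned across machines, which is where a multi-block formulation comes in.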

Funding

Nan Lin’s work is supported by the NVIDIA GPU Grant Program. Ye Fan’s work is supported by the Initial Scientific Research Fund of Young Teachers in Capital University of Economics and Business [Grant No. XRZ2022062], and partially supported by the Special Fund for Basic Scientific Research of Beijing Municipal Colleges in Capital University of Economics and Business [Grant No. QNTD202207].

Author information

Corresponding author

Correspondence to Nan Lin.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 405 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fan, Y., Lin, N. & Yu, L. Distributed quantile regression for longitudinal big data. Comput Stat 39, 751–779 (2024). https://doi.org/10.1007/s00180-022-01318-0

  • DOI: https://doi.org/10.1007/s00180-022-01318-0
