A divisive hierarchical clustering methodology for enhancing the ensemble prediction power in large scale population studies: the ATHLOS project

  • Research
  • Health Information Science and Systems

Abstract

The ATHLOS cohort comprises several harmonized datasets from international studies of health and aging. On this basis, the Healthy Aging index has been constructed from a selection of variables drawn from 16 individual studies. In this paper, we consider additional variables found in ATHLOS and investigate their use for predicting the Healthy Aging index. Motivated by the volume and diversity of the dataset, we focus on data clustering, where unsupervised learning is used to enhance predictive power, and we show the predictive utility of exploiting hidden data structures. In addition, we demonstrate that the computational bottlenecks this imposes can be overcome by using appropriate hierarchical clustering within a clustering-for-ensemble-classification scheme, while retaining the prediction benefits. We propose a complete methodology and evaluate it against baseline methods and the original concept. The results are encouraging and suggest further development in this direction, along with applications to tasks with similar characteristics. A straightforward open-source implementation for the R project is also provided (https://github.com/Petros-Barmpas/HCEP).
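To make the idea of clustering for ensemble prediction more concrete, the following is a minimal Python sketch. It is not the authors' R implementation, and it makes several simplifying assumptions. The divisive step splits a cluster at the mean of its highest-variance feature, the local model in each cluster is just the mean of the target, and the ensemble averages the local predictions along the path from the root to a leaf with equal weights. The names `bisect`, `build`, `predict`, and the toy data are hypothetical and chosen only for illustration.

```python
import statistics

def bisect(rows):
    """Split rows at the mean of their highest-variance feature (a simple divisive step)."""
    dims = range(len(rows[0][0]))
    j = max(dims, key=lambda d: statistics.pvariance([x[d] for x, _ in rows]))
    cut = statistics.fmean(x[j] for x, _ in rows)
    left = [r for r in rows if r[0][j] <= cut]
    right = [r for r in rows if r[0][j] > cut]
    return j, cut, left, right

def build(rows, depth=0, max_depth=2, min_size=10):
    """Grow the cluster tree top-down; each node keeps a local model (here, the target mean)."""
    node = {"model": statistics.fmean(y for _, y in rows), "children": None}
    if depth < max_depth and len(rows) >= 2 * min_size:
        j, cut, left, right = bisect(rows)
        if len(left) >= min_size and len(right) >= min_size:
            node["split"] = (j, cut)
            node["children"] = (build(left, depth + 1, max_depth, min_size),
                                build(right, depth + 1, max_depth, min_size))
    return node

def predict(node, x):
    """Average the local predictions of every cluster on the path from root to leaf."""
    votes = []
    while True:
        votes.append(node["model"])
        if node["children"] is None:
            break
        j, cut = node["split"]
        node = node["children"][0] if x[j] <= cut else node["children"][1]
    return statistics.fmean(votes)

# Toy data (hypothetical): two well-separated groups with different target levels.
# Each row is ((feature_0, feature_1), target).
rows = [((i * 0.01, (i % 5) * 0.1), 1.0) for i in range(50)]
rows += [((5 + i * 0.01, (i % 5) * 0.1), 3.0) for i in range(50)]

tree = build(rows)
print(predict(tree, (0.1, 0.2)))  # query point inside the low-target group
print(predict(tree, (5.2, 0.1)))  # query point inside the high-target group
```

Because a single top-down pass yields a nested partition at every depth, one tree supplies all the levels that the ensemble averages over, instead of re-clustering the data once per number of clusters. In practice the per-cluster mean would be replaced by a proper learner, and the equal weighting across levels is only one possible choice.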

Acknowledgements

This work is supported by the ATHLOS (Aging Trajectories of Health: Longitudinal Opportunities and Synergies) project, funded by the European Union’s Horizon 2020 Research and Innovation Program under Grant Agreement Number 635316.

Author information

Corresponding author

Correspondence to Petros Barmpas.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (XLS 76 kb)

About this article

Cite this article

Barmpas, P., Tasoulis, S., Vrahatis, A.G. et al. A divisive hierarchical clustering methodology for enhancing the ensemble prediction power in large scale population studies: the ATHLOS project. Health Inf Sci Syst 10, 6 (2022). https://doi.org/10.1007/s13755-022-00171-1

  • DOI: https://doi.org/10.1007/s13755-022-00171-1
