Abstract
Accurate data characterization is essential for reliably selecting clustering algorithms via meta-learning. This work evaluates a set of measures for characterizing clustering problems, using beta regression and two well-known machine learning regression techniques as meta-models. We identify a subset of meta-features that proves more effective at characterizing the clustering datasets. In addition, secondary findings allowed us to verify the direction and magnitude of the influence of these measures, as well as their importance, in predicting the performance of the algorithms under analysis.
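To make the role of beta regression as a meta-model more concrete, the following is a minimal sketch of its log-likelihood under the usual mean-precision parameterization with a logit link. It is not the authors' implementation. It assumes the performance score being predicted lies strictly between 0 and 1, and the names `beta_loglik`, `coefs`, and `phi`, as well as the toy data, are hypothetical.

```python
import math


def inv_logit(eta):
    """Map a linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def beta_loglik(y, X, coefs, phi):
    """Log-likelihood of a beta regression model (illustrative sketch).

    y     : performance scores, each strictly inside (0, 1) (assumption)
    X     : rows of meta-feature values, one row per dataset
    coefs : regression coefficients, one per column of X
    phi   : precision parameter (> 0); larger means less dispersion
    """
    total = 0.0
    for yi, xi in zip(y, X):
        eta = sum(c * v for c, v in zip(coefs, xi))
        mu = inv_logit(eta)        # mean of the beta distribution
        shape_a = mu * phi         # first shape parameter
        shape_b = (1.0 - mu) * phi  # second shape parameter
        total += (
            math.lgamma(phi)
            - math.lgamma(shape_a)
            - math.lgamma(shape_b)
            + (shape_a - 1.0) * math.log(yi)
            + (shape_b - 1.0) * math.log(1.0 - yi)
        )
    return total


if __name__ == "__main__":
    # Hypothetical toy data: a constant column (intercept) plus one meta-feature.
    X = [[1.0, 0.2], [1.0, 0.5], [1.0, 0.9]]
    y = [0.55, 0.70, 0.85]
    print(beta_loglik(y, X, coefs=[0.0, 1.5], phi=20.0))
```

With a logit link, the sign of each coefficient gives the direction in which the corresponding meta-feature moves the expected performance, and its size gives the strength of that effect on the log-odds scale. Fitting the model means maximizing this function over `coefs` and `phi`, which the sketch leaves to an optimizer of the reader's choice.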
Acknowledgements
To the Brazilian research agency CNPq.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Fernandes, L.H.d.S., de Souto, M.C.P., Lorena, A.C. (2021). Evaluating Data Characterization Measures for Clustering Problems in Meta-learning. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13108. Springer, Cham. https://doi.org/10.1007/978-3-030-92185-9_51
DOI: https://doi.org/10.1007/978-3-030-92185-9_51
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92184-2
Online ISBN: 978-3-030-92185-9
eBook Packages: Computer Science, Computer Science (R0)