Abstract
Data availability in a wide variety of domains has boosted the use of Machine Learning techniques for knowledge discovery and classification. The performance of a technique in a given classification task is significantly impacted by specific characteristics of the dataset, making the choice of the most suitable approach a challenging problem. Meta-Learning approaches, which learn from meta-features calculated from the dataset, have been successfully used to suggest the most suitable classification algorithms for specific datasets. This work proposes the adaptation of clustering measures based on internal indices to supervised problems, using them as additional meta-features when learning a recommendation system for classification tasks. The gains in performance due to Meta-Learning and the additional meta-features are investigated with experiments based on 400 datasets, representing diverse application contexts and domains. Results suggest that (i) Meta-Learning is a viable solution for recommending a classifier, (ii) the use of clustering features can contribute to the performance of the recommendation system, and (iii) the computational cost of Meta-Learning is substantially smaller than that of running all candidate classifiers to select the best one.
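The core idea of adapting internal clustering indices to supervised problems can be illustrated by treating the known class labels as a cluster partition and evaluating standard internal validity indices on that partition. The sketch below is a minimal, hypothetical illustration of this kind of meta-feature extraction (it does not reproduce the paper's exact measure set or implementation); the function name `clustering_meta_features` and the choice of indices from scikit-learn are assumptions for demonstration only.

```python
# Hypothetical sketch: internal clustering indices as supervised meta-features,
# computed by treating the class labels as a fixed cluster assignment.
from sklearn.datasets import load_iris
from sklearn.metrics import (
    silhouette_score,
    davies_bouldin_score,
    calinski_harabasz_score,
)


def clustering_meta_features(X, y):
    """Evaluate internal clustering validity indices on the class partition.

    X: feature matrix of shape (n_samples, n_features)
    y: class labels, used here in place of cluster assignments
    """
    return {
        "silhouette": silhouette_score(X, y),            # in [-1, 1]
        "davies_bouldin": davies_bouldin_score(X, y),    # lower is better
        "calinski_harabasz": calinski_harabasz_score(X, y),  # higher is better
    }


# Example on a small benchmark dataset: the resulting dictionary would be one
# row of the meta-dataset used to train a classifier-recommendation model.
X, y = load_iris(return_X_y=True)
mf = clustering_meta_features(X, y)
print(mf)
```

In a Meta-Learning pipeline, such a vector of indices would be computed for each of the 400 datasets and concatenated with conventional meta-features (e.g., statistical and information-theoretic ones) before training the recommender.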
Acknowledgment
Research carried out using the computational resources of the Center for Mathematical Sciences Applied to Industry (CeMEAI) funded by FAPESP (grant 2013/07375-0).
© 2021 Springer Nature Switzerland AG
Garcia, L.P.F., Campelo, F., Ramos, G.N., Rivolli, A., de Carvalho, A.C.P.d.L.F. (2021). Evaluating Clustering Meta-features for Classifier Recommendation. In: Britto, A., Valdivia Delgado, K. (eds) Intelligent Systems. BRACIS 2021. Lecture Notes in Computer Science(), vol 13073. Springer, Cham. https://doi.org/10.1007/978-3-030-91702-9_30
Print ISBN: 978-3-030-91701-2
Online ISBN: 978-3-030-91702-9