Abstract
Data availability in a wide variety of domains has boosted the use of Machine Learning techniques for knowledge discovery and classification. The performance of a technique in a given classification task is significantly impacted by specific characteristics of the dataset, making the choice of the most suitable approach a challenging problem. Meta-Learning approaches, which learn from meta-features calculated from the dataset, have been successfully used to suggest the most suitable classification algorithms for specific datasets. This work proposes the adaptation of clustering measures based on internal indices to supervised problems, using them as additional meta-features when learning a recommendation system for classification tasks. The gains in performance due to Meta-Learning and the additional meta-features are investigated with experiments based on 400 datasets, representing diverse application contexts and domains. Results suggest that (i) Meta-Learning is a viable solution for recommending a classifier, (ii) the use of clustering features can contribute to the performance of the recommendation system, and (iii) the computational cost of Meta-Learning is substantially smaller than that of running all candidate classifiers to select the best one.
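The core idea of adapting internal clustering indices to supervised problems can be illustrated by treating the known class labels as a cluster partition and evaluating standard internal validity indices on that partition. The sketch below is a minimal, hypothetical illustration of this kind of meta-feature extraction (it does not reproduce the paper's exact measure set or implementation); the function name `clustering_meta_features` and the choice of indices from scikit-learn are assumptions for demonstration only.

```python
# Hypothetical sketch: internal clustering indices as supervised meta-features,
# computed by treating the class labels as a fixed cluster assignment.
from sklearn.datasets import load_iris
from sklearn.metrics import (
    silhouette_score,
    davies_bouldin_score,
    calinski_harabasz_score,
)


def clustering_meta_features(X, y):
    """Evaluate internal clustering validity indices on the class partition.

    X: feature matrix of shape (n_samples, n_features)
    y: class labels, used here in place of cluster assignments
    """
    return {
        "silhouette": silhouette_score(X, y),            # in [-1, 1]
        "davies_bouldin": davies_bouldin_score(X, y),    # lower is better
        "calinski_harabasz": calinski_harabasz_score(X, y),  # higher is better
    }


# Example on a small benchmark dataset: the resulting dictionary would be one
# row of the meta-dataset used to train a classifier-recommendation model.
X, y = load_iris(return_X_y=True)
mf = clustering_meta_features(X, y)
print(mf)
```

In a Meta-Learning pipeline, such a vector of indices would be computed for each of the 400 datasets and concatenated with conventional meta-features (e.g., statistical and information-theoretic ones) before training the recommender.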
Acknowledgment
Research carried out using the computational resources of the Center for Mathematical Sciences Applied to Industry (CeMEAI) funded by FAPESP (grant 2013/07375-0).
© 2021 Springer Nature Switzerland AG
Garcia, L.P.F., Campelo, F., Ramos, G.N., Rivolli, A., de Carvalho, A.C.P.d.L.F. (2021). Evaluating Clustering Meta-features for Classifier Recommendation. In: Britto, A., Valdivia Delgado, K. (eds) Intelligent Systems. BRACIS 2021. Lecture Notes in Computer Science(), vol 13073. Springer, Cham. https://doi.org/10.1007/978-3-030-91702-9_30
Print ISBN: 978-3-030-91701-2
Online ISBN: 978-3-030-91702-9