Abstract
A distance metric over a given data space should reflect the actual similarity between objects. The Euclidean distance between data points represented by a large number of features often fails to capture the actual relationship between those points. Objects in the same cluster, however, often share common attributes even when their geometric distance is fairly large. In this study, we propose a new method that replaces the given data space with a categorical space based on ensemble clustering (EC). The EC space is defined by tracking the membership of the points over multiple runs of clustering algorithms. To assess the suggested method, we integrated it within the framework of the Decision Tree, K Nearest Neighbors, and Random Forest classifiers. The results obtained by applying EC to 10 datasets confirmed our hypothesis that embedding the EC space as a distance metric improves performance and reduces the feature space dramatically.
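The idea of tracking cluster membership over multiple runs can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of k-means as the base clusterer, the values of `ks`, the helper names `kmeans_labels`, `ec_transform`, and `ec_distance`, and the Hamming-style distance are all assumptions made for illustration.

```python
import numpy as np

def kmeans_labels(X, k, rng, n_iter=20):
    """Plain Lloyd's k-means (assumed base clusterer); one label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels

def ec_transform(X, ks=(2, 3, 4, 5), seed=0):
    """Map each point to a categorical vector of cluster memberships.

    Column i holds the label assigned by the i-th clustering run, so the
    original features are replaced by len(ks) categorical features.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    return np.column_stack([kmeans_labels(X, k, rng) for k in ks])

def ec_distance(a, b):
    """Assumed distance: fraction of runs that put the two points in different clusters."""
    return float(np.mean(a != b))

# Two synthetic groups in 50 dimensions, reduced to 4 categorical features.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 50)), rng.normal(5, 0.1, (20, 50))])
E = ec_transform(X)
print(E.shape)  # (40, 4)
```

In this sketch the dimensionality of the EC space equals the number of clustering runs, which is how a 50-feature input shrinks to 4 categorical columns; a classifier such as k nearest neighbors could then use `ec_distance` in place of the Euclidean distance.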
Acknowledgment
This research was supported by the Max Stern Yezreel Valley College for LA and by Zefat Academic College for MY.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Abddallah, L., Yousef, M. (2018). Ensemble Clustering Based Dimensional Reduction. In: Elloumi, M., et al. Database and Expert Systems Applications. DEXA 2018. Communications in Computer and Information Science, vol 903. Springer, Cham. https://doi.org/10.1007/978-3-319-99133-7_9
DOI: https://doi.org/10.1007/978-3-319-99133-7_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99132-0
Online ISBN: 978-3-319-99133-7
eBook Packages: Computer Science, Computer Science (R0)