Abstract
Random Forest is one of the most popular supervised machine learning algorithms; it combines an ensemble of decision trees in order to discover more rules and to ensure diversity. However, growing a large number of trees tends to produce redundant ones, which can hurt storage requirements, computational time, attainable performance, and interpretability. Many methods have been proposed in the literature to select a sub-forest while maintaining, or even improving, the performance of the whole ensemble. In this paper, a new sub-forest selection method is proposed with two goals: first, selecting the smallest possible number of trees, and second, preserving or even improving the accuracy of the original ensemble. A noisy-variable technique is introduced as an indicator of underperforming trees: a generated noise variable is injected into the feature space at each node during tree construction, and trees that rely on it are considered noisy and eliminated from the final sub-forest. To validate the proposed method, we employed real and artificial benchmark datasets. The results confirm that the generated sub-forest is small and achieves high performance compared with state-of-the-art algorithms.
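The sketch below is not the authors' implementation but a simplified illustration of the idea using scikit-learn: instead of injecting the noisy variable into the candidate features at every node, a single random column is appended to the data before training, and trees whose impurity-based importance for that column is non-zero are treated as noisy and dropped. The zero-importance threshold, the keep-half fallback, and the subforest_predict helper are assumptions made for the example, not part of the paper's procedure.

```python
# Minimal sketch of the noisy-feature pruning idea (assumptions noted above).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
X_noisy = np.hstack([X, rng.normal(size=(X.shape[0], 1))])  # append one noise column
noise_idx = X_noisy.shape[1] - 1

X_tr, X_te, y_tr, y_te = train_test_split(X_noisy, y, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Impurity-based importance of the noise column inside each individual tree.
noise_imp = np.array([t.feature_importances_[noise_idx] for t in forest.estimators_])

# Keep trees that never split on the noise column; if none qualify, fall back
# to the half of the trees that rely on it the least (an assumption, not the
# paper's rule).
kept = [t for t, imp in zip(forest.estimators_, noise_imp) if imp == 0.0] or \
       [forest.estimators_[i] for i in np.argsort(noise_imp)[:len(forest.estimators_) // 2]]

def subforest_predict(trees, X):
    """Majority vote of the retained trees (binary labels 0/1)."""
    votes = np.stack([t.predict(X) for t in trees])
    return (votes.mean(axis=0) >= 0.5).astype(int)

print(f"kept {len(kept)} of {len(forest.estimators_)} trees")
print("full forest accuracy:", accuracy_score(y_te, forest.predict(X_te)))
print("sub-forest accuracy :", accuracy_score(y_te, subforest_predict(kept, X_te)))
```

On a dataset of this kind the retained sub-forest is typically much smaller than the original ensemble while keeping comparable test accuracy, which is the behaviour the paper reports for its per-node injection scheme.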
References
Cite this article
Manzali, Y., Akhiat, Y., Chahhou, M. et al. Reducing the number of trees in a forest using noisy features. Evolving Systems 14, 157–174 (2023). https://doi.org/10.1007/s12530-022-09441-5