Abstract
Random forest (RF) is one of the most widely used machine learning algorithms because of its high predictive performance. However, many studies criticize it for generating a large number of trees, which requires substantial storage space and significant learning time. In addition, the final model induced by RF may contain redundant trees, as well as trees that do not contribute to the prediction and may even degrade performance. For this reason, many researchers try to reduce the number of trees in a forest, a process called forest pruning. This article surveys pruning methods for random forest classifiers, explains in detail the operating principle of each technique, and discusses their advantages and disadvantages. Finally, it compares their classification performance in terms of accuracy, learning speed, and complexity.
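To make the idea of forest pruning concrete, the following is a minimal sketch of one generic strategy, greedy forward selection of trees on a validation set. It is not taken from any specific method reviewed in the article. The function names (`majority_vote`, `ensemble_accuracy`, `greedy_prune`) and the toy predictions are hypothetical and chosen only for illustration; each tree is represented simply by its list of predicted labels.

```python
from collections import Counter

def majority_vote(votes):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(votes).most_common(1)[0][0]

def ensemble_accuracy(tree_preds, selected, y_true):
    """Accuracy of the majority vote taken over the selected trees only."""
    correct = 0
    for i, label in enumerate(y_true):
        votes = [tree_preds[t][i] for t in selected]
        if majority_vote(votes) == label:
            correct += 1
    return correct / len(y_true)

def greedy_prune(tree_preds, y_true, max_trees):
    """Greedy forward selection: at each step, add the tree that gives
    the highest validation accuracy for the growing sub-forest."""
    selected = []
    remaining = list(range(len(tree_preds)))
    while remaining and len(selected) < max_trees:
        best = max(
            remaining,
            key=lambda t: ensemble_accuracy(tree_preds, selected + [t], y_true),
        )
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy validation data (hypothetical): 5 trees, 6 samples.
y_val = [0, 1, 1, 0, 1, 0]
tree_preds = [
    [0, 1, 1, 0, 1, 1],  # tree 0: 5/6 correct
    [0, 1, 0, 0, 1, 0],  # tree 1: 5/6 correct
    [1, 1, 1, 0, 1, 0],  # tree 2: 5/6 correct
    [1, 0, 0, 1, 0, 1],  # tree 3: 0/6 correct
    [1, 0, 1, 1, 0, 1],  # tree 4: 1/6 correct
]

full = list(range(len(tree_preds)))
full_acc = ensemble_accuracy(tree_preds, full, y_val)
pruned = greedy_prune(tree_preds, y_val, max_trees=3)
pruned_acc = ensemble_accuracy(tree_preds, pruned, y_val)
print(f"full forest:   {len(full)} trees, accuracy {full_acc:.2f}")
print(f"pruned forest: {len(pruned)} trees, accuracy {pruned_acc:.2f}")
```

On this toy data, the two weak trees pull the full forest's vote down to 4 correct out of 6, while the pruned sub-forest of trees 0, 1, and 2 classifies all 6 samples correctly. Because the same validation set is used for both selection and evaluation, the pruned accuracy is optimistic; a real experiment would score the sub-forest on held-out data.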
Data Availability
The datasets used during the current study are freely available in the UCI repository.
Code Availability
The code will be available upon request to reviewers.
Funding
The authors did not receive support from any organization for the submitted work.
Author information
Contributions
The authors confirm their contribution to the paper as follows:
• Study conception and design: Y. Manzali, M. El far
• Analysis and interpretation of results: M. El far
• Draft manuscript preparation: Y. Manzali
• Critical revision of the article: M. El far
All authors reviewed the results and approved the final version of the manuscript.
Ethics declarations
Ethical Approval
Not applicable
Consent to Participate
Not applicable
Consent for Publication
All authors of the manuscript have agreed to authorship, have read and approved the manuscript, and have given consent for its submission.
Conflict of Interest
The authors declare no competing interests.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Manzali, Y., Elfar, M. Random Forest Pruning Techniques: A Recent Review. Oper. Res. Forum 4, 43 (2023). https://doi.org/10.1007/s43069-023-00223-6
DOI: https://doi.org/10.1007/s43069-023-00223-6