
Random Forest Pruning Techniques: A Recent Review

  • Review
  • Published:
Operations Research Forum

Abstract

Random forest is one of the most widely used machine learning algorithms because of its high predictive performance. However, many studies criticize it for generating a large number of trees, which requires considerable storage space and a long training time. In addition, the final model induced by RF may contain redundant trees, as well as trees that contribute nothing to the prediction and may even harm performance. For this reason, many researchers try to reduce the number of trees in the forest, a process known as forest pruning. This article presents a study of work on pruning random forest classifiers, explains in detail the operating principle of each technique, and discusses their advantages and disadvantages. Finally, it compares their classification performance in terms of accuracy, learning speed, and complexity.
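
To make the idea concrete, here is a minimal, hypothetical sketch (not taken from the article) of the simplest family of techniques the review covers: ordering-based pruning, in which the trees of a trained forest are ranked by a score on a held-out validation set and only the top k are retained. The dataset, the ranking criterion (individual tree accuracy), and the sub-forest size k = 20 are illustrative assumptions, not the procedure of any single surveyed paper.

```python
# Minimal sketch of ordering-based forest pruning: rank the trees of a
# trained random forest by individual validation accuracy and keep the
# k best. Dataset, criterion, and k are illustrative assumptions only.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Train the full forest.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Score each tree individually on the validation set and rank them.
scores = [accuracy_score(y_val, tree.predict(X_val)) for tree in forest.estimators_]
order = np.argsort(scores)[::-1]  # best trees first

# Keep the k highest-ranked trees and predict by majority vote.
k = 20
votes = np.stack([forest.estimators_[i].predict(X_test) for i in order[:k]])
y_pred = (votes.mean(axis=0) >= 0.5).astype(int)  # binary labels {0, 1}

print(f"full forest (200 trees): {forest.score(X_test, y_test):.3f}")
print(f"pruned forest ({k} trees): {accuracy_score(y_test, y_pred):.3f}")
```

Ranking trees by their individual accuracy is only the crudest possible criterion; the approaches surveyed in the article replace it with, for example, margin-, diversity-, or search-based selection of the sub-forest.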


Data Availability

The datasets used during the current study are freely available in the UCI repository.

Code Availability

The code will be available upon request to reviewers.


Funding

The authors did not receive support from any organization for the submitted work.

Author information

Contributions

The authors confirm their contribution to the paper as follows:

• Study conception and design: Y. Manzali, M. Elfar

• Analysis and interpretation of results: M. Elfar

• Draft manuscript preparation: Y. Manzali

• Critical revision of the article: M. Elfar

All authors reviewed the results and approved the final version of the manuscript.

Corresponding author

Correspondence to Youness Manzali.

Ethics declarations

Ethical Approval

Not applicable

Consent to Participate

Not applicable

Consent for Publication

All authors have agreed to authorship, read and approved the manuscript, and given consent for its submission.

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Manzali, Y., Elfar, M. Random Forest Pruning Techniques: A Recent Review. Oper. Res. Forum 4, 43 (2023). https://doi.org/10.1007/s43069-023-00223-6

