Abstract
In this paper we present our work on the Random Forest (RF) family of classification methods. Our goal is to go one step further in the understanding of RF mechanisms by studying the parametrization of the reference algorithm Forest-RI. In this algorithm, a randomization principle is used during tree induction: at each node, K features are randomly selected, and the best split is chosen among them. The strength of the randomization in the tree induction is thus governed by the hyperparameter K, which plays an important role in building accurate RF classifiers. We therefore focus our experimental study on this hyperparameter and its influence on classification accuracy. For that purpose, we evaluate the Forest-RI algorithm on several machine learning problems with different settings of K in order to understand how it affects RF performance. We show that the default values of K traditionally used in the literature are globally near-optimal, except for some datasets on which they are all significantly sub-optimal. Additional experiments conducted on those datasets highlight the crucial role played by feature relevancy in finding the optimal setting of K.
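To make the randomization principle concrete, the following is a minimal Python sketch of the per-node step described above, assuming a Gini-impurity split criterion in the style of CART trees; it is an illustration under those assumptions, not the authors' implementation, and the names gini, best_split, and the synthetic data are ours.

```python
import numpy as np

def gini(y):
    """Gini impurity of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y, K, rng):
    """Forest-RI node rule (sketch): draw K features at random,
    keep the best (feature, threshold) among those K only."""
    n_samples, n_features = X.shape
    candidates = rng.choice(n_features, size=K, replace=False)
    best = (None, None, np.inf)  # (feature, threshold, weighted impurity)
    for f in candidates:
        # candidate thresholds: all observed values except the maximum,
        # so both child nodes are always non-empty
        for t in np.unique(X[:, f])[:-1]:
            left = X[:, f] <= t
            right = ~left
            score = (left.sum() * gini(y[left]) +
                     right.sum() * gini(y[right])) / n_samples
            if score < best[2]:
                best = (f, t, score)
    return best

# Example with K = sqrt(M), a default setting commonly used in the literature.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))
y = (X[:, 3] > 0).astype(int)
K = int(np.sqrt(X.shape[1]))
feature, threshold, impurity = best_split(X, y, K, rng)
print(feature, threshold, impurity)
```

Lowering K strengthens the randomization (more diverse, individually weaker trees), while K equal to the number of features recovers a fully deterministic split choice; the paper's experiments vary exactly this trade-off.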
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bernard, S., Heutte, L., Adam, S. (2009). Influence of Hyperparameters on Random Forest Accuracy. In: Benediktsson, J.A., Kittler, J., Roli, F. (eds) Multiple Classifier Systems. MCS 2009. Lecture Notes in Computer Science, vol 5519. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02326-2_18
DOI: https://doi.org/10.1007/978-3-642-02326-2_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02325-5
Online ISBN: 978-3-642-02326-2