Abstract
Random Forest is a computationally efficient technique that can operate quickly over large datasets. It has been used in many recent research projects and real-world applications in diverse domains. However, the associated literature provides little information about what happens inside the trees of a Random Forest. The research reported here analyzes the frequency with which each attribute appears at the root node of the trees in a Random Forest, in order to determine whether all attributes are used with equal frequency or whether some are used considerably more often than others. We have also analyzed the estimated out-of-bag error of the trees, aiming to check whether the most frequently used attributes yield good performance. Furthermore, we have analyzed whether the use of pre-pruning influences the performance of the Random Forest, again measured by out-of-bag error. Our main conclusions are that the frequency of attributes at the root node exhibits an exponential behavior, and that the estimated out-of-bag error can help to find relevant attributes within the forest. Concerning pre-pruning, we observed that execution time can be reduced without significant loss of performance.
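The kind of analysis described above can be illustrated with a short sketch. This is not the authors' code: it uses scikit-learn (not mentioned in the paper) and a toy dataset to show how one might count which attribute appears at the root node of each tree in a Random Forest and report the forest's estimated out-of-bag error.

```python
# Illustrative sketch only: count root-node attributes across a
# Random Forest and report out-of-bag error, using scikit-learn.
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(
    n_estimators=500, oob_score=True, random_state=0
).fit(X, y)

# In each fitted tree, tree_.feature[0] is the index of the attribute
# tested at the root node.
root_counts = Counter(t.tree_.feature[0] for t in forest.estimators_)
for feature_idx, count in root_counts.most_common():
    print(f"attribute {feature_idx}: root of {count} trees")

print(f"estimated out-of-bag error: {1 - forest.oob_score_:.3f}")
```

On data with a few dominant attributes, the resulting counts are typically far from uniform, which is the kind of skewed (roughly exponential) root-attribute distribution the paper investigates.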
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Oshiro, T.M., Baranauskas, J.A. (2012). Root Attribute Behavior within a Random Forest. In: Yin, H., Costa, J.A.F., Barreto, G. (eds) Intelligent Data Engineering and Automated Learning - IDEAL 2012. IDEAL 2012. Lecture Notes in Computer Science, vol 7435. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32639-4_87
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32638-7
Online ISBN: 978-3-642-32639-4
eBook Packages: Computer Science (R0)