[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to main content

Root Attribute Behavior within a Random Forest

  • Conference paper
Intelligent Data Engineering and Automated Learning - IDEAL 2012 (IDEAL 2012)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 7435))

  • 1655 Accesses

Abstract

Random Forest is a computationally efficient technique that can operate quickly over large datasets. It has been used in many recent research projects and real-world applications in diverse domains. However, the associated literature provides few information about what happens in the trees within a Random Forest. The research reported here analyzes the frequency that an attribute appears in the root node in a Random Forest in order to find out if it uses all attributes with equal frequency or if there is some of them most used. Additionally, we have also analyzed the estimated out-of-bag error of the trees aiming to check if the most used attributes present a good performance. Furthermore, we have analyzed if the use of pre-pruning could influence the performance of the Random Forest using out-of-bag errors. Our main conclusions are that the frequency of the attributes in the root node has an exponential behavior. In addition, the use of the estimated out-of-bag error can help to find relevant attributes within the forest. Concerning to the use of pre-pruning, it was observed the execution time can be faster, without significant loss of performance.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
GBP 19.95
Price includes VAT (United Kingdom)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
GBP 35.99
Price includes VAT (United Kingdom)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
GBP 44.99
Price includes VAT (United Kingdom)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Cancer program data sets. Broad Institute (2010), http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi

  2. Dataset repository in arff (weka). BioInformatics Group Seville (2010), http://www.upo.es/eps/bigs/datasets.html

  3. Datasets. Cilab (2010), http://cilab.ujn.edu.cn/datasets.htm

  4. Benjamini, Y., Hochberg, Y.: Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B 57, 289–300 (1995)

    MathSciNet  MATH  Google Scholar 

  5. Breiman, L.: Random forests. Machine Learning 45(1), 5–32 (2001)

    Article  MATH  Google Scholar 

  6. Breiman, L.: Bagging predictors. Machine Learning 24(2), 123–140 (1996)

    MathSciNet  MATH  Google Scholar 

  7. Costa, P.R., Acencio, M.L., Lemke, N.: A machine learning approach for genome-wide prediction of morbid and druggable human genes based on systems-level data. BMC Genomics 11(suppl. 5) (2010)

    Google Scholar 

  8. Demšar, J.: Statistical comparison of classifiers over multiple data sets. Journal of Machine Learning Research 7(1), 1–30 (2006)

    MATH  Google Scholar 

  9. Dubath, P., Rimoldini, L., Süveges, M., Blomme, J., López, M., Sarro, L.M., De Ridder, J., Cuypers, J., Guy, L., Lecoeur, I., Nienartowicz, K., Jan, A., Beck, M., Mowlavi, N., De Cat, P., Lebzelter, T., Eyer, L.: Random forest automated supervised classification of hipparcos periodic variable stars. Monthly Notices of the Royal Astronomical Society 414(3), 2602–2617 (2011), http://dx.doi.org/10.1111/j.1365-2966.2011.18575.x

    Article  Google Scholar 

  10. Frank, A., Asuncion, A.: UCI machine learning repository (2010), http://archive.ics.uci.edu/ml

  11. Freund, Y., Schapire, R.E.: Experiments with a new boosting algorithm. In: Proceedings of the Thirteenth International Conference on Machine Learning, pp. 123–140. Morgan Kaufmann, Lake Tahoe (1996)

    Google Scholar 

  12. Friedman, M.: A comparison of alternative tests of significance for the problem of m rankings. The Annals of Mathematical Statistics 11(1), 86–92 (1940)

    Article  MATH  Google Scholar 

  13. Goldstein, B., Hubbard, A., Cutler, A., Barcellos, L.: An application of random forests to a genome-wide association dataset: Methodological considerations and new findings. BMC Genetics 11(1), 49 (2010), http://www.biomedcentral.com/1471-2156/11/49

    Article  Google Scholar 

  14. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The weka data mining software: an update. Association for Computing Machinery’s Special Interest Group on Knowledge Discovery and Data Mining Explor. Newsl. 11(1), 10–18 (2009)

    Google Scholar 

  15. Lee, J.-H., Kim, D.-Y., Ko, B.C., Nam, J.-Y.: Keyword annotation of medical image with random forest classifier and confidence assigning. In: International Conference on Computer Graphics, Imaging and Visualization, pp. 156–159 (2011)

    Google Scholar 

  16. Liaw, A., Wiener, M.: Classification and regression by randomforest. R News 2(3), 1–5 (2002), http://CRAN.R-project.org/doc/Rnews/

    Google Scholar 

  17. Netto, O.P., Nozawa, S.R., Mitrowsky, R.A.R., Macedo, A.A., Baranauskas, J.A.:

    Google Scholar 

  18. Oh, I.S., Lee, J.S., Moon, B.R.: Hybrid genetic algorithms for feature selection. IEEE Trans. Pattern Anal. Mach. Intell. 26, 1424–1437 (2004)

    Article  Google Scholar 

  19. Oshiro, T.M., Perez, P.S., Baranauskas, J.A.: How Many Trees in a Random Forest? In: Perner, P. (ed.) MLDM 2012. LNCS, vol. 7376, pp. 154–168. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  20. Saeys, Y., Inza, I.n., Larrañaga, P.: A review of feature selection techniques in bioinformatics. Bioinformatics 23, 2507–2517 (2007)

    Article  Google Scholar 

  21. Sirikulviriya, N., Sinthupinyo, S.: Integration of rules from a random forest. In: International Conference on Information and Electronics Engineering, vol. 6, pp. 194–198 (2011)

    Google Scholar 

  22. Tang, Y.: Real-Time Automatic Face Tracking Using Adaptive Random Forests. Master’s thesis, Department of Electrical and Computer Engineering, McGill University, Montreal, Canada (June 2010)

    Google Scholar 

  23. Wang, G., Hao, J., Ma, J., Jiang, H.: A comparative assessment of ensemble learning for credit scoring. Expert Systems with Applications 38, 223–230 (2011)

    Article  Google Scholar 

  24. Yaqub, M., Javaid, M.K., Cooper, C., Noble, J.A.: Improving the Classification Accuracy of the Classic RF Method by Intelligent Feature Selection and Weighted Voting of Trees with Application to Medical Image Segmentation. In: Suzuki, K., Wang, F., Shen, D., Yan, P. (eds.) MLMI 2011. LNCS, vol. 7009, pp. 184–192. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  25. Zhao, Y., Zhang, Y.: Comparison of decision tree methods for finding active objects. Advances in Space Research 41, 1955–1959 (2008)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Oshiro, T.M., Baranauskas, J.A. (2012). Root Attribute Behavior within a Random Forest. In: Yin, H., Costa, J.A.F., Barreto, G. (eds) Intelligent Data Engineering and Automated Learning - IDEAL 2012. IDEAL 2012. Lecture Notes in Computer Science, vol 7435. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32639-4_87

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-32639-4_87

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32638-7

  • Online ISBN: 978-3-642-32639-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics