Abstract
In ensemble methods, each base learner is usually trained on a resampled version of the original training sample of the same size. In this paper we instead use resampling without replacement (subsampling) with a low subsampling ratio, i.e., each subsample is smaller than the original training sample. Our main objective is to examine whether several well-known ensemble methods remain competitive and scalable at low subsampling ratios, and to compare them with their original counterparts. We consider three ensemble methods: bagging, AdaBoost, and bundling, each using a fully grown decision tree as the base classifier. We apply the subsampled versions of these ensembles to several well-known benchmark datasets and measure both the error rate and the running time of each method. The experiments indicate that, for bagging and AdaBoost with a low subsampling ratio, the error rate is in most cases inversely related to the subsample size, whereas for bundling the opposite holds. Overall, bundling achieves the best accuracy at low subsampling ratios on almost all datasets, while bagging is the most effective at reducing time complexity.
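The paper itself shows no code; as a rough illustration of the setup the abstract describes, the following is a minimal sketch of subsampled bagging ("subagging") using scikit-learn's BaggingClassifier, with bootstrap=False for sampling without replacement and max_samples set to a low subsampling ratio. The dataset, ratio of 0.2, and ensemble size are illustrative assumptions, not the paper's experimental settings.

```python
# Minimal sketch (not the authors' code): bagging with low-ratio subsampling,
# i.e. each base tree sees a without-replacement subsample of 20% of the data.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # illustrative benchmark dataset

subagging = BaggingClassifier(
    DecisionTreeClassifier(),  # fully grown (unpruned) tree as base classifier
    n_estimators=100,          # illustrative ensemble size
    max_samples=0.2,           # low subsampling ratio: 20% of the training set
    bootstrap=False,           # resampling WITHOUT replacement (subsampling)
    random_state=0,
)

scores = cross_val_score(subagging, X, y, cv=10)
print(f"10-fold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

An analogous setup applies to AdaBoost by subsampling before each boosting round; bundling has no scikit-learn counterpart and would require a custom implementation following Hothorn and Lausen.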
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Faisal, Z., Hirose, H. (2010). A Comparative Study on the Performance of Several Ensemble Methods with Low Subsampling Ratio. In: Nguyen, N.T., Le, M.T., Świątek, J. (eds) Intelligent Information and Database Systems. ACIIDS 2010. Lecture Notes in Computer Science, vol. 5991. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12101-2_33
Print ISBN: 978-3-642-12100-5
Online ISBN: 978-3-642-12101-2