Abstract
This paper introduces a new ensemble approach, Feature-Subspace Aggregating (Feating), which builds local models rather than global models. Feating is a generic ensemble approach that can enhance the predictive performance of both stable and unstable learners, whereas most existing ensemble approaches improve the predictive performance of unstable learners only. Our analysis shows that the increased level of localisation in Feating reduces the execution time required to generate each model in the ensemble. Our empirical evaluation shows that Feating performs significantly better than Boosting, Random Subspace and Bagging in terms of predictive accuracy when the stable learner SVM is used as the base learner. The speed-up achieved by Feating makes SVM ensembles feasible for large data sets on which they would otherwise be infeasible. When SVM is the preferred base learner, we show that Feating SVM performs better than Boosting decision trees and Random Forests. We further demonstrate that Feating also substantially reduces the error of another stable learner, k-nearest neighbour, and of an unstable learner, the decision tree.
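To make the idea of a local-model ensemble concrete, the sketch below is a minimal, illustrative Feating-style ensemble in Python with scikit-learn. It is not the authors' exact algorithm: it assumes each ensemble member partitions the training data by the discretised values of a small random feature subset and trains one local SVM per partition, falling back to a single global SVM for sparse regions, then majority-votes the members' predictions. The class name FeatingStyleEnsemble and the parameters n_members, subset_size, n_bins and min_region_size are hypothetical choices made for this sketch.

```python
# Illustrative Feating-style local-model ensemble (a sketch, not the published algorithm).
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import KBinsDiscretizer, StandardScaler


class FeatingStyleEnsemble:
    def __init__(self, n_members=10, subset_size=2, n_bins=3,
                 min_region_size=20, random_state=0):
        self.n_members = n_members          # number of ensemble members (hypothetical default)
        self.subset_size = subset_size      # features used to define local regions
        self.n_bins = n_bins                # bins per feature when discretising
        self.min_region_size = min_region_size
        self.rng = np.random.default_rng(random_state)

    def fit(self, X, y):
        self.members_ = []
        # Global SVM used as a fallback for regions with too few training instances.
        self.global_model_ = SVC(kernel="linear").fit(X, y)
        n_features = X.shape[1]
        for _ in range(self.n_members):
            subset = self.rng.choice(n_features, self.subset_size, replace=False)
            disc = KBinsDiscretizer(n_bins=self.n_bins, encode="ordinal",
                                    strategy="quantile")
            keys = [tuple(row) for row in disc.fit_transform(X[:, subset])]
            local_models = {}
            for key in set(keys):
                idx = [i for i, k in enumerate(keys) if k == key]
                # Train a local SVM only where the region has enough data and >1 class.
                if len(idx) >= self.min_region_size and len(set(y[idx])) > 1:
                    local_models[key] = SVC(kernel="linear").fit(X[idx], y[idx])
            self.members_.append((subset, disc, local_models))
        return self

    def predict(self, X):
        votes = []
        for subset, disc, local_models in self.members_:
            keys = [tuple(row) for row in disc.transform(X[:, subset])]
            # Route each test point to the local model of its region, else the global model.
            preds = [local_models.get(k, self.global_model_).predict(x.reshape(1, -1))[0]
                     for k, x in zip(keys, X)]
            votes.append(preds)
        votes = np.array(votes)  # shape: (n_members, n_samples)
        # Majority vote across ensemble members.
        return np.array([np.bincount(col).argmax() for col in votes.T])


if __name__ == "__main__":
    X, y = load_breast_cancer(return_X_y=True)
    X = StandardScaler().fit_transform(X)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = FeatingStyleEnsemble().fit(X_tr, y_tr)
    print("accuracy:", (model.predict(X_te) == y_te).mean())
```

The localisation in this sketch comes from fitting each SVM on only the instances that fall in a region, which is the mechanism by which per-model training time can shrink as the level of localisation increases, in the spirit of the claim made in the abstract.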
Additional information
Editor: Mark Craven.
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Ting, K.M., Wells, J.R., Tan, S.C. et al. Feature-subspace aggregating: ensembles for stable and unstable learners. Mach Learn 82, 375–397 (2011). https://doi.org/10.1007/s10994-010-5224-5