Abstract
This paper introduces cw-AdaBoost, a robust variant of AdaBoost that uses weight perturbation to reduce variance error. It is particularly effective on data sets, such as microarray data, that have large numbers of features and small numbers of instances. The algorithm is compared with AdaBoost, Arcing, and MultiBoost on twelve gene expression datasets using 10-fold cross-validation, and it consistently achieves higher classification accuracy across all of these datasets. In contrast to other AdaBoost variants, the algorithm is not susceptible to problems when a zero-error base classifier is encountered.
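The abstract does not spell out the cw-AdaBoost update rule, so the following Python sketch only illustrates the general idea it names: standard AdaBoost with a small random perturbation of the instance weights each round, plus a clip that keeps a zero-error base classifier from producing an infinite vote weight. The function name, the uniform jitter, and the perturb_scale parameter are illustrative assumptions, not the authors' specification.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def perturbed_adaboost(X, y, n_rounds=50, perturb_scale=0.1, seed=0):
    """Boost decision stumps on labels y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    n = len(y)
    w = np.full(n, 1.0 / n)                    # instance-weight distribution
    models, alphas = [], []
    for _ in range(n_rounds):
        # Assumed perturbation step: jitter the weights, then renormalise
        # so the base learner still trains on a valid distribution.
        w_pert = w * rng.uniform(1.0 - perturb_scale, 1.0 + perturb_scale, n)
        w_pert /= w_pert.sum()
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w_pert)
        pred = stump.predict(X)
        err = w_pert[pred != y].sum()
        if err >= 0.5:                         # no better than chance: stop
            break
        # Clipping err away from zero keeps the vote weight finite when a
        # base classifier is perfect, the failure mode the abstract says
        # cw-AdaBoost avoids.
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1.0 - err) / err)
        w = w * np.exp(-alpha * y * pred)      # standard AdaBoost reweighting
        w /= w.sum()
        models.append(stump)
        alphas.append(alpha)

    def predict(X_new):
        votes = sum(a * m.predict(X_new) for a, m in zip(alphas, models))
        return np.sign(votes)
    return predict

Because the perturbed weights are renormalised before each fit, every round still trains on a valid distribution while seeing a slightly decorrelated view of the data, which is the mechanism by which weight perturbation can reduce variance error.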
Cite this article
Wang, C.-W., Hunter, A. A low variance error boosting algorithm. Appl Intell 33, 357–369 (2010). https://doi.org/10.1007/s10489-009-0172-0