
Optimal Aggregation of Binary Classifiers for Multiclass Cancer Diagnosis Using Gene Expression Profiles

Published: 01 April 2009

Abstract

Multiclass classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies by gene expression profiling. There have been many studies of aggregating binary classifiers to construct a multiclass classifier based on one-versus-the-rest (1R), one-versus-one (11), or other coding strategies, as well as some comparison studies between them. However, the studies found that the best coding depends on each situation. Therefore, a new problem, which we call the “optimal coding problem,” has arisen: how can we determine which coding is the optimal one in each situation? To approach this optimal coding problem, we propose a novel framework for constructing a multiclass classifier, in which each binary classifier to be aggregated has a weight value to be optimally tuned based on the observed data. Although there is no a priori answer to the optimal coding problem, our weight tuning method can be a consistent answer to the problem. We apply this method to various classification problems including a synthesized data set and some cancer diagnosis data sets from gene expression profiling. The results demonstrate that, in most situations, our method can improve classification accuracy over simple voting heuristics and is better than or comparable to state-of-the-art multiclass predictors.
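The idea of giving each binary classifier its own weight before aggregation can be illustrated with a minimal sketch. This is not the authors' tuning procedure, which the abstract does not spell out; it only shows how weights enter the decoding step. The function names (`build_code_matrix`, `weighted_decode`), the binary outputs, and the hand-picked weights are all hypothetical. The code matrix stacks one-versus-the-rest and one-versus-one columns, and the predicted class is the one whose code row agrees best with the weighted binary outputs.

```python
from itertools import combinations


def build_code_matrix(n_classes):
    """Return a code matrix with one row per class, one column per binary classifier.

    +1 / -1 marks the class as positive / negative for that classifier;
    0 means the classifier does not involve the class.
    """
    columns = []
    # One-versus-the-rest columns: class k against all other classes.
    for k in range(n_classes):
        columns.append([1 if c == k else -1 for c in range(n_classes)])
    # One-versus-one columns: class a (+1) against class b (-1).
    for a, b in combinations(range(n_classes), 2):
        columns.append(
            [1 if c == a else -1 if c == b else 0 for c in range(n_classes)]
        )
    # Transpose so that rows are classes and columns are classifiers.
    return [list(row) for row in zip(*columns)]


def weighted_decode(code_matrix, binary_outputs, weights):
    """Pick the class whose code row best matches the weighted binary outputs."""
    scores = [
        sum(w * m * f for m, f, w in zip(row, binary_outputs, weights))
        for row in code_matrix
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Three classes give 3 one-versus-the-rest + 3 one-versus-one classifiers.
code = build_code_matrix(3)
outputs = [1, 1, -1, -1, 1, 1]          # hypothetical binary decisions
uniform = [1.0] * 6                      # simple voting
tuned = [2.0, 0.2, 1.0, 0.2, 2.0, 0.2]   # hypothetical, hand-picked weights

print(weighted_decode(code, outputs, uniform))  # 1
print(weighted_decode(code, outputs, tuned))    # 0
```

With uniform weights the decoding reduces to simple voting. Changing the weights shifts trust between classifiers and can change the predicted class, which is why tuning them on observed data amounts to choosing a coding.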


Cited By

  • (2018) Binary classifiers ensemble based on Bregman divergence for multi-class classification. Neurocomputing, 273:C (424-434). DOI: 10.1016/j.neucom.2017.08.004. Online publication date: 17-Jan-2018
  • (2012) Investigating Topic Models' Capabilities in Expression Microarray Data Classification. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 9:6 (1831-1836). DOI: 10.1109/TCBB.2012.121. Online publication date: 1-Nov-2012
  • (2011) ICGA-PSO-ELM Approach for Accurate Multiclass Cancer Classification Resulting in Reduced Gene Sets in Which Genes Encoding Secreted Proteins Are Highly Represented. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 8:2 (452-463). DOI: 10.1109/TCBB.2010.13. Online publication date: 1-Mar-2011

Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 6, Issue 2
April 2009
191 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States

Author Tags

  1. Multiclass classification
  2. cancer diagnosis
  3. error correcting output coding
  4. gene expression profiling
