DOI: 10.1007/978-3-642-13059-5_4
Article

Improving multiclass text classification with error-correcting output coding and sub-class partitions

Published: 31 May 2010

Abstract

Error-Correcting Output Coding (ECOC) is a general framework for multiclass text classification with a set of binary classifiers. It can not only help a binary classifier solve multi-class classification problems, but also boost the performance of a multi-class classifier. When building each individual binary classifier in ECOC, multiple classes are randomly grouped into two disjoint groups: positive and negative. However, when training such a binary classifier, sub-class distribution within positive and negative classes is neglected. Utilizing this information is expected to improve a binary classifier. We thus design a simple binary classification strategy via multi-class categorization (2vM) to make use of sub-class partition information, which can lead to better performance over the traditional binary classification. The proposed binary classification strategy is then applied to enhance ECOC. Experiments on document categorization and question classification show its effectiveness.
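The idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the details of 2vM, so the code assumes one plausible reading (predict the original sub-class first, then map it to its positive or negative group). The nearest-centroid base learner, the toy 1-D data, and all function names are assumptions made for illustration only.

```python
import random

def random_partition(classes, rng):
    """One ECOC column: randomly split classes (needs >= 2) into +1 / -1 groups."""
    while True:
        column = {c: rng.choice((1, -1)) for c in classes}
        if len(set(column.values())) == 2:  # both groups must be non-empty
            return column

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(p[j] for p in points) / len(points) for j in range(len(points[0]))]

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def train_pooled(X, y, column):
    """Traditional binary training: pool each group, ignoring its sub-classes."""
    groups = {1: [], -1: []}
    for x, c in zip(X, y):
        groups[column[c]].append(x)
    return {sign: centroid(pts) for sign, pts in groups.items()}

def predict_pooled(model, x):
    return min(model, key=lambda sign: sq_dist(model[sign], x))

def train_subclass(X, y):
    """Assumed 2vM-style training: keep one centroid per original class."""
    by_class = {}
    for x, c in zip(X, y):
        by_class.setdefault(c, []).append(x)
    return {c: centroid(pts) for c, pts in by_class.items()}

def predict_2vm(model, column, x):
    """Predict the nearest sub-class, then map it to its group label."""
    nearest = min(model, key=lambda c: sq_dist(model[c], x))
    return column[nearest]

def hamming_decode(code, bits):
    """ECOC decoding: choose the class whose codeword is closest to the bits."""
    return min(code, key=lambda c: sum(a != b for a, b in zip(code[c], bits)))

# Toy data (hypothetical): classes "a" and "c" form the positive group and
# sit on either side of the negative class "b", so the pooled centroids of
# the two groups coincide and the pooled classifier cannot separate them.
X = [[-0.5], [0.5], [4.5], [5.5], [9.5], [10.5]]
y = ["a", "a", "b", "b", "c", "c"]
column = {"a": 1, "b": -1, "c": 1}  # fixed here for a reproducible demo

pooled = train_pooled(X, y, column)
subclass_model = train_subclass(X, y)
pooled_correct = sum(predict_pooled(pooled, x) == column[c] for x, c in zip(X, y))
twovm_correct = sum(predict_2vm(subclass_model, column, x) == column[c]
                    for x, c in zip(X, y))
print("pooled correct:", pooled_correct, "of", len(X))
print("2vM-style correct:", twovm_correct, "of", len(X))
```

In a full ECOC setup, `random_partition` would be called once per code bit, one binary classifier would be trained per column, and `hamming_decode` would turn the vector of binary outputs back into a class label. How the paper combines 2vM with ECOC decoding is not stated in the abstract, so that step is left out of the sketch.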


Cited By

  • (2018) Construction of Efficient V-Gram Dictionary for Sequential Data Analysis. Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 1343-1352. DOI: 10.1145/3269206.3271789. Online publication date: 17-Oct-2018
  • (2014) Exploring Multidimensional Continuous Feature Space to Extract Relevant Words. Statistical Language and Speech Processing, pp. 159-170. DOI: 10.1007/978-3-319-11397-5_12. Online publication date: 14-Oct-2014


Published In

AI'10: Proceedings of the 23rd Canadian conference on Advances in Artificial Intelligence
May 2010
423 pages
ISBN: 3642130585
Editors: Atefeh Farzindar, Vlado Kešelj

Sponsors

  • MultiCorpora R&D Inc.
  • University of Ottawa
  • Palomino System Innovations Inc.
  • CAIAC: Canadian Artificial Intelligence Association/Association pour l'intelligence artificielle au Canada
  • NLP Technologies Inc.

Publisher

Springer-Verlag, Berlin, Heidelberg


Author Tags

  1. binary classification
  2. error correcting output coding
  3. text classification
