DOI: 10.1145/502585.502603
Article

Combining multiple classifiers for text categorization

Published: 05 October 2001

Abstract

A major problem facing online information services is how to index and supplement large document collections with respect to a rich set of categories. We focus on the routing of case law summaries to various secondary law volumes in which they should be cited. Given the large number (> 13,000) of closely related categories, this is a challenging task that is unlikely to succumb to a single algorithmic solution. Our fully implemented and recently deployed system shows that a superior classification engine for this task can be constructed from a combination of classifiers. The multi-classifier approach helps us leverage all the relevant textual features and metadata, and appears to generalize to related classification tasks.
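The abstract does not say how the individual classifiers are combined, so the following is only a minimal sketch of one common option: a weighted average of per-category scores followed by a threshold. The function names, the category labels, the weights, and the threshold are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

Scores = Dict[str, float]  # category name -> score, assumed to lie in [0, 1]


def combine_scores(per_classifier: List[Scores], weights: List[float]) -> Scores:
    """Weighted average of per-category scores (illustrative, not the paper's method).

    A category that a classifier did not score counts as 0.0 for that classifier.
    """
    if len(per_classifier) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Union of every category seen by any classifier.
    categories = set().union(*per_classifier)
    return {
        c: sum(w * s.get(c, 0.0) for w, s in zip(weights, per_classifier)) / total
        for c in categories
    }


def route(combined: Scores, threshold: float) -> List[str]:
    """Return categories whose combined score meets the threshold, best first."""
    hits = [c for c, v in combined.items() if v >= threshold]
    return sorted(hits, key=lambda c: (-combined[c], c))


# Hypothetical scores from a text-based and a metadata-based classifier.
text_scores = {"contracts": 0.8, "torts": 0.4}
meta_scores = {"contracts": 0.6, "evidence": 0.9}

combined = combine_scores([text_scores, meta_scores], weights=[0.5, 0.5])
selected = route(combined, threshold=0.4)
```

With these made-up inputs, "contracts" is supported by both classifiers and "evidence" by one strong score, so both clear the threshold, while "torts" does not. In a real system the weights and threshold would have to be tuned on held-out data.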

Information & Contributors

Information

Published In

CIKM '01: Proceedings of the tenth international conference on Information and knowledge management
October 2001
616 pages
ISBN:1581134363
DOI:10.1145/502585
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. document classification
  2. multi-classifier

Qualifiers

  • Article

Conference

CIKM01
Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)4
  • Downloads (Last 6 weeks)0
Reflects downloads up to 11 Dec 2024

Cited By

  • (2015) Boosting drug named entity recognition using an aggregate classifier. Artificial Intelligence in Medicine, 65:2, pp. 145-153. DOI: 10.1016/j.artmed.2015.05.007. Online publication date: 1-Oct-2015
  • (2012) A fast subspace text categorization method using parallel classifiers. Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part II, pp. 132-143. DOI: 10.1007/978-3-642-28601-8_12. Online publication date: 11-Mar-2012
  • (2011) Semantic subspace learning for text classification using hybrid intelligent techniques. International Journal of Hybrid Intelligent Systems, 8:2, pp. 99-114. DOI: 10.5555/2010580.2010585. Online publication date: 1-Apr-2011
  • (2011) Legal document clustering with built-in topic segmentation. Proceedings of the 20th ACM international conference on Information and knowledge management, pp. 383-392. DOI: 10.1145/2063576.2063636. Online publication date: 24-Oct-2011
  • (2011) Examples of specialized legal metadata adapted to the digital environment, from the U.S. code of federal regulations. Proceedings of the 12th Annual International Digital Government Research Conference: Digital Government Innovation in Challenging Times, pp. 229-234. DOI: 10.1145/2037556.2037594. Online publication date: 12-Jun-2011
  • (2011) Adapting specialized legal metadata to the digital environment. Proceedings of the 13th International Conference on Artificial Intelligence and Law, pp. 126-130. DOI: 10.1145/2018358.2018377. Online publication date: 6-Jun-2011
  • (2010) Reinforcement Post-Processing and Feedback Algorithm for Optimal Combination in Bottom-Up Hierarchical Classification. The KIPS Transactions: Part B, 17B:2, pp. 139-148. DOI: 10.3745/KIPSTB.2010.17B.2.139. Online publication date: 30-Apr-2010
  • (2010) Text Mining and Information Extraction. Data Mining and Knowledge Discovery Handbook, pp. 809-835. DOI: 10.1007/978-0-387-09823-4_42. Online publication date: 7-Jul-2010
  • (2007) Generating Value from Textual Discovery. Proceedings of the 7th international conference on Computational Science, Part III: ICCS 2007, pp. 746-753. DOI: 10.1007/978-3-540-72588-6_123. Online publication date: 27-May-2007
  • (2006) MEMPHIS. Journal of Systems Architecture: the EUROMICRO Journal, 52:6, pp. 315-331. DOI: 10.1016/j.sysarc.2005.05.007. Online publication date: 1-Jun-2006
