Fast methods for kernel-based text analysis

Authors: Taku Kudo, Yuji Matsumoto

ACL '03: Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1

Pages 24 - 31

https://doi.org/10.3115/1075096.1075100

Published: 07 July 2003

Abstract

Kernel-based learning (e.g., Support Vector Machines) has been successfully applied to many hard problems in Natural Language Processing (NLP). In NLP, although feature combinations are crucial to improving performance, they are heuristically selected. Kernel methods change this situation. The merit of kernel methods is that effective feature combinations are implicitly expanded without loss of generality and without increasing the computational costs. Kernel-based text analysis shows excellent performance in terms of accuracy; however, these methods are usually too slow to apply to large-scale text analysis. In this paper, we extend a Basket Mining algorithm to convert a kernel-based classifier into a simple and fast linear classifier. Experimental results on English BaseNP Chunking, Japanese Word Segmentation and Japanese Dependency Parsing show that our new classifiers are about 30 to 300 times faster than the standard kernel-based classifiers.
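
The abstract does not spell out how a kernel-based classifier becomes a linear one, so the following is only a minimal sketch under stated assumptions: a degree-2 polynomial kernel (|X ∩ Y| + 1)^2 over sets of binary features, a toy set of support vectors with made-up weights, and exhaustive enumeration of feature subsets in place of the mining algorithm the paper extends. All names (SUPPORT_VECTORS, expand, linear_score, and so on) are hypothetical. The sketch shows that the kernel decision value can be rewritten as a sum of precomputed weights on single features and feature pairs.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy model: each support vector is a set of active binary
# features paired with its signed weight (alpha_i * y_i). Values are made up.
SUPPORT_VECTORS = [
    ({"a", "b", "c"}, 0.5),
    ({"a", "c"}, -0.25),
    ({"b", "d"}, 0.75),
]
BIAS = -0.1


def kernel_score(x, support_vectors=SUPPORT_VECTORS, bias=BIAS):
    """Kernel form: one kernel evaluation per support vector."""
    return bias + sum(w * (len(sv & x) + 1) ** 2 for sv, w in support_vectors)


def expand(support_vectors=SUPPORT_VECTORS):
    """Fold the support vectors into explicit weights on feature subsets.

    With n = |sv & x|, (n + 1)^2 = 1 + 3*n + 2*C(n, 2), so every shared
    single feature contributes 3*w and every shared pair contributes 2*w.
    """
    weights = defaultdict(float)
    for sv, w in support_vectors:
        weights[frozenset()] += w  # constant term
        for f in sv:
            weights[frozenset([f])] += 3 * w
        for pair in combinations(sorted(sv), 2):
            weights[frozenset(pair)] += 2 * w
    return dict(weights)


def linear_score(x, weights, bias=BIAS):
    """Linear form: add up the weights of all subsets (size <= 2) found in x."""
    items = sorted(x)
    total = bias + weights.get(frozenset(), 0.0)
    total += sum(weights.get(frozenset([f]), 0.0) for f in items)
    total += sum(weights.get(frozenset(p), 0.0) for p in combinations(items, 2))
    return total


WEIGHTS = expand()
```

In this sketch the linear form costs a number of lookups that depends only on the size of the input, not on the number of support vectors. Exhaustive expansion grows quickly with the number of features per support vector, which is presumably where a mining step that keeps only the useful subsets would come in.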

Fast methods for kernel-based text analysis
1. Computing methodologies
  1. Artificial intelligence
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
    2. Machine learning approaches
2. Hardware
  1. Power and energy
    1. Power estimation and optimization

Information & Contributors

Information

Published In

ACL '03: Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1

July 2003

571 pages

Program Chairs:
Erhard W. Hinrichs,
Dan Roth

Publisher

Association for Computational Linguistics

United States

Qualifiers

Article

Acceptance Rates

Overall Acceptance Rate 85 of 443 submissions, 19%

Bibliometrics

Total Citations: 69
Total Downloads: 892
Downloads (Last 12 months): 86
Downloads (Last 6 weeks): 9

Reflects downloads up to 15 Jan 2025
