DOI: 10.1145/2393216.2393317

An extensive empirical study of feature terms selection for text summarization and categorization

Published: 26 October 2012

Abstract

The ever-increasing availability of online textual databases and the development of the Internet have necessitated intensive research on automatic text summarization within the Natural Language Processing (NLP) community. Researchers and students working on a research project constantly face the problem that it is almost impossible to read most of the newly published papers. The goal of extraction-based text summarization is sentence selection. One way to obtain the sentences is to score each sentence using feature terms, rank the sentences by score, and then select the best ones for the summary. Broad indexing and speedy search alone are not enough for effective retrieval; categorized data are easy for users to browse when the data are well organized. In the first stage, each document is preprocessed: sentence segmentation, tokenization, stop word removal, case folding, lemmatization, and stemming. We then apply important features, sentence filtering features, and data compression features, and finally calculate a score for each sentence. We propose text summarization based on an HMM tagger to improve the quality of the summary. The documents are also categorized by creating impressions. We compared our results with the Copernicus summarizer, the Great summarizer, and the Microsoft Word 2007 summarizer, among others. The proposed system is tested with four similarity measures: Cosine, Jaccard, Jaro-Winkler, and Sorensen. The results show that the feature terms method produced the best-quality summaries. Our text categorization approach is validated with Naïve Bayesian, Decision Tree Induction, KNN, and SVM approaches.
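
As an illustration of three of the similarity measures named in the abstract (Cosine, Jaccard, and Sorensen), the following minimal Python sketch compares two short texts after case folding, tokenization, and stop word removal. It is not the authors' implementation: the stop word list, the tokenizer, and the function names `tokens`, `cosine`, `jaccard`, and `sorensen` are assumptions made for this example, and Jaro-Winkler is omitted for brevity.

```python
import math
import re
from collections import Counter

# Hypothetical stop word list; the list used in the paper is not given.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}


def tokens(text):
    """Case-fold, tokenize on runs of letters, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def cosine(a, b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a, b):
    """Jaccard similarity: shared terms divided by all distinct terms."""
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def sorensen(a, b):
    """Sorensen (Dice) similarity: twice the shared terms over the summed set sizes."""
    sa, sb = set(tokens(a)), set(tokens(b))
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0


if __name__ == "__main__":
    candidate = "The cat sat on the mat"
    reference = "The dog sat on the rug"
    print(cosine(candidate, reference))
    print(jaccard(candidate, reference))
    print(sorensen(candidate, reference))
```

In this sketch Jaccard and Sorensen work on sets of distinct terms, while cosine uses term counts, so repeated words only affect the cosine score.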

      Information

      Published In

      ACM Other conferences
      CCSEIT '12: Proceedings of the Second International Conference on Computational Science, Engineering and Information Technology
      October 2012
      800 pages
      ISBN:9781450313100
      DOI:10.1145/2393216
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Sponsors

      • Avinashilingam University

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. HMM tagger
      2. Natural Language Processing
      3. inverse sentence frequency
      4. parts of speech tagging
      5. term frequency
      6. text categorization
      7. text summarization
      8. verb featured sentences

      Qualifiers

      • Research-article

      Conference

      CCSEIT '12
      Sponsor: Avinashilingam University

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 2
      • Downloads (Last 6 weeks): 0
      Reflects downloads up to 09 Jan 2025

      Citations

      Cited By

      • (2023) Selection Informative Units for Extractive Summarization. WSEAS TRANSACTIONS ON SYSTEMS, 22, pp. 287-294. DOI: 10.37394/23202.2023.22.31. Online publication date: 23-Mar-2023
      • (2021) Text Summarization of Multiple Documents Using Binary Fruit Fly Optimization Algorithm. Proceedings of the 2nd International Conference on Computational and Bio Engineering, pp. 769-778. DOI: 10.1007/978-981-16-1941-0_78. Online publication date: 28-Sep-2021
      • (2018) User Identification on Social Networks Through Text Mining Techniques: A Systematic Literature Review. Information Science and Applications 2018, pp. 485-498. DOI: 10.1007/978-981-13-1056-0_49. Online publication date: 24-Jul-2018
      • (2015) TCBR-HMM. Applied Soft Computing, 26:C, pp. 463-473. DOI: 10.1016/j.asoc.2014.10.019. Online publication date: 1-Jan-2015
