research-article

An extensive empirical study of feature terms selection for text summarization and categorization

CCSEIT '12: Proceedings of the Second International Conference on Computational Science, Engineering and Information Technology

Pages 606 - 613

https://doi.org/10.1145/2393216.2393317

Published: 26 October 2012

Abstract

The ever-increasing availability of online textual databases and the growth of the Internet have necessitated intensive research on automatic text summarization within the Natural Language Processing (NLP) community. Researchers and students working on a research project constantly face the problem that it is almost impossible to read most of the newly published papers. The goal of extraction-based text summarization is sentence selection. One way to obtain the sentences is to score each sentence on a set of feature terms, rank the sentences by score, and select the best ones for the summary. Broad indexing and speedy search alone are not enough for effective retrieval; data are easier for users to browse when they are categorized and well organized. In the first stage, each document is preprocessed: sentence segmentation, tokenization, stop-word removal, case folding, lemmatization, and stemming. We then apply important features, sentence filtering features, and data compression features, and calculate a score for each sentence. We propose text summarization based on an HMM tagger to improve the quality of the summary. The documents are also categorized by creating impressions. We compared our results with the Copernicus summarizer, the Great summarizer, and the Microsoft Word 2007 summarizer, among others. The proposed system is tested with four similarity measures: cosine, Jaccard, Jaro-Winkler, and Sørensen. The results show that the best summary quality was obtained by the feature terms method. Our text categorization approach is validated against Naïve Bayesian, Decision Tree Induction, KNN, and SVM approaches.
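The pipeline outlined in the abstract (preprocess, score each sentence, keep the best ones, then compare summaries with a similarity measure) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the stop-word list, the function names, and the term-frequency score are stand-ins, not the paper's feature terms, HMM tagger, or evaluation setup. Lemmatization, stemming, and Jaro-Winkler similarity are left out to keep the sketch short.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"a", "an", "and", "are", "as", "by", "for", "in", "is",
              "it", "of", "on", "the", "to", "with"}


def split_sentences(text):
    """Naive sentence segmentation: split after ., ! or ? plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Case folding, tokenization, and stop-word removal."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def summarize(text, k=2):
    """Score sentences by mean term frequency (a stand-in feature) and
    return the k best, in their original document order."""
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    freq = Counter(t for toks in tokens for t in toks)
    scores = [sum(freq[t] for t in toks) / len(toks) if toks else 0.0
              for toks in tokens]
    best = sorted(range(len(sentences)), key=lambda i: scores[i],
                  reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]


def cosine(a, b):
    """Cosine similarity between two token lists, using term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard similarity: shared distinct terms over all distinct terms."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def sorensen_dice(a, b):
    """Sørensen-Dice similarity over distinct terms."""
    sa, sb = set(a), set(b)
    return 2 * len(sa & sb) / (len(sa) + len(sb)) if sa or sb else 0.0
```

A system summary could then be compared with a reference summary by tokenizing both and passing the token lists to `cosine`, `jaccard`, or `sorensen_dice`; each returns a value between 0 and 1.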

An extensive empirical study of feature terms selection for text summarization and categorization
1. Computing methodologies
  1. Artificial intelligence
2. Hardware
  1. Power and energy
    1. Power estimation and optimization

Published In

CCSEIT '12: Proceedings of the Second International Conference on Computational Science, Engineering and Information Technology

October 2012

800 pages

ISBN: 9781450313100

DOI: 10.1145/2393216

General Chairs: Natarajan Meghanathan (Jackson State University), Michal Wozniak (Wroclaw University of Technology, Poland)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CCSEIT '12

Sponsor:

Avinashilingam University

CCSEIT '12: The Second International Conference on Computational Science, Engineering

October 26 - 28, 2012

Coimbatore, India
