DOI: 10.1145/2660859.2660972
research-article

Query-based Multi-Document Summarization by Clustering of Documents

Published: 10 October 2014

Abstract

Information Retrieval (IR) systems such as search engines retrieve a large set of documents, images and videos in response to a user query. Computational methods such as Automatic Text Summarization (ATS) reduce this information load, enabling users to find information quickly without reading the original text. The challenges for ATS include both the time complexity and the accuracy of summarization. Our proposed Information Retrieval system consists of three phases: a Retrieval phase, a Clustering phase and a Summarization phase. In the Clustering phase, we extend the Potential-based Hierarchical Agglomerative (PHA) clustering method to a hybrid PHA-ClusteringGain-K-Means clustering approach. Our studies using the DUC 2002 dataset show an increase in both the efficiency and the accuracy of clustering when compared to both the conventional Hierarchical Agglomerative Clustering (HAC) algorithm and PHA.
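
The three phases named in the abstract (retrieval, clustering, summarization) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the details of PHA or Clustering Gain, so neither is implemented here. As stated assumptions, retrieval uses raw term-frequency cosine similarity, the clustering phase is replaced by a plain k-means-style loop with cosine assignment seeded naively from the first k vectors, and summarization keeps one query-closest sentence per cluster. All function names and the toy documents are hypothetical.

```python
import math
import re
from collections import Counter


def tf_vector(text):
    """Sparse term-frequency vector over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, docs, top_n):
    """Phase 1: rank documents by similarity to the query, keep the top_n."""
    q = tf_vector(query)
    return sorted(docs, key=lambda d: cosine(q, tf_vector(d)), reverse=True)[:top_n]


def kmeans(vectors, k, iterations=10):
    """Phase 2 stand-in (assumption): k-means-style clustering, naive seeding.

    The paper's hybrid approach is NOT reproduced; seeds are simply the
    first k vectors and k is supplied by the caller.
    """
    k = min(k, len(vectors))
    centroids = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)
                centroids[c] = {t: n / len(members) for t, n in total.items()}
    return labels


def summarize(query, docs, labels):
    """Phase 3: from each cluster, keep the sentence closest to the query."""
    q = tf_vector(query)
    summary = []
    for c in sorted(set(labels)):
        sentences = [s for d, lab in zip(docs, labels) if lab == c
                     for s in re.split(r"(?<=[.!?])\s+", d) if s.strip()]
        summary.append(max(sentences, key=lambda s: cosine(q, tf_vector(s))))
    return summary


# Toy corpus (hypothetical data, for illustration only).
docs = [
    "Solar panels convert sunlight into electricity. Panel prices fell sharply.",
    "Wind turbines generate electricity from wind. Turbines need steady wind.",
    "Solar energy storage uses batteries. Batteries store solar electricity overnight.",
    "The football match ended in a draw.",
]
query = "solar electricity"

retrieved = retrieve(query, docs, top_n=3)
labels = kmeans([tf_vector(d) for d in retrieved], k=2)
summary = summarize(query, retrieved, labels)
print(summary)
```

Clustering before summarizing means each cluster contributes its own sentence, which is one simple way to reduce redundancy in the output. A real system would likely use TF-IDF weighting and a principled choice of seeds and of k, which is the role the abstract assigns to PHA and Clustering Gain.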

Information & Contributors

Information

Published In

ACM Other conferences
ICONIAAC '14: Proceedings of the 2014 International Conference on Interdisciplinary Advances in Applied Computing
October 2014
374 pages
ISBN: 9781450329088
DOI: 10.1145/2660859
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • Amrita: Amrita Vishwa Vidyapeetham

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Automatic Text Summarization
  2. Clustering Gain
  3. Hierarchical Agglomerative Clustering algorithms
  4. Information Retrieval
  5. Potential based Hierarchical Agglomerative clustering
  6. k-means

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICONIAAC '14

Acceptance Rates

ICONIAAC '14 Paper Acceptance Rate 69 of 176 submissions, 39%;
Overall Acceptance Rate 69 of 176 submissions, 39%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 11
  • Downloads (last 6 weeks): 3

Reflects downloads up to 04 Jan 2025.

Citations

Cited By

  • (2024) Knowledge Extraction from Distributed Heterogeneous Data Sources. 2024 5th International Conference for Emerging Technology (INCET), pp. 1-6. DOI: 10.1109/INCET61516.2024.10592904. Online publication date: 24-May-2024
  • (2023) Review on Query-focused Multi-document Summarization (QMDS) with Comparative Analysis. ACM Computing Surveys 56:1, pp. 1-38. DOI: 10.1145/3597299. Online publication date: 16-May-2023
  • (2023) Network Analysis of Research Base Papers: Metrics and Potential use. 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), pp. 1-9. DOI: 10.1109/ICCCNT56998.2023.10307301. Online publication date: 6-Jul-2023
  • (2022) Extractive text summarization using clustering-based topic modeling. Soft Computing 27:7, pp. 3965-3982. DOI: 10.1007/s00500-022-07534-6. Online publication date: 4-Oct-2022
  • (2021) A Novel Approach to Text Summarisation Using Topic Modelling and Noun Phrase Extraction. Advances in Computing and Network Communications, pp. 285-298. DOI: 10.1007/978-981-33-6987-0_24. Online publication date: 13-Jun-2021
  • (2019) Extractive Approach For Query Based Text Summarization. 2019 International Conference on Issues and Challenges in Intelligent Computing Techniques (ICICT), pp. 1-5. DOI: 10.1109/ICICT46931.2019.8977708. Online publication date: Sep-2019
  • (2019) A Modified Medical Information Retrieval System. 2019 IEEE 9th International Conference on Advanced Computing (IACC), pp. 218-222. DOI: 10.1109/IACC48062.2019.8971587. Online publication date: Dec-2019
  • (2018) A Community Based Web Summarization in Near Linear Time. 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 962-968. DOI: 10.1109/ICACCI.2018.8554789. Online publication date: Sep-2018
  • (2018) What to Read Next? Challenges and Preliminary Results in Selecting Representative Documents. Database and Expert Systems Applications, pp. 230-242. DOI: 10.1007/978-3-319-99133-7_19. Online publication date: 7-Aug-2018
