DOI: 10.1145/1774088.1774465
research-article

Traveling among clusters: a way to reconsider the benefits of the cluster hypothesis

Published: 22 March 2010

Abstract

Relying on the Cluster Hypothesis, which states that relevant documents tend to be more similar to each other than to non-relevant documents, most information retrieval systems that organize search results as a set of clusters seek to gather all relevant documents in the same cluster. We propose here to reconsider the benefits of the resulting concentration of relevant information. Contrary to what is commonly assumed, we believe that systems which aim to distribute the relevant documents across different clusters are more likely to highlight different aspects of the subject, and so may be at least as useful to the user as systems that gather all relevant documents in a single group. Since existing evaluation measures tend to greatly favor the latter systems, we first investigate ways to assess more fairly the ability to reach the relevant information from the list of cluster descriptions. Finally, we show that systems distributing the relevant information across different clusters may actually provide better information access than classical systems.
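
As an illustration only, the Cluster Hypothesis stated in the abstract can be checked on toy data by comparing the average similarity among relevant documents with the average similarity between relevant and non-relevant documents. The sketch below is not the paper's method or evaluation measure: the term-count documents, the choice of cosine similarity, and the names `cosine`, `mean`, and `cluster_hypothesis_gap` are all assumptions made for this example.

```python
from itertools import combinations, product
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse term-count dictionaries."""
    dot = sum(count * v.get(term, 0) for term, count in u.items())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean(values):
    """Arithmetic mean of an iterable, or 0.0 if it is empty."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def cluster_hypothesis_gap(relevant, non_relevant):
    """Return (mean similarity among relevant documents,
    mean similarity between relevant and non-relevant documents)."""
    within = mean(cosine(a, b) for a, b in combinations(relevant, 2))
    across = mean(cosine(a, b) for a, b in product(relevant, non_relevant))
    return within, across


# Hypothetical toy documents, represented as term counts.
relevant = [
    {"cluster": 2, "retrieval": 1},
    {"cluster": 1, "retrieval": 2},
    {"cluster": 1, "search": 1},
]
non_relevant = [
    {"football": 2, "score": 1},
    {"search": 1, "football": 1},
]

within, across = cluster_hypothesis_gap(relevant, non_relevant)
print(f"relevant vs relevant:     {within:.3f}")
print(f"relevant vs non-relevant: {across:.3f}")
```

On this toy data the hypothesis holds when the first value exceeds the second. Real collections would need proper term weighting and relevance judgments, which this sketch leaves out.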

Cited By

  • (2024) A Framework for Defining Algorithmic Fairness in the Context of Information Access. Proceedings of the Association for Information Science and Technology 61:1 (667-672). DOI: 10.1002/pra2.1077. Online publication date: 15-Oct-2024
  • (2022) Rethinking Algorithmic Fairness in the Context of Information Access. Proceedings of the Association for Information Science and Technology 59:1 (815-817). DOI: 10.1002/pra2.736. Online publication date: 14-Oct-2022
  • (2011) Envisioning dynamic quantum clustering in information retrieval. Proceedings of the 5th international conference on Quantum interaction (211-216). DOI: 10.5555/2074824.2074854. Online publication date: 26-Jun-2011

Published In

SAC '10: Proceedings of the 2010 ACM Symposium on Applied Computing
March 2010
2712 pages
ISBN:9781605586397
DOI:10.1145/1774088
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. evaluation measures
  2. search results clustering

Conference

SAC'10: The 2010 ACM Symposium on Applied Computing
March 22 - 26, 2010
Sierre, Switzerland

Acceptance Rates

SAC '10 Paper Acceptance Rate 364 of 1,353 submissions, 27%;
Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
