Poster
DOI: 10.1145/2396761.2398678

Exploring the cluster hypothesis, and cluster-based retrieval, over the web

Published: 29 October 2012

Abstract

We present a study of the cluster hypothesis, and of the performance of cluster-based retrieval methods, over large-scale Web collections. Among our findings: (i) as determined by a specific test, the cluster hypothesis can hold for large-scale Web corpora to the same extent that it does for newswire corpora; (ii) while spam documents do not affect the extent to which the cluster hypothesis holds, they considerably affect the performance of cluster-based as well as document-based retrieval methods; and (iii) as with newswire corpora, cluster-based methods can yield better performance than document-based methods on Web corpora.
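The abstract contrasts a cluster hypothesis test with the effectiveness of cluster-based versus document-based ranking. The sketch below illustrates both ideas on a toy list of top-retrieved documents. It is not the paper's method: the abstract does not name its test, so we assume a nearest-neighbor test in the spirit of Voorhees (1985), plain term-frequency cosine similarity, and overlapping nearest-neighbor clusters ranked by the geometric mean of their members' retrieval scores. The function names (`cosine`, `nearest_neighbors`, `nn_test`, `cluster_rerank`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_neighbors(docs: dict, doc_id: str, k: int) -> list:
    """The k documents most similar to doc_id (ties broken by id), excluding doc_id."""
    others = sorted((d for d in docs if d != doc_id),
                    key=lambda d: (-cosine(docs[doc_id], docs[d]), d))
    return others[:k]


def nn_test(docs: dict, relevant: set, k: int = 5) -> float:
    """Assumed nearest-neighbor test: the average, over relevant documents,
    of the fraction of each one's k nearest neighbors that are also relevant."""
    fractions = []
    for d in docs:
        if d not in relevant:
            continue
        neighbors = nearest_neighbors(docs, d, k)
        if neighbors:
            fractions.append(sum(n in relevant for n in neighbors) / len(neighbors))
    return sum(fractions) / len(fractions) if fractions else 0.0


def cluster_rerank(docs: dict, scores: dict, k: int = 5) -> list:
    """Illustrative cluster-based ranking (an assumption, not the paper's).

    Each document plus its k-1 nearest neighbors forms one cluster. Clusters
    are ordered by the geometric mean of their members' (positive) retrieval
    scores; documents are then emitted cluster by cluster, skipping repeats.
    """
    clusters = [[d] + nearest_neighbors(docs, d, k - 1) for d in docs]

    def geometric_mean(cluster: list) -> float:
        return math.exp(sum(math.log(scores[d]) for d in cluster) / len(cluster))

    clusters.sort(key=geometric_mean, reverse=True)  # sort is stable for ties
    ranking = []
    for cluster in clusters:
        # Inside a cluster, fall back to the document-based score order.
        for d in sorted(cluster, key=lambda d: -scores[d]):
            if d not in ranking:
                ranking.append(d)
    return ranking


# Toy top-retrieved list: two relevant documents and two spam-like ones.
docs = {
    "d1": Counter("cluster hypothesis web".split()),
    "d2": Counter("cluster hypothesis retrieval".split()),
    "d3": Counter("spam casino offer".split()),
    "d4": Counter("casino offer win".split()),
}
relevant = {"d1", "d2"}
scores = {"d1": 0.9, "d2": 0.5, "d3": 0.8, "d4": 0.3}  # hypothetical initial scores

print(nn_test(docs, relevant, k=1))                   # 1.0
print(sorted(scores, key=scores.get, reverse=True))  # ['d1', 'd3', 'd2', 'd4']
print(cluster_rerank(docs, scores, k=2))              # ['d1', 'd2', 'd3', 'd4']
```

In this made-up example the spam-like d3 outranks the relevant d2 in the document-based order, while the cluster-based order places d2 second because it shares a high-scoring cluster with d1. This only illustrates the kind of effect the abstract describes; it is not a result from the paper.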

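Cited By

  • (2019) Hypergraph-of-entity. Open Computer Science 9:1 (103-127). DOI: 10.1515/comp-2019-0006. Online publication date: 6-Jun-2019
  • (2019) Relevance-driven Clustering for Visual Information Retrieval on Twitter. Proceedings of the 2019 Conference on Human Information Interaction and Retrieval (349-353). DOI: 10.1145/3295750.3298914. Online publication date: 8-Mar-2019
  • (2018) Testing the Cluster Hypothesis with Focused and Graded Relevance Judgments. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (1173-1176). DOI: 10.1145/3209978.3210120. Online publication date: 27-Jun-2018
  • (2018) Fine-grained document clustering via ranking and its application to social media analytics. Social Network Analysis and Mining 8:1. DOI: 10.1007/s13278-018-0508-z. Online publication date: 7-Apr-2018
  • (2014) On Clustering and Polyrepresentation. Proceedings of the 36th European Conference on IR Research on Advances in Information Retrieval, Volume 8416 (618-623). DOI: 10.5555/2964060.2964107. Online publication date: 13-Apr-2014
  • (2014) Parameter Tuning with User Models. Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management (1823-1826). DOI: 10.1145/2661829.2661911. Online publication date: 3-Nov-2014
  • (2014) The correlation between cluster hypothesis tests and the effectiveness of cluster-based retrieval. Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval (1155-1158). DOI: 10.1145/2600428.2609533. Online publication date: 3-Jul-2014
  • (2014) The Cluster Hypothesis in Information Retrieval. Proceedings of the 36th European Conference on IR Research on Advances in Information Retrieval, Volume 8416 (823-826). DOI: 10.1007/978-3-319-06028-6_105. Online publication date: 13-Apr-2014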

    Published In

    ACM Conferences
    CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
    October 2012
    2840 pages
    ISBN:9781450311564
    DOI:10.1145/2396761
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cluster hypothesis
    2. cluster-based retrieval

    Qualifiers

    • Poster

    Conference

    CIKM'12

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Article Metrics

    • Downloads (last 12 months): 4
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 18 Dec 2024.

