DOI: 10.1145/2983323.2983737
research-article

Selective Cluster-Based Document Retrieval

Published: 24 October 2016

Abstract

We address the long-standing challenge of selective cluster-based retrieval; namely, deciding on a per-query basis whether to apply cluster-based document retrieval or standard document retrieval. To address this classification task, we propose a few sets of features based on those utilized by the cluster-based ranker, query-performance predictors, and properties of the clustering structure. Empirical evaluation shows that our method outperforms state-of-the-art retrieval approaches, including cluster-based, query expansion, and term proximity methods.
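
The per-query decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two feature names, the toy training data, the logistic-regression selector, and the stub rankers are all assumptions made for illustration, since the abstract does not specify the features or the classifier in detail.

```python
import math
from typing import Callable, List, Sequence, Tuple

Model = Tuple[List[float], float]      # (weights, bias) of the selector
Ranker = Callable[[str], List[str]]    # maps a query to a ranked list of doc ids


def _probability(features: Sequence[float], model: Model) -> float:
    """Estimated probability that cluster-based retrieval is the better choice."""
    weights, bias = model
    z = bias + sum(w * v for w, v in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train_selector(examples: Sequence[Sequence[float]], labels: Sequence[int],
                   epochs: int = 1000, lr: float = 0.1) -> Model:
    """Fit a logistic-regression selector with plain SGD.

    Label 1 means cluster-based retrieval did better on that training query;
    label 0 means standard document retrieval did better.
    """
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            err = _probability(x, (weights, bias)) - y
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def use_cluster_based(features: Sequence[float], model: Model,
                      threshold: float = 0.5) -> bool:
    """Per-query decision: True selects the cluster-based ranker."""
    return _probability(features, model) >= threshold


def selective_retrieve(query: str, features: Sequence[float], model: Model,
                       cluster_ranker: Ranker, standard_ranker: Ranker) -> List[str]:
    """Route the query to whichever ranker the selector picks."""
    ranker = cluster_ranker if use_cluster_based(features, model) else standard_ranker
    return ranker(query)


# Hypothetical features per query: [performance-predictor score, cluster cohesion].
toy_features = [[0.9, 0.8], [0.8, 0.9], [0.2, 0.1], [0.1, 0.3]]
toy_labels = [1, 1, 0, 0]
selector = train_selector(toy_features, toy_labels)

# Stub rankers standing in for real retrieval functions.
ranked = selective_retrieve(
    "example query", [0.85, 0.9], selector,
    cluster_ranker=lambda q: ["doc_from_cluster_ranker"],
    standard_ranker=lambda q: ["doc_from_standard_ranker"],
)
print(ranked)
```

In a real system the feature vector would be computed from the retrieved list and its clusters for each query, and the labels would come from comparing the two rankers' effectiveness on training queries.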

    Published In

    CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
    October 2016
    2566 pages
    ISBN:9781450340731
    DOI:10.1145/2983323

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. ad hoc retrieval
    2. cluster-based retrieval

    Qualifiers

    • Research-article

    Conference

    CIKM'16: ACM Conference on Information and Knowledge Management
    October 24 - 28, 2016
    Indianapolis, Indiana, USA

    Acceptance Rates

    CIKM '16 Paper Acceptance Rate 160 of 701 submissions, 23%;
    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


    Cited By

    • (2022) From Cluster Ranking to Document Ranking. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2137-2141. DOI: 10.1145/3477495.3531819. Online publication date: 6-Jul-2022.
    • (2020) Cluster-based information retrieval using pattern mining. Applied Intelligence. DOI: 10.1007/s10489-020-01922-x. Online publication date: 17-Oct-2020.
    • (2019) Employing query disambiguation using clustering techniques. Evolving Systems, 11(2), pages 305-315. DOI: 10.1007/s12530-019-09292-7. Online publication date: 11-Jul-2019.
    • (2018) Fusion in Information Retrieval. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pages 1383-1386. DOI: 10.1145/3209978.3210186. Online publication date: 27-Jun-2018.
    • (2018) Selective Cluster Presentation on the Search Results Page. ACM Transactions on Information Systems, 36(3), pages 1-42. DOI: 10.1145/3158672. Online publication date: 28-Feb-2018.
    • (2018) Query Disambiguation Based on Clustering Techniques. Artificial Intelligence Applications and Innovations, pages 133-145. DOI: 10.1007/978-3-319-92016-0_13. Online publication date: 22-May-2018.
