
A comparative analysis of cascade measures for novelty and diversity

Published: 09 February 2011

Abstract

Traditional editorial effectiveness measures, such as nDCG, remain standard for Web search evaluation. Unfortunately, these traditional measures can inappropriately reward redundant information and can fail to reflect the broad range of user needs that can underlie a Web query. To address these deficiencies, several researchers have recently proposed effectiveness measures for novelty and diversity. Many of these measures are based on simple cascade models of user behavior, which operate by considering the relationship between successive elements of a result list. The properties of these measures are still poorly understood, and it is not clear from prior research that they work as intended. In this paper we examine the properties and performance of cascade measures with the goal of validating them as tools for measuring effectiveness. We explore their commonalities and differences, placing them in a unified framework; we discuss their theoretical difficulties and limitations, and compare the measures experimentally, contrasting them against traditional measures and against other approaches to measuring novelty. Data collected by the TREC 2009 Web Track is used as the basis for our experimental comparison. Our results indicate that these measures reward systems that achieve a balance between novelty and overall precision in their result lists, as intended. Nonetheless, other measures provide insights not captured by the cascade measures, and we suggest that future evaluation efforts continue to report a variety of measures.
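The cascade idea described in the abstract — a simulated user scans the result list top-down and stops with a probability tied to each document's relevance — can be illustrated with expected reciprocal rank (ERR; Chapelle et al., 2009), one measure in this family. Below is a minimal sketch, assuming graded relevance judgments on a 0-4 scale; the function name and the default maximum grade are illustrative choices, not part of the paper:

```python
def err(grades, g_max=4):
    """Expected Reciprocal Rank: a cascade measure.

    The simulated user examines documents in rank order; at each rank
    the stop probability is (2**g - 1) / 2**g_max for grade g, and the
    payoff for stopping at rank r is 1/r.
    """
    score = 0.0
    p_reach = 1.0  # probability the user reaches the current rank
    for rank, g in enumerate(grades, start=1):
        p_stop = (2 ** g - 1) / 2 ** g_max  # stop here, given arrival
        score += p_reach * p_stop / rank
        p_reach *= 1.0 - p_stop
    return score
```

Because each satisfying document sharply discounts everything ranked below it, a list that repeats one highly relevant result gains little over showing it once — the discounting behavior that lets cascade measures penalize redundancy.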

Supplementary Material

JPG File (wsdm2011_craswell_cac_01.jpg)
MP4 File (wsdm2011_craswell_cac_01.mp4)





    Published In

    WSDM '11: Proceedings of the fourth ACM international conference on Web search and data mining
    February 2011
    870 pages
    ISBN: 9781450304931
    DOI: 10.1145/1935826

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. diversity
    2. effectiveness measures
    3. novelty

    Qualifiers

    • Research-article

    Acceptance Rates

    WSDM '11 Paper Acceptance Rate: 83 of 372 submissions, 22%
    Overall Acceptance Rate: 498 of 2,863 submissions, 17%


    Cited By

    • (2023) "A is for Adele: An Offline Evaluation Metric for Instant Search." Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval, pp. 3-12. DOI: 10.1145/3578337.3605115
    • (2022) "Novelty Detection: A Perspective from Natural Language Processing." Computational Linguistics 48(1), pp. 77-117. DOI: 10.1162/coli_a_00429
    • (2021) "Towards Unified Metrics for Accuracy and Diversity for Recommender Systems." Proceedings of the 15th ACM Conference on Recommender Systems, pp. 75-84. DOI: 10.1145/3460231.3474234
    • (2021) "On the Instability of Diminishing Return IR Measures." Advances in Information Retrieval, pp. 572-586. DOI: 10.1007/978-3-030-72113-8_38
    • (2020) "Retrieval Evaluation Measures that Agree with Users' SERP Preferences." ACM Transactions on Information Systems 39(2), pp. 1-35. DOI: 10.1145/3431813
    • (2020) "Is Your Document Novel? Let Attention Guide You: An Attention-Based Model for Document-Level Novelty Detection." Natural Language Engineering, pp. 1-28. DOI: 10.1017/S1351324920000194
    • (2020) "A Framework for Argument Retrieval." Advances in Information Retrieval, pp. 431-445. DOI: 10.1007/978-3-030-45439-5_29
    • (2019) "Revisiting Online Personal Search Metrics with the User in Mind." Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 625-634. DOI: 10.1145/3331184.3331266
    • (2019) "Which Diversity Evaluation Measures Are 'Good'?" Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 595-604. DOI: 10.1145/3331184.3331215
    • (2019) "The Evolution of Cranfield." Information Retrieval Evaluation in a Changing World, pp. 45-69. DOI: 10.1007/978-3-030-22948-1_2
