- research-article, November 2024
Evaluation Measures of Individual Item Fairness for Recommender Systems: A Critical Study
ACM Transactions on Recommender Systems (TORS), Volume 3, Issue 2, Article No. 18, Pages 1–52. https://doi.org/10.1145/3631943
Fairness is an emerging and challenging topic in recommender systems. In recent years, various ways of evaluating and therefore improving fairness have emerged. In this study, we examine existing evaluation measures of fairness in recommender systems. ...
- research-article, August 2023
A Versatile Framework for Evaluating Ranked Lists in Terms of Group Fairness and Relevance
ACM Transactions on Information Systems (TOIS), Volume 42, Issue 1, Article No. 11, Pages 1–36. https://doi.org/10.1145/3589763
We present a simple and versatile framework for evaluating ranked lists in terms of Group Fairness and Relevance, in which the groups (i.e., possible attribute values) can be either nominal or ordinal in nature. First, we demonstrate that when our ...
- keynote, August 2023
Evaluating Parrots and Sociopathic Liars (keynote)
ICTIR '23: Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval, Page 1. https://doi.org/10.1145/3578337.3605144
This talk builds on my SWAN (Schematised Weighted Average Nugget) paper published in May 2023, which discusses a generic framework for auditing a given textual conversational system. The framework assumes that conversation sessions have already been ...
- short-paper, July 2023
VoMBaT: A Tool for Visualising Evaluation Measure Behaviour in High-Recall Search Tasks
SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Pages 3105–3109. https://doi.org/10.1145/3539618.3591802
The objective of High-Recall Information Retrieval (HRIR) is to retrieve as many relevant documents as possible for a given search topic. One approach to HRIR is Technology-Assisted Review (TAR), which uses information retrieval and machine learning ...
- survey, December 2022
Survey on the Objectives of Recommender Systems: Measures, Solutions, Evaluation Methodology, and New Perspectives
ACM Computing Surveys (CSUR), Volume 55, Issue 5, Article No. 93, Pages 1–38. https://doi.org/10.1145/3527449
Recently, recommender systems have played an increasingly important role in a wide variety of commercial applications to help users find favourite products. Research in the recommender system field has traditionally focused on the accuracy of predictions ...
- research-article, August 2022
Towards Formally Grounded Evaluation Measures for Semantic Parsing-based Knowledge Graph Question Answering
ICTIR '22: Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval, Pages 3–12. https://doi.org/10.1145/3539813.3545146
Knowledge graph question answering (KGQA) is important to make structured information accessible without formal query language expertise on the part of the users. The semantic parsing (SP) flavor of this task maps a natural language question to a formal ...
- short-paper, August 2022
Do Extractive Summarization Algorithms Amplify Lexical Bias in News Articles?
ICTIR '22: Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval, Pages 133–137. https://doi.org/10.1145/3539813.3545123
Users who read news summaries on search engine result pages and social media may not access the original news articles. Hence, if the summaries are automatically generated, it is vital that the automatic summaries represent the contents of the original ...
- research-article, July 2022
Ranking Interruptus: When Truncated Rankings Are Better and How to Measure That
SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Pages 588–598. https://doi.org/10.1145/3477495.3532051
Most information retrieval effectiveness evaluation metrics assume that systems appending irrelevant documents at the bottom of the ranking are as effective as (or no worse than) systems that have a stopping criterion to 'truncate' the ranking at the ...
- research-article, December 2020
Retrieval Evaluation Measures that Agree with Users’ SERP Preferences: Traditional, Preference-based, and Diversity Measures
ACM Transactions on Information Systems (TOIS), Volume 39, Issue 2, Article No. 14, Pages 1–35. https://doi.org/10.1145/3431813
We examine the “goodness” of ranked retrieval evaluation measures in terms of how well they align with users’ Search Engine Result Page (SERP) preferences for web search. The SERP preferences cover 1,127 topic-SERP-SERP triplets extracted from the NTCIR-...
- research-article, July 2020
Good Evaluation Measures based on Document Preferences
SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Pages 359–368. https://doi.org/10.1145/3397271.3401115
For offline evaluation of IR systems, some researchers have proposed to utilise pairwise document preference assessments instead of relevance assessments of individual documents, as it may be easier for assessors to make relative decisions rather than ...
- short-paper, September 2019
Generalising Kendall's Tau for Noisy and Incomplete Preference Judgements
ICTIR '19: Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval, Pages 193–196. https://doi.org/10.1145/3341981.3344246
We propose a new ranking evaluation measure for situations where multiple preference judgements are given for each item pair but they may be noisy (i.e., some judgements are unreliable) and/or incomplete (i.e., some judgements are missing). While it is ...
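As background for the generalisation discussed in the entry above, standard Kendall's tau-a counts concordant versus discordant item pairs between two score lists. This is a minimal sketch of the classic measure only, not the paper's noise- and incompleteness-aware variant; the function and variable names are illustrative.

```python
from itertools import combinations

def kendall_tau(a, b):
    """Classic Kendall's tau-a over two equal-length score lists:
    (concordant pairs - discordant pairs) / total pairs."""
    assert len(a) == len(b) and len(a) >= 2
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        s = (a[i] - a[j]) * (b[i] - b[j])
        if s > 0:
            concordant += 1   # pair ordered the same way in both lists
        elif s < 0:
            discordant += 1   # pair ordered oppositely
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs

print(kendall_tau([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0 (identical orderings)
print(kendall_tau([1, 2, 3, 4], [40, 30, 20, 10]))  # -1.0 (reversed orderings)
```

The paper's contribution is precisely that this classic form breaks down when some pairwise judgements are missing or contradictory, which the sketch above does not handle.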
- short-paper, July 2019
Evaluating Variable-Length Multiple-Option Lists in Chatbots and Mobile Search
SIGIR '19: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Pages 997–1000. https://doi.org/10.1145/3331184.3331308
In recent years, the proliferation of smart mobile devices has led to the gradual integration of search functionality within mobile platforms. This has created an incentive to move away from the "ten blue links" metaphor, as mobile users are less ...
- research-article, July 2019
Which Diversity Evaluation Measures Are "Good"?
SIGIR '19: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Pages 595–604. https://doi.org/10.1145/3331184.3331215
This study evaluates 30 IR evaluation measures or their instances, of which nine are for adhoc IR and 21 are for diversified IR, primarily from the viewpoint of whether their preferences of one SERP (search engine result page) over another actually ...
- short-paper, October 2018
Unsupervised Evaluation of Text Co-clustering Algorithms Using Neural Word Embeddings
CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Pages 1827–1830. https://doi.org/10.1145/3269206.3269282
Text clustering, which divides a dataset into groups of similar documents, plays an important role at various stages of the information retrieval process. Co-clustering is an extension of one-side clustering, and consists of simultaneously ...
- short-paper, June 2018
Comparing Two Binned Probability Distributions for Information Access Evaluation
SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, Pages 1073–1076. https://doi.org/10.1145/3209978.3210073
Some modern information access tasks such as natural language dialogue tasks are difficult to evaluate, for often there is no such thing as the ground truth: different users may have different opinions about the system's output. A few task designs for ...
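The entry above concerns comparing two binned probability distributions (e.g., histograms of user opinion labels). As a generic illustration of that task only, not the measures proposed in the paper, Jensen-Shannon divergence is one standard symmetric, bounded way to compare such distributions; the function names here are illustrative.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two binned probability
    distributions, given as equal-length lists each summing to 1.
    Symmetric and bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution

    def kl(x, y):
        # Kullback-Leibler divergence, skipping zero-probability bins of x
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return (kl(p, m) + kl(q, m)) / 2

print(js_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0 (identical distributions)
print(js_divergence([1.0, 0.0], [0.0, 1.0]))  # 1.0 (disjoint distributions)
```

Note that JSD, like many generic divergences, ignores bin ordering; measures designed for ordinal bins (as in graded opinion labels) behave differently, which is part of what motivates purpose-built comparisons.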
- research-article, October 2017
Evaluation Measures for Relevance and Credibility in Ranked Lists
ICTIR '17: Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval, Pages 91–98. https://doi.org/10.1145/3121050.3121072
Recent discussions on alternative facts, fake news, and post-truth politics have motivated research on creating technologies that allow people not only to access information, but also to assess the credibility of the information presented to them by ...
- research-article, October 2017 (Best Paper)
Are IR Evaluation Measures on an Interval Scale?
ICTIR '17: Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval, Pages 67–74. https://doi.org/10.1145/3121050.3121058
In this paper, we formally investigate whether or not IR evaluation measures are on an interval scale, which is needed to safely compute the basic statistics, such as mean and variance, we daily use to compare IR systems. We face this issue in the ...
- research-article, August 2017
What Does Affect the Correlation Among Evaluation Measures?
ACM Transactions on Information Systems (TOIS), Volume 36, Issue 2, Article No. 19, Pages 1–40. https://doi.org/10.1145/3106371
Information Retrieval (IR) is well-known for the great number of adopted evaluation measures, with new ones popping up more and more frequently. In this context, correlation analysis is the tool used to study the evaluation measures and to let us ...
- research-article, September 2015
An Axiomatically Derived Measure for the Evaluation of Classification Algorithms
ICTIR '15: Proceedings of the 2015 International Conference on The Theory of Information Retrieval, Pages 11–20. https://doi.org/10.1145/2808194.2809449
We address the general problem of finding suitable evaluation measures for classification systems. To this end, we adopt an axiomatic approach, i.e., we discuss a number of properties ("axioms") that an evaluation measure for classification should ...
- research-article, February 2015
Listwise Approach for Rank Aggregation in Crowdsourcing
WSDM '15: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, Pages 253–262. https://doi.org/10.1145/2684822.2685308
Inferring a gold-standard ranking over a set of objects, such as documents or images, is a key task to build test collections for various applications like Web search and recommender systems. Crowdsourcing services provide an efficient and inexpensive ...