research-article

How Well do Offline and Online Evaluation Metrics Measure User Satisfaction in Web Image Search?

Authors:

Shaoping Ma

SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval

Pages 615 - 624

https://doi.org/10.1145/3209978.3210059

Published: 27 June 2018

Abstract

Compared to general Web search engines, image search engines present search results differently, using a two-dimensional visual image panel that users can scroll and browse quickly. These differences in result presentation can significantly impact the way users interact with search engines, and therefore affect existing methods of search evaluation. Although different evaluation metrics have been thoroughly studied in the general Web search environment, how well those offline and online metrics reflect user satisfaction in the context of image search remains an open question. To shed light on this, we conduct a laboratory user study that collects both explicit user satisfaction feedback and user behavior signals such as clicks. We find that offline image search metrics based on a combination of externally assessed topical relevance and image quality judgments correlate better with user satisfaction than metrics based on topical relevance alone. We also demonstrate that existing offline Web search metrics can be adapted to evaluate a two-dimensional presentation for image search. With respect to online metrics, we find that those based on image click information significantly outperform offline metrics. To our knowledge, our work is the first to thoroughly establish the relationship between different measures and user satisfaction in image search.
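
As an illustration only, the kind of analysis the abstract describes can be sketched in a few lines. The abstract does not say how relevance and quality are combined, how list-based metrics are adapted to a grid, or which correlation coefficient is used, so everything below is an assumption: a linear mix with a hypothetical `weight`, a DCG-style discount applied to the grid read row by row, and Pearson correlation. The function names and the toy judgments are hypothetical and are not taken from the paper.

```python
import math


def combined_gain(relevance, quality, weight=0.5):
    """Linear mix of topical relevance and image quality (assumed scheme)."""
    return weight * relevance + (1.0 - weight) * quality


def grid_dcg(gain_grid):
    """DCG-style score over a grid read row by row, left to right.

    The row-major reading order is an assumption about browsing behavior.
    """
    flat = [gain for row in gain_grid for gain in row]
    return sum(gain / math.log2(rank + 1) for rank, gain in enumerate(flat, start=1))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy data (invented): three result pages, each a 2x2 grid of
# (relevance, quality) judgments, plus one satisfaction rating per page.
pages = [
    [[(2, 2), (2, 1)], [(1, 1), (0, 0)]],
    [[(0, 1), (1, 0)], [(0, 0), (0, 0)]],
    [[(1, 2), (1, 1)], [(2, 0), (0, 1)]],
]
satisfaction = [5, 2, 3]

scores = [
    grid_dcg([[combined_gain(r, q) for r, q in row] for row in page])
    for page in pages
]
print(pearson(scores, satisfaction))
```

Setting `weight=1.0` reduces the gain to topical relevance alone, which gives a simple way to compare a relevance-only metric against a combined one on the same satisfaction ratings.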

Index Terms

How Well do Offline and Online Evaluation Metrics Measure User Satisfaction in Web Image Search?
1. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results
    2. Specialized information retrieval
      1. Multimedia and multimodal retrieval
        Image search

Information & Contributors

Information

Published In

SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval

June 2018

1509 pages

ISBN:9781450356572

DOI:10.1145/3209978

General Chairs:
Kevyn Collins-Thompson, University of Michigan, United States
Qiaozhu Mei, University of Michigan, United States

Program Chairs:
Brian Davison, Lehigh University, United States
Yiqun Liu, Tsinghua University, China
Emine Yilmaz, University College London, United Kingdom

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 June 2018

Qualifiers

Research-article

Funding Sources

National Key Basic Research Program
Natural Science Foundation of China

Conference

SIGIR '18

Sponsor:

SIGIR

SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval

July 8 - 12, 2018

Ann Arbor, MI, USA

Bibliometrics & Citations

Article Metrics

Total Citations: 20
Total Downloads: 561
Downloads (Last 12 months): 18
Downloads (Last 6 weeks): 3

Reflects downloads up to 25 Feb 2025

Citations

Cited By

Tavakoli L, Trippas J, Zamani H, Scholer F, Sanderson M (2024). Online and Offline Evaluation in Search Clarification. ACM Transactions on Information Systems 43:1 (1-30). Online publication date: 4-Nov-2024.
https://dl.acm.org/doi/10.1145/3681786
Shao Y, Li H, Wu Y, Liu Y, Ai Q, Mao J, Ma Y, Ma S (2023). An Intent Taxonomy of Legal Case Retrieval. ACM Transactions on Information Systems 42:2 (1-27). Online publication date: 29-Sep-2023.
https://dl.acm.org/doi/10.1145/3626093
Arabzadeh N, Kmet O, Carterette B, Clarke C, Hauff C, Chandar P, Yoshioka M, Kiseleva J, Aliannejadi M (2023). A is for Adele: An Offline Evaluation Metric for Instant Search. Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval (3-12). Online publication date: 9-Aug-2023.
https://dl.acm.org/doi/10.1145/3578337.3605115
Yang Q, Ye M, Cai Z, Su K, Du B (2023). Composed Image Retrieval via Cross Relation Network With Hierarchical Aggregation Transformer. IEEE Transactions on Image Processing 32 (4543-4554). Online publication date: 2023.
https://doi.org/10.1109/TIP.2023.3299791
Markwald M, Liu J, Yu R (2023). Constructing and meta-evaluating state-aware evaluation metrics for interactive search systems. Information Retrieval Journal 26:1-2. Online publication date: 31-Oct-2023.
https://doi.org/10.1007/s10791-023-09426-1
Liu J, Liu J (2023). Formally Modeling Users in Information Retrieval. A Behavioral Economics Approach to Interactive Information Retrieval (23-64). Online publication date: 18-Feb-2023.
https://doi.org/10.1007/978-3-031-23229-9_2
Gogleva A, papa e, Jansson E, De Baets G (2021). Drug Discovery as a Recommendation Problem: Challenges and Complexities in Biological Decisions. Proceedings of the 15th ACM Conference on Recommender Systems (548-550). Online publication date: 13-Sep-2021.
https://dl.acm.org/doi/10.1145/3460231.3474598
Piccardi T, Redi M, Colavizza G, West R (2021). On the Value of Wikipedia as a Gateway to the Web. Proceedings of the Web Conference 2021 (249-260). Online publication date: 19-Apr-2021.
https://dl.acm.org/doi/10.1145/3442381.3450136
Wu D, Zhang C, Ainiwaer A, Lv S (2021). Hybrid Research on Relevance Judgment and Eye Movement for Reverse Image Search. Diversity, Divergence, Dialogue (211-228). Online publication date: 19-Mar-2021.
https://doi.org/10.1007/978-3-030-71292-1_19
Xie X, Mao J, Liu Y, de Rijke M, Huang J, Chang Y, Cheng X, Kamps J, Murdock V, Wen J, Liu Y (2020). Modeling User Behavior for Vertical Search: Images, Apps and Products. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (2440-2443). Online publication date: 25-Jul-2020.
https://dl.acm.org/doi/10.1145/3397271.3401423
