DOI: 10.1145/3209978.3210059

How Well do Offline and Online Evaluation Metrics Measure User Satisfaction in Web Image Search?

Published: 27 June 2018


Compared with general Web search engines, image search engines present search results differently, using a two-dimensional image panel that users can scroll and browse quickly. These differences in result presentation can significantly affect how users interact with search engines, and therefore affect existing methods of search evaluation. Although different evaluation metrics have been thoroughly studied in the general Web search environment, how well those offline and online metrics reflect user satisfaction in image search remains an open question. To shed light on this, we conduct a laboratory user study that collects both explicit user satisfaction feedback and user behavior signals such as clicks. We find that offline image search metrics correlate better with user satisfaction when they combine externally assessed topical relevance with image quality judgments than when they use topical relevance alone. We also demonstrate that existing offline Web search metrics can be adapted to evaluate a two-dimensional presentation for image search. With respect to online metrics, we find that those based on image click information significantly outperform offline metrics. To our knowledge, our work is the first to thoroughly establish the relationship between different measures and user satisfaction in image search.
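To make the idea of adapting an offline metric to a grid concrete, the following is a minimal sketch and not the authors' method. It assumes the result panel is read row by row (left to right, top to bottom), that each image carries a hypothetical topical relevance score and a hypothetical image quality score, and that the two are combined by multiplication. The function names `grid_dcg` and `pearson` and all data values are invented for illustration.

```python
import math
from statistics import mean


def grid_dcg(grid, combine=lambda rel, qual: rel * qual):
    """DCG over a 2-D result panel of (relevance, quality) pairs.

    Assumption: the panel is flattened row by row, so the image in
    row 0, column 0 gets rank 1. The gain of each image is
    combine(relevance, quality); multiplication is an illustrative choice.
    """
    flat = [cell for row in grid for cell in row]
    return sum(
        combine(rel, qual) / math.log2(rank + 1)
        for rank, (rel, qual) in enumerate(flat, start=1)
    )


def pearson(xs, ys):
    """Pearson correlation between metric values and satisfaction ratings."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical sessions: one 2x2 panel per query, plus a made-up
# satisfaction rating collected for that query.
panels = [
    [[(3, 2), (2, 2)], [(1, 1), (0, 1)]],
    [[(1, 1), (0, 2)], [(0, 1), (2, 1)]],
    [[(3, 1), (3, 2)], [(2, 2), (1, 2)]],
]
satisfaction = [4, 2, 5]

scores = [grid_dcg(panel) for panel in panels]
print(pearson(scores, satisfaction))
```

Passing `combine=lambda rel, qual: rel` gives a relevance-only variant, which allows the two gain definitions to be compared against the same satisfaction ratings.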





      Information & Contributors


      Published In

      SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval
      June 2018
      1509 pages
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]



      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. evaluation metrics
      2. user satisfaction
      3. web image search


      • Research-article

      Funding Sources

      • National Key Basic Research Program
      • Natural Science Foundation of China





      Cited By

      • (2024) Online and Offline Evaluation in Search Clarification. ACM Transactions on Information Systems 43:1 (1-30). DOI: 10.1145/3681786. Online publication date: 4-Nov-2024
      • (2023) An Intent Taxonomy of Legal Case Retrieval. ACM Transactions on Information Systems 42:2 (1-27). DOI: 10.1145/3626093. Online publication date: 29-Sep-2023
      • (2023) A is for Adele: An Offline Evaluation Metric for Instant Search. Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval (3-12). DOI: 10.1145/3578337.3605115. Online publication date: 9-Aug-2023
      • (2023) Composed Image Retrieval via Cross Relation Network With Hierarchical Aggregation Transformer. IEEE Transactions on Image Processing 32 (4543-4554). DOI: 10.1109/TIP.2023.3299791. Online publication date: 2023
      • (2023) Constructing and meta-evaluating state-aware evaluation metrics for interactive search systems. Information Retrieval Journal 26:1-2. DOI: 10.1007/s10791-023-09426-1. Online publication date: 31-Oct-2023
      • (2023) Formally Modeling Users in Information Retrieval. A Behavioral Economics Approach to Interactive Information Retrieval (23-64). DOI: 10.1007/978-3-031-23229-9_2. Online publication date: 18-Feb-2023
      • (2021) Drug Discovery as a Recommendation Problem: Challenges and Complexities in Biological Decisions. Proceedings of the 15th ACM Conference on Recommender Systems (548-550). DOI: 10.1145/3460231.3474598. Online publication date: 13-Sep-2021
      • (2021) On the Value of Wikipedia as a Gateway to the Web. Proceedings of the Web Conference 2021 (249-260). DOI: 10.1145/3442381.3450136. Online publication date: 19-Apr-2021
      • (2021) Hybrid Research on Relevance Judgment and Eye Movement for Reverse Image Search. Diversity, Divergence, Dialogue (211-228). DOI: 10.1007/978-3-030-71292-1_19. Online publication date: 19-Mar-2021
      • (2020) Modeling User Behavior for Vertical Search: Images, Apps and Products. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (2440-2443). DOI: 10.1145/3397271.3401423. Online publication date: 25-Jul-2020
