DOI: 10.1145/2983323.2983356

Measuring Metrics

Published: 24 October 2016

Abstract

You get what you measure, and you can't manage what you don't measure. Metrics are a powerful tool organizations use to set goals, decide which new products and features to release to customers, which new tests and experiments to run, and how to allocate resources. To a large extent, metrics drive the direction of an organization, and getting metrics 'right' is one of the most important and difficult problems an organization needs to solve. Creating good metrics that capture long-term company goals is hard, however, because they try to capture abstract concepts such as success, delight, loyalty, engagement, and lifetime value. How can one determine that a metric is a good one, or that one metric is better than another? In other words, how do we measure the quality of metrics? Can the evaluation process be automated, so that anyone with an idea for a new metric can quickly evaluate it? In this paper we describe the metric evaluation system deployed at Bing, where we have been working on designing and improving metrics for over five years. We believe that by applying a data-driven approach to metric evaluation we have been able to substantially improve our metrics and, as a result, ship better features and improve the search experience for Bing's users.
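The abstract's central question, whether metric evaluation can be automated, admits a concrete reading: score a candidate metric against a corpus of past controlled experiments whose verdicts are already trusted, counting how often the metric moves statistically significantly (sensitivity) and, when it does, how often it moves in the direction the verdict implies (agreement). The Python sketch below illustrates that general idea only; it is not Bing's system, and the Experiment type, its field names, and the choice of a two-sample t-test are assumptions made for the example.

```python
"""Hypothetical sketch: scoring a candidate metric against a labeled
corpus of past A/B experiments. Illustrates the general idea of
data-driven metric evaluation, not the system described in the paper."""

from dataclasses import dataclass
from scipy import stats  # two-sample t-test for treatment vs. control


@dataclass
class Experiment:
    name: str
    label: int                 # +1: treatment known good, -1: known bad
    control: list[float]       # per-user metric values, control group
    treatment: list[float]     # per-user metric values, treatment group


def evaluate_metric(corpus: list[Experiment], alpha: float = 0.05):
    """Return (sensitivity, agreement) of the metric over the corpus.

    sensitivity: fraction of experiments where the metric moved
                 statistically significantly (p < alpha).
    agreement:   among significant movements, fraction whose direction
                 matches the experiment's known label.
    """
    significant, agreeing = 0, 0
    for exp in corpus:
        t_stat, p_value = stats.ttest_ind(exp.treatment, exp.control)
        if p_value < alpha:
            significant += 1
            direction = 1 if t_stat > 0 else -1  # sign of the mean shift
            if direction == exp.label:
                agreeing += 1
    sensitivity = significant / len(corpus)
    agreement = agreeing / significant if significant else float("nan")
    return sensitivity, agreement


# Toy usage: a metric that rises in a known-good experiment should score
# well on both axes.
good = Experiment("exp-1", label=+1,
                  control=[0.10, 0.12, 0.11, 0.09] * 50,
                  treatment=[0.14, 0.15, 0.13, 0.16] * 50)
print(evaluate_metric([good]))  # high sensitivity, perfect agreement
```

On this reading, comparing two candidate metrics reduces to comparing two pairs of numbers computed over the same labeled corpus: the metric that is more sensitive and more often consistent with the known verdicts is the better one.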




Published In

CIKM '16: Proceedings of the 25th ACM International Conference on Information and Knowledge Management
October 2016
2566 pages
ISBN:9781450340731
DOI:10.1145/2983323
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. a/b testing
  2. measurement
  3. online experimentation
  4. quality
  5. search metrics

Qualifiers

  • Research-article

Conference

CIKM'16: ACM Conference on Information and Knowledge Management
October 24 - 28, 2016
Indianapolis, Indiana, USA

Acceptance Rates

CIKM '16 paper acceptance rate: 160 of 701 submissions (23%)
Overall acceptance rate: 1,861 of 8,427 submissions (22%)



Article Metrics

  • Downloads (last 12 months): 200
  • Downloads (last 6 weeks): 23
Reflects downloads up to 03 Dec 2024


Cited By

  • (2024) Choosing a Proxy Metric from Past Experiments. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 5803-5812. DOI: 10.1145/3637528.3671543. Online publication date: 25-Aug-2024.
  • (2024) What Matters in a Measure? A Perspective from Large-Scale Search Evaluation. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 282-292. DOI: 10.1145/3626772.3657845. Online publication date: 10-Jul-2024.
  • (2024) Mean dependency length — a new metric for requirements quality. INCOSE International Symposium, 34(1), pp. 1021-1035. DOI: 10.1002/iis2.13193. Online publication date: 7-Sep-2024.
  • (2023) Clustering-Based Imputation for Dropout Buyers in Large-Scale Online Experimentation. The New England Journal of Statistics in Data Science, pp. 415-425. DOI: 10.51387/23-NEJSDS33. Online publication date: 24-May-2023.
  • (2023) All about Sample-Size Calculations for A/B Testing: Novel Extensions & Practical Guide. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pp. 3574-3583. DOI: 10.1145/3583780.3614779. Online publication date: 21-Oct-2023.
  • (2023) A/B Integrations: 7 Lessons Learned from Enabling A/B Testing as a Product Feature. Proceedings of the 45th International Conference on Software Engineering: Software Engineering in Practice, pp. 304-314. DOI: 10.1109/ICSE-SEIP58684.2023.00033. Online publication date: 17-May-2023.
  • (2023) Detect and Interpret: Towards Operationalization of Automated User Experience Evaluation. Design, User Experience, and Usability, pp. 82-100. DOI: 10.1007/978-3-031-35702-2_6. Online publication date: 9-Jul-2023.
  • (2022) Using Survival Models to Estimate User Engagement in Online Experiments. Proceedings of the ACM Web Conference 2022, pp. 3186-3195. DOI: 10.1145/3485447.3512038. Online publication date: 25-Apr-2022.
  • (2021) Towards Inclusive Software Engineering Through A/B Testing: A Case-Study at Windows. 2021 IEEE/ACM 43rd International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), pp. 180-187. DOI: 10.1109/ICSE-SEIP52600.2021.00027. Online publication date: May-2021.
  • (2021) A Visual Analytics Interface for Formulating Evaluation Metrics of Multi-Dimensional Time-Series Data. IEEE Access, 9, pp. 102783-102800. DOI: 10.1109/ACCESS.2021.3098621. Online publication date: 2021.
