DOI: 10.1145/3394486.3403340
Research article, open access

User Sentiment as a Success Metric: Persistent Biases Under Full Randomization

Published: 20 August 2020

Abstract

We study user sentiment (reported via optional surveys) as a metric for fully randomized A/B tests. Both user-level covariates and treatment assignment can impact response propensity. We show that a simple mean comparison produces biased population-level estimates, and we propose a set of consistent estimators for the average and local treatment effects on treated and respondent users. We show that our problem can be mapped onto the intersection of the missing data problem and observational causal inference, and we identify conditions under which consistent estimators exist. Finally, we evaluate the performance of estimators and find that more complicated models do not necessarily provide superior performance as long as models satisfy consistency criteria.
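The bias described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's estimator: it uses a generic inverse-propensity-weighted (Hájek) mean, and every number in it is a hypothetical assumption (a single binary covariate `x`, a true average treatment effect of 0.6, and response probabilities that depend on both `x` and the treatment `t`). It also assumes that response is ignorable given `(x, t)`, which is the kind of condition under which such a reweighting is consistent.

```python
import random
from collections import defaultdict


def simulate(n=200_000, seed=0):
    """Simulate a fully randomized A/B test with optional survey response.

    All parameters are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    # Assumed P(respond | x, t): treatment raises response only when x == 1.
    respond_prob = {(0, 0): 0.2, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.6}
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)          # user-level covariate
        t = rng.randint(0, 1)          # randomized treatment assignment
        effect = 1.0 if x else 0.2     # heterogeneous effect; true ATE = 0.6
        y = 3.0 + x + effect * t + rng.gauss(0.0, 1.0)  # sentiment score
        r = rng.random() < respond_prob[(x, t)]         # survey answered?
        rows.append((x, t, y, r))
    return rows


def naive_difference(rows):
    """Difference of raw respondent means (treated minus control)."""
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for _, t, y, r in rows:
        if r:
            sums[t] += y
            counts[t] += 1
    return sums[1] / counts[1] - sums[0] / counts[0]


def ipw_difference(rows):
    """Respondent means reweighted by inverse estimated response propensity."""
    total = defaultdict(int)
    responded = defaultdict(int)
    for x, t, _, r in rows:
        total[(x, t)] += 1
        responded[(x, t)] += r
    weighted_sum = {0: 0.0, 1: 0.0}
    weight_total = {0: 0.0, 1: 0.0}
    for x, t, y, r in rows:
        if r:
            # Weight = 1 / estimated P(respond | x, t) for this stratum.
            w = total[(x, t)] / responded[(x, t)]
            weighted_sum[t] += w * y
            weight_total[t] += w
    return weighted_sum[1] / weight_total[1] - weighted_sum[0] / weight_total[0]


if __name__ == "__main__":
    data = simulate()
    print("naive respondent difference:", round(naive_difference(data), 3))
    print("IPW difference:", round(ipw_difference(data), 3))
```

Under these assumed parameters, treated respondents over-represent users with `x = 1`, so the naive comparison overstates the effect (1.05 in expectation against a true 0.6), while the reweighted comparison targets the population-level effect.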


Published In

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
August 2020
3664 pages
ISBN:9781450379984
DOI:10.1145/3394486
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. average treatment effect
  2. bias
  3. causal inference
  4. survey bias
  5. user sentiment

Conference

KDD '20
Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
