DOI: 10.1145/2810103.2813614
Research article, public access

Sunlight: Fine-grained Targeting Detection at Scale with Statistical Confidence

Published: 12 October 2015

Abstract

We present Sunlight, a system that detects the causes of targeting phenomena on the web -- such as personalized advertisements, recommendations, or content -- at large scale and with solid statistical confidence. Today's web is growing increasingly complex and impenetrable as a myriad of services collect, analyze, use, and exchange users' personal information. No one can tell who has what data, for what purposes they are using it, or how those uses affect users. The few studies that exist reveal problematic effects -- such as discriminatory pricing and advertising -- but they are either too small-scale to generalize or lack formal assessments of confidence in their results, making them difficult to trust or interpret. Sunlight brings a principled and scalable methodology to personal data measurements by adapting well-established methods from statistics to the specific problem of targeting detection. Our methodology formally separates different operations into four key phases: scalable hypothesis generation, interpretable hypothesis formation, statistical significance testing, and multiple testing correction. Each phase can be instantiated with multiple mechanisms from statistics, each making different assumptions and tradeoffs. Sunlight offers a modular design that allows exploration of this vast design space. We explore a portion of this space, thoroughly evaluating the tradeoffs both analytically and experimentally. Our exploration reveals subtle tensions between scalability and confidence. By default, Sunlight strikes a balance between the two, providing the first system that can diagnose targeting at fine granularity, at scale, and with solid statistical justification of its results.
We showcase our system by running two measurement studies of targeting on the web, both the largest of their kind. Our studies -- about ad targeting in Gmail and on the web -- reveal statistically justifiable evidence that contradicts two Google statements regarding the lack of targeting on sensitive and prohibited topics.
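
As an illustration of the fourth phase named in the abstract, multiple testing correction, the sketch below adjusts a list of p-values with two standard procedures: Holm's step-down method, which controls the family-wise error rate, and the Benjamini-Yekutieli method, which controls the false discovery rate under arbitrary dependence. This is a minimal sketch and not Sunlight's implementation: the abstract does not say which corrections the system applies by default, and the function names and sample p-values here are hypothetical.

```python
from typing import List


def holm_adjust(p_values: List[float]) -> List[float]:
    """Holm step-down adjusted p-values (family-wise error rate control)."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is scaled by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values must be non-decreasing along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def benjamini_yekutieli_adjust(p_values: List[float]) -> List[float]:
    """Benjamini-Yekutieli adjusted p-values (FDR control under dependence)."""
    m = len(p_values)
    # Harmonic-sum penalty that makes the procedure valid under dependence.
    c_m = sum(1.0 / k for k in range(1, m + 1))
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m - 1, -1, -1):
        idx = order[rank]
        candidate = min(1.0, p_values[idx] * m * c_m / (rank + 1))
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


def rejected(adjusted: List[float], alpha: float) -> List[int]:
    """Indices of hypotheses whose adjusted p-value is at most alpha."""
    return [i for i, p in enumerate(adjusted) if p <= alpha]


if __name__ == "__main__":
    # Hypothetical raw p-values, one per targeting hypothesis under test.
    raw = [0.01, 0.04, 0.03, 0.005]
    for name, adjust in [("Holm", holm_adjust),
                         ("Benjamini-Yekutieli", benjamini_yekutieli_adjust)]:
        adj = adjust(raw)
        print(name, [round(p, 4) for p in adj], "rejected:", rejected(adj, 0.05))
```

The two procedures control different error criteria, so the choice depends on whether any single false positive is unacceptable (family-wise error rate) or a bounded expected fraction of false positives is tolerable (false discovery rate).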


Published In

CCS '15: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security
October 2015
1750 pages
ISBN: 9781450338325
DOI: 10.1145/2810103
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. measurement
  2. privacy
  3. web transparency

Acceptance Rates

CCS '15 paper acceptance rate: 128 of 660 submissions (19%)
Overall acceptance rate: 1,261 of 6,999 submissions (18%)
