Collecting Data

To collect data for feature extraction and classification, first run the customized version of OpenWPM. A sample demo.py file is provided that crawls sites sampled from the top 100K sites: the top 1K, plus 9K sampled from ranks 1K-100K. The demo file checkpoints after 1000 sites so that the crawl can be recovered after a malfunction. See the README file included in the OpenWPM folder for more information.
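
The sampling and checkpointing described above can be sketched roughly as follows. This is a minimal illustration and not the code in demo.py: the function names (sample_sites, crawl_with_checkpoints), the JSON checkpoint format, the fixed random seed, and the choice to checkpoint after each batch of 1000 sites are all assumptions made for the example.

```python
import json
import random
from pathlib import Path


def sample_sites(ranked_sites, head=1000, tail=9000, limit=100_000, seed=0):
    """Return the top `head` sites plus `tail` sites sampled from the rest.

    `ranked_sites` is a list ordered by popularity rank (hypothetical input).
    The tail is drawn without replacement from ranks head+1..limit.
    """
    pool = ranked_sites[head:limit]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return ranked_sites[:head] + rng.sample(pool, min(tail, len(pool)))


def crawl_with_checkpoints(sites, visit, checkpoint_path, batch_size=1000):
    """Visit sites in order, saving progress after each batch.

    `visit` is any callable taking one site; the checkpoint file stores
    only the number of sites finished (an assumed, simplified format).
    Returns the number of sites visited in this run.
    """
    path = Path(checkpoint_path)
    done = json.loads(path.read_text())["done"] if path.exists() else 0
    for index in range(done, len(sites)):
        visit(sites[index])
        finished = index + 1
        if finished % batch_size == 0 or finished == len(sites):
            path.write_text(json.dumps({"done": finished}))
    return len(sites) - done
```

If the crawl stops partway through a batch, rerunning crawl_with_checkpoints with the same checkpoint path resumes from the last saved batch boundary, so at most one batch of sites is revisited.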

Feature extraction

To extract features from the collected dataset, run the run.py file in the Feature Extraction and Classifier folder. The feature extraction pipeline extracts the relevant features and constructs the graph required by the classification pipeline.

Classification

Before running the classification pipeline, run conflict resolution for the labels. This step makes sure that the labels for cookie nodes are propagated to the same scripts when they write first-party cookies on other sites that are not included in the ground truth. To run conflict resolution, first run find_commmon_scripts.py, then run identifier_cookies.py, and finally run the code in the Jupyter notebook provided in the conflict resolution folder.
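
The idea behind label propagation can be illustrated with a small sketch. This is not the logic of find_commmon_scripts.py, identifier_cookies.py, or the notebook: the function name propagate_labels, the input shapes, and the rule of skipping any script or cookie that would receive conflicting labels are assumptions made for the example.

```python
from collections import defaultdict


def propagate_labels(ground_truth, observations):
    """Propagate cookie labels to unlabeled cookies set by the same script.

    ground_truth: {(site, cookie_name): label} for labeled cookies.
    observations: list of (site, script_url, cookie_name) tuples recording
        which script wrote which first-party cookie on which site.
    Returns {(site, cookie_name): label} for newly labeled cookies only.
    """
    # Collect every label seen for each script among the labeled cookies.
    script_labels = defaultdict(set)
    for site, script, cookie in observations:
        label = ground_truth.get((site, cookie))
        if label is not None:
            script_labels[script].add(label)

    propagated, conflicted = {}, set()
    for site, script, cookie in observations:
        key = (site, cookie)
        if key in ground_truth or key in conflicted:
            continue
        labels = script_labels.get(script)
        # Assumed rule: only propagate when the script has one unambiguous label.
        if not labels or len(labels) != 1:
            continue
        (label,) = labels
        # Assumed rule: drop a cookie that two scripts would label differently.
        if propagated.setdefault(key, label) != label:
            del propagated[key]
            conflicted.add(key)
    return propagated
```

For example, if a script labeled as tracking on one site also writes a cookie on a second site that is absent from the ground truth, the cookie on the second site inherits that label in this sketch.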

To run classification, run the classify.py file in the classification folder.
