
Uplift Modeling

Code relating to uplift modeling (a.k.a. heterogeneous treatment effect estimation), written as part of my dissertation at the University of Helsinki.

  1. Download data to ./datasets/:
     • http://ailab.criteo.com/criteo-uplift-prediction-dataset/
     • https://blog.minethatdata.com/2008/03/minethatdata-e-mail-analytics-and-data.html
     • https://isps.yale.edu/research/data/d001
     • https://github.com/joshxinjie/Data_Scientist_Nanodegree/tree/master/starbucks_portfolio_exercise
     • https://zenodo.org/record/3653141
  2. Extract the data to a .csv file.
  3. Install the Python requirements listed in requirements.txt (some might be unnecessary).
  4. Create a load_data.DatasetCollection object following the instructions in load_data.py.
  5. Pick a model from ./models/, train it, and predict.
  6. Evaluate performance using the uplift_metrics.UpliftMetrics class (a minimal sketch of this workflow follows this list).
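
The sketch below shows the intended flow end to end. The model module and class, the dataset indexing, and the UpliftMetrics argument order are assumptions made for illustration; check load_data.py, ./models/, and uplift_metrics.py for the real interfaces.

```python
# Minimal workflow sketch. The model import, the dataset indexing, and
# the UpliftMetrics argument order below are assumptions -- see
# load_data.py, ./models/, and uplift_metrics.py for the real interfaces.
from data import load_data
from metrics import uplift_metrics

# 1. Wrap the extracted .csv in a DatasetCollection (see load_data.py
#    for the expected format and options).
data = load_data.DatasetCollection("./datasets/criteo-uplift.csv")

# 2. Train a model and predict uplift for the testing set.
#    'some_uplift_model' stands in for any model in ./models/.
from models import some_uplift_model               # hypothetical module
model = some_uplift_model.SomeUpliftModel()        # hypothetical class
model.fit(data['training_set'])                    # assumed interface
predicted_uplift = model.predict_uplift(data['testing_set'])

# 3. Estimate the standard uplift metrics. Argument order is assumed:
#    observed outcomes, predicted uplift, treatment-group indicators.
metrics = uplift_metrics.UpliftMetrics(
    data['testing_set']['y'],
    predicted_uplift,
    data['testing_set']['t'],
)
print(metrics)
```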
  • ./data/ contains files related to data wrangling, with the class load_data.DatasetCollection at the heart of everything.
  • ./experiments/ contains code tying together data wrangling, models, and metrics into the experiments that were run. Some of the results are published.
  • ./metrics/ contains code for estimating metrics; the class uplift_metrics.UpliftMetrics estimates the most commonly used metrics in uplift modeling (an illustrative sketch follows this list).
  • ./models/ contains a wide selection of different uplift models, some published, some unpublished.
  • ./slurm/ contains code for running parallel experiments on a cluster using the Slurm workload manager.
  • ./tests/ contains a few tests, the most important being a test for the metrics package.
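
As a repository-independent illustration of what such metrics estimate, the sketch below builds an uplift curve from scratch: units are ranked by predicted uplift, and for each targeting fraction the treated-minus-control mean outcome among the targeted units is estimated. The area under this curve (AUUC) is one of the standard metrics. The function is illustrative, not the repository's implementation.

```python
import numpy as np

def uplift_curve(y, t, score):
    """Illustrative uplift curve: sort by descending predicted uplift
    and, for each prefix of the ranking, estimate
    mean(y | treated) - mean(y | control) among the targeted units."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    score = np.asarray(score, dtype=float)
    order = np.argsort(-score)
    y, t = y[order], t[order]
    cum_treated_pos = np.cumsum(y * t)        # positives among treated so far
    cum_treated = np.cumsum(t)                # treated units so far
    cum_control_pos = np.cumsum(y * (1 - t))  # positives among control so far
    cum_control = np.cumsum(1 - t)            # control units so far
    with np.errstate(divide='ignore', invalid='ignore'):
        curve = cum_treated_pos / cum_treated - cum_control_pos / cum_control
    # Prefixes with no treated or no control units yet are mapped to 0.
    return np.nan_to_num(curve)
```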

Publications

The following publications are based on the code in this repository:

Running the experiments for "Quantifying Uncertainty of Uplift: Trees and T-learners"

As the environment required for this experiment contains a mix of old and new packages, everything is kept separately zipped in ./experiments/uncertainty_experiments.zip. Unzip it and follow the instructions in the accompanying README file to run the experiment.

Running the experiments for "Exploring Uplift Modeling with High Class Imbalance"

  1. Store the appropriate datasets in ./datasets/ in .csv format.
  2. Change the ratio between treated and untreated if desired (e.g. Criteo-uplift 2 was used with a 1:1 treated-to-untreated ratio).
  3. For split undersampling experiments, run 'python -m experiments.run_crf_experiment [dataset file] ./datasets/ 1.0 [model] [k_t] [k_c]' with appropriate parameters, e.g.
python -m experiments.run_crf_experiment starbucks.csv ./datasets/ 1.0 uplift_dc 2 16

Here, model can be uplift_dc or uplift_rf (the only ones tested to be compatible with split undersampling), and k_t and k_c can take values greater than or equal to 1. A rough sketch of the split undersampling idea follows.
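
The authoritative definition of split undersampling is in the paper and the repository code; the sketch below only captures the rough idea, under the assumption that a factor k means thinning a group's negatives to a 1/k sample while keeping all positives.

```python
import numpy as np

def split_undersample(X, y, t, k_t, k_c, seed=None):
    """Rough sketch of split undersampling: keep all positives, keep a
    1/k_t sample of treated negatives and a 1/k_c sample of control
    negatives. The 'keep 1/k of the negatives' reading is an assumption;
    see the repository code for the authoritative definition."""
    rng = np.random.default_rng(seed)
    u = rng.random(len(y))
    keep = (y == 1) | ((t == 1) & (u < 1.0 / k_t)) | ((t == 0) & (u < 1.0 / k_c))
    return X[keep], y[keep], t[keep]
```

Training on data thinned like this biases the predicted probabilities, which is what the calibration methods in these experiments compensate for.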

  4. For the other undersampling experiments, run 'python -m experiments.undersampling_experiments [dataset] [undersampling scheme] [model] [calibration method] [output file] [k-values] [positive rate]', e.g.
python -m experiments.undersampling_experiments starbucks naive_undersampling dc_lr isotonic results.csv 8 1

This will run a double classifier (a.k.a. T-learner) with logistic regression as the base learner, using naive undersampling with k=8 and tau-isotonic regression for calibration, on the Starbucks dataset. A minimal sketch of the underlying T-learner is below.
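
The double classifier is the textbook T-learner: one base learner per treatment arm, with predicted uplift being the difference of predicted probabilities. The scikit-learn sketch below shows only that base construction; the repository's dc_lr additionally applies the undersampling and calibration steps described above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_double_classifier(X, y, t):
    """Textbook T-learner: fit one base learner per treatment arm;
    predicted uplift is the difference of predicted probabilities."""
    model_t = LogisticRegression(max_iter=1000).fit(X[t == 1], y[t == 1])
    model_c = LogisticRegression(max_iter=1000).fit(X[t == 0], y[t == 0])

    def predict_uplift(X_new):
        return (model_t.predict_proba(X_new)[:, 1]
                - model_c.predict_proba(X_new)[:, 1])

    return predict_uplift
```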

Running the experiments for "Uplift Modeling with High Class Imbalance"

  1. Store the appropriate datasets in ./datasets/ in .csv format.
  2. Run python -m data.pickle_dataset to prepare the data. This normalizes the data, prepares training, validation, and testing sets, and creates a new label by applying the class-variable transformation (sketched after this list). Be patient: the Criteo-uplift 1 dataset is large, and we recommend reserving 120 GB of RAM for this step. We ran this 10 times to get 10 differently randomized datasets.
  3. Run the undersampling experiments (experiments.split_undersampling) with suitable parameters, e.g. python -m experiments.split_undersampling ./datasets/criteo-uplift.csv123.gz cvt 1,200,10 (replace '123' with whatever your file is named; 'cvt' refers to the class-variable transformation; '1,200,10' means "test k from 1 to 200 with a step of 10"). Note that the last print section shows the testing-set metrics for the best model.
  4. Run the isotonic regression experiments, e.g. python -m experiments.isotonic_regression_for_calibration ./datasets/criteo-uplift.csv123.gz dclr 3 (replace '123' with your dataset file; 'dclr' refers to the double-classifier with logistic regression; '3' refers to k=3).
  5. Results are printed to screen and stored in uplift_results.csv. Look for rows with 'Test description' set to 'testing set'.
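
The class-variable transformation applied in step 2 is, in its textbook form, the following: with a 1:1 treated-to-untreated split (as used above), define z = 1 when a unit is treated and positive, or untreated and negative; then uplift(x) = 2·P(z = 1 | x) - 1. The sketch below is illustrative, not the repository's code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_cvt(X, y, t):
    """Textbook class-variable transformation, assuming a 50/50
    treatment split: z = 1 when (treated and positive) or (control and
    negative). Then uplift(x) = 2 * P(z = 1 | x) - 1."""
    z = (y * t + (1 - y) * (1 - t)).astype(int)
    clf = LogisticRegression(max_iter=1000).fit(X, z)
    return lambda X_new: 2.0 * clf.predict_proba(X_new)[:, 1] - 1.0
```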

The alternative models for both the undersampling and the isotonic regression experiments are:

  • 'dc' (or 'dclr'): double-classifier with logistic regression
  • 'dcrf': double-classifier with random forest
  • 'cvt' (or 'cvtlr'): class-variable transformation with logistic regression
  • 'cvtrf': class-variable transformation with random forest

In the paper, we created 10 randomized datasets, ran the code 10 times, and averaged the results (a hypothetical post-processing sketch follows). For visualizations, use the function plot_uplift_curve() in uplift_metrics.py.
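
A hypothetical post-processing sketch for that averaging is below. The 'Test description' column is named in step 5 above; the other column names are assumptions, so check the header of your uplift_results.csv.

```python
import pandas as pd

# Hypothetical averaging over the 10 runs. 'Test description' is named
# above; the grouping column is an assumption -- inspect the header of
# uplift_results.csv for the actual names.
results = pd.read_csv("uplift_results.csv")
testing = results[results["Test description"] == "testing set"]
print(testing.groupby("Model").mean(numeric_only=True))  # 'Model' is assumed
```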

Notes

The code was mostly written by me, Otto Nyberg, as part of my dissertation at the University of Helsinki. Tomasz Kusmierczyk has contributed to the code, and Arto Klami has provided useful feedback. The file experiments/dirichlet_gp.py requires an exotic environment (an old version of Python, an ancient version of TensorFlow, etc.); details for that might be provided later.
