Voting study application replication issue and solution #5

patrickvossler18 · 2021-06-29T17:10:13Z

Hello, I was attempting to replicate the voting study application in your paper and noticed an issue in the parse_data.R file that kept my replicated results from matching the results reported in the paper.

The issue is that parse_data.R does not set a seed before sampling DF.nona, so every run draws a different subsample. Here is the file with `set.seed(1)` added at the top:

```r
rm(list = ls())
set.seed(1)  # added: fixes the random subsample below so results are reproducible
# the full dataset is available from
# https://github.com/gsbDBI/ExperimentData/tree/master/Mobilization/ProcessedData
data = read.csv("mobilization_no_unlisted 2.csv")

# W is intent to treat
# contact is received treatment

covariates = c("persons", "state", "county", "competiv",
               "st_sen", "st_hse", "newreg", "vote98",
               "vote00", "age", "female")
X = data[,which(names(data) %in% covariates)]
W = data$W
received_treatment = data$contact
Y = data$vote02

DF = data.frame(X, Y, W)

# drop rows with any missing value
DF.nona = DF[!is.na(rowSums(DF)),]

# keep all treated rows plus 3/2 as many randomly drawn control rows, then shuffle
idx.all = sample(c(sample(which(DF.nona$W == 0), sum(DF.nona$W) * 3/2), which(DF.nona$W == 1)))
DF.subset = DF.nona[idx.all,]

write.csv(DF.subset, "data_clean.csv", row.names = FALSE)
```
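To illustrate why the missing seed breaks replication: the subsample is drawn with `sample()`, so without a fixed seed each run keeps a different set of control rows. A minimal sketch on synthetic data, assuming a toy treatment vector `W` (not the voting data) and a hypothetical helper `draw_subset_idx()` that reproduces the indexing expression from parse_data.R:

```r
# Hypothetical helper reproducing the index expression from parse_data.R:
# all treated rows, 3/2 as many randomly drawn control rows, shuffled together.
draw_subset_idx <- function(W) {
  sample(c(sample(which(W == 0), sum(W) * 3/2), which(W == 1)))
}

# Toy treatment vector (an assumption for illustration): 100 treated, 900 control
W <- rep(c(1, 0), times = c(100, 900))

set.seed(1)
idx_a <- draw_subset_idx(W)
set.seed(1)
idx_b <- draw_subset_idx(W)
identical(idx_a, idx_b)  # TRUE: same seed, same subsample

idx_c <- draw_subset_idx(W)  # no reseed: the RNG state has advanced
identical(idx_a, idx_c)      # FALSE in practice: a different subsample
```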

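Additionally, I found this function useful for recreating the cleaned data directly from the ExperimentData repository. Note that it also needs a seed set before sampling, and `unlink()` needs `recursive = TRUE` to remove the extracted directory: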

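```r
make_data <- function(seed = 1) {
    # seed = 1 matches set.seed(1) in the modified parse_data.R
    temp_zip <- tempfile(fileext = ".zip")
    temp <- tempfile()
    # mode = "wb" keeps the binary zip intact on Windows
    download.file("https://github.com/gsbDBI/ExperimentData/raw/master/Mobilization/ProcessedData/mobilization_no_unlisted.zip",
                  temp_zip, mode = "wb")
    unzip(zipfile = temp_zip, exdir = temp)
    data = read.csv(file.path(temp, "mobilization_no_unlisted.csv"))
    # recursive = TRUE is required, otherwise the extracted directory is not deleted
    unlink(c(temp, temp_zip), recursive = TRUE)
    covariates = c("persons", "state", "county", "competiv",
                   "st_sen", "st_hse", "newreg", "vote98",
                   "vote00", "age", "female")
    X = data[,which(names(data) %in% covariates)]
    W = data$W
    received_treatment = data$contact  # not used below
    Y = data$vote02

    DF = data.frame(X, Y, W)

    DF.nona = DF[!is.na(rowSums(DF)),]

    # set the seed immediately before sampling so the subset matches parse_data.R
    set.seed(seed)
    idx.all = sample(c(sample(which(DF.nona$W == 0), sum(DF.nona$W) * 3/2), which(DF.nona$W == 1)))
    DF.subset = DF.nona[idx.all,]
    DF.subset  # return the cleaned subset explicitly
}

# usage: write.csv(make_data(), "data_clean.csv", row.names = FALSE)
```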
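Both scripts keep every treated row plus 3/2 as many randomly drawn control rows, so the cleaned data should be 40% treated with no repeated rows. A minimal check of that property, assuming a toy stand-in for DF.nona (the 200/1800 split and the random `Y` column are illustrative assumptions, not the voting data):

```r
set.seed(1)
# Toy stand-in for DF.nona: 200 treated rows and 1800 control rows
DF.nona <- data.frame(Y = rbinom(2000, 1, 0.5),
                      W = rep(c(1, 0), times = c(200, 1800)))

# Same subsampling expression as parse_data.R
idx.all <- sample(c(sample(which(DF.nona$W == 0), sum(DF.nona$W) * 3/2),
                    which(DF.nona$W == 1)))
DF.subset <- DF.nona[idx.all, ]

table(DF.subset$W)      # 0: 300, 1: 200
mean(DF.subset$W)       # 0.4, i.e. 40% treated
anyDuplicated(idx.all)  # 0: sampling is without replacement
```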

Using this modified version of parse_data.R, I successfully replicated the results except for the boosting MSE, which I suspect differs because of randomness in the boosting algorithm:

| Method | Reported MSE | Replicated MSE |
|---|---|---|
| Boosting | 0.00079 | 0.00123 |
| Lasso | 0.00047 | 0.00047 |
| Single Lasso | 0.0006 | 0.00061 |
| BART | 0.00409 | 0.00405 |
| var(tau) | 0.01615 | 0.016 |
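To put the boosting gap in perspective, a small sketch that computes each row's relative difference using only the numbers in the table above. Boosting comes out about 56% above the reported value, while every other row is within 2%:

```r
# Values copied from the table (reported vs. replicated)
results <- data.frame(
  method     = c("Boosting", "Lasso", "Single Lasso", "BART", "var(tau)"),
  reported   = c(0.00079, 0.00047, 0.0006, 0.00409, 0.01615),
  replicated = c(0.00123, 0.00047, 0.00061, 0.00405, 0.016)
)

# Relative difference of the replicated value from the reported one
results$rel_diff <- (results$replicated - results$reported) / results$reported
print(results, digits = 3)
```

A gap this much larger than the others is consistent with, but does not by itself establish, seed-dependent variability in the boosting fit; rerunning boosting under several seeds would show how large that variability actually is.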
