Beanstalk Data and Processing Artifacts

This repo contains all the data and tools needed to process data for Beanstalk. The data and software are available both from the above Zenodo DOI and from GitHub Releases.

Dataset Structure

All assets are included as *.zip files in releases. Our dataset is organized as follows:

data-raw/
    beanstalk/          # raw data from cluster collected for our method
    baseline/           # raw data used for our 100% instrumented baseline

data/
    violations.json     # list of unique violations discovered
    beanstalk/          # data collected for our method
        indirect.npz    # different file for each benchmark
        ...
    baseline/           # data used for our 100% instrumented baseline
        ...
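As a quick way to see which benchmarks are present after extraction, the following minimal sketch lists the per-benchmark files in each subdirectory. The helper name list_benchmarks is hypothetical, and it assumes that one <benchmark>.npz file exists per benchmark and that matching file names in beanstalk/ and baseline/ refer to the same benchmark.

```python
from pathlib import Path


def list_benchmarks(data_dir="data"):
    """Return benchmark names found under data/beanstalk and data/baseline.

    Hypothetical helper. Assumes one <benchmark>.npz file per benchmark in
    each subdirectory (e.g. data/beanstalk/indirect.npz).
    """
    root = Path(data_dir)
    beanstalk = {p.stem for p in (root / "beanstalk").glob("*.npz")}
    baseline = {p.stem for p in (root / "baseline").glob("*.npz")}
    return {
        "beanstalk": sorted(beanstalk),
        "baseline": sorted(baseline),
        # Benchmarks present in both directories. Assumption: identical file
        # names denote the same benchmark, so these are directly comparable.
        "both": sorted(beanstalk & baseline),
    }
```

Calling list_benchmarks() from the repo root returns three sorted name lists; a missing directory simply yields an empty list.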

Each benchmark file in data has the following arrays, where N is the number of runs and b is the number of bugs:

t: uint32[N]: Execution time (in microseconds) of each run
device: uint8[N]: Device that each run was executed on
density: uint8[N]: Instrumentation density (%) for each run
bugs: uint8[N, ceil(b/8)] --> bool[N, b]: Packed bit array indicating whether each bug was discovered on each run. Unpack using np.unpackbits(arr, axis=1, count=b); the count argument drops the padding bits when b is not a multiple of 8.
sites: uint32[b, 2]: Pair of code indices (in the Wasm module) responsible for each bug
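A minimal sketch of reading one benchmark file with NumPy and unpacking the bug bits, plus a small example analysis. The function names load_benchmark and detection_rate_by_density are hypothetical, and the unpacking assumes the bits were packed with NumPy's default big-endian bit order (as np.packbits produces); that is an assumption, not something this README states.

```python
import numpy as np


def load_benchmark(path):
    """Load one benchmark .npz and unpack `bugs` into a bool[N, b] array."""
    with np.load(path) as f:
        t, device, density = f["t"], f["device"], f["density"]
        sites = f["sites"]    # uint32[b, 2]
        packed = f["bugs"]    # uint8[N, ceil(b/8)]
    b = sites.shape[0]
    # np.unpackbits yields 8 bits per byte; slice off the padding bits
    # (equivalent to passing count=b). Assumes default big-endian packing.
    bugs = np.unpackbits(packed, axis=1)[:, :b].astype(bool)
    return t, device, density, bugs, sites


def detection_rate_by_density(density, bugs):
    """Hypothetical analysis: fraction of runs that found each bug, per density."""
    return {int(d): bugs[density == d].mean(axis=0) for d in np.unique(density)}
```

For example, t, device, density, bugs, sites = load_benchmark("data/beanstalk/indirect.npz") gives bugs with shape (N, b), and bugs.any(axis=0) marks which bugs were found by at least one run.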

Processed Dataset Structure

The evaluation/processing scripts generate the following processed data:

summary/                # statistical summary of the 'data' directory
    indirect.npz        # different file for each benchmark
    ...

simulations/
    # Compute budget simulations
    abl_density.npz     # Density ablation
    abl_device.npz      # Device ablation
    baseline.npz        # Homogeneous baseline ablation
    beanstalk.npz       # Beanstalk ablation (unconstrained instrumentation density)
    # Maximum instrumentation density simulation
    density.npz         # Beanstalk ablation (constrained instrumentation density)

figures/                # PDF figures generated for the paper
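The array names inside summary/*.npz and simulations/*.npz are not listed here, so a generic inspector is a safe first step. This is a minimal sketch with a hypothetical helper name; it makes no assumption about which arrays the files contain.

```python
import numpy as np


def describe_npz(path):
    """Return {array_name: (shape, dtype)} for every array in an .npz file."""
    out = {}
    with np.load(path) as f:
        for name in f.files:
            arr = f[name]  # loads this array into memory
            out[name] = (arr.shape, str(arr.dtype))
    return out
```

For example, describe_npz("simulations/density.npz") lists the arrays that file holds. If a file stores Python objects, np.load would additionally need allow_pickle=True, which should only be used for trusted files.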

Generating Figures from Packaged Data

Extract {data,summary,simulations}.zip from the Zenodo/GitHub releases into the root directory of this repo.
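The extraction step can also be scripted with the standard library. This is a minimal sketch with a hypothetical helper name; it assumes the downloaded archives sit in archive_dir and that each archive stores its own top-level folder (data/, summary/, simulations/), so that extracting at the repo root recreates the layout described above.

```python
import zipfile
from pathlib import Path


def extract_packages(archive_dir=".", repo_root=".",
                     names=("data", "summary", "simulations")):
    """Extract <name>.zip archives into the repo root; return archive names.

    Hypothetical helper. Assumes each archive contains its top-level folder.
    """
    extracted = []
    for name in names:
        archive = Path(archive_dir) / f"{name}.zip"
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(repo_root)  # creates intermediate directories
        extracted.append(archive.name)
    return extracted
```

The same call with names=("data-raw",) would cover the raw-data extraction used when reproducing results from scratch.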

To generate figures, run ./gen_figures.sh (runtime warnings can be ignored). The output should mirror figures.zip.
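One coarse way to check that the output mirrors figures.zip is to compare PDF file names. This is a minimal sketch with a hypothetical helper name; it compares names only, because PDF bytes may differ between runs (for example, through embedded creation dates), and it flattens any subdirectories on both sides.

```python
import zipfile
from pathlib import Path


def compare_figures(figures_dir="figures", reference_zip="figures.zip"):
    """Compare generated PDF names against the PDFs packaged in a zip.

    Hypothetical helper: returns reference PDFs that were not generated
    ("missing") and generated PDFs absent from the zip ("extra").
    """
    generated = {p.name for p in Path(figures_dir).rglob("*.pdf")}
    with zipfile.ZipFile(reference_zip) as zf:
        reference = {Path(n).name for n in zf.namelist() if n.endswith(".pdf")}
    return {"missing": sorted(reference - generated),
            "extra": sorted(generated - reference)}
```

Two empty lists indicate that the same set of figures was produced; visual differences still need to be checked by eye.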

Reproducing Results from Raw Data

The three zip files for data, summary, and simulations can be reproduced from the raw cluster data by following these steps:

1. Raw data: Extract data-raw.zip to the root directory of the repo.
2. Data: Run ./gen_data.sh (approx. 2 min to run).
3. Summary: Run ./summarize.sh (approx. 2 min to run on GPU).
4. Simulations: Run ./run_simulations.sh (approx. 20 min to run 10000 replicates on GPU). If necessary, the number of replicates can be set with the script's first argument.
5. Figures: Generate figures as in the previous section.
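The script steps above can be chained from Python so that the pipeline stops at the first failure. This is a minimal sketch with a hypothetical helper name; it assumes it is run from the repo root after data-raw.zip has been extracted, and that the scripts are executable. Whether omitting the replicate argument means 10000 replicates is not stated here, so the argument is only passed when given.

```python
import subprocess


def reproduce(replicates=None, dry_run=False):
    """Run gen_data -> summarize -> run_simulations -> gen_figures in order.

    Hypothetical helper. replicates, if given, is passed as the first
    argument to run_simulations.sh. Returns the commands that were (or,
    with dry_run=True, would be) run.
    """
    sim = ["./run_simulations.sh"]
    if replicates is not None:
        sim.append(str(replicates))
    steps = [["./gen_data.sh"], ["./summarize.sh"], sim, ["./gen_figures.sh"]]
    for cmd in steps:
        if not dry_run:
            # check=True raises CalledProcessError if a script fails.
            subprocess.run(cmd, check=True)
    return steps
```

Running reproduce(replicates=1000, dry_run=True) first shows the exact commands without executing anything; a smaller replicate count is a quick way to smoke-test the pipeline.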

NOTES:

GPU support for JAX is recommended; CPU backends can alternatively be used, but may take significantly longer to execute.
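To force the CPU backend explicitly (for example, on a machine whose GPU setup is broken), recent JAX versions read the JAX_PLATFORMS environment variable at startup. This is a minimal sketch with a hypothetical helper name; it assumes the scripts launch JAX in child processes that inherit the environment.

```python
import os
import subprocess


def run_on_cpu(cmd):
    """Run a command with JAX restricted to its CPU backend.

    Hypothetical helper. JAX_PLATFORMS="cpu" is inherited by the child
    process, so any JAX code it starts skips GPU initialization.
    """
    env = dict(os.environ, JAX_PLATFORMS="cpu")
    return subprocess.run(cmd, env=env, check=True)
```

For example, run_on_cpu(["./summarize.sh"]). Inside Python, jax.default_backend() reports which backend is active.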

The manage.py script manages all data scripting (see its -h option for more information).

Repository: arjunr2/beanstalk