Reproducible Data Science in Python using Renku

Description

The expectation of reproducibility in scientific work has long been established, and communities and funding bodies increasingly demand it. The Python ecosystem offers a variety of tools that support reproducible data science, but choosing and using one is not always straightforward. In this tutorial, we take a closer look at the concept of reproducibility, examine the technologies that provide its building blocks, and survey the landscape of tools. We spend most of the time on one solution in particular, Renku, and work through an end-to-end scenario with it.

Set Up

There are several ways to set up an environment for working through the tutorial. The easiest is to use a hosted environment.

Hosted

Renkulab is a Renku environment hosted by the Swiss Data Science Center (SDSC). Follow the instructions in README-renkulab.md to use Renkulab.
Alternatively, you can use a MyBinder environment.

Local

If you wish to run the tutorial on your own computer, you can create an environment with conda (see README-conda.md) or Docker (see README-docker.md).

If you prefer another tool (e.g., pipenv), make sure that git, git-lfs, curl, and node are installed in your environment; the remaining Python dependencies can then be installed with pip from the requirements.txt file.
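As a quick sanity check before installing, the sketch below uses the standard-library function `shutil.which` to report which of the command-line tools listed above cannot be found on `PATH`. The helper name `missing_tools` and the `REQUIRED_TOOLS` constant are hypothetical and not part of the tutorial repository; the check only confirms that each executable exists, not which version is installed.

```python
"""Report which non-Python prerequisites for the tutorial are missing.

A minimal sketch: it only checks that each executable is on PATH,
it does not verify versions.
"""
import shutil
import sys

# Executables listed in the README as prerequisites outside conda/Docker.
# git-lfs is usually invoked as `git lfs`, but it installs a standalone
# `git-lfs` executable that shutil.which can locate.
REQUIRED_TOOLS = ("git", "git-lfs", "curl", "node")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        # Use the current interpreter's pip so packages land in this environment.
        print("All prerequisites found. Install the Python packages with:")
        print(f"  {sys.executable} -m pip install -r requirements.txt")
```

Running the script with the interpreter of your environment ensures that the suggested pip command installs requirements.txt into that same environment rather than into a different Python installation.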

Note for Windows users: If you are on Windows, we recommend using one of the hosted environments, either Renkulab or MyBinder.

Schedule

Introduction (1h)
15 min	Background & Theory	Terminology, history, and philosophy of reproducibility
30 min	Building Blocks	Building blocks for achieving reproducibility
15 min	Tools	Survey of the current tool landscape

Break (10 min)

Hands-on with Renku (1h 30m)
30 min	Starting	Starting a project, importing data, building a workflow
30 min	Iterating	Updating code and data to improve analysis
30 min	Details and Reflection	What is the benefit? How much effort was it? How do we view, share, and reuse artifacts? How do things work under the covers?

Acknowledgements

Many thanks to Erica Moreira, Laura Levin-Gleba, and Maja Garbulinksa from the Harvard School of Public Health for their helpful comments and suggestions!

The icons used are from Icons8.
