tomgrainge/reproducible-data-science: Repository for the Reproducible Data Science tutorial


Reproducible Data Science in Python using Renku

Description

The expectation that scientific work be reproducible has long been established, and research communities and funding agencies increasingly require it. The Python ecosystem offers a variety of tools that support reproducible data science, but choosing and using one is not always straightforward. In this tutorial, we take a closer look at the concept of reproducibility, examine the technologies that provide its building blocks, and survey the landscape of tools. We spend most of the time on one solution in particular, Renku, and work through an end-to-end scenario with it.

Set Up

There are several easy ways to set up an environment for working through the tutorial. The easiest is to use a hosted environment.

Hosted

RenkuLab

Binder

Local

If you wish to run the tutorial on your own computer, you can create an environment with conda or Docker.

If you prefer another tool (e.g., pipenv), make sure that git, git-lfs, curl, and Node.js (node) are installed in your environment; you should then be able to install the remaining dependencies with pip install -r requirements.txt.
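If you set up the environment yourself, a quick way to confirm that the non-Python prerequisites are available is to look them up on your PATH. The script below is a minimal sketch, not part of the repository; the helper name missing_tools is hypothetical, and it assumes the tools are installed under their usual executable names (git, git-lfs, curl, node).

```python
"""Check that the non-Python tools needed for the tutorial are on the PATH."""
import shutil

# Executables required besides the pip requirements (assumed standard names).
REQUIRED_TOOLS = ["git", "git-lfs", "curl", "node"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that shutil.which cannot locate."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```

Running it before pip install -r requirements.txt lets you fix a missing system tool first, since pip cannot install git, git-lfs, curl, or Node.js for you.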

Note for Windows users: we recommend using one of the hosted environments, either RenkuLab or Binder.

Schedule

Introduction (1h)
  • Background & Theory (15 min): Terminology, history, and philosophy of reproducibility
  • Building Blocks (30 min): Building blocks for achieving reproducibility
  • Tools (15 min): Survey of the current tool landscape
Break (10 min)
Hands-on with Renku (1h 30min)
  • Starting (30 min): Starting a project, importing data, building a workflow
  • Iterating (30 min): Updating code and data to improve analysis
  • Details and Reflection (30 min): What is the benefit? How much effort was it? How do we view, share, and reuse artifacts? How do things work under the covers?
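The Iterating session revolves around one idea: when code or data change, only the results that depend on them need to be recomputed. Tools such as Renku do this by recording which inputs produced which outputs. The sketch below is a toy illustration of that idea only, not how Renku is implemented. The names file_digest and run_step are hypothetical, and change detection by SHA-256 content hashes stored in a JSON record file is an assumption chosen for simplicity.

```python
"""Toy provenance-based re-execution (illustrative only, not Renku's mechanism)."""
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def run_step(func, inputs, output, record_path):
    """Run func(inputs, output) unless no input changed since the last run.

    `inputs` should list the data files AND the script implementing the step,
    so that code edits trigger a re-run just like data edits.
    Returns True if the step ran, False if it was skipped.
    """
    record_path = Path(record_path)
    current = {str(p): file_digest(p) for p in inputs}
    if record_path.exists() and Path(output).exists():
        previous = json.loads(record_path.read_text())
        if previous == current:
            return False  # inputs unchanged: the existing output is still valid
    func(inputs, output)
    # Record the input hashes that produced this output (its provenance).
    record_path.write_text(json.dumps(current, sort_keys=True))
    return True
```

Calling run_step twice with unchanged inputs skips the second run; editing any input file makes the next call recompute the output and refresh the record.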

Acknowledgements

Many thanks to Erica Moreira, Laura Levin-Gleba, and Maja Garbulinksa from the Harvard School of Public Health for their helpful comments and suggestions!

The icons used are from Icons8.
