# Reproducible Experiment Platform (REP)

REP is an IPython-based environment for conducting data-driven research in a consistent and reproducible way.

Main features:

* unified Python wrapper for different ML libraries (wrappers follow an extended scikit-learn interface)
  * Sklearn
  * TMVA
  * XGBoost
  * uBoost
  * Theanets
  * Pybrain
  * Neurolab
  * MatrixNet service (available to CERN)
* parallel training of classifiers on a cluster
* classification/regression reports with plots
* interactive plots supported
* smart grid-search algorithms with parallel execution
* research versioning using git
* pluggable quality metrics for classification
* meta-algorithm design (aka 'rep-lego')

REP is not trying to substitute scikit-learn; it extends it and provides a better user experience.

## Howto examples

To get started, look at the notebooks in /howto/.

Notebooks can be viewed (not executed) online at nbviewer. There are basic introductory notebooks (about Python and IPython) and more advanced ones (about REP itself).

Example code is written in Python 2, but the library is compatible with both Python 2 and Python 3.

## Installation with Docker

We provide a Docker image with REP and all its dependencies. This is the recommended way, especially if you're not experienced with Python.

* install with Docker on Linux
* install with Docker on Mac and Windows

## Installation with bare hands

However, if you want to install REP and all of its dependencies on your machine yourself, follow this manual: installing manually and running manually.

## Links

* documentation
* howto
* bugtracker
* gitter chat, troubleshooting
* API, contributing new estimator
* API, contributing new metric
* Tutorial based on Flavour of physics challenge

If you use REP in research, please consider citing it.

## License

Apache 2.0, the library is open-source.
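Since REP wrappers follow the scikit-learn interface, the grid-search feature listed above fits the standard scikit-learn parameter-search pattern. A minimal sketch of that pattern, using plain scikit-learn's `GridSearchCV` and `GradientBoostingClassifier` as stand-ins for REP's own distributed grid-search tools and wrapped estimators (the names below are sklearn's, not REP's):

```python
# Sketch of the parameter-search pattern that sklearn-compatible
# wrappers support; GridSearchCV is plain scikit-learn, used here
# as an assumed stand-in for REP's distributed grid search.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=0)

# candidate hyperparameters to evaluate with 3-fold cross-validation
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}
search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                      param_grid, cv=3, n_jobs=-1)
search.fit(X, y)
print(search.best_params_)  # best combination found on the grid
```

Any sklearn-compatible estimator can be dropped into the `GridSearchCV` slot, which is what makes a unified wrapper interface convenient for parameter search.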
## Minimal examples

REP wrappers are sklearn-compatible:

```python
from rep.estimators import XGBoostClassifier, SklearnClassifier, TheanetsClassifier

clf = XGBoostClassifier(n_estimators=300, eta=0.1).fit(trainX, trainY)
probabilities = clf.predict_proba(testX)
```

A beloved trick of Kagglers is to run bagging over complex algorithms. This is how it is done in REP:

```python
from sklearn.ensemble import BaggingClassifier

clf = BaggingClassifier(base_estimator=XGBoostClassifier(), n_estimators=10)
# wrap the sklearn classifier in a REP wrapper
clf = SklearnClassifier(clf)
```

Another useful trick is to use folding instead of splitting data into train/test. This is especially useful when you're doing some kind of complex stacking:

```python
from rep.metaml import FoldingClassifier

clf = FoldingClassifier(TheanetsClassifier(), n_folds=3)
probabilities = clf.fit(X, y).predict_proba(X)
```

In the example above all data are split into 3 folds, and each fold is predicted by a classifier trained on the other 2 folds.

REP classifiers also provide a report:

```python
report = clf.test_on(testX, testY)
report.roc().plot()  # plot ROC curve

from rep.report.metrics import RocAuc
# learning curves are useful when training GBDT!
report.learning_curve(RocAuc(), steps=10)
```

You can read about other REP tools (like smart distributed grid search, folding and factory) in the documentation and howto examples.
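The out-of-fold scheme described above (each fold predicted by a model trained on the other folds) can also be sketched with plain scikit-learn's `cross_val_predict`, for readers without REP installed. The estimator and dataset below are illustrative stand-ins, not REP's API:

```python
# Sketch of the FoldingClassifier idea using plain scikit-learn:
# cross_val_predict returns, for each sample, the prediction of the
# model trained on the folds that did NOT contain that sample.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=300, random_state=0)

# 3 folds: each sample's probabilities come from a model
# trained on the other 2 folds, so no sample predicts itself
probabilities = cross_val_predict(
    GradientBoostingClassifier(random_state=0), X, y,
    cv=3, method="predict_proba")
print(probabilities.shape)  # one row per sample, one column per class
```

Because every prediction is made by a model that never saw that sample, these probabilities can be fed into a second-level (stacked) model without leaking the training labels.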