Home · ilplo/MetaStackVis Wiki · GitHub
Ilya Ploshchik edited this page Apr 14, 2022 · 4 revisions

Welcome to the 2dv50e wiki!

Data

Four datasets are provided:

  1. Heart Disease
  2. Breast Cancer Wisconsin (Diagnostic)
  3. Pima Indian Diabetes
  4. Vehicle Silhouettes

Dataset structure

Each dataset includes the following files:

  • dataset.csv - original csv file with all respective features
  • target.csv - csv file with target class instances
  • topModels.csv - the top 55 models (5 models per base learning algorithm)
    The following base classifiers (with their respective hyperparameter alternatives) are used:
    • K-Nearest Neighbor: {'n_neighbors': list(range(1, 25)), 'metric': ['chebyshev', 'manhattan', 'euclidean', 'minkowski'], 'algorithm': ['brute', 'kd_tree', 'ball_tree'], 'weights': ['uniform', 'distance']}
    • Support Vector Machine: {'C': list(np.arange(0.1,4.43,0.11)), 'kernel': ['rbf','linear', 'poly', 'sigmoid']}
    • Gaussian Naive Bayes: {'var_smoothing': list(np.arange(0.00000000001,0.0000001,0.0000000002))}
    • Multilayer Perceptron: {'alpha': list(np.arange(0.00001,0.001,0.0002)), 'tol': list(np.arange(0.00001,0.001,0.0004)), 'max_iter': list(np.arange(100,200,100)), 'activation': ['relu', 'identity', 'logistic', 'tanh'], 'solver' : ['adam', 'sgd']}
    • Logistic Regression: {'C': list(np.arange(0.5,2,0.075)), 'max_iter': list(np.arange(50,250,50)), 'solver': ['lbfgs', 'newton-cg', 'sag', 'saga'], 'penalty': ['l2', 'none']}
    • Linear Discriminant Analysis: {'shrinkage': list(np.arange(0,1,0.01)), 'solver': ['lsqr', 'eigen']}
    • Quadratic Discriminant Analysis: {'reg_param': list(np.arange(0,1,0.02)), 'tol': list(np.arange(0.00001,0.001,0.0002))}
    • Random Forests: {'n_estimators': list(range(60, 140)), 'criterion': ['gini', 'entropy']}
    • Extra Trees: {'n_estimators': list(range(60, 140)), 'criterion': ['gini', 'entropy']}
    • Adaptive Boosting: {'n_estimators': list(range(40, 80)), 'learning_rate': list(np.arange(0.1,2.3,1.1)), 'algorithm': ['SAMME.R', 'SAMME']}
    • Gradient Boosting: {'n_estimators': list(range(85, 115)), 'learning_rate': list(np.arange(0.01,0.23,0.11)), 'criterion': ['friedman_mse', 'mse', 'mae']}
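
To give a sense of how large one of these search spaces is, here is a minimal sketch that expands the K-Nearest Neighbor dictionary from the list above into individual hyperparameter combinations. It assumes every combination counts as a candidate; this page does not say whether the search was exhaustive or sampled. The helper name expand_grid is hypothetical and not part of the repository.

```python
from itertools import product

# K-Nearest Neighbor search space, copied from the list above.
knn_grid = {
    "n_neighbors": list(range(1, 25)),
    "metric": ["chebyshev", "manhattan", "euclidean", "minkowski"],
    "algorithm": ["brute", "kd_tree", "ball_tree"],
    "weights": ["uniform", "distance"],
}


def expand_grid(grid):
    """Return every hyperparameter combination as a list of dicts.

    Hypothetical helper for illustration only.
    """
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


knn_candidates = expand_grid(knn_grid)

# 24 neighbor counts * 4 metrics * 3 algorithms * 2 weightings
print(len(knn_candidates))
print(knn_candidates[0])
```

Only 5 of these candidates per algorithm end up in topModels.csv, so the file is a small selection from a much larger set of trained models.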

Each instance (row) represents one model with its model_id, algorithm id, all calculated metrics and overall performance. Overall performance is calculated as the average of all 8 metrics. The "params" column lists the hyperparameters used for that particular model.
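
The averaging described above can be sketched as follows. This is an illustration under stated assumptions, not the code used to build the file: the metric column names (metric_1 to metric_8), the other column names, and all values are placeholders, and the sketch assumes "params" is stored as a Python dict literal. Check the actual header of topModels.csv before reusing it.

```python
import ast
from statistics import mean

# Placeholder column names: the real metric names are not listed on this page.
METRIC_COLUMNS = [f"metric_{i}" for i in range(1, 9)]

# Hypothetical row as it might come out of csv.DictReader (all values are strings).
row = {
    "model_id": "0",
    "algorithm_id": "1",
    "metric_1": "0.80",
    "metric_2": "0.82",
    "metric_3": "0.78",
    "metric_4": "0.84",
    "metric_5": "0.76",
    "metric_6": "0.80",
    "metric_7": "0.81",
    "metric_8": "0.79",
    "params": "{'n_neighbors': 5, 'weights': 'uniform'}",
}


def overall_performance(model_row):
    """Average the 8 metric columns of one model row."""
    return mean(float(model_row[column]) for column in METRIC_COLUMNS)


def parse_params(model_row):
    """Turn the "params" string into a dict (assumes a Python dict literal)."""
    return ast.literal_eval(model_row["params"])


score = overall_performance(row)
params = parse_params(row)
print(round(score, 4), params)
```

ast.literal_eval is used instead of eval because it only accepts literals, which is safer when reading strings from a file.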

  • topModelsProbabilities.csv - CSV file with the predicted class probabilities for all 55 top models

Each row represents the class probabilities per instance of the target variable for every model.
