Home
Ilya Ploshchik edited this page Apr 14, 2022 · 4 revisions
Welcome to the 2dv50e wiki!
4 datasets are provided:
- Heart Disease
- Breast Cancer Wisconsin (Diagnostic)
- Pima Indian Diabetes
- Vehicle Silhouettes
Each dataset includes the following files:
- dataset.csv - original CSV file with all respective features
- target.csv - CSV file with the target class of each instance
- topModels.csv - top 55 models (5 models per base learning algorithm)
The following base classifiers (with their respective hyperparameter alternatives) are used:
- K-Nearest Neighbor: {'n_neighbors': list(range(1, 25)), 'metric': ['chebyshev', 'manhattan', 'euclidean', 'minkowski'], 'algorithm': ['brute', 'kd_tree', 'ball_tree'], 'weights': ['uniform', 'distance']}
- Support Vector Machine: {'C': list(np.arange(0.1,4.43,0.11)), 'kernel': ['rbf','linear', 'poly', 'sigmoid']}
- Gaussian Naive Bayes: {'var_smoothing': list(np.arange(0.00000000001,0.0000001,0.0000000002))}
- Multilayer Perceptron: {'alpha': list(np.arange(0.00001,0.001,0.0002)), 'tol': list(np.arange(0.00001,0.001,0.0004)), 'max_iter': list(np.arange(100,200,100)), 'activation': ['relu', 'identity', 'logistic', 'tanh'], 'solver' : ['adam', 'sgd']}
- Logistic Regression: {'C': list(np.arange(0.5,2,0.075)), 'max_iter': list(np.arange(50,250,50)), 'solver': ['lbfgs', 'newton-cg', 'sag', 'saga'], 'penalty': ['l2', 'none']}
- Linear Discriminant Analysis: {'shrinkage': list(np.arange(0,1,0.01)), 'solver': ['lsqr', 'eigen']}
- Quadratic Discriminant Analysis: {'reg_param': list(np.arange(0,1,0.02)), 'tol': list(np.arange(0.00001,0.001,0.0002))}
- Random Forests: {'n_estimators': list(range(60, 140)), 'criterion': ['gini', 'entropy']}
- Extra Trees: {'n_estimators': list(range(60, 140)), 'criterion': ['gini', 'entropy']}
- Adaptive Boosting: {'n_estimators': list(range(40, 80)), 'learning_rate': list(np.arange(0.1,2.3,1.1)), 'algorithm': ['SAMME.R', 'SAMME']}
- Gradient Boosting: {'n_estimators': list(range(85, 115)), 'learning_rate': list(np.arange(0.01,0.23,0.11)), 'criterion': ['friedman_mse', 'mse', 'mae']}
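To make the notation above concrete, the sketch below expands one of the dictionaries (the Random Forests one, which needs only the standard library) into the individual hyperparameter combinations it describes. The helper name `expand_grid` is hypothetical, and this page does not state whether the alternatives were searched exhaustively or sampled, so the example only illustrates how many candidate configurations a dictionary spans.

```python
from itertools import product


def expand_grid(grid):
    """Return every combination of the listed alternatives as a list of dicts."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


# Random Forests alternatives, copied from the list above.
random_forest_grid = {
    'n_estimators': list(range(60, 140)),
    'criterion': ['gini', 'entropy'],
}

combinations = expand_grid(random_forest_grid)
print(len(combinations))   # 80 values of n_estimators x 2 criteria = 160
print(combinations[0])     # {'n_estimators': 60, 'criterion': 'gini'}
```

The same helper works for the other dictionaries once `numpy` is imported as `np` for the `np.arange` ranges.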
Each instance (row) represents one model with its model_id, algorithm id, all calculated metrics, and overall performance. Overall performance is calculated as a single average of all 8 metrics. The "params" column identifies the hyperparameters used for that particular model.
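As a rough illustration of the overall performance value, the sketch below assumes that "single average" means the unweighted arithmetic mean of the 8 metric values. The function name `overall_performance` and the scores are made up for the example; the actual metric names and values are the ones stored in topModels.csv.

```python
from statistics import mean


def overall_performance(metric_values):
    """Unweighted mean of the metric values (assumed reading of 'single average')."""
    if len(metric_values) != 8:
        raise ValueError("expected exactly 8 metric values")
    return mean(metric_values)


# Made-up scores for one model, not taken from the datasets.
example_metrics = [0.5, 0.75, 1.0, 0.25, 0.5, 0.75, 1.0, 0.25]
print(overall_performance(example_metrics))  # 0.625
```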
- topModelsProbabilities.csv - CSV file with the predicted class probabilities of all 55 best models
Each row represents the class probabilities, per instance of the target variable, for every model.