This repository collects end-to-end machine learning scripts for training, testing, and deploying a binary classifier on imbalanced, dirty real-world data.
Checkpoints:
- Exploratory data analysis (with Plotly visualisations)
- Data cleaning
- Imputation of missing values (NAs)
- Exploratory modelling and specification choice
- Min-max normalization and feature selection with RFECV
- GridSearchCV for hyperparameter tuning
- Training several models and reporting quality metrics under stratified cross-validation
- Predicting on test data with the best estimator
- Feature importance, ROC curve with AUC, quality metrics, confusion matrix
- Serializing and deserializing model objects for easy deployment
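
The checkpoints above can be sketched end to end with scikit-learn. This is a minimal illustration, not the repository's actual code. The synthetic dataset, median imputation, `LogisticRegression` as the estimator, the small `C` grid, and `pickle` for serialization are all assumptions chosen to keep the example short.

```python
import pickle

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Assumption: synthetic imbalanced data (about 10% positives) stands in for the real dataset.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=4, weights=[0.9, 0.1], random_state=0
)

# Assumption: knock out roughly 5% of values to imitate dirty data with NAs.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

# Stratified split keeps the class ratio the same in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Imputation -> min-max scaling -> RFECV feature selection -> classifier.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", MinMaxScaler()),
    ("select", RFECV(LogisticRegression(max_iter=1000), cv=cv, scoring="roc_auc")),
    ("model", LogisticRegression(max_iter=1000, class_weight="balanced")),
])

# Hyperparameter tuning over a hypothetical grid, scored by ROC AUC.
search = GridSearchCV(
    pipeline, param_grid={"model__C": [0.1, 1.0, 10.0]}, cv=cv, scoring="roc_auc"
)
search.fit(X_train, y_train)

# Evaluate the best estimator on held-out test data.
best = search.best_estimator_
test_auc = roc_auc_score(y_test, best.predict_proba(X_test)[:, 1])
cm = confusion_matrix(y_test, best.predict(X_test))
print("best params:", search.best_params_)
print("test ROC AUC:", round(test_auc, 3))
print("confusion matrix:\n", cm)

# Serialize and deserialize the fitted pipeline for deployment.
restored = pickle.loads(pickle.dumps(best))
```

Keeping imputation, scaling, and feature selection inside the `Pipeline` means each cross-validation fold fits them on its own training part only, which avoids leaking test-fold information into preprocessing. Only unpickle model files from sources you trust, since `pickle` can execute arbitrary code on load.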