Revert "Revert "Merge pull request #3 from carleondel/module_3"" #4
base: main
Conversation
This reverts commit b7d0db0.
Really nice job! I left some minor comments. It is clear you have taken a lot of learnings from Guille's solution and applied them successfully.
I want to highlight that you have stated the insights and actions after most cells, which is important.
Keep it up like this!
As we saw during our Exploratory Data Analysis, there is a strong temporal evolution in the data, reflecting the evolution of the underlying business. Therefore we cannot assume that either the user base or the purchasing dynamics are the same across it.
Really good that you explain this.
Train since: 2020-10-05
Train until: 2021-02-04
Val until: 2021-02-22
Test until: 2021-03-03
Any concerns / caveats with these numbers?
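For context, a minimal sketch of a date-based split under those cutoffs, assuming a dataframe with a `date` column (the column name and the boundary handling are assumptions, not the notebook's actual code):

```python
import pandas as pd

# Cutoff dates taken from the printed output above
TRAIN_SINCE, TRAIN_UNTIL = "2020-10-05", "2021-02-04"
VAL_UNTIL, TEST_UNTIL = "2021-02-22", "2021-03-03"

def temporal_split(df: pd.DataFrame, date_col: str = "date"):
    """Chronological train/val/test split; no shuffling across time."""
    dates = pd.to_datetime(df[date_col])
    train = df[(dates >= TRAIN_SINCE) & (dates < TRAIN_UNTIL)]
    val = df[(dates >= TRAIN_UNTIL) & (dates < VAL_UNTIL)]
    test = df[(dates >= VAL_UNTIL) & (dates <= TEST_UNTIL)]
    return train, val, test
```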
```python
def plot_metrics(
```
Good that you start by defining the metric AND defining a method.
```python
def plot_metrics(
    model_name:str, y_pred:pd.Series, y_test:pd.Series, target_precision:float=0.05,
    figure:Tuple[matplotlib.figure.Figure, np.array]=None
```
The type hint for figure should be Optional[Tuple[matplotlib.figure.Figure, np.array]].
Also, with type hints the convention is a space after : and spaces around = when a default value follows an annotation.
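A minimal sketch of the suggested signature (np.ndarray stands in for np.array in the hint, since np.array is a function rather than a type; this is an illustration, not the notebook's actual code):

```python
from typing import Optional, Tuple

import matplotlib.figure
import numpy as np
import pandas as pd


def plot_metrics(
    model_name: str,
    y_pred: pd.Series,
    y_test: pd.Series,
    target_precision: float = 0.05,
    figure: Optional[Tuple[matplotlib.figure.Figure, np.ndarray]] = None,
):
    ...
```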
```python
def plot_metrics(
    model_name:str, y_pred:pd.Series, y_test:pd.Series, target_precision:float=0.05,
```
target_precision?
```python
"""Our no skill precision-recall curve would be a horizontal line with
y = proportion of our minority class"""
no_skill = (train_df[label_col] == 1).sum() / len(train_df[label_col])
```
This is also called the prevalence.
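A minimal sketch of that no-skill baseline on a precision-recall plot (toy arrays, not the notebook's data):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import precision_recall_curve

# Toy labels/scores standing in for y_test and the model's predicted scores
y_test = np.array([0, 0, 0, 0, 1, 0, 0, 1, 0, 0])
y_scores = np.array([0.1, 0.2, 0.15, 0.05, 0.8, 0.3, 0.1, 0.6, 0.2, 0.4])

prevalence = y_test.mean()  # proportion of the positive (minority) class

precision, recall, _ = precision_recall_curve(y_test, y_scores)
plt.plot(recall, precision, label="model")
plt.hlines(prevalence, 0, 1, linestyles="--", label=f"no skill = {prevalence:.2f}")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.legend()
plt.show()
```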
We could improve our model by considering different thresholds (lower thresholds make our model more permissive, since we are detecting very few positives).
This would result in a higher recall but lower precision, due to having more false positives.
But we should be careful when choosing a more permissive model. Since the objective of our model is to send push notifications, maybe we should aim for a more conservative model where we control the false positive cases (each positive prediction = push notification sent).
Very good that you keep in mind the business goal, which is our ultimate objective, and propose an action to be validated with business.
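A minimal sketch of how a decision threshold could be chosen to respect a target precision (the function and names are illustrative assumptions, not the notebook's code):

```python
from sklearn.metrics import precision_recall_curve


def threshold_for_precision(y_true, y_scores, target_precision=0.05):
    """Lowest (most permissive) threshold whose precision reaches the target.

    Lower thresholds mean higher recall but more false positives, which
    matters when every positive prediction triggers a push notification.
    """
    precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
    # precision/recall have one more entry than thresholds; drop the final point
    reaches_target = precision[:-1] >= target_precision
    if not reaches_target.any():
        return None  # no threshold reaches the target precision
    return thresholds[reaches_target].min()
```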
### Insights
- For a large regularization, we have the same results as if we were predicting at random.
We should double check it, but I'd bet that all the predictions have the same value, so by changing our threshold we either set all samples to 0 or to 1. Thus we get two points on the plot and a straight line connecting them. So it is slightly different from random, but terrible either way.
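A quick way to sanity-check that hypothesis on a toy example (stand-in data and model, not the notebook's):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Very strong regularization (tiny C) shrinks the coefficients towards zero,
# so the predicted probabilities collapse to (nearly) a single value.
X, y = make_classification(n_samples=1_000, weights=[0.95], random_state=0)
model = LogisticRegression(C=1e-6).fit(X, y)
scores = model.predict_proba(X)[:, 1]
print("distinct probabilities (rounded):", np.unique(scores.round(3)).size)
```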
- We can be sure that we made an improvement with our final models compared to the baseline model. But not so much when compared to our first models trained with all variables. |
Good!
- The AUC of our ROC curves is close to 1, meaning good model performance. **But ROC curves and ROC AUC can be optimistic on severely imbalanced classification problems with few samples of the minority class.**
- (We can think of the ROC plot as the fraction of correct predictions for the positive class (y-axis) versus the fraction of errors for the negative class (x-axis). Ideally we would only have a point at (0, 1).)
AUC represents the chance that, if we pick a random positive/negative pair of data points, the positive one gets a higher score than the negative one.
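A minimal sketch of that ranking interpretation on toy scores (assumed data, not the notebook's): the fraction of positive/negative pairs where the positive is ranked higher matches roc_auc_score.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 0, 1, 0, 1, 0])
y_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.5])

pos = y_scores[y_true == 1]  # scores of the positive class
neg = y_scores[y_true == 0]  # scores of the negative class
# Fraction of positive/negative pairs where the positive is ranked higher
pairwise = (pos[:, None] > neg[None, :]).mean()

print(pairwise, roc_auc_score(y_true, y_scores))  # both ≈ 0.733
```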