8000 Revert "Revert "Merge pull request #3 from carleondel/module_3"" by pabaldonedo · Pull Request #4 · carleondel/zrive-ds · GitHub

Revert "Revert "Merge pull request #3 from carleondel/module_3"" #4


Open · wants to merge 1 commit into main
Conversation

pabaldonedo (Collaborator)
This reverts commit b7d0db0.

@pabaldonedo (Collaborator, Author) left a comment:

Really nice job! I left some minor comments. It is clear you have taken a lot of learnings from Guille's solution and applied them successfully.
I want to highlight that you have stated the insights and actions after most cells, which is important.
Keep it up!




As we saw during our Exploratory Data Analysis, there is a strong temporal evolution in the data, reflecting the evolution of the underlying business. Therefore we cannot assume that the user base or the purchasing dynamics are the same across it.
pabaldonedo (Collaborator, Author):

Really good that you explain this.

Train since: 2020-10-05
Train until: 2021-02-04
Val until: 2021-02-22
Test until: 2021-03-03
pabaldonedo (Collaborator, Author):

Any concerns / caveats with these numbers?
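For context, a minimal sketch of how such a chronological split can be implemented. The DataFrame and column names here are hypothetical, not taken from the notebook:

```python
import pandas as pd

# Hypothetical data; assumes an `order_date` column (illustrative names).
df = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2020-10-05", "2021-01-15", "2021-02-10", "2021-02-25"]
    ),
    "outcome": [0, 1, 0, 1],
})

# Chronological split matching the dates above: no shuffling, so later
# behaviour never leaks into training.
train = df[df["order_date"] < "2021-02-04"]
val = df[(df["order_date"] >= "2021-02-04") & (df["order_date"] < "2021-02-22")]
test = df[df["order_date"] >= "2021-02-22"]
print(len(train), len(val), len(test))  # 2 1 1
```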



```python
def plot_metrics(
```
pabaldonedo (Collaborator, Author):

Good that you start by defining the metric AND defining a method.

```python
def plot_metrics(
    model_name:str, y_pred:pd.Series, y_test:pd.Series, target_precision:float=0.05,
    figure:Tuple[matplotlib.figure.Figure, np.array]=None
```
pabaldonedo (Collaborator, Author):

Type hinting for figure should be Optional[Tuple[matplotlib.figure.Figure, np.array]].

Also, with type hints there are spaces around : and =.
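A sketch of the signature with both fixes applied. Note that `np.ndarray` is used here rather than `np.array`, since `np.array` is a function, not a type; the function body is elided:

```python
from typing import Optional, Tuple

import matplotlib.figure
import numpy as np
import pandas as pd


def plot_metrics(
    model_name: str,
    y_pred: pd.Series,
    y_test: pd.Series,
    target_precision: float = 0.05,
    # Optional[...] because None is an accepted default value.
    figure: Optional[Tuple[matplotlib.figure.Figure, np.ndarray]] = None,
) -> None:
    ...
```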


```python
def plot_metrics(
    model_name:str, y_pred:pd.Series, y_test:pd.Series, target_precision:float=0.05,
```
pabaldonedo (Collaborator, Author):

target_precision?

```python
"""Our no skill precision-recall curve would be a horizontal line with
y = proportion of our minority class"""
no_skill = (train_df[label_col] == 1).sum() / len(train_df[label_col])
```
pabaldonedo (Collaborator, Author):

Also called prevalence.
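A tiny illustration with made-up labels: the prevalence is exactly the precision of a no-skill classifier, which is why it gives the horizontal baseline of the precision-recall plot.

```python
import pandas as pd

# Made-up imbalanced labels (2 positives out of 10).
labels = pd.Series([0, 0, 0, 1, 0, 1, 0, 0, 0, 0])

# Prevalence: share of positive samples. A classifier that flags samples
# at random is right about its positives at exactly this rate.
prevalence = (labels == 1).sum() / len(labels)
print(prevalence)  # 0.2
```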

We could improve our model by considering different thresholds (lower thresholds make our model more permissive, since we are detecting very few positives).
This would result in a higher recall but lower precision, due to having more false positives.

But we should be careful when choosing a more permissive model. Since the objective of our model is to send push notifications, maybe we should aim for a more conservative model where we control the false positive cases (each positive prediction = a push notification sent).
pabaldonedo (Collaborator, Author):

Very good that you keep in mind the business goal, which is our ultimate objective, and propose an action to be validated with business.
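One way to make that proposal concrete: pick the lowest decision threshold that still meets a target precision agreed with business. A sketch on synthetic scores using scikit-learn's `precision_recall_curve` (all data and the 0.5 target here are made up):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
# Synthetic imbalanced labels (~5% positives) with positives scored higher.
y_true = rng.binomial(1, 0.05, size=2000)
y_score = 0.5 * y_true + rng.normal(0.25, 0.15, size=2000)

precision, recall, thresholds = precision_recall_curve(y_true, y_score)

# Hypothetical precision floor to keep push notifications trustworthy.
target_precision = 0.5
# precision[:-1] aligns with `thresholds`; take the first (lowest)
# threshold whose precision reaches the target, maximising recall.
idx = int(np.argmax(precision[:-1] >= target_precision))
print(f"threshold={thresholds[idx]:.3f}, "
      f"precision={precision[idx]:.3f}, recall={recall[idx]:.3f}")
```

Lower thresholds than this one trade precision for recall; the chosen operating point is then a business decision, not a modelling one.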



### Insights
- For a large regularization, we have the same results as if we were predicting at random.
pabaldonedo (Collaborator, Author):

We should double check it, but I'd bet that all the predictions have the same value, so by changing our threshold we either set all samples to 0 or all to 1. Thus we have two points in the plot and a straight line connecting them. So it is slightly different from random, but terrible either way.
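This is easy to verify: with a very strong L2 penalty the coefficients shrink toward zero, so the predicted probabilities collapse to (almost) a single value. A sketch on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic data: the label depends only on the first feature.
X = rng.normal(size=(500, 5))
y = (X[:, 0] > 1.0).astype(int)

# In scikit-learn, smaller C means stronger regularization; a tiny C
# shrinks all coefficients toward zero.
model = LogisticRegression(C=1e-8).fit(X, y)
probs = model.predict_proba(X)[:, 1]
print(probs.std())  # near-zero spread: every sample gets ~the same score
```

With essentially one unique score, every threshold predicts all-0 or all-1, which produces exactly the two-point straight-line curve described above.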




- We can be sure that we made an improvement with our final models compared to the baseline model. But not so much when compared to our first models trained with all variables.
pabaldonedo (Collaborator, Author):

Good!


- The AUC of our ROC curves is close to 1, meaning good model performance. **But ROC curves and ROC AUC can be optimistic on severely imbalanced classification problems with few samples of the minority class.**

- (We can think of the ROC plot as the fraction of correct predictions for the positive class (y-axis) versus the fraction of errors for the negative class (x-axis). Ideally we would only have a point at (0, 1).)
pabaldonedo (Collaborator, Author):

AUC represents the probability that, picking a random pair of data points (one positive, one negative), the model scores the positive one higher than the negative one.
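This pairwise interpretation (the Mann-Whitney U view of AUC) can be checked numerically against scikit-learn on made-up data:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
# Synthetic labels and scores; positives get a modest score boost.
y_true = rng.binomial(1, 0.3, size=200)
y_score = rng.random(200) + 0.3 * y_true

pos = y_score[y_true == 1]
neg = y_score[y_true == 0]
# Fraction of (positive, negative) pairs ranked correctly, counting
# ties as half-correct — this equals the ROC AUC exactly.
pairs = (pos[:, None] > neg[None, :]).mean() \
    + 0.5 * (pos[:, None] == neg[None, :]).mean()
print(np.isclose(pairs, roc_auc_score(y_true, y_score)))  # True
```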
