Revert "Revert "Merge pull request #3 from carleondel/module_3"" #4
base: main
Conversation
This reverts commit b7d0db0.
Really nice job! I left some minor comments. It is clear you have taken a lot of learnings from Guille's solution and applied them successfully.
I want to highlight that you have stated the insights and actions after most cells, which is important.
Keep it up like this!
As we saw during our Exploratory Data Analysis, there is a strong temporal evolution in the data, reflecting the evolution of the underlying business. Therefore we cannot assume that either the user base or the purchasing dynamics are the same across it.
Really good that you explain this.
Train since: 2020-10-05
Train until: 2021-02-04
Val until: 2021-02-22
Test until: 2021-03-03
Any concerns / caveats with these numbers?
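For context, a minimal sketch of a date-based split under those cutoffs, assuming a dataframe with a `date` column (the column name and the boundary handling are assumptions, not the notebook's actual code):

```python
import pandas as pd

# Cutoff dates taken from the printed output above
TRAIN_SINCE, TRAIN_UNTIL = "2020-10-05", "2021-02-04"
VAL_UNTIL, TEST_UNTIL = "2021-02-22", "2021-03-03"

def temporal_split(df: pd.DataFrame, date_col: str = "date"):
    """Chronological train/val/test split; no shuffling across time."""
    dates = pd.to_datetime(df[date_col])
    train = df[(dates >= TRAIN_SINCE) & (dates < TRAIN_UNTIL)]
    val = df[(dates >= TRAIN_UNTIL) & (dates < VAL_UNTIL)]
    test = df[(dates >= VAL_UNTIL) & (dates <= TEST_UNTIL)]
    return train, val, test
```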
```python
def plot_metrics(
```
Good that you start by defining the metric AND defining a method.
```python
def plot_metrics(
    model_name:str, y_pred:pd.Series, y_test:pd.Series, target_precision:float=0.05,
    figure:Tuple[matplotlib.figure.Figure, np.array]=None
```
The type hint for figure should be Optional[Tuple[matplotlib.figure.Figure, np.array]].
Also, with type hints the convention is a space after : and spaces around = when a default value follows an annotation.
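A minimal sketch of the suggested signature (np.ndarray stands in for np.array in the hint, since np.array is a function rather than a type; this is an illustration, not the notebook's actual code):

```python
from typing import Optional, Tuple

import matplotlib.figure
import numpy as np
import pandas as pd


def plot_metrics(
    model_name: str,
    y_pred: pd.Series,
    y_test: pd.Series,
    target_precision: float = 0.05,
    figure: Optional[Tuple[matplotlib.figure.Figure, np.ndarray]] = None,
):
    ...
```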
```python
def plot_metrics(
    model_name:str, y_pred:pd.Series, y_test:pd.Series, target_precision:float=0.05,
```
target_precision?
```python
"""Our no skill precision-recall curve would be a horizontal line with
y = proportion of our minority class"""
no_skill = (train_df[label_col] == 1).sum() / len(train_df[label_col])
```
This is also called the prevalence.
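A minimal sketch of that no-skill baseline on a precision-recall plot (toy arrays, not the notebook's data):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import precision_recall_curve

# Toy labels/scores standing in for y_test and the model's predicted scores
y_test = np.array([0, 0, 0, 0, 1, 0, 0, 1, 0, 0])
y_scores = np.array([0.1, 0.2, 0.15, 0.05, 0.8, 0.3, 0.1, 0.6, 0.2, 0.4])

prevalence = y_test.mean()  # proportion of the positive (minority) class

precision, recall, _ = precision_recall_curve(y_test, y_scores)
plt.plot(recall, precision, label="model")
plt.hlines(prevalence, 0, 1, linestyles="--", label=f"no skill = {prevalence:.2f}")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.legend()
plt.show()
```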
We could improve our model by considering different thresholds (lower thresholds make our model more permissive, since we are detecting very few positives).
This would result in a higher recall but lower precision, due to having more false positives.
But we should be careful when choosing a more permissive model. Since the objective of our model is to send push notifications, maybe we should aim for a more conservative model where we control the false positive cases (each positive prediction = push notification sent).
Very good that you keep in mind the business goal, which is our ultimate objective, and propose an action to be validated with business.
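A minimal sketch of how a decision threshold could be chosen to respect a target precision (the function and names are illustrative assumptions, not the notebook's code):

```python
from sklearn.metrics import precision_recall_curve


def threshold_for_precision(y_true, y_scores, target_precision=0.05):
    """Lowest (most permissive) threshold whose precision reaches the target.

    Lower thresholds mean higher recall but more false positives, which
    matters when every positive prediction triggers a push notification.
    """
    precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
    # precision/recall have one more entry than thresholds; drop the final point
    reaches_target = precision[:-1] >= target_precision
    if not reaches_target.any():
        return None  # no threshold reaches the target precision
    return thresholds[reaches_target].min()
```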
### Insights
- For a large regularization, we have the same results as if we were predicting at random.
We should double check it, but I'd bet that all the predictions have the same value, so by changing our threshold we either set all samples to 0 or to 1. Thus we get two points on the plot and a straight line connecting them. So it is slightly different from random, but terrible either way.
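A quick way to sanity-check that hypothesis on a toy example (stand-in data and model, not the notebook's):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Very strong regularization (tiny C) shrinks the coefficients towards zero,
# so the predicted probabilities collapse to (nearly) a single value.
X, y = make_classification(n_samples=1_000, weights=[0.95], random_state=0)
model = LogisticRegression(C=1e-6).fit(X, y)
scores = model.predict_proba(X)[:, 1]
print("distinct probabilities (rounded):", np.unique(scores.round(3)).size)
```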
- We can be sure that we made an improvement with our final models compared to the baseline model. But not so much when compared to our first models trained with all variables. |
Good!
- The AUC of our ROC curves is close to 1, meaning good model performance. **But ROC curves and ROC AUC can be optimistic on severely imbalanced classification problems with few samples of the minority class.**
- (We can think of the ROC plot as the fraction of correct predictions for the positive class (y-axis) versus the fraction of errors for the negative class (x-axis). Ideally we would only have a point at (0, 1).)
AUC represents the chance that, if we pick a random positive/negative pair of data points, the positive one gets a higher score than the negative one.
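A minimal sketch of that ranking interpretation on toy scores (assumed data, not the notebook's): the fraction of positive/negative pairs where the positive is ranked higher matches roc_auc_score.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 0, 1, 0, 1, 0])
y_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.5])

pos = y_scores[y_true == 1]  # scores of the positive class
neg = y_scores[y_true == 0]  # scores of the negative class
# Fraction of positive/negative pairs where the positive is ranked higher
pairwise = (pos[:, None] > neg[None, :]).mean()

print(pairwise, roc_auc_score(y_true, y_scores))  # both ≈ 0.733
```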