Add threshold weighted scores #609
Conversation
Some minor feedback.
In included.md:
- Capitalisation suggestions (for consistency with other table entries)
- The four entries don't appear to be in alphabetical order
In the docstrings, the references don't appear to follow APA7.
Edit: Re. API docstrings rendering in readthedocs - I think you might need two backticks (not just one) around function names. (You can check when it is put into branch 228, or if you build it locally in readthedocs.)
Further edit: it looks like sometimes you need one backtick and sometimes you need two. @nikeethr used a mix (depending on use case) in the FSS API docstrings, and those appear to be rendering nicely in readthedocs.
I haven't reviewed the tests yet, but so I don't neglect to mention it later: I will be looking to see whether there are consistency checks against the unweighted scores when the weights are set neutrally.
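One way to sketch such a neutral-weight consistency check is via the decomposition of squared error over decision thresholds from Ehm et al. (2016). The `tw_squared_error` helper below is a hypothetical standalone numpy sketch written for this comment, not the scores API: with a constant weight of 1 over all thresholds, the threshold-weighted value should approach the ordinary squared error.

```python
import numpy as np

def tw_squared_error(fcst, obs, weight_fn, thresholds):
    """Numerically approximate threshold-weighted squared error.

    Uses the decomposition (Ehm et al. 2016) of squared error as twice the
    integral, over decision thresholds t, of the elementary score
    |1{obs <= t} - 1{fcst <= t}| * |obs - t|, weighted by weight_fn(t).
    """
    t = np.asarray(thresholds, dtype=float)
    dt = t[1] - t[0]  # assumes an evenly spaced threshold grid
    elementary = np.abs((obs <= t).astype(float) - (fcst <= t).astype(float)) * np.abs(obs - t)
    return 2.0 * np.sum(weight_fn(t) * elementary) * dt

# Neutral (constant) weights should recover the ordinary squared error.
thresholds = np.linspace(-50.0, 100.0, 300_001)
fcst, obs = 20.0, 10.0
neutral = tw_squared_error(fcst, obs, lambda t: np.ones_like(t), thresholds)
print(neutral)  # close to (20 - 10) ** 2 = 100, up to grid resolution
```

Refining the threshold grid tightens the agreement, which is the kind of check the review comment above is asking for.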
Co-authored-by: Stephanie Chong <168800785+Steph-Chong@users.noreply.github.com> Signed-off-by: Nicholas Loveday <48701367+nicholasloveday@users.noreply.github.com>
You will find these consistency tests in
I couldn't work out how to leave these as code comments. Some very minor feedback about the tutorial header levels (as they currently appear in branch 228 - I didn't rebuild the docs, so apologies if these have already been addressed):

In the "Other Threshold Weighted Scores" section, the first sentence says: "In this tutorial, we demonstrated the threshold weighted square error (or twMSE) function." I think there may be a missing "mean" - as in "threshold weighted mean square error (or twMSE) function."
Thanks @Steph-Chong for those suggestions. I've made some updates that address your feedback.
I think that the issue isn't the environment. When I run the notebooks in VS Code, they don't render the figures when I build the docs. When I run the notebooks in a Jupyter notebook in a web browser, then they do render when I build the docs!
I am making comments about the tutorial here, since the file is too large for inline comments through GitHub code review. Regarding the tutorial, it notes that MSE can be interpreted as putting equal weight on all thresholds. Since it squares the error, it puts exponentially more weight on higher thresholds (since multiplicative biases at higher thresholds will consequentially have higher absolute errors which are then squared), so I think this isn't quite right to say. Also, the previous version noted the issue with MSE potentially obscuring certain things.
With rectangular weighting, it's unclear if multiple segments/intervals could be supplied, e.g. the user might want to weight anything <= -3 and also anything >= 3 as "1", and "0" between -3 and positive 3. I'm assuming this is currently not possible, but the tutorial could do a better job of explaining what an "interval" is here.
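If that kind of two-interval weighting were wanted, one simple route (purely illustrative - `two_interval_weights` is a hypothetical callable invented for this comment, not part of the scores API) would be a weight function covering both tails:

```python
import numpy as np

# Hypothetical weight function covering two rectangular intervals:
# weight 1 for thresholds <= -3 or >= 3, weight 0 strictly between -3 and 3.
def two_interval_weights(thresholds):
    t = np.asarray(thresholds, dtype=float)
    return ((t <= -3.0) | (t >= 3.0)).astype(float)

print(two_interval_weights([-5.0, 0.0, 4.0]))  # [1. 0. 1.]
```

Whether such a callable should be accepted directly, or expressed through the existing interval parameters, is exactly the API question raised below.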
The trapezoidal weighting example given in the "Trapezoidal Weighting" paragraph isn't the same as the one presented in code later in the example. Consider adding a plot to the "Trapezoidal Weighting" section to show what is described in the text, or consider adjusting the text description to match the later code and saying "see below for an example".
One option here would be to replace threshold_where_one with a callable which returns 1 or 0 as appropriate. The nice thing about the current approach is that it's simpler for non-coders who might be less comfortable making up functions on the fly, and it provides some guidance on the most common things to do. One way to support both approaches would be to support an optional argument which would allow a function to be passed instead. I also think we might need to explain why these weighting parameters are being passed in rather than using the already suitable "weights" argument to specify the weights. For example, why are we not using functions like weights_where_one or trap_weights to calculate a suitable weights array instead? Part of the answer is convenience and guidance for users, but I think it could be confusing to people who are thinking about the weights. Do the functions treat the weights in
It might be interesting to add an example for a variable which has both positive and negative values, either where the interval of interest is around the mean, or where it's on the extremes. It's not clear to me how this can be used on something like temperature, where very hot and very cold extremes are both of similar (although potentially non-equal) interest. I really like the examples that are currently included, and I'm a bit worried about overcomplicating the tutorial, but I want to make sure this approach is suitable for that situation; an example would both put the implementation to the test and demonstrate appropriate practice. Examples could include human comfort ranges for temperature or humidity, agriculture (where both heat and frost/ice could be problems), or operating equipment like trains (where again there would be a temperature range).
@tennlee, the way you have put it is not quite correct. Take the following example. My decision threshold is 40C. If tomorrow's forecast exceeds 40C, I will tell the workers on my construction site not to come into work tomorrow. Now consider the case where the forecast for tomorrow was 20C and the observation was 10C. That is a large error, and the squared error is even larger (100C^2). However, I still made the correct decision on the basis of the forecast to not close the construction site. So the size of the penalty can be independent of whether the forecast was good enough for people whose decision thresholds are 40C or higher. In this case, threshold-weighted squared error actually gives a penalty of 0. What Nick says is correct. MSE can be interpreted as putting equal weight on all thresholds. The Ehm 2016 paper explains why. Happy to talk about this in another forum rather than a GitHub comment.
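The construction-site example can be checked numerically. The `tw_squared_error` helper below is a standalone numpy sketch of the Ehm et al. (2016) threshold decomposition, written for illustration here and not part of the scores API: restricting the weight to thresholds of 40C and above gives a penalty of exactly zero for a 20C forecast against a 10C observation.

```python
import numpy as np

def tw_squared_error(fcst, obs, weight_fn, thresholds):
    """Approximate threshold-weighted squared error via the Ehm et al. (2016)
    decomposition: twice the threshold integral of
    |1{obs <= t} - 1{fcst <= t}| * |obs - t|, weighted by weight_fn(t)."""
    t = np.asarray(thresholds, dtype=float)
    dt = t[1] - t[0]  # assumes an evenly spaced threshold grid
    elementary = np.abs((obs <= t).astype(float) - (fcst <= t).astype(float)) * np.abs(obs - t)
    return 2.0 * np.sum(weight_fn(t) * elementary) * dt

thresholds = np.linspace(-50.0, 100.0, 300_001)
fcst, obs = 20.0, 10.0  # forecast 20C, observation 10C

# Only decision thresholds of 40C and above matter to this user.
penalty = tw_squared_error(fcst, obs, lambda t: (t >= 40.0).astype(float), thresholds)
print(penalty)  # 0.0: the 10C error never crosses any threshold at or above 40C
```

The elementary score is nonzero only for thresholds between the forecast and the observation (10C to 20C), which the weight function zeroes out, so the large squared error carries no threshold-weighted penalty for this decision maker.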
@nicholasloveday, one of @tennlee's comments about weights makes me wonder whether:
I noticed in your docstrings that you talk about weights where one, etc., which is the second type of weight listed above, not the first. To avoid confusion, you may need to talk consistently about threshold-weights (and explain that this is related to decision thresholds) to distinguish this from the other type of weighting, both in the tutorial and in the docstrings for your functions.
Thanks, I've followed up offline. My comments aren't about what's right and wrong, but about whether things are clear when introduced to a new reader with a general mathematical understanding of errors but without prior knowledge of decision thresholds, i.e. to help people avoid confusion about the exact disagreement being expressed in the comment.
Yes, spot on.
This would be a nice example to include.
Added a text description in the trapezoidal weighting paragraph. I added rather than replaced as it is important to show how trapezoidal weights can be on either one side or both sides of the rectangular weights.
I've added a modified example that lends itself more to a range of decision thresholds rather than a single decision threshold.
I have updated the tutorial and docstrings to clarify this.
Added an example in the tutorial. Also added an additional thing to try in the "things to try next" section.
…ted Scores notebook
Please work through the following checklists. Delete anything that isn't relevant.
Development for new xarray-based metrics
- `reduce_dims`, `preserve_dims`, and `weights` args.

Testing of new xarray-based metrics

Tutorial notebook

Documentation