-
Notifications
You must be signed in to change notification settings - Fork 76
Fix off-by-one error in degradation.py #339
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello! Thank you for writing such a great package. I have found an issue with degradation.degradation_year_on_year() that occurs when inputting only two years of data. It seems as if the package relies on elapsed days to compute the minimum amount of data needed in order to use the method. It compares the number of days that have passed to 730, but it uses the < operator instead of <= so the method fails for exactly 2 years of data. This PR fixes that issue. Thanks! To reproduce, pass any set of data containing two full years to degradation_year_on_year(). You will see the following traceback: ``` ValueError Traceback (most recent call last) Input In [29], in <cell line: 1>() ----> 1 yearly_rd, yearly_ci, yearly_info = rdtools.degradation_year_on_year(daily.loc[daily.index.year.isin([2017,2018])], confidence_level=68.2) File ~\Anaconda3\lib\site-packages\rdtools\degradation.py:237, in degradation_year_on_year(energy_normalized, recenter, exceedance_prob, confidence_level) 234 # Detect less than 2 years of data 235 if energy_normalized.index[-1] - energy_normalized.index[0] < \ 236 pd.Timedelta('730d'): --> 237 raise ValueError('must provide at least two years of ' 238 'normalized energy') 240 # Auto center 241 if recenter: ValueError: must provide at least two years of normalized energy ``` Thanks for your hard work on the package!
Thanks @SandraVillamar! Note that it matters whether the input contains a leap day, for example compare: # errors:
rdtools.degradation_year_on_year(pd.Series(1, index=pd.date_range('2014-01-01', '2015-12-31', freq='d')))
# succeeds:
rdtools.degradation_year_on_year(pd.Series(1, index=pd.date_range('2015-01-01', '2016-12-31', freq='d'))) I wonder if the error condition should take that into account somehow. Also, ideally bugfixes are accompanied by tests to make sure the bug is actually fixed and we don't accidentally reintroduce it someday. Would you be interested in adding one to this PR? Happy to provide guidance if needed. |
Hi @kanderso-nrel, thanks for the quick reply. Unfortunately, I do not have time to contribute a test case but I believe this code should fix the issue by taking into account leap years.
|
That would work in many cases, but I think the leap year case assumes that a leap day is present in the data period, which isn't always true. Consider for example What makes this a even more complicated is that |
@kanderso-nrel yes of course. Thank you for helping! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is looking good. All the edge case testing is great! One thing I'd like us to consider before merging. The new logic is implemented after the NaNs are removed, this is different than the prior behavior, which would treat a two year dataset with NaNs at the beginning or end as still valid. Would it make sense to implement this same logic earlier in the function?
@mdeceglie good point. I think the older behavior regarding NaN is now preserved as of the latest commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thanks @SandraVillamar for your help with this bug.
Thank you so much for fixing the bug! Appreciate all the work. |
Hello! Thank you for writing such a great package. I have found an issue with degradation.degradation_year_on_year() that occurs when inputting only two years of data. It seems as if the package relies on elapsed days to compute the minimum amount of data needed in order to use the method. It compares the number of days that have passed to 730, so the method fails for exactly 2 years of data where the difference would equal 729.
This PR fixes that issue.
Thanks!
To reproduce, pass any set of data containing two full years to degradation_year_on_year(). You will see the following traceback:
Thanks for your hard work on the package!
[ ] Code changes have been evaluated for compatibility/integration with TrendAnalysis[ ] New functions added to__init__.py
[ ] API.rst is up to date, along with other sphinx docs pages[ ] Example notebooks are rerun and differences in results scrutinized