8000 Fix off-by-one error in degradation.py by SandraVillamar · Pull Request #339 · NREL/rdtools · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Fix off-by-one error in degradation.py #339

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Nov 30, 2022

Conversation

SandraVillamar
Copy link
Contributor
@SandraVillamar SandraVillamar commented Aug 1, 2022

Hello! Thank you for writing such a great package. I have found an issue with degradation.degradation_year_on_year() that occurs when inputting only two years of data. It seems as if the package relies on elapsed days to compute the minimum amount of data needed in order to use the method. It compares the number of days that have passed to 730, so the method fails for exactly 2 years of data where the difference would equal 729.

This PR fixes that issue.

Thanks!

To reproduce, pass any set of data containing two full years to degradation_year_on_year(). You will see the following traceback:

ValueError                                Traceback (most recent call last)
Input In [29], in <cell line: 1>()
----> 1 yearly_rd, yearly_ci, yearly_info = rdtools.degradation_year_on_year(daily.loc[daily.index.year.isin([2017,2018])], confidence_level=68.2)

File ~\Anaconda3\lib\site-packages\rdtools\degradation.py:237, in degradation_year_on_year(energy_normalized, recenter, exceedance_prob, confidence_level)
    234 # Detect less than 2 years of data
    235 if energy_normalized.index[-1] - energy_normalized.index[0] < \
    236         pd.Timedelta('730d'):
--> 237     raise ValueError('must provide at least two years of '
    238                      'normalized energy')
    240 # Auto center
    241 if recenter:

ValueError: must provide at least two years of normalized energy

Thanks for your hard work on the package!

  • Code changes are covered by tests
  • [ ] Code changes have been evaluated for compatibility/integration with TrendAnalysis
  • [ ] New functions added to __init__.py
  • [ ] API.rst is up to date, along with other sphinx docs pages
  • [ ] Example notebooks are rerun and differences in results scrutinized
  • Updated changelog

Hello!  Thank you for writing such a great package.  I have found an issue with degradation.degradation_year_on_year() that occurs when inputting only two years of data.  It seems as if the package relies on elapsed days to compute the minimum amount of data needed in order to use the method.  It compares the number of days that have passed to 730, but it uses the < operator instead of <= so the method fails for exactly 2 years of data.

This PR fixes that issue.

Thanks!

To reproduce, pass any set of data containing two full years to degradation_year_on_year().  You will see the following traceback:

```
ValueError                                Traceback (most recent call last)
Input In [29], in <cell line: 1>()
----> 1 yearly_rd, yearly_ci, yearly_info = rdtools.degradation_year_on_year(daily.loc[daily.index.year.isin([2017,2018])], confidence_level=68.2)

File ~\Anaconda3\lib\site-packages\rdtools\degradation.py:237, in degradation_year_on_year(energy_normalized, recenter, exceedance_prob, confidence_level)
    234 # Detect less than 2 years of data
    235 if energy_normalized.index[-1] - energy_normalized.index[0] < \
    236         pd.Timedelta('730d'):
--> 237     raise ValueError('must provide at least two years of '
    238                      'normalized energy')
    240 # Auto center
    241 if recenter:

ValueError: must provide at least two years of normalized energy
```

Thanks for your hard work on the package!
@kandersolar
Copy link
Member

Thanks @SandraVillamar! Note that it matters whether the input contains a leap day, for example compare:

# errors:
rdtools.degradation_year_on_year(pd.Series(1, index=pd.date_range('2014-01-01', '2015-12-31', freq='d')))
# succeeds:
rdtools.degradation_year_on_year(pd.Series(1, index=pd.date_range('2015-01-01', '2016-12-31', freq='d')))

I wonder if the error condition should take that into account somehow.

Also, ideally bugfixes are accompanied by tests to make sure the bug is actually fixed and we don't accidentally reintroduce it someday. Would you be interested in adding one to this PR? Happy to provide guidance if needed.

@SandraVillamar
Copy link
Contributor Author
SandraVillamar commented Aug 2, 2022

Hi @kanderso-nrel, thanks for the quick reply. Unfortunately, I do not have time to contribute a test case but I believe this code should fix the issue by taking into account leap years.

if energy_normalized.index.is_leap_year.any() & 
(energy_normalized.index[-1] - energy_normalized.index[0] < pd.Timedelta('730d')):
    raise ValueError('must provide at least two years of normalized energy')
elif energy_normalized.index[-1] - energy_normalized.index[0] < pd.Timedelta('729d'):
    raise ValueError('must provide at least two years of normalized energy')

@kandersolar
Copy link
Member

That would work in many cases, but I think the leap year case assumes that a leap day is present in the data period, which isn't always true. Consider for example pd.date_range('2016-06-01', '2018-05-31', freq='d'), which contains part of a leap year but not an actual leap day.

What makes this a even more complicated is that degradation_year_on_year also supports non-daily inputs (e.g. weekly), and two years' worth of weekly values won't have 730 (or 729) days between the first and last dates. I might have some ideas how to go about fixing this issue in the general case -- @SandraVillamar do you mind if I push to this PR branch myself for a more general fix as well as test cases? I could do a new PR but would prefer to preserve your commits and their authorship if possible.

@SandraVillamar
Copy link
Contributor Author

@kanderso-nrel yes of course. Thank you for helping!

@kandersolar kandersolar added the bug label Sep 9, 2022
@kandersolar kandersolar mentioned this pull request Sep 13, 2022
6 tasks
@mdeceglie mdeceglie changed the base branch from master to version_2.1.4 November 23, 2022 21:16
Copy link
Collaborator
@mdeceglie mdeceglie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is looking good. All the edge case testing is great! One thing I'd like us to consider before merging. The new logic is implemented after the NaNs are removed, this is different than the prior behavior, which would treat a two year dataset with NaNs at the beginning or end as still valid. Would it make sense to implement this same logic earlier in the function?

@kandersolar
Copy link
Member

@mdeceglie good point. I think the older behavior regarding NaN is now preserved as of the latest commit.

Copy link
Collaborator
@mdeceglie mdeceglie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks @SandraVillamar for your help with this bug.

@mdeceglie mdeceglie merged commit e0759b2 into NREL:version_2.1.4 Nov 30, 2022
@SandraVillamar
Copy link
Contributor Author
SandraVillamar commented Dec 5, 2022

Thank you so much for fixing the bug! Appreciate all the work.

@SandraVillamar SandraVillamar deleted the patch-1 branch December 5, 2022 21:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0