-
Notifications
You must be signed in to change notification settings - Fork 174
JP-3311: OutlierDetectionStep producing extraneous outlier_i2d files #8464
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #8464 +/- ##
==========================================
+ Coverage 56.38% 56.53% +0.15%
==========================================
Files 387 387
Lines 38716 38822 +106
==========================================
+ Hits 21830 21949 +119
+ Misses 16886 16873 -13 ☔ View full report in Codecov by Sentry. |
fixed failed reg test and rerunning at https://plwishmaster.stsci.edu:8081/job/RT/job/JWST-Developers-Pull-Requests/1425/ |
Happy with results of last reg test run. The failed tests are not related to changes in this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking this on. I left a few comments/questions (not a full review yet) and I have one general question. My understanding (correct me if I'm wrong) is that the intermediate files are (or perhaps only should be) written only if in_memory
is False. The current code (without this PR) doesn't delete these files until the end of the step (which makes "book-keeping" difficult and contributes to the issue addressed in this PR).
Would it be easier to not write out the files if they aren't required and to delete them as soon as they are no longer useful? I think this is already the case for the median
image (it's not written if save_intermediate_results is False). The same can't be said for blot_models
which are always written out. For these they are no longer needed after the call to detect_outliers and could be deleted there if save_intermediate_results
is False
. Similarly the drizzled_models
are not needed after the call to create_median and could be deleted there.
What do you think of this alternative approach? It should avoid needing to add the new parameter to the spec and the output_list. Will it be compatible with the other issue fixed in this PR (fixing the output directory for the i2d files)?
jwst/resample/resample.py
Outdated
output_model.save(output_name) | ||
if self.mk_output_list: | ||
self.output_list.append(output_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like output_list
contains the same items as output_models
. What required adding the new list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The list contains the strings of the full paths of the files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't that the same thing that's in output_models
? line 315 contains:
self.output_models.append(output_name)
which appends the same variable as:
self.output_list.append(output_name)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I agree that the 2 lines are appending the name, however when you print self.output_models you get a model container and not a list of strings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I did miss that point, and it looks like there is an issue with resample (one that exists on main) where it creates the container with an unknown argument with presumably the expectation that the models are not used:
jwst/jwst/resample/resample.py
Line 186 in 9e857a5
self.output_models = ModelContainer(open_models=False) |
As resample appends filenames it's possible to either:
- disable
return_open
(which I expect is what theopen_models
was intended to do) and indexing the container will return the filenames - access
_models
(which will contain the filenames)
I went with the second approach in penaguerrero#1
CI style check is failing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making the changes. I don't think I did a good job explaining the alternative approach. I opened a PR against your branch to hopefully illustrate the idea:
penaguerrero#1
It:
- removes the
mk_output_list
parameter from the step spec - removes the changes to resample
- removes
output_list
- deletes
drizzled_models
after they are no longer needed - deletes
blot_models
after they are no longer needed
I ran the outlier detection unit tests but not the full unit tests (or regtests) for the PR against your branch. Please take a look at the PR.
It seems possible to fix this issue without changes to resample (to help keep the API for the 2 steps separate).
Would you add a unit test that fails on jwst master but passes with this PR to both illustrate the fix and to make sure the the upcoming outlier detection and resample changes do not re-introduce these issues?
delete files after they're no longer useful
started a new reg test run at https://plwishmaster.stsci.edu:8081/job/RT/job/JWST-Developers-Pull-Requests/1431/ |
failed tests are unrelated to this PR |
@braingram @penaguerrero Is everyone happy now? Are the changes in resample/resample_spec still necessary or not? |
@penaguerrero is a test for JP-3311 doable as part of this PR? I think the resample changes are needed for JP-3039. |
Given that this PR contains a fix for a persistent regtest error I'd like to go ahead and get this merged ASAP. We can always submit a unit test or regtest for this condition in a separate PR. Is there anything that anyone thinks really needs changing or updating yet in this PR before merging? |
Is the regtest fix the changes in I'll dismiss my review and leave it up to @hbushouse but I believe the outlier detection changes warrant a test. For example, I do not know if the changes in my PR against the source branch for this PR even addressed the issue. |
I don't see a button to dismiss my own review so I marked it as 'approved' to remove my changes request. |
Resolves JP-3311
Closes #7747
This PR addresses intermediate files being left over after step was run. I made a series of modifications to produce an output file list and then clean up these files. In the process of creating such a list, https://jira.stsci.edu/browse/JP-3039 got resolved as well - the _i2d files were not being saved to the specified output directory.
Checklist for maintainers
CHANGES.rst
within the relevant release sectionHow to run regression tests on a PR