8000 JP-3311: OutlierDetectionStep producing extraneous outlier_i2d files by penaguerrero · Pull Request #8464 · spacetelescope/jwst · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

JP-3311: OutlierDetectionStep producing extraneous outlier_i2d files #8464

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
May 10, 2024

Conversation

penaguerrero
Copy link
Contributor
@penaguerrero penaguerrero commented May 7, 2024

Resolves JP-3311

Closes #7747

This PR addresses intermediate files being left over after step was run. I made a series of modifications to produce an output file list and then clean up these files. In the process of creating such a list, https://jira.stsci.edu/browse/JP-3039 got resolved as well - the _i2d files were not being saved to the specified output directory.

Checklist for maintainers

  • added entry in CHANGES.rst within the relevant release section
  • updated or added relevant tests
  • updated relevant documentation
  • added relevant milestone
  • added relevant label(s)
  • ran regression tests, post a link to the Jenkins job below.
    How to run regression tests on a PR
  • Make sure the JIRA ticket is resolved properly

@penaguerrero
Copy link
Contributor Author

Copy link
codecov bot commented May 7, 2024

Codecov Report

Attention: Patch coverage is 71.87500% with 9 lines in your changes are missing coverage. Please review.

Project coverage is 56.53%. Comparing base (6580914) to head (2b006d2).
Report is 14 commits behind head on master.

❗ Current head 2b006d2 differs from pull request most recent head 30f6d15. Consider uploading reports for the commit 30f6d15 to get more accurate results

Files Patch % Lines
jwst/outlier_detection/outlier_detection_spec.py 12.50% 7 Missing ⚠️
jwst/resample/resample_spec.py 50.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8464      +/-   ##
==========================================
+ Coverage   56.38%   56.53%   +0.15%     
==========================================
  Files         387      387              
  Lines       38716    38822     +106     
==========================================
+ Hits        21830    21949     +119     
+ Misses      16886    16873      -13     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@hbushouse hbushouse requested a review from braingram May 7, 2024 19:17
@penaguerrero
Copy link
Contributor Author

fixed failed reg test and rerunning at https://plwishmaster.stsci.edu:8081/job/RT/job/JWST-Developers-Pull-Requests/1425/

@penaguerrero
Copy link
Contributor Author

Happy with results of last reg test run. The failed tests are not related to changes in this PR.

@hbushouse hbushouse added this to the Build 11.0 milestone May 8, 2024
Copy link
Collaborator
@braingram braingram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on. I left a few comments/questions (not a full review yet) and I have one general question. My understanding (correct me if I'm wrong) is that the intermediate files are (or perhaps only should be) written only if in_memory is False. The current code (without this PR) doesn't delete these files until the end of the step (which makes "book-keeping" difficult and contributes to the issue addressed in this PR).

Would it be easier to not write out the files if they aren't required and to delete them as soon as they are no longer useful? I think this is already the case for the median image (it's not written if save_intermediate_results is False). The same can't be said for blot_models which are always written out. For these they are no longer needed after the call to detect_outliers and could be deleted there if save_intermediate_results is False. Similarly the drizzled_models are not needed after the call to create_median and could be deleted there.

What do you think of this alternative approach? It should avoid needing to add the new parameter to the spec and the output_list. Will it be compatible with the other issue fixed in this PR (fixing the output directory for the i2d files)?

output_model.save(output_name)
if self.mk_output_list:
self.output_list.append(output_name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like output_list contains the same items as output_models. What required adding the new list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The list contains the strings of the full paths of the files.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that the same thing that's in output_models? line 315 contains:

self.output_models.append(output_name)

which appends the same variable as:

self.output_list.append(output_name)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I agree that the 2 lines are appending the name, however when you print self.output_models you get a model container and not a list of strings

Copy link
Collaborator
@braingram braingram May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I did miss that point, and it looks like there is an issue with resample (one that exists on main) where it creates the container with an unknown argument with presumably the expectation that the models are not used:

self.output_models = ModelContainer(open_models=False)

As resample appends filenames it's possible to either:

  • disable return_open (which I expect is what the open_models was intended to do) and indexing the container will return the filenames
  • access _models (which will contain the filenames)

I went with the second approach in penaguerrero#1

@hbushouse
Copy link
Collaborator

CI style check is failing.

Copy link
Collaborator
@hbushouse hbushouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now.

@hbushouse hbushouse requested a review from braingram May 8, 2024 20:30
Copy link
Collaborator
@braingram braingram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the changes. I don't think I did a good job explaining the alternative approach. I opened a PR against your branch to hopefully illustrate the idea:
penaguerrero#1
It:

  • removes the mk_output_list parameter from the step spec
  • removes the changes to resample
  • removes output_list
  • deletes drizzled_models after they are no longer needed
  • deletes blot_models after they are no longer needed

I ran the outlier detection unit tests but not the full unit tests (or regtests) for the PR against your branch. Please take a look at the PR.

It seems possible to fix this issue without changes to resample (to help keep the API for the 2 steps separate).

Would you add a unit test that fails on jwst master but passes with this PR to both illustrate the fix and to make sure the the upcoming outlier detection and resample changes do not re-introduce these issues?

penaguerrero and others added 2 commits May 9, 2024 08:33
delete files after they're no longer useful
@penaguerrero
Copy link
Contributor Author

@penaguerrero
Copy link
Contributor Author

failed tests are unrelated to this PR

@hbushouse
Copy link
Collaborator

@braingram @penaguerrero Is everyone happy now? Are the changes in resample/resample_spec still necessary or not?

@braingram
Copy link
Collaborator

Would you add a unit test that fails on jwst master but passes with this PR to both illustrate the fix and to make sure the the upcoming outlier detection and resample changes do not re-introduce these issues?

@penaguerrero is a test for JP-3311 doable as part of this PR?

I think the resample changes are needed for JP-3039.

@hbushouse
Copy link
Collaborator

Given that this PR contains a fix for a persistent regtest error I'd like to go ahead and get this merged ASAP. We can always submit a unit test or regtest for this condition in a separate PR. Is there anything that anyone thinks really needs changing or updating yet in this PR before merging?

@braingram
Copy link
Collaborator
braingram commented May 10, 2024

Is the regtest fix the changes in test_nirspec_mos_spec3? Are those related to the outlier detection changes? If not, wouldn't a different PR that describes those changes make sense?

I'll dismiss my review and leave it up to @hbushouse but I believe the outlier detection changes warrant a test. For example, I do not know if the changes in my PR against the source branch for this PR even addressed the issue.

@braingram braingram self-requested a review May 10, 2024 12:42
@braingram
Copy link
Collaborator

I don't see a button to dismiss my own review so I marked it as 'approved' to remove my changes request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OutlierDetectionStep producing extraneous outlier_i2d files
3 participants
0