JP-3584: Use rolling window median for TSO outlier detection by emolter · Pull Request #8473 · spacetelescope/jwst · GitHub

JP-3584: Use rolling window median for TSO outlier detection #8473

Merged
merged 14 commits into spacetelescope:master from the JP-3584 branch on May 30, 2024

Conversation

emolter
Collaborator
@emolter emolter commented May 8, 2024

Resolves JP-3584
Also resolves JP-3441

Closes #8378
Closes #8009

This PR updates the outlier detection step to use a rolling median instead of a flat median for all TSO modes, such that real time variability in the data is less likely to be flagged as outliers. The number of integrations that comprise the window for the rolling median can be specified by an input parameter to the step.
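To illustrate the idea, here is a minimal sketch on a single synthetic time series. It is not the code in this PR: the function names, the fixed noise level, the 5-sigma threshold, and the centered, edge-clipped window are all assumptions made for the example.

```python
import numpy as np


def rolling_median_1d(flux, window):
    """Median of `flux` in a centered window of `window` samples.

    Near the ends of the series the window is simply clipped (an
    assumption of this sketch, not necessarily what the step does).
    """
    n = flux.size
    half = window // 2
    out = np.empty(n, dtype=float)
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        out[i] = np.median(flux[lo:hi])
    return out


def flag_outliers(flux, reference, nsigma, noise):
    """Flag samples that deviate from `reference` by more than nsigma * noise."""
    return np.abs(flux - reference) > nsigma * noise


# Synthetic light curve: a slow ramp (real variability) plus one true outlier.
n = 200
noise = 1.0
rng = np.random.default_rng(0)
flux = np.linspace(100.0, 150.0, n) + rng.normal(0.0, noise, n)
flux[120] += 30.0

# Flat median: one reference value for the whole series.
flat = flag_outliers(flux, np.median(flux), 5.0, noise)
# Rolling median: the reference follows the slow ramp.
rolling = flag_outliers(flux, rolling_median_1d(flux, 25), 5.0, noise)

print("flagged with flat median:   ", int(flat.sum()))
print("flagged with rolling median:", int(rolling.sum()))
```

With the flat median, most of the ramp sits far from the single reference value and gets flagged. The rolling reference tracks the ramp, so the injected spike still stands out while the slow variability does not.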

Checklist for PR authors (skip items if you don't have permissions or they are not applicable)

  • added entry in CHANGES.rst within the relevant release section
  • updated or added relevant tests
  • updated relevant documentation
  • added relevant milestone
  • added relevant label(s)
  • ran regression tests, post a link to the Jenkins job below.
    How to run regression tests on a PR
  • All comments are resolved
  • Make sure the JIRA ticket is resolved properly

codecov bot commented May 8, 2024

Codecov Report

Attention: Patch coverage is 84.68468%, with 17 lines in your changes missing coverage. Please review.

Project coverage is 57.98%. Comparing base (4179c09) to head (6bafa32).
Report is 5 commits behind head on master.

Current head 6bafa32 differs from pull request most recent head bcfa7fc

Please upload reports for the commit bcfa7fc to get more accurate results.

| Files | Patch % | Lines |
|---|---|---|
| jwst/outlier_detection/outlier_detection_tso.py | 87.30% | 8 Missing ⚠️ |
| jwst/pipeline/calwebb_tso3.py | 0.00% | 7 Missing ⚠️ |
| jwst/outlier_detection/outlier_detection.py | 94.11% | 2 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #8473   +/-   ##
=======================================
  Coverage   57.97%   57.98%           
=======================================
  Files         387      388    +1     
  Lines       38830    38927   +97     
=======================================
+ Hits        22513    22572   +59     
- Misses      16317    16355   +38     

@emolter
Collaborator Author
emolter commented May 12, 2024

Regression tests started here.

Marking this ready for review even though I still need to figure out why the readthedocs build is failing.

@emolter emolter marked this pull request as ready for review May 12, 2024 19:34
@emolter emolter requested a review from a team as a code owner May 12, 2024 19:34
@emolter
Collaborator Author
emolter commented May 13, 2024

The only regtest failure was also present here, so I don't think this PR causes that issue.

However, I seem to have introduced a new unit test failure with the last small bugfix. Will track that down.

@emolter
Collaborator Author
emolter commented May 14, 2024

New regtest run started here for code changes that address comment from @braingram. The imaging mode is still refactored, but the new version should preserve the way memory is used. Let me know whether the refactor is desired - some of the refactoring is definitely extraneous to the TSO changes and could be reverted if not desired.

@emolter emolter added this to the Build 11.0 milestone May 14, 2024
@braingram
Collaborator

New regtest run started here for code changes that address comment from @braingram. The imaging mode is still refactored, but the new version should preserve the way memory is used. Let me know whether the refactor is desired - some of the refactoring is definitely extraneous to the TSO changes and could be reverted if not desired.

Thanks for putting this together. One general question I have is: what does the refactoring of the non-tso modes provide? If it's not required for the tso changes I think it makes sense to be a separate PR where the addressed issues and improvements are described.

@emolter
Collaborator Author
emolter commented May 14, 2024

Agree with you RE some aspects of the additional refactor, specifically the modelwise_operation decorator. I started working on it when I misunderstood exactly what was needed for the TSO modes, and then just didn't revert it when I realized the TSO modes needed something different. I do think the decorators could be a useful way to ensure that memory is taken care of responsibly when running certain operations, and I'd be curious to hear your thoughts on that. However it's a solution to a problem that doesn't exist yet, and may never exist.

It does feel a bit strange to revert the use of timewise_operation in the imaging mode, because it is needed for the TSO data and its use in both places avoids substantial duplication. But I am happy to do so if that is preferred.

I think splitting the imaging mode into its own class and inheriting from a base class makes sense to include in this PR because imaging, spec, and now TSO all use similar but distinct data processing workflows. I don't know much about the coronagraphic modes but it's possible that coronagraphy might need its own, very similar, workflow as well - this is something I have been meaning to ask @hbushouse.

@braingram
Collaborator

Thanks for the detailed response!

Agree with you RE some aspects of the additional refactor, specifically the modelwise_operation decorator. I started working on it when I misunderstood exactly what was needed for the TSO modes, and then just didn't revert it when I realized the TSO modes needed something different. I do think the decorators could be a useful way to ensure that memory is taken care of responsibly when running certain operations, and I'd be curious to hear your thoughts on that. However it's a solution to a problem that doesn't exist yet, and may never exist.

Thanks for the clarification. I'd be happy to discuss this.

It does feel a bit strange to revert the use of timewise_operation in the imaging mode, because it is needed for the TSO data and its use in both places avoids substantial duplication. But I am happy to do so if that is preferred.

Perhaps we could discuss this also, as I'm still struggling to see how the get_sections API helps with the TSO data. As an example for non-TSO data: if an association containing NIRCam images is fed into outlier detection (with the default in_memory=False and resample_data=True), the drizzled_models produced by the resampling will be a ModelContainer containing filenames. When these are passed to create_median, the get_sections API is used to limit the number of in-memory models to 1 at a time while still being able to take a median across all models. If the input to create_median is instead a container of models (not filenames), the function becomes very inefficient because each model is cloned for each section.
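As a rough illustration of the section-by-section pattern described above, here is a sketch that takes a median across several on-disk images while only ever holding one block of rows from each file. It does not use the jwst get_sections API or ModelContainer; the `sectionwise_median` function, the `.npy` files, and the row-based sectioning are assumptions made purely for the example.

```python
import os
import tempfile

import numpy as np


def sectionwise_median(filenames, shape, rows_per_section):
    """Median across files, computed one block of rows at a time."""
    result = np.empty(shape, dtype=float)
    for start in range(0, shape[0], rows_per_section):
        stop = min(start + rows_per_section, shape[0])
        # The buffer holds only this block of rows from every file.
        block = np.empty((len(filenames), stop - start, shape[1]), dtype=float)
        for k, name in enumerate(filenames):
            # mmap_mode="r" avoids reading the whole image into memory.
            image = np.load(name, mmap_mode="r")
            block[k] = image[start:stop]
            del image  # release the memory map before opening the next file
        result[start:stop] = np.median(block, axis=0)
    return result


# Demo: write five small images to disk, then take the median by sections.
rng = np.random.default_rng(1)
images = rng.normal(size=(5, 8, 6))
with tempfile.TemporaryDirectory() as tmp:
    filenames = []
    for k, image in enumerate(images):
        path = os.path.join(tmp, f"image_{k}.npy")
        np.save(path, image)
        filenames.append(path)
    median = sectionwise_median(filenames, images.shape[1:], rows_per_section=3)

print(np.allclose(median, np.median(images, axis=0)))
```

The trade-off is that every file is reopened once per section, which is why the pattern only pays off when the inputs really are files rather than models already in memory.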

Is the input TSO data a 3D cube? If so, the entirety of the input will be read on the first call to datamodels.open. calwebb_tso3 does transform the input to a container of 2D images before passing them to outlier_detection (do you know why it doesn't pass the cube? Let's assume that can be changed as part of this improvement). Since the entire cube is in memory (and is not resampled), does it make sense to compute medians on slices from the cube (based on the size of the rolling window)? This should be simpler and more efficient than trying to use the get_sections API.
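A sketch of what computing medians on slices of an in-memory cube could look like, using NumPy's `sliding_window_view`. The function name `rolling_median_cube` and the edge handling (integrations near either end reuse the nearest full window) are assumptions for illustration, not the behaviour of the merged code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_median_cube(cube, window):
    """Rolling median along axis 0 of a (n_ints, ny, nx) cube.

    Returns one median image per integration. Integrations closer than
    window // 2 to either end reuse the nearest full window.
    """
    n_ints = cube.shape[0]
    if window > n_ints:
        raise ValueError("window must not exceed the number of integrations")
    # Shape (n_ints - window + 1, ny, nx, window); this is a view, no copy yet.
    windows = sliding_window_view(cube, window, axis=0)
    # np.median copies internally, so peak memory is roughly window times the cube.
    medians = np.median(windows, axis=-1)
    # Index of the full window centered on each integration, clipped at the ends.
    idx = np.clip(np.arange(n_ints) - window // 2, 0, n_ints - window)
    return medians[idx]


rng = np.random.default_rng(2)
cube = rng.normal(size=(10, 2, 3))
result = rolling_median_cube(cube, 5)
print(result.shape)
```

For data containing NaNs, `np.nanmedian` could be swapped in for `np.median`, at some cost in speed.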

I think splitting the imaging mode into its own class and inheriting from a base class makes sense to include in this PR because imaging, spec, and now TSO all use similar but distinct data processing workflows. I don't know much about the coronagraphic modes but it's possible that coronagraphy might need its own, very similar, workflow as well - this is something I have been meaning to ask @hbushouse.

Splitting up the modes into sub-classes sounds great to me. I'm much less familiar with the non-image paths in this step and have been mostly focused on trying to improve the performance. I certainly do not mean to derail this PR but hopefully some of my comments have been helpful.

@emolter
Collaborator Author
emolter commented May 15, 2024

After a long discussion with @braingram about existing memory issues with calwebb_tso3 and calwebb_image3, it's clear that this PR is attempting to do too much. It's trying to use, and make more generic, some memory optimizations that are not necessarily working correctly anyway, e.g. this issue. The proposed path forward for this JIRA ticket is to make a very straightforward change to support the rolling median calculation, without worrying about memory at all. This requires undoing a lot of the changes in this PR.

Then for the future, in order to implement the above-mentioned issue in a way that will also support TSO data, I'll look into what kinds of large datasets might need to be processed through this step.

@emolter
Collaborator Author
emolter commented May 16, 2024

Yet another regtest run started here.

In this version the only refactors should be splitting some chunks of the imaging pipeline flow into their own functions to be re-used by outlier_detection_tso. I think I fixed all the other issues @braingram brought up too and I'm going to resolve those comments now.

@braingram
Collaborator

Thanks! Please do consider all of my previous comments addressed. The code changes look good to me and feel free to ping me when the regtests finish if a review would be helpful.

@emolter emolter requested a review from hbushouse May 17, 2024 13:06
@emolter
Collaborator Author
emolter commented May 17, 2024

All the regression test failures seem to be unrelated. This is ready for your review @hbushouse.

You may wonder why there aren't any expected regtest failures for such a change, which should affect all data processed through calwebb_tso3. The reason is that there are no regtest datasets with >25 integrations (the default rolling window), which means that as written the code will revert to using a simple median.
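The fallback described above can be summarized in a few lines. This is a sketch only: the helper name `median_mode` is hypothetical, and treating a dataset with exactly 25 integrations as the simple-median case is an assumption read from the ">25" wording rather than from the code.

```python
DEFAULT_WINDOW = 25  # default rolling window quoted in the comment above


def median_mode(n_ints, window=DEFAULT_WINDOW):
    """Return which median the sketch would use for `n_ints` integrations."""
    # With no more integrations than the window, a rolling median would
    # cover the whole dataset anyway, so fall back to a simple median.
    return "rolling" if n_ints > window else "simple"


print(median_mode(10))   # simple
print(median_mode(100))  # rolling
```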

Another note: I did check that the unit test I wrote fails for simple median but succeeds for rolling-median. I think the fake test data are therefore similar to the real test cases where this behavior was seen.

@perrygreenfield perrygreenfield self-requested a review May 21, 2024 20:34
@emolter
Collaborator Author
emolter commented May 29, 2024

I believe the remaining regtest failures in test_miri_lrs_slitless are expected: the TSO3 pipeline is taking in a .crfints file with 10 integrations, and with this code change it should also get a CubeModel with 10 integrations out of outlier detection.

@hbushouse
Collaborator

@emolter Latest CI run had lots of failures due to the n_ints name needing updating in one or more unit tests.

Collaborator
@hbushouse hbushouse left a comment

Latest doc updates look good. I hereby approve.

@hbushouse hbushouse merged commit 7bae12b into spacetelescope:master May 30, 2024
23 of 24 checks passed
@emolter emolter deleted the JP-3584 branch May 30, 2024 19:33
@braingram braingram mentioned this pull request Jul 10, 2024
8 tasks
Development

Successfully merging this pull request may close these issues.

Outlier detection step skips execution using call() on calints
5 participants