8000 feat(health): Add custom healthcheck for NATS resources #23167 by Koalk · Pull Request #23180 · argoproj/argo-cd · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

feat(health): Add custom healthcheck for NATS resources #23167 #23180

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Koalk
Copy link
@Koalk Koalk commented May 28, 2025

NATS uses a controller called nacks to deploy resources in a deployed server based on CRDs created. The controller provides kubernetes events giving the status updates of the processing of the CRDs.
Closes #23167

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • The title of the PR conforms to the Toolchain Guide
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).
  • My new feature complies with the feature status guidelines.
  • I have added a brief description of why this PR is necessary and/or what this PR solves.
  • Optional. My organization is added to USERS.md.
  • Optional. For bug fixes, I've indicated what older releases this fix should be cherry-picked into (this may or may not happen depending on risk/complexity).

@Koalk Koalk requested a review from a team as a code owner May 28, 2025 09:25
Copy link
bunnyshell bot commented May 28, 2025

🔴 Preview Environment stopped on Bunnyshell

See: Environment Details | Pipeline Logs

Available commands (reply to this comment):

  • 🔵 /bns:start to start the environment
  • 🚀 /bns:deploy to redeploy the environment
  • /bns:delete to remove the environment

Copy link
codecov bot commented May 28, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (master@97a38b4). Learn more about missing BASE report.
Report is 102 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##             master   #23180   +/-   ##
=========================================
  Coverage          ?   59.98%           
=========================================
  Files             ?      343           
  Lines             ?    57890           
  Branches          ?        0           
========================
10000
=================
  Hits              ?    34725           
  Misses            ?    20384           
  Partials          ?     2781           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@crenshaw-dev crenshaw-dev changed the title feat: Add custom healthcheck for NATS resources #23167 feat(health): Add custom healthcheck for NATS resources #23167 Jun 4, 2025
@crenshaw-dev crenshaw-dev added this to the v3.1 milestone Jun 4, 2025
min = minute, sec = seconds}
if os.difftime(os.time(), event_time) > 30 then
hs.status = "Degraded"
hs.message = "Trying to create resource for more than 30 seconds"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens later? Would it keep trying or give up?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not defined in their documentation. I think it will depend on the reason it didn't work. I have seen it retry multiple times before giving up but also restarting the Nack controller or interacting with the NATS server resources directly can put it in an inconsistent state and I am not sure how it untangles itself.

Comment on lines +19 to +23
local pattern = "(%d+)%-(%d+)%-(%d+)%a(%d+)%:(%d+)%:([%d]+)%.%d+Z"
local year, month, day, hour, minute, seconds = condition.lastTransitionTime:match(pattern)
local event_time = os.time{year = year, month = month, day = day, hour = hour,
min = minute, sec = seconds}
if os.difftime(os.time(), event_time) > 30 then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argo CD is not the place to decide that. If the resource is unhealthy after 30 seconds and this has an impact, then it should have a feature similar to a progressDeadlineSeconds and update the status/conditions accordingly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resource doesn't have a "progressDeadlineSeconds" definition. I could check if it exists in case they add it in the future and keep the 30 seconds as a reasonable default?

Comment on lines +19 to +21
reason: Creating
status: 'True'
type: Ready
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds weird to me to have a resource that is "creating" and have the Ready status be already true. Are you sure that it is the behavior of the controller? This seems incorrect.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is. I think their idea is that "the controller knows about this resource" makes it ready. The controller then needs to read the properties from the resource and make the corresponding calls to the NATS server to create whatever the resource is representing.

@agaudreault agaudreault modified the milestones: v3.1, v3.2 Jun 16, 2025
Signed-off-by: DanielS <daniels@playerdata.com>
@Koalk Koalk force-pushed the nats-nack-healthcheck branch from eaf54bd to 28a9694 Compare June 17, 2025 09:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NATS custom healthchecks for CRDs
4 participants
0