-
Notifications
You must be signed in to change notification settings - Fork 6.1k
feat(health): Add custom healthcheck for NATS resources #23167 #23180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
🔴 Preview Environment stopped on BunnyshellSee: Environment Details | Pipeline Logs Available commands (reply to this comment):
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #23180 +/- ##
=========================================
Coverage ? 59.98%
=========================================
Files ? 343
Lines ? 57890
Branches ? 0
========================
10000
=================
Hits ? 34725
Misses ? 20384
Partials ? 2781 ☔ View full report in Codecov by Sentry. |
min = minute, sec = seconds} | ||
if os.difftime(os.time(), event_time) > 30 then | ||
hs.status = "Degraded" | ||
hs.message = "Trying to create resource for more than 30 seconds" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens later? Would it keep trying or give up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not defined in their documentation. I think it will depend on the reason it didn't work. I have seen it retry multiple times before giving up but also restarting the Nack controller or interacting with the NATS server resources directly can put it in an inconsistent state and I am not sure how it untangles itself.
local pattern = "(%d+)%-(%d+)%-(%d+)%a(%d+)%:(%d+)%:([%d]+)%.%d+Z" | ||
local year, month, day, hour, minute, seconds = condition.lastTransitionTime:match(pattern) | ||
local event_time = os.time{year = year, month = month, day = day, hour = hour, | ||
min = minute, sec = seconds} | ||
if os.difftime(os.time(), event_time) > 30 then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argo CD is not the place to decide that. If the resource is unhealthy after 30 seconds and this has an impact, then it should have a feature similar to a progressDeadlineSeconds and update the status/conditions accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The resource doesn't have a "progressDeadlineSeconds" definition. I could check if it exists in case they add it in the future and keep the 30 seconds as a reasonable default?
reason: Creating | ||
status: 'True' | ||
type: Ready |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds weird to me to have a resource that is "creating" and have the Ready status be already true. Are you sure that it is the behavior of the controller? This seems incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is. I think their idea is that "the controller knows about this resource" makes it ready. The controller then needs to read the properties from the resource and make the corresponding calls to the NATS server to create whatever the resource is representing.
Signed-off-by: DanielS <daniels@playerdata.com>
eaf54bd
to
28a9694
Compare
NATS uses a controller called nacks to deploy resources in a deployed server based on CRDs created. The controller provides kubernetes events giving the status updates of the processing of the CRDs.
Closes #23167
Checklist: