-
Notifications
You must be signed in to change notification settings - Fork 9.6k
Add template functions to support time handling in Loki alerts. #16619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add template functions to support time handling in Loki alerts. #16619
Conversation
Signed-off-by: Dmitry Ponomaryov <me@halje.ru>
I cannot really judge the usefulness of this feature in the context of Prometheus templating. (For example, I don't understand how Prometheus templating is used in Loki alerting.) I don't know who is currently most qualified to act as a guardian of the templating. Maybe @grobinson-grafana and @gotjosh as the Alertmanager experts have an opinion here. Maybe @slim-bean can help explaining the connection to Loki. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some code level comments. This should not imply that I think we should add this feature. I leave this to the experts, see other comment.
Signed-off-by: Dmitry Ponomaryov <iamhalje@gmail.com>
the previous name 'parseDurationTime' was unclear. The new name 'parseGoDuration' makes it explicit that this function uses Go's standard `time.ParseDuration`, which does not support Prometheus-style durations like `3d` or `1w`. Signed-off-by: Dmitry Ponomaryov <iamhalje@gmail.com>
…prometheus into add-template-parsedurationtime
@beorn7 thanks for reaction on this request overall, i just want to say that this is an attempt to approach the long-standing issue of adding time-based information into Loki alerts - even if it's just from one possible angle. the problem has been around for over five years, and this is initial step toward making such functionality possible for reference, this is how things currently work in Loki: Loki relies entirely on prometheus/templating within the ruler (alert component) to tempating and take dynamic data in alerts, but this mechanism lacks access to time-related values it's absolutely the right move to involve the developers at this point - thank you for that. i just wanna to stress again that this PR alone won't be sufficient to solve the entire problem, but perhaps it will prompt the Grafana Loki team to consider adding something similar or native on their side. Thanks again! |
As @iamhalje mentioned above, Loki uses As a result, if
I propose adding additional function |
Signed-off-by: Dmitry Ponomaryov <iamhalje@gmail.com>
@verdel ideally, Loki itself would track the exact alert firing time and provide a variable in the templating context representing this specific alert start time. This would allow generating dashboard URLs with precise time ranges relative to when the alert actually began however, the current approach (let’s call it nowTime) is still a practical solution:
since we can use time.Duration offsets relative to this way, alert templates can use @beorn7 await for feedback from @prometheus/@grafana teams for adding this functions in templating. thanks |
When an instance of But in that case, the function name
A small note:
It’s also important to note that Loki uses vendoring for third-party modules, and currently, the version of the Prometheus module it uses is slightly behind the latest version in that repository. |
@beorn7, the reason for creating this PR might become a bit clearer if you read the discussion in my PR in the Alertmanager repository. Since Prometheus/Loki doesn’t support performing time operations when preparing additional information in an AlertRule, I pass a template into the annotation, which is then re-templated on the Alertmanager side. It looks like this: AlertRule
Alertmanager Template
But since Alertmanager still doesn’t have a function to work with durations, I still have to use a static value in the argument of the Additionally, using a template inside We discussed with @grobinson-grafana that this approach—passing a template into an AlertRule—is not the right solution, and that the value should be generated on the Prometheus/Loki side, not in Alertmanager. If we add these two time-handling functions to the AlertRule templating in Prometheus, it would solve the problem. |
In view of all the @-mentioning here, I should re-emphasize that I'm not qualified to vet the validity of this feature. |
@beorn7, is there any other way to notify the maintainers responsible for the area related to our issue that we need their help reviewing this PR? |
In principle, the PR alone should eventually trigger a review. @-mentioning people that you think are qualified might help. You could also try to catch people on the CNCF Slack. But these are all using increasing invasiveness, and reviewers are usually overloaded already. |
Looking at templating in general (of which I am not an expert, as said): There is I'm not sure about the Final point: All of this must also be documented in https://github.com/prometheus/prometheus/blob/main/docs/configuration/template_reference.md . |
…seGoDuration Signed-off-by: Dmitry Ponomaryov <iamhalje@gmail.com>
Signed-off-by: Dmitry Ponomaryov <iamhalje@gmail.com>
Signed-off-by: Dmitry Ponomaryov <iamhalje@gmail.com>
based on discussions above, alerts should support negative duration values directly without relying on complicated workarounds. to achieve this cleanly, we will have two functions: therefore, a sub-task was created to add negative duration support to i hope have correctly understood everything mentioned earlier. next, i'll try to involve Loki developers in this ticket for further collaboration. |
@cyriltovena @owen-d @chaudum @slim-bean @trevorwhitney @sandeepsukhani @jeschkies @paul1r @poyzannur FYI adds time templating for Loki alerts, would appreciate a review for anyone |
prometheus/common has a new release tag v0.65.0, which you can update this PR to. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comments. I think, if we do it this way, it will be very uncontroversial because it sticks to the pattern we already have with toTime
.
It would still be good to get feedback from Loki folks that this is even needed.
Signed-off-by: Dmitry Ponomaryov <iamhalje@gmail.com>
Signed-off-by: Dmitry Ponomaryov <iamhalje@gmail.com>
Signed-off-by: Dmitry Ponomaryov <iamhalje@gmail.com>
as a result, we added the functions this allows us to use expressions like the following in Loki:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Looks good generally, but still a few details to put straight.
template/template.go
Outdated
//nolint:gocritic // must be a function to avoid template panics (as in Loki). | ||
"now": func() model.Time { | ||
return model.Now() | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some consideration, I think we should better return a float representing the number of seconds since epoch to make this useful for toTime
and humanizeTime
and generally the time and duration arithmetic, which is based on seconds, not milliseconds.
//nolint:gocritic // must be a function to avoid template panics (as in Loki). | |
"now": func() model.Time { | |
return model.Now() | |
}, | |
"now": func() float64 { | |
now := time.Now() | |
return float64(now.Unix()) + float64(now.Nanosecond())/float64(time.Second) | |
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In different news, it would be good to add tests for now
. (For which we need to call a function that we can replace for tests, rather than time.Now()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drive-by comment only slightly relevant to the rest of the PR:
Go added testing/synctest
which mocks time, as an experiment in Go 1.24 and for real in upcoming Go 1.25.
So we could consider using that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bboreham thanks for pointer, but Prometheus is still using Go 1.23, we can upgrade?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prometheus is committed to work with a few Go versions prior to the current one.
You could write a test with a build tag for the higher version, but in this simple case, I have an even better idea.
NewTemplateExpander
already accepts an argument timestamp model.Time
, which is essentially the evaluation time. So I would simply return that from now
. This has two advantages: Easy for test. And most importantly: Consistent between different invocations of now
in the same template expansion.
Signed-off-by: Dmitry Ponomaryov <iamhalje@gmail.com>
Signed-off-by: Dmitry Ponomaryov <iamhalje@gmail.com>
with the latest changes, the working solution for us is to use
|
Or maybe with a more pipeline-y approach:
|
Which is probably a good template to add to the tests. |
Currently, Prometheus template functions include
parseDuration
, which parses a duration string (e.g."1h2m10ms"
) and returns the total duration in seconds as a float. However, it is often more useful to work directly with Gotime.Duration
objects ortime.Time
operations in templates for alerting and logging purposes.For example, in Loki alerting, when adding a
StartsAt
(need add to Loki) function for alert start time, usingparseDurationTime
allows returning the time in the needed format, such as:Adding parseDurationTime enhances Prometheus templating by providing rich duration/time operations in templates, improving alerting capabilities with use cases in Loki.
Related Issues and Pull Requests
grafana/loki#4980
prometheus/alertmanager#3816