Add template functions to support time handling in Loki alerts. by iamhalje · Pull Request #16619 · prometheus/prometheus · GitHub

Add template functions to support time handling in Loki alerts. #16619

Open
wants to merge 14 commits into
base: main

@iamhalje iamhalje commented May 20, 2025

Currently, Prometheus template functions include parseDuration, which parses a duration string (e.g. "1h2m10ms") and returns the total duration in seconds as a float. However, it is often more useful to work directly with Go time.Duration objects or time.Time operations in templates for alerting and logging purposes.

For example, in Loki alerting, if a StartsAt function for the alert start time were added (it would still need to be added to Loki), parseDurationTime would allow returning the time in the needed format, such as:

{{ (StartsAt.Add (parseDurationTime "30m")).UnixMilli }}

Adding parseDurationTime enhances Prometheus templating by providing rich duration/time operations in templates, improving alerting capabilities with use cases in Loki.

Related Issues and Pull Requests

grafana/loki#4980
prometheus/alertmanager#3816

Signed-off-by: Dmitry Ponomaryov <me@halje.ru>
@beorn7
Member
beorn7 commented Jun 4, 2025

I cannot really judge the usefulness of this feature in the context of Prometheus templating. (For example, I don't understand how Prometheus templating is used in Loki alerting.) I don't know who is currently most qualified to act as a guardian of the templating. Maybe @grobinson-grafana and @gotjosh as the Alertmanager experts have an opinion here. Maybe @slim-bean can help explain the connection to Loki.

Member
@beorn7 beorn7 left a comment

Just some code level comments. This should not imply that I think we should add this feature. I leave this to the experts, see other comment.

iamhalje added 3 commits June 4, 2025 22:00
Signed-off-by: Dmitry Ponomaryov <iamhalje@gmail.com>
the previous name 'parseDurationTime' was unclear. The new name 'parseGoDuration' makes it explicit that this function uses Go's standard `time.ParseDuration`, which does not support Prometheus-style durations like `3d` or `1w`.

Signed-off-by: Dmitry Ponomaryov <iamhalje@gmail.com>
…prometheus into add-template-parsedurationtime
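
The limitation named in the commit message above can be checked with the standard library alone. The sketch below (the helper name `goParses` is made up for illustration) reports which duration strings Go's `time.ParseDuration` accepts; day and week units are rejected, while negative values are accepted.

```go
package main

import (
	"fmt"
	"time"
)

// goParses reports whether Go's time.ParseDuration accepts the string.
func goParses(s string) bool {
	_, err := time.ParseDuration(s)
	return err == nil
}

func main() {
	// "3d" and "1w" are Prometheus-style units that Go does not know.
	for _, s := range []string{"1h2m10ms", "-30m", "3d", "1w"} {
		fmt.Printf("%q accepted=%v\n", s, goParses(s))
	}
}
```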
@iamhalje
Author
iamhalje commented Jun 4, 2025

@beorn7 thanks for responding to this request

overall, i just want to say that this is an attempt to approach the long-standing issue of adding time-based information to Loki alerts, even if only from one possible angle. the problem has been around for over five years, and this is an initial step toward making such functionality possible

for reference, this is how things currently work in Loki:
https://github.com/grafana/loki/blob/main/pkg/ruler/compat.go#L317

Loki relies entirely on prometheus/templating within the ruler (the alerting component) to template alerts and pull in dynamic data, but this mechanism lacks access to time-related values

it's absolutely the right move to involve the developers at this point - thank you for that. i just want to stress again that this PR alone won't be sufficient to solve the entire problem, but perhaps it will prompt the Grafana Loki team to consider adding something similar or native on their side.

Thanks again!

@iamhalje iamhalje changed the title Add parseDurationTime in templating Add parseGoDuration in templating Jun 4, 2025
@verdel
verdel commented Jun 6, 2025

As @iamhalje mentioned above, Loki uses prometheus/templating for AlertRule templating, which lacks functions for retrieving the current timestamp and time duration.

As a result, if prometheus/templating were to support getting time.Time and time.Duration, it would be possible, for example, to add a link to a Grafana dashboard with a specific timestamp in the AlertRule annotations:

"annotations": {
  "title": "Alert Summary text",
  "description": "Detailed alert description",
  "log_url": "https://grafana.example.com ...from={{ now.Add (parseGoDuration \"-2m\") }}, to={{ now }}"
}

I propose adding an additional function, now, to help resolve time-related issues in AlertRule templating for Loki and Prometheus.

Signed-off-by: Dmitry Ponomaryov <iamhalje@gmail.com>
@iamhalje
Author
iamhalje commented Jun 6, 2025

@verdel ideally, Loki itself would track the exact alert firing time and provide a variable in the templating context representing this specific alert start time. This would allow generating dashboard URLs with precise time ranges relative to when the alert actually began

however, the current approach (let’s call it nowTime) is still a practical solution:

groups:
- name: alert
  rules:
    for: 15m
    annotations:
      dashboard: "https://grafana.example.com/dashboard?from={{ (nowTime.Add (parseGoDuration \"-30m\")).UnixMilli }}&to={{ (nowTime.Add (parseGoDuration \"30m\")).UnixMilli }}"

since we can use time.Duration offsets relative to nowTime, we can select any desired time window and pass it to the alert template as needed. that said, simply returning nowTime as the current time is still useful and requires no changes on the Loki side

this way, alert templates can use nowTime to get at least the current timestamp, and we can add alert firing times later as needed

@beorn7 awaiting feedback from the @prometheus/@grafana teams on adding these functions to templating. thanks

@verdel
verdel commented Jun 6, 2025

@verdel ideally, Loki itself would track the exact alert firing time and provide a variable in the templating context representing this specific alert start time. This would allow generating dashboard URLs with precise time ranges relative to when the alert actually began

When an instance of Expander is created, a timestamp is passed to it. If I’m not mistaken, in that case, it’s possible to change the nowTime function from time.Now() to timestamp.Time(), and we will get the exact timestamp that Loki provides during the alert rule evaluation process.

But in that case, the function name nowTime doesn’t quite reflect its functionality. It might be necessary to rename it.

dashboard: "https://grafana.example.com/dashboard?from={{ nowTime.Add (parseGoDuration "-30m").UnixMilli }}&to={{ nowTime.Add (parseGoDuration "30m").UnixMilli }}"

A small note: UnixMilli should be called not on the output of parseGoDuration, but on the result of nowTime.Add, like this:

{{ (nowTime.Add (parseGoDuration "-30m")).UnixMilli }}
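
The effect of the parentheses can be reproduced with Go's text/template package. In the sketch below, nowTime and parseGoDuration are the names used in this discussion, but their implementations are stand-ins assumed for the example (a fixed instant and `time.ParseDuration`); they are not the PR's code.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
	"time"
)

// render parses and executes a template using two stand-in functions.
// The names nowTime and parseGoDuration come from the discussion; the
// implementations here are assumptions made for this example.
func render(text string) (string, error) {
	funcs := template.FuncMap{
		// A fixed instant keeps the output deterministic.
		"nowTime":         func() time.Time { return time.UnixMilli(1700000000000) },
		"parseGoDuration": time.ParseDuration,
	}
	tmpl, err := template.New("example").Funcs(funcs).Parse(text)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, nil); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	// Parenthesized: UnixMilli applies to the result of nowTime.Add.
	good, err := render(`{{ (nowTime.Add (parseGoDuration "-30m")).UnixMilli }}`)
	fmt.Println(good, err)

	// Unparenthesized: UnixMilli applies to the time.Duration returned by
	// parseGoDuration, which has no such method, so rendering fails.
	_, err = render(`{{ nowTime.Add (parseGoDuration "-30m").UnixMilli }}`)
	fmt.Println(err != nil)
}
```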

It’s also important to note that Loki uses vendoring for third-party modules, and currently, the version of the Prometheus module it uses is slightly behind the latest version in that repository.

@verdel
verdel commented Jun 7, 2025

@beorn7, the reason for creating this PR might become a bit clearer if you read the discussion in my PR in the Alertmanager repository.

Since Prometheus/Loki doesn’t support performing time operations when preparing additional information in an AlertRule, I pass a template into the annotation, which is then re-templated on the Alertmanager side.

It looks like this:

AlertRule

annotations:
    log_url: `https://grafana.test.com/explore?schemaVersion=1&"range":{"from":"%from%","to":"%to%"}}}&orgId=1`,

Alertmanager Template

{{- if (index .Alerts 0).Annotations.log_url }}:books: [Logs]({{ (index .Alerts 0).Annotations.log_url | reReplaceAll "%from%" (printf "%d000" ((index .Alerts 0).StartsAt.Add -120000000000).Unix) | reReplaceAll "%to%" (printf "%d000" (index .Alerts 0).StartsAt.Unix)}}) · {{ end -}}

But since Alertmanager still doesn’t have a function to work with durations, I still have to use a static value in the argument of the .Add function.

Additionally, using a template inside .Annotation breaks the display of information in the Alertmanager UI itself.

We discussed with @grobinson-grafana that this approach—passing a template into an AlertRule—is not the right solution, and that the value should be generated on the Prometheus/Loki side, not in Alertmanager.

If we add these two time-handling functions to the AlertRule templating in Prometheus, it would solve the problem.

@beorn7
Member
beorn7 commented Jun 10, 2025

In view of all the @-mentioning here, I should re-emphasize that I'm not qualified to vet the validity of this feature.

@verdel
verdel commented Jun 10, 2025

@beorn7, is there any other way to notify the maintainers responsible for the area related to our issue that we need their help reviewing this PR?

@beorn7
Member
beorn7 commented Jun 11, 2025

@beorn7, is there any other way to notify the maintainers responsible for the area related to our issue that we need their help reviewing this PR?

In principle, the PR alone should eventually trigger a review. @-mentioning people that you think are qualified might help. You could also try to catch people on the CNCF Slack. But these options are increasingly invasive, and reviewers are usually overloaded already.

@beorn7
Member
beorn7 commented Jun 11, 2025

Looking at templating in general (in which I am not an expert, as I said): There is toTime, which converts a string or a number representing a Unix timestamp in seconds into the Go time.Time type. I would say that a corresponding toDuration would make sense and would probably be less controversial.

I'm not sure about the nowTime in general, but I would, in the same spirit, call it just now and let it return a Unix timestamp in seconds, which you can then convert into a Go time.Time with toTime if needed.

Final point: All of this must also be documented in https://github.com/prometheus/prometheus/blob/main/docs/configuration/template_reference.md.

@iamhalje
Author
iamhalje commented Jun 11, 2025

@beorn7 @verdel today or tomorrow i will check all your suggestions in Loki itself, report the results, and try to reach the Loki developers. thanks

iamhalje added 3 commits June 12, 2025 14:01
…seGoDuration

Signed-off-by: Dmitry Ponomaryov <iamhalje@gmail.com>
Signed-off-by: Dmitry Ponomaryov <iamhalje@gmail.com>
Signed-off-by: Dmitry Ponomaryov <iamhalje@gmail.com>
@iamhalje iamhalje changed the title Add parseGoDuration in templating Add parseGoDuration to support time handling in Loki alerts. Jun 12, 2025
@iamhalje
Author

based on discussions above, alerts should support negative duration values directly without relying on complicated workarounds. to achieve this cleanly, we will have two functions: now and parseGoDuration.

therefore, a sub-task was created to add negative duration support to model.ParseDuration in prometheus/common#793.

i hope i have correctly understood everything mentioned earlier. next, i'll try to involve Loki developers in this ticket for further collaboration.

@iamhalje
Author
iamhalje commented Jun 20, 2025

@cyriltovena @owen-d @chaudum @slim-bean @trevorwhitney @sandeepsukhani @jeschkies @paul1r @poyzannur FYI: this PR adds time templating for Loki alerts. a review from anyone would be appreciated

@beorn7
Member
beorn7 commented Jun 22, 2025

prometheus/common has a new release tag v0.65.0, which you can update this PR to.

Member
@beorn7 beorn7 left a comment

See my comments. I think, if we do it this way, it will be very uncontroversial because it sticks to the pattern we already have with toTime.

It would still be good to get feedback from Loki folks that this is even needed.

iamhalje added 2 commits June 28, 2025 23:37
Signed-off-by: Dmitry Ponomaryov <iamhalje@gmail.com>
Signed-off-by: Dmitry Ponomaryov <iamhalje@gmail.com>
@iamhalje
Author
iamhalje commented Jun 28, 2025

as a result, we added the functions now (returning the Prometheus type model.Time) and toDuration, and in parseDuration we replaced ParseDuration with the new function ParseDurationAllowNegative, introduced in prometheus/common@v0.65.0, which supports negative durations

this allows us to use expressions like the following in Loki:

groups:
- name: alert
  interval: 1m
  rules:
  - alert: example  # placeholder name; the original snippet omitted the alert name
    expr: |
      16619 == 16619
    for: 0m
    annotations:
      summary: "https://grafana.example.com/dashboard?from={{ (now.Add (toDuration (parseDuration \"-30m\"))).Time.UnixMilli }}&to={{ (now.Add (toDuration (parseDuration \"30m\"))).Time.UnixMilli }}"

@iamhalje iamhalje changed the title Add parseGoDuration to support time handling in Loki alerts. Add template functions to support time handling in Loki alerts. Jun 28, 2025
Member
@beorn7 beorn7 left a comment

Thanks. Looks good generally, but still a few details to put straight.

Comment on lines 274 to 277
//nolint:gocritic // must be a function to avoid template panics (as in Loki).
"now": func() model.Time {
	return model.Now()
},
Member

After some consideration, I think we should better return a float representing the number of seconds since epoch to make this useful for toTime and humanizeTime and generally the time and duration arithmetic, which is based on seconds, not milliseconds.

Suggested change
- //nolint:gocritic // must be a function to avoid template panics (as in Loki).
- "now": func() model.Time {
- 	return model.Now()
- },
+ "now": func() float64 {
+ 	now := time.Now()
+ 	return float64(now.Unix()) + float64(now.Nanosecond())/float64(time.Second)
+ },

Member

In different news, it would be good to add tests for now. (For which we need to call a function that we can replace for tests, rather than time.Now().)

Member

Drive-by comment only slightly relevant to the rest of the PR:

Go added testing/synctest which mocks time, as an experiment in Go 1.24 and for real in upcoming Go 1.25.
So we could consider using that.

Author

@bboreham thanks for the pointer, but Prometheus is still using Go 1.23. can we upgrade?

Member

Prometheus is committed to working with a few Go versions prior to the current one.

You could write a test with a build tag for the higher version, but in this simple case, I have an even better idea.

NewTemplateExpander already accepts an argument timestamp model.Time, which is essentially the evaluation time. So I would simply return that from now. This has two advantages: it is easy to test and, most importantly, it is consistent between different invocations of now in the same template expansion.

iamhalje added 2 commits July 1, 2025 21:33
Signed-off-by: Dmitry Ponomaryov <iamhalje@gmail.com>
Signed-off-by: Dmitry Ponomaryov <iamhalje@gmail.com>
@iamhalje
Author
iamhalje commented Jul 1, 2025

with the latest changes, the working solution for us is to use

{{ ((toTime (now)).Add (toDuration (parseDuration "-30m"))).UnixMilli }}

@beorn7
Member
beorn7 commented Jul 1, 2025

with the latest changes, the working solution for us is to use

{{ ((toTime (now)).Add (toDuration (parseDuration "-30m"))).UnixMilli }}

Or maybe with a more pipeline-y approach:

{{ ( "-30m" | parseDuration | toDuration | ( now | toTime ).Add ).UnixMilli }}

@beorn7
Member
beorn7 commented Jul 1, 2025

{{ ( "-30m" | parseDuration | toDuration | ( now | toTime ).Add ).UnixMilli }}

Which is probably a good template to add to the tests.
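
A rough sketch of what such a test could check, using simplified stand-ins for the four template functions. These stand-ins are assumptions for illustration: parseDuration here wraps Go's `time.ParseDuration` rather than the Prometheus parser, toTime accepts only a float, and now returns a fixed value passed in by the caller.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
	"time"
)

// standIns returns simplified versions of the template functions discussed
// above. They are assumptions for this example, not the Prometheus code.
func standIns(evalSeconds float64) template.FuncMap {
	return template.FuncMap{
		"now": func() float64 { return evalSeconds },
		"parseDuration": func(s string) (float64, error) {
			d, err := time.ParseDuration(s)
			if err != nil {
				return 0, err
			}
			return d.Seconds(), nil
		},
		"toDuration": func(seconds float64) time.Duration {
			return time.Duration(seconds * float64(time.Second))
		},
		"toTime": func(seconds float64) time.Time {
			sec := int64(seconds)
			nsec := int64((seconds - float64(sec)) * float64(time.Second))
			return time.Unix(sec, nsec)
		},
	}
}

// render executes a template against the stand-in functions.
func render(text string, evalSeconds float64) (string, error) {
	tmpl, err := template.New("example").Funcs(standIns(evalSeconds)).Parse(text)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, nil); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	nested := `{{ ((toTime (now)).Add (toDuration (parseDuration "-30m"))).UnixMilli }}`
	piped := `{{ ( "-30m" | parseDuration | toDuration | ( now | toTime ).Add ).UnixMilli }}`
	for _, text := range []string{nested, piped} {
		out, err := render(text, 1700000000)
		fmt.Println(out, err)
	}
}
```

With these stand-ins, the nested form and the pipeline form render the same millisecond timestamp, 30 minutes before the fixed evaluation time.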
