Add config to escalate notifications by thatportugueseguy · Pull Request #195 · ahrefs/monorobot · GitHub

Add config to escalate notifications #195


Merged
merged 3 commits into master from escalate-failing-steps on May 23, 2025

Conversation

thatportugueseguy (Collaborator)

Description of the task

This change lets users configure a time threshold after which monorobot keeps insisting on notifying about broken steps.

If configured, when a failed build notification is received, monorobot checks whether any of the steps broken in that build (both previously failed and newly failed steps) has been marked as broken for longer than the allowed threshold. If so, it notifies again even when no new step has broken, posting a notification saying that the build is still failing and @mentioning the users who broke the steps that qualify for escalation.
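
The diff below relies on a Util.Webhook.is_past_span helper whose definition isn't shown in this thread. A minimal sketch of what such a helper might look like, assuming the step timestamp is a Ptime.t and Ptime_clock provides the current time (the actual monorobot code may differ):

(* Hypothetical sketch, not the actual Util.Webhook code: answers
   "has more than [span] elapsed since [t]?" using Ptime. *)
let is_past_span (t : Ptime.t) (span : Ptime.Span.t) : bool =
  let elapsed = Ptime.diff (Ptime_clock.now ()) t in
  Ptime.Span.compare elapsed span > 0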

in
-        Lwt.return [ Slack.generate_failed_build_notification ?slack_user_id ~failed_steps n channel ])
+        Lwt.return [ Slack.generate_failed_build_notification ~slack_ids ~cfg ~failed_steps n channel ])


Suggested change
-        Lwt.return [ Slack.generate_failed_build_notification ~slack_ids ~cfg ~failed_steps n channel ])
+        let slack_ids = List.filter_map Fun.id slack_ids in
+        Lwt.return [ Slack.generate_failed_build_notification ~slack_ids ~cfg ~failed_steps n channel ])

and then remove the filter in slack.ml

I think it makes more sense to filter before and have a simpler type for the generate_failed_build_notification function.
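
For reference, List.filter_map Fun.id collapses an option list by dropping the Nones and unwrapping the Somes, which is why the callee no longer needs to filter. A tiny self-contained example (the IDs are made up for illustration):

let () =
  let slack_ids = [ Some "U123"; None; Some "U456" ] in
  (* drops None, unwraps Some: result is [ "U123"; "U456" ] *)
  assert (List.filter_map Fun.id slack_ids = [ "U123"; "U456" ])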

lib/slack.ml Outdated
Comment on lines 464 to 472
Util.Webhook.get_escalation_threshold cfg n
|> Option.map_default
     (fun escalation_threshold ->
       FailedStepSet.exists
         (fun step ->
           (* escalate steps that have been broken for longer than the threshold *)
           Util.Webhook.is_past_span step.created_at escalation_threshold)
         failed_steps)
     false


Suggested change
-      Util.Webhook.get_escalation_threshold cfg n
-      |> Option.map_default
-           (fun escalation_threshold ->
-             FailedStepSet.exists
-               (fun step ->
-                 (* escalate steps that have been broken for longer than the threshold *)
-                 Util.Webhook.is_past_span step.created_at escalation_threshold)
-               failed_steps)
-           false
+      match Util.Webhook.get_escalation_threshold cfg n with
+      | None -> false
+      | Some escalation_threshold ->
+        FailedStepSet.exists
+          (fun step ->
+            (* escalate steps that have been broken for longer than the threshold *)
+            Util.Webhook.is_past_span step.created_at escalation_threshold)
+          failed_steps

It's a bit simpler to write it plainly.

thatportugueseguy (Collaborator, Author)


I'm always on the fence between one and the other. I thought about it and decided on the Option version, but I think it's good to go with the match version. Thanks

Comment on lines +428 to +429
let one_hour = Ptime.Span.of_int_s (60 * 60) in
is_past_span escalated_at one_hour)


Should this constant be part of the config instead? Something like reescalate_notification_delay? It seems quite specific to one use case (e.g. some teams might work on more of a daily schedule).

thatportugueseguy (Collaborator, Author)


Yeah, maybe it should, but I want to avoid flooding the config with options.

I think this is a sensible default: you get a first escalation warning after a period you consider a bit too long for the build to have been broken and unattended, and then every extra hour you notify again when new builds are still broken because the original author hasn't fixed the issues.

In any case, it's better to test first and tweak afterwards. You know the old "how you use your product / how users use the product" meme.. 🤷‍♂️
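
A rough sketch of the behavior described above, purely illustrative and not the actual monorobot implementation: a first escalation once a step has been broken longer than the configured threshold, then re-escalation at most once an hour after that. Names such as should_escalate, created_at, and escalated_at are made up for the example.

(* Illustrative only. [created_at] is when the step first broke,
   [escalated_at] is the time of the last escalation notification, if any. *)
let should_escalate ~now ~created_at ~escalated_at ~threshold =
  let past span since = Ptime.Span.compare (Ptime.diff now since) span > 0 in
  match escalated_at with
  | None -> past threshold created_at
  | Some escalated_at -> past (Ptime.Span.of_int_s (60 * 60)) escalated_at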

thatportugueseguy merged commit e229fb4 into master on May 23, 2025
2 of 3 checks passed
thatportugueseguy deleted the escalate-failing-steps branch on May 23, 2025 at 10:54