Add config to escalate notifications by thatportugueseguy · Pull Request #195 · ahrefs/monorobot · GitHub

Add config to escalate notifications #195


Merged
merged 3 commits into master from escalate-failing-steps on May 23, 2025

Conversation

thatportugueseguy (Collaborator)

Description of the task

This change lets users configure a time threshold after which monorobot keeps insisting on notifying about broken steps.

If configured, when a failed build notification is received, monorobot checks whether any of the steps broken in that build (both previously failed and newly failed steps) has been marked as broken for longer than the allowed threshold. If so, it notifies again even when no new step has broken, posting a notification saying that the build is still failing and @mentioning the users who broke the steps that qualify for escalation.
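
The diff below relies on a Util.Webhook.is_past_span helper whose definition isn't shown in this thread. A minimal sketch of what such a helper might look like, assuming the step timestamp is a Ptime.t and Ptime_clock provides the current time (the actual monorobot code may differ):

(* Hypothetical sketch, not the actual Util.Webhook code: answers
   "has more than [span] elapsed since [t]?" using Ptime. *)
let is_past_span (t : Ptime.t) (span : Ptime.Span.t) : bool =
  let elapsed = Ptime.diff (Ptime_clock.now ()) t in
  Ptime.Span.compare elapsed span > 0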

in
-        Lwt.return [ Slack.generate_failed_build_notification ?slack_user_id ~failed_steps n channel ])
+        Lwt.return [ Slack.generate_failed_build_notification ~slack_ids ~cfg ~failed_steps n channel ])


Suggested change
-        Lwt.return [ Slack.generate_failed_build_notification ~slack_ids ~cfg ~failed_steps n channel ])
+        let slack_ids = List.filter_map Fun.id slack_ids in
+        Lwt.return [ Slack.generate_failed_build_notification ~slack_ids ~cfg ~failed_steps n channel ])

and then remove the filter in slack.ml

I think it makes more sense to filter before and have a simpler type for the generate_failed_build_notification function.
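
For reference, List.filter_map Fun.id collapses an option list by dropping the Nones and unwrapping the Somes, which is why the callee no longer needs to filter. A tiny self-contained example (the IDs are made up for illustration):

let () =
  let slack_ids = [ Some "U123"; None; Some "U456" ] in
  (* drops None, unwraps Some: result is [ "U123"; "U456" ] *)
  assert (List.filter_map Fun.id slack_ids = [ "U123"; "U456" ])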

lib/slack.ml Outdated
Comment on lines 464 to 472
Util.Webhook.get_escalation_threshold cfg n
|> Option.map_default
     (fun escalation_threshold ->
       FailedStepSet.exists
         (fun step ->
           (* escalate steps that have been broken for longer than the threshold *)
           Util.Webhook.is_past_span step.created_at escalation_threshold)
         failed_steps)
     false


Suggested change
-      Util.Webhook.get_escalation_threshold cfg n
-      |> Option.map_default
-           (fun escalation_threshold ->
-             FailedStepSet.exists
-               (fun step ->
-                 (* escalate steps that have been broken for longer than the threshold *)
-                 Util.Webhook.is_past_span step.created_at escalation_threshold)
-               failed_steps)
-           false
+      match Util.Webhook.get_escalation_threshold cfg n with
+      | None -> false
+      | Some escalation_threshold ->
+        FailedStepSet.exists
+          (fun step ->
+            (* escalate steps that have been broken for longer than the threshold *)
+            Util.Webhook.is_past_span step.created_at escalation_threshold)
+          failed_steps

It's a bit simpler to write it plainly.

thatportugueseguy (Collaborator, Author)


I'm always on the fence between one and the other. I thought about it and decided on the Option version, but I think it's good to go with the match version. Thanks

Comment on lines +428 to +429
let one_hour = Ptime.Span.of_int_s (60 * 60) in
is_past_span escalated_at one_hour)


Should this constant be part of the config instead? Something like reescalate_notification_delay? It seems quite specific to one use case (e.g. some teams might work on more of a daily schedule).

thatportugueseguy (Collaborator, Author)


Yeah, maybe it should, but I want to avoid flooding the config with options.

I think this is a sensible default: you get a first escalation warning after a period you consider a bit too long for the build to have been broken and unattended, and then every extra hour you notify again when new builds are still broken because the original author hasn't fixed the issues.

In any case, it's better to test first and tweak afterwards. You know the old "how you use your product / how users use the product" meme.. 🤷‍♂️
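
A rough sketch of the behavior described above, purely illustrative and not the actual monorobot implementation: a first escalation once a step has been broken longer than the configured threshold, then re-escalation at most once an hour after that. Names such as should_escalate, created_at, and escalated_at are made up for the example.

(* Illustrative only. [created_at] is when the step first broke,
   [escalated_at] is the time of the last escalation notification, if any. *)
let should_escalate ~now ~created_at ~escalated_at ~threshold =
  let past span since = Ptime.Span.compare (Ptime.diff now since) span > 0 in
  match escalated_at with
  | None -> past threshold created_at
  | Some escalated_at -> past (Ptime.Span.of_int_s (60 * 60)) escalated_at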

thatportugueseguy merged commit e229fb4 into master on May 23, 2025
2 of 3 checks passed
thatportugueseguy deleted the escalate-failing-steps branch on May 23, 2025 at 10:54