[prometheus] Fix grafana sqlite locking errors (#13063) #13068

lazovskiy · 2025-04-17T14:07:42Z

Description

Grafana uses a SQLite database as its storage backend. There is only one database file, and Grafana appears to run transactions from different parts of its code concurrently, which causes heavy contention for database access. When a code path cannot acquire an exclusive lock, it fails and retries. With many dashboards, alerts, and similar objects, that code path can exhaust all of its retries and fail, producing errors.
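The failure mode described above can be reproduced in miniature with Python's built-in `sqlite3` module. This is a hypothetical sketch, not Grafana's code: Grafana is written in Go and has its own retry logic. The `write_with_retries` helper, the `dashboard` table, the file name, and the retry count are all assumptions for illustration. One connection holds an exclusive lock, standing in for a concurrent code path, and a second connection gives up after a bounded number of attempts.

```python
import os
import sqlite3
import tempfile
import time


def write_with_retries(conn: sqlite3.Connection, sql: str, attempts: int = 3, backoff: float = 0.01) -> int:
    """Hypothetical retry wrapper: run `sql` in a write transaction, retrying on lock errors.

    Returns the attempt number that succeeded; raises RuntimeError once retries are exhausted.
    """
    for attempt in range(1, attempts + 1):
        try:
            conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
            conn.execute(sql)
            conn.execute("COMMIT")
            return attempt
        except sqlite3.OperationalError as exc:
            if "locked" not in str(exc):
                raise  # not a contention error, so do not retry
            if conn.in_transaction:
                conn.execute("ROLLBACK")
            time.sleep(backoff)
    raise RuntimeError(f"gave up after {attempts} attempts: database is locked")


def retries_exhausted(attempts: int = 3) -> bool:
    """Simulate another code path holding the lock for longer than all retries take."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "grafana.db")  # hypothetical file name
        holder = sqlite3.connect(path, isolation_level=None)
        writer = sqlite3.connect(path, isolation_level=None, timeout=0)  # fail fast when busy
        try:
            holder.execute("CREATE TABLE dashboard (id INTEGER PRIMARY KEY, title TEXT)")
            # Default rollback-journal mode: an exclusive lock blocks every other connection.
            holder.execute("BEGIN EXCLUSIVE")
            try:
                write_with_retries(writer, "INSERT INTO dashboard (title) VALUES ('demo')", attempts)
            except RuntimeError:
                return True
            return False
        finally:
            writer.close()
            holder.close()  # closing with an open transaction rolls it back


print(retries_exhausted())  # True
```

With more dashboards and alerts, locks are held more often and for longer, so a fixed retry budget like this one is exhausted more frequently.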

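SQLite offers a better concurrency mechanism than exclusive locking: write-ahead logging (WAL). In WAL mode, readers do not block the writer and the writer does not block readers, although only one writer can be active at a time.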
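A minimal sketch of why WAL helps, using Python's `sqlite3` module with a hypothetical `alert` table: one connection keeps a read transaction open while another tries to commit a write. In the default rollback-journal mode (`delete`), the commit needs an exclusive lock and fails with "database is locked"; in WAL mode it succeeds because the reader keeps working from its own snapshot. This page does not show how the PR switches Grafana to WAL (for example, through Grafana's `[database]` `wal` setting); treat that mechanism as an assumption.

```python
import os
import sqlite3
import tempfile


def writer_commits_during_read(journal_mode: str) -> bool:
    """Return True if a write can commit while another connection has an open read transaction."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "grafana.db")  # hypothetical file name

        # Set the journal mode (WAL is persisted in the file) and seed one row.
        setup = sqlite3.connect(path)
        mode = setup.execute(f"PRAGMA journal_mode={journal_mode}").fetchone()[0]
        if mode != journal_mode:
            raise RuntimeError(f"could not switch to {journal_mode}, got {mode}")
        setup.execute("CREATE TABLE alert (id INTEGER PRIMARY KEY, state TEXT)")
        setup.execute("INSERT INTO alert (state) VALUES ('ok')")
        setup.commit()
        setup.close()

        # isolation_level=None: transactions are controlled with explicit BEGIN/COMMIT.
        reader = sqlite3.connect(path, isolation_level=None)
        writer = sqlite3.connect(path, isolation_level=None, timeout=0.1)
        try:
            # Open a read transaction and keep it open, like a long-running query.
            reader.execute("BEGIN")
            reader.execute("SELECT state FROM alert").fetchall()

            writer.execute("BEGIN IMMEDIATE")
            writer.execute("UPDATE alert SET state = 'alerting' WHERE id = 1")
            # Rollback-journal mode must upgrade to an EXCLUSIVE lock here.
            writer.execute("COMMIT")
            return True
        except sqlite3.OperationalError:
            # "database is locked": the busy timeout expired while the reader held its lock.
            return False
        finally:
            writer.close()
            reader.close()


print("delete:", writer_commits_during_read("delete"))  # delete: False
print("wal:", writer_commits_during_read("wal"))  # wal: True
```

The same contrast applies to Grafana's single database file: with WAL, long reads (for example, loading dashboards) no longer prevent alerting code from committing its writes.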

Backports #13063

Why do we need it, and what problem does it solve?

Because of the issue described above, dashboards with alerts either fail to be provisioned or their alerts are never executed.

Why do we need it in the patch release (if we do)?

We do. Several setups use alerts on dashboards and have been confirmed to have this issue.

Checklist

The code is covered by unit tests.
e2e tests passed.
Documentation updated according to the changes.
Changes were tested in the Kubernetes cluster manually.

Changelog entries

section: prometheus
type: fix
summary: Enables WAL for the Grafana SQLite database to prevent locking errors, fixing in-dashboard alerting.
impact: The Grafana deployment will be rollout-restarted.
impact_level: default

Signed-off-by: Vadim Lazovsky <vadim.lazovsky@flant.com>
Co-authored-by: Vadim Lazovsky <vadim.lazovsky@flant.com>

[prometheus] Fix grafana sqlite locking errors (#13063)

5622c46


lazovskiy requested a review from vladimirGuryanov as a code owner April 17, 2025 14:07

github-actions bot added the area/monitoring Pull requests that update monitoring modules label Apr 17, 2025

github-actions bot assigned lazovskiy Apr 17, 2025

Taior added this to the v1.68.13 milestone Apr 17, 2025

vladimirGuryanov approved these changes Apr 17, 2025


vladimirGuryanov merged commit c18e1f8 into release-1.68 Apr 17, 2025
37 of 39 checks passed

vladimirGuryanov deleted the backport-13063-to-1.68 branch April 17, 2025 14:43

This was referenced Apr 17, 2025

Changelog v1.68.13 #13058

Merged

Backport: Changelog v1.68.13 #13108

Merged
