[prometheus] Fix grafana sqlite locking errors (#13063) by lazovskiy · Pull Request #13068 · deckhouse/deckhouse · GitHub

[prometheus] Fix grafana sqlite locking errors (#13063) #13068

Merged 1 commit into release-1.68 from backport-13063-to-1.68 on Apr 17, 2025

@lazovskiy (Contributor) commented Apr 17, 2025

Description

Grafana uses a SQLite database as its storage backend. There is only one database file, and Grafana appears to run transactions from different parts of its code concurrently, which leads to high contention for database access. If some part of the code cannot take an exclusive lock, it fails and retries. When there are many dashboards, alerts, etc., that code may exhaust all of its retries and fail, leading to errors.

SQLite offers a mechanism that handles concurrent access better than exclusive locking: write-ahead logging (WAL).
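As an illustration of the mechanism only (not the change made in this PR), here is a minimal Python sketch using the standard `sqlite3` module. It switches a database file to WAL with `PRAGMA journal_mode=WAL`, then reads from a second connection while another connection holds an open write transaction. The file name `grafana.db`, the `dashboard` table, and both function names are hypothetical and chosen for the example.

```python
import os
import sqlite3
import tempfile


def enable_wal(db_path: str) -> str:
    """Switch the SQLite file at db_path to WAL journaling and return the resulting mode."""
    conn = sqlite3.connect(db_path)
    try:
        # The pragma returns the journal mode that is now in effect.
        return conn.execute("PRAGMA journal_mode=WAL;").fetchone()[0]
    finally:
        conn.close()


def read_during_open_write(db_path: str) -> int:
    """Read from one connection while another holds an open write transaction.

    Returns the number of rows the reader sees (committed rows only).
    """
    # isolation_level=None puts the connection in autocommit mode, so the
    # transaction below is controlled explicitly with BEGIN/COMMIT.
    writer = sqlite3.connect(db_path, isolation_level=None)
    reader = sqlite3.connect(db_path, timeout=0.1)
    try:
        # Hypothetical table, used only for this example.
        writer.execute(
            "CREATE TABLE IF NOT EXISTS dashboard (id INTEGER PRIMARY KEY, title TEXT)"
        )
        writer.execute("INSERT INTO dashboard (title) VALUES ('committed')")
        writer.execute("BEGIN IMMEDIATE")
        writer.execute("INSERT INTO dashboard (title) VALUES ('uncommitted')")
        # In WAL mode the reader works from the last committed snapshot and
        # does not wait for the writer to finish.
        count = reader.execute("SELECT COUNT(*) FROM dashboard").fetchone()[0]
        writer.execute("COMMIT")
        return count
    finally:
        reader.close()
        writer.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "grafana.db")  # hypothetical file name
    print(enable_wal(path))
    print(read_during_open_write(path))
```

The journal mode is stored in the database file, so once WAL is enabled it stays in effect for later connections to the same file. WAL still allows only one writer at a time; what it removes is the blocking between readers and the writer.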

Backports #13063

Why do we need it, and what problem does it solve?

Because of the issue described above, dashboards with alerts cannot be provisioned, or their alerts are never executed.

Why do we need it in the patch release (if we do)?

We do: several setups use alerts on dashboards and have been confirmed to have this issue.

Checklist

  • The code is covered by unit tests.
  • e2e tests passed.
  • Documentation updated according to the changes.
  • Changes were tested in the Kubernetes cluster manually.

Changelog entries

section: prometheus
type: fix
summary: Enables WAL for Grafana SQLite database to prevent locking errors, thus fixing in-dashboard alerting.
impact: Grafana deployment will be rollout restarted.
impact_level: default

Signed-off-by: Vadim Lazovsky <vadim.lazovsky@flant.com>
Co-authored-by: Vadim Lazovsky <vadim.lazovsky@flant.com>
@github-actions github-actions bot added the area/monitoring Pull requests that update monitoring modules label Apr 17, 2025
@Taior Taior added this to the v1.68.13 milestone Apr 17, 2025
@vladimirGuryanov vladimirGuryanov merged commit c18e1f8 into release-1.68 Apr 17, 2025
37 of 39 checks passed
@vladimirGuryanov vladimirGuryanov deleted the backport-13063-to-1.68 branch April 17, 2025 14:43