[prometheus] Fix grafana sqlite locking errors by lazovskiy · Pull Request #13063 · deckhouse/deckhouse · GitHub

[prometheus] Fix grafana sqlite locking errors #13063

Merged
vladimirGuryanov merged 1 commit into main from fix/grafana-sqlite-locks
Apr 17, 2025

@lazovskiy (Contributor) commented Apr 17, 2025

Description

Grafana stores its data in a SQLite database. There is only one database file, and Grafana appears to run transactions from different parts of its code concurrently, which causes heavy contention for database access. When a code path cannot acquire an exclusive lock, the operation fails and is retried. With many dashboards, alerts, and similar objects, such a code path can exhaust all of its retries and fail, producing errors.
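
A minimal sketch of the failure mode described above, using Python's standard `sqlite3` module. The table name, the `write_with_retries` helper, and its retry count are hypothetical; Grafana's real retry logic is not shown in this PR. One connection holds an exclusive lock for the whole demo, which exaggerates the contention so the outcome is deterministic.

```python
import os
import sqlite3
import tempfile
import time


def write_with_retries(path: str, retries: int = 3, delay: float = 0.01) -> bool:
    """Hypothetical helper: attempt one INSERT, retrying while the DB is locked."""
    for _ in range(retries):
        # timeout=0 turns off SQLite's built-in busy wait, so a lock conflict fails at once
        conn = sqlite3.connect(path, timeout=0)
        try:
            conn.execute("INSERT INTO alerts(name) VALUES ('cpu-high')")
            conn.commit()
            return True
        except sqlite3.OperationalError as exc:
            if "locked" not in str(exc):
                raise  # unrelated error: do not hide it
            time.sleep(delay)  # back off before the next attempt
        finally:
            conn.close()
    return False  # every attempt hit the lock


def demo(hold_exclusive_lock: bool) -> bool:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "grafana.db")  # default rollback journal (DELETE mode)
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE alerts(name TEXT)")
        setup.commit()
        setup.close()

        # isolation_level=None: no implicit BEGIN, so this connection controls its own transaction
        holder = sqlite3.connect(path, isolation_level=None)
        if hold_exclusive_lock:
            # Another code path keeps an exclusive lock for longer than all retries take
            holder.execute("BEGIN EXCLUSIVE")
        try:
            return write_with_retries(path)
        finally:
            if hold_exclusive_lock:
                holder.execute("ROLLBACK")
            holder.close()


print(demo(hold_exclusive_lock=False))  # True
print(demo(hold_exclusive_lock=True))   # False
```

Without the competing lock the write succeeds on the first attempt; with it, all retries fail with SQLite's "database is locked" error and the helper gives up.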

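SQLite offers a concurrency mechanism better suited to this than exclusive locking: write-ahead logging (WAL). In WAL mode, readers do not block the writer and the writer does not block readers, although only one writer can be active at a time.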
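
The following sketch (Python standard `sqlite3` module; table and file names are hypothetical) contrasts the default rollback journal with WAL. A writer runs `BEGIN EXCLUSIVE` and leaves an insert uncommitted, then a second connection tries to read. In `DELETE` journal mode the exclusive lock blocks the reader; in WAL mode `BEGIN EXCLUSIVE` behaves like `BEGIN IMMEDIATE`, and the reader sees the last committed snapshot. The PR does not say which of Grafana's code paths are readers or writers, so this only illustrates the general mechanism.

```python
import os
import sqlite3
import tempfile


def reader_succeeds_during_write(journal_mode: str) -> bool:
    """Return True if a second connection can read while another holds a write transaction."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "grafana.db")
        writer = sqlite3.connect(path, isolation_level=None)  # autocommit; explicit BEGIN below
        mode = writer.execute(f"PRAGMA journal_mode={journal_mode}").fetchone()[0]
        if mode != journal_mode.lower():
            raise RuntimeError(f"could not switch to {journal_mode}, got {mode}")
        writer.execute("CREATE TABLE dashboards(title TEXT)")
        writer.execute("INSERT INTO dashboards VALUES ('node-exporter')")  # committed row

        # Rollback-journal modes: EXCLUSIVE locks out readers. WAL mode: same as IMMEDIATE.
        writer.execute("BEGIN EXCLUSIVE")
        writer.execute("INSERT INTO dashboards VALUES ('kube-state-metrics')")  # uncommitted

        reader = sqlite3.connect(path, timeout=0)  # fail immediately instead of waiting
        try:
            count = reader.execute("SELECT count(*) FROM dashboards").fetchone()[0]
            ok = count == 1  # only the committed row is visible to the reader
        except sqlite3.OperationalError:
            ok = False  # "database is locked"
        finally:
            reader.close()
            writer.execute("COMMIT")
            writer.close()
        return ok


print(reader_succeeds_during_write("DELETE"))  # False
print(reader_succeeds_during_write("WAL"))     # True
```

WAL still allows only one writer at a time, so it helps to the extent that the contention involves readers and writers blocking each other.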

Why do we need it, and what problem does it solve?

It has been found that, because of the issue described above, dashboards with alerts cannot be provisioned, or their alerts are never executed.

Why do we need it in the patch release (if we do)?

We do: several setups use alerts on their dashboards and have been confirmed to be affected by this issue.

Checklist

  • The code is covered by unit tests.
  • e2e tests passed.
  • Documentation updated according to the changes.
  • Changes were tested in the Kubernetes cluster manually.

Changelog entries

section: prometheus
type: fix
summary: enable WAL for the Grafana SQLite database to prevent locking errors, which fixes in-dashboard alerting.
impact: the Grafana Deployment will be rollout-restarted.
impact_level: default
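
After the rollout restart, one way to check that the database file is actually in WAL mode is to ask SQLite directly. The sketch below uses Python's standard `sqlite3` module. The helper name `current_journal_mode` is hypothetical. The PR does not say how WAL is switched on; Grafana's configuration does offer a `wal` option in its `[database]` section for SQLite, which is one plausible route. WAL mode is persistent in the database file, so a fresh connection reports it.

```python
import os
import sqlite3


def current_journal_mode(db_path: str) -> str:
    """Hypothetical helper: report the journal mode of an existing SQLite file."""
    # sqlite3.connect() would silently create a missing file, so check first
    if not os.path.exists(db_path):
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        # "PRAGMA journal_mode" without a value only reads the mode; it does not change it
        return conn.execute("PRAGMA journal_mode").fetchone()[0]
    finally:
        conn.close()


# Example (the path is an assumption; adjust it to where the Grafana data volume is mounted):
# print(current_journal_mode("/var/lib/grafana/grafana.db"))  # "wal" once WAL is enabled
```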

Signed-off-by: Vadim Lazovsky <vadim.lazovsky@flant.com>
@github-actions bot added the area/monitoring label (Pull requests that update monitoring modules) Apr 17, 2025
@lazovskiy added this to the v1.69.2 milestone Apr 17, 2025
@lazovskiy added the status/backport label (Cherry-pick PR to the release branch from the Milestone) Apr 17, 2025
@vladimirGuryanov added the e2e/run/yandex-cloud label (Run e2e tests in Yandex Cloud) Apr 17, 2025
@deckhouse-BOaTswain (Collaborator) commented Apr 17, 2025

🟢 e2e: Yandex.Cloud for deckhouse:fix/grafana-sqlite-locks succeeded in 44m28s.

Workflow details

Yandex.Cloud-WithoutNAT-Containerd-1.30 - Connection string: ssh redos@158.160.57.55

🟢 e2e: Yandex.Cloud, Containerd, Kubernetes 1.30 succeeded in 35m3s.

@github-actions bot removed the e2e/run/yandex-cloud label (Run e2e tests in Yandex Cloud) Apr 17, 2025
@vladimirGuryanov merged commit f4e8c7f into main Apr 17, 2025
76 of 78 checks passed
@vladimirGuryanov deleted the fix/grafana-sqlite-locks branch April 17, 2025 13:53
github-actions bot pushed a commit that referenced this pull request Apr 17, 2025
Signed-off-by: Vadim Lazovsky <vadim.lazovsky@flant.com>
Co-authored-by: Vadim Lazovsky <vadim.lazovsky@flant.com>
@deckhouse-BOaTswain removed the status/backport label (Cherry-pick PR to the release branch from the Milestone) Apr 17, 2025
deckhouse-BOaTswain added a commit that referenced this pull request Apr 17, 2025
Signed-off-by: Vadim Lazovsky <vadim.lazovsky@flant.com>
Co-authored-by: lazovskiy <vadim.lazovskiy@gmail.com>
Co-authored-by: Vadim Lazovsky <vadim.lazovsky@flant.com>
@deckhouse-BOaTswain (Collaborator)

Cherry pick PR 13067 to the branch release-1.69 successful!

@Taior (Member) commented Apr 17, 2025

/backport 1.68

@deckhouse-BOaTswain (Collaborator)

Failure: cherry-pick commit f4e8c7f to the branch release-1.68 failed. See Job for details.

lazovskiy added a commit that referenced this pull request Apr 17, 2025
Signed-off-by: Vadim Lazovsky <vadim.lazovsky@flant.com>
Co-authored-by: Vadim Lazovsky <vadim.lazovsky@flant.com>
vladimirGuryanov pushed a commit that referenced this pull request Apr 17, 2025
Signed-off-by: Vadim Lazovsky <vadim.lazovsky@flant.com>
Co-authored-by: Vadim Lazovsky <vadim.lazovsky@flant.com>
@saribaev021 added this to the v1.69.2 milestone Apr 17, 2025
morhayn pushed a commit that referenced this pull request May 14, 2025
Signed-off-by: Vadim Lazovsky <vadim.lazovsky@flant.com>
Co-authored-by: Vadim Lazovsky <vadim.lazovsky@flant.com>
Labels
area/monitoring (Pull requests that update monitoring modules), status/backport/success
5 participants