-
Notifications
You must be signed in to change notification settings - Fork 103
add kubevirt metrics and dashboards #573
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add kubevirt metrics and dashboards #573
Conversation
Caution Review failedThe pull request is closed. WalkthroughThis pull request introduces several enhancements to KubeVirt monitoring capabilities. A new ServiceMonitor resource is added for metrics collection, alongside a PrometheusRule to monitor the statuses of virtual machines (VMs) and virtual machine instances (VMIs). Additionally, a new dashboard entry for the KubeVirt control plane is included. The changes also implement a templating logic that aggregates alert configurations into a single YAML document, improving the overall monitoring infrastructure. Changes
Possibly related PRs
Suggested reviewers
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (5)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (1)
packages/system/kubevirt-operator/templates/PrometheusRule.yaml (1)
11-18
: Optimize alert expressions for better performance.The current expressions using
max_over_time
with regex matching could be optimized.Consider these optimizations:
- Use
absent()
function to detect when metrics are missing- Add more specific label matchers instead of regex
- Consider using
for: 10m
instead ofmax_over_time
Example optimization:
- expr: | - max_over_time( - kubevirt_vm_info{ - status!="Running", - exported_namespace=~".+", - name=~".+" - }[10m] - ) == 1 + expr: kubevirt_vm_info{status!="Running"} == 1 + for: 10mAlso applies to: 27-34
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
packages/system/kubevirt-operator/templates/PrometheusRule.yaml
(1 hunks)packages/system/kubevirt-operator/templates/ServiceMonitor.yaml
(1 hunks)packages/system/kubevirt/templates/kubevirt-cr.yaml
(1 hunks)
🔇 Additional comments (1)
packages/system/kubevirt/templates/kubevirt-cr.yaml (1)
16-16
: Verify namespace configuration consistency.The
monitorNamespace
is set totenant-root
, but the PrometheusRule is deployed in themonitoring
namespace. This inconsistency might affect metric collection.Let's verify the namespace configuration:
selector: | ||
matchLabels: | ||
prometheus.kubevirt.io: "true" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Specify namespaceSelector for better control.
The ServiceMonitor lacks a namespaceSelector, which means it could potentially scrape metrics from any namespace where the label selector matches.
Add a namespaceSelector to restrict metric collection to specific namespaces:
spec:
selector:
matchLabels:
prometheus.kubevirt.io: "true"
+ namespaceSelector:
+ matchNames:
+ - cozy-kubevirt
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
selector: | |
matchLabels: | |
prometheus.kubevirt.io: "true" | |
selector: | |
matchLabels: | |
prometheus.kubevirt.io: "true" | |
namespaceSelector: | |
matchNames: | |
- cozy-kubevirt |
tlsConfig: | ||
insecureSkipVerify: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Security concern: TLS verification is disabled.
Using insecureSkipVerify: true
in production environments is risky as it bypasses certificate validation, making the connection susceptible to man-in-the-middle attacks.
Consider one of these approaches:
- Configure proper certificates and remove
insecureSkipVerify
- If self-signed certificates are used, specify the CA certificate instead:
tlsConfig:
- insecureSkipVerify: true
+ ca:
+ configMap:
+ name: kubevirt-ca-cert
+ key: ca.crt
Committable suggestion skipped: line range outside the PR's diff.
labels: | ||
severity: critical | ||
annotations: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Enhance alert metadata for better operability.
The alerts lack important metadata that would help with routing and troubleshooting.
Add standard labels for better alert handling:
labels:
severity: critical
+ team: virtualization
+ component: kubevirt
+ alert_group: vm_status
Also add runbook URLs to help operators:
annotations:
summary: "VM '{{ $labels.name }}' in namespace '{{ $labels.exported_namespace }}' is not Running for more than 10 minutes"
description: >
The VM '{{ $labels.name }}' in namespace '{{ $labels.exported_namespace }}' has been in a non-Running state
for more than 10 minutes. Please investigate the issue.
+ runbook_url: https://your-runbooks/kubevirt/vm-not-running
Also applies to: 35-37
The VM '{{ $labels.name }}' in namespace '{{ $labels.exported_namespace }}' has been in a non-Running state | ||
for more than 10 minutes. Please investigate the issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix incorrect reference in VMI alert description.
The description for VMINotRunningFor10Minutes
incorrectly refers to "VM" instead of "VMI".
description: >
- The VM '{{ $labels.name }}' in namespace '{{ $labels.exported_namespace }}' has been in a non-Running state
+ The VMI '{{ $labels.name }}' in namespace '{{ $labels.exported_namespace }}' has been in a non-Running state
for more than 10 minutes. Please investigate the issue.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
The VM '{{ $labels.name }}' in namespace '{{ $labels.exported_namespace }}' has been in a non-Running state | |
for more than 10 minutes. Please investigate the issue. | |
The VMI '{{ $labels.name }}' in namespace '{{ $labels.exported_namespace }}' has been in a non-Running state | |
for more than 10 minutes. Please investigate the issue. |
kind: PrometheusRule | ||
metadata: | ||
name: vm-not-running-alert | ||
namespace: monitoring |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
PrometheusRule namespace must be aligned with monitoring stack
The PrometheusRule is currently set to use the monitoring
namespace, but the monitoring stack is configured to use cozy-monitoring
. Update the namespace to match:
metadata:
name: vm-not-running-alert
namespace: cozy-monitoring
🔗 Analysis chain
Verify namespace alignment with monitoring stack.
The PrometheusRule is deployed in the monitoring
namespace, but this might not align with your monitoring stack's configuration.
Let's verify the monitoring setup:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for Prometheus Operator CRDs and their configuration
kubectl get prometheusrules -A -o json | jq -r '.items[].metadata.namespace' | sort | uniq
# Check if monitoring namespace has required labels
kubectl get ns monitoring -o json 2>/dev/null | jq -r '.metadata.labels'
Length of output: 251
Script:
#!/bin/bash
# Find all PrometheusRule resources and their namespace configurations
rg -l "kind:\s*PrometheusRule" --type yaml
rg "namespace:\s*monitoring" --type yaml -C2
# Look for monitoring stack configuration
fd -e yaml -e yml | xargs rg "prometheusOperator|prometheus-operator" -l
Length of output: 3593
Script:
#!/bin/bash
# Check monitoring namespace configuration in monitoring components
rg -A 3 "namespace:" packages/system/monitoring-agents/
rg -A 3 "namespace:" packages/system/victoria-metrics-operator/
# Check for monitoring stack configuration files
fd -e yaml -e yml -p "monitoring|prometheus" packages/system/monitoring-agents/ | xargs cat
Length of output: 71604
a5c049e
to
30abd86
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
packages/apps/versions_map
(4 hunks)packages/extra/versions_map
(2 hunks)packages/system/kubevirt-operator/templates/PrometheusRule.yaml
(1 hunks)packages/system/kubevirt-operator/templates/ServiceMonitor.yaml
(1 hunks)packages/system/kubevirt/templates/kubevirt-cr.yaml
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- packages/extra/versions_map
🚧 Files skipped from review as they are similar to previous changes (3)
- packages/system/kubevirt/templates/kubevirt-cr.yaml
- packages/system/kubevirt-operator/templates/ServiceMonitor.yaml
- packages/system/kubevirt-operator/templates/PrometheusRule.yaml
🔇 Additional comments (1)
packages/apps/versions_map (1)
69-69
: Clarify the need for postgres and redis version updates.The updates to
postgres
andredis
packages appear unrelated to the PR's objective of adding KubeVirt metrics and dashboards. Please explain why these changes are necessary in this PR.Also applies to: 82-82
packages/apps/versions_map
Outdated
@@ -42,7 +42,7 @@ kubernetes 0.12.0 74649f8 | |||
kubernetes 0.12.1 28fca4e | |||
kubernetes 0.13.0 ced8e5b9 | |||
kubernetes 0.14.0 bfbde07c | |||
kubernetes 0.14.1 d14b66ce | |||
kubernetes 0.14.1 fde4bcfa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Standardize commit hash lengths to 8 characters
Please update the following commit hashes to use 8 characters for consistency:
175a65f
→175a65f0
(postgres)cad9cde
→cad9cde0
(virtual-machine)
🔗 Analysis chain
Verify commit hashes and standardize their length.
Please verify that all commit hashes are valid and consider standardizing their length (7 or 8 characters) for consistency:
- kubernetes: fde4bcf
- postgres: 175a65f (7 chars)
- redis: abc8f08
- virtual-machine: cad9cde (7 chars)
- vm-instance: 4f767ee
Also applies to: 69-69, 82-82, 108-108, 112-112
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify if the commit hashes exist and are valid
# Function to check commit
check_commit() {
local package=$1
local version=$2
local commit=$3
echo "Checking $package $version commit $commit:"
if git rev-parse --quiet --verify "$commit^{commit}" >/dev/null 2>&1; then
echo "✓ Valid commit"
# Show first line of commit message
git log -1 --pretty=format:"%s" "$commit" 2>/dev/null
else
echo "✗ Invalid commit"
fi
echo
}
# Check all updated commits
check_commit "kubernetes" "0.14.1" "fde4bcfa"
check_commit "postgres" "0.7.1" "175a65f"
check_commit "redis" "0.4.0" "abc8f082"
check_commit "virtual-machine" "0.5.0" "cad9cde"
check_commit "vm-instance" "0.2.0" "4f767ee3"
Length of output: 1534
30abd86
to
dd87935
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Summary by CodeRabbit
New Features
Monitoring Improvements
Configuration Updates