8000 feat: metrics support for llmaz by googs1025 · Pull Request #316 · InftyAI/llmaz · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

feat: metrics support for llmaz #316

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Apr 15, 2025

Conversation

googs1025
Copy link
Member
@googs1025 googs1025 commented Mar 15, 2025

What this PR does / why we need it

  • change helm chart
  • change kustomize config
  • add prometheus operator install docs

Which issue(s) this PR fixes

Fixes part of #307

Special notes for your reviewer

Does this PR introduce a user-facing change?

None

@InftyAI-Agent InftyAI-Agent added needs-triage Indicates an issue or PR lacks a label and requires one. needs-priority Indicates a PR lacks a label and requires one. do-not-merge/needs-kind Indicates a PR lacks a label and requires one. labels Mar 15, 2025
@InftyAI-Agent InftyAI-Agent requested a review from kerthcet March 15, 2025 02:08
@pacoxu
Copy link
Contributor
pacoxu commented Mar 17, 2025

/kind feature
/lgtm

@InftyAI-Agent InftyAI-Agent added feature Categorizes issue or PR as related to a new feature. lgtm Looks good to me, indicates that a PR is ready to be merged. and removed do-not-merge/needs-kind Indicates a PR lacks a label and requires one. labels Mar 17, 2025
@googs1025
Copy link
Member Author

/hold
I want to test this for myself to check this if working as expect

@InftyAI-Agent InftyAI-Agent added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 17, 2025
@@ -52,7 +52,7 @@ metricsService:
- name: https
port: 8443
protocol: TCP
targetPort: https
targetPort: 8080
Copy link
Member
@kerthcet kerthcet Mar 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can not change the values.yaml directly, it's auto generated which means it will be overwritten once we run make helm-package.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert.

@InftyAI-Agent InftyAI-Agent removed the 8000 lgtm Looks good to me, indicates that a PR is ready to be merged. label Mar 17, 2025
@googs1025 googs1025 force-pushed the fix/metrics_bind branch 2 times, most recently from 1ebb10b to 3a01d1c Compare March 17, 2025 08:14
Makefile Outdated
@@ -283,6 +283,10 @@ envtest: $(ENVTEST) ## Download envtest-setup locally if necessary.
$(ENVTEST): $(LOCALBIN)
test -s $(LOCALBIN)/setup-envtest || GOBIN=$(LOCALBIN) go install sigs.k8s.io/controller-runtime/tools/setup-envtest@latest

.PHONY: prometheus
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would be possible to support make prometheus here

@@ -1,4 +1,7 @@
{{- if .Values.prometheus.enable -}}
{{- if .Values.prometheus.enable }}
{{- if not (.Capabilities.APIVersions.Has "monitoring.coreos.com/v1/ServiceMonitor") }}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some docs: https://helm.sh/docs/chart_template_guide/builtin_objects/
to check ServiceMonitor resourecs

Copy link
Member
@kerthcet kerthcet Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if we haven't install the prometheus-operator, it will report error here, right? This is useful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, test in local(without install the prometheus-operator)

root@VM-0-5-ubuntu:/home/ubuntu/llmaz# vi chart/values.global.yaml
root@VM-0-5-ubuntu:/home/ubuntu/llmaz# helm upgrade --install llmaz ./chart -f ./chart/values.global.yaml --namespace=llmaz-system
Release "llmaz" does not exist. Installing it now.
Error: execution error at (llmaz/templates/prometheus/serviceaccount.yaml:3:4): The cluster does not support the required API resource `monitoring.coreos.com/v1/ServiceMonitor`.
root@VM-0-5-ubuntu:/home/ubuntu/llmaz#

@kerthcet
Copy link
Member

Hi @googs1025 I updated some configs, ptal. Documentation is still missing.

@googs1025
Copy link
Member Author

Hi @googs1025 I updated some configs, ptal. Documentation is still missing.

Thanks for your help! I'll test it locally to double check. Also, what kind of docs do we need?

@kerthcet
Copy link
Member

Like how to install the prometheus operator, step to step to view the metrics.

@@ -39,6 +39,9 @@ spec:
periodSeconds: 20
name: manager
ports:
- containerPort: 8080
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use make helm-package
Generate the missing port part 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's ok then. we just avoid to update the charts manually. This is because we added the metric patch in kustomize.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so do I need to revert ? or just add this is ok?

@googs1025
Copy link
Member Author

Like how to install the prometheus operator, step to step to view the metrics.

We can directly see the docs which is more direct

https://github.com/googs1025/llmaz/tree/fix/metrics_bind/docs/prometheus-operator

@googs1025 googs1025 changed the title wip: fix miss metrics endpoint bind fix miss metrics endpoint bind Mar 19, 2025
@googs1025 googs1025 changed the title fix miss metrics endpoint bind feat: metrics support for llmaz Mar 19, 2025
@googs1025
Copy link
Member Author

I checked the changes and found that the metrics port was changed from 8080 to 8443 in other PR, but when I tested it with 8443, it didn't seem to work. I'm not sure what went wrong. hi @kerthcet If you have time, could you help me check if this is the case in your test? 🤔

@kerthcet
Copy link
Member

Sorry for the late reply, I will test tonight.

@kerthcet
Copy link
Member

I tried this yesterday, but my network is broken, I'll try this later. Once I have a result, I'll comment here.

@kerthcet
Copy link
Member

please rebase then I'll merge this pr.

@kerthcet
Copy link
Member

@googs1025

@kerthcet
Copy link
Member

See the last commit for fixes, generally the label selector not match.

@kerthcet
Copy link
Member

encounter bug with make helm-install, see arttor/helmify#169

@kerthcet
Copy link
Member

Install manually, it works great.
image

Could you rebase then let's merge this PR, the helm chart problem I'll solve it in #343

@googs1025
Copy link
Member Author

use make install make deploy make install-prometheus
this works as expected

image
root@VM-0-12-ubuntu:/home/ubuntu# curl 'http://localhost:9090/api/v1/query?query=controller_runtime_reconcile_errors_total'
{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"controller_runtime_reconcile_errors_total","container":"manager","controller":"backendruntime","endpoint":"https","instance":"10.6.2.8:8443","job":"llmaz-controller-manager-metrics-service","namespace":"llmaz-system","pod":"llmaz-controller-manager-7d894849f6-2cx9s","service":"llmaz-controller-manager-metrics-service"},"value":[1744711370.609,"0"]},{"metric":{"__name__":"controller_runtime_reconcile_errors_total","container":"manager","controller":"cert-rotator","endpoint":"https","instance":"10.6.2.8:8443","job":"llmaz-controller-manager-metrics-service","namespace":"llmaz-system","pod":"llmaz-controller-manager-7d894849f6-2cx9s","service":"llmaz-controller-manager-metrics-service"},"value":[1744711370.609,"0"]},{"metric":{"__name__":"controller_runtime_reconcile_errors_total","container":"manager","controller":"openmodel","endpoint":"https","instance":"10.6.2.8:8443","job":"llmaz-controller-manager-metrics-service","namespace":"llmaz-system","pod":"llmaz-controller-manager-7d894849f6-2cx9s","service":"llmaz-controller-manager-metrics-service"},"value":[1744711370.609,"0"]},{"metric":{"__name__":"controller_runtime_reconcile_errors_total","container":"manager","controller":"playground","endpoint":"https","instance":"10.6.2.8:8443","job":"llmaz-controller-manager-metrics-service","namespace":"llmaz-system","pod":"llmaz-controller-manager-7d894849f6-2cx9s","service":"llmaz-controller-manager-metrics-service"},"value":[1744711370.609,"14"]},{"metric":{"__name__":"controller_runtime_reconcile_errors_total","container":"manager","controller":"service","endpoint":"https","instance":"10.6.2.8:8443","job":"llmaz-controller-manager-metrics-service","namespace":"llmaz-system","pod":"llmaz-controller-manager-7d894849f6-2cx9s","service":"llmaz-controller-manager-metrics-service"},"value":[1744711370.609,"0"]}]}}root@VM-0-12-ubuntu:/home/ubuntu#

@googs1025
Copy link
Member Author

/unhold

@googs1025
Copy link
Member Author

/unhold

@kerthcet
Copy link
Member

/lgtm
/approve
/hold cancel

@InftyAI-Agent InftyAI-Agent added lgtm Looks good to me, indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Apr 15, 2025
@InftyAI-Agent InftyAI-Agent assigned kerthcet and unassigned pacoxu Apr 15, 2025
@InftyAI-Agent InftyAI-Agent merged commit 617ee75 into InftyAI:main Apr 15, 2025
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. feature Categorizes issue or PR as related to a new feature. lgtm Looks good to me, indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a label and requires one. needs-triage Indicates an issue or PR lacks a label and requires one.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0