Ensure target_info and traces_target_info are consistent with the instrumentation state #1861

Conversation
ctx                context.Context
service            *svc.Attrs
provider           *metric.MeterProvider
resourceAttributes []attribute.KeyValue
I must remember the original resource attributes when the service is created, for clean-up purposes. When a service is removed, the attributes might be different if we had lost the K8s data.
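A minimal sketch of that snapshot idea: the attribute set is computed once, at creation time, and stored on the struct, so removal can replay exactly the same series even if the K8s metadata is gone by then. Only the four fields come from the diff above; `newMetrics` and `attrsFromService` are hypothetical helpers for illustration.

```go
// Metrics mirrors the four fields shown in the diff; newMetrics and
// attrsFromService are hypothetical, for illustration only.
type Metrics struct {
	ctx                context.Context
	service            *svc.Attrs
	provider           *metric.MeterProvider
	resourceAttributes []attribute.KeyValue
}

func newMetrics(ctx context.Context, service *svc.Attrs, provider *metric.MeterProvider) *Metrics {
	return &Metrics{
		ctx:      ctx,
		service:  service,
		provider: provider,
		// Snapshot once, at creation: if the K8s informer data is lost later,
		// removal still targets exactly the series that was created.
		resourceAttributes: attrsFromService(service),
	}
}
```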
pkg/internal/appolly/appolly.go (outdated)
}

// New Instrumenter, given a Config
func New(ctx context.Context, ctxInfo *global.ContextInfo, config *beyla.Config) (*Instrumenter, error) {
	setupFeatureContextInfo(ctx, ctxInfo, config)

	tracesInput := msg.NewQueue[[]request.Span](msg.ChannelBufferLen(config.ChannelBufferLen))
	processEventsInput := msg.NewQueue[exec.ProcessEvent](msg.ChannelBufferLen(10))
I'm not sure if this is the best place to inject this additional small pipeline. @mariomac
Yeah! I think so. I guess we can always move it to another part later, if we need to expand the access.
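For context, a self-contained stand-in for the wiring under discussion, using plain Go channels in place of Beyla's msg.Queue (every name below is illustrative, not the actual API): the constructor owns the small buffered queue, and the decoration pipeline consumes it.

```go
package main

import "fmt"

// processEvent is a simplified stand-in for exec.ProcessEvent.
type processEvent struct{ pid int }

// instrumenter stands in for the Instrumenter built by New(); the buffered
// channel mirrors msg.NewQueue[exec.ProcessEvent](msg.ChannelBufferLen(10)).
type instrumenter struct {
	processEvents chan processEvent
}

func newInstrumenter() *instrumenter {
	return &instrumenter{processEvents: make(chan processEvent, 10)}
}

func main() {
	i := newInstrumenter()
	go func() { // producer side: the process watcher
		i.processEvents <- processEvent{pid: 42}
		close(i.processEvents)
	}()
	for ev := range i.processEvents { // consumer side: the decoration pipeline
		fmt.Println("decorating pid", ev.pid)
	}
}
```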
pkg/transform/k8s.go (outdated)
@@ -95,30 +98,89 @@ func KubeDecoratorProvider(
	}
}

func ProcessEventDecoratorProvider(
@mariomac Essentially, I needed a decorator that decorates with either K8s or host-based metadata. It was easiest to place it here, but I'm thinking there might be a better place for it. If we ever expand decorators to also fetch information from the known cloud vendors, we might need something like a generic metadata decorator, which decorates consistently depending on what's available.
What I'm wondering is whether it makes sense to unify the read_decorator and the K8s decorator into a single metadata decorator.
If there aren't data dependencies or something that needs to be coordinated, maybe keeping two different nodes makes the code simpler and easier to test/debug.
OK, sounds good. I'll split this code and move the logic for enriching the non-K8s metadata to where the read_decorator is today.
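A runnable sketch of what the agreed split might look like: two independent pipeline nodes, one for host metadata and one for K8s metadata, composed over channels so each can be tested in isolation. All types and lookups below are stand-ins, not Beyla's actual code.

```go
package main

import "fmt"

// event is a simplified stand-in for exec.ProcessEvent.
type event struct {
	pid      int
	hostName string
	podName  string
}

// hostDecorator enriches events with host-level metadata only, as the
// read_decorator would after the split.
func hostDecorator(in <-chan event) <-chan event {
	out := make(chan event)
	go func() {
		defer close(out)
		for ev := range in {
			ev.hostName = "node-1" // stand-in for a real host lookup
			out <- ev
		}
	}()
	return out
}

// k8sDecorator enriches events with pod metadata when available, staying a
// separate node with no data dependency on the host decorator.
func k8sDecorator(in <-chan event) <-chan event {
	out := make(chan event)
	go func() {
		defer close(out)
		for ev := range in {
			ev.podName = "checkout-7d9f" // stand-in for an informer lookup
			out <- ev
		}
	}()
	return out
}

func main() {
	src := make(chan event, 1)
	src <- event{pid: 42}
	close(src)
	for ev := range k8sDecorator(hostDecorator(src)) {
		fmt.Printf("%+v\n", ev)
	}
}
```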
Codecov Report

Attention: Patch coverage is

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #1861      +/-   ##
==========================================
- Coverage   74.20%   73.97%   -0.24%
==========================================
  Files         177      177
  Lines       19232    19395     +163
==========================================
+ Hits        14272    14347      +75
- Misses       4227     4313      +86
- Partials      733      735       +2

☔ View full report in Codecov by Sentry.
Mostly LGTM! I only have a concern/question about the automatic generation of target_info.
@@ -259,6 +264,7 @@ type Metrics struct {
	serviceGraphServer *Expirer[*request.Span, instrument.Float64Histogram, float64]
	serviceGraphFailed *Expirer[*request.Span, instrument.Int64Counter, int64]
	serviceGraphTotal  *Expirer[*request.Span, instrument.Int64Counter, int64]
	targetInfo         instrument.Int64UpDownCounter
target_info is automatically created by the collector from the metrics' resource attributes, when the OTEL metrics are converted to Prometheus. Might we end up having duplicate target_info entries?
I experimented with this quite a bit, and it seems that as long as we store exactly the same values as the resource attributes, we are good. That's why I record the resource attributes on the Metrics type: we emit exactly the same values. The only way we'd get a duplicate is if the Span resource attributes differed from the ProcessEvent attributes, but since I refactored the metadata setting to use the same code for both, that shouldn't happen unless the metadata changes after the initial launch. And if the metadata does change, we'd have duplicates no matter what, even with Spans: some early ones would be stored with the old resource attributes.
Essentially, I made the following test. I launched the OTel demo without running any transactions, and we get maybe 12 or so services in target_info. These are created naturally by Kafka messages, which flow all the time even when there's no workload. Then I ran a few transactions and the number of services went up to 20+.
After the change we end up at 20+ from the beginning. I also tested by scaling everything in the OTel demo down to 0 and waiting around 5 minutes for Prometheus to remove the deleted services.
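To make the mechanics concrete, here is a minimal sketch of the up/down-counter approach described above, written against the current otel-go metric API (the diff stores the instrument on the Metrics struct via the older instrument package). Metric and label names are illustrative.

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	meter := sdkmetric.NewMeterProvider().Meter("beyla-sketch")

	targetInfo, err := meter.Int64UpDownCounter("target_info")
	if err != nil {
		panic(err)
	}

	// The attribute set snapshotted when the service was first instrumented;
	// label names here are illustrative.
	attrs := metric.WithAttributes(
		attribute.String("service_name", "checkout"),
		attribute.String("k8s_namespace_name", "demo"),
	)

	ctx := context.Background()
	targetInfo.Add(ctx, 1, attrs) // service instrumented: series goes to 1

	// On removal, the SAME snapshotted attributes are replayed, so the series
	// returns to 0 even if the K8s metadata has been lost in the meantime,
	// letting Prometheus expire the deleted service.
	targetInfo.Add(ctx, -1, attrs)
}
```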
👏🏻🚀
So far we haven't paid much attention to the quality of the information present in target_info and traces_target_info. Essentially, as long as certain information was there, e.g. certain labels, we didn't care whether the data was correctly stored or removed. There were a couple of issues:

To mitigate these issues, I'm changing how we process the target_info and traces_target_info metrics, such that they are driven by the instrumentation events. Namely: