8000 #316 conversion to OTEL Metrics by thompson-tomo · Pull Request #871 · nats-io/nats.net · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

#316 conversion to OTEL Metrics #871

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 6 commits into
base: release/2.7
Choose a base branch
from

Conversation

thompson-tomo
Copy link
Contributor
@thompson-tomo thompson-tomo commented May 24, 2025

This PR focusses on improving the open telemetry integration by introducing support for metrics and refinement of the spans

The duration of the following operations are captured as both spans & duration metrics

  • publish
  • Create
  • Send
  • Process
  • Subscribe
  • Unsubscribe

Additional metrics include:

  • subscription count
  • Sent bytes
  • Received bytes
  • Qty of messages sent
  • Qty of messages received

Prequiste work

Closes #316

@thompson-tomo thompson-tomo marked this pull request as draft May 24, 2025 08:08
@thompson-tomo thompson-tomo force-pushed the features/#316_AddMetrics branch 4 times, most recently from a0d2e83 to 6285e87 Compare May 24, 2025 09:06
@thompson-tomo thompson-tomo mentioned this pull request May 24, 2025
@mtmk
Copy link
Member
mtmk commented May 24, 2025

may i move this PR to release/2.7 branch so we can have preview release quickly?

@thompson-tomo
Copy link
Contributor Author

Sounds good to me. I still need to work on the tags aspect & refine the timing.

@mtmk
Copy link
Member
mtmk commented May 24, 2025

I still need to work on the tags aspect & refine the timing.

totally fine! please take your time, no worries

@mtmk mtmk changed the base branch from main to release/2.7 May 24, 2025 15:41
@thompson-tomo thompson-tomo force-pushed the features/#316_AddMetrics branch from 6285e87 to 1c662ea Compare May 25, 2025 02:09
@mtmk mtmk added this to the 2.7 milestone May 25, 2025
@thompson-tomo thompson-tomo force-pushed the features/#316_AddMetrics branch from 1c662ea to a568cfe Compare May 25, 2025 23:58
Copy link
Contributor
@aradalvand aradalvand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation is problematic. It ties metrics to tracing. That is, if someone hasn't enabled NATS tracing, but still wants metrics, this implementation won't work.

This mistake has been made due to a misreading of the specs, which recommend setting activity durations for operation duration metrics WHEN an activity/span accompanies the operation for which the metric is being emitted — the specification states the following:

When this metric is reported alongside a messaging span, the metric value SHOULD be the same as the corresponding span duration.

(emphasis mine)

This current implementation, however, has made metrics depend on tracing, which is wrong. It means that the following setup in application code will result in no metrics being reported — violating an obvious user expectation:

builder.Services.AddOpenTelemetry()
    .WithTracing(tracerBuilder =>
    {
        tracerBuilder
            .AddAspNetCoreInstrumentation()
            .AddHttpClientInstrumentation()
            // .AddSource("NATS.Net") // NOTE: you would have to have this for *metrics* to be reported, which makes no sense.
    })
    .WithMetrics(metricsBuilder =>
    {
        metricsBuilder.AddMeter("NATS.Net") // NOTE: The metrics are clearly being subscribed to.
    })

Instead, the metrics implementation should anticipate activities being null and be independently implemented; but optionally use activity durations when the accompanying activity is non-null.

P.S. If you want examples, here's the de facto standard Kafka instrumentation library — it's careful NOT to tie metrics to tracing, as I said: https://github.com/open-telemetry/opentelemetry-dotnet-contrib/blob/03c577940d9b3eb223e21af98ac8eeb692e8e5c9/src/OpenTelemetry.Instrumentation.ConfluentKafka/InstrumentedProducer.cs#L207-L212

@thompson-tomo
Copy link
Contributor Author

First off, the implementation is not complete hence why it is in draft status. As such it is unfair to say it is problemtic to use when it is clearly not ready to review especially as a number of aspects are still missing which means that The issues which you pre-emptively foreshadowed would have all been addressed in due course. However to make the review easier when it is ready I will slightly adapt my plans and draw more inspiration from the kafka example.

@thompson-tomo thompson-tomo force-pushed the features/#316_AddMetrics branch 9 times, most recently from d1c1501 to 07a1eb6 Compare June 2, 2025 10:15
}
catch
var size = payloadBuffer.Length + (headersBuffer?.Length ?? 0);
if (_connection.ServerInfo is { } info && size > info.MaxPayload)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should just use the size from props.

@@ -240,19 +231,15 @@ internal async Task InitializeInboxSubscriptionAsync(CancellationToken cancellat
{
if (Interlocked.CompareExchange(ref _inboxSub, _inboxSubSentinel, _inboxSubSentinel) == _inboxSubSentinel)
{
var inboxSubject = $"{_inboxPrefix}.*";
var inboxSubject = new NatsSubscriptionProps($"{_inboxPrefix}.*", _connection.InboxPrefix);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check why connection is being used

@@ -41,8 +42,9 @@ public SubscriptionManager(NatsConnection connection, string inboxPrefix)
_cleanupInterval = _connection.Opts.SubscriptionCleanUpInterval;
_timer = Task.Run(CleanupAsync);
InboxSubBuilder = new InboxSubBuilder(connection.Opts.LoggerFactory.CreateLogger<InboxSubBuilder>());
_inboxSubSentinel = new InboxSub(InboxSubBuilder, nameof(_inboxSubSentinel), default, connection, this);
_inboxSubSentinel = new InboxSub(InboxSubBuilder, new NatsSubscriptionProps(nameof(_inboxSubSentinel), _connection.InboxPrefix), default, connection, this);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check which inbox prefix is used

@thompson-tomo
Copy link
Contributor Author

@mtmk apologies this has ended up becoming alot bigger than I had hoped especially as I worked through reviewing what was happening and seeing replicated functionality. Are you happy with this PR also including the refactoring? Or would you prefer it be split?

As I am nearly done with just a couple of things to look into, when you can could can you start taking a look at it & let me know if you have any questions/feedback.

@mtmk
Copy link
Member
mtmk commented Jun 2, 2025

@mtmk apologies this has ended up becoming alot bigger than I had hoped especially as I worked through reviewing what was happening and seeing replicated functionality. Are you happy with this PR also including the refactoring? Or would you prefer it be split?

As I am nearly done with just a couple of things to look into, when you can could can you start taking a look at it & let me know if you have any questions/feedback.

thanks for your hard work @thompson-tomo. i'm hoping to have a look this week. it is probably a good idea to split refactoring in general. again in general couple of things we need to make sur 8000 e and keep an eye on: refactoring may have unintentional behavior changes, sometimes it might be desirable to put them behind a feature flag if applicable. another thing is that we have to make sure performance is not affected negatively.

@thompson-tomo
Copy link
Contributor Author

No worries @mtmk I will go about splitting this PR up by making smaller PR's which as they are merged will bring down this PR diff until it is more focused on otel. First of which will be the properties being passed as a record.

@thompson-tomo thompson-tomo force-pushed the features/#316_AddMetrics branch from 07a1eb6 to 6228cd3 Compare June 2, 2025 13:16
Signed-off-by: James Thompson <thompson.tomo@outlook.com>
Signed-off-by: James Thompson <thompson.tomo@outlook.com>
Signed-off-by: James Thompson <thompson.tomo@outlook.com>
Signed-off-by: James Thompson <thompson.tomo@outlook.com>
@thompson-tomo thompson-tomo force-pushed the features/#316_AddMetrics branch 2 times, most recently from 6ea40f6 to 9e8188e Compare June 3, 2025 03:17
Signed-off-by: James Thompson <thompson.tomo@outlook.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add OTEL Metrics
3 participants
0