Automatic `tracing::Span` propagation by texascloud · Pull Request #266 · slawlor/ractor · GitHub

Automatic tracing::Span propagation #266

Merged: 6 commits into slawlor:main on Oct 4, 2024

Conversation

@texascloud (Contributor) commented Oct 2, 2024

Context

In async Rust without `ractor`, annotating async fns with `#[tracing::instrument]` creates an appropriate nesting hierarchy when async fns call other async fns. This preserves the call stack and is very useful when leveraging other crates in the tracing ecosystem. When using `ractor`, this automatic span nesting is lost because `call` and `cast` do not propagate the current span. It is possible to propagate the span manually by adding a `tracing::Span` field to the custom message struct for the particular `Actor` you're implementing. However, if your message is an enum with many variants and you want to propagate the span in all of them, this is cumbersome. Additionally, you must manually instrument any futures you invoke within your message handler with the span from the message you're handling, which is error-prone.
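For illustration only, a minimal sketch of the manual approach described above; the `MyActor`/`MyMessage`/`do_work` names are hypothetical, and the exact attributes needed on the `Actor` impl depend on your ractor version and features:

```rust
use ractor::{Actor, ActorProcessingErr, ActorRef};
use tracing::Instrument;

// Every variant must carry the caller's span by hand.
enum MyMessage {
    DoWork { payload: String, span: tracing::Span },
}

struct MyActor;

#[cfg_attr(feature = "async-trait", ractor::async_trait)]
impl Actor for MyActor {
    type Msg = MyMessage;
    type State = ();
    type Arguments = ();

    async fn pre_start(
        &self,
        _myself: ActorRef<Self::Msg>,
        _args: Self::Arguments,
    ) -> Result<Self::State, ActorProcessingErr> {
        Ok(())
    }

    async fn handle(
        &self,
        _myself: ActorRef<Self::Msg>,
        message: Self::Msg,
        _state: &mut Self::State,
    ) -> Result<(), ActorProcessingErr> {
        match message {
            MyMessage::DoWork { payload, span } => {
                // Any future awaited here must be instrumented manually with
                // the span carried in the message, or the caller's span
                // hierarchy is lost.
                do_work(payload).instrument(span).await;
            }
        }
        Ok(())
    }
}

#[tracing::instrument]
async fn do_work(payload: String) {
    tracing::info!("working");
}
```

The sender would also have to construct each message with `span: tracing::Span::current()`, which is exactly the per-variant boilerplate this PR removes.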

Solution

In this PR I automatically capture the current `Span` when a `BoxedMessage` is instantiated. Then, when the handler is invoked, I instrument the future with that captured `Span`. This allows both `cast` and `call` to benefit from tracing span propagation automatically. The only thing a user of `ractor` needs to do to benefit from this is annotate the `handle` fn in their `impl Actor for ActorStruct` logic with `#[tracing::instrument]`. Even if the user does not do this, the current span can still be retrieved, since we propagate it anyway.
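For illustration, a rough sketch of what the user-facing side looks like with this change; the `Worker` actor and message names are hypothetical, the span capture and future instrumentation happen inside ractor as described above, and the exact attributes on the `Actor` impl depend on your ractor version and features:

```rust
use ractor::{Actor, ActorProcessingErr, ActorRef};

struct Worker;

enum WorkerMsg {
    DoWork(String),
}

#[cfg_attr(feature = "async-trait", ractor::async_trait)]
impl Actor for Worker {
    type Msg = WorkerMsg;
    type State = ();
    type Arguments = ();

    async fn pre_start(
        &self,
        _myself: ActorRef<Self::Msg>,
        _args: Self::Arguments,
    ) -> Result<Self::State, ActorProcessingErr> {
        Ok(())
    }

    // With this change the handler future is already instrumented with the
    // span that was current at the cast/call site, so this span nests under
    // the caller's span without any manual plumbing.
    #[tracing::instrument(skip_all)]
    async fn handle(
        &self,
        _myself: ActorRef<Self::Msg>,
        message: Self::Msg,
        _state: &mut Self::State,
    ) -> Result<(), ActorProcessingErr> {
        match message {
            WorkerMsg::DoWork(payload) => tracing::info!(%payload, "handling"),
        }
        Ok(())
    }
}

// Caller side: no span field in the message and no manual `.instrument(...)`.
#[tracing::instrument]
async fn caller(worker: ActorRef<WorkerMsg>) {
    let _ = worker.cast(WorkerMsg::DoWork("hello".to_string()));
}
```

Even without the `#[tracing::instrument]` on `handle`, `tracing::Span::current()` inside the handler would reflect the propagated caller span, per the description above.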

Test Plan:
`cargo bench -p ractor` on main
```
     Running benches/actor.rs (target/release/deps/actor-fd9a9f57eedaa4d3)
Gnuplot not found, using plotters backend
Creation of 100 actors  time:   [266.39 µs 270.86 µs 275.36 µs]
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  5 (5.00%) low mild
  3 (3.00%) high mild

Creation of 10000 actors
                        time:   [31.654 ms 32.843 ms 34.145 ms]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

Benchmarking Waiting on 100 actors to process first message: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.9s, enable flat sampling, or reduce sample count to 50.
Waiting on 100 actors to process first message
                        time:   [1.0713 ms 1.1400 ms 1.2095 ms]

Waiting on 1000 actors to process first message
                        time:   [12.655 ms 13.147 ms 13.635 ms]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

Waiting on 100000 messages to be processed
                        time:   [14.885 ms 15.123 ms 15.522 ms]
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) low severe
  4 (4.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

     Running benches/async_traits.rs (target/release/deps/async_traits-e1754e50a02bb4eb)
Gnuplot not found, using plotters backend
Waiting on 50 messages with large data in the Future to be processed
                        time:   [120.73 µs 122.43 µs 124.51 µs]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
```

`cargo bench -p ractor` on these changes
```
     Running benches/actor.rs (target/release/deps/actor-fd9a9f57eedaa4d3)
Gnuplot not found, using plotters backend
Creation of 100 actors  time:   [268.98 µs 274.91 µs 281.19 µs]
                        change: [+0.4473% +4.3969% +8.4713%] (p = 0.03 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  5 (5.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

Creation of 10000 actors
                        time:   [26.730 ms 27.284 ms 27.868 ms]
                        change: [-20.488% -16.925% -13.452%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

Benchmarking Waiting on 100 actors to process first message: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.2s, enable flat sampling, or reduce sample count to 60.
Waiting on 100 actors to process first message
                        time:   [851.50 µs 884.69 µs 928.90 µs]
                        change: [-29.934% -26.442% -22.701%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

Waiting on 1000 actors to process first message
                        time:   [8.1509 ms 8.3325 ms 8.5340 ms]
                        change: [-39.237% -36.622% -33.672%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

Waiting on 100000 messages to be processed
                        time:   [17.252 ms 17.303 ms 17.358 ms]
                        change: [+11.411% +14.419% +16.342%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

     Running benches/async_traits.rs (target/release/deps/async_traits-e1754e50a02bb4eb)
Gnuplot not found, using plotters backend
Waiting on 50 messages with large data in the Future to be processed
                        time:   [124.63 µs 125.51 µs 126.45 µs]
                        change: [+2.6848% +3.7470% +4.7820%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
```
@texascloud changed the title to Automatic tracing::Span propagation on Oct 2, 2024
@texascloud (Contributor, Author):
I did some more benchmarks. I ran `cargo bench -p ractor` on main, on this PR, and on this PR with the spans not boxed. Each run is the second run, to allow for any "warming". I am ignoring criterion's "change" numbers because I noticed they fluctuated quite a bit even when I made no changes to the logic being benchmarked.

| Benchmark | main | pr_boxed | pr_unboxed |
|---|---|---|---|
| Creation of 100 actors | [279.44 µs 285.23 µs 290.89 µs] | [286.42 µs 296.90 µs 312.37 µs] | [344.79 µs 361.02 µs 381.86 µs] |
| Creation of 10000 actors | [26.891 ms 27.396 ms 27.937 ms] | [27.969 ms 29.289 ms 31.113 ms] | [31.933 ms 32.978 ms 34.170 ms] |
| Waiting on 100 actors to process first message | [753.90 µs 787.61 µs 842.21 µs] | [835.32 µs 858.61 µs 897.29 µs] | [889.10 µs 913.09 µs 938.30 µs] |
| Waiting on 1000 actors to process first message | [7.5806 ms 7.7571 ms 7.9715 ms] | [8.2428 ms 8.3540 ms 8.4691 ms] | [9.2888 ms 9.8780 ms 10.574 ms] |
| Waiting on 100000 messages to be processed | [12.822 ms 12.865 ms 12.916 ms] | [17.474 ms 17.527 ms 17.586 ms] | [15.876 ms 15.964 ms 16.110 ms] |
| Waiting on 50 messages with large data in the Future to be processed | [125.25 µs 126.03 µs 126.80 µs] | [119.46 µs 120.47 µs 121.67 µs] | [124.89 µs 126.23 µs 127.84 µs] |

codecov bot commented Oct 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.98%. Comparing base (3d6f629) to head (8a61dc1).
Report is 1 commit behind head on main.

Additional details and impacted files

```
@@            Coverage Diff             @@
##             main     #266      +/-   ##
==========================================
+ Coverage   79.95%   79.98%   +0.02%
==========================================
  Files          60       60
  Lines       10273    10287      +14
==========================================
+ Hits         8214     8228      +14
  Misses       2059     2059
```


@slawlor (Owner) commented Oct 3, 2024

We do appear to take a ~17% hit in message-processing performance (the benchmark which runs consistently in the ~13 ms range degraded to ~15 ms).

As for memory:

  1. 197951488 bytes with the span propagated
  2. 197427200 bytes with no span (a value of `None`)

measured with `/usr/bin/time -l cargo run --example a_whole_lotta_messages -r` in both configurations (span set and not set).

@slawlor force-pushed the tracing_span_propagation branch from 47ac581 to ec66563 on October 3, 2024 21:00
@slawlor force-pushed the tracing_span_propagation branch 2 times, most recently from c1fc1ae to 7312f84 on October 3, 2024 23:39
@slawlor force-pushed the tracing_span_propagation branch from 7312f84 to ad9d53f on October 3, 2024 23:49
@slawlor (Owner) commented Oct 3, 2024

I'm disabling the async-std CI suite. There's a flaky test that I can't repro outside of GitHub's CI pipeline, and no one uses it anyway (since we have a hard dependency on tokio with at least the `sync` feature).

@slawlor (Owner) left a comment


Looks good, let's merge it

@slawlor merged commit d695638 into slawlor:main on Oct 4, 2024
11 checks passed
@slawlor mentioned this pull request on Dec 6, 2024
FancyQian added a commit to Righ-inc/ractor that referenced this pull request Dec 9, 2024