8000 Logging (1/2): new delay detector by nikurt · Pull Request #9540 · near/nearcore · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Logging (1/2): new delay detector #9540

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Sep 21, 2023
Merged

Logging (1/2): new delay detector #9540

merged 16 commits into from
Sep 21, 2023

Conversation

nikurt
Copy link
Contributor
@nikurt nikurt commented Sep 19, 2023

Improves the logging system to automatically export durations of spans as metrics and detect spans taking too long.
This PR compliments the functionality Grafana Tempo.

Changes:

  • Adds a tracing layer that keeps track of the duration of spans.
  • Exports durations of spans to Prometheus.
  • Logs spans that take longer than 1 sec to level=DEBUG target=delay_detector.

Performance impact should be minimal.
Memory usage increase is minimal.

@nikurt nikurt marked this pull request as ready for review September 20, 2023 12:53
@nikurt nikurt requested a review from a team as a code owner September 20, 2023 12:53
@nikurt nikurt requested review from akhi3030, nagisa and wacban and removed request for akhi3030 September 20, 2023 12:53
@nikurt nikurt changed the title Logging: new delay detector Logging (1/2): new delay detector Sep 20, 2023
Copy link
Contributor
@wacban wacban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥇

use tracing_subscriber::Layer;

#[derive(Default)]
pub(crate) struct DelayDetectorLayer {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about TracingSpanDurationLogger or Subscriber?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the suggestion. SpanDurationLogger should be clear enough.

let mut extensions = span.extensions_mut();
if let Some(timings) = extensions.get_mut::<Timings>() {
let now = Instant::now();
timings.idle += (now - timings.last).as_nanos() as u64;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Instant::duration_since, or preferably checked_duration_since. Although Instant should never go backwards, that unfortunately does not always hold experimentally, and we would rather not have neard crash because of a debugging feature.

Copy link
Contributor Author
@nikurt nikurt Sep 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duration_since() doesn't panic given a timestamp later than self, but saturates: https://doc.rust-lang.org/std/time/struct.Instant.html#method.duration_since

let span = ctx.span(&id).expect("Span not found, this is a bug");
let extensions = span.extensions();
if let Some(Timings { busy, mut idle, last }) = extensions.get::<Timings>() {
idle += (Instant::now() - *last).as_nanos() as u64;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would perhaps add a method to Timings along the lines of

impl Timings {
    fn observe_idle(&mut self) {
         let previous = std::mem::replace(self.last, Instant::now());
         self.idle += self.last.checked_duration_since(previous).map_or(0, |d| d.as_nanos());
    }
}

Thus reusing the logic between this and on_enter.

@nikurt nikurt requested a review from nagisa September 20, 2023 16:29
nikurt and others added 9 commits September 20, 2023 18:35
The `lib.rs` file accumulated structs and functionality from different areas.
Split the file into several smaller files.
The `lib.rs` file accumulated structs and functionality from different areas.
Split the file into several smaller files.
The `lib.rs` file accumulated structs and functionality from different areas.
Split the file into several smaller files.
@nikurt nikurt added this pull request to the merge queue Sep 21, 2023
Merged via the queue into master with commit e58d166 Sep 21, 2023
@nikurt nikurt deleted the nikurt-delays branch September 21, 2023 16:27
nikurt added a commit that referenced this pull request Sep 22, 2023
* Logging: New delay detector

* Logging: New delay detector

* Logging: New delay detector

* Logging: New delay detector

* Rename to SpanDurationLogger

* Use Durations instead of nanoseconds

* refactor: Tracing (#9544)

The `lib.rs` file accumulated structs and functionality from different areas.
Split the file into several smaller files.

* Logging: New delay detector

* Rename to SpanDurationLogger

* Use Durations instead of nanoseconds

* Use Durations instead of nanoseconds

* refactor: Tracing (#9544)

The `lib.rs` file accumulated structs and functionality from different areas.
Split the file into several smaller files.

* Logging: New delay detector

* refactor: Tracing (#9544)

The `lib.rs` file accumulated structs and functionality from different areas.
Split the file into several smaller files.

* Use Durations instead of nanoseconds
github-merge-queue bot pushed a commit that referenced this pull request Sep 25, 2023
Continues #9540

Deletes the `DelayDetector` class and replaces the uses of it with
spans.

---------

Co-authored-by: Anton Puhach <anton@near.org>
Co-authored-by: Simonas Kazlauskas <git@kazlauskas.me>
Co-authored-by: Jakob Meier <mail@jakobmeier.ch>
nikurt added a commit that referenced this pull request Sep 26, 2023
* Logging: New delay detector

* Logging: New delay detector

* Logging: New delay detector

* Logging: New delay detector

* Rename to SpanDurationLogger

* Use Durations instead of nanoseconds

* refactor: Tracing (#9544)

The `lib.rs` file accumulated structs and functionality from different areas.
Split the file into several smaller files.

* Logging: New delay detector

* Rename to SpanDurationLogger

* Use Durations instead of nanoseconds

* Use Durations instead of nanoseconds

* refactor: Tracing (#9544)

The `lib.rs` file accumulated structs and functionality from different areas.
Split the file into several smaller files.

* Logging: New delay detector

* refactor: Tracing (#9544)

The `lib.rs` file accumulated structs and functionality from different areas.
Split the file into several smaller files.

* Use Durations instead of nanoseconds
nikurt added a commit that referenced this pull request Sep 26, 2023
Continues #9540

Deletes the `DelayDetector` class and replaces the uses of it with
spans.

---------

Co-authored-by: Anton Puhach <anton@near.org>
Co-authored-by: Simonas Kazlauskas <git@kazlauskas.me>
Co-authored-by: Jakob Meier <mail@jakobmeier.ch>
nikurt added a commit that referenced this pull request Sep 26, 2023
* Logging: New delay detector

* Logging: New delay detector

* Logging: New delay detector

* Logging: New delay detector

* Rename to SpanDurationLogger

* Use Durations instead of nanoseconds

* refactor: Tracing (#9544)

The `lib.rs` file accumulated structs and functionality from different areas.
Split the file into several smaller files.

* Logging: New delay detector

* Rename to SpanDurationLogger

* Use Durations instead of nanoseconds

* Use Durations instead of nanoseconds

* refactor: Tracing (#9544)

The `lib.rs` file accumulated structs and functionality from different areas.
Split the file into several smaller files.

* Logging: New delay detector

* refactor: Tracing (#9544)

The `lib.rs` file accumulated structs and functionality from different areas.
Split the file into several smaller files.

* Use Durations instead of nanoseconds
nikurt added a commit that referenced this pull request Sep 26, 2023
Continues #9540

Deletes the `DelayDetector` class and replaces the uses of it with
spans.

---------

Co-authored-by: Anton Puhach <anton@near.org>
Co-authored-by: Simonas Kazlauskas <git@kazlauskas.me>
Co-authored-by: Jakob Meier <mail@jakobmeier.ch>
nikurt added a commit that referenced this pull request Oct 11, 2023
Continues #9540

Deletes the `DelayDetector` class and replaces the uses of it with
spans.

---------

Co-authored-by: Anton Puhach <anton@near.org>
Co-authored-by: Simonas Kazlauskas <git@kazlauskas.me>
Co-authored-by: Jakob Meier <mail@jakobmeier.ch>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0