HDFS-16582. Expose aggregate latency of slow node as perceived by the reporting node #4323
🎊 +1 overall
This message was automatically generated.
@saintstack @jojochuang @tomscut could you please review this PR?
Looks like a straightforward change. LGTM, +1.
Thanks @virajjasani
 * [de]serialization easy.
 */
@InterfaceAudience.Private
final class SlowPeerJsonReport {
this class is adapted from the subclass ReportForJson in SlowPeerTracker.
@tomscut would you like to give it a review too?
Thanks @jojochuang for pinging me, I'm looking at this.
LGTM.
Thanks @virajjasani for the contribution! Thanks @jojochuang for the review!
… reporting node (#4323) Reviewed-by: Wei-Chiu Chuang <weichiu@apache.org> Signed-off-by: Tao Li <tomscut@apache.org>
Description of PR
When a datanode is reported as slow by another node, we expose the slow node along with the list of nodes that reported it. However, we don't expose the latency of the slow node as observed by each reporting node. Exposing that latency in the metrics would help operators track how far a given slow node lags behind the rest of the nodes in the cluster.
The operator should be able to gather aggregated latencies of all slow nodes with their reporting nodes in Namenode metrics.
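To illustrate the shape of data described above, here is a minimal sketch of a per-slow-node report that carries the latency seen by each reporting node. It is not the code from this patch: the class name `SlowNodeLatencySketch`, the JSON field names, and the datanode addresses are all hypothetical, and the JSON is built by hand only to keep the example free of dependencies.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

/** Hypothetical sketch; not the class added by this patch. */
public class SlowNodeLatencySketch {
  // slow node -> (reporting node -> aggregate latency in ms).
  // TreeMap keeps the output order stable for both levels.
  private final Map<String, Map<String, Double>> reports = new TreeMap<>();

  /** Records the latency reportingNode observed for slowNode, replacing any earlier value. */
  public void addReport(String slowNode, String reportingNode, double latencyMs) {
    reports.computeIfAbsent(slowNode, k -> new TreeMap<>()).put(reportingNode, latencyMs);
  }

  /** Renders all slow nodes with their reporting nodes and latencies (field names are assumed). */
  public String toJson() {
    StringJoiner outer = new StringJoiner(",", "[", "]");
    for (Map.Entry<String, Map<String, Double>> e : reports.entrySet()) {
      StringJoiner inner = new StringJoiner(",", "[", "]");
      for (Map.Entry<String, Double> r : e.getValue().entrySet()) {
        inner.add("{\"ReportingNode\":\"" + r.getKey()
            + "\",\"ReportedLatency\":" + r.getValue() + "}");
      }
      outer.add("{\"SlowNode\":\"" + e.getKey() + "\",\"ReportingNodes\":" + inner + "}");
    }
    return outer.toString();
  }

  public static void main(String[] args) {
    SlowNodeLatencySketch sketch = new SlowNodeLatencySketch();
    // Two nodes report the same slow node with different observed latencies.
    sketch.addReport("dn3:9866", "dn1:9866", 12.5);
    sketch.addReport("dn3:9866", "dn2:9866", 30.0);
    System.out.println(sketch.toJson());
  }
}
```

Keeping the latency next to each reporting node, rather than a single number per slow node, lets an operator see whether only one peer perceives the node as slow or whether several peers agree.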
How was this patch tested?
Dev cluster and UT.
For code changes: