-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HDFS-16396. Reconfig slow peer parameters for datanode #3827
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Hi @jojochuang @ayushtkn @sodonnel @Hexiaoqiao @ferhui , could you please take a look at this. Thanks. |
🎊 +1 overall
This message was automatically generated. |
...-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
Outdated
Show resolved
Hide resolved
peerMetrics = DataNodePeerMetrics.create(getDisplayName(), getConf()); | ||
} | ||
} else { | ||
peerMetrics = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
peerMetrics
isn't synchronised and it is being fetched by BpServiceActor & BlockReciever. I doubt in race conditions this may lead to some NPE there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
peerMetrics
isn't synchronised and it is being fetched by BpServiceActor & BlockReciever. I doubt in race conditions this may lead to some NPE there
Thanks @ayushtkn for your comment.
To avoid NPE, I did not set peerMetrics
to null when disable, but peerMetrics
remains. Do you think such a solution is feasible? Thank you.
...-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
Outdated
Show resolved
Hide resolved
...p-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeReconfiguration.java
Outdated
Show resolved
Hide resolved
Hi @tamaashu , please also take a look at this. Thank you. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
...p-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodePeerMetrics.java
Outdated
Show resolved
Hide resolved
this.slowNodeDetector = | ||
new OutlierDetector(minOutlierDetectionNodes, lowThresholdMs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this.slowNodeDetector
has to update minOutlierDetectionNodes
and lowThresholdMs
after reconfiguring them, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tasanuma for your comment, and it makes sense to me. I updated the code, please have a look when you are free.
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@ayushtkn Could you check that the latest PR resolves your concern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org> (cherry picked from commit 0c194f2)
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
JIRA: HDFS-16396.
In large clusters, rolling restart datanodes takes a long time. We can make slow peers parameters and slow disks parameters in datanode reconfigurable to facilitate cluster operation and maintenance.