vine: remove invalid files on worker #4133
Combining with this PR: the long tails and execution sparsity seem to be gone, all file transfers can complete, and most files reach their desired replica count (10). |
@JinZhou5042 I don't doubt that this fixes your immediate problem. But it's pretty obvious that this fix is not a good general solution. If you delete a replica at the source every time a transfer fails, then problems at the receiving end are going to destroy good data aggressively. We need to better understand the source of the problem in order to design a good solution. Can you explain why these transfers are failing in the first place? |
@dthain It turned out that when we receive the
I believe the sequence is as follows:
Even if we don't want to aggressively delete files from the source worker, at least deleting them from the destination worker that returns the |
And
I believe that file has been correctly removed on the worker due to the transfer failure, so what really matters is that we should remove it from the replica table? |
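To make the bookkeeping under discussion concrete, here is a minimal Python sketch of the less aggressive option: when a transfer fails, drop only the destination's record from the replica table and leave the source replica alone. This is not TaskVine's code. `ReplicaTable`, `handle_transfer_failure`, and the worker and file names are all hypothetical, and the sketch assumes the manager records a replica entry for the destination as soon as a transfer starts.

```python
from collections import defaultdict


class ReplicaTable:
    """Hypothetical manager-side map: cache name -> workers believed to hold it."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def add(self, cachename, worker):
        self._replicas[cachename].add(worker)

    def remove(self, cachename, worker):
        workers = self._replicas.get(cachename)
        if workers is None:
            return
        workers.discard(worker)
        if not workers:
            # Drop the empty entry so no stale key lingers in the table.
            del self._replicas[cachename]

    def holders(self, cachename):
        # Return a copy so callers cannot mutate the table by accident.
        return set(self._replicas.get(cachename, set()))


def handle_transfer_failure(table, cachename, source, destination):
    """Assumed policy: forget the failed destination record, keep the source.

    The source is deliberately left untouched, since a failure at the
    receiving end does not show that the source copy is bad.
    """
    table.remove(cachename, destination)
```

With this policy a failed transfer from `w1` to `w2` leaves `w1` as the only recorded holder, so no stale destination record remains and no good data is discarded.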
Could it be that, the transfer has failed, but a zombie record with |
@JinZhou5042 you are making a lot of hypotheses that could be answered by reading the code. :) Look at It's ok to hypothesize potential bugs, but then you need to show a path through the code that is clearly incorrect. Don't just guess! |
Ah yes, I missed a couple of lines. |
See #4152 |
Proposed Changes
For #4134
Merge Checklist
The following items must be completed before PRs can be merged.
Check these off to verify you have completed all steps.
make test: Run local tests prior to pushing.
make format: Format source code to comply with lint policies. Note that some lint errors can only be resolved manually (e.g., Python).
make lint: Run lint on source code prior to pushing.