Closed
Description
This is another cause of workflow slowdowns.
If a transfer fails with a `cache-invalid` message, it means the source worker is unable to fetch the file from the destination worker, indicating that the file is corrupted or otherwise unusable on either the source or the destination.
To prevent future replication attempts or task executions from using this invalid file, we need to remove it from both the source and destination workers.
Otherwise, the manager may get stuck trying to use what it thinks are valid files.
Specifically, we should:
- remove the replica from `vine_file_replica_table`
- explicitly clean the file by sending an `unlink` message
Calling `delete_worker_file` seems appropriate.
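
To make the proposed cleanup concrete, here is a minimal, self-contained sketch in C. It does not use the real TaskVine data structures: `struct replica_table`, `table_add`, `table_has`, `mock_delete_worker_file`, and `handle_cache_invalid` are all hypothetical stand-ins, and the printed `unlink` line only illustrates the message that would be sent. The actual signature and behavior of `delete_worker_file` may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_REPLICAS 16

/* Hypothetical stand-in for one (worker, file) record in the replica table. */
struct replica {
	char worker[32];
	char cachename[64];
	int in_use;
};

/* Hypothetical stand-in for the manager's view of all replicas. */
struct replica_table {
	struct replica entries[MAX_REPLICAS];
	int unlinks_sent; /* counts simulated unlink messages */
};

/* Record that `worker` holds a replica of `cachename`. Returns 1 on success, 0 if full. */
int table_add(struct replica_table *t, const char *worker, const char *cachename)
{
	assert(t && worker && cachename);
	for (int i = 0; i < MAX_REPLICAS; i++) {
		struct replica *r = &t->entries[i];
		if (!r->in_use) {
			snprintf(r->worker, sizeof(r->worker), "%s", worker);
			snprintf(r->cachename, sizeof(r->cachename), "%s", cachename);
			r->in_use = 1;
			return 1;
		}
	}
	return 0;
}

/* Returns 1 if the table still lists a replica of `cachename` on `worker`. */
int table_has(const struct replica_table *t, const char *worker, const char *cachename)
{
	assert(t && worker && cachename);
	for (int i = 0; i < MAX_REPLICAS; i++) {
		const struct replica *r = &t->entries[i];
		if (r->in_use && !strcmp(r->worker, worker) && !strcmp(r->cachename, cachename))
			return 1;
	}
	return 0;
}

/* Mock of the cleanup: forget the replica, then tell the worker to unlink the file. */
void mock_delete_worker_file(struct replica_table *t, const char *worker, const char *cachename)
{
	assert(t && worker && cachename);
	for (int i = 0; i < MAX_REPLICAS; i++) {
		struct replica *r = &t->entries[i];
		if (r->in_use && !strcmp(r->worker, worker) && !strcmp(r->cachename, cachename))
			r->in_use = 0;
	}
	/* Sent even when no record exists, in case the file is still on the worker's disk. */
	printf("to %s: unlink %s\n", worker, cachename);
	t->unlinks_sent++;
}

/* On cache-invalid, clean the file from both ends of the failed transfer. */
void handle_cache_invalid(struct replica_table *t, const char *source, const char *destination, const char *cachename)
{
	mock_delete_worker_file(t, source, cachename);
	mock_delete_worker_file(t, destination, cachename);
}
```

In this sketch, calling `handle_cache_invalid(&t, "workerA", "workerB", "file-abc")` removes the records for both workers and leaves replicas on any other worker untouched, so later scheduling decisions no longer treat the two invalid copies as usable.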