vine: race condition in file unlink and cache update · Issue #4152 · cooperative-computing-lab/cctools · GitHub
vine: race condition in file unlink and cache update #4152
Open
@JinZhou5042

Description

This is a race condition where the cache-update message may arrive after the file has already been deleted (unlink) on the worker. As a result, the manager mistakenly believes the file is still available, leading to a series of issues. For example:

  • The replica table becomes inconsistent, with incorrect replica counts. This can cause chaos under heavy file volumes and frequent file pruning.

  • Tasks are scheduled to workers without their inputs. When inputs are found missing, the worker forsakes the tasks. Worse still, if that worker is the only replica holder, the task is rescheduled repeatedly and forsaken over and over again, resulting in a deadlock.
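The reschedule loop in the second bullet can be sketched with a toy model. This is not TaskVine code: `schedule`, `replica_table`, `worker_caches`, and `max_attempts` are hypothetical names, and the sketch assumes the stale replica entry is never corrected after a task is forsaken. The attempt cap exists only so the example terminates.

```python
def schedule(replica_table, worker_caches, input_file, max_attempts=5):
    """Count how often a task is forsaken before the attempt cap is hit.

    replica_table: the manager's view, file name -> set of worker ids.
    worker_caches: what each worker really holds, worker id -> set of files.
    """
    forsaken = 0
    for _ in range(max_attempts):
        holders = replica_table.get(input_file, set())
        if not holders:
            break  # no holder recorded, so the task would simply wait
        worker = sorted(holders)[0]  # deterministic pick for the example
        if input_file in worker_caches[worker]:
            return forsaken  # input really is there, the task can run
        # The worker finds the input missing and forsakes the task.
        # Assumption: the manager's table is left unchanged.
        forsaken += 1
    return forsaken


# The manager believes w2 holds "f", but w2 already unlinked it.
replica_table = {"f": {"w2"}}
worker_caches = {"w1": set(), "w2": set()}
attempts = schedule(replica_table, worker_caches, "f")
print(attempts)  # 5: every attempt goes to w2 and is forsaken
```

Without the cap the loop would never end, since w2 is the only recorded holder and the table never changes.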

The following figure illustrates this symptom:

[Image: timeline of the race condition]

At T3, the following two things happen simultaneously:

  • the manager unlinks a file from two sources and clears the replica table
  • one worker has got that file and sent the cache-update message, but this message is on the way or queued up in the link

Eventually, the two workers correctly remove the file upon receiving unlink, but the manager wrongly updates the replica table upon receiving the outdated cache-update.
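The interleaving can be reproduced with a small simulation. This is a minimal sketch, not the TaskVine implementation: the `Manager` and `Worker` classes, the message tuples, and the queue layout are all assumptions made for illustration. It only shows how an in-flight cache-update can re-create a replica entry that the manager has just cleared.

```python
from collections import deque


class Worker:
    def __init__(self, wid):
        self.wid = wid
        self.cache = set()     # files actually present on the worker
        self.inbox = deque()   # manager -> worker messages
        self.outbox = deque()  # worker -> manager messages, may lag behind

    def receive_file(self, name):
        self.cache.add(name)
        self.outbox.append(("cache-update", name))

    def drain_inbox(self):
        while self.inbox:
            kind, name = self.inbox.popleft()
            if kind == "unlink":
                self.cache.discard(name)


class Manager:
    def __init__(self):
        self.replicas = {}  # replica table: file name -> set of worker ids

    def unlink(self, name, workers):
        # Send unlink to each worker and clear the table entry right away.
        for w in workers:
            w.inbox.append(("unlink", name))
        self.replicas.pop(name, None)

    def handle(self, wid, msg):
        kind, name = msg
        if kind == "cache-update":
            # No check that the file was unlinked in the meantime.
            self.replicas.setdefault(name, set()).add(wid)


manager = Manager()
w1, w2 = Worker("w1"), Worker("w2")

# Before T3: w1 holds "f" and the manager knows about it.
w1.cache.add("f")
manager.replicas["f"] = {"w1"}

# T3: w2 finishes receiving "f"; its cache-update is queued, not yet delivered.
w2.receive_file("f")
# T3: the manager unlinks "f" on both workers and clears the replica table.
manager.unlink("f", [w1, w2])

# Both workers process unlink and remove the file.
w1.drain_inbox()
w2.drain_inbox()

# The outdated cache-update finally reaches the manager.
while w2.outbox:
    manager.handle(w2.wid, w2.outbox.popleft())

print(manager.replicas)    # {'f': {'w2'}}
print(w1.cache, w2.cache)  # set() set()
```

At the end neither worker holds the file, yet the replica table lists w2 as a holder.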

#4134, #4084, and #4038 may be partly related to this issue.

Metadata

Assignees

No one assigned

    Labels

    TaskVine, bug (for modifications that fix a flaw in the code), critical
