Process-stealing dead lock · Issue #52 · riot-ml/riot · GitHub
Open
@leostera

Description

When running on a large number of cores, the current process-stealing implementation starts deadlocking schedulers and exposes a few other bugs:

  • a process gets queued up in several schedulers, which is likely a bug in the Proc_queue or Proc_set. Once it is terminated in one scheduler, the next scheduler that tries to run it fails, because finalized processes should never be put on a queue.

  • when moving timers around, a timer will sometimes fire on a scheduler before it has been moved out of it. Moving timers to the IO scheduler helps, and can improve timer reliability since the polling workload has a strict deadline, but it also means reworking the timeouts for receives and syscalls.
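The first bullet describes a process ending up in more than one scheduler's queue. A minimal sketch of one way to rule that out, assuming OCaml 5 and a hypothetical per-process state field (the type and function names below are illustrative, not Riot's Proc_queue or Proc_set API): a scheduler may push a process only after winning a Runnable-to-Queued compare-and-set, so a process sits in at most one queue and a Finalized process can never be queued again.

```ocaml
(* Minimal sketch, assuming OCaml 5. The state type and helper names are
   hypothetical; they are not part of Riot's API. *)

type state = Runnable | Queued | Running | Finalized

(* Constant constructors are immediate values, so the physical equality
   used by Atomic.compare_and_set compares them correctly. *)
type proc = { pid : int; state : state Atomic.t }

let make pid = { pid; state = Atomic.make Runnable }

(* Only a Runnable -> Queued transition succeeds: at most one scheduler
   can win it, and a Finalized process always loses. *)
let try_mark_queued p = Atomic.compare_and_set p.state Runnable Queued

(* The scheduler that dequeued [p] claims it before running it. *)
let try_mark_running p = Atomic.compare_and_set p.state Queued Running

(* After a time slice the running scheduler either yields or finalizes. *)
let yield p = Atomic.compare_and_set p.state Running Runnable
let finalize p = Atomic.compare_and_set p.state Running Finalized

(* Push onto a scheduler-local queue only after winning the CAS. *)
let enqueue (q : proc Queue.t) p =
  if try_mark_queued p then (Queue.push p q; true) else false

(* Two "schedulers" (domains) race to enqueue the same process. Each
   domain only touches its own queue; exactly one of them wins. *)
let () =
  let p = make 1 in
  let q1 = Queue.create () and q2 = Queue.create () in
  let d1 = Domain.spawn (fun () -> enqueue q1 p) in
  let d2 = Domain.spawn (fun () -> enqueue q2 p) in
  let won1 = Domain.join d1 and won2 = Domain.join d2 in
  assert (won1 <> won2);
  assert (Queue.length q1 + Queue.length q2 = 1)
```

Because the check and the state change are a single atomic operation, the "at most one queue" invariant does not depend on locking the queues themselves; the queues still need to be safe for whatever access pattern stealing uses.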

I've been unable to fix these with additional safeguards (such as more restrictive locking of the process queue), but I have identified that the Proc_set is not working as intended (likely due to the use of Atomics instead of a lock).
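One plausible way a set built from Atomics goes wrong is a "check membership, then insert" done as two separate atomic steps, which lets two schedulers both observe "absent" and both insert. The sketch below, assuming OCaml 5 (where Mutex is part of the standard library), shows a lock-based alternative in which the test and the insertion happen in one critical section. Locked_set and its interface are hypothetical, not Riot's actual Proc_set.

```ocaml
(* Minimal sketch, assuming OCaml 5 (Mutex is in the standard library).
   The module name and interface are hypothetical, not Riot's Proc_set. *)
module Locked_set : sig
  type t
  val create : unit -> t
  val add : t -> int -> bool (* true if the pid was not already present *)
  val remove : t -> int -> unit
  val mem : t -> int -> bool
  val size : t -> int
end = struct
  type t = { lock : Mutex.t; tbl : (int, unit) Hashtbl.t }

  let create () = { lock = Mutex.create (); tbl = Hashtbl.create 64 }

  (* Run [f] while holding the lock, releasing it even if [f] raises. *)
  let with_lock t f =
    Mutex.lock t.lock;
    Fun.protect ~finally:(fun () -> Mutex.unlock t.lock) f

  (* Membership test and insertion share one critical section, so two
     domains can never both successfully add the same pid. *)
  let add t pid =
    with_lock t (fun () ->
        if Hashtbl.mem t.tbl pid then false
        else (Hashtbl.replace t.tbl pid (); true))

  let remove t pid = with_lock t (fun () -> Hashtbl.remove t.tbl pid)
  let mem t pid = with_lock t (fun () -> Hashtbl.mem t.tbl pid)
  let size t = with_lock t (fun () -> Hashtbl.length t.tbl)
end

(* Four domains try to add the same 1000 pids; each pid is won once. *)
let () =
  let s = Locked_set.create () in
  let worker () =
    let won = ref 0 in
    for pid = 0 to 999 do
      if Locked_set.add s pid then incr won
    done;
    !won
  in
  let ds = List.init 4 (fun _ -> Domain.spawn worker) in
  let total = List.fold_left (fun acc d -> acc + Domain.join d) 0 ds in
  assert (total = 1000);
  assert (Locked_set.size s = 1000)
```

A single lock trades some throughput for an invariant that is easy to reason about; whether that cost matters at high core counts would need measuring.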

In the meantime, process stealing has been disabled on main until we figure out the next steps here.

This is a good time to step back and perhaps rewrite the scheduler as more modular pieces that are easier to reason about and test.
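As a rough illustration of splitting the scheduler into smaller, testable pieces, the sketch below puts the run queue behind a module signature so the scheduling loop can be exercised deterministically with a plain FIFO in single-threaded tests, and later instantiated with a concurrent queue. RUN_QUEUE, Fifo, and Scheduler are hypothetical names for illustration, not part of Riot.

```ocaml
(* Hypothetical decomposition; none of these names exist in Riot. *)
module type RUN_QUEUE = sig
  type 'a t
  val create : unit -> 'a t
  val push : 'a t -> 'a -> unit
  val pop : 'a t -> 'a option
end

(* A deterministic, non-thread-safe queue for single-threaded tests. *)
module Fifo : RUN_QUEUE = struct
  type 'a t = 'a Queue.t
  let create = Queue.create
  let push q x = Queue.push x q
  let pop = Queue.take_opt
end

(* The loop depends only on RUN_QUEUE, not on a concrete queue. *)
module Scheduler (Q : RUN_QUEUE) = struct
  type task = unit -> [ `Yield | `Done ]

  (* Run tasks until the queue is empty; return how many steps ran. *)
  let run_until_empty (q : task Q.t) =
    let steps = ref 0 in
    let rec loop () =
      match Q.pop q with
      | None -> !steps
      | Some task ->
          incr steps;
          (match task () with
           | `Yield -> Q.push q task
           | `Done -> ());
          loop ()
    in
    loop ()
end

module S = Scheduler (Fifo)

(* A task that yields twice before finishing takes three steps. *)
let () =
  let q = Fifo.create () in
  let counter = ref 0 in
  let task () = incr counter; if !counter < 3 then `Yield else `Done in
  Fifo.push q task;
  assert (S.run_until_empty q = 3)
```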

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests