8000 Attempt to fix some race conditions in local subincludes by Tatskaari · Pull Request #3354 · thought-machine/please · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Attempt to fix some race conditions in local subincludes #3354

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Mar 17, 2025

Conversation

Tatskaari
Copy link
Member
@Tatskaari Tatskaari commented Mar 13, 2025

Overview

This is a followup to #3305

Please has a race condition where it will erroneously report that a subinclude must appear after the target it is subincluding. This happens during preloaded subincludes, because the base package is nil, but we use the context package (i.e. the package the subinclude target was defined in) to check if a subinclude is local. This means that the pre-load is considered a local subinclude, which seems erroneous.

This was introduced with this PR that attempted to resolve a subinclude lockup:
#3305

The lockup would happen under the following conditions:

  1. We have two build defs in a file defined :def_2 and :def_1 (in that order)
  2. :def_2 subincludes :def_1
  3. The BUILD file these are defined in also subincludes :def_1
  4. We build, and begin parsing :def_2, reaching the subinclude(":def_1") from there before registering :def_1 to the build graph.

The main BUILD file would have registered :def_1 to the build graph in an inactive state. Normally we would queue :def_1 up again in subincludeTarget() (link), and it would be activated in the parse step (link). This doesn't happen because we have a map where we sync parses of subincludesb (link). This means the target would never be activated, and nothing can make progress.

Race condition breakdown

The sequence of events before this PR are:

Go-routine 1:

  1. Preload(":def_2")
  2. Wait for build and download :def_2 (triggers Go-routine to parse //a:all for :defs_2)
  3. Parse def_2.build_defs into the preload scope
  4. From defs_2.build_defs, eval subinclude(":defs_1")
  5. blow up because :defs_1 hasn't been added to the build graph yet (due to the sleep we added to induce this race), and this is considered a local subinclude

Go-routine 2:

  1. Parse //a:all for :defs_2
  2. Register and activate :defs_2, which is built successfully
  3. sleep(100)
  4. Register by don't activate :defs_1
  5. Done

With the fix to isLocal, we get the following change to go-routine 1:
5) Wait for built target (defs_1) because it's not local anymore
6) Add pending target and wait for build as the package already exists
7) Wait forever because the target was never activated

To fix this, when adding a target to the build graph, we check if the target is a pending target. If it is, we activate it. This means that at step 4, we activate :defs_1, and it gets built successfully unblocking the build. This is a bit grungy as we loose the original target when activating the target.

@Tatskaari
Copy link
Member Author
Tatskaari commented Mar 13, 2025

I think it should be possible to do this without the hack. The sequence of events should be:

  1. [Preload thread] Queue //a:defs_2 up for build
  2. [Preload thread] Wait for built target
  3. [Parse thread 1] Sync package parse, and wins the race for //a:all - begin parsing the BUILD file
  4. [Parse thread 1] Activate //a:defs_2, triggering it to be built
  5. [Parse thread 1] Sleep 100
  6. [Build thread] Build //a:defs_2
  7. [Preload thread] Wake up and start parsing defs_2.build_defs
  8. [Preload thread] `subinclude("//defs_1")
  9. [Preload thread] Wait for built target queues //a:all up for //a:defs_1
  10. [Parse thread 2] Sync package parse and loose the race to the other parse thread and wait
  11. [Parse thread 1] Wake up and continue parsing
  12. [Parse thread 1] subinclude("defs_2")
  13. [Parse thread 1] Wait for built target finds the target, and continues
  14. [Parse thread 1] parses the build def, finds `subcinclude("defs_1") and queues this up
  15. [Parse thread 3] Picks up the parse task. The target is in the build graph now, so this can probably just activate the target and return, before syncing for the package parse.

The problem is that step 14 hangs. We have some logic to sync subincludes to avoid duplicate work. The subinclude for "defs_1" is held by the preload thread by the time parse thread 1 gets there. I'm not sure how to work around this. Maybe we can just pay the price of having multiple things parse subincludes until it's written to the map:

  1. Check the map to see if it's there but don't lock it
  2. Parse the subinclude
  3. Add the subinclude unless somebody else beat us to it.

I think maybe the existing solution here is acceptable too though.

@Tatskaari Tatskaari requested a review from peterebden March 13, 2025 19:48
@Tatskaari Tatskaari requested a review from toastwaffle March 17, 2025 12:55
Comment on lines 134 to 139
// sleep can be used to induce race conditions in please
func sleep(s *scope, _ []pyObject) pyObject {
time.Sleep(100 * time.Millisecond)
return None
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can probably remove this. It's useful to debug these kind of things though.

@@ -71,6 +71,10 @@ func (m *Map[K, V]) Get(key K) V {
return v
}

func (m *Map[K, V]) Contains(key K) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted irl, this is a potential performance hit as we'll do this for every build target, but for now it's a price we're willing to pay. We'll keep an eye on the performance test

@@ -893,11 +893,13 @@ func (state *BuildState) WaitForPackage(l, dependent BuildLabel, mode ParseMode)
}

func (state *BuildState) WaitForBuiltTarget(l, dependent BuildLabel, mode ParseMode) *BuildTarget {
if t := state.Graph.Target(l); t != nil {
t := state.Graph.Target(l)
if t != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I reading this right that the behaviour hasn't actually changed here? You don't appear to be referring to t later in the function, so this could be:

if t := state.Graph.Target(l); t != nil && t.State().IsBuilt {
  return t
}

Could also just change back to what it was before to avoid the no-op change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops yeah I changed this to debug log some stuff.

@@ -0,0 +1,2 @@
[Parse]
PreloadSubincludes = //a:defs_2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitty mcnitface: plz add trailing newline

@Tatskaari Tatskaari merged commit f8ab20e into thought-machine:master Mar 17, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0