Enable parallel build by max-au · Pull Request #2039 · erlang/rebar3 · GitHub

Enable parallel build #2039


Merged
merged 1 commit into from
Apr 4, 2019

Conversation

max-au
Contributor
@max-au max-au commented Mar 29, 2019

Support for parallel compilation of *.erl file was dropped before 3.0 release.
However, our tests for a project containing ~500 source files show substantial gain, lowering compilation time from 58 seconds to 18 on a MacBook Pro 15" (4 cores, 8 threads), and to just 10 seconds on Xeon-D machine.

@ferd
Collaborator
ferd commented Mar 29, 2019

Interesting. The build fails on OTP-17 and OTP-18 for the test detecting changes in inline declaration of parse transforms (

recompile_when_parse_transform_inline_changes(Config) ->
    AppDir = ?config(apps, Config),
    Name = rebar_test_utils:create_random_name("parse_transform_inline_"),
    Vsn = rebar_test_utils:create_random_vsn(),
    rebar_test_utils:create_app(AppDir, Name, Vsn, [kernel, stdlib]),
    ok = filelib:ensure_dir(filename:join([AppDir, "src", "dummy"])),
    ModSrc = <<"-module(example).\n"
               "-export([foo/2]).\n"
               "-compile([{parse_transform, example_parse_transform}]).\n"
               "foo(_, _) -> ok.">>,
    ok = file:write_file(filename:join([AppDir, "src", "example.erl"]),
                         ModSrc),
    ParseTransform = <<"-module(example_parse_transform).\n"
                       "-export([parse_transform/2]).\n"
                       "parse_transform(AST, _) -> AST.\n">>,
    ok = file:write_file(filename:join([AppDir, "src", "example_parse_transform.erl"]),
                         ParseTransform),
    rebar_test_utils:run_and_check(Config, [], ["compile"], {ok, [{app, Name}]}),
    EbinDir = filename:join([AppDir, "_build", "default", "lib", Name, "ebin"]),
    {ok, Files} = rebar_utils:list_dir(EbinDir),
    ModTime = [filelib:last_modified(filename:join([EbinDir, F]))
               || F <- Files, filename:basename(F, ".beam") == "example"],
    timer:sleep(1000),
    NewParseTransform = <<"-module(example_parse_transform).\n"
                          "-export([parse_transform/2]).\n"
                          "parse_transform(AST, _) -> identity(AST).\n"
                          "identity(AST) -> AST.\n">>,
    ok = file:write_file(filename:join([AppDir, "src", "example_parse_transform.erl"]),
                         NewParseTransform),
    rebar_test_utils:run_and_check(Config, [], ["compile"], {ok, [{app, Name}]}),
    {ok, NewFiles} = rebar_utils:list_dir(EbinDir),
    NewModTime = [filelib:last_modified(filename:join([EbinDir, F]))
                  || F <- NewFiles, filename:basename(F, ".beam") == "example"],
    ?assert(ModTime =/= NewModTime).
).

I've tried to restart the job on one of the two runs, and it still failed. There's no major reason why that should be, but my guess (and this is only a guess) is that there is an order in which some modules need to be compiled (behaviours and parse transforms before regular modules), and the parallel compilation here is a bit too naive -- in some cases, it is possible that an essential module like a behaviour or a parse transform finishes compiling after a module that depends on it is popped off the queue.

Let me explain.

To be reliable, this PR would need to be able to define priority steps. For example, it is possible that we have a dependency chain like behaviour A <- parse_transform B <- behaviour C <- module D where <- means "is depended on by", in which case there would need to be at least 3 sequential steps before D can be safely popped off the queue.

The problem is that the current structure you've used is the one used by the sequential compiler which splits things into 'first files' and 'rest of files'. The gotcha is that the 'first files' section you currently use sequentially contains only parse transforms that are defined in compiler options (erl_opts), but not those that are declared inline (-compile({parse_transform, Mod})) -- the inline parse transforms are found by the directed acyclic graph built in rebar_compiler as returned by rebar_compiler_erl:dependencies/3.

Basically the "first files" is the override to mandate a high priority for files that must be declared first because of compiler options invisible to analysis of .erl files, but there is still an important sequential ordering that can exist in the rest of files when you analyze them on their own.
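To make the two declaration styles concrete, here is a hedged sketch of each. The `erl_opts` and `erl_first_files` keys are regular rebar.config options; the module name `my_transform` and its path are hypothetical, chosen only for illustration. The inline form mirrors the one used in the failing test above.

```erlang
%% --- rebar.config (sketch; my_transform is a hypothetical module) ---
%% A parse transform declared in compiler options. Source analysis of the
%% .erl files cannot see it, so the file is forced to build first.
{erl_opts, [{parse_transform, my_transform}]}.
{erl_first_files, ["src/my_transform.erl"]}.

%% --- src/example.erl (separate file) ---
%% A parse transform declared inline. It is only discovered by analysing
%% the source file, so ordering has to come from the dependency graph.
-module(example).
-export([foo/2]).
-compile({parse_transform, example_parse_transform}).
foo(_, _) -> ok.
```

Only the first style is covered by the 'first files' override; the second is the case the failing test exercises.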

Parallelizing module compilation properly would require not just using a topological sort as in

DepErlsOrdered = digraph_utils:topsort(SubGraph),
but instead returning the various "levels" of the digraph traversal, where the files within a level can be compiled in parallel but the levels themselves must be processed one after another.
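As an illustration of the "levels" idea (not the code used in this PR or in #2040), here is a minimal sketch built on the standard `digraph` module. The module name `dep_levels` is hypothetical, and the sketch assumes that an edge From -> To means "From depends on To"; the edge direction used by rebar3 itself should be double-checked.

```erlang
-module(dep_levels).
-export([levels/1, demo/0]).

%% Assumption for this sketch: an edge From -> To means "From depends on To".
%% Returns a list of levels. Every vertex in a level depends only on
%% vertices found in earlier levels.
levels(G) ->
    levels(G, digraph:vertices(G), sets:new(), []).

levels(_G, [], _Done, Acc) ->
    lists:reverse(Acc);
levels(G, Pending, Done, Acc) ->
    %% A vertex is ready once all of its dependencies are already placed.
    Ready = [V || V <- Pending,
                  lists:all(fun(D) -> sets:is_element(D, Done) end,
                            digraph:out_neighbours(G, V))],
    case Ready of
        [] ->
            erlang:error({cyclic_dependencies, Pending});
        _ ->
            NewDone = lists:foldl(fun sets:add_element/2, Done, Ready),
            levels(G, Pending -- Ready, NewDone, [lists:sort(Ready) | Acc])
    end.

%% The chain from the comment above (A <- B <- C <- D) plus an
%% unrelated module e that has no dependencies.
demo() ->
    G = digraph:new([acyclic]),
    [digraph:add_vertex(G, V) || V <- [a, b, c, d, e]],
    digraph:add_edge(G, b, a),
    digraph:add_edge(G, c, b),
    digraph:add_edge(G, d, c),
    Levels = levels(G),
    digraph:delete(G),
    Levels.
```

Calling `dep_levels:demo()` returns `[[a,e],[b],[c],[d]]`: `a` and `e` can be built together, while `b`, `c` and `d` each need their own sequential step, matching the "at least 3 sequential steps before D" observation.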

Until then, this patch (and this is still just a guess) has a heavy chance of breaking random builds.

@ferd
Collaborator
ferd commented Mar 29, 2019

Oh and the real sucky part; it seems like the current flat list approach is part of the new compiler interface, which makes it really annoying to just create more parallel groups. If my hunch on the error is right, it might make sense to instead take the topological sort, and force all the depended-on modules into the first files to turn on parallelism safely. This could be more conservative with more sequential files than the currently proposed approach in this PR, but otherwise safe.

I'm thinking that the topological sort's files with no out-neighbours (iirc, the outgoing edges represent "is depended on", but we should double-check) should be safe to parallelize while the others need to keep their relative order.

EDIT: that split approach wouldn't work because the compiler options for files built in erl_first_files remove the global parse transform options. So if a parse transform relies on an erl_opts parse transform value itself, things will break. There won't be a choice but to use some kind of list of levels.
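A rough sketch of how a list of levels could be consumed: each level is processed by parallel workers, and the next level only starts once every worker of the current one has finished. The module name `par_levels` and the `Fun` callback are hypothetical stand-ins for the real compile step; this is not the implementation from #2040.

```erlang
-module(par_levels).
-export([run/2]).

%% Apply Fun to every item of each level in parallel, but finish a level
%% completely before starting the next one.
run(Fun, Levels) ->
    lists:foreach(fun(Level) -> run_level(Fun, Level) end, Levels).

run_level(Fun, Items) ->
    %% One monitored worker process per item in the level.
    Refs = [element(2, spawn_monitor(fun() -> Fun(I) end)) || I <- Items],
    lists:foreach(fun wait/1, Refs).

%% Block until the worker behind Ref has exited; fail on abnormal exit.
wait(Ref) ->
    receive
        {'DOWN', Ref, process, _Pid, normal} ->
            ok;
        {'DOWN', Ref, process, _Pid, Reason} ->
            erlang:error({worker_failed, Reason})
    end.
```

For example, `par_levels:run(fun(F) -> io:format("compiling ~p~n", [F]) end, [[a, e], [b], [c], [d]])` prints `a` and `e` in either order, then `b`, `c` and `d` strictly in sequence. A real implementation would likely also cap the number of concurrent workers.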

@ferd ferd mentioned this pull request Mar 29, 2019
@ferd
Collaborator
ferd commented Mar 29, 2019

@max-au Check out #2040 where I implement one version that should respect the sequential dependencies better. Let me know if it still works fast enough for your use cases too.

@tsloughter
Collaborator

Because there is a slow down if there aren't a lot of files to compile this needs to not do a parallel compile if there are <N to compile. Not sure what a good N is :)

@max-au
Contributor Author
max-au commented Mar 29, 2019

Because there is a slow down if there aren't a lot of files to compile this needs to not do a parallel compile if there are <N to compile. Not sure what a good N is :)

We have projects with >500 files. I am doing a performance run on the fixed (#2040) version, and it seems to be just perfect. And I can't see a visible slowdown on smaller projects.

@ferd ferd merged commit ae0af35 into erlang:master Apr 4, 2019