8000 New Build System by pmocz · Pull Request #791 · MESAHub/mesa · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

New Build System #791

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 84 commits into
base: main
Choose a base branch
from
Open

New Build System #791

wants to merge 84 commits into from

Conversation

pmocz
Copy link
Member
@pmocz pmocz commented Mar 2, 2025

by Vincent Vanlaer

@warrickball
Copy link
Contributor

I decided that I'll fight with ifort after the build system is refreshed, so I configured our University cluster to run using the SDK. It submitted test results over the course of last night. There are 25 failures but I want to get some big picture input on that result.

tl;dr: We have two options:

  1. We could probably go back to whatever compiler options meant that stuff runs under the current build system.
  2. We can try to fix a lot of issues that have been raised first.

I'm strongly in favour of 2. I think these runs have flagged some genuine issues that are worth fixing now rather than sweeping back under a rug of compiler options. Or maybe some of these are fixed on main or just missing one or two reasonable compiler flags?

Regarding those issues, they actually fall into some large categories.

  • double_bh, 15M_dynamo, accreted_material_j, high_rot_darkening and ppisn fail because of a checksum difference in the final model when it's computed from a restart. This is presumably a symptom of something not being correctly passed into and out of the photo, though I can't tell at a glance if it's the same thing.
  • 1M_thermohaline, 20M_z2m2_high_rotation, conserve_angular_momentum and magnetic_braking hit a shared issue, where it looks like s% D_ST(2) is NaN on restart. I guess this is also something not being restored from a photo correctly.
  • The RSP tests all fail because of the same energy conservation test.
  • The astero failures are all caused by index out of bounds errors in ADIPLS. I'll make investigating these my next priority.

The remaining tests (carbon_kh, relax_composition_j_entropy and simplex_solar_calibration) have unique issues. simplex_solar_calibration has an unusual structure that includes a bit of astero that BillP isolated. I think the build system just needs updating here.

@warrickball
Copy link
Contributor

Also probably worth mentioning that this is a continuation of this original external PR (#724).

@VincentVanlaer VincentVanlaer force-pushed the VincentVanlaer/build-system branch from fecf573 to 971fa3c Compare April 16, 2025 18:42
@VincentVanlaer
Copy link
Collaborator

I have rebased on main cleaned up some things (e.g. extracted #808).

There are some test cases which run into issues. From what I could see running tests locally (i.e. without checksums), the ccsn, all rsp tests, and all adipls tests fail with FPEs. While these can easily be fixed by initializing certain variables to zero at somewhat arbitrary points, it doesn't feel like a clean solution. Moreover, adipls is not failing on some uninitialized variable, but on an actual divide by zero.

I actually prefer going for option one, I have added a commit that disables FPEs by default. Syncing this PR with main was rather painful, so I would rather see this merged sooner than later (as long as there are no outstanding issues of course). Since the option to enabled FPEs still exists, getting the tests/other parts of MESA to cooperate can still be done in the future.

I am running the test suite on the local cluster with the latest changes, and I'll see whether any other issues crop up. This is without the checksums, since they are not in the repository.

@VincentVanlaer
Copy link
Collaborator

Something I forgot to add is that the default insertion of NaNs could have also been a reason for the difference in the photos.

@VincentVanlaer VincentVanlaer force-pushed the VincentVanlaer/build-system branch from ccbe028 to d248833 Compare April 17, 2025 22:12
This commit introduces a new build system (still based on make files).
Reasons for replacing the previous build system were:

- the previous build system had a lot of duplication between the
  modules, both in terms of make files and shell scripts.
- parallelisation was limited. The new build system parallelises better
  within a module, and between modules.
- build scripts and output from the build ended up in the same folder.
  With the new system, all build output files are stored in the folder
  ``build`` in the repository root, making cleaning build files
  equivalent to deleting this folder

For the details of the implementation, see the newly added documentation
(Developing -> Build system).

Some further notable changes:

- makedepf90 has been replaced by a perl script that scans the fortran
  source files. This allows for better flexibility in construction the
  make files.
- Currently, `pymesa` does not work with the new build system, as the
  new system does not provide a way to build shared libraries for every
  module.
- Dependencies are being discovered using pkg-config. This change
  makes it easier to build MESA without the SDK, as linux distributions
  typically ship with pkg-config files.
This commit adds the necessary pkg-config files until the SDK ships
them.
@warrickball
Copy link
Contributor

That would be great :) If you fix those test cases, I think it would make most sense to submit them as a separate PR, since (or at least that was the case in my testing) they are also broken on main.

Just to confirm that you're absolutely right: these fail on main. See e.g. these bluebear results.

So I think this looks ready to merge? It's a big change so we probably want an extra set of eyes, especially from the build side. @rhdtownsend?

@warrickball
Copy link
Contributor

Ah, I now realise I haven't reviewed the actual code fixes. I'll have a look as soon as I get a chance and put in a proper GitHub review.

@Debraheem
Copy link
Member

"there is already a bunch of documentation in this PR on how the new build system works with also information on how to use a module in another project."

Can this information make its way into some excerpt on the mesadocs website? Ultimately, anything explained inside this pr will be lost to history in due time. Perhaps under "modules", there can be a small section titled "build system" that rehashes this.

@Debraheem
Copy link
Member
Debraheem commented May 21, 2025

Super happy to see the new build system reach it's final form!

Given the MESA release candidate is coming very soon. At the moment, I am hesitant toward merging this in until after the release comes out, unless we delay the release. Imo, the new build system should live in main for some months before making into a new release. Are my concerns valid, and or @VincentVanlaer is your preferred timeline such that you'd prefer this to make it immediately into the next release? Am i just fearmongering? let me know.

Perhaps, this is something @VincentVanlaer can discuss with mesa-dev.

@pmocz
Copy link
Member Author
pmocz commented May 21, 2025

@VincentVanlaer have you been able to build on MacOS? I get a

make/defaults-module.mk:6: *** extracting `make` directory based on defaults-module.mk location failed.  Stop.

@VincentVanlaer
Copy link
Collaborator

"there is already a bunch of documentation in this PR on how the new build system works with also information on how to use a module in another project."

Can this information make its way into some excerpt on the mesadocs website? Ultimately, anything explained inside this pr will be lost to history in due time. Perhaps under "modules", there can be a small section titled "build system" that rehashes this.

By in this PR, I mean actual changes to the documentation. So not the comments on this or the previous PR or commit messages. I've put it under developing.

@Debraheem
Copy link
Member

Thanks, I found it!

@VincentVanlaer
Copy link
Collaborator

Super happy to see the new build system reach it's final form!

Given the MESA release candidate is coming very soon. At the moment, I am hesitant toward merging this in until after the release comes out, unless we delay the release. Imo, the new build system should live in main for some months before making into a new release. Are my concerns valid, and or @VincentVanlaer is your preferred timeline such that you'd prefer this to make it immediately into the next release? Am i just fearmongering? let me know.

Somewhat of a concern I have, is that there is some upkeep for not merging this branch, as it touches all modules and conflicts do arise. But I do agree that it is preferable to not have something that is broken for a certain subset of OSes (e.g. MacOS, various Linux installations, or maybe even certain minimal environments). I can see if I can extract some changes from this branch to reduce the number of conflicts arising.

Perhaps, this is something @VincentVanlaer can discuss with mesa-dev.

@VincentVanlaer
Copy link
Collaborator

@VincentVanlaer have you been able to build on MacOS? I get a

make/defaults-module.mk:6: *** extracting `make` directory based on defaults-module.mk location failed.  Stop.

There was some early testing, but I don't own a mac, so I can't check myself. I am not sure exactly what is provided by the mesasdk on macos, and what is provided by the OS itself (and what the versions of those things are). My suspicion is that sed is not available?

@pmocz
Copy link
Member Author
pmocz commented May 22, 2025

@VincentVanlaer have you been able to build on MacOS? I get a

make/defaults-module.mk:6: *** extracting `make` directory based on defaults-module.mk location failed.  Stop.

There was some early testing, but I don't own a mac, so I can't check myself. I am not sure exactly what is provided by the mesasdk on macos, and what is provided by the OS itself (and what the versions of those things are). My suspicion is that sed is not available?

Mac does have sed but can require a different syntax and may be missing advanced features. E.g.

if [[ "$(uname)" == "Darwin" ]]; then
  # Macs require a different syntax for sed
  sed -i '' 's/log_center_density_limit/log_center_density_upper_limit/' inlist*
else
  sed 's/log_center_density_limit/log_center_density_upper_limit/' -i inlist*
fi

I'll have a look at this too. It builds in my linux system

@VincentVanlaer
Copy link
Collaborator

In the meantime, I have set up a MacOS VM, and have gotten everything to work, all the way to running the default work dir with pgplot showing up. Some notes:

  • There was an spurious -o option passed to ar, this has been fixed
  • make 3.81 is too old, but I think I can fix most of the issues. For now I installed a newer version of make via homebrew
  • Installed pkg-config via homebrew as well. This either needs to be included in the SDK or added as a prerequisite. The same holds for Linux, but pkg-config is easily installed by package managers.

@pmocz
Copy link
Member Author
pmocz commented May 29, 2025

A few issues on my mac, and I was wondering what your thoughts are @VincentVanlaer in avoiding these:

  • My gnu make does not have the -Oline option (used in the install file)
  • I also do not have $(.SHELLSTATUS), (used in make/defaults-module.mk) the variable is empty

Finally, if I remove the use of the above, my current install is failing at:

../make/compile-settings-gnu.mk:41: *** pkg-config failed to find some of hdf5_fortran, check PKG_CONFIG_PATH is correct.  Stop.

This is not available in the MacOS make version, which seems to be
perpertually locked at 3.81
Old version of make (e.g. the 3.81 version that MacOS uses) drops
trailing slashes, which makes it impossible for us to distinguish normal
files and directories.
This was introduced in make 4, and is therefore not available in 3.81
Make version is too old
@VincentVanlaer
Copy link
Collaborator

Yeah I wasn't entirely clear in my previous comments. I came across the same issues, and temporarily fixed them by switching to a newer version of gnumake. However, I still needed to fix them properly. This has been done now, and the new build system works in a MacOS VM.

The only thing you should need to do compared to main is install pkg-config.

@pmocz
Copy link
Member Author
pmocz commented Jun 6, 2025

It installs on my macOS! I did have to do a brew install pkgconf

@pmocz
Copy link
Member Author
pmocz commented Jun 6, 2025

@VincentVanlaer the install process creates a sample/ folder with diff.txt and tmp.txt in it. Is that intentional?

@VincentVanlaer
Copy link
Collaborator

Hmm, I doesn't do that for me. Are these leftover from a previous install maybe? I think I removed the sample directory at some point, so changing to this branch probably removed all the files except for the untracked ones.

@pmocz
Copy link
Member Author
pmocz commented Jun 6, 2025

Nice! Some timings on my macOS:

# OLD
./install  # 4:40
./install  # 0:55
./clean    # 0:10


# NEW
./install  # 3:55
./install  # 0:02
./clean    # 0:01

@pmocz
Copy link
Member Author
pmocz commented Jun 6, 2025

I am getting a warning: ld: warning: search path '/Applications/mesasdk/pgplot' not found .. Is this an issue? (I think my pgplot is in /Applications/mesasdk/lib/pgplot)

@VincentVanlaer
Copy link
Collaborator

Looks like I have made a typo in the pgplot pkg-config file. Should have been ${MESASDK_ROOT}/lib instead of ${MESASDK_ROOT}/pgplot. Since the library path is the same for all the other libraries, it doesn't break anything.

This is to ensure the correct runpath is set by the gfortran wrapper in
the sdk
@VincentVanlaer
Copy link
Collaborator

Should be fixed now. I have also added a fix for the correct selection of the math libraries in the SDK.

Transitive dependencies are now only added at the linking stage, not the
compile stage
Old MacOS versions don't have realpath
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0