
Releases: mpickpt/mana

MANA v1.2.0

17 Apr 19:06

We're pleased to announce MANA 1.2.0!

This minor version release is backward compatible with versions 1.1.0 and 1.0.0. The changelog for 1.2.0 is described relative to 1.0.0, and not the intermediate 1.1.0 release. The changes include numerous bug fixes, performance improvements, improved portability to many environments, and greater robustness across MPI applications.

Furthermore, this release includes detailed documentation of how to run MANA with or without Slurm, with examples for CentOS 7 (similar to Rocky 8) and for the Perlmutter supercomputer (SUSE Enterprise Linux) at NERSC/LBNL. The upstream version of the documentation is at https://github.com/mpickpt/mana-doc.

Major changes

  • MANA has been tested to support CentOS 7, Rocky 8, SUSE Enterprise Linux on Perlmutter at NERSC/LBNL, and Ubuntu 22.04 (single-node only).
  • MANA has been tested to support x86_64, ARM64, and RISC-V.
  • MANA has been tested to support MPICH-3.x, MPICH-4.x, Open MPI-4.x, Open MPI-5.x, and ExaMPI.
  • Due to the underlying DMTCP, MANA requires C++14 or higher. Note that CentOS 7 supplies an incompatible gcc-4.8 by default, and if icc is used, it must be based on a version of gcc newer than gcc-4.8.
  • MANA has not been tested for all possible combinations of the above.
  • MANA is organized as a DMTCP plugin. This release has been updated to use DMTCP commit 9370f9a on the main branch.

The mana_launch command now directly executes MPI programs compiled with the native mpicc command for any of the supported MPI implementations at a given site. However, for some MPI implementations (especially MPICH-4.x), you may need to use the new flag --use-shadowlibs:

  • mana_launch --use-shadowlibs ... <MPI_APPLICATION>

Alternatively, you can compile an MPI application with mpicc_mana to directly build an executable that can be executed by MANA (but which cannot be executed solely by an MPI implementation without MANA).

Changed features

Performance (runtime overhead)

Runtime overhead has been greatly improved, to the extent that most codes will see substantially less than 1% runtime overhead when running with MANA on top of the native MPI implementation, as opposed to directly executing solely with the native MPI implementation. Previously, up to 30% runtime overhead had been observed on MPI applications that made particularly intensive use of MPI functions. Two major improvements achieved this, concerning collective MPI operations and point-to-point MPI functions.

  • Collective MPI functions: A novel sequence number algorithm was implemented. The sequence number algorithm is documented in: "Enabling Practical Transparent Checkpointing for MPI: A Topological Sort Approach", Cluster'24, Y. Xu and G. Cooperman.
  • Point-to-point functions: A new implementation of point-to-point wrapper functions was written, different from the implementation in the original MANA paper ("MANA for MPI: MPI-Agnostic Network-Agnostic Transparent Checkpointing", HPDC'19, R. Garg, G. Price, G. Cooperman). The new wrapper implementation calls MPI_Iprobe before MPI_Recv, so as to efficiently guarantee that MANA is not blocked inside an MPI function at the time of a checkpoint request.

Modest runtime overhead on older Linux kernels (< 5.9)

On older Linux kernels (versions before 5.9), Linux did not yet support the fsgsbase instructions for x86_64. This primarily affects CentOS 7 on x86_64 CPUs.

You can detect whether this affects you when running ./configure. Check whether you see 'yes' or 'no' for:

checking if FSGSBASE for x86_64 (setting fs in user space) is available.

If this affects you, you may see up to 5% additional runtime overhead due
to the lack of availability of these assembly instructions in user-mode.
The exact runtime overhead depends on how intensively your code calls
MPI functions. In MANA (on x86_64 only), each call to an MPI function
requires resetting the pointer to the thread-local storage, which is
held in the x86_64 fs register. Prior to Linux 5.9, this required
a kernel call and could not be done in user space.

MANA detailed documentation in readthedocs.io

MANA now has detailed documentation at readthedocs.io. For issues of incomplete or erroneous documentation, please notify the developers by opening an issue at the upstream repo:
https://github.com/mpickpt/mana-doc.

Compatibility with MPI-3.x

MANA has removed support for those functions that existed in MPI-2.x, but were removed in the MPI-3.x standard. This was necessary, since the mpi.h file of some MPI implementations had already removed the signatures for the MPI functions that had been removed from the standard.

MANA v1.1.0

17 Feb 18:47
  1. Improved compatibility with different clusters.
  2. Updated the DMTCP used by MANA to the latest version.
  3. Renamed the "kernel-loader" program to "lower-half", and added a README file describing the functionality of the lower-half program.

Full Changelog: v1.0.2...v1.1.0

MANA v1.0.2

19 Sep 15:36
  1. Fixed a bug in the kernel loader that copied the wrong stack size.
  2. Added support for the Discovery cluster.

MANA v1.0.1

12 Sep 16:15

Removed an obsolete configure option.

v0.9.0

21 Dec 18:58
Pre-release

As the version number (0.9.0) indicates, please test this after installing.

This version of MANA should compile and execute on CentOS.
Further, it should work functionally on Perlmutter at NERSC.

However, an upgrade to Perlmutter in mid-November 2023 caused a severe slowdown when using multiple nodes. This has been traced to MANA's lower-half MPI library, which is statically linked. MPI applications continue to be dynamically linked, as usual, but MPI calls are eventually passed to a statically linked MPI library.

In early 2024, MANA will be ported to use a dynamically linked MPI library in MANA's lower-half MPI library. This will be labelled version 1.0.0.
