Releases: dmtcp/dmtcp
DMTCP 4.0.0
This is a major release that introduces a breaking change to the checkpoint-image format. As a result, checkpoint images are not compatible with older releases. Other fixes include:
- bug fixes for corner cases during initialization.
- bug-fixes to support custom malloc libraries.
- bug-fix related to a regression involving interval checkpointing.
- fixed a regression involving --restartdir.
- support for close_range system call.
- Logging improvements.
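For readers unfamiliar with close_range: the Linux system call closes every file descriptor in an inclusive range in a single call. The sketch below shows that behavior only; it is not DMTCP's wrapper. The helper name `close_fd_range` is hypothetical, and the fallback loop is an assumption for kernels older than 5.9 or environments that reject the call.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper (not part of DMTCP): close every descriptor in the
 * inclusive range [first, last]. Tries the raw close_range system call
 * (Linux 5.9+) and, if that fails for any reason, falls back to a plain
 * close() loop. The fallback is only sensible for small ranges.
 * Returns 0 on success, -1 with errno set to EINVAL if first > last. */
static int close_fd_range(unsigned int first, unsigned int last) {
  if (first > last) {
    errno = EINVAL;
    return -1;
  }
#ifdef SYS_close_range
  if (syscall(SYS_close_range, first, last, 0U) == 0) {
    return 0;
  }
#endif
  for (unsigned int fd = first;; fd++) {
    close((int)fd); /* EBADF for descriptors that are not open is expected */
    if (fd == last) {
      break;
    }
  }
  return 0;
}
```

A caller might open a few descriptors, pass the lowest and highest to `close_fd_range`, and then confirm with `fcntl(fd, F_GETFD)` that each one now reports EBADF.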
Changelog:
- Added DmtcpCkptHeader struct by @karya0 in #1144
- Fixed readdmtcp.sh and minor cleanup for restore buf handling. by @karya0 in #1188
- Check if plugins need to skip nscd regions by @xuyao0127 in #1194
- Use static buffer for motherofall. by @karya0 in #1193
- Handle applications with user-defined mmap wrappers. by @karya0 in #1195
- Fixed a bug in Util::mmap_fixed_noreplace by @xuyao0127 in #1197
- Coordinator: Fixed interval checkpointing. by @karya0 in #1198
- dmtcp_coordinator --status-file: started/exited by @gc00 in #1199
- Fixed IPC_PRIVATE handling for SysV Shm. by @karya0 in #1192
- Fix VirtPidTbl initialization to not rely on getpid. by @karya0 in #1200
- Added close_range test to syscall-tester. by @karya0 in #1202
- Several initialization bugfixes by @karya0 in #1203
- Logging improvements by @karya0 in #1201
- A few bug fixes related to exec and initialization. by @karya0 in #1204
- Coord: Fixed epoll_wait corner case. by @karya0 in #1205
- Use linux_dirent64 type with sys_getdents64 by @xuyao0127 in #1207
- Fixed --restartdir flag by @xuyao0127 in #1208
- Bumped version to 4.0.0 and added NEWS. by @karya0 in #1209
Full Changelog: 3.2.0...4.0.0
DMTCP 3.2.0
This minor release includes:
- support for [vvar_vclock] memory regions present on modern kernels.
- bug fix for pthread_cancel handling.
- bug fix for dlopen(NULL, ...) calls.
- bug fix for thread handling on RISCV.
Full Changelog: v3.1.2...3.2.0
DMTCP v3.1.2
A regression in 3.1.1 caused "dmtcp_launch -i XX ..." to fail.
A commit was created to fix this.
DMTCP v3.1.1
- jalib/jalloc.cpp: bool_atomic_dwcas() -- Align the storage buffers for DMTCP internal allocations to 128 bits (16 bytes)
- This affected primarily ARM64. 128-bit data types must be 16-byte aligned, or the CPU throws a SIGBUS error
- A small number of other minor changes, primarily refactoring for maintenance
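The alignment requirement described above can be illustrated with a short C sketch. This is not the code in jalib/jalloc.cpp; the `dw_slot` type and both helpers are hypothetical names used only to show how a 128-bit slot can be forced onto a 16-byte boundary, both for automatic storage and for heap allocations.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical 128-bit slot (two 64-bit words) of the kind a double-width
 * compare-and-swap operates on. alignas(16) on the first member raises the
 * alignment of the whole struct to 16 bytes. */
typedef struct {
  alignas(16) uint64_t lo;
  uint64_t hi;
} dw_slot;

/* Returns 1 if p sits on a 16-byte boundary, 0 otherwise. */
static int is_16_byte_aligned(const void *p) {
  return ((uintptr_t)p % 16) == 0;
}

/* Allocates n slots on the heap with 16-byte alignment. C11 aligned_alloc
 * expects the size to be a multiple of the alignment, which holds here
 * because sizeof(dw_slot) is 16. Release the result with free(). */
static dw_slot *alloc_slots(size_t n) {
  return aligned_alloc(16, n * sizeof(dw_slot));
}
```

Checking `is_16_byte_aligned` on every slot before handing it to a 128-bit atomic operation is a cheap way to catch a misaligned buffer during testing rather than as a SIGBUS at run time.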
DMTCP v3.1.0
- Many bug fixes for robustness, performance
- Supports: x86_64, aarch64 (ARM64), RISC-V
- Supports 32-bit arm and x86 (but not recently tested; bug reports welcome)
- New flags: --stale-timeout (default: 8 hours) and --timeout (default: none)
- python3 executable is now the standard for DMTCP:
- Obsolete DMTCP plugins removed
- Enhanced use of atomics for internal lock-free data structures (a regression fixed for better performance for OpenMP)
- DMTCP tested to support new platforms: MANA ckpt for MPI (release 1.0.0); CUDA ckpt (experimental); McMini (Model Checker: MINImal for easy modification) (release 1.0.0; experimental branch for deep debugging)
- Enhanced util/gdb-dmtcp-utils.py tools for GDB debugging
- Enhanced tools for debugging user code in GDB after restart
- See NEWS file for further details
Contributors: @aayushi363 @dahongli @JainTwinkle @karya0 @xuyao0127
DMTCP 3.0 released
Summary
For some time, it has been recommended to use the latest github master branch for new projects using DMTCP. This release formalizes that status. At this time, the InfiniBand plugin is deprecated and likely doesn't work. Further, the DMTCP flag '--no-coordinator' is not currently supported. It may be brought back to life if important use cases are seen. AARCH64 support may or may not work. Please write to developers if needed. DMTCP now requires C++14.
However, for transparent checkpointing of MPI, please see: https://github.com/mpickpt/mana. That project is undergoing intensive testing. Please write to the developers for the latest status.
There is also a highly experimental branch to support transparent checkpointing of CUDA: https://github.com/DMTCP-CRAC/CRAC-early-development. Please write to the developers for plans to replace that experimental version.
Major DMTCP enhancements:
- The plugin facility for end users has now been made more flexible. In particular, a plugin can now declare a PRESUSPEND phase. See DMTCP test/plugin/presuspend/ for an example plugin using presuspend. See the mpi-proxy-split plugin of the MANA project for a real-world example.
- DMTCP now includes the ability to create an MTCP restart plugin, for use in split processes (see above). The lower-half application can use the MTCP restart plugin to restore the upper half from its checkpoint image.
- The DMTCP key-value database (KVDB) was extended, for use by user plugins.
- A new GDB utility, DMTCP/util/gdb-dmtcp-utils, is provided. Source this file into GDB when debugging DMTCP or other software. 'gdb-dmtcp-utils' does not depend on DMTCP, and can be used more generally.
Other enhancements and bug fixes:
- Much of the DMTCP coordinator was rewritten to be more flexible, and support the new split process model.
- DMTCP ordered maps were made more efficient.
- Support for Linux Hugepages was added.
- DMTCP supports Microsoft Windows WSL
- New events, RUNNING and THREAD_RESUME, were added.
- Added DMTCP_COORD_WRITE_CKPT environment variable
- Improved DMTCP logging for use when debugging DMTCP
- DMTCP now simulates vfork using fork.
- Added ability to truncate append-only/RW files on restart.
- Add './configure --disable-dlsym-wrapper' for special cases
- MAP_FIXED_NOREPLACE used for safer execution during restart
- Preserving user-requested rlimit across checkpoint-restart
- Fixed SysV msg queue logic
- Fixed freopen logic
- Many smaller bug fixes
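As background for the MAP_FIXED_NOREPLACE item above: the flag (Linux 4.17 and later) asks mmap to place a mapping at an exact address but to fail with EEXIST instead of silently replacing whatever is already mapped there, which is what plain MAP_FIXED would do. The sketch below is a minimal illustration, not DMTCP's restart code; `map_exact_or_fail` is a hypothetical helper, and the fallback value of the flag is the one used by the generic Linux headers (x86-64, arm64), stated here as an assumption for older toolchains.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Older headers may not define the flag. 0x100000 is the value in the
 * generic Linux headers; a few architectures use a different value. */
#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000
#endif

/* Hypothetical helper (not part of DMTCP): map len bytes of anonymous
 * memory at exactly addr, or fail without disturbing any mapping that
 * already occupies the range. Returns addr on success, NULL on failure. */
static void *map_exact_or_fail(void *addr, size_t len) {
  void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);
  if (p == MAP_FAILED) {
    return NULL; /* typically errno == EEXIST: the range is occupied */
  }
  if (p != addr) {
    /* Kernels older than 4.17 ignore the flag and treat addr as a hint,
     * so a mapping placed elsewhere is treated as a failure and undone. */
    munmap(p, len);
    return NULL;
  }
  return p;
}
```

The check for `p != addr` is the design point worth noting: on kernels that predate the flag, the call can succeed at a different address, so the caller has to compare the result before trusting it.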
DMTCP 2.6.0
Version 2.6.0 release notes
Newer flags for configure:
- Rename --enable-debug to --enable-logging
- Add --enable-debug: "-Wall -g3 -O0" (for debugging DMTCP)
Newer flags for dmtcp_restart:
- Add --debug-restart-pause flag to dmtcp_restart
Bug fixes and enhancements:
- Fixes for glibc versions greater than or equal to 2.24
- Fix deadlock in system() wrapper when the child crashes
- Fix deadlock when a process is forked in the resume phase (issue #691)
- jsocket: Warn user if peer closes socket while draining (issue #701)
- Fix epoll1 test (initialize addrlen for accept()) (#705)
- Fix to correctly calculate Coordinator/Host IP: affects some distributed applications
- Allow restored stack to grow if needed.
- Fix bug in POSIX timer: race condition manifested in test/timer.c/Ubuntu-18.04
- Modified InfiniBand plugin for more robust support (primarily of interest for MPI)
- The floating point environment (fegetenv()) is now restored on restart. (Formerly, only the rounding mode (fegetround()) was restored.)
- The current resource limits (rlim_cur) for RLIMIT_NOFILE and RLIMIT_STACK are restored if possible.
- Mutex ownership and robust mutexes are now supported if DMTCP is configured with --enable-mutex-wrappers. (However, this configuration can also add runtime overhead if mutex operations are called very frequently.) [Thanks to Johannes Stoelp, Laurent Buchard, Pankaj Mehta of Synopsys, Inc.]
- Fix bug if stack grows a lot after a restart.
- Improved support for pty's
- util/gdbinit-example added for those who wish to debug DMTCP internals.
- Many bug fixes
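The resource-limit item above says the soft limits are restored "if possible". One plausible reading, sketched below, is that a saved soft limit can only be reapplied up to the hard limit in force on the restart host. This is an illustration with the standard getrlimit/setrlimit calls, not DMTCP's implementation; `restore_soft_limit` is a hypothetical helper and the clamping policy is an assumption.

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical helper (not part of DMTCP): reapply a soft limit that was
 * recorded earlier, e.g. at checkpoint time. If the saved value exceeds the
 * hard limit currently in force, it is clamped to that hard limit, since an
 * unprivileged process cannot raise its soft limit beyond it.
 * Returns 0 on success, -1 on failure (errno set by the failing call). */
static int restore_soft_limit(int resource, rlim_t saved_cur) {
  struct rlimit now;
  if (getrlimit(resource, &now) != 0) {
    return -1;
  }
  if (now.rlim_max != RLIM_INFINITY &&
      (saved_cur == RLIM_INFINITY || saved_cur > now.rlim_max)) {
    saved_cur = now.rlim_max; /* best effort: as high as permitted */
  }
  now.rlim_cur = saved_cur;
  return setrlimit(resource, &now);
}
```

A caller would record `rlim_cur` for RLIMIT_NOFILE and RLIMIT_STACK before the checkpoint and pass each saved value to `restore_soft_limit` afterwards.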
DMTCP 2.5.2
- All fixes in Release DMTCP-2.4.9 are incorporated in this release.
- An incompatibility of DMTCP with Open MPI 1.10 when using orterun (mpirun) was discovered. This does not affect recent versions, such as Open MPI 2.x.
- In some rare cases, open files were not properly restored due to a use-after-free bug. This is now fixed.
- In some rare cases, one process had created a SysV shared memory object, and a different process was assigned to restore it on restart. This was not handled correctly, and is now fixed.
- Correctly restore CPU affinities of threads
- Virtualized SysV shared memory keys to avoid race condition on restart
- Fixed logic for checking if relative path to file was a duplicate of another existing path
- The NSCD area for name service caching daemon was not handled correctly in CentOS 6.8 and later. Fixed now.
- The Linux sched.h include file for scheduling of cores was added to satisfy some older Linux distros that needed it for compiling DMTCP.
- Fixed a regression in which --enable-debug (for verbose debug logs) was not being properly written.
- The DMTCP coordinator was displaying a spurious warning, "Failed to find coordinator IP address", because it did not check for a canonical hostname. A related issue prevented DMTCP from working properly on some SUSE/openSUSE distros.
DMTCP 2.4.9
Version 2.4.9 release notes
- Fixed a regression causing deleted NFS files to be handled incorrectly
- Fixed handling of glibc for versions greater than glibc-2.24
- Errors and warnings with gcc-7.x are fixed
- A rare bug affecting pthread_cancel, etc., created incorrect pid on restart
- man pages fixed: Description section was always describing dmtcp_command
DMTCP 2.5.1
Version 2.5.1 release notes
This release mostly provides added robustness. Two notable items of
added functionality are:
i. DMTCP_RESTART_PAUSE and DMTCP_RESTART_PAUSE0 environment variables
for easier debugging upon initial restart
ii. The --debug-logs flag was added to dmtcp_launch/dmtcp_restart.
One can now turn on logging individually for separate plugins,
instead of only turning it on globally.
An incompatibility of DMTCP with Open MPI 1.10 when using orterun (mpirun)
was discovered. This may also affect some other versions of Open MPI 1.10.
This bug will be fixed in a future release.
- Fixed an issue when starting multiple DMTCP coordinators on the same host at approximately the same time
- Fixed issue with PBS scheduler for HPC
- Fixed issue when restarting on a different host with a larger limit on the number of open file descriptors
- dmtcp_launch/dmtcp_restart now accept '--debug-logs' flag to specify which DMTCP plugins should produce logging information (It used to be all or nothing.)
- Improved robustness for IB (InfiniBand) plugin
- Fixed DMTCP_RESTART_PAUSE and DMTCP_RESTART_PAUSE0 environment variables for debugging upon restart
- The brk() call was failing on restart on Debian due to an overly strict assert
- dmtcp_launch was hanging on some RHEL5 and RHEL6 due to deadlock with libc low-level locks. Fixed now.
- Updated tls_pid_offset in DMTCP to handle newer glibc (versions > 2.24)
- Fixed launch of 32-bit binary when forking/execing from a 64-bit executable
- Fixed issue that can affect a parent holding a malloc-lock while forking
- Fixed issue when a user thread calls 'dmtcp_get_coord_ckpt_dir()'