Releases: gperftools/gperftools
gperftools-2.16.90
gperftools 2.17rc is out!
The headline changes in 2.17 are the removal of the heap leak checker and the removal of the legacy Perl pprof implementation. People should install and use the much, much improved pprof implementation from github.com/google/pprof.
Here is a complete list of reasonably notable changes:
- [headline] heap leak checker has been amputated, as promised earlier
- [headline] we don't ship pprof anymore. People need to get modern and awesome pprof implementation from github.com/google/pprof
- we now have some basic CI infrastructure via Github Actions
- we now have basic Bazel support
- our docs have been slightly updated and converted to AsciiDoc format
- we now implement C23 free{,_aligned}_sized functions (but no libc-s offer those yet anyway)
- FreeBSD bits don't depend on procfs anymore (proc maps iterator was broken anyway; now it works)
- we don't offer mmap profiling anymore. It wasn't entirely complete for some years now, and killing it has eliminated a lot of complexity. MMap hooks are still part of ABI, but they do nothing.
Please find the list of tickets explicitly closed by this release here: https://github.com/gperftools/gperftools/issues?q=label%3A%22fixed-in-2.17%22%20
Many thanks for the following contributions:
- Alex Faxa has contributed (partial) make install support for cmake bits
- Andrey Semashev contributed one build fix and another fix for a test failure
gperftools-2.16
gperftools 2.16 is out!
This release doesn't have major fixes or big headline features, but it has quite a lot of internal modernizations and cleanups. By the number of commits, 2.16 is going to be our biggest release ever.
This release's main focus was making our code and building infrastructure simpler, more straightforward, more portable, and more modern.
Please note that the gperftools 2.16 release will be the last release with the heap leak checker included. The time has come to drop this feature entirely. All users should migrate to relevant gcc/clang sanitizers.
Here are the most notable changes:
- We've upgraded our C++ standard to C++17. Some fraction of our code base was modernized.
- We've integrated (a vendored copy of) GoogleTest, and most tests now use it. GoogleTest has helped us eliminate some legacy code and reduce the number of tests that use shell scripts.
- There are no more unnecessary wrappers around mutexes and threads for unit tests. We now use C++ standard mutexes and threads in our tests.
- We've done the bulk of the work necessary to enable hidden visibility. The most significant change is that tests no longer reach into libtcmalloc's guts. We use a special TestingPortal interface instead. We now offer the --enable-hidden-visibility configure option, which does what it says. But please note that hidden visibility is off by default for now.
- The autotools build was significantly refactored, modernized and simplified.
- The cmake build has also been radically simplified. The previous version attempted to duplicate the same complexity that we had in the autotools build and did not do it very well. More tests now pass under cmake. But please note that cmake support is still not entirely functional, and we're not yet able to promise anything about it.
- Thread-local storage access and emergency malloc integration have been reworked. We now support emergency malloc even on systems with emutls and similarly "bad" TLS support. As a result, backtracing is now more reliable (e.g., on QNX).
- OSX operator new/delete performance has been improved. OSX's malloc performance is badly compromised by its support of malloc zones, so we cannot help much (the same applies to much of our competition among memory allocators). But the C++ new/delete API doesn't have to integrate with this stuff, so we now directly replace those functions for a sizeable speedup. Note that OSX performance is still not on par with other "prime tier" OSes due to its lack of efficient TLS support.
- Long-deprecated google/ headers have been deleted (use, e.g., "gperftools/tcmalloc.h" instead).
- All clang builds now use -Wthread-safety and actually check thread-safety declarations.
- Our code has stopped being incompatible with _TIME_BITS=64 on modern GNU Linux systems (relevant only for 32-bit systems).
- The OpenSolaris build has been verified and fixed where needed.
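To give a feel for what -Wthread-safety checks, here is a minimal sketch of Clang thread-safety annotations. It is not taken from the gperftools sources: SKETCH_TSA, AnnotatedMutex and Counter are hypothetical names made up for this illustration, and the macro expands to nothing on compilers other than clang.

```cpp
#include <mutex>

// Hypothetical helper macro: Clang attributes under clang, nothing elsewhere.
#if defined(__clang__)
#define SKETCH_TSA(x) __attribute__((x))
#else
#define SKETCH_TSA(x)
#endif

// std::mutex is not annotated in every standard library, so this sketch
// wraps it in a type that declares itself a "capability".
class SKETCH_TSA(capability("mutex")) AnnotatedMutex {
 public:
  // The bodies touch the unannotated std::mutex, so analysis is disabled
  // inside them; callers are still checked against the declarations.
  void Lock() SKETCH_TSA(acquire_capability())
      SKETCH_TSA(no_thread_safety_analysis) { mu_.lock(); }
  void Unlock() SKETCH_TSA(release_capability())
      SKETCH_TSA(no_thread_safety_analysis) { mu_.unlock(); }

 private:
  std::mutex mu_;
};

class Counter {
 public:
  void Increment() {
    mu_.Lock();
    ++value_;  // OK: mu_ is held here
    mu_.Unlock();
  }

  int Get() {
    mu_.Lock();
    int v = value_;
    mu_.Unlock();
    return v;
  }

 private:
  AnnotatedMutex mu_;
  // Under clang with -Wthread-safety, touching value_ without holding mu_
  // produces a warning at compile time.
  int value_ SKETCH_TSA(guarded_by(mu_)) = 0;
};
```

Removing the mu_.Lock() call from Increment() is the kind of mistake a clang build with -Wthread-safety would flag.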
Thanks to the following people for code contributions:
- Github user oPiZiL (build fix for gcc 7.5)
- Github user zhangdexin (qnx fixes)
- Ishant Goyal (support for configuring minimal per-thread cache size)
- Lennox Ho (several build fixes and several fixes around Windows support)
- Olivier Langlois
- Sergey Fedorov (another fix for building gperftools on old PPC OSX computers)
- Xiang.Lin (several OSX fixes)
- Yikai Zhao (aarch64 generic_fp stack frame validation)
You can find the list of all GitHub issues fixed in this release here:
https://github.com/gperftools/gperftools/issues?q=label%3Afixed-in-2.16+is%3Aclosed
gperftools-2.15.90
gperftools 2.16rc is out!
This release doesn't have major fixes or big headline features, but it has quite a lot of internal modernizations and cleanups. By the number of commits, 2.16 is going to be our biggest release ever.
This release's main focus was making our code and building infrastructure simpler, more straightforward, more portable, and more modern.
Please note that the gperftools 2.16 release will be the last release with the heap leak checker included. The time has come to drop this feature entirely. All users should migrate to relevant gcc/clang sanitizers.
Here are the most notable changes:
- We've upgraded our C++ standard to C++17. Some fraction of our code base was modernized.
- We've integrated (a vendored copy of) GoogleTest, and most tests now use it. GoogleTest has helped us eliminate some legacy code and reduce the number of tests that use shell scripts.
- There are no more unnecessary wrappers around mutexes and threads for unit tests. We now use C++ standard mutexes and threads in our tests.
- We've done the bulk of the work necessary to enable hidden visibility. The most significant change is that tests no longer reach into libtcmalloc's guts. We use a special TestingPortal interface instead. We now offer the --enable-hidden-visibility configure option, which does what it says. But please note that hidden visibility is off by default for now.
- The autotools build was significantly refactored, modernized and simplified.
- The cmake build has also been radically simplified. The previous version attempted to duplicate the same complexity that we had in the autotools build and did not do it very well. More tests now pass under cmake. But please note that cmake support is still not entirely functional, and we're not yet able to promise anything about it.
- Thread-local storage access and emergency malloc integration have been reworked. We now support emergency malloc even on systems with emutls and similarly "bad" TLS support. As a result, backtracing is now more reliable (e.g., on QNX).
- OSX operator new/delete performance has been improved. OSX's malloc performance is badly compromised by its support of malloc zones, so we cannot help much (the same applies to much of our competition among memory allocators). But the C++ new/delete API doesn't have to integrate with this stuff, so we now directly replace those functions for a sizeable speedup. Note that OSX performance is still not on par with other "prime tier" OSes due to its lack of efficient TLS support.
- Long-deprecated google/ headers have been deleted (use, e.g., "gperftools/tcmalloc.h" instead).
- All clang builds now use -Wthread-safety and actually check thread-safety declarations.
- Our code has stopped being incompatible with _TIME_BITS=64 on modern GNU Linux systems (relevant only for 32-bit systems).
- The OpenSolaris build has been verified and fixed where needed.
Thanks to the following people for code contributions:
- Github user oPiZiL (build fix for gcc 7.5)
- Github user zhangdexin (qnx fixes)
- Ishant Goyal (support for configuring minimal per-thread cache size)
- Lennox Ho (several build fixes and several fixes around Windows support)
- Olivier Langlois
- Sergey Fedorov (another fix for building gperftools on old PPC OSX computers)
- Xiang.Lin (several OSX fixes)
- Yikai Zhao (aarch64 generic_fp stack frame validation)
You can find the list of all GitHub issues fixed in this release here:
https://github.com/gperftools/gperftools/issues?q=label%3Afixed-in-2.16+is%3Aclosed
gperftools-2.15
This release has the following bug fixes:
- Xiaowei Wang has pointed out the pthread linking issue on cmake on older glibcs (where -pthread is not implicit). See #1473 for more details.
- Mikael Simberg and Tom "spot" Callaway have pointed out the missing symbols issue when linking PPC or i386 builds. #1474 has all the details.
Huge thanks to all contributors!
gperftools-2.14
gperftools 2.14 is out!
This release has the following set of notable changes:
- Roman Geissler has contributed a fix for a nasty initialization bug introduced in 2.13 (see github issue #1452 for one example where it fails).
- spinlock delay support now has proper Windows support. Instead of simply sleeping, it uses WaitOnAddress (which is basically the Windows equivalent of futexes). This improvement was contributed by Lennox Ho.
- we now have basic QNX support (basic malloc + heap profiler) championed by Xiang.Lin. Thanks! Do note, however, that QNX doesn't provide SIGPROF ticks, so there will be no cpu profiler support on this OS.
- Yikai Zhao has contributed several fixes to important corner cases of the generic_fp stacktrace method.
- several people have contributed various improvements to our cmake build: Lennox Ho, Sergey Fedorov, Mateusz Jakub Fila. But do note that the cmake build is still incomplete and best-effort.
- Julian Schroeder has fixed the generic_fp incompatibility with ARM pointer auth.
- Mateusz Jakub Fila has contributed an implementation of the mallinfo2 function (64-bit version of mallinfo).
- Lennox Ho has updated the C malloc extension shims to include {Set,Get}MemoryReleaseRate.
- Lennox Ho has contributed the ability to disable patching of malloc functions on Windows when the TCMALLOC_DISABLE_REPLACEMENT=1 environment variable is set.
- User poljak181 has contributed a fix for infinite recursion in some cases of malloc hooks (or user-replaced operator new) and MallocExtension::instance().
- Sergey Fedorov has contributed a fix to use MAP_ANON on some older OSes without MAP_ANONYMOUS.
- the way we detect a working ucontext->pc extraction method was reworked and is now fully compile-time as opposed to config-time. This means no more duplication and mismatches between autoconf and cmake bits in this area.
List of relevant tickets can be seen online at: https://github.com/gperftools/gperftools/issues?q=label%3Afixed-in-2.14+
gperftools-2.13
gperftools 2.13 is out!
This release includes a few minor fixes:
- Ivan Dlugos has fixed some issues with cmake and config.h defines.
- 32-bit builds no longer require 64-bit atomics (which we wrongly introduced in 2.11 and which broke builds on some 32-bit architectures).
- The generic_fp backtracing method now uses a robust address probing method. The previous approach had occasional false positives, which caused occasional rare crashes.
- In some cases, MSVC generated TrivialOnce machine code that deadlocked programs on startup. The issue is now fixed.
gperftools-2.12
Brett T. Warden contributed one significant fix. After a change in the previous release, we installed broken pkg-config files. Brett noticed and fixed that. Huge thanks!
gperftools-2.11
gperftools 2.11 is out!
A few minor fixes since the rc a couple of weeks ago, plus a couple of notable contributions:
- Artem Polyakov has contributed auto-detection of several MPI systems w.r.t. filenames used by the HEAPPROFILE and CPUPROFILE environment variables. Also, we now support the HEAPPROFILE_USE_PID and CPUPROFILE_USE_PID environment variables that force profile filenames to have the pid appended, which will be useful for some programs that fork for parallelism. See #1263 for details.
- Ken Raffenetti has extended the MPI detection mentioned above with detection of the MPICH system.
Thanks a lot!
gperftools-2.10.80
gperftools 2.11rc is out!
Most notable change is that Linux/aarch64 and Linux/riscv are now fully supported. That is, all unit tests pass on those architectures (previously the heap leak checker was broken).
Also notable is that heap leak checker support is officially deprecated as of this release. All bug fixes from now on are on a best-effort basis. For clarity we also declare that it is only expected to work (for some definition of work) on Linux/x86 (all kinds), Linux/aarch64, Linux/arm, Linux/ppc (untested as of this writing) and Linux/mips (untested as well). While some functionality worked in the past on BSDs, it was never fully functional, and never will be. We strongly recommend that everyone switch to asan and friends.
Among major internal changes, it is also worth mentioning that we have now fully switched to C++11 std::atomic. All custom OS- and arch-specific atomic bits have been removed at last.
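As a rough illustration of what plain std::atomic makes possible without OS- or arch-specific code, here is a minimal spinlock sketch. It is an assumption-laden teaching example, not the gperftools SpinLock: SketchSpinLock is a hypothetical name, and a production lock would add backoff and sleeping.

```cpp
#include <atomic>

// Hypothetical minimal spinlock built only on C++11 std::atomic.
class SketchSpinLock {
 public:
  // Returns true when the lock was free and is now owned by the caller.
  bool TryLock() {
    // exchange() returns the previous value; false means we acquired it.
    return !locked_.exchange(true, std::memory_order_acquire);
  }

  void Lock() {
    while (!TryLock()) {
      // Spin on a plain load so waiters do not keep issuing
      // read-modify-write operations on the contended cache line.
      while (locked_.load(std::memory_order_relaxed)) {
      }
    }
  }

  void Unlock() {
    // Release ordering publishes writes made while the lock was held.
    locked_.store(false, std::memory_order_release);
  }

 private:
  std::atomic<bool> locked_{false};
};
```

The acquire/release pair is what a hand-written per-architecture barrier used to provide; here the compiler picks the right instructions for each target.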
Another notable change is that mmap and sbrk hooks facility is now no-op. We keep API and ABI for formal compatibility, but the calls to add mmap/sbrk hooks do nothing and return an error (whenever possible as part of API). There seem to be no users of it anyways, and mmap replacement API that is part of that facility really screwed up 64-bit offsets on (some/most) 32-bit systems. Internally for heap profiler and heap checker we have a new, but non-public API (see mmap_hook.h).
Most tests now pass on NetBSD x86-64 (I tested on version 9.2). The only one that fails is the new stacktrace test for stacktraces from a signal handler (so there could be some imperfections for cpu profiles).
We don't warn people away from the libgcc stacktrace capturing method anymore. In fact users on most recent glibc-s are advised to use it (pass --enable-libgcc-unwinder-by-default). This is thanks to the dl_find_object API offered by glibc which allows this implementation to be fully async-signal-safe. Modern Linux distros should from now on build their gperftools package with this enabled (other than those built on top of musl).
The generic_fp and generic_fp_unsafe stacktrace capturing methods have been expanded to more architectures and even some basic non-Linux support. We have completely removed the old x86-specific frame pointer stacktrace implementation in favor of those two. The _unsafe one should be roughly equivalent to the old x86 method, and the 'safe' one is recommended as a new default for those who want FP-based stacktracing. The safe implementation robustly checks memory before accessing it, preventing unlikely, but not impossible, crashes when frame pointers are bogus.
On platforms that support it, we now build gperftools with "-fno-omit-frame-pointer -momit-leaf-frame-pointer". This makes gperftools mostly frame-pointer-ful, but without performance hit in places that matter (this is how Google builds their binaries BTW). That should cover gcc (at least) on x86, aarch64 and riscv. Intention for this change is to make distro-shipped libtcmalloc.so compatible with frame-pointer stacktrace capturing (for those who still do heap profiling, for example). Of course, passing --enable-frame-pointers still gives you full frame pointers (i.e. even for leaf functions).
There is now support for detecting actual page size at runtime. tcmalloc will now allocate memory in units of this page size. It particularly helps on arms with 64k pages to return memory back to the kernel. But it is somewhat controversial, because it effectively bumps tcmalloc logical page size on those machines potentially increasing fragmentation. In any case, there is now a new environment variable TCMALLOC_OVERRIDE_PAGESIZE allowing people to override this check. I.e. to either reduce effective page size down to tcmalloc's logical page size or to increase it.
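The idea of detecting the page size at runtime and allocating in units of it can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, and the way TCMALLOC_OVERRIDE_PAGESIZE is parsed here (a decimal byte count that must be a power of two) is a guess, not the documented gperftools behaviour.

```cpp
#include <unistd.h>

#include <cstddef>
#include <cstdlib>

// True for 1, 2, 4, 8, ...; false for 0 and everything else.
inline bool IsPowerOfTwo(size_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Page size as reported by the OS via POSIX sysconf().
inline size_t SystemPageSize() {
  long r = sysconf(_SC_PAGESIZE);
  // Assumption: fall back to 4096 if the query fails.
  return r > 0 ? static_cast<size_t>(r) : 4096;
}

// Assumption: the override is read as a decimal byte count and ignored
// unless it is a power of two. Real gperftools parsing may differ.
inline size_t EffectivePageSize() {
  const char* env = std::getenv("TCMALLOC_OVERRIDE_PAGESIZE");
  if (env != nullptr) {
    size_t v = static_cast<size_t>(std::strtoull(env, nullptr, 10));
    if (IsPowerOfTwo(v)) return v;
  }
  return SystemPageSize();
}

// Rounds a request up to a whole number of pages (page_size must be a
// power of two).
inline size_t RoundUpToPage(size_t bytes, size_t page_size) {
  return (bytes + page_size - 1) & ~(page_size - 1);
}
```

With 64k pages, RoundUpToPage(4097, 65536) yields a full 65536-byte unit, which is the fragmentation trade-off mentioned above.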
MallocExtension::MarkThreadTemporarilyIdle has been changed to be identical to MarkThreadIdle. MarkThreadTemporarilyIdle is believed to be unused, anyways. See issue #880 for details.
There are a whole bunch of smaller fixes. Many of those smaller fixes had no associated ticket, but some did. People are advised to see here for the list of notable tickets closed in this release: https://github.com/gperftools/gperftools/issues?q=label%3Afixed-in-2.11+
Some of those tickets are quite notable (fixes for rare deadlocks in cpu profiler ProfilerStop or while capturing heap growth stacktraces (aka growthz)).
Here is the list of notable contributions:
- Chris Cambly has contributed initial support for AIX
- Ali Saidi has contributed a SpinlockPause implementation for aarch64
- Henrik Reinstädtler has contributed a fix for cpuprofiler on aarch64 OSX
- Gabriel Marin has backported Chromium's commit for always sanity checking large frees
- User zhangyiru has contributed a fix to report the number of leaked bytes as size_t instead of (usually 32-bit) int.
- Sergey Fedorov has contributed a fix for building on older ppc-based OSX-es
- User tigeran has removed an unused using declaration
Huge thanks to all contributors.
gperftools-2.10
30 May 2022
gperftools 2.10 is out!
Here are notable changes:
- Matt T. Proud contributed a documentation fix to call the Go programming language by its true name instead of golang.
- Robert Scott contributed a debugallocator feature to use readable (PROT_READ) fence pages. This is activated by the TCMALLOC_PAGE_FENCE_READABLE environment variable.
- User stdpain contributed fix for cmake detection of libunwind.
- Natale Patriciello contributed fix for OSX Monterey support.
- Volodymyr Nikolaichuk contributed support for returning memory back to OS by using mmap with MAP_FIXED and PROT_NONE. It is off by default and enabled by preprocessor define: FREE_MMAP_PROT_NONE. This should help OSes that don't support Linux-style madvise MADV_DONTNEED or BSD-style MADV_FREE.
- Jingyun Hua has contributed basic support for LoongArch.
- Github issue #1338 of failing to build on some recent musl versions has been fixed.
- Github issue #1321 of failing to ship cmake bits with .tar.gz archive has been fixed.
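The "mmap with MAP_FIXED and PROT_NONE" technique from the list above can be sketched like this. It is a standalone POSIX illustration, not the code behind FREE_MMAP_PROT_NONE: the helper names are hypothetical, and re-enabling the range with mprotect is an assumption about how such memory might be reused.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cstddef>

// Some older systems only define MAP_ANON.
#if !defined(MAP_ANONYMOUS) && defined(MAP_ANON)
#define MAP_ANONYMOUS MAP_ANON
#endif

// Maps `length` bytes of zero-filled read/write memory, or returns nullptr.
inline void* AllocatePages(size_t length) {
  void* p = mmap(nullptr, length, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? nullptr : p;
}

// Replaces the page-aligned range with a fresh inaccessible mapping.
// The old pages are dropped, so the OS can reclaim them, while the
// address range stays reserved for the process.
inline bool ReleaseWithProtNone(void* start, size_t length) {
  void* r = mmap(start, length, PROT_NONE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
  return r == start;
}

// Assumption: the range is made usable again by restoring permissions.
// Its previous contents are gone; it reads back as zeroes.
inline bool MakeUsableAgain(void* start, size_t length) {
  return mprotect(start, length, PROT_READ | PROT_WRITE) == 0;
}
```

Unlike madvise-based release, this needs no support for MADV_DONTNEED or MADV_FREE, at the cost of an extra call before the range can be touched again.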