fio core dumps · Issue #1452 · gperftools/gperftools · GitHub


Closed
mabod opened this issue Oct 20, 2023 · 9 comments

Comments

mabod commented Oct 20, 2023

I have been seeing core dumps with fio for a few weeks. I opened an issue for fio:

axboe/fio#1650

The fio developers say it is an out-of-memory issue and not a fio issue. Today I downgraded gperftools from 2.13-1 to 2.12-1 (EndeavourOS, or rather Arch) and the core dumps are gone.

Can you please have a look into this?

Author
mabod commented Oct 21, 2023

I see this on a second computer as well: fio crashes with gperftools 2.13.

The first computer is an AMD Ryzen 9 5900X 12-Core Processor with 64 GB ECC RAM.
The second computer is an AMD Ryzen 5 3400G with 64 GB RAM.

Contributor
alk commented Oct 21, 2023

Hi. The backtrace in the fio ticket was produced by some tool that doesn't give a perfect backtrace. Can you please get a backtrace via gdb and post it here?

gdb <fio path> core

then, at the gdb prompt:

(gdb) bt

Author
mabod commented Oct 22, 2023

I don't know if this is what you are looking for:

# coredumpctl debug 527311
           PID: 527311 (fio)
           UID: 1000 (matthias)
           GID: 1000 (matthias)
        Signal: 11 (SEGV)
     Timestamp: Fri 2023-10-20 10:44:06 CEST (1 day 23h ago)
  Command Line: /home/matthias/src/fio/fio --output=./write.out /home/matthias/fio//fio-bench-generic-seq-write.options
    Executable: /home/matthias/src/fio/fio
 Control Group: /user.slice/user-1000.slice/user@1000.service/app.slice/vte-spawn-8158d5f7-e1e7-45ab-8783-dd395cbe25d4.scope
          Unit: user@1000.service
     User Unit: vte-spawn-8158d5f7-e1e7-45ab-8783-dd395cbe25d4.scope
         Slice: user-1000.slice
     Owner UID: 1000 (matthias)
       Boot ID: 4a32aa1467214358ae1388942ceed477
    Machine ID: 4bd88beaa35549b5922de02c8064cbf1
      Hostname: rakete
       Storage: /var/lib/systemd/coredump/core.fio.1000.4a32aa1467214358ae1388942ceed477.527311.1697791446000000.zst (present)
  Size on Disk: 252.4K
       Message: Process 527311 (fio) of user 1000 dumped core.
                
                Stack trace of thread 527311:
                #0  0x0000000000000000 n/a (n/a + 0x0)
                #1  0x00007f3f24c2e54a n/a (libtcmalloc.so.4 + 0x2e54a)
                #2  0x00007f3f24c33f02 _Z13GetStackTracePPvii (libtcmalloc.so.4 + 0x33f02)
                #3  0x00007f3f24c27381 _ZN8tcmalloc8PageHeap12HandleUnlockEPNS0_14LockingContextE (libtcmalloc.so.4 + 0x27381)
                #4  0x00007f3f24c2750d _ZN8tcmalloc8PageHeap16NewWithSizeClassEmj (libtcmalloc.so.4 + 0x2750d)
                #5  0x00007f3f24c275bf _ZN8tcmalloc15CentralFreeList8PopulateEv (libtcmalloc.so.4 + 0x275bf)
                #6  0x00007f3f24c277a9 _ZN8tcmalloc15CentralFreeList21FetchFromOneSpansSafeEiPPvS2_ (libtcmalloc.so.4 + 0x277a9)
                #7  0x00007f3f24c27854 _ZN8tcmalloc15CentralFreeList11RemoveRangeEPPvS2_i (libtcmalloc.so.4 + 0x27854)
                #8  0x00007f3f24c279c8 _ZN8tcmalloc11ThreadCache21FetchFromCentralCacheEjiPFPvmE (libtcmalloc.so.4 + 0x279c8)
                #9  0x00007f3f24c373d6 _ZN8tcmalloc24allocate_full_malloc_oomEm (libtcmalloc.so.4 + 0x373d6)
                #10 0x00007f3f23cac4fb pool (libstdc++.so.6 + 0xac4fb)
                #11 0x00007f3f24f40eee n/a (ld-linux-x86-64.so.2 + 0x4eee)
                #12 0x00007f3f24f40fdc n/a (ld-linux-x86-64.so.2 + 0x4fdc)
                #13 0x00007f3f24f572d0 n/a (ld-linux-x86-64.so.2 + 0x1b2d0)
                ELF object binary architecture: AMD x86-64

GNU gdb (GDB) 13.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/matthias/src/fio/fio...
[New LWP 527311]

warning: Section `.reg-xstate/527311' in core file too small.

This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.archlinux.org>
Enable debuginfod for this session? (y or [n]) y
Debuginfod has been enabled.
To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
[Thread debugging using libthread_db enabled]                                                                                            
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Core was generated by `/home/matthias/src/fio/fio --output=./write.out /home/matthias/fio//fio-bench-g'.
Program terminated with signal SIGSEGV, Segmentation fault.

warning: Section `.reg-xstate/527311' in core file too small.
--Type <RET> for more, q to quit, c to continue without paging--
#0  0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007f3f24c2e4f0 in ?? () at src/memory_region_map.cc:553 from /usr/lib/libtcmalloc.so.4
#2  0x00005617e208d010 in ?? ()
#3  0x00005617e208d010 in ?? ()
#4  0x000000000000001e in ?? ()
#5  0x0000000000000000 in ?? ()

Contributor
alk commented Oct 22, 2023

Thanks for the update.

Sadly, the backtrace from gdb looks quite broken. Here is my guess: perhaps /usr/lib/libtcmalloc.so.4 isn't the same as when the core was produced? If so, can I ask you to do the backtrace again with fully matching binaries?

Contributor
alk commented Oct 22, 2023

So I got myself your OS in a VM, and here is what I get for fio:

[user@endeavoros ~]$ ldd `which fio` | sort
	/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007ff97d578000)
	libacl.so.1 => /usr/lib/libacl.so.1 (0x00007ff97d41b000)
	libaio.so.1 => /usr/lib/libaio.so.1 (0x00007ff97d441000)
	libbrotlicommon.so.1 => /usr/lib/libbrotlicommon.so.1 (0x00007ff97bc59000)
	libbrotlidec.so.1 => /usr/lib/libbrotlidec.so.1 (0x00007ff97cd07000)
	libcom_err.so.2 => /usr/lib/libcom_err.so.2 (0x00007ff97cd01000)
	libcrypto.so.3 => /usr/lib/libcrypto.so.3 (0x00007ff97c800000)
	libc.so.6 => /usr/lib/libc.so.6 (0x00007ff97c61e000)
	libcurl.so.4 => /usr/lib/libcurl.so.4 (0x00007ff97d113000)
	libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007ff97d0ee000)
	libgfapi.so.0 => /usr/lib/libgfapi.so.0 (0x00007ff97d1cb000)
	libgfrpc.so.0 => /usr/lib/libgfrpc.so.0 (0x00007ff97d0b2000)
	libgfxdr.so.0 => /usr/lib/libgfxdr.so.0 (0x00007ff97d09d000)
	libglusterfs.so.0 => /usr/lib/libglusterfs.so.0 (0x00007ff97cd16000)
	libgssapi_krb5.so.2 => /usr/lib/libgssapi_krb5.so.2 (0x00007ff97c4bf000)
	libidn2.so.0 => /usr/lib/libidn2.so.0 (0x00007ff97c5fc000)
	libk5crypto.so.3 => /usr/lib/libk5crypto.so.3 (0x00007ff97bc94000)
	libkeyutils.so.1 => /usr/lib/libkeyutils.so.1 (0x00007ff97bc8d000)
	libkrb5.so.3 => /usr/lib/libkrb5.so.3 (0x00007ff97bcc2000)
	libkrb5support.so.0 => /usr/lib/libkrb5support.so.0 (0x00007ff97c47e000)
	liblzma.so.5 => /usr/lib/liblzma.so.5 (0x00007ff97c48c000)
	libm.so.6 => /usr/lib/libm.so.6 (0x00007ff97d446000)
	libnghttp2.so.14 => /usr/lib/libnghttp2.so.14 (0x00007ff97d01d000)
	libnuma.so.1 => /usr/lib/libnuma.so.1 (0x00007ff97d54d000)
	libpsl.so.5 => /usr/lib/libpsl.so.5 (0x00007ff97d007000)
	libresolv.so.2 => /usr/lib/libresolv.so.2 (0x00007ff97bc7c000)
	libssh2.so.1 => /usr/lib/libssh2.so.1 (0x00007ff97c5b3000)
	libssl.so.3 => /usr/lib/libssl.so.3 (0x00007ff97c513000)
	libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007ff97c200000)
	libtcmalloc_minimal.so.4 => /usr/lib/libtcmalloc_minimal.so.4 (0x00007ff97c027000)
	libtcmalloc.so.4 => /usr/lib/libtcmalloc.so.4 (0x00007ff97ce00000)
	libtirpc.so.3 => /usr/lib/libtirpc.so.3 (0x00007ff97d06f000)
	libunistring.so.5 => /usr/lib/libunistring.so.5 (0x00007ff97bd9a000)
	libunwind.so.8 => /usr/lib/libunwind.so.8 (0x00007ff97d426000)
	liburcu-bp.so.8 => /usr/lib/liburcu-bp.so.8 (0x00007ff97d05b000)
	liburcu-cds.so.8 => /usr/lib/liburcu-cds.so.8 (0x00007ff97d048000)
	liburcu-common.so.8 => /usr/lib/liburcu-common.so.8 (0x00007ff97d054000)
	libuuid.so.1 => /usr/lib/libuuid.so.1 (0x00007ff97d066000)
	libz.so.1 => /usr/lib/libz.so.1 (0x00007ff97d533000)
	libzstd.so.1 => /usr/lib/libzstd.so.1 (0x00007ff97bf54000)
	linux-vdso.so.1 (0x00007ffcd6598000)

Note that it loads both libtcmalloc.so and libtcmalloc_minimal.so. That might work, but it is asking for trouble.

Author
mabod commented Oct 22, 2023

I did another test with a core dump. This time it looks more comprehensive:

(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007f5368e2e4f0 in ?? () from /usr/lib/libtcmalloc.so.4
#2  0x00007f5368e2e54a in ?? () from /usr/lib/libtcmalloc.so.4
#3  0x00007f5368e33f02 in GetStackTrace(void**, int, int) () from /usr/lib/libtcmalloc.so.4
#4  0x00007f5368e27381 in tcmalloc::PageHeap::HandleUnlock(tcmalloc::PageHeap::LockingContext*) () from /usr/lib/libtcmalloc.so.4
#5  0x00007f5368e2750d in tcmalloc::PageHeap::NewWithSizeClass(unsigned long, unsigned int) () from /usr/lib/libtcmalloc.so.4
#6  0x00007f5368e275bf in tcmalloc::CentralFreeList::Populate() () from /usr/lib/libtcmalloc.so.4
#7  0x00007f5368e277a9 in tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) () from /usr/lib/libtcmalloc.so.4
#8  0x00007f5368e27854 in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /usr/lib/libtcmalloc.so.4
#9  0x00007f5368e279c8 in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, int, void* (*)(unsigned long)) () from /usr/lib/libtcmalloc.so.4
#10 0x00007f5368e373d6 in tcmalloc::allocate_full_malloc_oom(unsigned long) () from /usr/lib/libtcmalloc.so.4
#11 0x00007f53680ac4fb in (anonymous namespace)::pool::pool (this=0x7f5368276280 <(anonymous namespace)::emergency_pool>) at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_alloc.cc:235
#12 __static_initialization_and_destruction_0 () at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_alloc.cc:373
#13 _GLOBAL__sub_I_eh_alloc.cc(void) () at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_alloc.cc:456
#14 0x00007f5369095eee in call_init (env=0x7fff4f47e178, argv=0x7fff4f47e158, argc=3, l=<optimized out>) at dl-init.c:90
#15 call_init (l=<optimized out>, argc=3, argv=0x7fff4f47e158, env=0x7fff4f47e178) at dl-init.c:27
#16 0x00007f5369095fdc in _dl_init (main_map=0x7f53690c52d0, argc=3, argv=0x7fff4f47e158, env=0x7fff4f47e178) at dl-init.c:137
#17 0x00007f53690ac2d0 in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#18 0x0000000000000003 in ?? ()
#19 0x00007fff4f480265 in ?? ()
#20 0x00007fff4f480269 in ?? ()
#21 0x00007fff4f48027e in ?? ()
#22 0x0000000000000000 in ?? ()

BUT:
I believe the issue has been fixed. I started the bug report with package version 2.13-1, and this is the version with which I created this new core dump. But since yesterday there is a new package, 2.13-2, with which I cannot reproduce the core dump anymore.

The new package 2.13-2 contains this patch:

https://github.com/gperftools/gperftools/commit/c48d4f14.patch

The package maintainer says it fixes "segfaults". It seems to fix my issue.

https://gitlab.archlinux.org/archlinux/packaging/packages/gperftools/-/commits/main

Contributor
alk commented Oct 22, 2023

Thanks for the update. I hoped to see function names for those two topmost call stack frames, but perhaps the debug info (even function names) is stripped. BTW, it's worth asking the Arch folks whether shipping without function names is what they intend.

If you can reach the Arch Linux packager, I would very much like to have a repro for "without patch". If this patch was indeed "it", I am very surprised, since it was believed to affect only older, less standards-compliant gcc versions. And Arch is quite modern with its gcc versions.

Contributor
alk commented Oct 22, 2023

Alternatively, is there a way for me to grab the "broken" version of libtcmalloc.so.4? (Even better if I am able to reproduce it myself, but just having the binary will help me double-check whether it was "it" indeed. All evidence so far agrees with this theory.)

Contributor
alk commented Oct 22, 2023

Ah, okay. I am able to reproduce. So indeed it was "it", and we can close now, given the patch is already merged into our master branch. And I'll be doing a new release sometime soon.

alk closed this as completed Oct 22, 2023