Segmentation fault in stats.c:404 #782

Closed
dukeartem opened this issue Nov 16, 2022 · 2 comments

Comments

@dukeartem
Contributor

Describe the bug
Hello ✋
After commit 57230d7, my Unbound installation began crashing with a segmentation fault. Debugging with gdb shows where the segfault occurs:

Thread 1 "unbound" received signal SIGSEGV, Segmentation fault.
server_stats_obtain (worker=worker@entry=0x409bf2f4800, who=0x40828008000, s=s@entry=0x7fffffffc778, reset=reset@entry=0)
    at /place/sandbox-data/tasks/8/0/1507276808/__FUSE/mount_path_8ea5aeab-ca51-4dae-91ec-8299322eef4d/contrib/tools/unbound/daemon/stats.c:404

(gdb) bt
#0  server_stats_obtain (worker=worker@entry=0x409bf2f4800, who=0x40828008000, s=s@entry=0x7fffffffc778, reset=reset@entry=0)
    at /place/sandbox-data/tasks/8/0/1507276808/__FUSE/mount_path_8ea5aeab-ca51-4dae-91ec-8299322eef4d/contrib/tools/unbound/daemon/stats.c:404
#1  0x0000000002197642 in do_stats (ssl=0x7fffffffdc50, worker=0x0, reset=0) at /place/sandbox-data/tasks/8/0/1507276808/__FUSE/mount_path_8ea5aeab-ca51-4dae-91ec-8299322eef4d/contrib/tools/unbound/daemon/remote.c:1116
#2  0x00000000021965ac in execute_cmd (rc=rc@entry=0x409bee82b80, ssl=0x0, ssl@entry=0x7fffffffdc50, cmd=cmd@entry=0x7fffffffdc70 " stats_noreset", worker=0x7fffffffb198)
    at /place/sandbox-data/tasks/8/0/1507276808/__FUSE/mount_path_8ea5aeab-ca51-4dae-91ec-8299322eef4d/contrib/tools/unbound/daemon/remote.c:3080
#3  0x0000000002196016 in handle_req (rc=0x409bee82b80, res=0x7fffffffdc50, s=<optimized out>)
    at /place/sandbox-data/tasks/8/0/1507276808/__FUSE/mount_path_8ea5aeab-ca51-4dae-91ec-8299322eef4d/contrib/tools/unbound/daemon/remote.c:3323
#4  remote_control_callback (c=<optimized out>, arg=0x552091994430, err=<optimized out>, rep=<optimized out>)
    at /place/sandbox-data/tasks/8/0/1507276808/__FUSE/mount_path_8ea5aeab-ca51-4dae-91ec-8299322eef4d/contrib/tools/unbound/daemon/remote.c:3409
#5  0x000000000223654e in event_persist_closure (base=0x409bf268e80, ev=0x55208f57e600) at /place/sandbox-data/tasks/8/0/1507276808/__FUSE/mount_path_8ea5aeab-ca51-4dae-91ec-8299322eef4d/contrib/libs/libevent/event.c:1623
#6  event_process_active_single_queue (base=0x409bf268e80, activeq=0x409bfc79210, max_to_process=max_to_process@entry=2147483647, endtime=endtime@entry=0x0)
    at /place/sandbox-data/tasks/8/0/1507276808/__FUSE/mount_path_8ea5aeab-ca51-4dae-91ec-8299322eef4d/contrib/libs/libevent/event.c:1682
#7  0x000000000223323c in event_process_active (base=0x409bf268e80) at /place/sandbox-data/tasks/8/0/1507276808/__FUSE/mount_path_8ea5aeab-ca51-4dae-91ec-8299322eef4d/contrib/libs/libevent/event.c:1783
#8  event_base_loop (base=0x409bf268e80, flags=flags@entry=0) at /place/sandbox-data/tasks/8/0/1507276808/__FUSE/mount_path_8ea5aeab-ca51-4dae-91ec-8299322eef4d/contrib/libs/libevent/event.c:2006
#9  0x0000000002232c07 in event_base_dispatch (event_base=0x0) at /place/sandbox-data/tasks/8/0/1507276808/__FUSE/mount_path_8ea5aeab-ca51-4dae-91ec-8299322eef4d/contrib/libs/libevent/event.c:1817
#10 0x00000000021a4ed5 in ub_event_base_dispatch (base=0x0) at /place/sandbox-data/tasks/8/0/1507276808/__FUSE/mount_path_8ea5aeab-ca51-4dae-91ec-8299322eef4d/contrib/tools/unbound/util/ub_event.c:280
#11 0x0000000002f8115c in comm_base_dispatch (b=<optimized out>) at /place/sandbox-data/tasks/8/0/1507276808/__FUSE/mount_path_8ea5aeab-ca51-4dae-91ec-8299322eef4d/contrib/tools/unbound/util/netevent.c:267
#12 0x00000000021a3e59 in worker_work (worker=<optimized out>) at /place/sandbox-data/tasks/8/0/1507276808/__FUSE/mount_path_8ea5aeab-ca51-4dae-91ec-8299322eef4d/contrib/tools/unbound/daemon/worker.c:2320
#13 0x0000000002194ace in daemon_fork (daemon=daemon@entry=0x409bf340000) at /place/sandbox-data/tasks/8/0/1507276808/__FUSE/mount_path_8ea5aeab-ca51-4dae-91ec-8299322eef4d/contrib/tools/unbound/daemon/daemon.c:773
#14 0x000000000219f018 in run_daemon (cfgfile=<optimized out>, cmdline_verbose=0, debug_mode=1, need_pidfile=1)
    at /place/sandbox-data/tasks/8/0/1507276808/__FUSE/mount_path_8ea5aeab-ca51-4dae-91ec-8299322eef4d/contrib/tools/unbound/daemon/unbound.c:743
#15 main (argc=<optimized out>, argv=<optimized out>) at /place/sandbox-data/tasks/8/0/1507276808/__FUSE/mount_path_8ea5aeab-ca51-4dae-91ec-8299322eef4d/contrib/tools/unbound/daemon/unbound.c:845

I traced the problem to the new function, which uses select() to wait for a response from the socket. I read that select() has a limit on how many descriptors it can watch, and a second limit: its fd_set structure cannot handle a file descriptor numbered 1024 or higher (this is historical; I don't know the reason). My installation runs into exactly this. I wrote a simple replacement function based on poll():

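#include <poll.h>

/* Wait up to msec milliseconds for the tube's read end to become readable.
 * Returns -1 on error, 0 on timeout, and a positive value when an event
 * is pending on the descriptor. */
int tube_wait_timeout_poll(struct tube* tube, int msec) {
    struct pollfd fds;
    int ret;

    fds.fd = tube->sr;
    fds.events = POLLIN;
    fds.revents = 0;  /* clear before the call, not after it */

    ret = poll(&fds, 1, msec);

    if (ret == -1)
        return -1;  /* error; errno is set by poll() */
    else if (ret == 0)
        return 0;   /* timed out */

    return ret;
}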

and replaced the old function with the new one on this line.
It works fine for me. I have no experience with Windows and cannot test this change there, which is why this is not a PR.
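To illustrate why switching to poll() helps, here is a minimal sketch that checks whether poll() works on a descriptor numbered FD_SETSIZE or higher. The helper name poll_handles_high_fd is hypothetical and not part of Unbound; the sketch assumes a POSIX system where the descriptor limit can be raised above FD_SETSIZE.

```c
#include <fcntl.h>
#include <poll.h>
#include <sys/resource.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper, not Unbound code.
 * Returns 1 if poll() reports readability on a descriptor >= FD_SETSIZE,
 * 0 if it does not, and -1 if the scenario could not be set up
 * (for example, the hard RLIMIT_NOFILE is too low). */
int poll_handles_high_fd(void) {
    struct rlimit rl;
    struct pollfd pfd;
    rlim_t want = (rlim_t)FD_SETSIZE + 64;
    int p[2], high, ret;

    /* The soft limit is often exactly FD_SETSIZE; raise it if needed. */
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur < want) {
        if (rl.rlim_max != RLIM_INFINITY && rl.rlim_max < want)
            return -1;
        rl.rlim_cur = want;
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
            return -1;
    }

    if (pipe(p) != 0)
        return -1;

    /* F_DUPFD returns the lowest free descriptor >= the third argument.
     * Passing this descriptor to FD_SET() would be undefined behavior. */
    high = fcntl(p[0], F_DUPFD, FD_SETSIZE);
    if (high < 0 || write(p[1], "x", 1) != 1) {
        if (high >= 0)
            close(high);
        close(p[0]);
        close(p[1]);
        return -1;
    }

    /* poll() takes an array of descriptors, so the number itself is fine. */
    pfd.fd = high;
    pfd.events = POLLIN;
    pfd.revents = 0;
    ret = poll(&pfd, 1, 1000);

    close(high);
    close(p[0]);
    close(p[1]);
    return (ret == 1 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

With many interfaces and threads, a process can plausibly end up holding descriptors in this range, which is the situation the report describes.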

To reproduce
Steps to reproduce the behavior:

  1. In unbound.conf:
  • set interface: multiple times, each with a different IP address (I have 7)
  • set num-threads: to something large (I set 64)
  2. Start unbound; within a minute it restarts, with a segfault reported in dmesg

System:

  • Unbound version: 1.17.0
  • OS: Ubuntu 18.04.3 LTS
  • unbound -V output:
Version 1.17.0-UNBOUNDREL-8-select-to-poll-r10337037

Configure line: --disable-static --prefix=/var/empty/unbound-1.17.0 --bindir=/var/empty/tmp/out/bin --sbindir=/var/empty/tmp/out/sbin --includedir=/var/empty/tmp/out/include --oldincludedir=/var/empty/tmp/out/include --mandir=/var/empty/tmp/out/share/man --infodir=/var/empty/tmp/out/share/info --docdir=/var/empty/tmp/out/share/doc/unbound --libdir=/var/empty/tmp/out/lib --libexecdir=/var/empty/tmp/out/libexec --localedir=/var/empty/tmp/out/share/locale --disable-rpath --enable-dnstap --enable-dnscrypt --enable-subnet --enable-systemd --libdir=/usr/lib --prefix= --with-libevent=/var/empty/libevent-2.1.12-dev --with-pidfile=/run/unbound.pid --with-rootkey-file=/var/lib/unbound/root.key --with-ssl=/var/empty/openssl-1.1.1o-dev --with-username= --with-libnghttp2=/var/empty/nghttp2-1.47.0-dev
Linked libs: libevent 2.1.12-stable (it uses epoll), OpenSSL 1.1.1l  24 Aug 2021
Linked modules: dns64 subnetcache respip validator iterator
DNSCrypt feature available

BSD licensed, see LICENSE in source package for details.
Report bugs to unbound-bugs@nlnetlabs.nl or https://github.com/NLnetLabs/unbound/issues
@dukeartem
Contributor Author

Yes, now I'm sure the problem is with sockets whose descriptor number is 1024 or higher. I found a bit of the "why" here: https://stackoverflow.com/a/20997117
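As a rough illustration of the limit: fd_set is a fixed-size object sized for FD_SETSIZE descriptors (commonly 1024), and POSIX leaves FD_SET() undefined for a descriptor outside the range 0 to FD_SETSIZE-1. Code that must keep using select() can guard against this. The helper names below are hypothetical and not taken from Unbound.

```c
#include <sys/select.h>

/* Hypothetical guard: returns 1 if fd can legally be stored in an fd_set. */
int fits_in_fd_set(int fd) {
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical wrapper: adds fd to set only when that is well defined.
 * Returns 1 on success and 0 if fd is out of range (set is left untouched). */
int fd_set_checked(int fd, fd_set* set) {
    if (!fits_in_fd_set(fd))
        return 0;
    FD_SET(fd, set);
    return 1;
}
```

A guard like this turns the crash into a detectable error, but it does not lift the limit; for that, poll() or a similar interface is needed.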

dukeartem added a commit to dukeartem/unbound that referenced this issue Nov 19, 2022
select() has a limit of 1024 file descriptors, which affects big installations. More information in issue NLnetLabs#782
@wcawijngaards
Member

The fix has the suggested code, with some changes. The change allows the function to continue to ignore EINTR and EAGAIN while waiting for the file descriptor. Also, the function pollit is fixed for the same issue of an fd larger than 1024; it does not seem to cause a problem for you, but it has the same issue with fd_set.

Thanks for the detailed report and suggested fix! The previously referenced commit is fine; it only causes the problem to show up because it calls the tube functions. The fix in the tube wait function is the right fix to have.

jedisct1 added a commit to jedisct1/unbound that referenced this issue Dec 13, 2022
* nlnet/master:
  - Updates for NLnetLabs#461 (Add max-query-restarts option).
  - Expose 'max-sent-count' as a configuration option; the default value retains Unbound's behavior.
  - Expose 'statistics-inhibit-zero' as a configuration option; the default value retains Unbound's behavior.
  - Fix to wrap Makefile scripts directory in quotes for uninstall.
  Changelog note for NLnetLabs#808 - Merge NLnetLabs#808: Wrap Makefile script's directory variables in quotes.
  wrap directory variables in quotes
  Fix date.
  - Fix NLnetLabs#773: When used with systemd-networkd, unbound does not start until systemd-networkd-wait-online.service times out.
  - Clear documentation for interactivity between the subnet module and the serve-expired and prefetch configuration options.
  - Add SVCB and HTTPS to the types removed by 'unbound-control flush'.
  - Fix NLnetLabs#782: Segmentation fault in stats.c:404.
  Changelog entry for NLnetLabs#720
  Document max-query-restarts option
  Use max-query-restarts in iterative resolver
  Add max-query-restarts to grammar and lexer
  Add max-query-restarts config parameter