Description
[ type description here; PLEASE REMOVE THIS LINE AND THE LINES BELOW BEFORE SUBMITTING THE ISSUE ]
Following transferrred from stack overflow:
I own two systems, host-a and host-b, connected in a lan via a switch; they both have no firewall. When I run clockdiff in each of them I receive the error message that the other host is down.
All commands are executed in a clean bash by running exec -c bash --norc
(files redacted).
root@host-a:~# clockdiff host-b
...........................clockdiff: host-b is down
root@host-a:~# clockdiff -o host-b
...........................clockdiff: host-b is down
At the same time, a tcpdump on the same machine host-a reveals that the queried host responds with ICMP messages (ICMP time stamp in the former case, ECHO in the latter)
root@host-a:~# tcpdump icmp
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on lan0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
....
<TIME>:18.078956 IP host-b > host-a: ICMP time stamp reply id 29228 seq 7424: org <TIME>:18.078, recv <TIME>:08.347, xmit <TIME>:08.347, length 20
<TIME>:18.078956 IP host-b > host-a: ICMP time stamp reply id 29228 seq 7680: org <TIME>:18.078, recv <TIME>:08.347, xmit <TIME>:08.347, length 20
1
...
Curiously, when I run the exact same commands under strace, the replies are measured and a correct response is returned.
root@host-a:~# strace -o /tmp/log clockdiff host-b
....................................................
host=host-b rtt=0(0)ms/0ms delta=-9737ms/-9737ms <DATETIME>
root@host-a:~# strace -o /tmp/log clockdiff -o host-b
...................................................
root@host-a:~# strace -o /tmp/log clockdiff host-b
....................................................
host=host-b rtt=0(0)ms/0ms delta=-9737ms/-9737ms <DATETIME>
root@host-a:~# uname -a
Linux host-a 6.1.0-25-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux
Some excerpt from the strace log:
...
connect(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("<HOSTB IP>")}, 16) = 0
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=0, tv_nsec=0}, NULL, 8) = 0 (Timeout)
sendto(3, "\r\0"..., 20, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("<HOSTB IP>")}, 16) = 20
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=1, tv_nsec=0}, NULL, 8) = 1 ([{fd=3, revents=POLLIN}], left {tv_sec=0, tv_nsec=999837227})
recvfrom(3, "E\0"..., 1024, 0, NULL, 0x7ffff369ecd8) = 40
newfstatat(1, "", {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0xb), ...}, AT_EMPTY_PATH) = 0
write(1, ".", 1) = 1
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=0, tv_nsec=937000000}, NULL, 8) = 0 (Timeout)
sendto(3, "\r\0"..., 20, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("<HOSTB IP>")}, 16) = 20
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=0, tv_nsec=937000000}, NULL, 8) = 1 ([{fd=3, revents=POLLIN}], left {tv_sec=0, tv_nsec=936829484})
recvfrom(3, "E\0"..., 1024, 0, NULL, 0x7ffff369ecd8) = 40
write(1, ".", 1) = 1
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=0, tv_nsec=842000000}, NULL, 8) = 0 (Timeout)
sendto(3, "\r\0"..., 20, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("<HOSTB IP>")}, 16) = 20
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=0, tv_nsec=842000000}, NULL, 8) = 1 ([{fd=3, revents=POLLIN}], left {tv_sec=0, tv_nsec=841996498})
recvfrom(3, "E\0"..., 1024, 0, NULL, 0x7ffff369ecd8) = 40
write(1, ".", 1) = 1
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=0, tv_nsec=736000000}, NULL, 8) = 0 (Timeout)
sendto(3, "\r\0"..., 20, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("<HOSTB IP>")}, 16) = 20
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=0, tv_nsec=736000000}, NULL, 8) = 1 ([{fd=3, revents=POLLIN}], left {tv_sec=0, tv_nsec=735845402})
recvfrom(3, "E\0"..., 1024, 0, NULL, 0x7ffff369ecd8) = 40
write(1, ".", 1) = 1
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=0, tv_nsec=631000000}, NULL, 8) = 0 (Timeout)
sendto(3, "\r\0"..., 20, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("<HOSTB IP>")}, 16) = 20
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=0, tv_nsec=631000000}, NULL, 8) = 1 ([{fd=3, revents=POLLIN}], left {tv_sec=0, tv_nsec=630791917})
recvfrom(3, "E\0"..., 1024, 0, NULL, 0x7ffff369ecd8) = 40
write(1, ".", 1) = 1
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=0, tv_nsec=532000000}, NULL, 8) = 0 (Timeout)
sendto(3, "\r\0"..., 20, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("<HOSTB IP>")}, 16) = 20
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=0, tv_nsec=532000000}, NULL, 8) = 1 ([{fd=3, revents=POLLIN}], left {tv_sec=0, tv_nsec=531768694})
recvfrom(3, "E\0"..., 1024, 0, NULL, 0x7ffff369ecd8) = 40
write(1, ".", 1) = 1
sendto(3, "\r\0"..., 20, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("<HOSTB IP>")}, 16) = 20
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=0, tv_nsec=442000000}, NULL, 8) = 1 ([{fd=3, revents=POLLIN}], left {tv_sec=0, tv_nsec=441949198})
recvfrom(3, "E\0"..., 1024, 0, NULL, 0x7ffff369ecd8) = 40
write(1, ".", 1) = 1
...
The excutable is not setuid and the OS is debian.
root@host-a:~# ls -laF `which clockdiff`
-rwxr-xr-x 1 root root 23120 Sep 24 2024 /usr/bin/clockdiff*
root@host-a# getcap /usr/bin/clockdiff
/usr/bin/clockdiff cap_net_raw=ep
root@host-a:~# dpkg -S /usr/bin/clockdiff
iputils-clockdiff: /usr/bin/clockdiff
root@host-a:~# clockdiff -V
clockdiff from iputils 20221126
libcap: yes, IDN: yes, NLS: no, error.h: yes, getrandom(): yes, __fpending(): yes
I have encountered strace Heisenbugs before, but in this case, I cannot fathom why the normal invocation would fail to see any responses, when it is still run as root.
Also, this behaviour is empirically always reproducible in around 10 repetitions of the test.
What am I missing?