Better visibility into packet-dropping decisions
This problem is not new, and neither are attempts to address it. The kernel currently contains a "drop_monitor" functionality that was introduced in the 2.6.30 kernel back in 2009. Over the years, it has gained some functionality but has managed to remain thoroughly and diligently undocumented. This feature appears to support a netlink API that can deliver notifications when packets are dropped. Those notifications include an address within the kernel showing where the decision to drop the packet was made, and can optionally include the dropped packets themselves. User-space code can turn the addresses into function names; desperate administrators can then dig through the kernel source to try to figure out what is going on.
It seems like there should be a better way. As it happens, the beginning of the infrastructure to provide that better way was contributed to 5.17 by Menglong Dong. The internal kernel function that frees the memory holding a packet is kfree_skb(); in 5.17, that function has become:
void kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason);
The reason argument is new; it is intended to say why the packet passed as skb has reached the end of the line. This information is not actually useful to the kernel, but it has been added to the existing kfree_skb tracepoint, making it available to any program that connects to that tracepoint. Analysis scripts can quickly print out why packets are being dropped; administrators can also attach BPF programs to, for example, create a histogram of reasons for dropped packets.
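As a rough illustration of the "analysis script" idea, the following user-space C sketch tallies drop reasons from text lines captured from the kfree_skb tracepoint. It assumes each line carries a trailing `reason: NAME` field. That layout, the sample lines, and the `reason_hist` helpers are assumptions made for this example, not a documented interface.

```c
/* User-space sketch: count drop reasons found in captured trace lines.
 * The "reason: NAME" field layout is an assumption about the trace output. */
#include <stdio.h>
#include <string.h>

#define MAX_REASONS 64
#define REASON_LEN  48

struct reason_hist {
	char name[MAX_REASONS][REASON_LEN];
	unsigned long count[MAX_REASONS];
	int used;
};

void hist_init(struct reason_hist *h)
{
	memset(h, 0, sizeof(*h));
}

/* Count the reason named on one trace line.
 * Returns 0 on success, -1 if no reason is found or the table is full. */
int hist_add_line(struct reason_hist *h, const char *line)
{
	static const char key[] = "reason: ";
	const char *p = strstr(line, key);
	char name[REASON_LEN];
	size_t n;
	int i;

	if (!p)
		return -1;
	p += sizeof(key) - 1;
	n = strcspn(p, " \t\r\n");	/* the reason name ends at whitespace */
	if (n == 0 || n >= REASON_LEN)
		return -1;
	memcpy(name, p, n);
	name[n] = '\0';

	for (i = 0; i < h->used; i++) {
		if (strcmp(h->name[i], name) == 0) {
			h->count[i]++;
			return 0;
		}
	}
	if (h->used == MAX_REASONS)
		return -1;
	strcpy(h->name[h->used], name);
	h->count[h->used++] = 1;
	return 0;
}

/* Look up the count for one reason name; unknown names count as zero. */
unsigned long hist_get(const struct reason_hist *h, const char *name)
{
	int i;

	for (i = 0; i < h->used; i++)
		if (strcmp(h->name[i], name) == 0)
			return h->count[i];
	return 0;
}

/* Print one "name count" pair per line. */
void hist_print(const struct reason_hist *h, FILE *out)
{
	int i;

	for (i = 0; i < h->used; i++)
		fprintf(out, "%-24s %lu\n", h->name[i], h->count[i]);
}
```

A caller would feed each captured line to hist_add_line() and finish with hist_print(); a BPF program attached to the tracepoint could keep the same kind of table in a map without any text parsing.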
A new version of kfree_skb() has also been added; it simply calls kfree_skb_reason() with "unspecified" as the reason.
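The wrapper arrangement can be pictured with a small user-space mock-up. This is not the kernel's code: the one-field `sk_buff`, the `drop_count` array standing in for the tracepoint, and the exact list of enum names are assumptions made for illustration, loosely modeled on the reasons described in this article.

```c
/* User-space mock-up, not kernel code. */
#include <assert.h>
#include <stddef.h>

/* Reason names are modeled on the kernel's; treat the list as an assumption. */
enum skb_drop_reason {
	SKB_DROP_REASON_NOT_SPECIFIED,
	SKB_DROP_REASON_NO_SOCKET,
	SKB_DROP_REASON_PKT_TOO_SMALL,
	SKB_DROP_REASON_TCP_CSUM,
	SKB_DROP_REASON_SOCKET_FILTER,
	SKB_DROP_REASON_UDP_CSUM,
	SKB_DROP_REASON_MAX,
};

struct sk_buff { int len; };	/* placeholder for the real structure */

/* Stand-in for the tracepoint: one counter per reason. */
static unsigned long drop_count[SKB_DROP_REASON_MAX];

void kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason)
{
	if (!skb)
		return;
	assert((unsigned int)reason < SKB_DROP_REASON_MAX);
	drop_count[reason]++;	/* the kernel would fire the tracepoint here */
	/* The real function then releases the buffer; the mock owns no memory. */
}

/* Old entry point: call sites that were never updated report no reason. */
static inline void kfree_skb(struct sk_buff *skb)
{
	kfree_skb_reason(skb, SKB_DROP_REASON_NOT_SPECIFIED);
}
```

The point of the design is that existing callers need no changes; they simply show up under the unspecified reason until somebody annotates them.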
In 5.17, the use of this infrastructure is relatively limited. There are a few TCP-level drop locations that have been instrumented with the new call, including code that drops packets for being smaller than the TCP header size, not being associated with an existing TCP socket, exhibiting checksum failures, or having been explicitly dropped by an add-on socket filter program. The UDP subsystem has also been enhanced to note those same reasons for dropped packets.
The situation is set to improve considerably in 5.18; patches already in linux-next add a number of new reasons. These cover packets that were dropped by the netfilter subsystem, that contain IP-header errors, or that have been identified as spoofed by the reverse-path filter (rp_filter) mechanism. Administrators will be able to see IP packets that have been dropped due to an unsupported higher-level protocol. Reasons have also been added for UDP packets dropped by the IPSec XFRM policy or a lack of memory within the kernel.
There is yet another set of reason annotations that has been accepted, but which has not yet found its way into linux-next; chances are that these will show up in 5.18 as well. They extend the XFRM-policy annotation to TCP, note packets dropped due to missing or incorrect MD5 hashes (which are evidently still a thing in 2022), as well as those containing invalid TCP flags or sequence numbers outside of the current TCP window. These patches also add new instances of the other reasons noted above; some situations can be detected in multiple places.
While the above set of reasons may seem long, this work could be seen as having just begun. In current linux-next, there are over 2,700 calls to kfree_skb(), compared to 18 to kfree_skb_reason(). That suggests that a lot of packets will still be dropped for unspecified reasons. Still, this work represents a useful step forward, one that should make many of the reasons for packet loss more readily available to system administrators.
The part that remains missing, of course, is the user-space side. The current reason codes are all defined in <linux/skbuff.h>, which is not part of the externally available kernel API. Moving them to a separate file under the uapi directory would make them more accessible to developers. Also helpful, of course, would be to have some documentation for this mechanism and how to use it (and interpret the results), but even your editor, often cited for naive optimism, will not be holding his breath for that to show up.
Meanwhile, though, an important piece of the kernel's network functionality is becoming a little more transparent to users. That should make life easier for system administrators who will be able to spend less time trying to figure out why packets aren't making it through the system. Unfortunately, though, this work offers no help for users who are wondering why their packets are disappearing somewhere in the far reaches of the Internet.
Index entries for this article | |
---|---|
Kernel | Networking |
Posted Feb 25, 2022 20:29 UTC (Fri)
by atnot (subscriber, #124910)
Posted Feb 26, 2022 2:04 UTC (Sat)
by shemminger (subscriber, #5739)
Posted Feb 26, 2022 5:52 UTC (Sat)
by tititou (subscriber, #75162)
Posted Feb 26, 2022 19:03 UTC (Sat)
by johill (subscriber, #25196)
It supports reporting a string (error message), a pointer to a bad attribute, and if NL_SET_ERR_MSG_ATTR_POL was used (which it is in the general policy-based parsing) will even return the policy for the attribute back to userspace to explain why the attribute failed (e.g. if it's NLA_RANGE(U32, 1,2) and you gave a value 3).
Posted Feb 26, 2022 15:20 UTC (Sat)
by jreiser (subscriber, #11027)
Posted Feb 26, 2022 19:05 UTC (Sat)
by johill (subscriber, #25196)
Posted Feb 27, 2022 3:21 UTC (Sun)
by roc (subscriber, #30627)
Posted Feb 27, 2022 9:17 UTC (Sun)
by jengelh (subscriber, #33263)
Posted Mar 11, 2022 8:44 UTC (Fri)
by njs (guest, #40338)
https://github.com/nviennot/linux-trace-error
Posted Feb 26, 2022 4:49 UTC (Sat)
by alison (subscriber, #63752)
Posted Feb 27, 2022 21:26 UTC (Sun)
by shemminger (subscriber, #5739)
Posted Feb 27, 2022 23:43 UTC (Sun)
by amarao (subscriber, #87073)
Posted Mar 2, 2022 3:25 UTC (Wed)
by MaZe (subscriber, #53908)
Posted Mar 2, 2022 9:58 UTC (Wed)
by amarao (subscriber, #87073)
Posted Jul 7, 2022 6:48 UTC (Thu)
by gdt (subscriber, #6284)
Linux counting failed MD5 packets is excellent, as network operators investigating BGP connection issues can check that the counter is the expected zero.
For the longest time vendors were promoting IPsec as the replacement for the TCP MD5 option, but operationally the overhead of configuration and customer education was too high. More recently TCP-AO (Authentication Option) offers a similar mechanism to the MD5 option, but with modern cryptographic algorithms.
For external BGP connections the TTL security check also offers good protection from network abuse. Customers generally seem to be able to configure that without much difficulty.
Posted Mar 6, 2022 20:16 UTC (Sun)
by gfa (guest, #53331)
Does anybody know any tool that can use this functionality?
thanks
Posted Mar 9, 2022 17:55 UTC (Wed)
by rstonehouse (subscriber, #81531)
(Also there is a systemtap script to do something similar. See https://sourceware.org/git/?p=systemtap.git;a=blob;f=test...)
Many places have it, but lots still need work -- volunteers wanted.
Can you provide a link or an example of it?
There is a need for a facility to locate at run time every failed subroutine call. The source code can be edited with sed so that return -Exxxxx; becomes return ErrorCode(Exxxxx); with a default macro definition something like
#ifndef ErrorCode
#define ErrorCode(errnum) -(errnum)
#endif
Then the determined investigator can re-compile selected source files with something like
#define ErrorCode(errnum) myErrorDiagnostic(errnum, __builtin_return_address(0), __FUNCTION__, __LINE__)
and supply a definition for the added subroutine myErrorDiagnostic. Of course there are a handful of cases where error numbers are variables or the syntax is complex, and also a few places where simple automated editing fails. Rate limiting the reporting can be a problem. But I did this once, and got the answer I wanted.
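The pieces described above might fit together roughly as follows in a user-space test build. `parse_len()` and the `error_reports` counter are hypothetical names added for this sketch, and `__builtin_return_address()` is a GCC/Clang builtin; none of this is taken from the kernel tree.

```c
/* Sketch of the ErrorCode() idea in a user-space test build. */
#include <errno.h>
#include <stdio.h>

static int error_reports;	/* hypothetical counter for the demonstration */

/* Report where an error code originates, then hand back the usual -Exxxxx. */
int myErrorDiagnostic(int errnum, void *caller, const char *func, int line)
{
	error_reports++;
	fprintf(stderr, "%s:%d: returning -%d (called from %p)\n",
		func, line, errnum, caller);
	return -errnum;
}

/* The investigator's definition, which would normally be supplied only
 * when re-compiling the selected source files. */
#define ErrorCode(errnum) \
	myErrorDiagnostic(errnum, __builtin_return_address(0), __FUNCTION__, __LINE__)

/* The default from the edited source; skipped here because the macro
 * is already defined. */
#ifndef ErrorCode
#define ErrorCode(errnum) -(errnum)
#endif

/* Hypothetical function after the sed edit: "return -EINVAL;" has
 * become "return ErrorCode(EINVAL);". */
int parse_len(int len)
{
	if (len < 0)
		return ErrorCode(EINVAL);
	return len;
}
```

Callers still see the same negative errno value, so the diagnostic build remains a drop-in replacement; rate limiting would need to be added inside myErrorDiagnostic().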
In most files you can even just
#define EINVAL ({printk(...); 22;})
if you really want :-)
This is reported in rx_missed. Not sure if there is more that HW can tell you.
There are lots of rx_dropped places in drivers, these could/should be instrumented.
Cynically, if the BGP connection isn't using a long, random, unique key prior to that outage, then it will be afterwards :-)