latency packets can be "lost" at high packet rates #8
Comments
Yes, I've unfortunately seen this exact behavior before. It doesn't happen on X550 or i40e NICs, so it looks like a hardware issue and we can't do anything about it :( Let me know if you find anything useful or a work-around. Software timestamping should work.
BTW, I tried this on i40e, and I still lose some packets: 5.8% at 19.84 Mpps. I will try a software timestamping approach to see if these packets are truly being lost.
Interesting result; I only tested it on an X710 10 Gbit i40e NIC at 14.88 Mpps (since that was the only one currently in a loop-back config in our lab).
I just talked to Franck Baudin at the DPDK Summit about this issue and he stressed how important this is for you guys in the OPNFV project. I unfortunately don't have access to any directly connected ixgbe or XL710 NICs at the moment, so I'll set up a test system with a direct connection between two ixgbe ports in my lab on Monday. I've just tested an X710 (i40e 10 GbE NIC) and that NIC works fine.
Okay, I've set up a few loopback connections and found the following:
Specific to 82599 NICs:
Specific to XL710 NICs:
To conclude:
So I believe this is not a big problem; it merely reduces the sample rate for timestamps at high packet loads. Use the device Rx/Tx counters to report throughput and packet loss and ignore non-timestamped packets; this simply means that the sample rate will be lower under full load. Certainly not a good thing, but it doesn't look like we can do better with this hardware. Maybe it's also possible to install an explicit drop filter for non-timestamped packets (like the commented-out :setPromisc(false) call in the example script). However, I'm not sure whether that works and whether the counters still work (they don't with promisc = false, but I think they should with an fdir filter).

BTW: timestamping at full load is not a useful scenario in many cases. For example, if you are forwarding between two ports with the same speed, then buffers might fill up due to short interruptions on the DuT, and it's not possible for the DuT to "catch up" since packets are coming in at the same rate at which they can be sent out. This will be visible as latency that increases over time for no obvious reason.
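For reference, reporting throughput and loss from the NIC's own counters can be done with MoonGen's stats module. Below is a minimal sketch of such a counter task; it assumes the newDevRxCounter/newDevTxCounter helpers from recent MoonGen/libmoon, and exact module and function names may differ in older releases.

```lua
local mg    = require "moongen"   -- called "dpdk" in older MoonGen versions
local stats = require "stats"

-- Report throughput and loss from the device's hardware Rx/Tx counters,
-- independent of how many timestamp samples actually made it through.
local function counterTask(txDev, rxDev)
	local txCtr = stats:newDevTxCounter(txDev, "plain")
	local rxCtr = stats:newDevRxCounter(rxDev, "plain")
	while mg.running() do
		txCtr:update()
		rxCtr:update()
		mg.sleepMillis(1000)
	end
	txCtr:finalize()
	rxCtr:finalize()
	-- loss = total packets seen by the tx counter minus those seen by the rx counter
end
```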
Thanks for all the testing and information. Initially this problem was quite severe around 10 Mpps (losing every single latency packet), but that was on a much older version of MoonGen/DPDK. More recent versions are significantly better, showing only a small percentage of loss. I'll run your test script on the latest code just to be sure I am seeing the same thing.

I agree that timestamping at full load might not be useful if the DUT cannot sustain zero packet loss. However, we tune the DUT quite extensively to obtain zero packet loss, and typically test this for 2 hours, sometimes 12 hours or more. Technically this is not full load, because we need the DUT to process packets at a slightly higher rate than it receives them, so that when some preemption occurs and buffer use increases, the buffer can later be "drained" before the next preemption happens. But at this maximum sustained no-loss rate, we really do want a good characterization of latency.
We have been using timestamper:measureLatency() to measure latency while running bulk network traffic concurrently in the background. Some of these latency packets can get "lost", and I am not sure why this is happening yet. Somewhere around 10 Mpps is where we start seeing loss; at 14.7 Mpps, we get up to 25% loss. During this test, the bulk traffic (14.7 Mpps) has exactly zero loss. The latency packets are sent at about 100 per second.
I have added debug code to measureLatency() to report the different ways loss can happen, and it is always because the time to wait for the packet has expired.
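For reference, the same accounting can also be done from the caller side without patching measureLatency(), assuming it returns nil when the wait for the timestamped packet expires (which matches the behavior described above). A rough sketch, with the queue setup omitted and names taken from the MoonGen example scripts:

```lua
local ts   = require "timestamping"
local hist = require "histogram"
local mg   = require "moongen"  -- called "dpdk" in older MoonGen versions

-- txQueue/rxQueue are the queues dedicated to timestamping.
local function latencyTask(txQueue, rxQueue)
	local timestamper = ts:newTimestamper(txQueue, rxQueue)
	local latencyHist = hist:new()
	local sent, lost = 0, 0
	while mg.running() do
		local lat = timestamper:measureLatency() -- assumed to return nil if the wait expires
		sent = sent + 1
		if lat then
			latencyHist:update(lat)
		else
			lost = lost + 1
		end
	end
	latencyHist:print()
	print(("latency samples: %d sent, %d lost (%.2f%%)"):format(
		sent, lost, 100 * lost / math.max(sent, 1)))
end
```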
So far these tests use 64-byte frames for the bulk traffic and 76 (80) byte frames for the latency packets. If we increase the latency frame size to 124 bytes, we get all of the packets.
The network adapter is an Intel Niantic (82599), and the two ports are connected to each other (no other system involved).
I am wondering if this has anything to do with filtering on the adapter. Have you seen anything like this? I will probably look at software timestamping next to see if the same problem is there.