LWN: Comments on "Network transmit queue limits"
https://lwn.net/Articles/454390/
This is a special feed containing comments posted to the individual LWN article titled "Network transmit queue limits".

Network transmit queue limits
Posted Mon, 12 Aug 2013 04:42:00 +0000 by shentino (https://lwn.net/Articles/563046/)

The purpose of the device queue is actually to maximize throughput by keeping the interface busy without having to pester the kernel for new packets - especially if the kernel is busy with something else and can't immediately handle an interrupt.

And since the queue is in many cases being digested by the hardware itself, the kernel can't just tinker with it willy-nilly.

Network transmit queue limits
Posted Wed, 24 Aug 2011 15:03:24 +0000 by wtanksleyjr (https://lwn.net/Articles/456214/)

It seems to me - ignorance alert! - that the problem isn't the bytes or the time at all; it's the variance. The purpose of a queue isn't to make a device send faster or slower; it's to cover up variance.

The sources of the variance will have to be considered carefully; variance caused by time delays on the output is probably different from variance caused by multiple clients asynchronously loading data into the input.

Network transmit queue limits
Posted Wed, 17 Aug 2011 16:20:29 +0000 by butlerm (https://lwn.net/Articles/455299/)

> I wasn't there (so I'm probably wrong), but I believe that slow-start was designed as a fairly naive mechanism because it was not supposed to matter much in practice

It is worth keeping in mind that slow start is not very slow - it doubles the congestion window (and hence the average transmit bandwidth) every round-trip time. Without something like slow start, a new connection tends to immediately saturate every bottleneck link, causing large-scale packet loss not only on the new connection, but on all the others using the link as well.

That puts all the (congestion-controlled) flows on the link into some sort of recovery mode, which is generally much slower than slow start in the first place - a constant increase every RTT rather than a multiplicative one.

It works, the flows do sort themselves out, but it isn't very friendly, and it usually doesn't even help the new connection. That is why "slow" start was adopted in the first place: it replaced the previous practice of saturating the outbound link until some sort of loss indication was received. Running a gigabit-per-second flow into a ten-megabit-per-second link doesn't work all that well.
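To make those two growth rates concrete, here is a toy model in C of the window dynamics described above - made-up numbers, no losses or ACK clocking, so a sketch of the idea rather than real TCP:

    #include <stdio.h>

    /* Toy model of congestion window growth, in segments: multiplicative
     * doubling during slow start vs. one-segment additive increase once
     * the slow-start threshold is crossed. Numbers are illustrative. */
    int main(void)
    {
        int cwnd = 1;             /* congestion window, in segments */
        const int ssthresh = 64;  /* slow-start threshold */

        for (int rtt = 0; rtt <= 9; rtt++) {
            printf("RTT %2d: cwnd = %3d segments\n", rtt, cwnd);
            if (cwnd < ssthresh)
                cwnd *= 2;        /* slow start: doubles each round trip */
            else
                cwnd += 1;        /* additive increase: +1 segment per RTT */
        }
        return 0;
    }

Slow start reaches 64 segments in six round trips; the additive mode would need 63 round trips to cover the same ground, which is the point about recovery being much slower than slow start.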
Network transmit queue limits
Posted Wed, 17 Aug 2011 16:02:20 +0000 by butlerm (https://lwn.net/Articles/455296/)

I wouldn't worry too much about an initial congestion window of ten packets. On a five Mb/s bottleneck link with 1500-byte packets, that is only about 2.4 ms of queuing delay. The queuing delay due to ack compression as the congestion window increases is probably going to be considerably higher than that.

There seem to me to be only two good ways to solve the queuing latency problem, beyond simply reducing queuing limits on bottleneck routers and interfaces to reasonable sizes. One is the widespread deployment of packet pacing, which is difficult to do well without hardware support and which has other challenges. The other is fair (flow-specific) queuing at every bottleneck router or interface. The latter seems much more practical to me.
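For scale, the arithmetic behind that figure: serializing one 1500-byte packet on a 5 Mb/s link takes 1500 × 8 / 5,000,000 s ≈ 2.4 ms, so 2.4 ms is the per-packet cost; a back-to-back burst of ten such packets occupies the link for roughly 24 ms before it fully drains.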
Network transmit queue limits
Posted Wed, 17 Aug 2011 12:05:43 +0000 by nye (https://lwn.net/Articles/455276/)

> I don't have my library handy, but I seem to recall that Tanenbaum discusses TCP congestion control at length. I'm sure you'll find something good in Stevens too.

Thanks for the reference. I don't know Stevens - I assume you're talking about TCP/IP Illustrated? I notice there's a second edition due out later this year. Sadly not in paperback, though; I can't stand hardbacks, so I'll probably give it a miss.

>> since the sender already has an upper bound for the min-RTT, why is the initial congestion window set to a fixed number rather than to "the number of segments that can be transmitted in the RTT"

> Recall that the congestion window is there to limit congestion: it should decrease as congestion increases. With typical queueing techniques, the RTT increases with congestion, so what you are suggesting has the opposite of the desired dynamics.

Sorry, I should have said "the number of segments that can be transmitted in the *minimum* RTT", and then only as the *initial* cwnd. The thinking is that the sender can't possibly have received an ACK yet, so the fact that it hasn't need not imply congestion. I haven't really thought through the implications in the case where the three-way handshake is made under highly congested conditions, though, giving a vastly inaccurate bound for the min-RTT.

> I wasn't there (so I'm probably wrong), but I believe that slow-start was designed as a fairly naive mechanism because it was not supposed to matter much in practice. TCP connections were supposed to be either long-lived bulk transfers (FTP, say), or interactive flows

This is interesting, from the point of view of how we're predominantly using a protocol for something a little out of its design parameters.

(I was going to go off on a tangent here about using TCP/IP in circumstances which break its design assumptions, like bufferbloat and highly asymmetrical connections, but I need to think about it some more.)

Network transmit queue limits
Posted Tue, 16 Aug 2011 22:19:30 +0000 by jch (https://lwn.net/Articles/455245/)

> If anyone knows of any resources which explain this problem from "first principles"

I don't have my library handy, but I seem to recall that Tanenbaum discusses TCP congestion control at length. I'm sure you'll find something good in Stevens too.

> I can't wrap my head around slow-start, probably because I don't think I understand the problem it's intended to solve.

I'll make the bold claim that nobody fully understands the dynamics of TCP.

I wasn't there (so I'm probably wrong), but I believe that slow-start was designed as a fairly naive mechanism because it was not supposed to matter much in practice. TCP connections were supposed to be either long-lived bulk transfers (FTP, say) or interactive flows (telnet, or the conversational phase of SMTP). In the first case, slow-start only happens at the beginning of the transfer, which is a negligible part of the connection, while in the second case the size of the congestion window doesn't matter.

The trouble is with HTTP, which causes a lot of short-lived connections. Such a connection spends most or all of its life in slow-start. Hence the need for sharing state between different connections (which Linux does, AFAIR) or tweaking the initial window.

> since the sender already has an upper bound for the min-RTT, why is the initial congestion window set to a fixed number rather than to "the number of segments that can be transmitted in the RTT"

Recall that the congestion window is there to limit congestion: it should decrease as congestion increases. With typical queueing techniques, the RTT increases with congestion, so what you are suggesting has the opposite of the desired dynamics.

Yeah, it's tricky. No, I don't claim to understand the trade-offs involved.

--jch

Network transmit queue limits
Posted Tue, 16 Aug 2011 21:45:06 +0000 by jch (https://lwn.net/Articles/455240/)

It does cause more packets to be queued, which increases queue length and hence network-layer latency. On the other hand, it does cause packets to be sent faster, which I guess can be described as reducing application-layer latency (the time needed to load a page).

That's just the kind of tricky trade-off that the bufferbloat project is struggling with.

--jch

Network transmit queue limits
Posted Mon, 15 Aug 2011 14:22:03 +0000 by nye (https://lwn.net/Articles/454990/)

(Please excuse the naivety of this question.)

I can't wrap my head around slow-start, probably because I don't think I understand the problem it's intended to solve.

What I'm wondering is: since the sender already has an upper bound for the min-RTT, why is the initial congestion window set to a fixed number rather than to "the number of segments that can be transmitted in the RTT" (or the receiver's advertised window, if smaller)?

I guess this wouldn't work for high-latency congested links, since the initial window is, IIUC, used as the *minimum* window to fall back to when congestion occurs - but why does that need to be the case? I suspect the answer may be along the lines of "that's the point of slow-start", but it's not intuitive to me.

If anyone knows of any resources which explain this problem from "first principles" - i.e. without requiring the reader to already have more than a passing familiarity with TCP - I'd appreciate a pointer.

Network transmit queue limits
Posted Mon, 15 Aug 2011 13:45:11 +0000 by corbet (https://lwn.net/Articles/454986/)

You're talking about the congestion window change? That's very much about latency. It lets pages load more quickly without the need to open lots of independent connections; the associated documentation is very clear on the motivation.
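As a back-of-the-envelope illustration of why the initial window matters for short transfers, consider how many round trips pure slow start needs to deliver a small object. A sketch with made-up numbers, ignoring losses and the receiver window:

    #include <stdio.h>

    /* Toy model: round trips needed to deliver n segments when the
     * congestion window starts at iw and doubles every RTT (pure slow
     * start, no losses, receiver window ignored). */
    static int rtts_to_deliver(int n, int iw)
    {
        int rtts = 0, cwnd = iw, sent = 0;
        while (sent < n) {
            sent += cwnd;   /* one window's worth per round trip */
            cwnd *= 2;      /* slow start doubles the window */
            rtts++;
        }
        return rtts;
    }

    int main(void)
    {
        /* e.g. a ~45 KB page = 30 segments of 1500 bytes */
        printf("IW=3:  %d RTTs\n", rtts_to_deliver(30, 3));   /* 4 */
        printf("IW=10: %d RTTs\n", rtts_to_deliver(30, 10));  /* 2 */
        return 0;
    }

A 30-segment page takes four round trips with an initial window of three segments, but only two with a window of ten - roughly the saving the congestion-window change is after.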
Network transmit queue limits
Posted Mon, 15 Aug 2011 11:43:10 +0000 by jch (https://lwn.net/Articles/454976/)

> So it is not surprising that we have seen various latency-reducing changes from Google, including the increase in the initial congestion window

This doesn't decrease latency - it increases throughput for short-lived connections ("mice"). Quite the opposite: in underprovisioned networks with a lot of mice, it could increase latency dramatically.

--jch

Initial congestion window
Posted Sun, 14 Aug 2011 00:11:00 +0000 by butlerm (https://lwn.net/Articles/454911/)

Sorry for creating any confusion. I see on git.kernel.org (http://git.kernel.org/?p=linux/kernel/git/stable/linux-3.0.y.git;a=history;f=include/net/tcp.h;h=cda30ea354a214072b634ee9c2fa9b7ff23cc216;hb=HEAD) that both patches have made it in, which is good news. However, I believe that the increase to the initial congestion window is still a draft (http://tools.ietf.org/html/draft-hkchu-tcpm-initcwnd-01), not an RFC.

In the wild
Posted Sat, 13 Aug 2011 14:51:05 +0000 by dmarti (https://lwn.net/Articles/454889/)

If you're using Google or Microsoft web sites, you're probably also testing this: "Google and Microsoft Cheat on Slow-Start. Should You?" (http://blog.benstrong.com/2010/11/google-and-microsoft-cheat-on-slow.html)

Initial congestion window
Posted Sat, 13 Aug 2011 14:18:14 +0000 by corbet (https://lwn.net/Articles/454887/)

No, it's the initial congestion window; I'm not quite sure where this comes from. And yes, it went through a long process with the IETF first.

Network transmit queue limits
Posted Sat, 13 Aug 2011 07:40:11 +0000 by butlerm (https://lwn.net/Articles/454872/)

Getting the time accurate to microseconds can be a rather expensive operation, unfortunately, and that weighs against regulating queue lengths in terms of time when a simple proxy like bytes is available.

Network transmit queue limits
Posted Sat, 13 Aug 2011 07:36:43 +0000 by butlerm (https://lwn.net/Articles/454870/)

According to the linked article, the patch which was merged in 2.6.38 increases the initial receive window, not the initial congestion window. A patch increasing the initial congestion window would be the sort of thing the IETF would frown upon - without their blessing, of course.

Network transmit queue limits
Posted Sat, 13 Aug 2011 05:27:26 +0000 by dlang (https://lwn.net/Articles/454861/)

The key thing is that if the delay in transmitting is going to be too long, you want to be able to have the upper layers return an error rather than leaving the data in the queue.
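A minimal sketch of that idea - admission control at enqueue time, so the error surfaces immediately instead of the packet sitting stale in the queue. Everything here (the struct, the field names, the fixed drain-rate estimate) is illustrative, not an existing kernel interface:

    #include <stddef.h>

    struct byteq {
        long bytes_queued;   /* bytes currently sitting in the queue */
        long rate_Bps;       /* estimated drain rate, bytes/second */
        long budget_us;      /* latency budget, microseconds */
    };

    /* Returns 0 if the packet may be queued, -1 if queueing it would
     * push the expected queue delay past the budget; the caller would
     * then report the error upward instead of buffering the data. */
    int byteq_admit(struct byteq *q, size_t pkt_len)
    {
        long drain_us = (q->bytes_queued + (long)pkt_len) * 1000000L
                        / q->rate_Bps;
        if (drain_us > q->budget_us)
            return -1;
        q->bytes_queued += (long)pkt_len;
        return 0;
    }

The weak spot, as the comments above note, is rate_Bps: on shared media there is no reliable drain-rate estimate to plug in.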
Network transmit queue limits
Posted Sat, 13 Aug 2011 05:15:10 +0000 by sfink (https://lwn.net/Articles/454859/)

This may very well be the right solution, but it seems less obvious than the text of this article would imply. Rather than dynamically adjusting the network device queue length, it seems like you'd really want to keep the device queue as short as possible without getting underruns, and feed it with a much larger priority queue of per-connection queues controlled by the kernel - one which is lockless and served by a very high-priority realtime thread.

But I don't know anything about what's involved, so this probably isn't a realistic solution.

Network transmit queue limits
Posted Fri, 12 Aug 2011 17:25:12 +0000 by ajb (https://lwn.net/Articles/454792/)

I was thinking of something along the lines of:

    /* Time-based queue: stamp packets on entry, discard them at the
     * head once they have been queued longer than max_time. */
    void q_add(Q *q, PKT *pkt)
    {
        pkt->time = now();      /* timestamp packet */
        pkt->next = NULL;
        *q->last = pkt;         /* add packet to end of list */
        q->last = &pkt->next;
    }

    PKT *q_get(Q *q)
    {
        PKT *pkt;

        /* drop stale packets instead of transmitting them */
        while ((pkt = q->first) != NULL && pkt->time + q->max_time < now()) {
            q->first = pkt->next;
            if (q->first == NULL)
                q->last = &q->first;    /* queue is now empty */
            free(pkt);
        }
        return pkt;     /* head of queue, or NULL if empty */
    }

No estimation at all. There are weaknesses in this approach, but it's simpler than adjusting a byte length.

Network transmit queue limits
Posted Fri, 12 Aug 2011 16:55:45 +0000 by dlang (https://lwn.net/Articles/454791/)

Time is _much_ harder to estimate and measure than bytes.

If you have a full-duplex connection (i.e. hard-wired Ethernet on modern switches), bytes and time have a very close correlation.

If you are on a shared-media connection (unfortunately including all radio-based systems), then the correlation is not as close, because you can't know ahead of time how long it will take to send the data (you have to wait for other systems, retry, etc.).

I think bytes is as accurate as you are going to be able to get.

Network transmit queue limits
Posted Fri, 12 Aug 2011 12:09:26 +0000 by ajb (https://lwn.net/Articles/454760/)

I wonder if it wouldn't work better to define the queue length in microseconds, rather than bytes. That seems to be what this mechanism is approximating.
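For contrast with the time-stamping sketch above, the byte-based approach the article describes adjusts a limit dynamically. A toy flavor of that idea - a loose illustration of the feedback loop, not the actual algorithm from the patch set - might look like:

    /* Toy dynamic byte limit: grow the limit when the device ran dry
     * while more data was waiting, shrink it slowly otherwise, so the
     * queue stays as short as it can be without starving the device.
     * All names and constants here are made up for illustration. */
    struct dql_toy {
        long limit;     /* current byte limit for the device queue */
        long queued;    /* bytes handed to the device, not yet completed */
        int  starved;   /* set by the driver if it went idle with data pending */
    };

    void dql_toy_completed(struct dql_toy *d, long bytes)
    {
        d->queued -= bytes;
        if (d->starved)
            d->limit += bytes;          /* too small: grow aggressively */
        else if (d->limit > bytes)
            d->limit -= d->limit / 16;  /* slowly reclaim any excess */
        d->starved = 0;
    }

    int dql_toy_may_queue(const struct dql_toy *d)
    {
        return d->queued < d->limit;
    }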