Proposal: FINAL_DATA by bemasc · Pull Request #2949 · httpwg/http-extensions · GitHub

Proposal: FINAL_DATA #2949

Open · bemasc wants to merge 10 commits into main from bemasc-final-data

Conversation

bemasc
Contributor
@bemasc bemasc commented Nov 14, 2024

This adds a FINAL_DATA capsule type to make clean shutdown explicit.
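For readers unfamiliar with the Capsule Protocol: each capsule is framed as Type (varint), Length (varint), Value (RFC 9297). A minimal sketch of emitting the final payload inside a FINAL_DATA capsule follows; the type codepoint used here is a placeholder, not the value registered by the draft.

```python
def encode_varint(v: int) -> bytes:
    """QUIC variable-length integer (RFC 9000, Section 16)."""
    if v < 1 << 6:
        return v.to_bytes(1, "big")
    if v < 1 << 14:
        return ((1 << 14) | v).to_bytes(2, "big")
    if v < 1 << 30:
        return ((2 << 30) | v).to_bytes(4, "big")
    if v < 1 << 62:
        return ((3 << 62) | v).to_bytes(8, "big")
    raise ValueError("value does not fit in a varint")

FINAL_DATA_TYPE = 0x2ECF  # placeholder codepoint; the real value is assigned in the draft/IANA

def encode_capsule(capsule_type: int, value: bytes) -> bytes:
    """Capsule framing per RFC 9297: Capsule Type (i), Capsule Length (i), Capsule Value (..)."""
    return encode_varint(capsule_type) + encode_varint(len(value)) + value

# The proxy sends its last chunk of tunneled TCP data inside FINAL_DATA,
# making the graceful shutdown explicit to the peer.
wire = encode_capsule(FINAL_DATA_TYPE, b"last bytes received before the TCP FIN")
```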

@bemasc bemasc added the connect-tcp draft-ietf-httpbis-connect-tcp label Nov 14, 2024
Contributor
@DavidSchinazi DavidSchinazi left a comment

I do think we should add a FINAL_DATA. It allows, for example, sending metadata after the stream is gracefully closed. I can imagine using that to send capsules that contain TCP performance info.

Base automatically changed from bemasc-capsule-only to main November 18, 2024 23:22
@bemasc bemasc marked this pull request as ready for review November 18, 2024 23:22
@kazuho
Contributor
kazuho commented Nov 19, 2024

Thank you for opening the new issue dedicated to FINAL_DATA.

@DavidSchinazi

I do think we should add a FINAL_DATA. It allows, for example, sending metadata after the stream is gracefully closed. I can imagine using that to send capsules that contain TCP performance info.

I think the question is whether we would benefit from being able to observe the ordering between close and metadata; if there is a benefit, then it makes sense to have a frame indicating close. Otherwise, it is a waste, considering that most underlying layers (HTTP/2, HTTP/3, TLS) already provide ways to distinguish between graceful shutdown and abrupt close.

For things like performance information, I don't think there would be a material difference between sending them right before or after signalling the closure. I'd assume that we would allow such information to be sent at any moment during the lifetime of the tunnel, and that receivers would record whatever they receive. Senders that want to emit the most recent information can send performance information at the moment they notice the TCP tunnel getting closed, and then close the tunnel.

Separately, if there is interest in sending metadata after closure, I am not sure that such interest applies only to graceful close and not to resets. In other words, it might make more sense to define a frame that conveys closure and how it is closed (i.e., graceful or reset), rather than FINAL_DATA that only signals graceful shutdown.

@DavidSchinazi
Contributor

Otherwise, it is a waste

What is wasted? This just requires registering a capsule type, which is a 2^62 registry. In most cases, the proxy will send the last bit of data inside a FINAL_DATA capsule. In the unusual scenario where it learns about the FIN after it sent its last DATA capsule, then it does need to send 2 bytes.
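A quick sanity check on that 2-byte figure, assuming (purely hypothetically) a FINAL_DATA codepoint small enough to fit in a single-byte varint: an empty capsule is then one type byte plus one zero-length byte.

```python
PLACEHOLDER_FINAL_DATA = 0x21  # hypothetical codepoint below 64, so the type varint is one byte
empty_final_data = bytes([PLACEHOLDER_FINAL_DATA, 0x00])  # Type + Length = 0, no value
assert len(empty_final_data) == 2                          # the 2 extra bytes mentioned above
```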

allow such information to be sent at any moment during the lifetime of the tunnel

FWIW, the lifetime of the tunnel is already somewhat disconnected from the TCP FIN, since a FIN only closes one side of the connection. I see value in telling the peer that TCP closed gracefully while reserving the right to send future capsules in response to something the peer sends.

it might make more sense to define a frame that conveys closure and how it is closed (i.e., graceful or reset), rather than FINAL_DATA that only signals graceful shutdown.

Having a third capsule like DATA_ABORTED doesn't seem unreasonable to me.

@PiotrSikora
Contributor

it might make more sense to define a frame that conveys closure and how it is closed (i.e., graceful or reset), rather than FINAL_DATA that only signals graceful shutdown.

+1

@bemasc
Contributor Author
bemasc commented Feb 27, 2025

@PiotrSikora @kazuho Please comment at #3000 with whether you want to move forward with this approach or a different one. Otherwise I will merge this and publish a new release to get in before the draft deadline.

bemasc added a commit that referenced this pull request Apr 9, 2025
bemasc added a commit that referenced this pull request Apr 9, 2025
Contributor
@martinthomson martinthomson left a comment

This seems like a better approach, but I'm not finding it particularly clear about who does what (and in reaction to what).

@bemasc
Contributor Author
bemasc commented Apr 15, 2025

This seems like a better approach, but I'm not finding it particularly clear about who does what (and in reaction to what).

I've added a lot more detail (and diagrams!) to this PR to try to pin down precisely what is proposed here.

@wtarreau

Nice! It looks clean to me. I just have one nit here:

- HTTP/1.1 over TLS: a TLS Error Alert

I'd instead say:

- HTTP/1.1 over TLS: a TLS Error Alert or TCP RST

The rationale is the following: sending a TCP RST costs the sender nothing, so a client could easily send hundreds of thousands of them per second. However, the gateway performing the protocol translation would face a risk of trivial denial of service if it had to send a TLS alert, because that means being the last one to send before closing, with the connection ending in TIME_WAIT on its outgoing side. That must never happen, otherwise it depletes its source ports in a fraction of a second and fails to connect to the servers for a minute or more. The only way to avoid this is to close the TCP connection abruptly as well, but that results in the loss of the TLS alert for the other end (data queued in network buffers is destroyed by the system as soon as the connection is reset). As such, the next hop cannot reliably expect to see the TLS alert, and must be prepared to just see the TCP connection being reset.

We could suggest that the TLS alert be a SHOULD that eases debugging and is more polite, but that the TCP RST is the final way of closing in any case. That is, depending on timing, the endpoint must be prepared to receive a TCP RST for all protocols ultimately transported over TCP. Some might have the chance to see an RST_STREAM or TLS alert before that.

Contributor
@kazuho kazuho left a comment

Thank you very much for pushing the pull request forward. Looks good overall.

@bemasc
Contributor Author
bemasc commented Apr 16, 2025

@wtarreau It sounds like you're concerned about a malicious client and destination executing an asymmetric resource exhaustion attack against the proxy. The more I thought about this class of attacks, the more I realized there seemed to be a lot of them. I've added a section discussing these attacks and how to mitigate them.

I don't think we need to change the guidance on TLS Alerts. Sending a TCP RST in that case would not fix all the "WAIT Abuse" port exhaustion attacks, and it would violate the confidentiality promise of the "https://" URI scheme. Instead, I think these attacks must be addressed by doing resource accounting more carefully, as noted in the new text.

@wtarreau

@wtarreau It sounds like you're concerned about a malicious client and destination executing an asymmetric resource exhaustion attack against the proxy. The more I thought about this class of attacks, the more I realized there seemed to be a lot of them. I've added a section discussing these attacks and how to mitigate them.

Not just attacks, actually; even regular usage. The most common cases where I've seen port exhaustion were on pretty valid traffic relying on misdesigned protocols (e.g. SQL stuff where the client closes first). Once you have many clients doing just a few short connections each, as soon as this happens more often than the number of ephemeral ports over the TIME_WAIT duration, you're stuck. On a default Linux setup, you have 28232 ports and a 60s TIME_WAIT, so an abysmally low rate of 470 conn/s suffices to block them all. It doesn't require many clients; actually, a single firewall reboot or edge proxy restart in production can be sufficient to cause this. Thus for me the mitigation doesn't address accidents; it only addresses the most trivial case of a bad actor.
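For concreteness, here is where those numbers come from, as a sketch assuming the stock Linux defaults (ephemeral port range 32768-60999 and the fixed 60 s TIME_WAIT):

```python
# Default Linux ephemeral port range: net.ipv4.ip_local_port_range = 32768 60999
ephemeral_ports = 60999 - 32768 + 1   # 28232 ports
time_wait_seconds = 60                # TIME_WAIT duration hardcoded in the Linux TCP stack

# Sustained rate of connections closed on this side toward a single destination
# that keeps every ephemeral port stuck in TIME_WAIT:
print(ephemeral_ports // time_wait_seconds)   # 470 conn/s
```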

I don't think we need to change the guidance on TLS Alerts. Sending a TCP RST in that case would not fix all the "WAIT Abuse" port exhaustion attacks,

Yes it does because it avoids the TIME_WAIT.

and it would violate the confidentiality promise of the "https://" URI scheme.

I don't see how. Otherwise TCP would violate TLS, since an RST can happen for many reasons, the first one being just the gateway crashing or being restarted in the middle of transfers.

Instead, I think these attacks must be addressed by doing resource accounting more carefully, as noted in the new text.

Attacks do have to be mitigated, but it's also our responsibility to make sure that protocols and their extensions are designed in a way that doesn't cause domino effects on production equipment or amplify the consequences of minor accidents (e.g. a first layer of proxy being restarted before the gateway that performs the conversion).

@bemasc
Contributor Author
bemasc commented Apr 16, 2025

@wtarreau It sounds like you're concerned about a malicious client and destination executing an asymmetric resource exhaustion attack against the proxy. The more I thought about this class of attacks, the more I realized there seemed to be a lot of them. I've added a section discussing these attacks and how to mitigate them.

Not just attacks, actually; even regular usage. The most common cases where I've seen port exhaustion were on pretty valid traffic relying on misdesigned protocols (e.g. SQL stuff where the client closes first). Once you have many clients doing just a few short connections each, as soon as this happens more often than the number of ephemeral ports over the TIME_WAIT duration, you're stuck. On a default Linux setup, you have 28232 ports and a 60s TIME_WAIT, so an abysmally low rate of 470 conn/s suffices to block them all. It doesn't require many clients; actually, a single firewall reboot or edge proxy restart in production can be sufficient to cause this. Thus for me the mitigation doesn't address accidents; it only addresses the most trivial case of a bad actor.

I'm not sure I understand this problem. TIME-WAIT is per-4-tuple, so the 470 conn/s limit is at each client, not at the proxy. (Or maybe you're thinking of 470 conn/s from the proxy to any single destination? But that would not be affected by how we structure the client<->proxy protocol.)

I don't think we need to change the guidance on TLS Alerts. Sending a TCP RST in that case would not fix all the "WAIT Abuse" port exhaustion attacks,

Yes it does because it avoids the TIME_WAIT.

The TCP RFC says that sending RST means you enter TIME-WAIT. Regardless, TIME-WAIT is by 4-tuple, so it primarily applies when the client closes.

and it would violate the confidentiality promise of the "https://" URI scheme.

I don't see how. Otherwise TCP would violate TLS, since an RST can happen for many reasons, the first one being just the gateway crashing or being restarted in the middle of transfers.

These RSTs are HTTP response content, and cannot leak outside the TLS envelope. If a standard HTTP gateway receives a TCP RST from upstream, it forwards it as an HTTP 5XX or stream error inside TLS.

BTW, using a TCP RST here also violates the "https://" integrity guarantees: a middlebox can (and often does) convert the RST to a FIN, losing the error signal.

Instead, I think these attacks must be addressed by doing resource accounting more carefully, as noted in the new text.

Attacks do have to be mitigated, but it's also our responsibility to make sure that protocols and their extensions are designed in a way that doesn't cause domino effects on production equipment or amplify the consequences of minor accidents (e.g. a first layer of proxy being restarted before the gateway that performs the conversion).

Definitely! If we have a convincing way to make this protocol safer to deploy, I'm all for it. Otherwise, perhaps you can contribute some text to the Operational Considerations.

@wtarreau

The TCP RFC says that sending RST means you enter TIME-WAIT.

In fact not really, that's a non-normative "should". In practice, none of the stacks I've dealt with so far do that, and for a good reason: the only ways for a userland application to trigger an RST are 1) breaking the association by connecting to AF_UNSPEC, leaving no trace of the connection in the TCP table, or 2) disabling lingering and closing, in which case the RST is more a consequence of the destruction of pending unsent data. The only case where an application needs to force an RST is closing an outgoing connection, precisely to avoid monopolizing an ephemeral port to a destination.
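To make option 2 concrete, here is a minimal sketch of the lingering trick using plain BSD sockets (behavior as commonly implemented on Linux and BSD stacks):

```python
import socket
import struct

def abortive_close(sock: socket.socket) -> None:
    """Close with SO_LINGER set to (on, 0 s): pending unsent data is discarded,
    the peer sees a TCP RST instead of a FIN, and this host skips TIME_WAIT."""
    # struct linger { int l_onoff; int l_linger; } -> l_onoff = 1, l_linger = 0
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()

# Usage (hypothetical): abortive_close(upstream_sock) when the client side was reset.
```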

Regarding the direction of the RST, I think we were not talking about the same thing, since you're speaking about responses. I agree that with responses an intermediary will generally send a 5xx (if the RST was received before the intermediary started to send headers), or break TLS to the client. I was speaking about the other direction: a TCP client reaches a gateway that encapsulates the connection over HTTP. In this case the client can close using RST at no cost, and it must not incur a cost for the gateway. If the gateway is forced to emit a TLS alert, it will have to keep the ephemeral port open to the next hop. My point is that in such directions the intermediary must be allowed to close using RST (and the next hop to detect that as well, just as if the gateway had crashed after all).

If we have a convincing way to make this protocol safer to deploy, I'm all for it. Otherwise, perhaps you can contribute some text to the Operational Considerations.

I think that just using this does the job:

HTTP/1.1 over TLS: a TLS Error Alert, or TCP RST as a last resort

@bemasc
Contributor Author
bemasc commented Apr 21, 2025

@wtarreau OK, it sounds like your main concern is about shared clients being left in the TIME-WAIT state on the client<->proxy connection. After thinking about this a bit, I believe there are many ways that this can happen, even when connections are closed cleanly.

Ultimately, this is just one of several major downsides of using HTTP/1.1 for this purpose. I've added Operational Considerations text recommending against use of HTTP/1.1 for this kind of deployment and enumerating several concerns, including this one.

@wtarreau

OK, I think there was a misunderstanding between us, because by "client" I meant the original TCP one for which the gateway would encapsulate the TCP connection, while in your case the client is the client of the protocol acting on behalf of another one. Then we're in line. And I generally agree with your approach recommending against HTTP/1.1 in this case. Thanks!

Comment on lines +169 to +170
- HTTP/3: RESET_STREAM with H3_CONNECT_ERROR
- HTTP/2: RST_STREAM with CONNECT_ERROR
Contributor

I think these are the first mentions of these frames; you may want to cite where they come from (i.e. link to the QUIC frame and HTTP/3 error code, the HTTP/2 frame and code, etc).

Contributor Author

I haven't been able to figure out how to make kramdown-rfc do this. :-/ If you know how, please do share.

@@ -212,8 +273,32 @@ Template-driven TCP proxying is largely subject to the same security risks as cl

A small additional risk is posed by the use of a URI Template parser on the client side. The template input string could be crafted to exploit any vulnerabilities in the parser implementation. Client implementers should apply their usual precautions for code that processes untrusted inputs.

## Resource Exhaustion attacks

Proxy implementors should take special care to avoid resource exhaustion attacks when the client is not trusted. A malicious client can achieve highly asymmetric resource usage by colluding with a destination server and violating the ordinary rules of TCP or HTTP. Some example attacks and mitigations:
Contributor

Do you need the "is not trusted" caveat? Maybe even deleting the entire sentence leaves the paragraph clear enough that there are client-initiated resource exhaustion attacks a proxy needs to watch out for.

NB: I don't tend to trust anything at this layer of the stack, since it can be controlled by untrusted code running at a higher layer.

Contributor Author

OK, I shortened this considerably.

- Mitigation: Limit the number of concurrent connections per client.
* **Window Bloat**: An attacker can grow the receive window size by simulating a "long, fat network" {{?RFC7323}}, then fill the window (from the sender) and stop acknowledging it (at the receiver). This leaves the proxy buffering up to 1 GiB of TCP data until some timeout, while the attacker does not have to retain a large buffer.
- Mitigation: Limit the maximum receive window for TCP and HTTP connections, and the size of userspace buffers used for proxying. Alternatively, monitor the connections' send queues and limit the total buffered data per client.
* **WAIT Abuse**: An attacker can force the proxy into a TIME-WAIT, CLOSE-WAIT, or FIN-WAIT state until the timer expires, tying up a proxy<->destination 4-tuple for up to four minutes after the client's connection is closed.
Contributor

Where does four minutes come from?

Contributor Author

The TCP RFC defines the TIME-WAIT timeout as 2 * MSL, and MSL is defined as 2 minutes, so the timer is 4 minutes.

Contributor
@LPardue LPardue left a comment

Thanks for working on this. I raised a few nits/suggestions but this LGTM whatever you decide.

Co-authored-by: Kazuho Oku <kazuhooku@gmail.com>
@bemasc bemasc force-pushed the bemasc-final-data branch from 7f66609 to 4441310 Compare April 24, 2025 01:54