-
Notifications
You must be signed in to change notification settings 8000 - Fork 835
gRPC call error data propagation broken with HAProxy as a reverse-proxy. #1219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
We are seeing this exact scenario in an implementation of the ByteStream API, where a client can upload a file calling A server can, after reading the first message and detecting that the In that case the client detects (at the gRPC level) a failed This however is not the case when going through HAProxy: gRPC interprets the result of the transaction as a |
Unfortunately at this point we're reaching the limits of what can be done by gRPC trying to reuse HTTP infrastructure with more detailed semantics than HTTP itself provides :-/ The reality here is that we have a stream abort on one side that we must signal on the other side, and that in order to best accommodate the various combinations of H1/H2 on each side, if there's an empty response it's a cancelled (i.e. the client may retry) but if it was an error or an active abort it's a stream error. There's some ongoing design work to try to improve the various abort situations and try to pass finer information between the two sides, we should definitely include this one into the mix. But I have no solution to propose till 2.5 at least given that such information cannot even be internally represented. CCing @capflam. |
By the way we've worked quite a bit on this and have clearer ideas about what needs to be changed for this to work better. However these are definitely long-term changes I'm afraid. |
I've tested your repro on several haproxy versions (from 2.2 to 3.0-DEV). And everything seems to work properly. The client always returns:
The issue is quite old now and I guess the bug was in fact not well identified and solved in the mean time. I'm closing now. Thanks ! |
@capflam I believe I have been observing this issue on real workloads, and I can still reproduce this issue on After starting the proxy/test server, the first request seems to always succeed, then we see the
Running it 200 times with
I believe this might be expected based on what
I've made a custom build where I understand from above that a lot of the complexity here might stem from cases with both HTTP/1 and HTTP/2. However, would there be a way to support forwarding error codes from the server at least in cases where only HTTP/2 is being used? |
Well, I performed another test, capturing the traffic with wireshark. And this work as expected. Both the server and haproxy do the same thing. The permission denied is returned in the 200-ok response, in the |
@capflam From what I can see in wireshark when we reproduce this: gRPC server ends the stream with END_STREAM flag in a Header frame, followed by a RST_STREAM with NO_ERROR
In the same setup where we reproduce this issue, once we remove the haproxy, let client directly talk to server, client has no problem handling HEADER(ES flag) + RST(NO_ERROR). For a HTTP/2 server that wants to close the stream, the server either has to wait for client to also ES the stream, or wait for the client to RST the stream, or if server doesn't want to wait, then of course server's only choice is the RST the stream itself. I really do not see any wrong behavior from gRPC in this case. |
You are right for the server behavior. It does not send a But you are wrong about the HAProxy behavior. It does not turn the The code is not the same between the server and the client because both sides are almost not aware of what happens on the other side. The protocol may be different, the connection management too, the content may be altered... What happens on one side is not necessarily true on the other side. In that case, I guess a The way the client handles the result of the request is the problem here. If the |
@capflam Shouldn't clients stop uploading data after RSTSTREAM regardless of error code? The stream state is driven by the frame not the error code. Server means to tell client NO_ERROR but now because of Haproxy clients see Cancel. From ending streams perspective RST is enough. The problem here is the choice of error code |
Indeed, a RST_STREAM is enough. But the reason is useful to understand what happens. And the CANCEL is here to say the stream is no longer needed. And it is the case here. Another way would be to drain the request payload, just like in HTTP/1.1. The NO_ERROR error code is to say the stream was finished without error. But here, it is not accurate. There is no reason to change the HTTP/2 behavior on this point. And I'm so 8000 rry but I disagree on the problem origin. It a client or a lib issue. Just think about it. A valid response was delivered to the client with a gRPC status (the permission denied here). And then the H2 stream is cancelled. the response must be handled first in all cases. If the reset is received later, there is no issue and the permission denied is properly reported. But if the reset is received in the same time, the cancel is reported. There is absolutely no reason to do so. I don't know how it is implemented. But H2 frames order is important. And I doubt it is specified in the gRPC specs that a cancel must be handled before all other frames. There is no problem with NO_ERROR code, but for me, it is just hiding the true bug. |
I don't agree with this but I don't know if me disagreeing makes any difference at this point. The response was handled. But after ES the client stream is at half-closed state. Then a RST_STREAM showed up, clients should still process it. All these are within h2 spec. If you are building a h2 client library today will you just ignore the RST during half-closes state? I think that's wrong. If client library delivers a finish callback and then an error to application, should application also ignore the error? I think that's also wrong. Server never cancelled the stream. It's merely a termination. Proxy cancelled it. The whole client behavior shouldn't even be a concern here but even if it is, I think client should process the final RST and its error code and notify application. Are any of these illegal in h2? |
A source of confusion on my end is that 'HTTP Request/Response Exchange' in the HTTP/2 specification indicates:
This is the situation here -- the client is streaming messages to the server, which then decides to send a complete response (with the It specifies that clients must not discard responses when getting a I would also agree that the client probably should still not discard a complete response in that case (and the Python gRPC client seems to unfortunately behave non-deterministically there at the moment), but as far as I understand, it is undefined in the HTTP/2 specification which would instead suggest to stick to |
Hum, thanks @psavalle . I was pretty sure a CANCEL should be used in that case. It seems I'm wrong. So I guess in that case HAProxy should not send a CANCEL but a NO_ERROR. indeed. I will review that with Willy to be sure. But it seems there is indeed an issue here. However, it remains pretty strange that the client does not return the gRPC status in that case. the gRPC is a protocol on top of H2. All what happens at the H2 level should not necessarily have an equivalence at the GRPC level. Here the reset is about the stream not the gRPC request. The gRPC response was provided. the reset must of couse be handled at the H2 level. But at the gRPC level, it is meaningless at this stage. So indeed, the request upload must be stopped from the H2 point of view. But I doubt there is anything to do with this reset if you already handled the permission denied. BUT BUT, it remains the wrong error code seems to be used here. So I'm reopening the issue. |
Yes we can have a look at this, it's indeed possible that we need to differentiate data interrupted after the end of the response from data interrupted before. |
…was already sent On frontend side, when a stream is shut while the response was already fully sent, it was cancelled by sending a RST_STREAM(CANCEL) frame. However, it is not accurrate. CANCEL error code must only be used if the response headers were sent, but not the full response. As stated in the RFC 9113, when the response was fully sent, to stop the request sending, a RST_STREAM with an error code of NO_ERROR must be sent. This patch should solve the issue #1219. It must be backported to all stable versions.
Should be fixed in 3.2-dev now. Thanks ! |
I have tested |
…was already sent On frontend side, when a stream is shut while the response was already fully sent, it was cancelled by sending a RST_STREAM(CANCEL) frame. However, it is not accurrate. CANCEL error code must only be used if the response headers were sent, but not the full response. As stated in the RFC 9113, when the response was fully sent, to stop the request sending, a RST_STREAM with an error code of NO_ERROR must be sent. This patch should solve the issue haproxy#1219. It must be backported to all stable versions. (cherry picked from commit e87397b) Signed-off-by: Willy Tarreau <w@1wt.eu>
Detailed description of the problem
If the gRPC server application (served from behind HAProxy) immediately (ie. before sending any responses) aborts the response streaming call with specific error, then the client application will receive
CANCELLED
error instead of the one that was sent. The error details will be Received RST_STREAM with error code 8.Expected behavior
The error data produced by the gRPC server reaches the client unchanged in the terms of gRPC protocol..
Steps to reproduce the behavior
I have prepared a repository with minimal example containing simple python client & server. It also contains docker-compose with network environment including configured HAProxy.
Do you have any idea what may have caused this?
To terminate the call gRPC server sends
RST_STREAM
with error codeNO_ERROR
. The client application expects the error code to be like that. HAProxy rewrite the error code toCANCEL
which breaks the client logic. The rewriting is done inh2_do_shutr
function ofmux_h2.c
.What is your configuration?
Output of
haproxy -vv
anduname -a
In my repository containing the minimal issue example I've received report that the issue is also present with HAProxy 2.4 and gRPC 1.37.0.
Additional information (if helpful)
The text was updated successfully, but these errors were encountered: