Experimental support for pico HTTP parser #71
base: master
Conversation
The problem may also exist in http_server.c:376. I'm not sure yet.
EDIT: The memory leak actually happens when you use
~~Could you re-run the test and keep an eye on the RSS used by the program? Without
@@ -84,7 +85,7 @@ void free_http_request(http_request* request)
       free((char*)v);
     });
     kh_destroy(string_hashmap, request->headers);
-    free(request->url);
+    //free(request->url);
I think this is what's causing the memory leak I'm seeing. Uncommenting this line works while using http_parser (it creates a seg fault though, when using pico, as is to be expected).
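For reference, a minimal sketch (not the actual Haywire code; the struct and the `url_is_owned` flag are hypothetical) of one way to make the free safe for both parsers, assuming the http_parser path heap-allocates `request->url` while the pico path points it into the connection's read buffer:

```c
#include <stdlib.h>

/* Hypothetical request struct; only the fields relevant to the sketch. */
typedef struct {
    char* url;
    int   url_is_owned;  /* 1 if url was malloc'd (http_parser path),
                            0 if it points into the read buffer (pico path) */
} http_request_sketch;

void free_http_request_sketch(http_request_sketch* request)
{
    if (request->url_is_owned) {
        free(request->url);  /* safe only when this code allocated the copy */
    }
    request->url = NULL;     /* pico path: the read buffer is freed elsewhere */
    free(request);
}
```

That would keep the http_parser path leak-free while avoiding the pico seg fault, at the cost of one extra flag per request.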
Using pico, I can't actually replicate the slowdown (2 minute run, 2 second run).
I'm running this on a VM with just one core dedicated to it, so I'm not sure if that influences the results. I'll try on a server machine with multiple cores.
And getting similar results on EC2 as well (2 second run, 2 minute run).
So if I use a batch of
Yeah, that's the problem: if you have a single request coming in, its response time is bound to that timer. Another option is to push responses into a queue and have a thread pulling off the queue. We get into some complicated things there though, with spin locks and contention, etc.
Batching is an inherent trade-off of throughput over latency, right? Working from that, could it make sense to define a throughput mode vs a latency mode? I would think of it roughly as…
Throughput mode: batch mode might need some sort of simple math or a hard cap on the batch flush time to prevent runaway latency.
Latency mode: latency mode still batches, but there's an upper bound on how long it waits.
Dunno, just spitballing.
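To make the "hard cap on batch flush time" idea concrete, here's a small hedged sketch (the constants and the `flush_batch` callback are made up, not Haywire's API): flush when the batch is full, or when the oldest queued response has waited past the cap.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define BATCH_SIZE  64                  /* hypothetical batch size */
#define MAX_WAIT_NS (1 * 1000 * 1000)   /* hypothetical 1 ms latency cap */

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Call this every time a response is appended to the pending batch. */
void maybe_flush(size_t pending, uint64_t oldest_enqueue_ns,
                 void (*flush_batch)(void))
{
    if (pending >= BATCH_SIZE || now_ns() - oldest_enqueue_ns >= MAX_WAIT_NS) {
        flush_batch();  /* e.g. one writev of everything queued so far */
    }
}
```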
It's a throughput vs latency tradeoff, yes, but how bad a tradeoff depends on your implementation. Consider a thread spin-waiting on the queue: as soon as an entry is pushed, it'll pick it up VERY quickly. But this isn't going to be easy to build. I'm wondering if there is an alternative.
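A rough sketch of that dedicated-thread idea, assuming a single-producer/single-consumer ring (everything here is illustrative, not code from this branch):

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 1024   /* power of two so we can mask instead of mod */

typedef struct {
    void* slots[RING_SIZE];
    atomic_size_t head;  /* written by the event loop (producer) */
    atomic_size_t tail;  /* written by the writer thread (consumer) */
} spsc_ring;

static bool ring_pop(spsc_ring* r, void** out)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    if (tail == atomic_load_explicit(&r->head, memory_order_acquire))
        return false;                          /* nothing pending */
    *out = r->slots[tail & (RING_SIZE - 1)];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Thread entry point: spin on the ring and write responses as they appear. */
void* writer_thread(void* arg)
{
    spsc_ring* ring = (spsc_ring*)arg;
    void* response;
    for (;;) {
        if (ring_pop(ring, &response)) {
            (void)response;  /* hypothetical: hand it to the socket writer here */
        } else {
            sched_yield();   /* back off a little instead of burning the core */
        }
    }
    return NULL;
}
```

A pure spin loop picks entries up almost immediately, as noted above; the `sched_yield()` is one knob for trading a little of that latency back for CPU.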
@@ -111,6 +129,7 @@ void free_http_connection(http_connection* connection)
     free_http_request(connection->request);
   }

+  free(connection->response_buffers);
@kellabyte unless I'm missing something, response buffers are allocated statically and not through malloc, so do they really need to be freed here?
Actually the buffer array doesn't need to be freed, but the base elements do.
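In other words, something along these lines (a hedged sketch: the field names and capacity are assumptions; only `uv_buf_t` itself is real libuv):

```c
#include <stdlib.h>
#include <uv.h>

#define RESPONSE_BUFFER_COUNT 64   /* hypothetical capacity */

typedef struct {
    /* Static array inside the struct, so the array itself is never malloc'd. */
    uv_buf_t response_buffers[RESPONSE_BUFFER_COUNT];
    int response_buffer_count;
} connection_sketch;

void free_response_buffers(connection_sketch* connection)
{
    for (int i = 0; i < connection->response_buffer_count; i++) {
        free(connection->response_buffers[i].base);  /* base was heap-allocated */
        connection->response_buffers[i].base = NULL;
        connection->response_buffers[i].len = 0;
    }
    /* No free() of response_buffers itself: freeing a pointer that didn't
       come from malloc is undefined behaviour. */
}
```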
I actually think our I don't think me setting
Yeah, I think HTTP/1.1 is the way to go, considering it's been around for about 16 years now and that there are already quite a few HTTP/2.0 implementations. Would you want to open an issue for tracking HTTP/1.1 compliance? On the persistent connections topic, we probably need to support the
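For what it's worth, the usual HTTP/1.x persistent-connection rules boil down to something like this (a standalone sketch, not Haywire's API; the helper name is made up):

```c
#include <stdbool.h>
#include <strings.h>   /* strcasecmp */

/* HTTP/1.1 defaults to a persistent connection unless "Connection: close"
 * is sent; HTTP/1.0 defaults to close unless "Connection: keep-alive" is sent. */
bool should_keep_alive(int http_major, int http_minor,
                       const char* connection_header)
{
    if (http_major == 1 && http_minor >= 1) {
        return connection_header == NULL ||
               strcasecmp(connection_header, "close") != 0;
    }
    return connection_header != NULL &&
           strcasecmp(connection_header, "keep-alive") == 0;
}
```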
I've committed my changes so far, but when I run a benchmark I'm still getting:
Any thoughts @nmdguerreiro?
@@ -261,11 +286,93 @@ void http_stream_on_read_http_parser(uv_stream_t* tcp, ssize_t nread, const uv_b
       /* uv_close((uv_handle_t*) &client->handle, http_stream_on_close); */
     }
   }
   else if (nread == UV_EOF)
I'm a bit confused here. We're closing the connection in both else branches.
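For context, the usual shape of a libuv read callback distinguishes three cases; this is a generic sketch, not the handler in this branch (`on_close` and the malloc-based alloc callback are assumptions):

```c
#include <stdlib.h>
#include <uv.h>

static void on_close(uv_handle_t* handle)
{
    (void)handle;  /* per-connection state would be freed here */
}

void on_read_sketch(uv_stream_t* stream, ssize_t nread, const uv_buf_t* buf)
{
    if (nread > 0) {
        /* feed nread bytes from buf->base to the parser */
    } else if (nread == 0) {
        /* EAGAIN-equivalent: not an error, keep reading */
    } else {
        /* UV_EOF (peer closed) or a real error: stop and close in both cases */
        uv_read_stop(stream);
        uv_close((uv_handle_t*)stream, on_close);
    }
    free(buf->base);  /* assuming the alloc callback used malloc */
}
```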
@kellabyte my only guess is that
So I
I've narrowed it down to the fact that https://github.com/kellabyte/Haywire/blob/picohttpparser/src/haywire/connection_consumer.c#L67 sometimes returns an If I only start reading if
I can't find that. Should we just close on
Hmm, perhaps. The weird thing is this only happens with
I changed the
I can't replicate the non-2xx codes you got above 😕 I just ran the benchmark repeatedly without an issue (other than a couple of timeouts):
Are you running on a Mac or Linux? I seem to be hitting the problem on Linux. I actually think the issue is here, now that I've switched the
I tried both, and those results are from Amazon Linux.
@kellabyte still having issues?
We do static matching currently so might as well simplify this for now until `pathfinder` is implemented.
@nmdguerreiro I would love your code review feedback on this because there are some flaws in my implementation.
The pico HTTP parser is an SSE-accelerated HTTP parser. I believe it can gain us quite a lot of performance in Haywire. It's not a streaming parser like the `http_parser` that Haywire already uses, but I've made it so you can configure which one Haywire uses. There's a Cloudflare fork of this parser that is AVX-optimized as well. I don't want to require AVX or SSE, but since performance is Haywire's main goal, I want to get these integrated and see how fast we can go!

Keep in mind there are some hard-coded values and untidy stuff in this branch so far because I'm experimenting. There's a major bug in my implementation, though, which is killing performance; I would love a code review to get it fixed so we can see the potential of pico in Haywire.

To run with pico, run the following:

`./build/hello_world --parser pico`
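As a sketch of what the parser switch could look like (the pico callback name and the selection helper are hypothetical; only `http_stream_on_read_http_parser` appears in the diff above):

```c
#include <string.h>
#include <uv.h>

typedef void (*read_cb_t)(uv_stream_t* stream, ssize_t nread, const uv_buf_t* buf);

/* Assumed to be defined elsewhere in the branch. */
void http_stream_on_read_http_parser(uv_stream_t* tcp, ssize_t nread, const uv_buf_t* buf);
void http_stream_on_read_pico(uv_stream_t* tcp, ssize_t nread, const uv_buf_t* buf);

/* Pick the read callback once at startup, based on the --parser argument. */
static read_cb_t select_parser(const char* name)
{
    if (name != NULL && strcmp(name, "pico") == 0)
        return http_stream_on_read_pico;      /* ./build/hello_world --parser pico */
    return http_stream_on_read_http_parser;   /* default: the existing http_parser */
}
```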
The big problem with my pico implementation is that it's losing requests over time: the longer the benchmark runs, the lower the requests/second throughput gets. If you run dstat and watch network throughput, you'll see it slow down over time until almost no responses get sent. I suspect there's a loop/buffer-management issue that is causing requests to stall and never get processed.

If you do a really short benchmark you can see some amazing potential.
HTTP_PARSER
PICO
2 second run shows some great potential.
2 minute run shows how performance tanked over time.
dstat output during the pico run