Configure larger BufReader for send and receive by rawler · Pull Request #263 · tiny-http/tiny-http · GitHub

Configure larger BufReader for send and receive #263


Closed
wants to merge 1 commit into from
Conversation

rawler (Collaborator) commented Jun 4, 2024

A larger BufReader can yield slightly faster transfers at slightly lower system overhead.

An example benchmark, sending 512MB (cached) from local disk over 127.0.0.1, 100 times using 4 concurrent transfers:

| BUFFER_SIZE | latency P50 | latency P95 | CPU USR | CPU SYS |
|-------------|-------------|-------------|---------|---------|
| 1KB         | 817ms       | 891ms       | 0.76s   | 19.89s  |
| 32KB        | 774ms       | 868ms       | 1.30s   | 18.50s  |
| 128KB       | 711ms       | 774ms       | 1.18s   | 16.77s  |

In more complex scenarios, the difference can be higher. The tradeoff is more memory used for each concurrent connection.

rawler (Collaborator, Author) commented Jun 4, 2024

This is to be seen as a basis for discussion. To be clear, I'm not sure increasing the send and receive buffers for connections to 32KB is acceptable for all uses of tiny-http. At least on Linux machines, the default limit on threads seems to be 1024. With the 1-thread-per-connection model of tiny-http, the potential memory use here would be 2 × 32 × 1024 KB, or 64MB.


The net effect we're after: tracing one of our apps in production, we discovered that the syscall sending packets works with really small buffers of just 8KB, leading to a huge number of syscalls when transferring a 22GB file over a 50Gbit network. This 8KB transfer buffer is the default behavior of Rust's std::io::copy, which looks at any known buffer sizes of the reader and writer but raises the copy buffer to at least 8KB.

Increasing the send buffer in our production environment significantly improved throughput and reduced system load, but since this environment also receives highly fluctuating production traffic, it's difficult to present an exact benchmark.


As an alternative to configuring the sizes of BufReader and BufWriter, one could consider writing a custom implementation of std::io::copy instead, using a local stack buffer of ~64KB. Doing so would ensure that the memory is only used while actually streaming the response, and increasing the use of the thread stack might inflate memory less than a fixed allocation per connection. The bigger downside is that std::io::copy does have a few specializations for the cases when R and W are raw file-descriptor-backed streams, using low-level kernel splicing for extremely efficient copying. Not sure if that's ever applicable to tiny-http?

Another option would be to let users configure buffer-size for copying, but that would require a more flexible way to construct the server than exists today.

rawler (Collaborator, Author) commented Jun 4, 2024

NVM. I just realized the variant we actually tested in prod (manual reimpl of std::io::copy) is way outperforming the BufWriter-size solution.

4x100 connections of 512MB with a 32KB copy-buffer here only took a P95 of 616ms, using 0.21 + 13.45s USR and SYS CPU. I'll try to produce an adapted PR tomorrow.

rawler closed this Jun 4, 2024