Error if body not fully downloaded #947
Conversation
For downloads that aren't using a checksum, we need to manually assert that the number of bytes written matches the Content-Length reported in the HTTP response.
98a9b35 to f6223df
If a download was not successful, its content length and bytes received will not match. In that case, expect that the developer may wish to retry the download, so delete the file that was created in this function.
When a download fails due to disk IO, or if the file cannot be cleaned up, do not suggest an (immediate) retry. This interface would allow code like this:

```rust
let mut download_attempts = 0;
while download_attempts <= MAX_RETRIES {
    match download_file(&artifact.url, &tarball_path) {
        Err(error) if error.retry_suggested() && download_attempts < MAX_RETRIES => {
            let sub_bullet = log_background_bullet.cancel(format!("{error}"));
            download_attempts += 1;
            thread::sleep(Duration::from_secs(1));
            log_background_bullet = sub_bullet.start_timer("Retrying");
        }
        result => {
            result?;
            let _ = log_background_bullet.done();
            break;
        }
    }
}
```

This lets developers who care less about the specific error states defer the decision of whether an error can be retried to a canonical location.
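The loop above assumes an error type that can classify itself as retryable. A minimal sketch of what such a `retry_suggested` method might look like (the enum and method here are hypothetical, not an existing libcnb.rs API):

```rust
use std::io;

// Hypothetical error type; `retry_suggested` is the proposed interface.
#[derive(Debug)]
enum DownloadError {
    UnexpectedBytes { expected: u64, received: u64 },
    IoError(io::Error),
}

impl DownloadError {
    /// Centralizes the "is this worth retrying?" decision so callers
    /// don't have to match on individual variants.
    fn retry_suggested(&self) -> bool {
        match self {
            // A short read is usually transient: the connection dropped.
            DownloadError::UnexpectedBytes { .. } => true,
            // Local disk IO failures are unlikely to succeed on retry.
            DownloadError::IoError(_) => false,
        }
    }
}

fn main() {
    let short = DownloadError::UnexpectedBytes { expected: 34, received: 24 };
    let disk = DownloadError::IoError(io::Error::new(io::ErrorKind::Other, "disk full"));
    assert!(short.retry_suggested());
    assert!(!disk.retry_suggested());
}
```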
```rust
UnexpectedBytes {
    expected: u64,
    received: u64,
    path_deleted: Result<(), std::io::Error>,
```
I got a little clever with this representation. I'm not sure whether returning a `Result` here is a net good or bad idea.

My logic: it seems unlikely that the delete would fail if we were able to create and write to the file, but it's still possible, so we need to represent it. It seems wrong to return a `DownloadError::IoError`, as that explicitly says it's for IO errors during download (as opposed to cleanup). We could add another variant just for this case, but it would carry almost the same information.

The primary error here is the bytes not written; whether the disk could be cleaned up is a secondary concern.

I'm open to alternative suggestions.
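One consequence of the nested `Result` is that callers can report both failures from a single variant. A sketch of how a caller might render it (the enum mirrors the diff above; `describe` is a hypothetical helper):

```rust
use std::io;

// Sketch of the variant under discussion; field names mirror the diff.
#[derive(Debug)]
enum DownloadError {
    UnexpectedBytes {
        expected: u64,
        received: u64,
        // Ok(()) if the partial file was cleaned up, Err otherwise.
        path_deleted: Result<(), io::Error>,
    },
}

fn describe(error: &DownloadError) -> String {
    match error {
        DownloadError::UnexpectedBytes { expected, received, path_deleted } => {
            let cleanup = match path_deleted {
                Ok(()) => "partial file deleted".to_string(),
                Err(e) => format!("partial file could not be deleted: {e}"),
            };
            format!("expected {expected} bytes, received {received} ({cleanup})")
        }
    }
}

fn main() {
    let error = DownloadError::UnexpectedBytes {
        expected: 34,
        received: 24,
        path_deleted: Ok(()),
    };
    assert_eq!(
        describe(&error),
        "expected 34 bytes, received 24 (partial file deleted)"
    );
}
```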
```rust
.get(CONTENT_LENGTH)
.ok_or_else(|| DownloadError::MissingRequiredHeader(CONTENT_LENGTH))?
.to_str()
.map_err(|_| DownloadError::HeaderEncodingError(CONTENT_LENGTH))?
.parse()
.map_err(DownloadError::CannotParseInteger)?;
```
Do we want to change this function to require a `Content-Length` header (i.e. not support `Transfer-Encoding: chunked`)? Seems like this might be a bit surprising for a generic `download_file` function?
Perhaps it'd make sense to use the `content_length` function on the body instead, and only compare the written bytes if it contains a value. That would also eliminate the need to manually get and parse the header.
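To make the suggestion concrete, here is a sketch of the comparison when the length is optional. `content_length` below is a plain `Option<u64>` parameter standing in for the value returned by the body (ureq's body reports `None` for `Transfer-Encoding: chunked`); `verify_written` is a hypothetical helper, not the PR's actual code:

```rust
// Only compare byte counts when the body reports a content length.
fn verify_written(content_length: Option<u64>, bytes_written: u64) -> Result<(), (u64, u64)> {
    match content_length {
        // Fixed-length body: the counts must agree.
        Some(expected) if expected != bytes_written => Err((expected, bytes_written)),
        // Either the counts match or the length is unknown (chunked).
        _ => Ok(()),
    }
}

fn main() {
    assert_eq!(verify_written(Some(24), 24), Ok(()));
    assert_eq!(verify_written(Some(34), 24), Err((34, 24)));
    // Chunked transfer: nothing to compare against.
    assert_eq!(verify_written(None, 24), Ok(()));
}
```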
100% overlooked that, thanks. Great suggestion. I'm wondering if ureq's `BodyReader` is able to somehow fail the `io::copy` if the connection is closed unexpectedly while we're waiting for missing packets. I'm going to make a small POC Ruby server that demonstrates the behavior and compare it to curl.
I found that the behavior is documented in the changelog: https://github.com/algesten/ureq/blob/main/CHANGELOG.md#140:

> Error if body length is less than content-length.

I also experimentally confirmed that ureq does the correct thing in both the chunked and fixed content-length cases (it errors from the IO reader); see below. So it looks like this case is handled and we don't need to explicitly check. Closing my PR.
Fixed length

```shell
$ PORT=9292 MODE=fixed ruby server.rb
Chunked server listening on port 9292
Client connected
Real length: `24`, claimed length: 34
Sent fixed body with mismatched Content-Length
```

```shell
$ cd ureq_it_u_buy_it && cargo run
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.02s
     Running `target/debug/ureq_it_u_buy_it`
Downloaded file size: `24` bytes
thread 'main' panicked at src/main.rs:33:12:
called `Result::unwrap()` on an `Err` value: IoError(Custom { kind: UnexpectedEof, error: "Peer disconnected" })
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
Chunked

```shell
$ PORT=9292 MODE=chunked ruby server.rb
Chunked server listening on port 9292
Client connected
Sent chunk 1
Sent chunk 2
Server exiting ungracefully...
```

```shell
$ cd ureq_it_u_buy_it && cargo run
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.02s
     Running `target/debug/ureq_it_u_buy_it`
thread 'main' panicked at src/main.rs:31:12:
called `Result::unwrap()` on an `Err` value: IoError(Custom { kind: UnexpectedEof, error: "Peer disconnected" })
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
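The `UnexpectedEof` failure mode seen in both runs can be reproduced without a network, using a reader that ends before its declared length. This is a stand-in sketch for ureq's checked body reader, not its actual internals (`TruncatedBody` is invented for illustration):

```rust
use std::io::{self, Read};

// A reader that claims more data than it delivers, mimicking a peer
// that disconnects before Content-Length bytes arrive.
struct TruncatedBody {
    data: &'static [u8],
    pos: usize,
    claimed_len: usize,
}

impl Read for TruncatedBody {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.pos < self.data.len() {
            // Hand out whatever real data remains.
            let n = buf.len().min(self.data.len() - self.pos);
            buf[..n].copy_from_slice(&self.data[self.pos..self.pos + n]);
            self.pos += n;
            Ok(n)
        } else if self.pos < self.claimed_len {
            // Short of the declared length: surface an error instead of a clean EOF.
            Err(io::Error::new(io::ErrorKind::UnexpectedEof, "Peer disconnected"))
        } else {
            Ok(0)
        }
    }
}

fn main() {
    let mut body = TruncatedBody { data: b"only 24 bytes of body...", pos: 0, claimed_len: 34 };
    // io::copy propagates the reader's error to the caller.
    let err = io::copy(&mut body, &mut io::sink()).unwrap_err();
    assert_eq!(err.kind(), io::ErrorKind::UnexpectedEof);
}
```

Because `io::copy` propagates any non-`Interrupted` read error, a download loop built on it fails as soon as the checked reader reports the short body.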
Pull request was closed
## Problem

I was under the mistaken impression that the reader returned by `into_reader` didn't perform checks for Content-Length. It does, but even after consulting the documentation in several places, I moved forward with adding my own check in heroku/libcnb.rs#947. A co-worker corrected my implementation, which led me to discover it wasn't needed. Neither of us came to the conclusion from the documentation alone that we were already protected. One of the doc locations that I checked was `into_reader`.

## Fix

I added two error cases to the docs where the reader can (and should) error: when a connection is closed before Content-Length bytes are received, and when an ungraceful close of the TCP connection occurs. At the same time, the subject of the bullet point on the reader not being "limited" wasn't clear to me, so I added some more details. I understand the documents weren't written for me personally, and we cannot add every possible caveat to every possible source of documentation. Feel free to take this feedback and run with it if you like. Thanks for this library!
GUS-W-18647436