Error if body not fully downloaded #947
Conversation
For downloads that aren't using a checksum, we need to manually assert that the number of bytes written matches the Content-Length reported in the HTTP response.
98a9b35 to f6223df
If a download was not successful, its content length and bytes received will not match. In that case, expect that the developer may wish to retry the download, so delete the file that was created in this function.
When a download fails due to disk IO, or if the file cannot be cleaned up, do not suggest an (immediate) retry. This interface would allow code like this:

```rust
let mut download_attempts = 0;
while download_attempts <= MAX_RETRIES {
    match download_file(&artifact.url, &tarball_path) {
        Err(error) if error.retry_suggested() && download_attempts < MAX_RETRIES => {
            let sub_bullet = log_background_bullet.cancel(format!("{error}"));
            download_attempts += 1;
            thread::sleep(Duration::from_secs(1));
            log_background_bullet = sub_bullet.start_timer("Retrying");
        }
        result => {
            result?;
            let _ = log_background_bullet.done();
            break;
        }
    }
}
```

This lets developers who care less about the specific error states defer the decision of whether an error can be retried to a canonical location.
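The loop above assumes an error type that can classify itself as retryable. A minimal sketch of what such a `retry_suggested` method might look like (the enum and method here are hypothetical, not an existing libcnb.rs API):

```rust
use std::io;

// Hypothetical error type; `retry_suggested` is the proposed interface.
#[derive(Debug)]
enum DownloadError {
    UnexpectedBytes { expected: u64, received: u64 },
    IoError(io::Error),
}

impl DownloadError {
    /// Centralizes the "is this worth retrying?" decision so callers
    /// don't have to match on individual variants.
    fn retry_suggested(&self) -> bool {
        match self {
            // A short read is usually transient: the connection dropped.
            DownloadError::UnexpectedBytes { .. } => true,
            // Local disk IO failures are unlikely to succeed on retry.
            DownloadError::IoError(_) => false,
        }
    }
}

fn main() {
    let short = DownloadError::UnexpectedBytes { expected: 34, received: 24 };
    let disk = DownloadError::IoError(io::Error::new(io::ErrorKind::Other, "disk full"));
    assert!(short.retry_suggested());
    assert!(!disk.retry_suggested());
}
```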
```rust
UnexpectedBytes {
    expected: u64,
    received: u64,
    path_deleted: Result<(), std::io::Error>,
```
I got a little clever with this representation. I'm not sure whether returning a `Result` here is a net good or bad idea.

My logic: it seems unlikely that the delete would fail if we were able to create and write to the file, but it's still possible, so we need to represent it. It seems wrong to return a `DownloadError::IoError`, as that explicitly says it's for IO errors during download (as opposed to cleanup). We could add another variant just for this case, but it would carry almost the same information.

The primary error here is the bytes not written; whether the disk could be cleaned up is a secondary concern.

I'm open to alternative suggestions.
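One consequence of the nested `Result` is that callers can report both failures from a single variant. A sketch of how a caller might render it (the enum mirrors the diff above; `describe` is a hypothetical helper):

```rust
use std::io;

// Sketch of the variant under discussion; field names mirror the diff.
#[derive(Debug)]
enum DownloadError {
    UnexpectedBytes {
        expected: u64,
        received: u64,
        // Ok(()) if the partial file was cleaned up, Err otherwise.
        path_deleted: Result<(), io::Error>,
    },
}

fn describe(error: &DownloadError) -> String {
    match error {
        DownloadError::UnexpectedBytes { expected, received, path_deleted } => {
            let cleanup = match path_deleted {
                Ok(()) => "partial file deleted".to_string(),
                Err(e) => format!("partial file could not be deleted: {e}"),
            };
            format!("expected {expected} bytes, received {received} ({cleanup})")
        }
    }
}

fn main() {
    let error = DownloadError::UnexpectedBytes {
        expected: 34,
        received: 24,
        path_deleted: Ok(()),
    };
    assert_eq!(
        describe(&error),
        "expected 34 bytes, received 24 (partial file deleted)"
    );
}
```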
```rust
.get(CONTENT_LENGTH)
.ok_or_else(|| DownloadError::MissingRequiredHeader(CONTENT_LENGTH))?
.to_str()
.map_err(|_| DownloadError::HeaderEncodingError(CONTENT_LENGTH))?
.parse()
.map_err(DownloadError::CannotParseInteger)?;
```
Do we want to change this function to require a `Content-Length` header (i.e. not support `Transfer-Encoding: chunked`)? Seems like this might be a bit surprising for a generic `download_file` function?
Perhaps it'd make sense to use the `content_length` function on the body instead, and only compare the written bytes if it contains a value. That would also eliminate the need to manually get and parse the header.
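To make the suggestion concrete, here is a sketch of the comparison when the length is optional. `content_length` below is a plain `Option<u64>` parameter standing in for the value returned by the body (ureq's body reports `None` for `Transfer-Encoding: chunked`); `verify_written` is a hypothetical helper, not the PR's actual code:

```rust
// Only compare byte counts when the body reports a content length.
fn verify_written(content_length: Option<u64>, bytes_written: u64) -> Result<(), (u64, u64)> {
    match content_length {
        // Fixed-length body: the counts must agree.
        Some(expected) if expected != bytes_written => Err((expected, bytes_written)),
        // Either the counts match or the length is unknown (chunked).
        _ => Ok(()),
    }
}

fn main() {
    assert_eq!(verify_written(Some(24), 24), Ok(()));
    assert_eq!(verify_written(Some(34), 24), Err((34, 24)));
    // Chunked transfer: nothing to compare against.
    assert_eq!(verify_written(None, 24), Ok(()));
}
```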
100% overlooked that, thanks. Great suggestion. I'm wondering if ureq's `BodyReader` is able to somehow fail the `io::copy` if the connection is closed unexpectedly while we're waiting for missing packets. I'm going to make a small POC Ruby server that demonstrates the behavior and compare it to curl.
I found that the behavior is documented in the changelog: https://github.com/algesten/ureq/blob/main/CHANGELOG.md#140:

> Error if body length is less than content-length.

I also experimentally confirmed that ureq does the correct thing in both the chunked and fixed content-length cases (it errors from the IO reader); see below. So it looks like this case is handled and we don't need to explicitly check. Closing my PR.
Fixed length

```shell
$ PORT=9292 MODE=fixed ruby server.rb
Chunked server listening on port 9292
Client connected
Real length: `24`, claimed length: 34
Sent fixed body with mismatched Content-Length
```

```shell
$ cd ureq_it_u_buy_it && cargo run
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.02s
     Running `target/debug/ureq_it_u_buy_it`
Downloaded file size: `24` bytes
thread 'main' panicked at src/main.rs:33:12:
called `Result::unwrap()` on an `Err` value: IoError(Custom { kind: UnexpectedEof, error: "Peer disconnected" })
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
Chunked

```shell
$ PORT=9292 MODE=chunked ruby server.rb
Chunked server listening on port 9292
Client connected
Sent chunk 1
Sent chunk 2
Server exiting ungracefully...
```

```shell
$ cd ureq_it_u_buy_it && cargo run
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.02s
     Running `target/debug/ureq_it_u_buy_it`
thread 'main' panicked at src/main.rs:31:12:
called `Result::unwrap()` on an `Err` value: IoError(Custom { kind: UnexpectedEof, error: "Peer disconnected" })
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
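The `UnexpectedEof` failure mode seen in both runs can be reproduced without a network, using a reader that ends before its declared length. This is a stand-in sketch for ureq's checked body reader, not its actual internals (`TruncatedBody` is invented for illustration):

```rust
use std::io::{self, Read};

// A reader that claims more data than it delivers, mimicking a peer
// that disconnects before Content-Length bytes arrive.
struct TruncatedBody {
    data: &'static [u8],
    pos: usize,
    claimed_len: usize,
}

impl Read for TruncatedBody {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.pos < self.data.len() {
            // Hand out whatever real data remains.
            let n = buf.len().min(self.data.len() - self.pos);
            buf[..n].copy_from_slice(&self.data[self.pos..self.pos + n]);
            self.pos += n;
            Ok(n)
        } else if self.pos < self.claimed_len {
            // Short of the declared length: surface an error instead of a clean EOF.
            Err(io::Error::new(io::ErrorKind::UnexpectedEof, "Peer disconnected"))
        } else {
            Ok(0)
        }
    }
}

fn main() {
    let mut body = TruncatedBody { data: b"only 24 bytes of body...", pos: 0, claimed_len: 34 };
    // io::copy propagates the reader's error to the caller.
    let err = io::copy(&mut body, &mut io::sink()).unwrap_err();
    assert_eq!(err.kind(), io::ErrorKind::UnexpectedEof);
}
```

Because `io::copy` propagates any non-`Interrupted` read error, a download loop built on it fails as soon as the checked reader reports the short body.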
Pull request was closed
## Problem

I was under the mistaken impression that the reader returned by `into_reader` didn't perform checks for Content-Length. It does, but even after consulting the documentation in several places, I moved forward with adding my own check in heroku/libcnb.rs#947. A co-worker corrected my implementation, which led me to discover it wasn't needed. Neither of us came to the conclusion from the documentation alone that we were already protected. One of the doc locations that I checked was `into_reader`.

## Fix

I added two error cases to the docs where the reader can (and should) error: when a connection is closed before Content-Length bytes are received, and when an ungraceful close of the TCP connection occurs. At the same time, the subject of the bullet point on the reader not being "limited" wasn't clear to me, so I added some more details. I understand the documents weren't written for me personally, and we cannot add every possible caveat to every possible source of documentation. Feel free to take this feedback and run with it if you like. Thanks for this library!
GUS-W-18647436