Redesign endpoint and transfer API by kevinmehall · Pull Request #117 · kevinmehall/nusb · GitHub

Redesign endpoint and transfer API #117


Merged · 13 commits · May 4, 2025

Conversation

kevinmehall
Owner
@kevinmehall kevinmehall commented Feb 24, 2025

This is a redesign of the main APIs for transfers in nusb v0.2:

  • The bulk_in / bulk_out / interrupt_in / interrupt_out APIs returning TransferFuture have been removed. The cancel-on-drop semantics have proven to be unhelpful, and I increasingly think that a cancel-safe API is a must here. USB transfer semantics are fundamentally incompatible with the synchronous drop imposed by Rust futures.
  • To use a non-control endpoint, you must now claim the endpoint for exclusive access with interface.endpoint::<EpType, Dir>(addr). This checks that the endpoint exists and the type and direction match the descriptor, moving those errors up-front instead of when performing a transfer. It also requires that the endpoint is not already claimed.
    • Most v0.1 users create a struct containing an Interface, along with u8 addresses for one or more endpoints. Instead, you'd hold one or more Endpoints to perform transfers on.
    • Exclusively claiming an endpoint allows enabling RAW_IO on Windows when creating an endpoint, as well as potentially allowing other options to be changed that (especially on Windows) are not safe to modify while requests are pending.
    • Similarly, you must drop all Endpoints of an interface before changing alternate setting.
  • The Endpoint API is similar to the v0.1 Queue API. However instead of hiding the transfers and handling only buffers, this introduces Request and Completion types that wrap both the buffer and platform-specific transfer data structures (e.g. URB on Linux).
    • You obtain a Request<EpType, Dir> from endpoint.allocate(capacity), pass it to endpoint.submit(request), and de-queue a completed transfer with endpoint.next_complete() returning Completion<EpType, Dir>. Request<Bulk | Interrupt, Out> has Vec-like methods for filling the buffer, and Completion<Bulk | Interrupt, In> derefs to &[u8] for accessing the received data. This will enable alternative allocators for zero-copy and allow the API to adapt to the additional requirements for isochronous transfers.
    • Currently the buffer capacity is fixed once allocated, and the buffer cannot be converted from or into a Vec. Those would be possible to add, but would complicate the story around zero-copy buffers.
  • The internal implementation of transfers and the interaction between platform-specific and platform-independent code has been simplified.
    • @Yatekii this is going to be a terrible merge conflict with your WebUSB branch, sorry. However, I hope it will make WebUSB easier -- I think WebUSB will be able to ignore crate::transfer::internal completely, and implement the internal Interface and Endpoint methods on top of JsFuture, hopefully without any need for unsafe.
  • Control transfers remain as Futures, because a queue-based API would be very inconvenient for most control transfer use cases, and they are expected to complete quickly so cancellation is less of an issue.
    • Blocking and async control transfer APIs have been unified using the recently-added MaybeFuture trait. This also avoids the locking pitfalls of Linux's blocking control transfer ioctl by implementing all control transfers with async internals.
    • Control transfers now have a timeout parameter. Unfortunately, on Windows, the timeout is set via an endpoint "policy" rather than per-request, and like with RAW_IO, it is unclear whether SetPipePolicy is allowed with transfers pending. For now, the passed timeout is ignored and WinUSB has a default control transfer timeout of 5s.
    • Because of the timeout, and for consistency with macOS where per-transfer cancellation is not possible, and with WebUSB where cancellation is not possible altogether, dropping a control transfer future currently does not cancel the transfer, but waits for it to time out. This may be revisited.
  • The Endpoint / Request / Completion API is a little complicated for simple use cases that might have used libusb's synchronous API, or nusb v0.1's TransferFuture APIs to send and receive small transfers. I think a set of convenience APIs implemented on top of Endpoint's public APIs could fill this need.
    • I plan to add Endpoint wrappers that implement Read + ReadBuf + AsyncRead + AsyncReadBuf and Write + AsyncWrite traits. These would accept parameters for transfer size and the number of concurrent transfers to maintain, giving control over buffering while abstracting away the entire concept of "transfers" for the code written on top. For writing to an OUT endpoint, a Write implementation along with methods for forcing a short or zero-length packet seems like it could serve a lot of these needs. For IN endpoints, Read is a little less direct, but I'm experimenting with a mode where short packets are handled specially.

Feedback welcome!

@Resonanz

Great job, and thanks for your efforts and hard work on this Kevin.

@Resonanz

Any chance that you have some demo code to play with for the new API, Kevin? I was messing with interrupt in and out and queues today so this update is quite timely. I'll be back onto it tomorrow. Thanks.

@kevinmehall kevinmehall force-pushed the redesign branch 2 times, most recently from 54a5764 to 25b99e5 on February 24, 2025 16:13
@kevinmehall
Owner Author

@Resonanz There's a small example that was updated in this PR. Bulk works the same as interrupt.

@Yatekii
Yatekii commented Feb 25, 2025

Great stuff! I am happy rebasing :) I will have a look through this PR tomorrow and then try and make a new webusb branch based on it :)

P.S. I have been using the webusb branch successfully for developing https://inspect.probe.rs/ for a while now :)

@martinling
Contributor

Great stuff. This looks like the right way forward to me.

@Yatekii
Yatekii commented Mar 3, 2025

@kevinmehall I've run into a problem where WebUSB needs to be able to report transfer errors. The PlatformSubmit trait does not currently allow doing so. Am I missing something obvious, or is this something we could think about in this PR too?

@kevinmehall
Owner Author
kevinmehall commented Mar 3, 2025

PlatformSubmit no longer exists on this branch, so hopefully it's more straightforward now.

The interface between the platform code and the common code is:

  • platform::Transfer - a type that contains an owned buffer (maybe just Vec for WebUSB), endpoint number, status, and any state you'd want to re-use when resubmitting a transfer.
  • platform::Endpoint::allocate - produces a Transfer with the specified buffer size.
  • platform::Endpoint::submit - consumes ownership of a Transfer and submits it.
  • platform::Endpoint::poll_next_complete - if a transfer has completed, returns ownership of the Transfer with the status updated.

I think the implementation between submit and poll_next_complete could be something like a VecDeque<(JsFuture, Transfer)> where poll_next_complete polls the head JsFuture, and if complete, updates the status and buffer (for IN transfers) and returns the Transfer.

@martinling
Contributor

I had a go at porting Packetry to the new API.

The biggest problem I encountered is that control_in and control_out now return std::io::Error rather than TransferError.

This is a lot less useful to the caller, because with a TransferError we could specifically match on a TransferError::Stall result, but with a std::io::Error we'd need to reimplement all the OS-specific mapping of IO errors to USB outcomes that was previously handled by nusb.

That particular case is quite important, because a STALL response is a perfectly normal part of the USB protocol that userspace drivers will often need to expect and handle. It doesn't warrant bailing out with an opaque IO error as if the host were unable to talk to the device at all.

Similarly, TransferError::Disconnected is an outcome that will commonly need to be handled specifically (particularly due to the interaction with hotplug code), and which isn't obvious from a std::io::Error.

@kevinmehall
Owner Author

Good point. Not sure why I changed that, so I'll change it back to TransferError.

@martinling
Contributor
martinling commented Mar 3, 2025

Thanks! I have it working now.

I've put those changes in greatscottgadgets/packetry#222 with some CI coverage so that I can keep an eye on things as this branch evolves.

One thing I'm still a bit unsure about from an ergonomics point of view is getting rid of the standalone buffer types.

Previously, it was possible to take an inbound Completion and, after checking its status, discard the Completion wrapper and just retain the type that owns the buffer.

That type can't just be a Vec any more, since we want to support specially-allocated buffers to support zero-copy, but it could still be a separate type that only owns the buffer, and implements Deref and Drop.

This would require either making Completion a non-opaque struct again, or having a method like take_data(self) -> Buffer, but I feel like that would be a lot nicer than having to continue to carry around the Completion, complete with its status that you already checked, and its full type signature including what type of endpoint it arrived on.

As it stands it makes things quite weird:

if completion.status().is_ok() {
    println!("Received {} bytes", completion.len()); // Huh? What does the length of a completion mean?
    println!("First byte was {}", completion[0]); // Why is completion indexable? This looks weird.

    // ...but if I name `completion` as `data` instead, then that obscures the fact it also contains a
    // result that still needs to be checked.

    // Also now all my downstream processing has to deal with this `Completion<Bulk, In>`
    // type, until I'm either done with the data or willing to incur the cost of copying it.
    channel.send(completion);
}

My suggestion would be something like:

if completion.status().is_ok() {
    let data = completion.data(); // data() returns &Buffer
    println!("Received {} bytes", data.len());
    println!("First byte was {}", data[0]);
    channel.send(completion.take_data()); // take_data() returns Buffer and consumes the Completion
}

@martinling
Contributor

A much more minor issue I encountered is that ClaimEndpointError doesn't implement std::error::Error yet, but that's just a matter of adding the necessary implementations when you get that far.

@kevinmehall
Owner Author

Thanks for testing it out!

This would require either making Completion a non-opaque struct again, or having a method like take_data(self) -> Buffer, but I feel like that would be a lot nicer than having to continue to carry around the Completion, complete with its status that you already checked, and its full type signature including what type of endpoint it arrived on.

The endpoint type is in the type signature because I want to be able to add different methods for isochronous. Though one option would be to use one set of Request / Completion types for Bulk/Interrupt and a separate IsoRequest / IsoCompletion for Isochronous, instead of a single type with the EpType type param. I'm not sure that would end up simpler overall though.

println!("Received {} bytes", completion.len()); // Huh? What's the length of a completion mean? println!("First byte was {}", completion[0]"); // Why is completion indexable? This looks weird.
...
let data = completion.data(); // data() returns &Buffer

Agreed that Completion<_, In> deref-ing to the data could be confusing and less obvious than a .data() method. Even without a separate Buffer type, fn data(&self) -> &[u8] might be a clearer way to access the data, in addition to or instead of that Deref impl. You're not using Request<_, Out> at all, but it similarly has DerefMut to let you write data to the buffer (or at least the parts that have been initialized). That one would need another type to be able to do request.data().extend_from_slice(...) instead of request.extend_from_slice(), but I'll think about doing that.

channel.send(completion.take_data()); // take_data() returns Buffer and consumes the Completion

To be fully zero-allocation, it would want to keep the non-buffer parts of the transfer, and have a way to re-combine the buffer into a Request to re-use. That seems like a lot of API surface and moving parts for the user to deal with. The buffer would still have to store the endpoint type (either type-level or as a runtime value) and address, because on Windows and macOS, isochronous buffers are registered with a specific endpoint and can't be re-used on a different endpoint. So Buffer would end up being pretty similar to Completion<_, In> except without the status field.

One possibility is exposing only Vec-based buffers: A take_data(&mut self) -> Vec<u8> that transfers ownership of the buffer if it's allocated by the standard allocator, or copies to a new Vec if the Completion owns special memory. That's still overall one copy in all cases. It could leave the Completion in a state where reuse() allocates a new buffer if necessary. Not zero copy, nor zero alloc, but if you want to send a standard Vec to something that doesn't know about nusb, it's the best you can do.

On that subject, I noticed that you're not re-using transfers/buffers in Packetry, which is probably fine now with the standard allocator, and it would be annoying to send consumed buffers back to the thread owning the Endpoint to re-submit. But if Completion owned a Linux zero-copy mmap allocation, I wonder whether the mmap/munmap overhead for every 16K transfer would make zero-copy even worthwhile for buffers only used once.

Did you consider combining those threads? Instead of blocking on the channel receive, block on the select between stop_rx and endpoint.next_complete(). Or perhaps once I finish the EndpointRead API that wraps Endpoint to implement AsyncRead, you could just let that manage the buffers for you, and the decoder could read from that directly instead of going through the VecDeque.

@martinling
Contributor

The endpoint type is in the type signature because I want to be able to add different methods for isochronous. Though one option would be to use one set of Request / Completion types for Bulk/Interrupt and a separate IsoRequest / IsoCompletion for Isochronous, instead of a single type with the EpType type param. I'm not sure that would end up simpler overall though.

I think having the single parameterised type for Completion makes sense. It's not that I think the Completion shouldn't have those parameters, it's just that I don't find it ergonomic to have to continue to carry the Completion around.

Agreed that Completion<_, In> deref-ing to the data could be confusing and less obvious than a .data() method. Even without a separate Buffer type, fn data(&self) -> &[u8] might be a clearer way to access the data in addition or instead of that Deref impl.

Yes, that would make the intent clearer, but it doesn't really address the fundamental issue; see below.

You're not using Request<_, Out> at all, but it similarly has DerefMut to let you write data to the buffer (or at least the parts that have been initialized). That one would need another type to be able to do request.data().extend_from_slice(...) instead of request.extend_from_slice(), but I'll think about doing that.

It's less of an issue for Request<_, Out> because in that case you're the one putting the request together, so there's no question about whether the data is valid or complete. The data is an integral part of the request.

Whereas Completion<_, In> is conceptually somewhat like a Result<Data, TransferError>. In nusb 0.1 the only reason it wasn't actually just a Result is that a failed transfer can still contain partial data which the caller may want to have.

But if the caller wants to treat the Completion like a Result, and ignore the partial results of failed transfer (which I think is going to be a common thing to do), then it's really important to have a clear distinction between a Completion and a known-complete data payload.

As it stands, there's no way to distinguish those two things in the type system!

That's why I'd like to be able to discard the Completion after checking its status: because it means there's no ambiguity downstream about whether that status still needs to be checked.

To be fully zero-allocation, it would want to keep the non-buffer parts of the transfer, and have a way to re-combine the buffer into a Request to re-use. That seems like a lot of API surface and moving parts for the user to deal with. The buffer would still have to store the endpoint type (either type-level or as a runtime value) and address, because on Windows and macOS, isochronous buffers are registered with a specific endpoint and can't be re-used on a different endpoint. So Buffer would end up being pretty similar to Completion<_, In> except without the status field.

If take_data() returns a Buffer<Bulk, In> type that contains the exact same internal fields as a Completion then that's fine - the point is just to have it be a different type, because that removes any ambiguity downstream about whether the status was already checked.

This could actually be achieved by just renaming the new Completion type to Buffer, and defining Completion similarly to nusb 0.1:

struct Completion<EpType, Dir> {
    status: Result<(), TransferError>,
    data: Buffer<EpType, Dir>,
}

Because having that discardable wrapper was important!

On that subject, I noticed that you're not re-using transfers/buffers in Packetry, which is probably fine now with the standard allocator, and it would be annoying to send consumed buffers back to the thread owning the Endpoint to re-submit. But if Completion owned a Linux zero-copy mmap allocation, I wonder whether the mmap/munmap overhead for every 16K transfer would make zero-copy even worthwhile for buffers only used once.

There hasn't been any attempt to make Packetry zero-copy ready yet. We could arrange to send those consumed buffers back for reuse, and obviously if nusb were doing that zero-copy mmap allocation then it'd be a higher priority to do so.

Did you consider combining those threads? Instead of blocking on the channel receive, block on the select between stop_rx and endpoint.next_complete(). Or perhaps once I finish the EndpointRead API that wraps Endpoint to implement AsyncRead, you could just let that manage the buffers for you, and the decoder could read from that directly instead of going through the VecDeque.

The issue there is that we can't guarantee that the decoder will always run fast enough, i.e. not block for long enough that the transfer queue is emptied.

Having a dedicated thread that just keeps up with the USB transfers, queuing up all the data for later decoding, means that we ensure we always drain the buffer on the device as fast as it's possible for the host to achieve. That's important because both Cynthion and iCE40-usbtrace are devices that offload data to the host at the same USB speed they capture at. So they can only capture a saturated bus for a limited period that's constrained by their internal buffer size.

@martinling
Contributor

I've just added a second commit to that Packetry branch which sends each Completion back for reuse.

It doesn't really add much complication, and actually works very neatly, as it means the number of allocated transfers grows to fit the worst-case backlog length of the decoder thread, whilst keeping the same queue depth of transfers submitted to the OS.

@kevinmehall
Owner Author

A related buffer management issue encountered while porting a (yet-to-be-published) project to this is that it's annoying that allocating a Request requires &mut Endpoint. If an OUT Endpoint is shared in a Mutex for multiple writers, or owned by a dedicated thread, the transfer can't be allocated and data filled without potentially waiting for the ongoing transfer to complete. So I think that's another argument for decoupling buffers and transfers and/or making it more efficient to interoperate with Vec.

I'm wondering if exposing the transfer object as Request and Completion types is even worth it. I'm considering going back to submit() taking a buffer and next_complete returning a buffer + status, and keeping the ancillary transfer structures internal, like the v0.1 Queue API. It's nice to say that it's completely zero-alloc when re-using transfers, but I think the small allocations for the transfer structures are pretty cheap compared to the syscalls, and caching one previously-completed transfer for re-use avoids the allocation in the majority of cases.

That's why I'd like to be able to discard the Completion after checking its status: because it means there's no ambiguity downstream about whether that status still needs to be checked.

One counterpoint that became apparent when implementing a Read wrapper for Endpoint is that if you care about handling a partial completion, the error should be checked after the data, because any data occurred first in the logical stream. For EndpointRead, the BufRead::fill_buf(&mut self) -> Result<&[u8], io::Error> method flattens the data and status to a Result, but yields the error only after all the data from the Completion has been consumed.

There are plenty of cases where you don't care about partial data if there is an error though, and I get the point that it'd be too easy to not check the status at all.

@martinling
Contributor

A related buffer management issue encountered while porting a (yet-to-be-published) project to this is that it's annoying that allocating a Request requires &mut Endpoint. If an OUT Endpoint is shared in a Mutex for multiple writers, or owned by a dedicated thread, the transfer can't be allocated and data filled without potentially waiting for the ongoing transfer to complete. So I think that's another argument for decoupling buffers and transfers and/or making it more efficient to interoperate with Vec.

Yeah, interoperability with Vec is important. There are going to be cases where the user already has some data in a Vec that they want to send over USB, and don't really have much choice about that (e.g. it came from some library code that generated it that way). In that scenario, allocating a special "zero-copy" buffer, only to then just copy into it, is not adding any value - you'd have been better off just giving the Vec buffer to the OS and letting it copy internally if it needs to.

But in the case where you have the choice of what storage to use, allocating a zero-copy buffer from nusb makes sense.

So perhaps what we need is separate Request<Vec> and Request<Buffer> types, etc.

I'm wondering if exposing the transfer object as Request and Completion types is even worth it. I'm considering going back to submit() taking a buffer and next_complete returning a buffer + status, and keeping the ancillary transfer structures internal, like the v0.1 Queue API. It's nice to say that it's completely zero-alloc when re-using transfers, but I think the small allocations for the transfer structures are pretty cheap compared to the syscalls, and caching one previously-completed transfer for re-use avoids the allocation in the majority of cases.

I don't personally have strong feelings either way on this.

One possible argument for keeping it strictly zero-alloc would be the possibility of extending/porting the nusb API to support bare-metal embedded systems with USB host support, where people may want a completely no-alloc solution. But that's way outside the current scope of the project.

Even in an OS environment though, there might perhaps be advantages to avoiding allocation in situations where real-time performance is important - e.g. SDR, USB audio, etc.

One counterpoint that became apparent when implementing a Read wrapper for Endpoint is that if you care about handling a partial completion, the error should be checked after the data, because any data occurred first in the logical stream. For EndpointRead, the BufRead::fill_buf(&mut self) -> Result<&[u8], io::Error> method flattens the data and status to a Result, but yields the error only after all the data from the Completion has been consumed.

I don't think that ordering between the data and the error is an argument for bundling them into the same object. Even if I want to respect that ordering I may still want to separate the two.

For instance, right now in my ported version of Packetry I have a Completion<Bulk, In> queue of data to be processed. But it would be better for me to have a Buffer<Bulk, In> queue. In the case where I get a Completion with both data and an Err status, I could put the data in the queue and then handle the error. But the data and the error logically go to different places, and indeed to different threads - the data goes to the decoder thread, while the error is handled by the capture thread which does teardown, and then gets sent to the UI thread which reacts to the teardown and displays a dialog. So I still want to split them even if I'm respecting that ordering.

@kevinmehall
Owner Author

Went back to something more like v0.1 Queue with submit taking only a buffer and next_complete() returning a Completion containing data and status. I added a Buffer type that can be backed by either a Vec or a zero-copy mmap.

The implementation relies on internal allocations for the transfer state, but it caches the last completed transfer so as to not need to allocate when re-submitting after a transfer completes.

@martinling
Contributor

Seems like a good compromise. I've updated greatscottgadgets/packetry#222 to the new version and everything seems to be working OK.

@kevinmehall kevinmehall marked this pull request as ready for review May 3, 2025 20:13