capsules: virtual_uart: avoid double receive, check rx len, support custom buffer #1757

bradjc · 2020-04-13T19:52:54Z

Pull Request Overview

Unexpected behavior: Running the process console causes uart.receive_abort to be called after every character.

Cause: start_receive gets called twice, once with a rx_buffer, and the second time without since it is being used after the first call. The first call happens because process console starts a new receive after handling its current read, and the second call happens because uart mux calls start_receive on its client's behalf after the callback.

Fix: Make sure that the buffer used by virtual_uart for underlying UART receives isn't replaced before client callbacks are completed. This causes all receives in callbacks to be queued until they have all completed and the virtual_uart receive handler can determine the shortest read to issue.

Other Changes

Do not assume the buffer used for virtual_uart receive is the default one, and instead use the actual buffer length for the maximum receive length.
Check for ESIZE when a client issues a read.

This pull request adds/changes/fixes...

Testing Strategy

Running process console on hail and verifying it works.

Added a debug statement to sam4l::usart::abort and it no longer is printed.

Tried the imix/src/test/virtual_uart_rx_test on hail and it works as expected.

TODO or Help Wanted

n/a

Documentation Updated

Updated the relevant files in /docs, or no updates are required.

Formatting

Ran make formatall.

phil-levis · 2020-04-17T21:30:59Z

Can you document when this happens? Amit noted that it happens on some boards and not others. So I want to understand that's causing that discrepancy. My read-through of the change and code suggests that it is OK, but this is something that needs a lot of testing. The virtual UART receive path is pretty tricky logic with edge cases.

phil-levis

Please provide a test case when this happens. I want to understand why it happens in some cases and not in others. The virtual receive logic is tricky and has to handle a bunch of edge cases. I'm worried that this patch will cause it to miss one.

phil-levis · 2020-04-17T21:41:47Z

One concern -- it's important that the read not happen directly in the read_done callback. I.e., a client issuing a read in its callback is not serviced immediately. The reason is that might be an outstanding shorter read.

I.e.,

client A requests read of 10
client B requests read of 12
10 bytes are read
A signaled read_done
client A requests read of 10
2 bytes are read
B signaled read_done
8 bytes area read
A signaled read_done

If the second A call stops the 2 byte read this is a bug.

pftbest · 2020-04-17T21:51:40Z

I posted this example code on slack:

https://gist.github.com/pftbest/c5d89849e22d103278a104704c4624cb

Compile it for desktop, no device required.

It has only one reader, but every call to receive is followed by an abort call.

bradjc · 2020-04-20T16:35:06Z

One test case is just to run the process_console with no apps on master. To see the abort calls, I added this patch:

diff --git a/chips/sam4l/src/usart.rs b/chips/sam4l/src/usart.rs
index e7b7fd4f4..cb82195da 100644
--- a/chips/sam4l/src/usart.rs
+++ b/chips/sam4l/src/usart.rs
@@ -12,6 +12,7 @@ use kernel::hil;
 use kernel::hil::spi;
 use kernel::hil::uart;
 use kernel::ReturnCode;
+use kernel::debug_gpio;

 use crate::dma;
 use crate::pm;
@@ -865,6 +866,9 @@ impl uart::Receive<'a> for USART<'a> {

     fn receive_abort(&self) -> ReturnCode {
         let usart = &USARTRegManager::new(&self);
+
+        debug_gpio!(0, toggle);
+
         self.disable_rx_timeout(usart);
         self.abort_rx(usart, ReturnCode::ECANCEL, uart::Error::Aborted);
         ReturnCode::EBUSY

phil-levis · 2020-04-23T04:59:35Z

I forgot to bring an imix home when Stanford shut down. I do have a launchxl, but the instructions to build it do not work (make flash-ccfg fails). So I have not tested this. But...

I think this bug fix is incorrect. I'd appreciate if the "Testing" section of the pull request reported running the UART virtualization tests.

Let's call the read that comes in during the callback the new read and any that are already occurring the pending reads.

The problem is that start_receive is being called twice. This should not happen. In particular, in the case that @bradjc describes, the new read should not result in an underlying read being issued. It is important that the virtualizer looks at all of the reads -- new new and pending -- be considered for the shortest one (the edge case I described above). Instead, the new read should just be added to the queue, then later, its length should be considered when figuring out how long a read to issue.

Why is it being called twice? We set the receive_pending flag -- why can the new read start an underlying read?

The problem is that whether you can start an underlying read is controlled by the presence of buffer. This is replaced before we issue the callback (line 118), and so when the new read comes in, it's there, and so the new read starts immediately (line 118). This replacement of buffer should occur when completing_read is set to false (line 155).

I think there's another lingering bug. If the new read is the shortest read, its length is not considered in the length of the next read to perform. So there also needs to be a line added to the if clause in lines 132-134, which sets next_read_len to min(next_read_len, device.rx_len.get()).

So, in short, I think:

If you apply the patch as is, then reads issued in callbacks can cause other clients to miss bytes. If another client has a short read, and a read issued in a callback makes a long read, the underlying UART will issue a long read. The client with a short read will miss all of the bytes in the long read after its own. E.g., if the short read is 5 bytes and the long one is 120, the short read will have a receive of 5 bytes signaled after 120, and miss bytes 5-119.
If you apply the changes I suggest, then this means that when a read is issued in a callback, its enqueued but not executed. When the loop across all devices completes, the virtualizer will issue a read whose length is the shortest outstanding read length.

I'll try to figure out a way to get an imix.

bradjc · 2020-04-23T14:29:42Z

If another client has a short read, and a read issued in a callback makes a long read, the underlying UART will issue a long read. The client with a short read will miss all of the bytes in the long read after its own. E.g., if the short read is 5 bytes and the long one is 120, the short read will have a receive of 5 bytes signaled after 120, and miss bytes 5-119.

Ok yes that makes sense. This still needs to be addressed.

However, I don't think receive.receive_buffer can only enqueue, since if the receive call is the first and only call to receive, there won't be another call to actually start the receive. A deferred call could provide the callback needed to actually start the receive, but this would be redundant if the receive.receive_buffer() stems from a callback.

So it seems like either:

Having receive.receive_buffer() enqueue a receive and then setup a deferred call to check all of the enqueued receives, and then issue the smallest one. I think there could be a state variable which skips the deferred call if the callback iter is happening, but this seems a little difficult to reason about.
Enhancing the MuxUart.start_receive() function. It would still actually issue the first receive to the hardware, but on subsequent reads it would check if the new read is smaller, and only then call abort(), and otherwise keep the existing logic plus what is currently in this PR.

would address the long/short receive issue.

phil-levis · 2020-04-23T16:33:47Z

However, I don't think receive.receive_buffer can only enqueue, since if the receive call is the first and only call to receive, there won't be another call to actually start the receive.

I think it can? The end of received_buffer starts the next receive:

tock/capsules/src/virtual_uart.rs

Line 157 in d7abb62

self.start_receive(next_read_len);

read_pending is set here:

tock/capsules/src/virtual_uart.rs

Line 133 in d7abb62

read_pending = true;

Unexpected behavior: Running the process console causes uart.receive_abort to be called after every character. Cause: start_receive gets called twice, once with a rx_buffer, and the second time without since it is being used after the first call. The first call happens because process console starts a new receive after handling its current read, and the second call happens because uart mux calls start_receive on its client's behalf after the callback. Fix: Remove the mux calling start_receive on its client's behalf.

This fixes the issue where receives issued from callbacks were always resulting in abort() calls by not replacing the underlying UART receive buffer until after all of the client callbacks ha 8000 ve finished. This "flag", in conjunction with the `completing_read` bool are what start_receive uses to determine whether to start or abort a receive. This also fixes determining the minimum length of a receive by calculating it in all cases where another receive is needed.

Without this check, the virtual uart will continue issuing reads to receive rx_len bytes, even if the buffer is not big enough to store them.

bradjc · 2020-04-24T14:59:37Z

Ok, discussed the fix with @phil-levis, and updated now. At a high level, the fix is to properly enqueue all receive requests issued during a receive callback until all of the callbacks have finished. Before, the first receive issued from a callback was improperly getting issued to the hardware. Receive requests issued at other times when there is not an active underlying receive are not enqueued, and issue immediately, as expected.

phil-levis · 2020-04-24T16:09:52Z

Can you run the virtual UART test?

bradjc · 2020-04-24T19:35:10Z

I ran the virtual uart test on hail and it works just as expected.

bradjc changed the title ~~capsules: virt_uart: avoid double receive~~ capsules: virtual_uart: avoid double receive Apr 13, 2020

ppannuto requested a review from phil-levis April 13, 2020 20:03

alistair23 previously approved these changes Apr 13, 2020

View reviewed changes

bradjc mentioned this pull request Apr 17, 2020

Release 1.5 #1685

Closed

69 tasks

phil-levis requested changes Apr 17, 2020

View reviewed changes

bradjc added 3 commits April 24, 2020 10:08

capsules: virtual uart: check for ESIZE on recv

6334749

Without this check, the virtual uart will continue issuing reads to receive rx_len bytes, even if the buffer is not big enough to store them.

bradjc dismissed alistair23’s stale review via 6334749 April 24, 2020 14:46

bradjc force-pushed the virtual-uart-no-double-read branch from 2ad8e51 to 6334749 Compare April 24, 2020 14:46

bradjc changed the title ~~capsules: virtual_uart: avoid double receive~~ capsules: virtual_uart: avoid double receive, check rx len, support custom buffer Apr 24, 2020

phil-levis approved these changes Apr 24, 2020

View reviewed changes

alevy approved these changes Apr 24, 2020

View reviewed changes

bradjc merged commit 6ec37b2 into master Apr 24, 2020

bors bot deleted the virtual-uart-no-double-read branch April 24, 2020 19:36

bradjc mentioned this pull request Apr 30, 2020

Release 2020-04-30 13:11:39 -0400 #1810

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

capsules: virtual_uart: avoid double receive, check rx len, support custom buffer #1757

capsules: virtual_uart: avoid double receive, check rx len, support custom buffer #1757

capsules: virtual_uart: avoid double receive, check rx len, support custom buffer #1757

capsules: virtual_uart: avoid double receive, check rx len, support custom buffer #1757

Conversation

Pull Request Overview

Other Changes

Testing Strategy

TODO or Help Wanted

Documentation Updated

Formatting

Choose a reason for hiding this comment