core/mutex: use thread_yield_higher() in mutex_unlock() #20890

maribu · 2024-10-05T20:08:40Z

Contribution description

Using sched_switch() in mutex_unlock() can result in crashes when mutex_unlock() is called from IRQ context. This however is a common pattern in RIOT to wake up a thread from IRQ. The reason for the crash is that sched_switch() assumes thread_get_active() to always return a non-NULL value. But when thread-less idle is used, no thread may be active after the last runnable thread exited. Using thread_yield_higher() instead solves the issue, as thread_yield_higher() is safe to call from IRQ context without an active thread.

Testing procedure

As described in #20812 (also add the assert(active_thread != NULL); to sched_switch() to not rely on undefined behavior triggering the crash, but to crash reliably on invalid state).

In addition to fixing the crash, it should not introduce regressions. I'm not sure how good the test coverage for mutex is, but at least the following passed:

for test in tests/core/mutex_*; do make BOARD=nucleo-f767zi -C $test flash test; done

make: Entering directory '/home/maribu/Repos/software/RIOT/master/tests/core/mutex_cancel'
Building application "tests_mutex_cancel" for "nucleo-f767zi" with CPU "stm32".

"make" -C /home/maribu/Repos/software/RIOT/master/pkg/cmsis/ 
"make" -C /home/maribu/Repos/software/RIOT/master/boards/common/init
"make" -C /home/maribu/Repos/software/RIOT/master/boards/nucleo-f767zi
"make" -C /home/maribu/Repos/software/RIOT/master/boards/common/nucleo
"make" -C /home/maribu/Repos/software/RIOT/master/core
"make" -C /home/maribu/Repos/software/RIOT/master/core/lib
"make" -C /home/maribu/Repos/software/RIOT/master/cpu/stm32
"make" -C /home/maribu/Repos/software/RIOT/master/cpu/cortexm_common
"make" -C /home/maribu/Repos/software/RIOT/master/cpu/cortexm_common/periph
"make" -C /home/maribu/Repos/software/RIOT/master/cpu/stm32/periph
"make" -C /home/maribu/Repos/software/RIOT/master/cpu/stm32/stmclk
"make" -C /home/maribu/Repos/software/RIOT/master/cpu/stm32/vectors
"make" -C /home/maribu/Repos/software/RIOT/master/drivers
"make" -C /home/maribu/Repos/software/RIOT/master/drivers/periph_common
"make" -C /home/maribu/Repos/software/RIOT/master/sys
"make" -C /home/maribu/Repos/software/RIOT/master/sys/auto_init
"make" -C /home/maribu/Repos/software/RIOT/master/sys/div
"make" -C /home/maribu/Repos/software/RIOT/master/sys/frac
"make" -C /home/maribu/Repos/software/RIOT/master/sys/isrpipe
"make" -C /home/maribu/Repos/software/RIOT/master/sys/libc
"make" -C /home/maribu/Repos/software/RIOT/master/sys/malloc_thread_safe
"make" -C /home/maribu/Repos/software/RIOT/master/sys/newlib_syscalls_default
"make" -C /home/maribu/Repos/software/RIOT/master/sys/pm_layered
"make" -C /home/maribu/Repos/software/RIOT/master/sys/preprocessor
"make" -C /home/maribu/Repos/software/RIOT/master/sys/stdio
"make" -C /home/maribu/Repos/software/RIOT/master/sys/stdio_uart
"make" -C /home/maribu/Repos/software/RIOT/master/sys/test_utils/interactive_sync
"make" -C /home/maribu/Repos/software/RIOT/master/sys/test_utils/print_stack_usage
"make" -C /home/maribu/Repos/software/RIOT/master/sys/tsrb
"make" -C /home/maribu/Repos/software/RIOT/master/sys/ztimer
   text	  data	   bss	   dec	   hex	filename
  12652	   172	  2708	 15532	  3cac	/home/maribu/Repos/software/RIOT/master/tests/core/mutex_cancel/bin/nucleo-f767zi/tests_mutex_cancel.elf
/home/maribu/Repos/software/RIOT/master/dist/tools/openocd/openocd.sh flash /home/maribu/Repos/software/RIOT/master/tests/core/mutex_cancel/bin/nucleo-f767zi/tests_mutex_cancel.elf
### Flashing Target ###
Open On-Chip Debugger 0.12.0+dev-snapshot (2024-09-30-11:15)
Licensed under GNU GPL v2
For bug reports, read
	http://openocd.org/doc/doxygen/bugs.html
DEPRECATED! use 'adapter serial' not 'hla_serial'
hla_swd
Info : The selected transport took over low-level target control. The results might differ compared to plain JTAG/SWD
srst_only separate srst_nogate srst_open_drain connect_assert_srst
Info : clock speed 2000 kHz
Info : STLINK V2J29M18 (API v2) VID:PID 0483:374B
Info : Target voltage: 3.232589
Info : [stm32f7x.cpu] Cortex-M7 r1p0 processor detected
Info : [stm32f7x.cpu] target has 8 breakpoints, 4 watchpoints
Info : [stm32f7x.cpu] Examination succeed
Info : starting gdb server for stm32f7x.cpu on 0
Info : Listening on port 41515 for gdb connections
    TargetName         Type       Endian TapName            State       
--  ------------------ ---------- ------ ------------------ ------------
 0* stm32f7x.cpu       hla_target little stm32f7x.cpu       unknown
Info : Unable to match requested speed 2000 kHz, using 1800 kHz
Info : Unable to match requested speed 2000 kHz, using 1800 kHz
[stm32f7x.cpu] halted due to debug-request, current mode: Thread 
xPSR: 0x01000000 pc: 0x08000a34 msp: 0x20000200
Info : device id = 0x10016451
Info : flash size = 2048 KiB
Info : Single Bank 2048 kiB STM32F76x/77x found
auto erase enabled
wrote 32768 bytes from file /home/maribu/Repos/software/RIOT/master/tests/core/mutex_cancel/bin/nucleo-f767zi/tests_mutex_cancel.elf in 0.736344s (43.458 KiB/s)
verified 12824 bytes in 0.110537s (113.296 KiB/s)
Info : Unable to match requested speed 2000 kHz, using 1800 kHz
Info : Unable to match requested speed 2000 kHz, using 1800 kHz
shutdown command invoked
Done flashing
r
/home/maribu/Repos/software/RIOT/master/dist/tools/pyterm/pyterm -p "/dev/ttyACM0" -b "115200" -ln "/tmp/pyterm-maribu" -rn "2024-10-05_21.57.33-tests_mutex_cancel-nucleo-f767zi" --no-reconnect --noprefix --no-repeat-command-on-empty-line 
Twisted not available, please install it if you want to use pyterm's JSON capabilities
Connect to serial port /dev/ttyACM0
Welcome to pyterm!
Type '/exit' to exit.
READY
s
START
main(): This is RIOT! (Version: 2024.10-devel-254-g13188)
Test Application for mutex_cancel / mutex_lock_cancelable
=========================================================

Test without cancellation: OK
Test early cancellation: OK
Verify no side effects on subsequent calls: OK
Test late cancellation: OK
TEST PASSED

make: Leaving directory '/home/maribu/Repos/software/RIOT/master/tests/core/mutex_cancel'
make: Entering directory '/home/maribu/Repos/software/RIOT/master/tests/core/mutex_order'
Building application "tests_mutex_order" for "nucleo-f767zi" with CPU "stm32".

"make" -C /home/maribu/Repos/software/RIOT/master/pkg/cmsis/ 
"make" -C /home/maribu/Repos/software/RIOT/master/boards/common/init
"make" -C /home/maribu/Repos/software/RIOT/master/boards/nucleo-f767zi
"make" -C /home/maribu/Repos/software/RIOT/master/boards/common/nucleo
"make" -C /home/maribu/Repos/software/RIOT/master/core
"make" -C /home/maribu/Repos/software/RIOT/master/core/lib
"make" -C /home/maribu/Repos/software/RIOT/master/cpu/stm32
"make" -C /home/maribu/Repos/software/RIOT/master/cpu/cortexm_common
"make" -C /home/maribu/Repos/software/RIOT/master/cpu/cortexm_common/periph
"make" -C /home/maribu/Repos/software/RIOT/master/cpu/stm32/periph
"make" -C /home/maribu/Repos/software/RIOT/master/cpu/stm32/stmclk
"make" -C /home/maribu/Repos/software/RIOT/master/cpu/stm32/vectors
"make" -C /home/maribu/Repos/software/RIOT/master/drivers
"make" -C /home/maribu/Repos/software/RIOT/master/drivers/periph_common
"make" -C /home/maribu/Repos/software/RIOT/master/sys
"make" -C /home/maribu/Repos/software/RIOT/master/sys/auto_init
"make" -C /home/maribu/Repos/software/RIOT/master/sys/div
"make" -C /home/maribu/Repos/software/RIOT/master/sys/isrpipe
"make" -C /home/maribu/Repos/software/RIOT/master/sys/libc
"make" -C /home/maribu/Repos/software/RIOT/master/sys/malloc_thread_safe
"make" -C /home/maribu/Repos/software/RIOT/master/sys/newlib_syscalls_default
"make" -C /home/maribu/Repos/software/RIOT/master/sys/pm_layered
"make" -C /home/maribu/Repos/software/RIOT/master/sys/preprocessor
"make" -C /home/maribu/Repos/software/RIOT/master/sys/stdio
"make" -C /home/maribu/Repos/software/RIOT/master/sys/stdio_uart
"make" -C /home/maribu/Repos/software/RIOT/master/sys/test_utils/interactive_sync
"make" -C /home/maribu/Repos/software/RIOT/master/sys/test_utils/print_stack_usage
"make" -C /home/maribu/Repos/software/RIOT/master/sys/tsrb
   text	  data	   bss	   dec	   hex	filename
  10964	   128	 10380	 21472	  53e0	/home/maribu/Repos/software/RIOT/master/tests/core/mutex_order/bin/nucleo-f767zi/tests_mutex_order.elf
/home/maribu/Repos/software/RIOT/master/dist/tools/openocd/openocd.sh flash /home/maribu/Repos/software/RIOT/master/tests/core/mutex_order/bin/nucleo-f767zi/tests_mutex_order.elf
### Flashing Target ###
Open On-Chip Debugger 0.12.0+dev-snapshot (2024-09-30-11:15)
Licensed under GNU GPL v2
For bug reports, read
	http://openocd.org/doc/doxygen/bugs.html
DEPRECATED! use 'adapter serial' not 'hla_serial'
hla_swd
Info : The selected transport took over low-level target control. The results might differ compared to plain JTAG/SWD
srst_only separate srst_nogate srst_open_drain connect_assert_srst
Info : clock speed 2000 kHz
Info : STLINK V2J29M18 (API v2) VID:PID 0483:374B
Info : Target voltage: 3.232589
Info : [stm32f7x.cpu] Cortex-M7 r1p0 processor detected
Info : [stm32f7x.cpu] target has 8 breakpoints, 4 watchpoints
Info : [stm32f7x.cpu] Examination succeed
Info : starting gdb server for stm32f7x.cpu on 0
Info : Listening on port 35273 for gdb connections
    TargetName         Type       Endian TapName            State       
--  ------------------ ---------- ------ ------------------ ------------
 0* stm32f7x.cpu       hla_target little stm32f7x.cpu       unknown
Info : Unable to match requested speed 2000 kHz, using 1800 kHz
Info : Unable to match requested speed 2000 kHz, using 1800 kHz
[stm32f7x.cpu] halted due to debug-request, current mode: Thread 
xPSR: 0x01000000 pc: 0x08000b50 msp: 0x20000200
Info : device id = 0x10016451
Info : flash size = 2048 KiB
Info : Single Bank 2048 kiB STM32F76x/77x found
auto erase enabled
wrote 32768 bytes from file /home/maribu/Repos/software/RIOT/master/tests/core/mutex_order/bin/nucleo-f767zi/tests_mutex_order.elf in 0.724737s (44.154 KiB/s)
verified 11092 bytes in 0.104662s (103.495 KiB/s)
Info : Unable to match requested speed 2000 kHz, using 1800 kHz
Info : Unable to match requested speed 2000 kHz, using 1800 kHz
shutdown command invoked
Done flashing
r
/home/maribu/Repos/software/RIOT/master/dist/tools/pyterm/pyterm -p "/dev/ttyACM0" -b "115200" -ln "/tmp/pyterm-maribu" -rn "2024-10-05_21.57.40-tests_mutex_order-nucleo-f767zi" --no-reconnect --noprefix --no-repeat-command-on-empty-line 
Twisted not available, please install it if you want to use pyterm's JSON capabilities
Connect to serial port /dev/ttyACM0
Welcome to pyterm!
Type '/exit' to exit.
READY
s
START
main(): This is RIOT! (Version: 2024.10-devel-254-g13188)
Mutex order test
Please refer to the README.md for more information

T2 (prio 6): trying to lock mutex now
T3 (prio 4): trying to lock mutex now
T4 (prio 0): trying to lock mutex now
T5 (prio 2): trying to lock mutex now
T6 (prio 1): trying to lock mutex now
T4 (prio 0): locked mutex now
{ "threads": [{ "name": "t", "stack_size": 1536, "stack_used": 356 }]}
T6 (prio 1): locked mutex now
{ "threads": [{ "name": "t", "stack_size": 1536, "stack_used": 356 }]}
T5 (prio 2): locked mutex now
{ "threads": [{ "name": "t", "stack_size": 1536, "stack_used": 356 }]}
T3 (prio 4): locked mutex now
{ "threads": [{ "name": "t", "stack_size": 1536, "stack_used": 356 }]}
T2 (prio 6): locked mutex now

make: Leaving directory '/home/maribu/Repos/software/RIOT/master/tests/core/mutex_order'
make: Entering directory '/home/maribu/Repos/software/RIOT/master/tests/core/mutex_unlock_and_sleep'
Building application "tests_mutex_unlock_and_sleep" for "nucleo-f767zi" with CPU "stm32".

"make" -C /home/maribu/Repos/software/RIOT/master/pkg/cmsis/ 
"make" -C /home/maribu/Repos/software/RIOT/master/boards/common/init
"make" -C /home/maribu/Repos/software/RIOT/master/boards/nucleo-f767zi
"make" -C /home/maribu/Repos/software/RIOT/master/boards/common/nucleo
"make" -C /home/maribu/Repos/software/RIOT/master/core
"make" -C /home/maribu/Repos/software/RIOT/master/core/lib
"make" -C /home/maribu/Repos/software/RIOT/master/cpu/stm32
"make" -C /home/maribu/Repos/software/RIOT/master/cpu/cortexm_common
"make" -C /home/maribu/Repos/software/RIOT/master/cpu/cortexm_common/periph
"make" -C /home/maribu/Repos/software/RIOT/master/cpu/stm32/periph
"make" -C /home/maribu/Repos/software/RIOT/master/cpu/stm32/stmclk
"make" -C /home/maribu/Repos/software/RIOT/master/cpu/stm32/vectors
"make" -C /home/maribu/Repos/software/RIOT/master/drivers
"make" -C /home/maribu/Repos/software/RIOT/master/drivers/periph_common
"make" -C /home/maribu/Repos/software/RIOT/master/sys
"make" -C /home/maribu/Repos/software/RIOT/master/sys/auto_init
"make" -C /home/maribu/Repos/software/RIOT/master/sys/div
"make" -C /home/maribu/Repos/software/RIOT/master/sys/isrpipe
"make" -C /home/maribu/Repos/software/RIOT/master/sys/libc
"make" -C /home/maribu/Repos/software/RIOT/master/sys/malloc_thread_safe
"make" -C /home/maribu/Repos/software/RIOT/master/sys/newlib_syscalls_default
"make" -C /home/maribu/Repos/software/RIOT/master/sys/pm_layered
"make" -C /home/maribu/Repos/software/RIOT/master/sys/preprocessor
"make" -C /home/maribu/Repos/software/RIOT/master/sys/stdio
"make" -C /home/maribu/Repos/software/RIOT/master/sys/stdio_uart
"make" -C /home/maribu/Repos/software/RIOT/master/sys/test_utils/interactive_sync
"make" -C /home/maribu/Repos/software/RIOT/master/sys/test_utils/print_stack_usage
"make" -C /home/maribu/Repos/software/RIOT/master/sys/tsrb
   text	  data	   bss	   dec	   hex	filename
  10940	   128	  3732	 14800	  39d0	/home/maribu/Repos/software/RIOT/master/tests/core/mutex_unlock_and_sleep/bin/nucleo-f767zi/tests_mutex_unlock_and_sleep.elf
/home/maribu/Repos/software/RIOT/master/dist/tools/openocd/openocd.sh flash /home/maribu/Repos/software/RIOT/master/tests/core/mutex_unlock_and_sleep/bin/nucleo-f767zi/tests_mutex_unlock_and_sleep.elf
### Flashing Target ###
Open On-Chip Debugger 0.12.0+dev-snapshot (2024-09-30-11:15)
Licensed under GNU GPL v2
For bug reports, read
	http://openocd.org/doc/doxygen/bugs.html
DEPRECATED! use 'adapter serial' not 'hla_serial'
hla_swd
Info : The selected transport took over low-level target control. The results might differ compared to plain JTAG/SWD
srst_only separate srst_nogate srst_open_drain connect_assert_srst
Info : clock speed 2000 kHz
Info : STLINK V2J29M18 (API v2) VID:PID 0483:374B
Info : Target voltage: 3.234166
Info : [stm32f7x.cpu] Cortex-M7 r1p0 processor detected
Info : [stm32f7x.cpu] target has 8 breakpoints, 4 watchpoints
Info : [stm32f7x.cpu] Examination succeed
Info : starting gdb server for stm32f7x.cpu on 0
Info : Listening on port 35587 for gdb connections
    TargetName         Type       Endian TapName            State       
--  ------------------ ---------- ------ ------------------ ------------
 0* stm32f7x.cpu       hla_target little stm32f7x.cpu       unknown
Info : Unable to match requested speed 2000 kHz, using 1800 kHz
Info : Unable to match requested speed 2000 kHz, using 1800 kHz
[stm32f7x.cpu] halted due to debug-request, current mode: Thread 
xPSR: 0x01000000 pc: 0x08000a34 msp: 0x20000200
Info : device id = 0x10016451
Info : flash size = 2048 KiB
Info : Single Bank 2048 kiB STM32F76x/77x found
auto erase enabled
wrote 32768 bytes from file /home/maribu/Repos/software/RIOT/master/tests/core/mutex_unlock_and_sleep/bin/nucleo-f767zi/tests_mutex_unlock_and_sleep.elf in 0.733191s (43.645 KiB/s)
verified 11068 bytes in 0.103960s (103.969 KiB/s)
Info : Unable to match requested speed 2000 kHz, using 1800 kHz
Info : Unable to match requested speed 2000 kHz, using 1800 kHz
shutdown command invoked
Done flashing
r
/home/maribu/Repos/software/RIOT/master/dist/tools/pyterm/pyterm -p "/dev/ttyACM0" -b "115200" -ln "/tmp/pyterm-maribu" -rn "2024-10-05_21.57.54-tests_mutex_unlock_and_sleep-nucleo-f767zi" --no-reconnect --noprefix --no-repeat-command-on-empty-line 
Twisted not available, please install it if you want to use pyterm's JSON capabilities
Connect to serial port /dev/ttyACM0
Welcome to pyterm!
Type '/exit' to exit.
READY
s
START
main(): This is RIOT! (Version: 2024.10-devel-254-g13188)
[ALIVE] alternated 10k times.
[ALIVE] alternated 20k times.
[ALIVE] alternated 30k times.
[ALIVE] alternated 40k times.
[ALIVE] alternated 50k times.
[ALIVE] alternated 60k times.
[ALIVE] alternated 70k times.
[ALIVE] alternated 80k times.
[ALIVE] alternated 90k times.
[ALIVE] alternated 100k times.

make: Leaving directory '/home/maribu/Repos/software/RIOT/master/tests/core/mutex_unlock_and_sleep'

Issues/PRs references

This fixes #20812

Alternative to #20878

Using `sched_switch()` in `mutex_unlock()` can result in crashes when `mutex_unlock()` is called from IRQ context. This however is a common pattern in RIOT to wake up a thread from IRQ. The reason for the crash is that `sched_switch()` assumes `thread_get_active()` to always return a non-`NULL` value. But when thread-less idle is used, no thread may be active after the last runnable thread exited. Using `thread_yield_higher()` instead solves the issue, as `thread_yield_higher()` is safe to call from IRQ context without an active thread. This fixes RIOT-OS#20812

riot-ci · 2024-10-05T20:28:42Z

Murdock results

✔️ PASSED

1d99f4f core/mutex: use thread_yield_higher() in mutex_unlock()

Success	Failures	Total	Runtime
10197	0	10197	19m:58s

Artifacts

Documentation preview

benpicco · 2024-10-07T08:26:07Z

With this patch tests/bench/mutex_pingpong is also faster, tested on same54-xpro:

master

"result" : 215440, "ticks" : 556

this PR

"result" : 223048, "ticks" : 538

maribu · 2024-10-07T10:08:47Z

That is expected: sched_switch() ends up calling thread_yield_higher() in this case anyway, so it only adds overhead compared to calling thread_yield_higher() directly.

The use of sched_switch() is only beneficial when when a high prio thread blocks while holding a mutex, a low prio thread adds itself to the waiter list of said mutex, the high prio thrrad resumes and releases the mutex. That is super niche IMO.

benpicco · 2024-10-07T10:57:34Z

The use of sched_switch() is only beneficial when when a high prio thread blocks while holding a mutex, a low prio thread adds itself to the waiter list of said mutex, the high prio thrrad resumes and releases the mutex

IMHO that's even wrong - if the high prio thread is running, calling mutex_unlock() should not cause it to yield to a lower priority thread.

Only yielding to higher priority threads is much more sound.

maribu · 2024-10-07T11:08:06Z

IMHO that's even wrong

Neither sched_switch() nor thread_yield_higher () will yield in favor of a lower prio thread. sched_switch() will be faster when not actually yielding.

The hot path is no waiter with a mutex anyway, and that is actually optimised anyway (with no call to either). So this would be net positive from a performance PoV IMO.

maribu added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Area: core Area: RIOT kernel. Handle PRs marked with this with care! CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Oct 5, 2024

maribu requested a review from kaspar030 as a code owner October 5, 2024 20:08

maribu mentioned this pull request Oct 5, 2024

core/sched_switch: fix crash with no active thread #20878

Open

benpicco approved these changes Oct 7, 2024

View reviewed changes

benpicco added this pull request to the merge queue Oct 7, 2024

Merged via the queue into RIOT-OS:master with commit 89d3370 Oct 7, 2024
29 checks passed

maribu deleted the core/mutex/thread_yield branch October 10, 2024 08:25

benpicco added this to the Release 2024.10 milestone Nov 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

core/mutex: use thread_yield_higher() in mutex_unlock() #20890

core/mutex: use thread_yield_higher() in mutex_unlock() #20890

core/mutex: use thread_yield_higher() in mutex_unlock() #20890

core/mutex: use thread_yield_higher() in mutex_unlock() #20890

Conversation

Contribution description

Testing procedure

Issues/PRs references

Murdock results

Artifacts

master

this PR