core/clist: use insertion sort #21398
Conversation
I also improved the unit test a little bit to cover a few edge cases and to verify that the sorted list is actually sorted.
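A sortedness-plus-stability check along those lines can be sketched as follows (the `node_t` struct with a `seq` field is a hypothetical stand-in for RIOT's actual clist nodes, used here only to illustrate the two checks):

```c
#include <stddef.h>

/* Hypothetical list node: `key` is the sort key, `seq` records the
 * original insertion order so stability can be verified. */
typedef struct node {
    struct node *next;
    int key;
    unsigned seq;
} node_t;

/* Returns 1 iff the list is sorted by key AND equal keys kept their
 * original relative order (i.e. the sort was stable). */
static int is_sorted_and_stable(const node_t *head)
{
    for (; head && head->next; head = head->next) {
        if (head->key > head->next->key) {
            return 0;   /* not sorted */
        }
        if (head->key == head->next->key && head->seq > head->next->seq) {
            return 0;   /* equal keys were reordered: not stable */
        }
    }
    return 1;           /* empty and one-item lists count as sorted */
}
```

Checking only that the maximum ended up last would miss both a mis-ordered middle and a stability violation; walking every adjacent pair catches both.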
This adds a benchmark for clist_sort() and calls that on three different list sizes.
Add a few warm-up cycles before starting the stopwatch, for fancy CPUs that have a cache and branch predictor that may need to warm up (e.g. Cortex-M7).
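The warm-up idea can be sketched generically like this (a self-contained illustration, not the PR's actual benchmark code; `clock()` and the placeholder `work_under_test()` stand in for RIOT's stopwatch and for `clist_sort()`):

```c
#include <time.h>

static volatile unsigned sink;

/* Placeholder for the function being benchmarked. */
static void work_under_test(void)
{
    for (unsigned i = 0; i < 1000; i++) {
        sink += i;
    }
}

/* Run `calls` timed iterations, preceded by a few untimed warm-up
 * calls so caches and the branch predictor settle before measuring.
 * Returns the average time per call in microseconds. */
static double bench_us_per_call(unsigned calls)
{
    for (unsigned i = 0; i < 8; i++) {
        work_under_test();          /* warm-up, not measured */
    }

    clock_t start = clock();
    for (unsigned i = 0; i < calls; i++) {
        work_under_test();
    }
    clock_t stop = clock();

    return (double)(stop - start) * 1e6 / CLOCKS_PER_SEC / calls;
}
```

Without the warm-up, the first measured iterations run against cold caches and an untrained branch predictor, skewing short benchmarks upward.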
Some benchmarks:
```c
/* This is the worst case for most sorting algorithms: having the
 * list perfectly sorted but in inverse direction. For real time, we
 * only care about worst case and not about best case / average case */
n->value = val--;
```
This is not the worst case for this insertion sort implementation, as using `val++` makes sorting much slower.

Suggested change:
```diff
-n->value = val--;
+n->value = val++;
```
E.g., on nrf52840dk, for n=256:
before:
2025-04-12 23:18:21,904 # clist_sort (#256, worst): 311423us --- 311.423us per call --- 3211 calls per sec
after:
2025-04-12 23:19:11,954 # clist_sort (#256, worst): 9233345us --- 9233.345us per call --- 108 calls per sec
That's a factor 30 slower than the claimed "worst case".
The cut-off point where this insertion sort becomes slower than the previous implementation (which does not come from IOCCC btw) is below n=32.
I'm all for a fixed clist sort, but the benchmark seems misleading, the worst case is actually not the worst case for this insertion sort.
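The asymmetry is easy to reproduce with a plain head-scanning insertion sort on a singly linked list (a self-contained sketch, not the PR's actual clist code): with ascending input, each new element must be walked past the entire already-sorted part, while with descending input each new element lands at the head after a single comparison.

```c
#include <stddef.h>

typedef struct node {
    struct node *next;
    int key;
} node_t;

/* Insertion sort on a singly linked list: take nodes off the input
 * one by one and insert each before the first strictly greater
 * element of the sorted output (which keeps the sort stable).
 * Returns the number of key comparisons performed. */
static unsigned long insertion_sort(node_t **head)
{
    node_t *sorted = NULL;
    unsigned long cmps = 0;

    while (*head) {
        node_t *n = *head;
        *head = n->next;

        node_t **pos = &sorted;
        while (*pos) {
            cmps++;
            if ((*pos)->key > n->key) {
                break;  /* insert before first strictly greater key */
            }
            pos = &(*pos)->next;
        }
        n->next = *pos;
        *pos = n;
    }
    *head = sorted;
    return cmps;
}
```

For n elements, this variant performs n(n-1)/2 comparisons on ascending input but only n-1 on descending input, matching the observation that `val++` (ascending values) is the slow case here while the reverse-sorted list is actually the cheap one.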
With the "fixed worst case", it becomes noticeable with less than 32 list entries. At 32 list entries, the old merge sort is already 50% faster than this insertion sort (worst case). The nrf's memory fits 1000 times 32 list entries. With 16384 list entries, merge sort can sort the list five times per second. (Insertion sort is still running and will take a while; I will update this post when it is done.) This might be a size that is unlikely to come up on our little devices. This insertion sort is still much faster in the "avg case" of this benchmark. (Which it shouldn't be, actually. 🙂)
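A back-of-envelope model of that crossover (an illustration with idealized comparison counts, not measured data): worst-case insertion sort needs about n(n-1)/2 comparisons, merge sort about n*log2(n), so the gap widens quickly with n.

```c
/* Rough comparison-count models for the two sorts (illustration only;
 * real run time also depends on constants and memory behavior). */

/* Integer log2 for powers of two, computed by shifting (avoids libm). */
static unsigned log2u(unsigned n)
{
    unsigned l = 0;
    while (n > 1) {
        n >>= 1;
        l++;
    }
    return l;
}

/* Worst-case insertion sort: ~n*(n-1)/2 comparisons. */
static unsigned long insertion_worst(unsigned n)
{
    return (unsigned long)n * (n - 1) / 2;
}

/* Merge sort: ~n*log2(n) comparisons. */
static unsigned long merge_approx(unsigned n)
{
    return (unsigned long)n * log2u(n);
}
```

Under this model, n=32 gives roughly 496 vs 160 comparisons (about 3x), n=256 roughly 32640 vs 2048 (about 16x), and n=16384 roughly 134M vs 229k (about 585x), which is consistent with insertion sort still churning while merge sort manages several sorts per second.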
- Iterate over different lengths of unsorted data, to also cover the corner case of a one-item list
- Actually check that the list is sorted afterwards (and not just that the maximum got sorted in last)
- Actually check that the list was stable-sorted (as the API doc promises a stable sort)
This replaces the previous clist sort implementation, which used a merge sort, with a simple insertion sort implementation. The motivation is as follows:
1. The insertion sort is smaller (code size)
2. The performance benefit of the merge sort would only become noticeable on data sets that would be too large to comfortably handle on the MCUs RIOT targets
3. The previous implementation looked like a good candidate for the IOCCC, while the insertion sort is trivial to understand
4. Static analysis could provide, for two statements, an example control flow path that would trigger a `NULL` pointer dereference. The current code has no such defects.
Contribution description
This replaces the previous clist sort implementation, which used a merge sort, with a simple insertion sort implementation.
The motivation is as follows:
1. The insertion sort is smaller (code size)
2. The performance benefit of the merge sort would only become noticeable on data sets that would be too large to comfortably handle on the MCUs RIOT targets
3. The previous implementation looked like a good candidate for the IOCCC, while the insertion sort is trivial to understand
4. Static analysis could provide, for two statements, an example control flow path that would trigger a `NULL` pointer dereference. The current code has no such defects.

Testing procedure
`-fanalyzer` should now be happy with the code.

Issues/PRs references
None