8000 core/clist: use insertion sort by maribu · Pull Request #21398 · RIOT-OS/RIOT · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

core/clist: use insertion sort #21398

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

maribu
Copy link
Member
@maribu maribu commented Apr 11, 2025

Contribution description

This replaces the previous clist sort implementation, which used a merge sort, with a simple insertion sort implementation.

The motivation is as follow:

  1. The insertion sort is smaller (code size)
  2. The performance benefit of the merge sort would only become noticeable on data sets that would be to large to comfortably handle on the MCUs RIOT targets
  3. The previous implementation looked like a good candidate for the IOCCC, while the insertion sort is trivial to understand
  4. Static analysis could provide for two statements an example control flow path that would trigger a NULL pointer dereferencation. The current code has no such defects.

Testing procedure

  1. There is a unit test that should still pass.
  2. Adding -fanalyzer should now be happy with the code
  3. The code should now be understandable without spending hours on a white board

Issues/PRs references

None

@maribu maribu added Area: core Area: RIOT kernel. Handle PRs marked with this with care! Type: cleanup The issue proposes a clean-up / The PR cleans-up parts of the codebase / documentation CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Process: needs >1 ACK Integration Process: This PR requires more than one ACK labels Apr 11, 2025
@maribu maribu requested a review from kaspar030 as a code owner April 11, 2025 19:06
@github-actions github-actions bot added the Process: missing approvals Integration Process: PR needs more ACKS (handled by action) label Apr 11, 2025
@maribu maribu force-pushed the core/clist_sort_insertion_sort branch 2 times, most recently from 99199f5 to cd93aa7 Compare April 11, 2025 19:12
@maribu maribu requested a review from miri64 as a code owner April 11, 2025 19:20
@github-actions github-actions bot added the Area: tests Area: tests and testing framework label Apr 11, 2025
@maribu
Copy link
Member Author
maribu commented Apr 11, 2025

I also improved the unit test a little bit to cover a few edge cases and to verify that the sorted list is actually sorted.

@riot-ci
Copy link
riot-ci commented Apr 11, 2025

Murdock results

FAILED

66f4a66 fixup! core/clist: use insertion sort

Success Failures Total Runtime
5943 0 9842 05m:49s

Artifacts

@maribu maribu force-pushed the core/clist_sort_insertion_sort branch from f8a1c38 to 689b4cd Compare April 12, 2025 09:00
@maribu maribu added State: waiting for other PR State: The PR requires another PR to be merged first and removed Area: tests Area: tests and testing framework labels Apr 12, 2025
@maribu maribu marked this pull request as draft April 12, 2025 09:01
This adds a benchmark for clist_sort() and calls that on three different
list sizes.
@maribu maribu force-pushed the core/clist_sort_insertion_sort branch from 689b4cd to 9842631 Compare April 12, 2025 09:34
@github-actions github-actions bot added the Area: tests Area: tests and testing framework label Apr 12, 2025
@maribu
Copy link
Member Author
maribu commented Apr 12, 2025

maribu added 2 commits April 12, 2025 21:15
Add a few cycles before starting the stopwatch for fancy CPUs that do
have a cache and branch predictor that may warm up (e.g. Cortex M7).
@maribu maribu force-pushed the core/clist_sort_insertion_sort branch from 9842631 to c33cb94 Compare April 12, 2025 19:31
@github-actions github-actions bot added the Area: sys Area: System label Apr 12, 2025
@maribu
Copy link
Member Author
maribu commented Apr 12, 2025

Some benchmarks:

native64

master (merge sort, IOCCC variant)

main(): This is RIOT! (Version: 2025.04-devel-520-g98b01-tests/bench/runtime_coreapis)
Runtime of Selected Core API functions

                 nop loop:       430us  ---   0.000us per call  ---  2325581395 calls per sec

             mutex_init():       893us  ---   0.000us per call  ---  1119820828 calls per sec
        mutex lock/unlock:    909438us  ---   0.909us per call  ---    1099580 calls per sec

       thread_flags_set():    319468us  ---   0.319us per call  ---    3130203 calls per sec
     thread_flags_clear():    325462us  ---   0.325us per call  ---    3072555 calls per sec
thread flags set/wait any:    978442us  ---   0.978us per call  ---    1022032 calls per sec
thread flags set/wait all:    972636us  ---   0.972us per call  ---    1028133 calls per sec
thread flags set/wait one:    979552us  ---   0.979us per call  ---    1020874 calls per sec

        msg_try_receive():    334414us  ---   0.334us per call  ---    2990305 calls per sec
              msg_avail():      4767us  ---   0.004us per call  ---  209775540 calls per sec

   clist_sort (#4, worst):     19603us  ---   0.019us per call  ---   51012600 calls per sec
     clist_sort (#4, avg):     20029us  ---   0.020us per call  ---   49927604 calls per sec
   clist_sort (#8, worst):     47127us  ---   0.047us per call  ---   21219258 calls per sec
     clist_sort (#8, avg):     49394us  ---   0.049us per call  ---   20245373 calls per sec
  clist_sort (#16, worst):    119396us  ---   0.119us per call  ---    8375489 calls per sec
    clist_sort (#16, avg):    115471us  ---   0.115us per call  ---    8660183 calls per sec
  clist_sort (#32, worst):    278381us  ---   0.278us per call  ---    3592199 calls per sec
    clist_sort (#32, avg):    312818us  ---   0.312us per call  ---    3196746 calls per sec
  clist_sort (#64, worst):    636261us  ---   0.636us per call  ---    1571682 calls per sec
    clist_sort (#64, avg):    780777us  ---   0.780us per call  ---    1280775 calls per sec
 clist_sort (#128, worst):   1424094us  ---   1.424us per call  ---     702200 calls per sec
   clist_sort (#128, avg):   1874443us  ---   1.874us per call  ---     533491 calls per sec
 clist_sort (#256, worst):   3235114us  ---   3.235us per call  ---     309108 calls per sec
   clist_sort (#256, avg):   4239094us  ---   4.239us per call  ---     235899 calls per sec

[SUCCESS]
{ "threads": [{ "name": "idle", "stack_size": 16384, "stack_used": 1080 }]}
{ "threads": [{ "name": "main", "stack_size": 20480, "stack_used": 4760 }]}

This PR (simple insertion sort)

main(): This is RIOT! (Version: 2025.04-devel-522-gc33cb94-core/clist_sort_insertion_sort)
Runtime of Selected Core API functions

                 nop loop:       593us  ---   0.000us per call  ---  1686340640 calls per sec

             mutex_init():      1268us  ---   0.001us per call  ---  788643533 calls per sec
        mutex lock/unlock:   1038304us  ---   1.038us per call  ---     963109 calls per sec

       thread_flags_set():    327235us  ---   0.327us per call  ---    3055907 calls per sec
     thread_flags_clear():    331057us  ---   0.331us per call  ---    3020627 calls per sec
thread flags set/wait any:    979440us  ---   0.979us per call  ---    1020991 calls per sec
thread flags set/wait all:    976936us  ---   0.976us per call  ---    1023608 calls per sec
thread flags set/wait one:    979527us  ---   0.979us per call  ---    1020900 calls per sec

        msg_try_receive():    330618us  ---   0.330us per call  ---    3024638 calls per sec
              msg_avail():      2910us  ---   0.002us per call  ---  343642611 calls per sec

   clist_sort (#4, worst):     16905us  ---   0.016us per call  ---   59154096 calls per sec
     clist_sort (#4, avg):     15056us  ---   0.015us per call  ---   66418703 calls per sec
   clist_sort (#8, worst):     33741us  ---   0.033us per call  ---   29637532 calls per sec
     clist_sort (#8, avg):     22386us  ---   0.022us per call  ---   44670776 calls per sec
  clist_sort (#16, worst):     73995us  ---   0.073us per call  ---   13514426 calls per sec
    clist_sort (#16, avg):     56454us  ---   0.056us per call  ---   17713536 calls per sec
  clist_sort (#32, worst):    194442us  ---   0.194us per call  ---    5142921 calls per sec
    clist_sort (#32, avg):    106041us  ---   0.106us per call  ---    9430314 calls per sec
  clist_sort (#64, worst):    394818us  ---   0.394us per call  ---    2532812 calls per sec
    clist_sort (#64, avg):    253255us  ---   0.253us per call  ---    3948589 calls per sec
 clist_sort (#128, worst):    753123us  ---   0.753us per call  ---    1327804 calls per sec
   clist_sort (#128, avg):    601839us  ---   0.601us per call  ---    1661573 calls per sec
 clist_sort (#256, worst):   1514474us  ---   1.514us per call  ---     660295 calls per sec
   clist_sort (#256, avg):   1330050us  ---   1.330us per call  ---     751851 calls per sec

[SUCCESS]
{ "threads": [{ "name": "idle", "stack_size": 16384, "stack_used": 1080 }]}
{ "threads": [{ "name": "main", "stack_size": 20480, "stack_used": 4760 }]}

same54-xpro

master (merge sort, IOCCC variant)

main(): This is RIOT! (Version: 2025.04-devel-520-g98b01-tests/bench/runtime_coreapis)
Runtime of Selected Core API functions

                 nop loop:     33335us  ---   0.033us per call  ---   29998500 calls per sec

             mutex_init():         2us  ---   0.000us per call  ---  1783793664 calls per sec
        mutex lock/unlock:    541669us  ---   0.541us per call  ---    1846145 calls per sec

       thread_flags_set():    275002us  ---   0.275us per call  ---    3636337 calls per sec
     thread_flags_clear():    266669us  ---   0.266us per call  ---    3749967 calls per sec
thread flags set/wait any:    941669us  ---   0.941us per call  ---    1061944 calls per sec
thread flags set/wait all:    800002us  ---   0.800us per call  ---    1249996 calls per sec
thread flags set/wait one:    933335us  ---   0.933us per call  ---    1071426 calls per sec

        msg_try_receive():    416669us  ---   0.416us per call  ---    2399986 calls per sec
              msg_avail():    133335us  ---   0.133us per call  ---    7499906 calls per sec

   clist_sort (#4, worst):      4127us  ---   4.127us per call  ---     242306 calls per sec
     clist_sort (#4, avg):      4202us  ---   4.202us per call  ---     237981 calls per sec
   clist_sort (#8, worst):     10352us  ---  10.352us per call  ---      96599 calls per sec
     clist_sort (#8, avg):     10358us  ---  
8000
10.358us per call  ---      96543 calls per sec
  clist_sort (#16, worst):     25110us  ---  25.110us per call  ---      39824 calls per sec
    clist_sort (#16, avg):     25002us  ---  25.002us per call  ---      39996 calls per sec
  clist_sort (#32, worst):     59235us  ---  59.235us per call  ---      16881 calls per sec
    clist_sort (#32, avg):     62160us  ---  62.160us per call  ---      16087 calls per sec
  clist_sort (#64, worst):    136694us  ---  136.694us per call  ---       7315 calls per sec
    clist_sort (#64, avg):    147785us  ---  147.785us per call  ---       6766 calls per sec
 clist_sort (#128, worst):    310018us  ---  310.018us per call  ---       3225 calls per sec
   clist_sort (#128, avg):    341994us  ---  341.994us per call  ---       2924 calls per sec
 clist_sort (#256, worst):    693477us  ---  693.477us per call  ---       1442 calls per sec
   clist_sort (#256, avg):    777652us  ---  777.652us per call  ---       1285 calls per sec

[SUCCESS]

This PR (simple insertion sort)

main(): This is RIOT! (Version: 2025.04-devel-522-gc33cb94-core/clist_sort_insertion_sort)
Runtime of Selected Core API functions

                 nop loop:     33335us  ---   0.033us per call  ---   29998500 calls per sec

             mutex_init():         2us  ---   0.000us per call  ---  1783793664 calls per sec
        mutex lock/unlock:    541669us  ---   0.541us per call  ---    1846145 calls per sec

       thread_flags_set():    275001us  ---   0.275us per call  ---    3636350 calls per sec
     thread_flags_clear():    266668us  ---   0.266us per call  ---    3749981 calls per sec
thread flags set/wait any:    941669us  ---   0.941us per call  ---    1061944 calls per sec
thread flags set/wait all:    800002us  ---   0.800us per call  ---    1249996 calls per sec
thread flags set/wait one:    933336us  ---   0.933us per call  ---    1071425 calls per sec

        msg_try_receive():    416668us  ---   0.416us per call  ---    2399992 calls per sec
              msg_avail():    133335us  ---   0.133us per call  ---    7499906 calls per sec

   clist_sort (#4, worst):      2252us  ---   2.252us per call  ---     444049 calls per sec
     clist_sort (#4, avg):      2443us  ---   2.443us per call  ---     409332 calls per sec
   clist_sort (#8, worst):      4852us  ---   4.852us per call  ---     206100 calls per sec
     clist_sort (#8, avg):      3918us  ---   3.918us per call  ---     255232 calls per sec
  clist_sort (#16, worst):     10052us  ---  10.052us per call  ---      99482 calls per sec
    clist_sort (#16, avg):      8485us  ---   8.485us per call  ---     117855 calls per sec
  clist_sort (#32, worst):     20452us  ---  20.452us per call  ---      48894 calls per sec
    clist_sort (#32, avg):     15069us  ---  15.069us per call  ---      66361 calls per sec
  clist_sort (#64, worst):     41252us  ---  41.252us per call  ---      24241 calls per sec
    clist_sort (#64, avg):     31985us  ---  31.985us per call  ---      31264 calls per sec
 clist_sort (#128, worst):     82852us  ---  82.852us per call  ---      12069 calls per sec
   clist_sort (#128, avg):     68819us  ---  68.819us per call  ---      14530 calls per sec
 clist_sort (#256, worst):    166052us  ---  166.052us per call  ---       6022 calls per sec
   clist_sort (#256, avg):    140085us  ---  140.085us per call  ---       7138 calls per sec

[SUCCESS]

/* This is the worst case for most sorting algorithms: having the
* list perfectly sorted but in inverse direction. For real time, we
* only care about worst case and not about best case / average case */
n->value = val--;
Copy link
Contributor
@kaspar030 kaspar030 Apr 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the worst case for this insertion sort implementation, as using val++ makes sorting much slower.

Suggested change
n->value = val--;
n->value = val++;

E.g., on nrf52840dk, for n=256:

before:

2025-04-12 23:18:21,904 #  clist_sort (#256, worst):    311423us  ---  311.423us per call  ---       3211 calls per sec

after:

2025-04-12 23:19:11,954 #  clist_sort (#256, worst):   9233345us  ---  9233.345us per call  ---        108 calls per sec

That's a factor 30 slower than the claimed "worst case".

The cut-off point where this insertion sort becomes slower than the previous implementation (which does not come from IOCCC btw) is below n=32.

@kaspar030
Copy link
Contributor

I'm all for a fixed clist sort, but the benchmark seems misleading, the worst case is actually not the worst case for this insertion sort.

The performance benefit of the merge sort would only become noticeable on data sets that would be to large to comfortably handle on the MCUs RIOT targets

With the "fixed worst case", it becomes noticable with less than 32 list entries. At 32 list entries, the old merge sort is already 50% faster then this insertion sort (worst case). The nrf's memory fits 1000 times 32 list entries.

With 16384 list entries, merge sort can sort the list five times per second. (insertion sort is still running and will take a while, I will update this post when it is done.). This might be a size that is unlikely to come up on our little devices.

This insertion sort is still much faster in the "avg case" of this benchmark. (Which it shouldn't be actually. 🙂)

maribu added 3 commits April 13, 2025 11:58
- Iterate of different lengths of unsorted data to also cover corner case
  one-item list
- Actually check that the list is sorted afterwards (and not just that
  the maximum got sorted in last)
- Actually check that the list was stable-sorted (as the API doc
  promises a stable sort)
This replaces the previous clist sort implementation, which used a merge
sort, with a simple insertion sort implementation.

The motivation is as follow:

1. The insertion sort is smaller (code size)
2. The performance benefit of the merge sort would only become noticeable
   on data sets that would be to large to comfortably handle on the MCUs
   RIOT targets
3. The previous implementation looked like a good candidate for the
   IOCCC, while the insertion sort is trivial to understand
4. Static analysis could provide for two statements an example control
   flow path that would trigger a `NULL` pointer dereferencation.
   The current code has no such defects.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: core Area: RIOT kernel. Handle PRs marked with this with care! Area: sys Area: System Area: tests Area: tests and testing framework CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Process: missing approvals Integration Process: PR needs more ACKS (handled by action) Process: needs >1 ACK Integration Process: This PR requires more than one ACK State: waiting for other PR State: The PR requires another PR to be merged first Type: cleanup The issue proposes a clean-up / The PR cleans-up parts of the codebase / documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0