8000 cpu/native: fix bug in periph_timer by maribu · Pull Request #20009 · RIOT-OS/RIOT · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

cpu/native: fix bug in periph_timer #20009

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Nov 3, 2023

Conversation

maribu
Copy link
Member
@maribu maribu commented Oct 23, 2023

Contribution description

While debugging #18977 (comment) it became obvious that the periph_timer in native is broken and issues early IRQs. This replaces the use of setitimer that cannot use a monotonic clock source with timer_settime().

Testing procedure

I have some non-publishable code that tests if the time an ISR fires in terms of timer_read() is no earlier than the time expected. This occasionally triggered with master, but I didn't see any of these issues anymore with this PR. I guess I should revive my PR to spice up the periph timer tests and add a polished version of this and let this run for an hour or two.

The tests ins tests/periph/timer* should still succeed on native. (They do for me in a container running riot/riotbuild).

Issues/PRs references

Found while debugging #18977 (comment)

@maribu maribu requested a review from benpicco October 23, 2023 11:19
@github-actions github-actions bot added Platform: native Platform: This PR/issue effects the native platform Area: cpu Area: CPU/MCU ports labels Oct 23, 2023
@Wer-Wolf
Copy link
Contributor

Could this be because we use ITIMER_REAL as clock source, which is affected by adjtime(2) and settimeofday(2)?
Maybe this problem could be solved by using timer_create() with CLOCK_MONOTONIC.

@MrKevinWeiss
Copy link
Contributor

Should this be backported? Probably not a bad idea.

@maribu maribu force-pushed the native_timer_bug_work_around branch from fd90d58 to 96eef23 Compare October 26, 2023 09:08
@maribu maribu changed the title cpu/native: work around bug in periph_timer cpu/native: fix bug in periph_timer Oct 26, 2023
@maribu maribu force-pushed the native_timer_bug_work_around branch 2 times, most recently from eabef72 to fd1fd2c Compare October 26, 2023 09:11
@maribu
Copy link
Member Author
maribu commented Oct 26, 2023

Could this be because we use ITIMER_REAL as clock source, which is affected by adjtime(2) and settimeofday(2)? Maybe this problem could be solved by using timer_create() with CLOCK_MONOTONIC.

Yes, indeed! This was actually also my first intuition. But rather than reading the code I did a quick grep for CLOCK_[A-Z]*, which only showed matches for CLOCK_MONOTONIC and none CLOCK_REALTIME, so I assumed this wasn't the issue. Sometime actually reading the code containing the bug does pay off 😅

@benpicco
Copy link
Contributor

Hm I still get

>>> timeout happened 9739 µs early <<<
>>> timeout happened 9623 µs early <<<
>>> timeout happened 2156 µs early <<<
>>> timeout happened 3265 µs early <<<
>>> timeout happened 9689 µs early <<<
>>> timeout happened 9795 µs early <<<
>>> timeout happened 2262 µs early <<<
>>> timeout happened 9846 µs early <<<
>>> timeout happened 9796 µs early <<<
>>> timeout happened 9647 µs early <<<
>>> timeout happened 4028 µs early <<<
>>> timeout happened 9670 µs early <<<
>>> timeout happened 4650 µs early <<<
>>> timeout happened 9687 µs early <<<
>>> timeout happened 9769 µs early <<<

with this (and the patches from #18977) applied

@Wer-Wolf
Copy link
Contributor

Maybe the fact that timer_start()/timer_stop() do not stop the value returned by timer_read() from incrementing could also be the reason for this.

@maribu
Copy link
Member Author
maribu commented Oct 26, 2023

@benpicco: That's expected. Most of the time that bug triggered it was not due to the periph_timer bug. But it should be a tad less frequent now, with one of the causes identified and fixed.

@benpicco benpicco added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Oct 27, 2023
@benpicco
Copy link
Contributor
benpicco commented Oct 27, 2023

Urgh so are you suggesting there is (at least) a third bug (besides this one and #18977) at play here that makes timeouts unreliable 😩

Please squash btw

Copy link
Contributor
@benpicco benpicco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trusting our timer unittests

@maribu
Copy link
Member Author
maribu commented Oct 27, 2023

Urgh so are you suggesting there is (at least) a third bug

Yes :) But on the other hand 2/3 of the issue is hopefully solved. (Well, the third bug is the one with the most fallout, though).

@maribu maribu force-pushed the native_timer_bug_work_around branch from 5b2d9b2 to f9f0c27 Compare October 27, 2023 12:35
@riot-ci
Copy link
riot-ci commented Oct 27, 2023

Murdock results

✔️ PASSED

50b841e cpu/native: drop unused real_setitimer

Success Failures Total Runtime
7953 0 7953 17m:12s

Artifacts

@MrKevinWeiss MrKevinWeiss added the Process: needs backport Integration Process: The PR is required to be backported to a release or feature branch label Nov 1, 2023
@MrKevinWeiss
Copy link
Contributor

Do we think this can get in the next little while or should we make it a known issue for this release?

@maribu
Copy link
Member Author
maribu commented Nov 2, 2023

Murdock found quite the bug. I really should give more priority my rewrite of the periph_timer test app higher on my todo, given that the current one was unable to detect the completely broken timer_stop() implementation 😱

Also use `CLOCK_MONOTONIC` for the timeouts, not just for
`timer_read()`. This fixes mismatches between when a timeout
occurs and what is expected in the context of the values returned by
`timer_read()`.
@maribu maribu force-pushed the native_timer_bug_work_around branch from f9f0c27 to 50b841e Compare November 2, 2023 13:13
@maribu maribu added the CI: ready for merge train 🚃 PR is ready to be merged and awaiting the next merge train label Nov 2, 2023
@MrKevinWeiss
Copy link
Contributor

bors merge

Copy link
Contributor
bors bot commented Nov 3, 2023

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

@bors bors bot merged commit 4250c15 into RIOT-OS:master Nov 3, 2023
@maribu maribu deleted the native_timer_bug_work_around branch December 5, 2023 08:21
@chrysn
Copy link
Member
chrysn commented Aug 17, 2024

A brief note from a workshop, this broke building on Debian 11.

maribu added a commit to maribu/RIOT that referenced this pull request Aug 17, 2024
maribu added a commit to maribu/RIOT that referenced this pull request Aug 18, 2024
ant9000 pushed a commit to ant9000/RIOT that referenced this pull request Aug 23, 2024
dprigoshij pushed a commit to dprigoshij/RIOT that referenced this pull request Mar 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: cpu Area: CPU/MCU ports CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR CI: ready for merge train 🚃 PR is ready to be merged and awaiting the next merge train Platform: native Platform: This PR/issue effects the native platform Process: needs backport Integration Process: The PR is required to be backported to a release or feature branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants
0