-
Notifications
You must be signed in to change notification settings - Fork 226
Add per-event multithreaded rendering #636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
bfcfbbb
to
053f80e
Compare
aecd930
to
0e138c3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review: I’ve looked at all the commits before pthreads. They look good to me sans a few nits. 👍 That said, because I haven’t reviewed the threading commits yet, I can’t vouch that the set of things moved inside RenderContext
is complete and minimal.
One commit message, “ass_render: take BitmapContext* in render_text()”, mistakenly refers to “BitmapContext” instead of RenderContext.
By the way: if we’re gonna use stdatomic.h
, why not use C11 threads and mutexes as well? Is it that stdatomic.h is easier to polyfill provide our own fallback for? For reference, Visual C++/UCRT don’t have any of this yet, but it supposedly is all “on the roadmap”.
C11 threads aren't supported on Darwin/macOS yet, and introduce a dependency on a relatively recent libc on Linux. Pthreads and atomics are both supported everywhere relevant (including Windows via MinGW). If we really cared about threading in win32 builds that don't bring their own winpthread wrapper, we could provide our own header-only polyfill just for that, and if we really cared about threading in MSVC builds (I still don't get why anyone would want to do this…), we could polyfill that as well. I don't think any of that needs to block this as long as MinGW builds against an external pthread package work, though. |
@TheOneric There should be no need for -pthreads in CFLAGS or when linking against shared libass; no pthread structs or symbols are part of the public API or ABI as of this PR. So your patch looks reasonable, and I'll experiment with it a bit more soon. @astiob, I've also fixed the commit message you mentioned. More generally: this has improved overall performance on every case I've tried so far (though I haven't tested any that only ever have a single event per frame; those obviously won't see any benefit, but we should at least confirm the added overhead isn't too severe. If so, it might be worth falling back to the single-thread path when only one event is selected.). However, the improvement hasn't been as large as I'd have liked in some cases, and the added CPU usage from mutex overhead in the caches is sometimes substantial, so it's probably worth experimenting a bit more to see if we can reduce that pre-merge. We might also want to tune the thread count a little more carefully (like, maybe we only benefit from using up to some particular number? Are there specific tweaks worth making for big.LITTLE systems, like core-affinity pinning?). What do we think an API for setting thread count should look like? We'd either have to always spawn [NCORES] at renderer-setup and then potentially rejoin+restart them when a count change is requested, or lazy-spawn the threads only when a render call is made. |
As this does make things faster when there is one event per frame I think you have to ensure that that case is not much slower. |
cc1e523
to
66cf76c
Compare
I just pushed a commit with some cache experiments that should make most hits lock-free (and misses lower-overhead). It reduces the amount of time we spend in mutex calls dramatically, but still needs some more cleanup (in particular, it currently doesn't build when pthreads are disabled). |
41440fe
to
bb31b71
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I didn't miss something, there's no special handling for external function calls (only the message callback?) from threads. Is your intent to change the API to require callbacks to be thread-safe in the future?
(Still not a complete review)
4e998f3
to
df891dc
Compare
Items retrieved from cache are always kept until at least the end of the frame, so there's no need to explicitly ref them when they're only used intra-frame. Instead, explicitly ref items when we'll be keeping them around for longer (e.g. as part of another cache item's key). Note that ass_shaper already depended on cache items never being released mid-frame prior to this change. This will save massively on atomic contention later.
Versions prior to this were not thread-safe by default.
This is automatically set by AM_PROG_CC if needed, and potentially to a newer version (e.g. C11, which we need for atomics).
This will later be used to shard promotion and locking per-thread
Closing this out to open a new PR with the current cache design, as the thread here has gotten so long as to be impractical. |
Long-awaited successor to #107.
TODO:
CONFIG_PTHREAD
on<stdatomic.h>
availabilityAnd of course, this'll need some extensive testing (I've put it through some basics under asan/tsan, but I'm sure other folks have other ideas; stuff with a lot of weird fonts is likely to be tricky).
Note that this PR can be merged piecemeal; any initial series of commits will work fine without the rest. In particular, we could merge the portion prior to
configure: detect pthreads
ahead of the last few commits that actually add pthread-specific code.