Add per-event multithreaded rendering #636

rcombs · 2022-07-27T09:50:07Z

Long-awaited successor to #107.

TODO:

Make sure we require a new enough (thread-safe) fribidi
Gate CONFIG_PTHREAD on <stdatomic.h> availability
Implement the pthread functions used for win32 (anybody wanna take this one? this only really uses some fairly basic functions)
Detect CPU count on Win32
Provide an API to set thread count
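
For illustration only (this is not code from the PR): a minimal sketch of how the CPU-count detection and thread-count items above might look. `detect_cpu_count`, `clamp_thread_count` and `SKETCH_MAX_THREADS` are hypothetical names, and treating a non-positive request as "auto" is an assumption rather than anything decided in this thread.

```c
#include <assert.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

enum { SKETCH_MAX_THREADS = 64 };   /* arbitrary cap chosen for this sketch */

/* Hypothetical helper: number of logical CPUs, or 1 if unknown. */
static int detect_cpu_count(void)
{
#ifdef _WIN32
    SYSTEM_INFO info;
    GetSystemInfo(&info);           /* counts CPUs in the current processor group */
    return info.dwNumberOfProcessors > 0 ? (int) info.dwNumberOfProcessors : 1;
#elif defined(_SC_NPROCESSORS_ONLN)
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (int) n : 1;
#else
    return 1;                       /* no portable way to ask: stay single-threaded */
#endif
}

/* Assumption: requested <= 0 means "auto", i.e. one thread per CPU. */
static int clamp_thread_count(int requested, int cpus)
{
    assert(cpus >= 1);
    int n = requested > 0 ? requested : cpus;
    return n < SKETCH_MAX_THREADS ? n : SKETCH_MAX_THREADS;
}
```

A caller would combine the two as `clamp_thread_count(user_value, detect_cpu_count())`.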

And of course, this'll need some extensive testing (I've put it through some basics under asan/tsan, but I'm sure other folks have other ideas; stuff with a lot of weird fonts is likely to be tricky).

Note that this PR can be merged piecemeal; any initial series of commits will work fine without the rest. In particular, we could merge the portion prior to "configure: detect pthreads" ahead of the last few commits that actually add pthread-specific code.

astiob

Partial review: I’ve looked at all the commits before pthreads. They look good to me sans a few nits. 👍 That said, because I haven’t reviewed the threading commits yet, I can’t vouch that the set of things moved inside RenderContext is complete and minimal.

One commit message, “ass_render: take BitmapContext* in render_text()”, mistakenly refers to “BitmapContext” instead of RenderContext.

By the way: if we’re gonna use stdatomic.h, why not use C11 threads and mutexes as well? Is it that stdatomic.h is easier to ~~polyfill~~ provide our own fallback for? For reference, Visual C++/UCRT don’t have any of this yet, but it supposedly is all “on the roadmap”.

rcombs · 2022-07-30T20:02:58Z

C11 threads aren't supported on Darwin/macOS yet, and introduce a dependency on a relatively recent libc on Linux. Pthreads and atomics are both supported everywhere relevant (including Windows via MinGW). If we really cared about threading in win32 builds that don't bring their own winpthread wrapper, we could provide our own header-only polyfill just for that, and if we really cared about threading in MSVC builds (I still don't get why anyone would want to do this…), we could polyfill that as well. I don't think any of that needs to block this as long as MinGW builds against an external pthread package work, though.

rcombs · 2022-07-30T20:22:54Z

@TheOneric There should be no need for -pthreads in CFLAGS or when linking against shared libass; no pthread structs or symbols are part of the public API or ABI as of this PR. So your patch looks reasonable, and I'll experiment with it a bit more soon.

@astiob, I've also fixed the commit message you mentioned.

More generally: this has improved overall performance in every case I've tried so far. I haven't yet tested any case that only ever has a single event per frame; those obviously won't see any benefit, but we should at least confirm the added overhead isn't too severe. If it is, it might be worth falling back to the single-thread path when only one event is selected. That said, the improvement hasn't been as large as I'd have liked in some cases, and the added CPU usage from mutex overhead in the caches is sometimes substantial, so it's probably worth experimenting a bit more to see if we can reduce that pre-merge.
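
As a rough sketch of the single-event fallback idea (not the PR's actual code): the dispatcher below renders serially when there is at most one event or one thread, and otherwise lets a few workers pull event indices from an atomic counter. `Event`, `render_event`, `render_frame` and `SKETCH_MAX_WORKERS` are hypothetical stand-ins, and spawning threads per frame is a simplification compared to a persistent pool.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

enum { SKETCH_MAX_WORKERS = 16 };

typedef struct { int rendered; } Event;          /* stand-in for a real event */

static void render_event(Event *ev) { ev->rendered = 1; }

typedef struct {
    Event *events;
    size_t count;
    atomic_size_t next;                          /* next unclaimed event index */
} FrameJob;

static void *frame_worker(void *arg)
{
    FrameJob *job = arg;
    for (;;) {
        size_t i = atomic_fetch_add(&job->next, 1);
        if (i >= job->count)
            return NULL;
        render_event(&job->events[i]);
    }
}

/* Returns the number of extra threads used (0 on the serial path). */
static int render_frame(Event *events, size_t count, int max_threads)
{
    assert(events || count == 0);
    if (count <= 1 || max_threads <= 1) {        /* fallback: no threading overhead */
        for (size_t i = 0; i < count; i++)
            render_event(&events[i]);
        return 0;
    }

    FrameJob job = { .events = events, .count = count };
    atomic_init(&job.next, 0);

    int want = max_threads < SKETCH_MAX_WORKERS ? max_threads : SKETCH_MAX_WORKERS;
    if ((size_t) want > count)
        want = (int) count;

    pthread_t threads[SKETCH_MAX_WORKERS];
    int started = 0;
    while (started < want - 1 &&
           pthread_create(&threads[started], NULL, frame_worker, &job) == 0)
        started++;

    frame_worker(&job);                          /* the calling thread also works */
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return started;
}
```

Because the calling thread takes part in the work, a failed `pthread_create` only reduces parallelism; every event is still rendered.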

We might also want to tune the thread count a little more carefully (like, maybe we only benefit from using up to some particular number? Are there specific tweaks worth making for big.LITTLE systems, like core-affinity pinning?).

What do we think an API for setting thread count should look like? We'd either have to always spawn [NCORES] at renderer-setup and then potentially rejoin+restart them when a count change is requested, or lazy-spawn the threads only when a render call is made.
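
To make the lazy-spawn option concrete, here is a minimal sketch under stated assumptions: the setter only records the requested count, and threads are joined and restarted at the start of the next render call if the count changed. All names (`Pool`, `pool_set_thread_count`, `pool_prepare_for_render`, `POOL_MAX_THREADS`) are hypothetical, the workers only idle (a real pool would also wait for jobs), and the pool is assumed to be controlled from a single thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

enum { POOL_MAX_THREADS = 64 };

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t wake;
    pthread_t threads[POOL_MAX_THREADS];
    int requested;      /* what the API user asked for */
    int running;        /* how many workers currently exist */
    bool quit;          /* protected by lock */
} Pool;

/* The setter never touches threads; it only records the wish. */
static void pool_set_thread_count(Pool *p, int n)
{
    if (n < 1) n = 1;
    if (n > POOL_MAX_THREADS) n = POOL_MAX_THREADS;
    p->requested = n;
}

static void pool_init(Pool *p, int default_threads)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->wake, NULL);
    p->running = 0;
    p->quit = false;
    pool_set_thread_count(p, default_threads);
}

static void *pool_worker(void *arg)
{
    Pool *p = arg;
    pthread_mutex_lock(&p->lock);
    while (!p->quit)                 /* a real worker would also check for jobs */
        pthread_cond_wait(&p->wake, &p->lock);
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

static void pool_stop(Pool *p)
{
    pthread_mutex_lock(&p->lock);
    p->quit = true;
    pthread_cond_broadcast(&p->wake);
    pthread_mutex_unlock(&p->lock);
    for (int i = 0; i < p->running; i++)
        pthread_join(p->threads[i], NULL);
    p->running = 0;
    p->quit = false;                 /* safe: every worker has been joined */
}

/* Called at the start of a render call: (re)spawn only if the count changed. */
static int pool_prepare_for_render(Pool *p)
{
    assert(p->requested >= 1 && p->requested <= POOL_MAX_THREADS);
    if (p->running != p->requested) {
        pool_stop(p);
        while (p->running < p->requested &&
               pthread_create(&p->threads[p->running], NULL, pool_worker, p) == 0)
            p->running++;
    }
    return p->running;
}
```

With this shape, changing the count repeatedly between frames costs nothing until the next render call, at the price of a one-time spawn delay on that call.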

DanOscarsson · 2022-07-31T07:44:18Z

As this does not make things faster when there is only one event per frame, I think you have to ensure that that case does not become much slower.
I use mpv, where libass is used both for subtitles and for the on-screen display.
For the on-screen display there will often be more than one event per frame, but this case only occurs now and then.
The normal case, subtitles, will mostly have only one event per frame.
The most common use of subtitles (I suspect just below 100 %) is for movies and TV series, which almost always use the .srt format. The same holds for most video players that use libass for subtitle rendering. While .ass is used for anime, the most commonly viewed videos use .srt, and since libass is used by many video players to show subtitles, it is important that libass does not become slower or use more CPU for that type of subtitles.

rcombs · 2022-07-31T09:34:36Z

I just pushed a commit with some cache experiments that should make most hits lock-free (and misses lower-overhead). It reduces the amount of time we spend in mutex calls dramatically, but still needs some more cleanup (in particular, it currently doesn't build when pthreads are disabled).
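
For readers unfamiliar with the general technique, here is a generic sketch of a cache whose hits take no lock. It is not the design from this commit: it is an insert-only hash table where readers walk a bucket after an acquire load and only misses take a mutex. It assumes entries are never freed while readers may be active, and every name in it (`Cache`, `cache_get`, `demo_compute`, ...) is made up for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

enum { BUCKETS = 64 };

typedef struct Node {
    uint32_t key;
    int value;
    struct Node *next;               /* never modified after the node is published */
} Node;

typedef struct {
    _Atomic(Node *) buckets[BUCKETS];
    pthread_mutex_t insert_lock;     /* serializes misses only */
} Cache;

static void cache_init(Cache *c)
{
    for (int i = 0; i < BUCKETS; i++)
        atomic_init(&c->buckets[i], NULL);
    pthread_mutex_init(&c->insert_lock, NULL);
}

static Node *find(Node *n, uint32_t key)
{
    for (; n; n = n->next)
        if (n->key == key)
            return n;
    return NULL;
}

static int cache_get(Cache *c, uint32_t key, int (*compute)(uint32_t))
{
    assert(compute != NULL);
    _Atomic(Node *) *bucket = &c->buckets[key % BUCKETS];

    /* Fast path: a hit needs one acquire load and no lock. */
    Node *hit = find(atomic_load_explicit(bucket, memory_order_acquire), key);
    if (hit)
        return hit->value;

    /* Slow path: re-check under the lock, another thread may have inserted. */
    pthread_mutex_lock(&c->insert_lock);
    Node *head = atomic_load_explicit(bucket, memory_order_relaxed);
    hit = find(head, key);
    int value = hit ? hit->value : compute(key);
    if (!hit && (hit = malloc(sizeof(*hit)))) {
        hit->key = key;
        hit->value = value;
        hit->next = head;
        /* Release store publishes the fully initialized node to readers. */
        atomic_store_explicit(bucket, hit, memory_order_release);
    }
    pthread_mutex_unlock(&c->insert_lock);
    return value;                    /* on malloc failure the value is simply uncached */
}

/* Demo value function that counts how often it runs. */
static int compute_calls;
static int demo_compute(uint32_t key) { compute_calls++; return (int) (key * 2); }
```

Computing the value while holding the lock keeps the sketch short but serializes all misses; a real design would likely want something finer-grained.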

TheOneric

If I didn't miss something, there's no special handling for external function calls (only the message callback?) from threads. Is your intent to change the API to require callbacks to be thread-safe in the future?

(Still not a complete review)
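
One way to keep a non-thread-safe user callback working is to serialize calls to it, which is roughly what the "ass_utils: add locking around log callback" commit title suggests. The sketch below is an illustration, not that commit: `SketchMsgCallback` is a simplified hypothetical signature (not libass's real message callback type), and `SketchLogger`, `logger_emit` and `count_messages` are invented for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Simplified, hypothetical callback type for this sketch. */
typedef void (*SketchMsgCallback)(int level, const char *text, void *data);

typedef struct {
    pthread_mutex_t lock;
    SketchMsgCallback cb;
    void *data;
} SketchLogger;

static void logger_init(SketchLogger *lg, SketchMsgCallback cb, void *data)
{
    pthread_mutex_init(&lg->lock, NULL);
    lg->cb = cb;
    lg->data = data;
}

/* May be called from any render thread; the user callback never runs
 * concurrently with itself. */
static void logger_emit(SketchLogger *lg, int level, const char *text)
{
    assert(text != NULL);
    pthread_mutex_lock(&lg->lock);
    if (lg->cb)
        lg->cb(level, text, lg->data);
    pthread_mutex_unlock(&lg->lock);
}

/* Demo callback that is not thread-safe on its own. */
static void count_messages(int level, const char *text, void *data)
{
    (void) level;
    (void) text;
    ++*(int *) data;
}
```

The caveat is that the callback must not log through the same logger again, since a plain mutex would deadlock on re-entry.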

Items retrieved from cache are always kept until at least the end of the frame, so there's no need to explicitly ref them when they're only used intra-frame. Instead, explicitly ref items when we'll be keeping them around for longer (e.g. as part of another cache item's key). Note that ass_shaper already depended on cache items never being released mid-frame prior to this change. This will save massively on atomic contention later.
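
A small sketch of the distinction described here, assuming a plain atomic reference count (the names `Item`, `item_ref`, `item_unref` and `item_borrow_payload` are hypothetical, not libass functions): intra-frame users just read the item, and only a holder that outlives the frame touches the counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef struct {
    atomic_uint refs;
    int payload;
} Item;

/* A new item starts with one reference, held by the cache itself. */
static Item *item_new(int payload)
{
    Item *it = malloc(sizeof(*it));
    if (it) {
        atomic_init(&it->refs, 1);
        it->payload = payload;
    }
    return it;
}

/* Intra-frame use: the cache keeps the item alive until the frame ends,
 * so readers do not touch the shared counter at all. */
static int item_borrow_payload(const Item *it)
{
    return it->payload;
}

/* Long-lived use (e.g. storing the item inside another cache key):
 * take an explicit reference. */
static void item_ref(Item *it)
{
    atomic_fetch_add_explicit(&it->refs, 1, memory_order_relaxed);
}

/* Returns 1 if this call dropped the last reference and freed the item. */
static int item_unref(Item *it)
{
    if (atomic_fetch_sub_explicit(&it->refs, 1, memory_order_acq_rel) == 1) {
        free(it);
        return 1;
    }
    return 0;
}
```

Skipping the increment and decrement on the common intra-frame path is what avoids many threads hammering the same counter.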

rcombs · 2024-06-22T05:08:55Z

Closing this out to open a new PR with the current cache design, as the thread here has gotten so long as to be impractical.

rcombs requested review from astiob, MrSmile and TheOneric and removed request for MrSmile July 27, 2022 09:50

rcombs force-pushed the threading branch 3 times, most recently from bfcfbbb to 053f80e Compare July 27, 2022 10:13

MrSmile reviewed Jul 27, 2022

libass/ass_rasterizer.h (outdated)

libass/ass_cache.c (outdated)

libass/ass_cache.c

rcombs force-pushed the threading branch 3 times, most recently from aecd930 to 0e138c3 Compare July 27, 2022 22:09

MrSmile reviewed Jul 28, 2022

libass/ass_render.h (outdated)

rcombs force-pushed the threading branch from 0e138c3 to cb7721e Compare July 29, 2022 09:10

TheOneric reviewed Jul 29, 2022

configure.ac (outdated)

TheOneric mentioned this pull request Jul 30, 2022

Regression tests in make check, sanitisers in CI #631

Merged

astiob reviewed Jul 30, 2022

libass/ass_render.c (outdated)

libass/ass_bitmap.h (outdated)

rcombs force-pushed the threading branch from cb7721e to 0a88c3d Compare July 30, 2022 20:08

rcombs force-pushed the threading branch from 0a88c3d to b76ddde Compare July 31, 2022 04:35

rcombs force-pushed the threading branch 2 times, most recently from cc1e523 to 66cf76c Compare July 31, 2022 09:32

rcombs force-pushed the threading branch 2 times, most recently from 41440fe to bb31b71 Compare August 1, 2022 05:57

TheOneric reviewed Aug 3, 2022

libass/ass_render.c (outdated)

rcombs force-pushed the threading branch from bb31b71 to ee39d80 Compare August 4, 2022 21:08

TheOneric added the performance label Sep 12, 2022

TheOneric linked an issue Sep 15, 2022 that may be closed by this pull request

Multi-threading! #56

Open

rcombs force-pushed the threading branch 2 times, most recently from 4e998f3 to df891dc Compare May 19, 2024 21:36

rcombs added 2 commits June 5, 2024 18:10

ass_cache: remove unused stats functionality

ed1ad65

rcombs mentioned this pull request Jun 6, 2024

Cache changes extracted from threading PR #783

Merged

rcombs added 2 commits June 5, 2024 20:01

ass_shaper: avoid mutating shaper_metrics_data in get_cached_metrics

33eb659

ass_shaper: remove redundant member

9058bec

rcombs mentioned this pull request Jun 6, 2024

Cache per-size metrics in shaper #784

Merged

rcombs and others added 18 commits June 5, 2024 20:52

ass_shaper: cache face size metrics

42bf2fd

ass_shaper: don't mutate the shared hb_font_t during shaping

55163e9

ass_shaper: shrink ass_shaper_metrics_data slightly

3dc5488

ghci: print contents of test-suite.log

6a35aa7

configure: require fribidi 0.19.7

1f46150

Versions prior to this were not thread-safe by default.

configure: detect threads and atomics

2a80d85

build: remove explicit -std=gnu99

9228729

This is automatically set by AM_PROG_CC if needed, and potentially to a newer version (e.g. C11, which we need for atomics).

ass_threading: add pthread/atomics support header

5252be2

ci/gha: add a TSAN build

05609c3

ass_utils: add locking around log callback

12d5afe

ass_font(select): add thread-safety

bc5e2d2

ass_font: add locking routines

c94d981

ass_font: access FT_Face thread-safely

28a2c01

ass_shaper: access FT_Face thread-safely

119c97d

ass_cache: introduce CacheClient API

5594f7e

This will later be used to shard promotion and locking per-thread

ass_cache: split key and value destructors

8d5847e

ass_cache: introduce lockfree thread-safe architecture

8ad9183

ass_render: add support for per-event threaded rendering

7077a53

rcombs force-pushed the threading branch from df891dc to 7077a53 Compare June 6, 2024 04:50

rcombs closed this Jun 22, 2024

rcombs mentioned this pull request Jun 22, 2024

Threading #793

Open
