Improve rendering performances of strokes #6385

bhennion · 2025-04-13T10:26:37Z

This PR tries to improve rendering performances of strokes, following #4662.
As explained in #4662, the idea is to compute the contour of a stroke and call cairo_fill() only once per stroke (the current implementation calls cairo_stroke() - and hence cairo_fill() - once per point in the stroke).

The key is hence to compute the contour of a stroke. The contour means "some cairo_path on which to call cairo_fill() to obtain the desired stroke".
There are many different ways of getting a stroke from a sequence of points (x, y, pressure). I tried out two, so we have 3 rendering methods to discuss.

Current master.
Use every point's pressure value to make circles and add tangents:

Make a contour out of that idea and call cairo_fill() only once.
Use a point's pressure to make an oblong shape going to the next point (the last point's pressure value is not used).

Make a contour out of that idea and call cairo_fill() only once. This one gives the exact same output as current master's implementation.

You can see a bunch of unnatural test cases in this example file example-v2.pdf example-v3.pdf.
Version 2. is somehow more intuitive, but 3. (and 1.) is what we've been rendering so far. Of course, for a "normal" stroke, the difference is not really visible.

I benchmarked the three versions (you can try the benchmarks on linux). I get (on a modest laptop with integrated GPU).

1.4 fps for the current rendering method (from master) (stroke-demo --xopp)
19.6 fps for implementation 2. (stroke-demo) (edit: latest version 24fps)
17.6 fps for implementation 3. (stroke-demo --pixel-precise) (edit: latest version 19.3ps)

Importantly, the way we render strokes also has an effect on Exports. On an handwritten page, I get:

master: PDF 316 kB / SVG 39 MB (see Produces invalid svg #4112 for the amazing 39 MB...)
implementation 2.: PDF 808kB / SVG 4.4 MB
implementation 3.: PDF 995kB / SVG 5.1 MB

To summarize, we have 3 rendering methods:

master version. Bad for screen rendering, bad for SVG. Produces smaller PDF exports (but more expensive the render)
Implementation 2. as above. Fastest so far, produces smallest SVG files but PDF files are significantly bigger than 1. (but cheaper to render - again, only one fill() operation per stroke)
Implementation 3. as above. Slightly slower than 2. (but still much much better than 1.). Produces larger SVG files and PDF files. The main argument for this one is: we do not change the output at all compared to master.

I tend to favor option 2. (for the efficiency and clean logic of the rendering) but a mix of 1. (for PDF exports) and 3. (for other renderings) could also be a good choice. @xournalpp/core I opened this draft PR with the following questions in mind:

Would we care if the rendering output changed (in a way that would most likely never ever be noticed by a user)?
Do we prefer smaller PDF outputs or bigger ones that are cheaper to render?

rolandlo · 2025-04-15T16:52:34Z

Would we care if the rendering output changed (in a way that would most likely never ever be noticed by a user)?

I don't think so. The rendering output would probably only look a bit more natural.
The smallness of the output difference relies on the assumption that the pressure varies slowly, which may not be the case when adding strokes via a plugin, but even for plugins the rendering in 2 seems to make more sense in most situations.

Do we prefer smaller PDF outputs or bigger ones that are cheaper to render?

That's the harder question. I would probably go for the cheaper rendering. The impact looks more signficant and more in spirit of our goals:

Xournal++ is a hand note-taking software written in C++ with the target of
flexibility, functionality and speed

So method 2 looks like the way to go for me. I will take a look how PDF sizes are affected for exports of the documents I write in class though.

I'm wondering how this will work out in Gtk 4 in case we eventua 8000 lly move to render node based rendering (using gtk_snapshot_append_stroke).

bhennion · 2025-04-15T17:08:20Z

I agree with you about that "the rendering in 2 seems to make more sense in most situations" (hence my implementing it). However, it looks like the PDF size change is quite big (>= x2) for handwritten notes. This is IMO the biggest obstruction with this method...

I'm wondering how this will work out in Gtk 4 in case we eventually move to render node based rendering (using gtk_snapshot_append_stroke).

Well, there is a gtk_snapshot_append_fill taking in a GskPath, built using GskPathBuilder. The GskPathBuilder interface is not in 1-to-1 match to cairo's, but surely we can figure it out (then).

model/MathVect.h.

Strokes are now implemented by constructing their contour and calling cairo_fill() only once (as opposed to once per segment). This results in huge performance improvements, depending on the ratio average distance btw points / average width

bhennion · 2025-04-20T16:42:59Z

Alright. I went for the half/half solution involving 1. and 3. Even if 2. feels more natural, I think the export size of PDF files was just way too big.
So I just kept using what master uses for PDF exports and used 3. for screen rendering. I think it matters that screen renderings and PDF exports give the same output (so I discarded the mixture of 1. and 2.).

On an implementation level, I kept debugging functions in. We may want to remove them.

And in terms of performances, this implementation is much faster than master's current one. In benchmarks, the FPS improvement seemed to depend on the ratio average distance between consecutive points / average width: it works better for elongated strokes than for fat ones.
The bottleneck is pretty much the calls to std::hypot and std::asin. I do not see a way of getting rid of them (except the small optimization I already put in the code).

I think this is now ready for review and testing.

rolandlo · 2025-04-21T12:43:51Z

Alright. I went for the half/half solution involving 1. and 3. Even if 2. feels more natural, I think the export size of PDF files was just way too big. So I just kept using what master uses for PDF exports and used 3. for screen rendering. I think it matters that screen renderings and PDF exports give the same output (so I discarded the mixture of 1. and 2.).

That sounds reasonable. There are no drawbacks compared to master and most of the rendering performance gain from 2 is there in 3 as well.

On an implementation level, I kept debugging functions in. We may want to remove them.

And in terms of performances, this implementation is much faster than master's current one. In benchmarks, the FPS improvement seemed to depend on the ratio average distance between consecutive points / average width: it works better for elongated strokes than for fat ones. The bottleneck is pretty much the calls to std::hypot and std::asin. I do not see a way of getting rid of them (except the small optimization I already put in the code).

I think this is now ready for review and testing.

I'm particularly interested how this works out on MacOS, where rendering performance is still below par compared to Linux and Windows.

rolandlo · 2025-04-21T12:44:45Z

\action create-installers macos windows

github-actions · 2025-04-21T12:44:58Z

Caution

The following targets are invalid: ["macos"]. Usage is \action create-installers target1 target2...
Available targets are: ubuntu-22 ubuntu-24 debian mac-x64 mac-arm windows or all

rolandlo · 2025-04-21T12:45:32Z

\action create-installers mac-x64 mac-arm windows

github-actions · 2025-04-21T12:45:42Z

Started 3 build(s) (see details)

github-actions · 2025-04-21T12:54:13Z

CI successful!
Available artifacts:

chillibeanpaisley · 2025-05-06T10:47:47Z

Look forward to latency improvements.
Suppose bottom line is that true real-time handwriting requires 20ms(equivalent to 50fps) at a minimum.
50fps may imply gpu compute code and pinned memory(no cpu calls).

Wonder what fps could be obtained by disabling pressure sensitivity and smoothing?
Would moving to vulkan provide much fps improvement with current code?

bhennion · 2025-05-08T11:39:51Z

@rolandlo I made a new benchmarking tool that should work on every supported platform. It is based on gtk-demo's benchmark app.
You can find the sources and instructions on this repo https://github.com/bhennion/xopp-bench

It allows to benchmark the three options of the OP (as a translation dictionary: "XOPP original" means option 1. in the OP, "XOPP contour" means option 3. and "Conic contour" means option 2.).
You can also play around with the constants defined in the header of stroke-demo.cpp: they can change the results.

bhennion · 2025-05-08T11:42:26Z

Wonder what fps could be obtained by disabling pressure sensitivity and smoothing?

Disabling pressure sensitivity is already possible in the settings. The rendering performances should indeed improve significantly.

Would moving to vulkan provide much fps improvement with current code?

It should allow for a more efficient use of hardware acceleration on some platforms (MacOS for instance). But this will only be possible once Xournal++ is fully ported to gtk4. Then we'll be able to benchmark this.

chillibeanpaisley · 2025-05-14T07:32:24Z

@bhennion Would like to keep pressure sensitivity for complete stroke renders(once per whole stroke) but turn it off for real-time display of ink. Could this be arranged? Will this give 50fps?

When do you see gtk4 port coming out roughly?

rolandlo · 2025-05-28T20:29:23Z

src/core/model/StrokeContour.cpp

+            cairo_line_to(cr, p2.x, p2.y);
+            cairo_arc(cr, p2.x, p2.y, .5 * p2.z, a1, a3 - M_PI_2);


Can you please explain this important case? I see where the path must head to, but why in this precise way?

Imagine you were to draw the contour of a single segment. You would make two half circles and two lines. In this contour, you have 4 junction vertices where a line meets a half circle.

The case we have here corresponds to: the junction vertices for the first segment are in the second segment's (This rarely happens -- we need a big width-variation and a small motion -- but it happens).

It this case, we fallback to a conservative (but suboptimal) approach:

The cairo cursor is currently at one of the junction vertices of the first segment, so within the second segment. (the top of the small red half circle on the left)

We go to the center of the disk (draw the red oblique line on the left)

Then we draw the full quarter circle that corresponds to half of the the second's segment first half circle (the large red quarter circle on the left). Doing that also add a ray (the horizontal blue line on the left).

This is the simplest solution I could think of to ensure everything is painted upon cairo_fill. It is obviously not optimal (in the above example, we could actually optimize out the last point of the stroke), but it is provably doing the right thing: locally, we are always turning clockwise, so the winding number always increases and cannot accidentally drop back to 0 where is should not.

You can see other examples of this case at the bottom of this file
example-pp.pdf

bhennion force-pushed the pr/fastRender branch from 756edac to 81b32e6 Compare April 13, 2025 10:31

bhennion marked this pull request as draft April 13, 2025 10:32

bhennion linked an issue Apr 16, 2025 that may be closed by this pull request

Improve stroke rendering performances #4662

Open

tattsan mentioned this pull request Apr 17, 2025

Color Emoji Fonts are poorly rendered in exported PDF. #3692

Open

Move mathematical vectors from control/tools/StrokeStabilier to

b193e9e

model/MathVect.h.

bhennion force-pushed the pr/fastRender branch from 8c46eb9 to 9ae512a Compare April 20, 2025 15:53

Implement faster rendering of strokes

de2be5c

Strokes are now implemented by constructing their contour and calling cairo_fill() only once (as opposed to once per segment). This results in huge performance improvements, depending on the ratio average distance btw points / average width

bhennion force-pushed the pr/fastRender branch from 9ae512a to de2be5c Compare April 20, 2025 15:55

bhennion marked this pull request as ready for review April 20, 2025 16:43

< 8000 div class="TimelineItem-badge ">

bhennion mentioned this pull request Apr 30, 2025

Export lines as continuous line, not strokes #6407

Open

bhennion linked an issue Apr 30, 2025 that may be closed by this pull request

Export lines as continuous line, not strokes #6407

Open

rolandlo reviewed May 28, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve rendering performances of strokes #6385

Improve rendering performances of strokes #6385

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

		cairo_line_to(cr, p2.x, p2.y);
		cairo_arc(cr, p2.x, p2.y, .5 * p2.z, a1, a3 - M_PI_2);

Improve rendering performances of strokes #6385

Are you sure you want to change the base?

Improve rendering performances of strokes #6385

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!