8000 Improve rendering performances of strokes by bhennion · Pull Request #6385 · xournalpp/xournalpp · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Improve rendering performances of strokes #6385

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

bhennion
Copy link
Collaborator
@bhennion bhennion commented Apr 13, 2025

This PR tries to improve rendering performances of strokes, following #4662.
As explained in #4662, the idea is to compute the contour of a stroke and call cairo_fill() only once per stroke (the current implementation calls cairo_stroke() - and hence cairo_fill() - once per point in the stroke).

The key is hence to compute the contour of a stroke. The contour means "some cairo_path on which to call cairo_fill() to obtain the desired stroke".
There are many different ways of getting a stroke from a sequence of points (x, y, pressure). I tried out two, so we have 3 rendering methods to discuss.

  1. Current master.
  2. Use every point's pressure value to make circles and add tangents:
    Copie d'écran_20250413_100750
    Make a contour out of that idea and call cairo_fill() only once.
  3. Use a point's pressure to make an oblong shape going to the next point (the last point's pressure value is not used).
    Copie d'écran_20250413_100438
    Make a contour out of that idea and call cairo_fill() only once. This one gives the exact same output as current master's implementation.

You can see a bunch of unnatural test cases in this example file example-v2.pdf example-v3.pdf.
Version 2. is somehow more intuitive, but 3. (and 1.) is what we've been rendering so far. Of course, for a "normal" stroke, the difference is not really visible.

I benchmarked the three versions (you can try the benchmarks on linux). I get (on a modest laptop with integrated GPU).

  1. 1.4 fps for the current rendering method (from master) (stroke-demo --xopp)
  2. 19.6 fps for implementation 2. (stroke-demo) (edit: latest version 24fps)
  3. 17.6 fps for implementation 3. (stroke-demo --pixel-precise) (edit: latest version 19.3ps)

Importantly, the way we render strokes also has an effect on Exports. On an handwritten page, I get:

  1. master: PDF 316 kB / SVG 39 MB (see Produces invalid svg #4112 for the amazing 39 MB...)
  2. implementation 2.: PDF 808kB / SVG 4.4 MB
  3. implementation 3.: PDF 995kB / SVG 5.1 MB

To summarize, we have 3 rendering methods:

  1. master version. Bad for screen rendering, bad for SVG. Produces smaller PDF exports (but more expensive the render)
  2. Implementation 2. as above. Fastest so far, produces smallest SVG files but PDF files are significantly bigger than 1. (but cheaper to render - again, only one fill() operation per stroke)
  3. Implementation 3. as above. Slightly slower than 2. (but still much much better than 1.). Produces larger SVG files and PDF files. The main argument for this one is: we do not change the output at all compared to master.

I tend to favor option 2. (for the efficiency and clean logic of the rendering) but a mix of 1. (for PDF exports) and 3. (for other renderings) could also be a good choice. @xournalpp/core I opened this draft PR with the following questions in mind:

  • Would we care if the rendering output changed (in a way that would most likely never ever be noticed by a user)?
  • Do we prefer smaller PDF outputs or bigger ones that are cheaper to render?

@bhennion bhennion marked this pull request as draft April 13, 2025 10:32
@rolandlo
Copy link
Member
  • Would we care if the rendering output changed (in a way that would most likely never ever be noticed by a user)?

I don't think so. The rendering output would probably only look a bit more natural.
The smallness of the output difference relies on the assumption that the pressure varies slowly, which may not be the case when adding strokes via a plugin, but even for plugins the rendering in 2 seems to make more sense in most situations.

  • Do we prefer smaller PDF outputs or bigger ones that are cheaper to render?

That's the harder question. I would probably go for the cheaper rendering. The impact looks more signficant and more in spirit of our goals:

Xournal++ is a hand note-taking software written in C++ with the target of
flexibility, functionality and speed

So method 2 looks like the way to go for me. I will take a look how PDF sizes are affected for exports of the documents I write in class though.

I'm wondering how this will work out in Gtk 4 in case we eventua 8000 lly move to render node based rendering (using gtk_snapshot_append_stroke).

@bhennion
Copy link
Collaborator Author

I agree with you about that "the rendering in 2 seems to make more sense in most situations" (hence my implementing it). However, it looks like the PDF size change is quite big (>= x2) for handwritten notes. This is IMO the biggest obstruction with this method...

I'm wondering how this will work out in Gtk 4 in case we eventually move to render node based rendering (using gtk_snapshot_append_stroke).

Well, there is a gtk_snapshot_append_fill taking in a GskPath, built using GskPathBuilder. The GskPathBuilder interface is not in 1-to-1 match to cairo's, but surely we can figure it out (then).

        Strokes are now implemented by constructing their contour and
calling cairo_fill() only once (as opposed to once per segment). This
results in huge performance improvements, depending on the ratio
  average distance btw points / average width
@bhennion
Copy link
Collaborator Author

Alright. I went for the half/half solution involving 1. and 3. Even if 2. feels more natural, I think the export size of PDF files was just way too big.
So I just kept using what master uses for PDF exports and used 3. for screen rendering. I think it matters that screen renderings and PDF exports give the same output (so I discarded the mixture of 1. and 2.).

On an implementation level, I kept debugging functions in. We may want to remove them.

And in terms of performances, this implementation is much faster than master's current one. In benchmarks, the FPS improvement seemed to depend on the ratio average distance between consecutive points / average width: it works better for elongated strokes than for fat ones.
The bottleneck is pretty much the calls to std::hypot and std::asin. I do not see a way of getting rid of them (except the small optimization I already put in the code).

I think this is now ready for review and testing.

@bhennion bhennion marked this pull request as ready for review April 20, 2025 16:43
@rolandlo
Copy link
Member

Alright. I went for the half/half solution involving 1. and 3. Even if 2. feels more natural, I think the export size of PDF files was just way too big. So I just kept using what master uses for PDF exports and used 3. for screen rendering. I think it matters that screen renderings and PDF exports give the same output (so I discarded the mixture of 1. and 2.).

That sounds reasonable. There are no drawbacks compared to master and most of the rendering performance gain from 2 is there in 3 as well.

On an implementation level, I kept debugging functions in. We may want to remove them.

And in terms of performances, this implementation is much faster than master's current one. In benchmarks, the FPS improvement seemed to depend on the ratio average distance between consecutive points / average width: it works better for elongated strokes than for fat ones. The bottleneck is pretty much the calls to std::hypot and std::asin. I do not see a way of getting rid of them (except the small optimization I already put in the code).

I think this is now ready for review and testing.

I'm particularly interested how this works out on MacOS, where rendering performance is still below par compared to Linux and Windows.

@rolandlo
Copy link
Member

\action create-installers macos windows

Copy link
Contributor

Caution

The following targets are invalid: ["macos"]. Usage is \action create-installers target1 target2...
Available targets are: ubuntu-22 ubuntu-24 debian mac-x64 mac-arm windows or all

@rolandlo
Copy link
Member

\action create-installers mac-x64 mac-arm windows

Copy link
Contributor

Started 3 build(s) (see details)

Copy link
Contributor

< 8000 div class="TimelineItem-badge ">
@bhennion bhennion added this to the v1.3.0 milestone Apr 22, 2025
@bhennion bhennion linked an issue Apr 30, 2025 that may be closed by this pull request
@chillibeanpaisley
Copy link

Look forward to latency improvements.
Suppose bottom line is that true real-time handwriting requires 20ms(equivalent to 50fps) at a minimum.
50fps may imply gpu compute code and pinned memory(no cpu calls).

Wonder what fps could be obtained by disabling pressure sensitivity and smoothing?
Would moving to vulkan provide much fps improvement with current code?

@bhennion
Copy link
Collaborator Author
bhennion commented May 8, 2025

@rolandlo I made a new benchmarking tool that should work on every supported platform. It is based on gtk-demo's benchmark app.
You can find the sources and instructions on this repo https://github.com/bhennion/xopp-bench

It allows to benchmark the three options of the OP (as a translation dictionary: "XOPP original" means option 1. in the OP, "XOPP contour" means option 3. and "Conic contour" means option 2.).
You can also play around with the constants defined in the header of stroke-demo.cpp: they can change the results.

@bhennion
Copy link
Collaborator Author
bhennion commented May 8, 2025

Wonder what fps could be obtained by disabling pressure sensitivity and smoothing?

Disabling pressure sensitivity is already possible in the settings. The rendering performances should indeed improve significantly.

Would moving to vulkan provide much fps improvement with current code?

It should allow for a more efficient use of hardware acceleration on some platforms (MacOS for instance). But this will only be possible once Xournal++ is fully ported to gtk4. Then we'll be able to benchmark this.

@chillibeanpaisley
Copy link

@bhennion Would like to keep pressure sensitivity for complete stroke renders(once per whole stroke) but turn it off for real-time display of ink. Could this be arranged? Will this give 50fps?

When do you see gtk4 port coming out roughly?

Comment on lines +63 to +64
cairo_line_to(cr, p2.x, p2.y);
cairo_arc(cr, p2.x, p2.y, .5 * p2.z, a1, a3 - M_PI_2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain this important case? I see where the path must head to, but why in this precise way?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imagine you were to draw the contour of a single segment. You would make two half circles and two lines. In this contour, you have 4 junction vertices where a line meets a half circle.

The case we have here corresponds to: the junction vertices for the first segment are in the second segment's (This rarely happens -- we need a big width-variation and a small motion -- but it happens).

It this case, we fallback to a conservative (but suboptimal) approach:
Copie d'écran_20250529_083644

  • The cairo cursor is currently at one of the junction vertices of the first segment, so within the second segment. (the top of the small red half circle on the left)
  • We go to the center of the disk (draw the red oblique line on the left)
  • Then we draw the full quarter circle that corresponds to half of the the second's segment first half circle (the large red quarter circle on the left). Doing that also add a ray (the horizontal blue line on the left).

This is the simplest solution I could think of to ensure everything is painted upon cairo_fill. It is obviously not optimal (in the above example, we could actually optimize out the last point of the stroke), but it is provably doing the right thing: locally, we are always turning clockwise, so the winding number always increases and cannot accidentally drop back to 0 where is should not.

You can see other examples of this case at the bottom of this file
example-pp.pdf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Export lines as continuous line, not strokes Improve stroke rendering performances
3 participants
0