8000 Support Banner/Scroll effects’ fadeawaywidth/height gradient by astiob · Pull Request #666 · libass/libass · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Support Banner/Scroll effects’ fadeawaywidth/height gradient #666

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

astiob
Copy link
Member
@astiob astiob commented Nov 17, 2022

Guide to reviewing the commits:

  1. The first commit lays down some foundation and implements an integer-only gradient. If you’re not interested in the integer-only gradient specifically, feel free to skip the gradient arithmetic in your initial review.
  2. The third commit implements a mathematically sound floating-point gradient, the “sanest” variant. This can/will be squashed into the first commit if we settle on a floating-point variant, completely replacing the integer-only code.
  3. The fourth commit is a small but icky patch to get us closer to DirectShow VSFilter if we think that’s useful. We can decide to drop this.

There are several bikesheds to be coloured here:

  • Should the gradient be implemented as the splitting of ASS_Images with tweaked ASS_Image alphas (as the proposed code does) or by pre-multiplying the backing bitmap(s)?

    • VSFilter does the latter. But this is a scrolling event, so (unless it’s scrolling reaaaally slowly, which may be used to emulate a stationary gradient) this can’t be cached and the whole bitmap needs to be redone on each frame. Plus, we’d need either to allocate and fill a separate bitmap for the gradient to use mul_bitmaps on it (VSFilter does this) or introduce a special-purpose multiplier routine for this, which we’d probably like to have an assembler-language implementation for (which someone would have to create).

    • Splitting ASS_Images lets us forego any bitmap multiplications and allows bitmap reuse (though reuse can still be limited due to subpixel shifts during the scrolling), but it requires more ASS_Image allocations and breaks those API consumers that use ASS_Images to estimate event bounding boxes.

  • How stable should the output be on changing display resolution, and how close should it be to VSFilter’s?

    I initially offer three variants in this PR. One is a stupid simple implementation, much of the same kind of stupid as in VSFilter and matching it somewhat but not fully, and technically moving by sub-pixel amounts as display resolution is changed. Another is a mathematically ideal variant that to me is the most intuitive thing an author would expect from fadeawayheight/width without knowing how any implementation actually does it. The third is a mathematically exact replica of traditional VSFilter’s gradient position, but I didn’t replicate the bad rounding of values (which in some sense shifts the gradient further), so I’m not entirely sure how this even helps. I can add that if desired, though.

    I haven’t had much luck imagining situations where a carefully crafted script might be significantly affected by these differences. But I haven’t tried looking for real examples, either. Theoretically, a script might use these effects to achieve a colour gradient.

    For reference, VSFilter’s rounding errors on values:

    1. It computes the gradient step as floor(2¹⁴ / gradientWidthOrHeightInDisplayPixels) and repeatedly adds this (starting from fully transparent) for the leading gradient or subtracts this (starting from fully opaque) for the trailing gradient, accumulating/multiplying the error caused by the floor. As a result, the leading gradient never quite smoothly reaches 1, and the trailing gradient never quite smoothly reaches 0; and the higher the resolution, the worse it may get.

    2. It computes the final opacity as floor(opacity × gradient / 2¹⁴). And the opacities are in the range 0–64, so even this floor alone is worth up to ¹⁄₆₄ ≈ ⁴⁄₂₅₅. The leading gradient in particular might thus appear to originate another few pixels beyond the designated point.

  • I’ve noticed the rounding of \clip(,,,) coordinates is different among the various renderers. Do we or do we not want to align this to any particular VSFilter flavour, or perhaps independently improve our own?

@TheOneric
Copy link
Member

It appears this introduces (or triggers an existing) issue with state persisting beyond its intended lifetime. Here’s a sample:

[Script Info]
ScriptType: v4.00+
ScaledBorderAndShadow: yes
YCbCr Matrix: None
PlayResX: 1920
PlayResY: 1080

[Aegisub Project Garbage]
Video File: ?dummy:25.000000:39000:1920:1080:47:191:225:

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,DejaVu Sans,48,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100,100,0,0,1,0,0,2,10,10,10,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:00:04.99,Default,,0,0,0,Banner;9;0;300,{\4c&HFF0000\1a&HFE\xshad-1250}{\p1}m 0 0 l 600 0 600 300 0 300
Dialogue: 0,0:00:05.00,0:00:08.00,Default,,0,0,0,,AAA{\1c&HFF0000}{\p1}m 0 0 l 600 0 600 300 0 300{\p0}AAAA

The second event doesn't render if the first one was displayed before it. If playback is started after the first event or the first even is commented the second event shows up normally.

@astiob
Copy link
Member Author
astiob commented Nov 17, 2022

Thanks for testing! I had only tried single frames, as I don’t (yet) have an easily testable video playback setup. Try with the new code, please.

I’ve added a second condition to apply_scroll_fade. It appears we reset evt_type but not any of the scroll_... properties in init_render_context, which makes sense but I didn’t pay attention to.

@TheOneric
Copy link
Member

Try with the new code, please.

Both events show up now.

Copy link
Member
@TheOneric TheOneric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An intial look at the floating-point version and \clip rounding. I didn’t think about the rounding error emulation yet.

  • If there’s no visible compatibility problems, I think the floating-point variant is good
  • Similarly with \clip rounding. Since *VSFilters apparently already differ and if we don't expect visible issues from not faithfully emulating one flavour, the approach here seems fine to me
  • Being mostly stable across different rendering resolutions is imo desireable

I find the code around multiple ASS_Images with borrowed buffers a bit confusing, but tbf that’s due to preëxisting API and lack of documentation so not directly relevant to this PR. I particularly dislike how the private my_draw_bitmap function returns ASS_ImagePriv *s as ASS_Image *s and the result is then cast back to ASS_ImagePriv * in various places; as an internal-only API it could/should imho just directly return ASS_ImagePriv *. I’m also not sure what ASS_ImagePriv->buffer even is.

Comment on lines +736 to +737
ASS_Image *nimg = ass_image_borrow_copy(cur);
nimg->dst_x = x + render_priv->settings.left_margin;
cur->w = nimg->dst_x - cur->dst_x;
nimg->w -= cur->w;
nimg->bitmap += cur->w;
cur->next = nimg;
cur = nimg;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not sure it’s actually worthwhile, but in theory you could add a new IMAGE_TYPE_GRADIENT for ASS_Image and use it here, so consumers who want to calculate bounding boxes will know to merge succesive neighbouring gradient image of the same colour sans alpha.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would erase the distinction between shadows, borders and main images for events affected by gradients, defeating the point of image->type. We could, of course, make a whole parallel set of three new values… But it would still be impossible in some cases to tell where one gradient event ends and another gradient event starts.

By the way, why do the IMAGE_TYPE_... constants not start with ASS_? Sigh.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, why do the IMAGE_TYPE_... constants not start with ASS_? Sigh.

Btw, all macros and enums in ass_types.h but the JUSTIFY ones also lack an ASS_ prefix.

@astiob
Copy link
Member Author
astiob commented Nov 19, 2022

I’m also not sure what ASS_ImagePriv->buffer even is.

Currently, an ASS_Image references a source bitmap: either a borrowed cache entry (in which case source is set and reference-counted while buffer is null) or a directly owned bitmap buffer (in which case source is null but buffer is this very buffer). The latter is used for BorderStyle=4 in add_background and for blended vector clips in blend_vector_clip; the former is used for everything else.

This adds a third mode, 8000 where the ASS_Image references a direct bitmap buffer that is borrowed from another ASS_Image that owns it. In this case, both source and buffer are null, and the owning image is in the same image list, so both images will be freed by ass_frame_unref at the same time.

Copy link
Member
@MrSmile MrSmile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that multiplication would be faster than ton of separate images, but that depends on ASS_Image treatment by consumer. It probably shouldn't matter for regular usage, but we all know about insane full-color drawings that sometimes appear in subs...

I think you should create struct FadeParams and encapsulate alpha calculation inside it:

int event_start = cur->dst_y - render_priv->settings.top_margin;
int event_end = event_start + cur->h;

FadeParams fade = fade_init(state->scroll_y0 * state->screen_scale_y,
                            state->scroll_y1 * state->screen_scale_y,
                            fade_height, cur->color);

int a = fade_alpha(&fade, event_start);
cur->color = (cur->color & ~0xFF) | a;
for (int y = event_start + 1; y < event_end; y++) {
    int prev_a = a;
    a = fade_alpha(&fade, y);
    if (a == prev_a)
        continue;

    ASS_Image *nimg = ass_image_borrow_copy(cur);
    nimg->dst_y = y + render_priv->settings.top_margin;
    cur->h = nimg->dst_y - cur->dst_y;
    nimg->h -= cur->h;
    nimg->bitmap += cur->h * cur->stride;
    cur->next = nimg;
    cur = nimg;

    cur->color = (cur->color & ~0xFF) | a;
}

It would be less copy-paste and all rounding variants would be neatly localized in FadeParams and its functions.

Use lround as opposed to lrint to ensure smoother animations

For all sane use cases lround = lrint, so I don't understand what are you meaning here. Personally I prefer lrint as it can be optimized slightly better than lround.

@astiob
Copy link
Member Author
astiob commented Nov 20, 2022

For all sane use cases lround = lrint, so I don't understand what are you meaning here.

I, in turn, don’t understand what you mean by sane use cases. I gave an example of what I mean in the same commit message. Consider a 720p video/script on a 1080p screen: the scaling factor is exactly 1.5. Whole pixels in the source turn into 1.5 display pixels each. lrint rounds successive 1px steps—1.5px steps after scaling—to intervals of 1px, 2px, 2px, 1px, 1px, 2px, 2px, 1px, 1px, 2px, 2px… lround, in contrast, gives 1px, 2px, 1px, 2px, 1px, 2px, 1px, 2px… which is much smoother. In fact, this applies not only to animations (as I wrote in the commit message) where this stepping happens over time, but also to those same insane typeset signs™ where it happens within the span of a single frame, with a separate event for each pixel.

Copy link
Member
@TheOneric TheOneric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m very slightly in favour of emulating VSFilter’s start position (even without emulating rounding errors), but I’m also fine with leaving it out unless we actually get bug reports about it.

The cache-reuse argument for using multiple ASS_Images with a shared cached buffer makes sense to me. Though I can also see how if the events are short-lived, letting API-users blend more, mostly 1px-wide/high images may end up being a bit worse than us premultiplying with alpha bitmap and our mul_bitmaps assembly-variant (or theoretical special-purpose assembly). If this becomes a point of contention, we could create artificial samples for both cases and actually measure the perf diff.

For \clip and Scroll effects, scaled coordinates may not be integers.
Round each coordinate to the nearest integer, not towards zero.

Use lround as opposed to lrint to ensure smoother animations
in case of e. g. a 1px animation step at a 1.5x scaling factor:
lround gives successive steps of 2px, 1px, 2, 1, 2, 1, 2, 1, 2...
whereas lrint would give 2, 1, 1, 2, 2, 1, 1, 2, 2...
(The same reasoning has previously been applied
to \kf coordinates in render_and_combine_glyphs.)

For what it's worth, for Scroll effects, this exactly matches all VSFilter
variants and setups except specific rare cases on MPC-HC's and MPC-BE's
internal subtitle renderers where PlayRes differs from display resolution
(or video storage resolution if the renderer is built as a standalone
DirectShow filter) and MPC ends up off by 1px after double-rounding scaled
Scroll boundaries.

However, for rectangular \clip, all VSFilters truncate the scaled coordinates,
and XySubFilter truncates them twice: at video-storage *and* display
resolution. Furthermore, they compute \clip\t(\clip) relative to the
truncated scaled starting coordinates, whereas libass currently computes it
relative to the truncated PlayRes starting coordinates.

Events that have \clip are additionally clipped to the video area,
but the old code used floating-point arithmetic and rounded the result
also towards zero. For practical values of PlayRes and display size
on practical machines with a 53-bit double-precision float mantissa,
this arithmetic is exact and no loss of precision occurs, although
with a theoretical huge resolution this could lead to the rightmost
pixel column or bottommost pixel row of the video area to be falsely
clipped off (if PlayRes * display res / PlayRes < display res).
But at any rate, this arithmetic isn't necessary at all:
just use the known corresponding integer values.
This should work and match VSFilter when rendering to target
display resolution equal to layout (video storage) reslution,
except that our visual quality is better due to its limited
value range (0-64 for both base bitmap and gradient bitmap before
they're multiplied together) and more abundant rounding error.

When rendering to a different target display resolution, the location
of our fade bands as implemented in this commit will not exactly match
DirectVobSub's, and the greater the scaling factor, the bigger the difference
will be. XySubFilter also exhibits this problem, although our locations don't
exactly match XySubFilter's, either, except when PlayRes == LayoutRes.

The gradient is implemented as multiple ASS_Images to avoid spending time
and memory on a bitmap multiplication on every frame, as the event keeps
scrolling and the multiplied bitmap would be uncacheable. However, this
does increase the workload of the consumer and may also harm applications
that use ASS_Images to estimate event bounding boxes.
Compared to before, this shifts the gradient by half a pixel up/left when
rendering at PlayRes, but this new position is exactly reproduced when
rendering at any other resolution.

The linear gradient now covers exactly the area from the left edge
of the video to the right edge of the fadeawaywidthth PlayRes-pixel
(exactly fadeawaywidth PlayRes-pixels in total), similarly on the right,
and similarly from the top edge of the y0th PlayRes-pixel for exactly
fadeawayheight PlayRes-pixels down, and from the bottom edge of y1st
PlayRes-pixel for exactly fadeawayheight PlayRes-pixels up. I think
this is the most intuitive arrangement.

The gradient is sampled at the centre of each display-resolution pixel.

This does NOT match any VSFilter variant in any configuration. However, for a
fixed video viewed on ever-denser displays, I think MPC-HC/BE ISR's gradient
(and for PlayRes == video res, also XySubFilter's) tends to this same result
in its limit as the display resolution grows to infinity... or rather, would
tend if you could ignore its rounding error.
This still doesn't emulate the rounding error on its output *values*, though.
@astiob
Copy link
Member Author
astiob commented Feb 9, 2023

Pushed the non-fadeaway commits to master for now, with an expanded example regarding lrint vs lround.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0