-
Notifications
You must be signed in to change notification settings - Fork 158
[Vello Hybrid]: Clipping (Spatiotemporal Allocation) #957
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This commit has a sketch of spatio-temporal allocation for clipping, but it is not fully wired up yet. Scenes without clipping should work, but there is a fair amount of TODO remaining for clipping. There's a fair amount of refactoring here. The biggest change is that draw calls and render passes can be issued from inside the scheduler, as opposed to separate "prepare" and "render" calls. The number of render passes needed will vary by the scene.
sparse_strips/vello_sparse_tests/snapshots/image_with_transform_rotate_1.png
Outdated
Show resolved
Hide resolved
@@ -67,6 +91,50 @@ fn clip_rectangle_with_star_evenodd(ctx: &mut impl Renderer) { | |||
ctx.pop_layer(); | |||
} | |||
|
|||
#[vello_test] | |||
fn clip_deeply_nested_circles(ctx: &mut impl Renderer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like this test to exercise creating many rounds in the scheduler.
// TODO: We currently allocate a new strips buffer for each render pass. A more efficient | ||
// approach would be to re-use buffers or slices of a larger buffer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is such a large refactor of vello hybrid that I wanted to keep optimisations like this out of this PR to limit added complexity.
sparse_strips/vello_sparse_tests/snapshots/fill_command_respects_clip_bounds.png
Outdated
Show resolved
Hide resolved
let alphas_texture = | ||
Self::make_alphas_texture(device, max_texture_dimension_2d, alpha_texture_height); | ||
let alpha_data = vec![0; (max_texture_dimension_2d * alpha_texture_height * 16) as usize]; | ||
const INITIAL_ALPHA_TEXTURE_HEIGHT: u32 = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non actionable 鈥 for my own understanding, why is it a safe change to change INITIAL_ALPHA_TEXTURE_HEIGHT
from 2
to 1
? What does it impact?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was motivated by wanting to keep initial allocations small. If the scene requires more alphas, that can be done in prepare
(where we re-allocate GPU resources to contain the scene we want to render).
}); | ||
|
||
let strip_shader = device.create_shader_module(wgpu::ShaderModuleDescriptor { | ||
label: Some("Strip Shader"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: may be nice if label matches file - e.g. Render Strips Shader
. Similarly strip_shader
-> render_strips_shader
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't like the render_*
prefix because render_
could arguably be put in front of so many classes of things. I also didn't like renaming the shaders to strips
. I'll sleep on it, but if anyone has a better name, please let me know 馃檹 . Maybe "fine" for fine rasterisation (but arguably that includes "clear_slots" too).
/// Errors that can occur during rendering. | ||
#[derive(Error, Debug)] | ||
pub enum RenderError { | ||
/// No slots available for rendering. | ||
/// | ||
/// This error is likely to occur if a scene has an extreme number of nested layers | ||
/// (clipping, blending, masks, or opacity layers). | ||
/// | ||
/// TODO: Consider supporting more than a single column of slots in slot textures. | ||
#[error("No slots available for rendering")] | ||
SlotsExhausted, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there other errors we will want to add?
Should unwrap
and expect
methods actually return a RenderError
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm weary about making even more changes to vello_hybrid
in this single PR, so I've left a TODO and will leave this as a separate exercise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in 41f78ba
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I wasn't clear in the initial comment. I meant this as a philosophical question about the future. Do you expect any fallible behaviors to be moved into this enum in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. For example, I think when there are too many alphas to fit into the device's alpha texture, we should return an appropriate render error.
The question of returning a RenderError
for unwraps and expects which we believe are safe is another question entirely. Historically, if I believe I'm smarter than the compiler, I lean towards panicking, but I think this decision should be made by Linebender as a group. I'll ask about it in Utrecht.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've unresolved this so that it gets greater visibility from other reviewers.
sample_count: 1, | ||
dimension: wgpu::TextureDimension::D2, | ||
// TODO: Is this correct or need it be RGBA8Unorm? | ||
format: render_target_config.format, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who sets the rendering target? It looks like it's set by the caller.
In our render_to_file
example, we appear to pass Rgba8Unorm
as the render target config format. Whilst for the winit
example we pass Bgra8Unorm
as the surface format.
I guess another question is related to the texture definition in the shader. In wgsl
clip_input_texture
is typed as texture_2d<f32>
.
Is it expected that the vello_hybrid renderer works with all wgsl texture formats? (sorry for the naivety).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a naive question. We only support Rgba8Unorm and Bgra8Unorm currently I believe. I'll confirm this.
}); | ||
let clear_slot_indices_buffer = Self::make_clear_slot_indices_buffer( | ||
device, | ||
slot_count as u64 * size_of::<u32>() as u64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The u32
attribute used for the slot index
that's passed to the clear slots wgsl shader seems brittle in it's definition when configuring the descriptors / pipeline.
A parallel is the GpuStrip
which encapsulates the attribute passed to the render_strips
shader. Is it worth using the New Type Idiom to wrap this u32
such that it's traceable to all locations that need to use the accurate size?
For example this line of code. But also the array_stride
in the clear_pipeline
?
It could also provide the attributes?
Maybe this isn't worth doing for 2 code locations.
if opacity.is_some() { | ||
unimplemented!() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may figure this out as I keep reviewing - but I am naively confused how opacity is not implemented or even how it's expected to be implemented, when there are tests like clip_with_opacity
.
Is this assumption accurate? In the test clip_with_opacity
, a clip layer is pushed, and then a rect is filled with alpha 0.5. Would the equivalent in the future be: push a clip layer with opacity set to 0.5
, and then paint a fully opaque rect. Would the opacity in the clip layer transfer to the child drawn within it and the two cases would render the same?
Also, I just realized I am a little confused about using the name alpha
vs opacity
. Are they the same thing, e.g. an alpha of 1
and an opacity of 1
are both opaque. Should terms be consolidated?
Edit: yep, I confirmed opacity here behaves as I expect.
clip, | ||
BlendMode::new(Mix::Normal, Compose::SrcOver), | ||
None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize I don't actually fully understand the difference between clipping and masking.
Clarifying: A mask is essentially a pixmap (alpha or luminance), whilst a clip_path lets you clip via a bezier path. It's not immediately intuitive that a clip_path is much more challenging than a mask that's "straightforward".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clipping is basically a special case of an alpha mask, where everything inside of the clip shape is fully opaque in the alpha mask, and everything outside if fully transparent. So you could in theory emulate it with an alpha mask, but since clipping is such a common case we have a custom logic that is more complex, but faster.
|
||
for (i, color) in colors.iter().enumerate() { | ||
let clip_rect = Rect::new((i as f64) * OFFSET, 0.0, WIDTH, HEIGHT); | ||
ctx.push_clip_layer(&clip_rect.to_path(0.1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Worth defining 0.1
as DEFAULT_TOLERANCE
(similar to vello_cpu)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, definitely. But since that touches every file in tests/*
, let's leave it for a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have now technically read over all the code and it looks great! I will need to give the schedular a second pass as there are still some mysteries in there.
Unfortunately as someone relatively new to this repo, my comments have been focused on nits and syntax, and less on the overall design. However, overall design seems to be working per tests.
Great work!
//! At this point, we can sample the pixels from slot N - 1 of texture 1 to draw the final | ||
//! result. | ||
//! | ||
//! ## Nuances |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this section is missing how slot depth maps to the slot textures. E.g. odd vs even slot depths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added notes in c1e9097
junk: &mut RendererJunk<'_>, | ||
scene: &Scene, | ||
) -> Result<(), RenderError> { | ||
let mut tile_state = mem::take(&mut self.tile_state); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite following why we don't just create a brand new tile_state
vec allocation here, vs storing it on self
?
Naively, mem::take
replaces tile_state with an empty vec, does that incur an allocation? Then at the end we re-set tile_state
back on self
.
This comment seems perf related so can be ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empty vectors in rust do not allocate, so taking the memory and returning it later allows us to reuse the allocation without creating more allocations
let clip_x = u32(in.position.x) & 0xFFu; | ||
let clip_y = (u32(in.position.y) & 3) + in.rgba_or_slot * config.strip_height; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not clear to me how in.position.y
has been modified in this PR. From the scheduler, it looks like it also contains the slot_ix * Tile::HEIGHT
, or slot y position. So why the & 3
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the answer to this question should also potentially result in a code comment in the wgsl shader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in.position
represents the position builtin
. In a vertex shader, it ranges from -1 to 1 for X and Y. In the context of a fragment shader, it represents the pixel coordinate of where we are drawing (see this article). The & 3
is used to constrain the pixel coordinate to 4 since that's the height of our tile. I added a CAUTION:
note to the config.strip_height
about the danger in changing its value without updating this logic.
In time, we will want to make this configurable, but I'm not sure how that will present. We could make the & 3
configurable, but then we should also make the & 0xFFu
configurable to wide tile width. I think at this stage we should untangle those concerns when we get to them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh whoops, I got my x and y's confused! Thanks for the great answer :D
}); | ||
} | ||
Cmd::PushBuf => { | ||
let ix = clip_depth % 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 3 draws in a round representing, slot 0, slot 1, and final texture. In draw_mut
the Draw slot corresponds to 1 - clip_depth % 2
.
Should these be aligned such that this is 1 - clip_depth % 2
. I think all that should do is ensure the slot 0 free vec and slot 0 draw are on the same index and it shouldn't change logic.
Edit: I think it's all a bit more complex. The choice between the two is intentional and changes in multiple places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just rubber stamping since we discussed in office hours that this is fine to merge, and Andrew doesn't seem to have write permissions yet.
debug_assert!( | ||
has_non_zero_alpha(rgba), | ||
"Color fields with 0 alpha are reserved for clipping" | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raised in Zulip#vello > Vello Hybrid Crashing when alpha = 0 @ 馃挰.
What is the recommendation if someone wants to pass a fully transparent fill? Additionally, is there a use-case of painting something fully transparent?
Context
Hooks up and addresses all the TODO's in Raph's sketch (#934). See this thread#vello > Spatiotemporal allocation (hybrid) @ 馃挰.
Notes
GpuResources
non-optional - I thought this was more trouble than it was worth.clear_slots
pipeline to support fine grained clearing of slots in slot textures (needed for spatiotemporal allocation)vello_hybrid
write_buffer
)I've copied the documentation from
schedule.rs
below for information about how spatiotemporal allocation works:Scheduling
we flush rounds (i.e. execute render passes) to free up space. Note that a slot refers
to 1 wide tile's worth of pixels in the clip texture.
free
vector contains the indices of the slots that are available for use in the two clip textures.Example
Consider the following scene of drawing a single wide tile with three overlapping rectangles with
decreasing width clipping regions.
This single wide tile scene should produce the below rendering:
How the scene is scheduled into rounds and draw calls are shown below:
Round 0
In this round, we don't have any preserved slots or slots that we need to sample from. Simply,
draw unclipped primitives.
Draw to texture 0:
In Slot N - 1 of texture 0, draw the unclipped green rectangle.
Slot N - 1:
Draw to texture 1:
In Slot N - 2 of texture 1, draw unclipped red rectangle and, in slot N - 1, draw the unclipped
blue rectangle.
Slot N - 2:
Slot N - 1:
Round 1
At this point, we have three slots that contain our unclipped rectangles. In this round,
we start to sample those pixels to apply clipping (texture 1 samples from texture 0 and
the render target view samples from texture 1).
Draw to texture 0:
Slot N - 1 of texture 0 contains our unclipped green rectangle. In this draw, we sample
the pixels from slot N - 2 from texture 1 to draw the blue rectangle into this slot.
Slot N - 1:
Draw to texture 1:
Then, into Slot N - 2 of texture 1, which contains our red rectangle, we sample the pixels
from slot N - 1 of texture 0 which contain our green and blue rectangles.
Draw to render target
At this point, we can sample the pixels from slot N - 1 of texture 1 to draw the final
rendition.
Nuances
are scheduled. At clip depths 1 and 2, we can draw directly to the final target within a
single round.
we can use a
LoadOp::Clear
operation. Otherwise, we need to clear the dirty slots usinga fine grained render pass.
For more information about this algorithm, see this Zulip thread.