Added backdrop effects pipeline (blur)

Zachary Levy
2026-04-28 22:12:25 -07:00
parent ff29dbd92f
commit 16989cbb71
29 changed files with 2931 additions and 415 deletions
+92 -18
@@ -52,9 +52,12 @@ statically allocates registers for the worst-case path (Ring_Arc) regardless of
fragment actually evaluates, so all fragments pay the occupancy cost of the heaviest branch. This is
a documented limitation, not a design constraint (see "Known limitations: V3D and Bifrost" below).
MSAA is intentionally not supported. SDF text and shapes compute fragment coverage analytically
via `smoothstep`, so they don't benefit from multisampling. Tessellated user geometry submitted via
`prepare_shape` is rendered without anti-aliasing; if AA is required for tessellated content, the
caller must render it to their own offscreen target and submit the result as a texture. This
decision matches RAD Debugger's architecture and aligns with the SBC target (Mali Valhall, where
MSAA's per-tile bandwidth multiplier is expensive).
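Here is a minimal Python sketch of `smoothstep`-based coverage, which shows why analytic coverage leaves nothing for MSAA to do. It assumes a signed distance in pixels (negative inside the shape) and a one-pixel transition band centered on the edge. The real shader's edge width and sign convention are not given here, and `sdf_coverage` is a hypothetical helper. One evaluation per fragment already yields the fractional coverage that multisampling would otherwise approximate with extra samples.

```python
def smoothstep(edge0: float, edge1: float, x: float) -> float:
    """GLSL-style smoothstep: 0 below edge0, 1 above edge1, Hermite ramp in between."""
    t = min(max((x - edge0) / (edge1 - edge0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


def sdf_coverage(distance_px: float, aa_width_px: float = 1.0) -> float:
    """Fractional pixel coverage from a signed distance (negative = inside).

    Hypothetical helper: the ramp is centered on the shape edge and spans
    `aa_width_px` pixels, so partial coverage comes from a single evaluation.
    """
    half = 0.5 * aa_width_px
    return 1.0 - smoothstep(-half, half, distance_px)


for d in (-2.0, -0.25, 0.0, 0.25, 2.0):
    print(f"distance {d:+.2f} px -> coverage {sdf_coverage(d):.3f}")
```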
## 2D rendering pipeline plan
@@ -249,9 +252,9 @@ API where each layer draws shadows before quads before glyphs. Our design avoids
submission order is draw order, no layer juggling required.
**PSO compilation costs multiply.** Each pipeline takes ~25ms on average to compile on
Metal/Vulkan/D3D12 at first use: 7 pipelines is ~175ms cold startup; 3 pipelines is ~75ms. Adding
state axes (blend modes, color formats) multiplies combinatorially: each additional axis multiplies
a 7-pipeline variant matrix that is already ~2.3× (7/3) larger than the 3-pipeline one.
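The arithmetic can be sketched in a few lines of Python. The ~25ms per-pipeline figure is an assumption, taken as the average implied by ~175ms / 7 and ~75ms / 3. The blend-mode and color-format axis sizes in the example are hypothetical. The sketch shows that the 7/3 ratio survives every added axis, while the absolute number of variants to compile grows multiplicatively.

```python
from math import prod


def cold_start_ms(pipelines: int, ms_per_pipeline: float = 25.0) -> float:
    """First-use compile cost, assuming every pipeline costs about the same to compile."""
    return pipelines * ms_per_pipeline


def variant_count(pipelines: int, axis_sizes: list[int]) -> int:
    """Total PSO variants when every state axis multiplies every pipeline."""
    return pipelines * prod(axis_sizes)


print(cold_start_ms(7), cold_start_ms(3))  # cold startup, 7 vs 3 pipelines

# Hypothetical axes, added one at a time: 2 blend modes, then 2 color formats.
for axes in ([], [2], [2, 2]):
    seven, three = variant_count(7, axes), variant_count(3, axes)
    print(f"axes={axes}: {seven} vs {three} variants, gap {seven - three}")
```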
**Branching cost comparison: unified vs per-kind in the effects pipeline.** The effects pipeline is
the strongest candidate for per-kind splitting because effect branches are heavier than shape
@@ -587,27 +590,29 @@ Wallace's variant) and vger-rs.
### Backdrop pipeline
The backdrop pipeline handles effects that sample the current render target as input: frosted glass,
refraction, mirror surfaces. It is separated from the main and effects pipelines for a structural
reason, not register pressure.
**Render-pass boundary.** Before any backdrop-sampling fragment can run, the current render target
must be in a sampler-readable state. A draw call cannot sample the render target it is also writing
to; this is a hard GPU constraint, and the only way to satisfy it is to end the current render pass
and start a new one. That render-pass boundary is what a "bracket" is.
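The constraint can be modeled as a simple check over a pass list: a pass may sample any texture except the one it renders into. In this hypothetical Python sketch, the `Pass` record and `find_feedback_loops` are illustrative. The texture names are borrowed from this section (`source_texture`, `downsample_texture`, `h_blur_texture`). Blurring the target in place is rejected, while the same work split across render-pass boundaries is legal.

```python
from dataclasses import dataclass, field


@dataclass
class Pass:
    """Hypothetical render-pass record: one color target plus the textures it samples."""
    name: str
    target: str
    samples: list[str] = field(default_factory=list)


def find_feedback_loops(passes: list[Pass]) -> list[str]:
    """Names of passes that sample the texture they are rendering into."""
    return [p.name for p in passes if p.target in p.samples]


# Illegal: a single pass that blurs the very target it is drawing into.
single_pass = [Pass("blur_in_place", "source_texture", ["source_texture"])]

# Legal: end the pass, then read the finished target from separate passes.
bracketed = [
    Pass("pre_bracket", "source_texture"),
    Pass("downsample", "downsample_texture", ["source_texture"]),
    Pass("h_blur", "h_blur_texture", ["downsample_texture"]),
    Pass("v_composite", "source_texture", ["h_blur_texture"]),
    Pass("post_bracket", "source_texture"),
]

print(find_feedback_loops(single_pass))  # ['blur_in_place']
print(find_feedback_loops(bracketed))    # []
```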
**Multi-pass implementation.** Backdrop effects are implemented as separable multi-pass sequences
(downsample → horizontal blur → vertical-blur+composite), following the standard approach used by
iOS `UIVisualEffectView`, Android `RenderEffect`, and Flutter's `BackdropFilter`. Each individual
sub-pass is budgeted at **≤24 registers**, the same budget as the main pipeline, which gives full
Valhall occupancy. The multi-pass approach avoids the monolithic 70+-register shader that a
single-pass Gaussian blur would require, keeping each sub-pass well under the 32-register cliff.
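The multi-pass split relies on the Gaussian being separable. A 2D Gaussian kernel is the outer product of two 1D kernels, so an H pass followed by a V pass equals one 2D convolution. For a k-tap kernel, that means 2k taps per pixel instead of k². The Python sketch below checks this on a tiny single-channel image. It is a CPU-side illustration only and makes several assumptions: clamp-to-edge addressing and a ±3σ kernel radius. It skips the downsample step and says nothing about the real shaders' tap counts or register use. All helper names are hypothetical.

```python
import math


def gaussian_weights(sigma: float) -> list[float]:
    """Normalized 1D Gaussian taps covering +/- 3 sigma."""
    radius = max(1, math.ceil(3 * sigma))
    w = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(w)
    return [x / total for x in w]


def blur_1d(img, weights, horizontal: bool):
    """One separable pass (H or V) with clamp-to-edge sampling."""
    h, w = len(img), len(img[0])
    r = len(weights) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for k, wk in enumerate(weights):
                sx = min(max(x + k - r, 0), w - 1) if horizontal else x
                sy = y if horizontal else min(max(y + k - r, 0), h - 1)
                acc += wk * img[sy][sx]
            out[y][x] = acc
    return out


def blur_2d_direct(img, weights):
    """Reference: full 2D kernel (outer product of the 1D taps), same addressing."""
    h, w = len(img), len(img[0])
    r = len(weights) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j, wj in enumerate(weights):
                sy = min(max(y + j - r, 0), h - 1)
                for i, wi in enumerate(weights):
                    sx = min(max(x + i - r, 0), w - 1)
                    acc += wj * wi * img[sy][sx]
            out[y][x] = acc
    return out


img = [[0.0] * 7 for _ in range(7)]
img[3][3] = 1.0  # a single bright pixel
taps = gaussian_weights(1.0)
separable = blur_1d(blur_1d(img, taps, horizontal=True), taps, horizontal=False)
direct = blur_2d_direct(img, taps)
max_err = max(abs(a - b) for ra, rb in zip(separable, direct) for a, b in zip(ra, rb))
print(len(taps), "taps per 1D pass vs", len(taps) ** 2, "for the direct 2D kernel")
print("max difference:", max_err)
```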
**Approach B: render-target choice.** When any layer in the frame contains a backdrop draw, the
entire frame renders into `source_texture` (a full-resolution single-sample texture owned by the
backdrop pipeline) instead of directly into the swapchain. At the end of the frame, `source_texture`
is copied to the swapchain via a single `CopyGPUTextureToTexture` call. This means the bracket needs
no mid-frame texture copy: by the time the bracket runs, `source_texture` already contains the
pre-bracket frame contents and is the natural sampler input. When no layer in the frame has a
backdrop draw, the existing fast path runs: the frame renders directly to the swapchain and the
backdrop pipeline's working textures are never touched. Zero cost for backdrop-free frames.
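The per-frame decision reduces to one predicate over the frame's layers. Here is a hypothetical Python sketch. The `Layer` record, its `has_backdrop` flag, and the returned step strings are illustrative, not the renderer's API.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Hypothetical per-layer summary: does it contain any backdrop sub-batch?"""
    name: str
    has_backdrop: bool


def plan_frame(layers: list[Layer]) -> tuple[str, list[str]]:
    """Return (render target for the whole frame, end-of-frame steps)."""
    if any(layer.has_backdrop for layer in layers):
        # Approach B: render everything into source_texture, present with one copy.
        return "source_texture", ["CopyGPUTextureToTexture(source_texture -> swapchain)"]
    # Fast path: draw straight to the swapchain; backdrop textures stay untouched.
    return "swapchain", []


print(plan_frame([Layer("base", False), Layer("overlay", False)]))
print(plan_frame([Layer("base", False), Layer("glass", True)]))
```

Because the choice is made once per frame, a single backdrop anywhere in the frame is enough to switch every layer onto `source_texture`.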
**Why not split the backdrop sub-passes into separate pipelines?** Each sub-pass is budgeted at ≤24
registers, well under Valhall's 32-register cliff, so there is no occupancy motivation for splitting.
@@ -617,6 +622,75 @@ all. Additionally, backdrop effects cover a small fraction of the frame's total
typical UI scales), so even if a sub-pass did cross a cliff, the occupancy variation within the
bracket would have negligible impact on frame time.
#### Bracket scheduling model
The bracket is scheduled per layer, anchored at the first backdrop sub-batch in the layer's
submission order. Concretely, a layer with one or more backdrops splits into three groups:
1. **Pass A (pre-bracket)** — every non-backdrop sub-batch with index `< bracket_start_index`.
Renders to `source_texture` in a single render pass.
2. **The bracket** — every backdrop sub-batch in the layer (regardless of index). Runs one
downsample pass, then one (H-blur + V-composite) pass pair per unique sigma.
3. **Pass B (post-bracket)** — every non-backdrop sub-batch with index `>= bracket_start_index`.
Renders to `source_texture` with `LOAD`, drawing on top of the composited backdrop output.
`bracket_start_index` is the absolute index of the first `.Backdrop` sub-batch in the layer's sub-batch
range. If the layer has no backdrops, none of this kicks in and the layer renders in a single render
pass via the existing fast path.
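The three groups follow directly from `bracket_start_index`. Here is a minimal Python sketch. It assumes each sub-batch is reduced to a kind string, with `"Backdrop"` standing in for `.Backdrop`. It uses layer-relative indices rather than absolute ones, and `split_layer` is a hypothetical name. The function returns `None` for the fast path.

```python
from typing import Optional


def split_layer(kinds: list[str]) -> Optional[tuple[list[int], list[int], list[int]]]:
    """Split a layer's sub-batch indices into (pass_a, bracket, pass_b).

    Returns None when the layer has no backdrop (single-render-pass fast path).
    """
    backdrops = [i for i, kind in enumerate(kinds) if kind == "Backdrop"]
    if not backdrops:
        return None
    bracket_start_index = backdrops[0]
    pass_a = [i for i, kind in enumerate(kinds) if kind != "Backdrop" and i < bracket_start_index]
    pass_b = [i for i, kind in enumerate(kinds) if kind != "Backdrop" and i >= bracket_start_index]
    return pass_a, backdrops, pass_b


# Same kinds as the worked example in the submission-order discussion below.
kinds = ["Tessellated", "SDF", "Backdrop", "SDF", "Backdrop", "Text"]
print(split_layer(kinds))            # ([0, 1], [2, 4], [3, 5])
print(split_layer(["SDF", "Text"]))  # None
```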
The downsample runs once per layer, not once per sigma: it just copies `source_texture` to a ¼-
resolution working texture and doesn't depend on the kernel. Each unique sigma in the layer triggers
one H-blur (reads `downsample_texture`, writes `h_blur_texture`) and one V-composite (reads
`h_blur_texture`, writes `source_texture` per-primitive with the SDF mask). Sub-batch coalescing in
`append_or_extend_sub_batch` merges contiguous same-sigma backdrops into a single instanced V-
composite draw call; non-contiguous same-sigma backdrops still share the H-blur output but issue
separate V-composite draws.
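The coalescing and pass counting can be modeled in Python under a few stated assumptions. `append_or_extend` is a hypothetical stand-in for `append_or_extend_sub_batch`, whose real signature is not shown here. It extends the previous sub-batch only when kind and sigma both match. The third argument to `rectangle_backdrop` in the worked example below is assumed to be the sigma. `plan_bracket` counts one downsample per layer, one H-blur per unique sigma, and one V-composite draw per backdrop sub-batch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubBatch:
    kind: str
    sigma: Optional[float] = None  # only meaningful for "Backdrop"
    count: int = 1                 # instances covered by this sub-batch


def append_or_extend(batches: list[SubBatch], kind: str, sigma: Optional[float] = None) -> None:
    """Extend the last sub-batch if it has the same kind and sigma, else append a new one."""
    last = batches[-1] if batches else None
    if last is not None and last.kind == kind and last.sigma == sigma:
        last.count += 1
    else:
        batches.append(SubBatch(kind, sigma))


def plan_bracket(batches: list[SubBatch]) -> dict:
    """Per-layer bracket work: downsample once, H-blur per sigma, V-composite per sub-batch."""
    backdrops = [b for b in batches if b.kind == "Backdrop"]
    sigmas = list(dict.fromkeys(b.sigma for b in backdrops))  # unique, first-seen order
    v_draws = {s: sum(1 for b in backdrops if b.sigma == s) for s in sigmas}
    return {
        "downsample_passes": 1 if backdrops else 0,
        "h_blur_passes": len(sigmas),
        "v_composite_draws": v_draws,
    }


# Two sigma-12 backdrops separated by an SDF card: one H-blur, two V-composite draws.
layer: list[SubBatch] = []
for kind, sigma in [("Tessellated", None), ("SDF", None), ("Backdrop", 12.0),
                    ("SDF", None), ("Backdrop", 12.0), ("Text", None)]:
    append_or_extend(layer, kind, sigma)
print(plan_bracket(layer))
# {'downsample_passes': 1, 'h_blur_passes': 1, 'v_composite_draws': {12.0: 2}}

# Adjacent same-sigma backdrops coalesce into one instanced sub-batch.
adjacent: list[SubBatch] = []
for kind, sigma in [("Backdrop", 12.0), ("Backdrop", 12.0), ("Backdrop", 4.0)]:
    append_or_extend(adjacent, kind, sigma)
print(plan_bracket(adjacent))
# {'downsample_passes': 1, 'h_blur_passes': 2, 'v_composite_draws': {12.0: 1, 4.0: 1}}
```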
#### Submission-order trade-off
Within Pass A and Pass B, sub-batches render in the user's submission order. What the bracket model
sacrifices is *interleaved* ordering between backdrop and non-backdrop content within a single
layer. A non-backdrop sub-batch submitted between two backdrops still renders in Pass B (after the
bracket), not at its submission position. Worked example:
```
draw.rectangle(layer, bg, GRAY) // 0 Tessellated → Pass A
draw.rectangle(layer, card_blue, BLUE) // 1 SDF → Pass A
draw.rectangle_backdrop(layer, panelA, 12) // 2 Backdrop → Bracket (sees: bg + blue card)
draw.rectangle(layer, card_red, RED) // 3 SDF → Pass B (drawn ON TOP of panelA)
draw.rectangle_backdrop(layer, panelB, 12) // 4 Backdrop → Bracket (sees: bg + blue card; same as panelA)
draw.text(layer, "label", ...) // 5 Text → Pass B (drawn ON TOP of both panels)
```
In this layer, panelB does *not* see card_red — even though card_red was submitted before panelB —
because both backdrops sample `source_texture` as it stood at the bracket entry, which is after
Pass A and before card_red has rendered. card_red ends up on top of panelA, not underneath it.
The user controls the alternative outcome by splitting layers. Putting card_red and panelB into a
new layer (via `draw.new_layer`) gives panelB a fresh source snapshot that includes panelA and
card_red:
```
base := draw.begin(...)
draw.rectangle(base, bg, GRAY)
draw.rectangle(base, card_blue, BLUE)
draw.rectangle_backdrop(base, panelA, 12) // panelA in base layer's bracket
top := draw.new_layer(base, ...)
draw.rectangle(top, card_red, RED)
draw.rectangle_backdrop(top, panelB, 12) // top layer's bracket; sees base + card_red
draw.text(top, "label", ...)
```
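The snapshot semantics of both examples can be reproduced with a toy painter's list. This Python sketch makes two assumptions, both as described above. First, layers render in order into the same `source_texture`. Second, every backdrop in a layer samples one snapshot taken after that layer's Pass A. `render_layers` and the `(name, is_backdrop)` encoding are hypothetical.

```python
def render_layers(layers):
    """Return (final bottom-to-top draw order, {backdrop name: items in its snapshot}).

    Each layer is a list of (name, is_backdrop) tuples in submission order.
    """
    drawn: list[str] = []  # contents of source_texture, bottom to top
    snapshots: dict[str, list[str]] = {}
    for layer in layers:
        backdrop_idx = [i for i, (_, is_bd) in enumerate(layer) if is_bd]
        if not backdrop_idx:
            drawn += [name for name, _ in layer]  # fast path: plain submission order
            continue
        start = backdrop_idx[0]
        pass_a = [name for i, (name, is_bd) in enumerate(layer) if not is_bd and i < start]
        pass_b = [name for i, (name, is_bd) in enumerate(layer) if not is_bd and i >= start]
        drawn += pass_a
        snapshot = list(drawn)  # bracket entry: one source snapshot shared by the layer
        for i in backdrop_idx:
            snapshots[layer[i][0]] = snapshot
        drawn += [layer[i][0] for i in backdrop_idx]  # V-composites land on top of Pass A
        drawn += pass_b
    return drawn, snapshots


single = [[("bg", False), ("card_blue", False), ("panelA", True),
           ("card_red", False), ("panelB", True), ("label", False)]]
split = [[("bg", False), ("card_blue", False), ("panelA", True)],
         [("card_red", False), ("panelB", True), ("label", False)]]

order, seen = render_layers(single)
print(order)           # ['bg', 'card_blue', 'panelA', 'panelB', 'card_red', 'label']
print(seen["panelB"])  # ['bg', 'card_blue']

order, seen = render_layers(split)
print(order)           # ['bg', 'card_blue', 'panelA', 'card_red', 'panelB', 'label']
print(seen["panelB"])  # ['bg', 'card_blue', 'panelA', 'card_red']
```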
Why one bracket per layer and not one per backdrop? Each bracket adds three render passes
(downsample + H-blur + V-composite) and at least three tile-cache flushes on tilers like Mali
Valhall. Strict submission-order semantics would require one bracket per cluster of contiguous
backdrops, which scales the GPU cost linearly with how interleaved the user's submission happens
to be — a footgun. The current design caps the bracket cost per layer regardless of submission
interleave, and gives the user explicit control over ordering through the existing layer
abstraction. This matches the cost/complexity envelope of iOS `UIVisualEffectView` and CSS
`backdrop-filter` (both of which constrain backdrop ordering implicitly).
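Counting bracket render passes under the two policies makes the trade-off concrete. The Python sketch assumes one sigma per bracket, so each bracket is exactly the three passes named above. It encodes a layer as a string of `B` (backdrop) and `N` (non-backdrop) sub-batches. The encoding and function names are hypothetical.

```python
import itertools


def per_layer_bracket_passes(layer: str) -> int:
    """Current design: at most one bracket per layer, however interleaved."""
    return 3 if "B" in layer else 0


def strict_order_bracket_passes(layer: str) -> int:
    """Strict submission order: one bracket per run of contiguous backdrops."""
    clusters = sum(1 for key, _ in itertools.groupby(layer) if key == "B")
    return 3 * clusters


for layer in ["NNBBN", "NBNBN", "BNBNBNBN"]:
    print(layer, per_layer_bracket_passes(layer), strict_order_bracket_passes(layer))
```

Under the per-layer policy the count stays at three however the backdrops are interleaved. Under strict ordering it grows with the number of backdrop clusters.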
### Vertex layout
The vertex struct is unchanged from the current 20-byte layout: