Backdrop scope implementation (#25)

Co-authored-by: Zachary Levy <zachary@sunforge.is>
Reviewed-on: #25
This commit was merged in pull request #25.
This commit is contained in:
2026-05-02 01:31:58 +00:00
parent 5317b8f142
commit e8ffa28de3
5 changed files with 633 additions and 349 deletions
+94 -62
View File
@@ -614,11 +614,11 @@ require, keeping each sub-pass well under the 32-register cliff.
frame renders into `source_texture` (a full-resolution single-sample texture owned by the
backdrop pipeline) instead of directly into the swapchain. At the end of the frame,
`source_texture` is copied to the swapchain via a single `CopyGPUTextureToTexture` call.
This means the bracket has no mid-frame texture copy: by the time the bracket runs,
`source_texture` already contains the pre-bracket frame contents and is the natural sampler
input. When no layer in the frame has a backdrop draw, the existing fast path runs: the frame
renders directly to the swapchain and the backdrop pipeline's working textures are never
touched. Zero cost for backdrop-free frames.
This means each bracket has no mid-frame texture copy: by the time a bracket runs,
`source_texture` already contains the contents written by everything that preceded it on the
timeline and is the natural sampler input. When no layer in the frame has a backdrop draw,
the existing fast path runs: the frame renders directly to the swapchain and the backdrop
pipeline's working textures are never touched. Zero cost for backdrop-free frames.
**Why not split the backdrop sub-passes into separate pipelines?** Each sub-pass is budgeted at ≤24
registers, well under Valhall's 32-register cliff, so there is no occupancy motivation for splitting.
@@ -628,81 +628,113 @@ all. Additionally, backdrop effects cover a small fraction of the frame's total
typical UI scales), so even if a sub-pass did cross a cliff, the occupancy variation within the
bracket would have negligible impact on frame time.
#### Bracket scheduling model
#### Bracket scheduling
The bracket is scheduled per layer, anchored at the first backdrop sub-batch in the layer's
submission order. Concretely, a layer with one or more backdrops splits into three groups:
Backdrop draws are scheduled via **explicit scopes**: every call to `backdrop_blur` must be wrapped
in a `begin_backdrop` / `end_backdrop` pair (or the RAII-style `backdrop_scope` wrapper). Each
scope produces exactly one bracket at render time. A layer may contain any number of scopes; draws
between scopes render at their submission position relative to the brackets, so the user controls
exactly which backdrops share a bracket.
1. **Pass A (pre-bracket)** — every non-backdrop sub-batch with index `< bracket_start_index`.
Renders to `source_texture` in a single render pass.
2. **The bracket** — every backdrop sub-batch in the layer (regardless of index). Runs one
downsample pass, then one (H-blur + V-composite) pass pair per unique sigma.
3. **Pass B (post-bracket)** — every non-backdrop sub-batch with index `>= bracket_start_index`.
Renders to `source_texture` with `LOAD`, drawing on top of the composited backdrop output.
At render time, `draw_layer` walks the layer's sub-batch list once, alternating between two run
kinds:
`bracket_start_index` is the absolute index of the first `.Backdrop` kind in the layer's sub-batch
range. If the layer has no backdrops, none of this kicks in and the layer renders in a single render
pass via the existing fast path.
- **Non-backdrop runs** are rendered to `source_texture` in one render pass via
`render_layer_sub_batch_range`. Clear-vs-load is tracked frame-globally via `GLOB.cleared`.
- **Backdrop runs** are dispatched to `run_backdrop_bracket` with their index range. Each run is
one bracket; the bracket opens and closes its own render passes for downsample, H-blur, V-blur,
and composite stages.
Per-sigma-group execution. The bracket walks each layer's sub-batches and groups contiguous
`.Backdrop` sub-batches that share a sigma; each group picks its own downsample factor (1, 2, or 4)
based on `compute_backdrop_downsample_factor`. For each group it runs four sub-passes: a downsample
from `source_texture` to `downsample_texture`; an H-blur from `downsample_texture` to
`h_blur_texture`; a V-blur from `h_blur_texture` back into `downsample_texture` (ping-pong reuse);
and finally a composite that reads the fully-blurred `downsample_texture`, applies the SDF mask
and tint, and writes the result to `source_texture`. Sub-batch coalescing in
`append_or_extend_sub_batch` merges contiguous same-sigma backdrops into a single instanced
composite draw; non-contiguous same-sigma backdrops still share the blur output but issue separate
composite draws.
Within a bracket, the scheduler groups contiguous same-sigma sub-batches and runs four sub-passes
per group: downsample (`source_texture``downsample_texture`), H-blur (`downsample_texture`
`h_blur_texture`), V-blur (`h_blur_texture``downsample_texture`, ping-pong reuse), and
composite (`downsample_texture``source_texture` with SDF mask and tint applied). Each group
picks its own downsample factor (1, 2, or 4) based on sigma; see the comment block at the top of
`backdrop.odin` for the factor-selection table.
The working textures are sized at the full swapchain resolution; larger downsample factors only
fill a sub-rect via viewport-limited rendering (see the comment block at the top of `backdrop.odin`
for the factor-selection table and rationale).
Sub-batch coalescing in `append_or_extend_sub_batch` merges contiguous same-sigma backdrops
sharing one scissor into a single instanced composite draw. Same-sigma backdrops separated by a
`ScissorStart` boundary stay in one sigma group (one set of blur passes) but issue separate
composite draws; the composite pass calls `SetGPUScissor` between draws when the active scissor
changes.
#### Submission-order trade-off
Working textures are sized at full swapchain resolution; larger downsample factors fill a sub-rect
via viewport-limited rendering.
Within Pass A and Pass B, sub-batches render in the user's submission order. What the bracket model
sacrifices is _interleaved_ ordering between backdrop and non-backdrop content within a single
layer. A non-backdrop sub-batch submitted between two backdrops still renders in Pass B (after the
bracket), not at its submission position. Worked example:
#### Scope contract
```
draw.rectangle(layer, bg, GRAY) // 0 Tessellated → Pass A
draw.rectangle(layer, card_blue, BLUE) // 1 SDF → Pass A
draw.gaussian_blur(layer, panelA, sigma=12) // 2 Backdrop → Bracket (sees: bg + blue card)
draw.rectangle(layer, card_red, RED) // 3 SDF → Pass B (drawn ON TOP of panelA)
draw.gaussian_blur(layer, panelB, sigma=12) // 4 Backdrop → Bracket (sees: bg + blue card; same as panelA)
draw.text(layer, "label", ...) // 5 Text → Pass B (drawn ON TOP of both panels)
```
Scope state is global: `GLOB.open_backdrop_layer` tracks the currently-open scope (or `nil`) for
the whole renderer. The five misuse cases panic via `log.panic` / `log.panicf`:
In this layer, panelB does _not_ see card_red — even though card_red was submitted before panelB —
because both backdrops sample `source_texture` as it stood at the bracket entry, which is after
Pass A and before card_red has rendered. card_red ends up on top of panelA, not underneath it.
1. `backdrop_blur` called outside an open scope.
2. A non-backdrop draw call issued on a layer with an open scope. Asserted at the top of
`append_or_extend_sub_batch`.
3. `new_layer` called while a scope is open.
4. `end()` called while a scope is open.
5. `begin_backdrop` while one is already open, or `end_backdrop` on the wrong layer.
The user controls the alternative outcome by splitting layers. Putting card_red and panelB into a
new layer (via `draw.new_layer`) gives panelB a fresh source snapshot that includes panelA and
card_red:
Worked example with two scopes on the same layer:
```
base := draw.begin(...)
draw.rectangle(base, bg, GRAY)
draw.rectangle(base, card_blue, BLUE)
draw.gaussian_blur(base, panelA, sigma=12) // panelA in base layer's bracket
top := draw.new_layer(base, ...)
draw.rectangle(top, card_red, RED)
draw.gaussian_blur(top, panelB, sigma=12) // top layer's bracket; sees base + card_red
draw.text(top, "label", ...)
{
draw.backdrop_scope(base)
draw.backdrop_blur(base, panelA, sigma=12) // bracket 1: sees bg + blue card
}
draw.rectangle(base, card_red, RED) // renders ON TOP of panelA's composite
{
draw.backdrop_scope(base)
draw.backdrop_blur(base, panelB, sigma=12) // bracket 2: sees bg + blue card + panelA + card_red
}
draw.text(base, "label", ...) // renders ON TOP of panelB's composite
```
Why one bracket per layer and not one per backdrop? Each bracket adds three render passes
(downsample + H-blur + V-composite) and at least three tile-cache flushes on tilers like Mali
Valhall. Strict submission-order semantics would require one bracket per cluster of contiguous
backdrops, which scales the GPU cost linearly with how interleaved the user's submission happens
to be — a footgun. The current design caps the bracket cost per layer regardless of submission
interleave, and gives the user explicit control over ordering through the existing layer
abstraction. This matches the cost/complexity envelope of iOS `UIVisualEffectView` and CSS
`backdrop-filter` (both of which constrain backdrop ordering implicitly).
Each bracket adds four render passes (downsample + H-blur + V-blur + composite) plus tile-cache
flushes on tilers like Mali Valhall, so users who don't need interleaving should group backdrops
into a single scope to amortize:
```
{
draw.backdrop_scope(base)
draw.backdrop_blur(base, panelA, sigma=12) // shares one bracket with panelB;
draw.backdrop_blur(base, panelB, sigma=12) // same sigma also coalesces into one
} // instanced composite draw call
```
#### Clay integration: `Backdrop_Marker`
Clay has no notion of backdrops. The integration uses Clay's only extension point — the opaque
`customData: rawptr` on `clay.CustomElementConfig` — to carry a magic-number-tagged struct that
`prepare_clay_batch` recognizes:
```
Backdrop_Marker :: struct {
magic: u32, // BACKDROP_MARKER_MAGIC (0x42445054, 'BDPT')
sigma: f32,
tint: Color,
radii: Rectangle_Radii,
feather_px: f32,
}
```
The user populates a `Backdrop_Marker` (with stable lifetime through the `prepare_clay_batch`
call) and points the corresponding `clay.CustomElementConfig.customData` at it.
`prepare_clay_batch` walks Clay's command stream once, calling `is_clay_backdrop` per command
(a u32 magic check on `customData`'s first 4 bytes). On a hit it opens a backdrop scope (or
extends an open one) and dispatches via `backdrop_blur`. Non-backdrop commands issued during an
open scope go to a deferred index buffer for replay after the scope closes; this preserves Clay's
painter's-algorithm ordering across backdrops without violating the scope contract.
The magic-number sentinel keeps the marker type self-describing in core dumps and decouples the
integration from Clay-side changes. Zero-init memory has `magic = 0`, so a marker with a forgotten
magic field gets routed through the regular `custom_draw` path and surfaces as "my custom draw
never fired" rather than as a silent backdrop schedule.
### Vertex layout