Backdrop scope implementation (#25)
Co-authored-by: Zachary Levy <zachary@sunforge.is>
Reviewed-on: #25
This commit was merged in pull request #25.
+94
-62
@@ -614,11 +614,11 @@ require, keeping each sub-pass well under the 32-register cliff.
frame renders into `source_texture` (a full-resolution single-sample texture owned by the
backdrop pipeline) instead of directly into the swapchain. At the end of the frame,
`source_texture` is copied to the swapchain via a single `CopyGPUTextureToTexture` call.
This means the bracket has no mid-frame texture copy: by the time the bracket runs,
`source_texture` already contains the pre-bracket frame contents and is the natural sampler
input. When no layer in the frame has a backdrop draw, the existing fast path runs: the frame
renders directly to the swapchain and the backdrop pipeline's working textures are never
touched. Zero cost for backdrop-free frames.
This means each bracket has no mid-frame texture copy: by the time a bracket runs,
`source_texture` already contains the contents written by everything that preceded it on the
timeline and is the natural sampler input. When no layer in the frame has a backdrop draw,
the existing fast path runs: the frame renders directly to the swapchain and the backdrop
pipeline's working textures are never touched. Zero cost for backdrop-free frames.

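The fast-path decision described above can be sketched roughly as follows. This is a minimal illustration, not the renderer's code: `Layer`, `has_backdrop`, `Frame_Target`, and `pick_frame_target` are hypothetical names introduced here, and the real pipeline may track this per frame in a different way.

```odin
package backdrop_sketch

import "core:fmt"

// Hypothetical stand-in; the real layer type is not shown in this section.
Layer :: struct {
    has_backdrop: bool,
}

Frame_Target :: enum {
    Swapchain,      // fast path: render straight to the swapchain
    Source_Texture, // backdrop path: render to source_texture, copy once at end of frame
}

// Picks Source_Texture as soon as any layer in the frame contains a backdrop draw.
pick_frame_target :: proc(layers: []Layer) -> Frame_Target {
    for layer in layers {
        if layer.has_backdrop {
            return .Source_Texture
        }
    }
    return .Swapchain
}

main :: proc() {
    plain := []Layer{{has_backdrop = false}, {has_backdrop = false}}
    mixed := []Layer{{has_backdrop = false}, {has_backdrop = true}}
    fmt.println(pick_frame_target(plain))
    fmt.println(pick_frame_target(mixed))
}
```

Only the `Source_Texture` case pays for the end-of-frame `CopyGPUTextureToTexture`; the `Swapchain` case never touches the backdrop pipeline's working textures.
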
**Why not split the backdrop sub-passes into separate pipelines?** Each sub-pass is budgeted at ≤24
registers, well under Valhall's 32-register cliff, so there is no occupancy motivation for splitting.
@@ -628,81 +628,113 @@ all. Additionally, backdrop effects cover a small fraction of the frame's total
typical UI scales), so even if a sub-pass did cross a cliff, the occupancy variation within the
bracket would have negligible impact on frame time.

#### Bracket scheduling model
#### Bracket scheduling

The bracket is scheduled per layer, anchored at the first backdrop sub-batch in the layer's
submission order. Concretely, a layer with one or more backdrops splits into three groups:
Backdrop draws are scheduled via **explicit scopes**: every call to `backdrop_blur` must be wrapped
in a `begin_backdrop` / `end_backdrop` pair (or the RAII-style `backdrop_scope` wrapper). Each
scope produces exactly one bracket at render time. A layer may contain any number of scopes; draws
between scopes render at their submission position relative to the brackets, so the user controls
exactly which backdrops share a bracket.

1. **Pass A (pre-bracket)**: every non-backdrop sub-batch with index `< bracket_start_index`.
   Renders to `source_texture` in a single render pass.
2. **The bracket**: every backdrop sub-batch in the layer (regardless of index). Runs one
   downsample pass, then one (H-blur + V-composite) pass pair per unique sigma.
3. **Pass B (post-bracket)**: every non-backdrop sub-batch with index `>= bracket_start_index`.
   Renders to `source_texture` with `LOAD`, drawing on top of the composited backdrop output.

At render time, `draw_layer` walks the layer's sub-batch list once, alternating between two run
kinds:

`bracket_start_index` is the absolute index of the first `.Backdrop` kind in the layer's sub-batch
range. If the layer has no backdrops, none of this kicks in and the layer renders in a single render
pass via the existing fast path.

- **Non-backdrop runs** are rendered to `source_texture` in one render pass via
  `render_layer_sub_batch_range`. Clear-vs-load is tracked frame-globally via `GLOB.cleared`.
- **Backdrop runs** are dispatched to `run_backdrop_bracket` with their index range. Each run is
  one bracket; the bracket opens and closes its own render passes for downsample, H-blur, V-blur,
  and composite stages.

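The single walk over the sub-batch list can be pictured as a run-splitting loop. The sketch below is an illustration under assumptions: `Sub_Batch_Kind`, `Run`, and `split_into_runs` are hypothetical names, and the real `draw_layer` presumably dispatches each run as it finds it rather than collecting runs into an array first.

```odin
package backdrop_sketch

import "core:fmt"

// Hypothetical kinds; only .Backdrop is named in the text above.
Sub_Batch_Kind :: enum {
    Shapes,
    Text,
    Backdrop,
}

Run :: struct {
    is_backdrop: bool,
    start, end:  int, // half-open range [start, end) into the sub-batch list
}

// Splits a layer's sub-batch kinds into maximal runs of backdrop / non-backdrop entries.
// The caller owns the returned array and should delete it.
split_into_runs :: proc(kinds: []Sub_Batch_Kind) -> [dynamic]Run {
    runs: [dynamic]Run
    i := 0
    for i < len(kinds) {
        is_backdrop := kinds[i] == .Backdrop
        j := i + 1
        for j < len(kinds) && (kinds[j] == .Backdrop) == is_backdrop {
            j += 1
        }
        append(&runs, Run{is_backdrop = is_backdrop, start = i, end = j})
        i = j
    }
    return runs
}

main :: proc() {
    kinds := []Sub_Batch_Kind{.Shapes, .Shapes, .Backdrop, .Shapes, .Backdrop, .Backdrop, .Text}
    runs := split_into_runs(kinds)
    defer delete(runs)
    for run in runs {
        if run.is_backdrop {
            fmt.printf("bracket     [%d, %d)\n", run.start, run.end)
        } else {
            fmt.printf("render pass [%d, %d)\n", run.start, run.end)
        }
    }
}
```

In this model each backdrop run corresponds to one user scope, because the scope contract forbids non-backdrop draws while a scope is open.
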
Per-sigma-group execution. The bracket walks each layer's sub-batches and groups contiguous
`.Backdrop` sub-batches that share a sigma; each group picks its own downsample factor (1, 2, or 4)
based on `compute_backdrop_downsample_factor`. For each group it runs four sub-passes: a downsample
from `source_texture` to `downsample_texture`; an H-blur from `downsample_texture` to
`h_blur_texture`; a V-blur from `h_blur_texture` back into `downsample_texture` (ping-pong reuse);
and finally a composite that reads the fully-blurred `downsample_texture`, applies the SDF mask
and tint, and writes the result to `source_texture`. Sub-batch coalescing in
`append_or_extend_sub_batch` merges contiguous same-sigma backdrops into a single instanced
composite draw; non-contiguous same-sigma backdrops still share the blur output but issue separate
composite draws.
Within a bracket, the scheduler groups contiguous same-sigma sub-batches and runs four sub-passes
per group: downsample (`source_texture` → `downsample_texture`), H-blur (`downsample_texture` →
`h_blur_texture`), V-blur (`h_blur_texture` → `downsample_texture`, ping-pong reuse), and
composite (`downsample_texture` → `source_texture` with SDF mask and tint applied). Each group
picks its own downsample factor (1, 2, or 4) based on sigma; see the comment block at the top of
`backdrop.odin` for the factor-selection table.

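As a rough sketch of the per-group pass chain, the snippet below groups contiguous equal sigmas and lists the four sub-passes with their source and destination textures. `Sigma_Group`, `Sub_Pass`, `BRACKET_SUB_PASSES`, and `group_by_sigma` are hypothetical names for illustration; the downsample-factor selection is deliberately left out because its table lives in `backdrop.odin` and is not reproduced here.

```odin
package backdrop_sketch

import "core:fmt"

Sigma_Group :: struct {
    sigma:      f32,
    start, end: int, // half-open range [start, end) within the bracket's sub-batches
}

Sub_Pass :: struct {
    name, src, dst: string,
}

// The four sub-passes named in the text, in execution order.
BRACKET_SUB_PASSES := [4]Sub_Pass{
    {"downsample", "source_texture", "downsample_texture"},
    {"h_blur", "downsample_texture", "h_blur_texture"},
    {"v_blur", "h_blur_texture", "downsample_texture"}, // ping-pong reuse
    {"composite", "downsample_texture", "source_texture"},
}

// Groups contiguous entries that share a sigma (exact float equality is an assumption).
// The caller owns the returned array and should delete it.
group_by_sigma :: proc(sigmas: []f32) -> [dynamic]Sigma_Group {
    groups: [dynamic]Sigma_Group
    for sigma, i in sigmas {
        if n := len(groups); n > 0 && groups[n - 1].sigma == sigma {
            groups[n - 1].end = i + 1
            continue
        }
        append(&groups, Sigma_Group{sigma = sigma, start = i, end = i + 1})
    }
    return groups
}

main :: proc() {
    sigmas := []f32{12, 12, 4, 12}
    groups := group_by_sigma(sigmas)
    defer delete(groups)
    for group in groups {
        for pass in BRACKET_SUB_PASSES {
            fmt.printf("sigma %v [%d, %d): %s (%s -> %s)\n",
                group.sigma, group.start, group.end, pass.name, pass.src, pass.dst)
        }
    }
}
```

Under the contiguous-grouping rule, the trailing sigma of 12 forms its own group, so this input would run three groups of four sub-passes each.
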
The working textures are sized at the full swapchain resolution; larger downsample factors only
fill a sub-rect via viewport-limited rendering (see the comment block at the top of `backdrop.odin`
for the factor-selection table and rationale).
Sub-batch coalescing in `append_or_extend_sub_batch` merges contiguous same-sigma backdrops
sharing one scissor into a single instanced composite draw. Same-sigma backdrops separated by a
`ScissorStart` boundary stay in one sigma group (one set of blur passes) but issue separate
composite draws; the composite pass calls `SetGPUScissor` between draws when the active scissor
changes.

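The coalescing rule can be sketched as an append-or-extend step keyed on both sigma and scissor. This is an assumption-laden illustration, not the real `append_or_extend_sub_batch`: `Backdrop_Sub_Batch`, `scissor_id`, and `append_or_extend_backdrop` are hypothetical names.

```odin
package backdrop_sketch

import "core:fmt"

// Hypothetical record: one instanced composite draw.
Backdrop_Sub_Batch :: struct {
    sigma:          f32,
    scissor_id:     int, // stand-in for "which scissor rect is active"
    instance_count: int,
}

// Extends the last sub-batch when sigma and scissor both match; otherwise appends a new one.
append_or_extend_backdrop :: proc(batches: ^[dynamic]Backdrop_Sub_Batch, sigma: f32, scissor_id: int) {
    if n := len(batches^); n > 0 {
        last := &batches^[n - 1]
        if last.sigma == sigma && last.scissor_id == scissor_id {
            last.instance_count += 1
            return
        }
    }
    append(batches, Backdrop_Sub_Batch{sigma = sigma, scissor_id = scissor_id, instance_count = 1})
}

main :: proc() {
    batches: [dynamic]Backdrop_Sub_Batch
    defer delete(batches)
    append_or_extend_backdrop(&batches, 12, 0) // new draw
    append_or_extend_backdrop(&batches, 12, 0) // coalesces into the first draw
    append_or_extend_backdrop(&batches, 12, 1) // same sigma, new scissor: separate draw
    for batch in batches {
        fmt.printf("sigma %v scissor %d instances %d\n", batch.sigma, batch.scissor_id, batch.instance_count)
    }
}
```

The second and third entries share a sigma, so in the scheme described above they would still share one set of blur passes; only the composite draw is split.
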
#### Submission-order trade-off
Working textures are sized at full swapchain resolution; larger downsample factors fill a sub-rect
via viewport-limited rendering.

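To make the sub-rect idea concrete, the arithmetic might look like the sketch below. `downsample_extent` is a hypothetical helper, and rounding up is an assumption; the actual rounding rule is whatever `backdrop.odin` implements.

```odin
package backdrop_sketch

import "core:fmt"

// Size of the sub-rect a downsampled pass would fill inside a full-resolution working texture.
// Rounds up so odd dimensions are still fully covered (an assumption).
downsample_extent :: proc(full_w, full_h, factor: u32) -> (w, h: u32) {
    assert(factor == 1 || factor == 2 || factor == 4)
    w = (full_w + factor - 1) / factor
    h = (full_h + factor - 1) / factor
    return
}

main :: proc() {
    factors := [3]u32{1, 2, 4}
    for factor in factors {
        // For a 1920x1080 swapchain: 1920x1080, 960x540, and 480x270 respectively.
        w, h := downsample_extent(1920, 1080, factor)
        fmt.printf("factor %d -> viewport %dx%d\n", factor, w, h)
    }
}
```

Keeping the textures at full size and shrinking only the viewport means one allocation can serve every factor, at the price of unused texels at factors 2 and 4.
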
Within Pass A and Pass B, sub-batches render in the user's submission order. What the bracket model
sacrifices is _interleaved_ ordering between backdrop and non-backdrop content within a single
layer. A non-backdrop sub-batch submitted between two backdrops still renders in Pass B (after the
bracket), not at its submission position. Worked example:
#### Scope contract

```
draw.rectangle(layer, bg, GRAY)                // 0 Tessellated → Pass A
draw.rectangle(layer, card_blue, BLUE)         // 1 SDF → Pass A
draw.gaussian_blur(layer, panelA, sigma=12)    // 2 Backdrop → Bracket (sees: bg + blue card)
draw.rectangle(layer, card_red, RED)           // 3 SDF → Pass B (drawn ON TOP of panelA)
draw.gaussian_blur(layer, panelB, sigma=12)    // 4 Backdrop → Bracket (sees: bg + blue card; same as panelA)
draw.text(layer, "label", ...)                 // 5 Text → Pass B (drawn ON TOP of both panels)
```
Scope state is global: `GLOB.open_backdrop_layer` tracks the currently-open scope (or `nil`) for
the whole renderer. The five misuse cases panic via `log.panic` / `log.panicf`:

In this layer, panelB does _not_ see card_red (even though card_red was submitted before panelB)
because both backdrops sample `source_texture` as it stood at the bracket entry, which is after
Pass A and before card_red has rendered. card_red ends up on top of panelA, not underneath it.
1. `backdrop_blur` called outside an open scope.
2. A non-backdrop draw call issued on a layer with an open scope. Asserted at the top of
   `append_or_extend_sub_batch`.
3. `new_layer` called while a scope is open.
4. `end()` called while a scope is open.
5. `begin_backdrop` while one is already open, or `end_backdrop` on the wrong layer.

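A minimal sketch of how such a contract could be enforced is shown below. It covers cases 1, 2, and 5 only and is not the renderer's implementation: the `Layer` and `Global` definitions, `draw_non_backdrop`, and the parameter lists are simplified assumptions. The scope wrapper uses Odin's `@(deferred_in)` attribute, which is one plausible way to get the RAII-style behaviour of `backdrop_scope`.

```odin
package backdrop_sketch

import "core:fmt"
import "core:log"

// Hypothetical, heavily simplified stand-ins for the renderer's types.
Layer :: struct {
    name: string,
}

Global :: struct {
    open_backdrop_layer: ^Layer, // nil when no scope is open
}

GLOB: Global

begin_backdrop :: proc(layer: ^Layer) {
    if GLOB.open_backdrop_layer != nil {
        log.panic("begin_backdrop: a backdrop scope is already open") // case 5
    }
    GLOB.open_backdrop_layer = layer
}

end_backdrop :: proc(layer: ^Layer) {
    if GLOB.open_backdrop_layer != layer {
        log.panicf("end_backdrop: layer %s does not own the open scope", layer.name) // case 5
    }
    GLOB.open_backdrop_layer = nil
}

// Opens a scope and closes it automatically when the caller's block exits.
@(deferred_in = end_backdrop)
backdrop_scope :: proc(layer: ^Layer) {
    begin_backdrop(layer)
}

backdrop_blur :: proc(layer: ^Layer) {
    if GLOB.open_backdrop_layer == nil {
        log.panic("backdrop_blur: called outside an open scope") // case 1
    }
    // ... record a backdrop sub-batch here ...
}

draw_non_backdrop :: proc(layer: ^Layer) {
    if GLOB.open_backdrop_layer == layer {
        log.panic("non-backdrop draw issued on a layer with an open scope") // case 2
    }
    // ... record a regular sub-batch here ...
}

main :: proc() {
    base := Layer{name = "base"}
    draw_non_backdrop(&base)
    {
        backdrop_scope(&base)
        backdrop_blur(&base)
    } // end_backdrop(&base) runs here
    draw_non_backdrop(&base)
    fmt.println("scope open after block:", GLOB.open_backdrop_layer != nil)
}
```

Panicking rather than returning an error keeps misuse loud during development, which fits a contract that callers are expected never to violate.
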
The user controls the alternative outcome by splitting layers. Putting card_red and panelB into a
new layer (via `draw.new_layer`) gives panelB a fresh source snapshot that includes panelA and
card_red:
Worked example with two scopes on the same layer:

```
base := draw.begin(...)
draw.rectangle(base, bg, GRAY)
draw.rectangle(base, card_blue, BLUE)
draw.gaussian_blur(base, panelA, sigma=12)     // panelA in base layer's bracket

top := draw.new_layer(base, ...)
draw.rectangle(top, card_red, RED)
draw.gaussian_blur(top, panelB, sigma=12)      // top layer's bracket; sees base + card_red
draw.text(top, "label", ...)

{
    draw.backdrop_scope(base)
    draw.backdrop_blur(base, panelA, sigma=12) // bracket 1: sees bg + blue card
}

draw.rectangle(base, card_red, RED)            // renders ON TOP of panelA's composite

{
    draw.backdrop_scope(base)
    draw.backdrop_blur(base, panelB, sigma=12) // bracket 2: sees bg + blue card + panelA + card_red
}

draw.text(base, "label", ...)                  // renders ON TOP of panelB's composite
```

Why one bracket per layer and not one per backdrop? Each bracket adds three render passes
(downsample + H-blur + V-composite) and at least three tile-cache flushes on tilers like Mali
Valhall. Strict submission-order semantics would require one bracket per cluster of contiguous
backdrops, which scales the GPU cost linearly with how interleaved the user's submission happens
to be, which is a footgun. The current design caps the bracket cost per layer regardless of submission
interleave, and gives the user explicit control over ordering through the existing layer
abstraction. This matches the cost/complexity envelope of iOS `UIVisualEffectView` and CSS
`backdrop-filter` (both of which constrain backdrop ordering implicitly).
Each bracket adds four render passes (downsample + H-blur + V-blur + composite) plus tile-cache
flushes on tilers like Mali Valhall, so users who don't need interleaving should group backdrops
into a single scope to amortize:

```
{
    draw.backdrop_scope(base)
    draw.backdrop_blur(base, panelA, sigma=12) // shares one bracket with panelB;
    draw.backdrop_blur(base, panelB, sigma=12) // same sigma also coalesces into one
}                                              // instanced composite draw call
```

#### Clay integration: `Backdrop_Marker`

Clay has no notion of backdrops. The integration uses Clay's only extension point (the opaque
`customData: rawptr` on `clay.CustomElementConfig`) to carry a magic-number-tagged struct that
`prepare_clay_batch` recognizes:

```
Backdrop_Marker :: struct {
    magic:      u32, // BACKDROP_MARKER_MAGIC (0x42445054, 'BDPT')
    sigma:      f32,
    tint:       Color,
    radii:      Rectangle_Radii,
    feather_px: f32,
}
```

The user populates a `Backdrop_Marker` (with stable lifetime through the `prepare_clay_batch`
call) and points the corresponding `clay.CustomElementConfig.customData` at it.
`prepare_clay_batch` walks Clay's command stream once, calling `is_clay_backdrop` per command
(a u32 magic check on `customData`'s first 4 bytes). On a hit it opens a backdrop scope (or
extends an open one) and dispatches via `backdrop_blur`. Non-backdrop commands issued during an
open scope go to a deferred index buffer for replay after the scope closes; this preserves Clay's
painter's-algorithm ordering across backdrops without violating the scope contract.

The magic-number sentinel keeps the marker type self-describing in core dumps and decouples the
integration from Clay-side changes. Zero-init memory has `magic = 0`, so a marker with a forgotten
magic field gets routed through the regular `custom_draw` path and surfaces as "my custom draw
never fired" rather than as a silent backdrop schedule.

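The magic check itself is small enough to sketch. The version below is an illustration rather than the real `is_clay_backdrop`: `is_backdrop_marker` is a hypothetical name, it takes the raw pointer directly instead of a Clay command, and the marker struct is trimmed to the fields needed for the demonstration. It assumes any non-nil `customData` points at least 4 readable bytes.

```odin
package backdrop_sketch

import "core:fmt"

BACKDROP_MARKER_MAGIC :: u32(0x42445054) // 'BDPT'

// Trimmed copy of the marker; tint, radii, and feather_px are omitted here.
Backdrop_Marker :: struct {
    magic: u32,
    sigma: f32,
}

// Hypothetical unrelated payload a user might attach to a custom element.
Other_Custom_Data :: struct {
    id: u32,
}

// Reads the first 4 bytes behind custom_data and compares them to the magic value.
is_backdrop_marker :: proc(custom_data: rawptr) -> bool {
    if custom_data == nil {
        return false
    }
    header := cast(^u32)custom_data
    return header^ == BACKDROP_MARKER_MAGIC
}

main :: proc() {
    tagged := Backdrop_Marker{magic = BACKDROP_MARKER_MAGIC, sigma = 12}
    forgotten := Backdrop_Marker{sigma = 12} // magic left at zero
    other := Other_Custom_Data{id = 7}

    fmt.println(is_backdrop_marker(&tagged))
    fmt.println(is_backdrop_marker(&forgotten))
    fmt.println(is_backdrop_marker(&other))
    fmt.println(is_backdrop_marker(nil))
}
```

Only the first call should report a hit; the zero-magic marker falls through to the regular custom-draw path, as the paragraph above describes.
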
### Vertex layout