Backdrop scope implementation (#25)

Co-authored-by: Zachary Levy <zachary@sunforge.is>
Reviewed-on: #25
2026-05-02 01:31:58 +00:00
parent 5317b8f142
commit e8ffa28de3
5 changed files with 633 additions and 349 deletions
@@ -614,11 +614,11 @@ require, keeping each sub-pass well under the 32-register cliff.
 frame renders into `source_texture` (a full-resolution single-sample texture owned by the
 backdrop pipeline) instead of directly into the swapchain. At the end of the frame,
 `source_texture` is copied to the swapchain via a single `CopyGPUTextureToTexture` call.
-This means the bracket has no mid-frame texture copy: by the time the bracket runs,
-`source_texture` already contains the pre-bracket frame contents and is the natural sampler
-input. When no layer in the frame has a backdrop draw, the existing fast path runs: the frame
-renders directly to the swapchain and the backdrop pipeline's working textures are never
-touched. Zero cost for backdrop-free frames.
+This means each bracket has no mid-frame texture copy: by the time a bracket runs,
+`source_texture` already contains the contents written by everything that preceded it on the
+timeline and is the natural sampler input. When no layer in the frame has a backdrop draw,
+the existing fast path runs: the frame renders directly to the swapchain and the backdrop
+pipeline's working textures are never touched. Zero cost for backdrop-free frames.

 **Why not split the backdrop sub-passes into separate pipelines?** Each sub-pass is budgeted at ≤24
 registers, well under Valhall's 32-register cliff, so there is no occupancy motivation for splitting.
@@ -628,81 +628,113 @@ all. Additionally, backdrop effects cover a small fraction of the frame's total
 typical UI scales), so even if a sub-pass did cross a cliff, the occupancy variation within the
 bracket would have negligible impact on frame time.

-#### Bracket scheduling model
+#### Bracket scheduling

-The bracket is scheduled per layer, anchored at the first backdrop sub-batch in the layer's
-submission order. Concretely, a layer with one or more backdrops splits into three groups:
+Backdrop draws are scheduled via **explicit scopes**: every call to `backdrop_blur` must be wrapped
+in a `begin_backdrop` / `end_backdrop` pair (or the RAII-style `backdrop_scope` wrapper). Each
+scope produces exactly one bracket at render time. A layer may contain any number of scopes; draws
+between scopes render at their submission position relative to the brackets, so the user controls
+exactly which backdrops share a bracket.

-1. **Pass A (pre-bracket)** — every non-backdrop sub-batch with index `< bracket_start_index`.
-   Renders to `source_texture` in a single render pass.
-2. **The bracket** — every backdrop sub-batch in the layer (regardless of index). Runs one
-   downsample pass, then one (H-blur + V-composite) pass pair per unique sigma.
-3. **Pass B (post-bracket)** — every non-backdrop sub-batch with index `>= bracket_start_index`.
-   Renders to `source_texture` with `LOAD`, drawing on top of the composited backdrop output.
+At render time, `draw_layer` walks the layer's sub-batch list once, alternating between two run
+kinds:

-`bracket_start_index` is the absolute index of the first `.Backdrop` kind in the layer's sub-batch
-range. If the layer has no backdrops, none of this kicks in and the layer renders in a single render
-pass via the existing fast path.
+- **Non-backdrop runs** are rendered to `source_texture` in one render pass via
+  `render_layer_sub_batch_range`. Clear-vs-load is tracked frame-globally via `GLOB.cleared`.
+- **Backdrop runs** are dispatched to `run_backdrop_bracket` with their index range. Each run is
+  one bracket; the bracket opens and closes its own render passes for downsample, H-blur, V-blur,
+  and composite stages.

-Per-sigma-group execution. The bracket walks each layer's sub-batches and groups contiguous
-`.Backdrop` sub-batches that share a sigma; each group picks its own downsample factor (1, 2, or 4)
-based on `compute_backdrop_downsample_factor`. For each group it runs four sub-passes: a downsample
-from `source_texture` to `downsample_texture`; an H-blur from `downsample_texture` to
-`h_blur_texture`; a V-blur from `h_blur_texture` back into `downsample_texture` (ping-pong reuse);
-and finally a composite that reads the fully-blurred `downsample_texture`, applies the SDF mask
-and tint, and writes the result to `source_texture`. Sub-batch coalescing in
-`append_or_extend_sub_batch` merges contiguous same-sigma backdrops into a single instanced
-composite draw; non-contiguous same-sigma backdrops still share the blur output but issue separate
-composite draws.
+Within a bracket, the scheduler groups contiguous same-sigma sub-batches and runs four sub-passes
+per group: downsample (`source_texture` → `downsample_texture`), H-blur (`downsample_texture` →
+`h_blur_texture`), V-blur (`h_blur_texture` → `downsample_texture`, ping-pong reuse), and
+composite (`downsample_texture` → `source_texture` with SDF mask and tint applied). Each group
+picks its own downsample factor (1, 2, or 4) based on sigma; see the comment block at the top of
+`backdrop.odin` for the factor-selection table.

-The working textures are sized at the full swapchain resolution; larger downsample factors only
-fill a sub-rect via viewport-limited rendering (see the comment block at the top of `backdrop.odin`
-for the factor-selection table and rationale).
+Sub-batch coalescing in `append_or_extend_sub_batch` merges contiguous same-sigma backdrops
+sharing one scissor into a single instanced composite draw. Same-sigma backdrops separated by a
+`ScissorStart` boundary stay in one sigma group (one set of blur passes) but issue separate
+composite draws; the composite pass calls `SetGPUScissor` between draws when the active scissor
+changes.

-#### Submission-order trade-off
+Working textures are sized at full swapchain resolution; larger downsample factors fill a sub-rect
+via viewport-limited rendering.

-Within Pass A and Pass B, sub-batches render in the user's submission order. What the bracket model
-sacrifices is _interleaved_ ordering between backdrop and non-backdrop content within a single
-layer. A non-backdrop sub-batch submitted between two backdrops still renders in Pass B (after the
-bracket), not at its submission position. Worked example:
+#### Scope contract

-```
-draw.rectangle(layer, bg, GRAY)                 // 0  Tessellated  → Pass A
-draw.rectangle(layer, card_blue, BLUE)          // 1  SDF          → Pass A
-draw.gaussian_blur(layer, panelA, sigma=12)     // 2  Backdrop     → Bracket  (sees: bg + blue card)
-draw.rectangle(layer, card_red, RED)            // 3  SDF          → Pass B   (drawn ON TOP of panelA)
-draw.gaussian_blur(layer, panelB, sigma=12)     // 4  Backdrop     → Bracket  (sees: bg + blue card; same as panelA)
-draw.text(layer, "label", ...)                  // 5  Text         → Pass B   (drawn ON TOP of both panels)
-```
+Scope state is global: `GLOB.open_backdrop_layer` tracks the currently-open scope (or `nil`) for
+the whole renderer. Each of the following five misuse cases panics via `log.panic` / `log.panicf`:

-In this layer, panelB does _not_ see card_red — even though card_red was submitted before panelB —
-because both backdrops sample `source_texture` as it stood at the bracket entry, which is after
-Pass A and before card_red has rendered. card_red ends up on top of panelA, not underneath it.
+1. `backdrop_blur` called outside an open scope.
+2. A non-backdrop draw call issued on a layer with an open scope. Asserted at the top of
+   `append_or_extend_sub_batch`.
+3. `new_layer` called while a scope is open.
+4. `end()` called while a scope is open.
+5. `begin_backdrop` called while a scope is already open, or `end_backdrop` called on the wrong
+   layer.

-The user controls the alternative outcome by splitting layers. Putting card_red and panelB into a
-new layer (via `draw.new_layer`) gives panelB a fresh source snapshot that includes panelA and
-card_red:
+Worked example with two scopes on the same layer:

 ```
 base := draw.begin(...)
 draw.rectangle(base, bg, GRAY)
 draw.rectangle(base, card_blue, BLUE)
-draw.gaussian_blur(base, panelA, sigma=12)   // panelA in base layer's bracket

-top := draw.new_layer(base, ...)
-draw.rectangle(top, card_red, RED)
-draw.gaussian_blur(top, panelB, sigma=12)    // top layer's bracket; sees base + card_red
-draw.text(top, "label", ...)
+{
+    draw.backdrop_scope(base)
+    draw.backdrop_blur(base, panelA, sigma=12)      // bracket 1: sees bg + blue card
+}
+
+draw.rectangle(base, card_red, RED)                  // renders ON TOP of panelA's composite
+
+{
+    draw.backdrop_scope(base)
+    draw.backdrop_blur(base, panelB, sigma=12)      // bracket 2: sees bg + blue card + panelA + card_red
+}
+
+draw.text(base, "label", ...)                        // renders ON TOP of panelB's composite
 ```

-Why one bracket per layer and not one per backdrop? Each bracket adds three render passes
-(downsample + H-blur + V-composite) and at least three tile-cache flushes on tilers like Mali
-Valhall. Strict submission-order semantics would require one bracket per cluster of contiguous
-backdrops, which scales the GPU cost linearly with how interleaved the user's submission happens
-to be — a footgun. The current design caps the bracket cost per layer regardless of submission
-interleave, and gives the user explicit control over ordering through the existing layer
-abstraction. This matches the cost/complexity envelope of iOS `UIVisualEffectView` and CSS
-`backdrop-filter` (both of which constrain backdrop ordering implicitly).
+Each bracket adds four render passes per sigma group (downsample + H-blur + V-blur + composite)
+plus tile-cache flushes on tilers like Mali Valhall, so users who don't need interleaving should
+group backdrops into a single scope to amortize that cost:
+
+```
+{
+    draw.backdrop_scope(base)
+    draw.backdrop_blur(base, panelA, sigma=12)      // shares one bracket with panelB;
+    draw.backdrop_blur(base, panelB, sigma=12)      // same sigma also coalesces into one
+}                                                    // instanced composite draw call
+```
+
+#### Clay integration: `Backdrop_Marker`
+
+Clay has no notion of backdrops. The integration uses Clay's only extension point — the opaque
+`customData: rawptr` on `clay.CustomElementConfig` — to carry a magic-number-tagged struct that
+`prepare_clay_batch` recognizes:
+
+```
+Backdrop_Marker :: struct {
+    magic:      u32,  // BACKDROP_MARKER_MAGIC (0x42445054, 'BDPT')
+    sigma:      f32,
+    tint:       Color,
+    radii:      Rectangle_Radii,
+    feather_px: f32,
+}
+```
+
+The user populates a `Backdrop_Marker` (with stable lifetime through the `prepare_clay_batch`
+call) and points the corresponding `clay.CustomElementConfig.customData` at it.
+`prepare_clay_batch` walks Clay's command stream once, calling `is_clay_backdrop` per command
+(a u32 magic check on `customData`'s first 4 bytes). On a hit it opens a backdrop scope (or
+extends an open one) and dispatches via `backdrop_blur`. Non-backdrop commands issued during an
+open scope go to a deferred index buffer for replay after the scope closes; this preserves Clay's
+painter's-algorithm ordering across backdrops without violating the scope contract.
+
+The magic-number sentinel keeps the marker type self-describing in core dumps and decouples the
+integration from Clay-side changes. Zero-init memory has `magic = 0`, so a marker with a forgotten
+magic field gets routed through the regular `custom_draw` path and surfaces as "my custom draw
+never fired" rather than as a silent backdrop schedule.

 ### Vertex layout