Organization & cleanup

Zachary Levy
2026-04-30 16:52:55 -07:00
parent 16989cbb71
commit fd64bc01bf
13 changed files with 269 additions and 258 deletions
+97 -82
@@ -15,10 +15,10 @@ modes dispatched by a push constant:
shader premultiplies the texture sample (`t.rgb *= t.a`) and computes `out = color * t`.
- **Mode 1 (SDF):** A static 6-vertex unit-quad buffer is drawn instanced, with per-primitive
`Primitive` structs (80 bytes each) uploaded each frame to a GPU storage buffer. The vertex shader
reads `primitives[gl_InstanceIndex]`, computes world-space position from unit quad corners +
`Base_2D_Primitive` structs (96 bytes each) uploaded each frame to a GPU storage buffer. The vertex
shader reads `primitives[gl_InstanceIndex]`, computes world-space position from unit quad corners +
primitive bounds. The fragment shader dispatches on `Shape_Kind` (encoded in the low byte of
`Primitive.flags`) to evaluate one of four signed distance functions:
`Base_2D_Primitive.flags`) to evaluate one of four signed distance functions:
- **RRect** (kind 1) — `sdRoundedBox` with per-corner radii. Covers rectangles (sharp or rounded),
circles (uniform radii = half-size), and line segments / capsules (rotated RRect with uniform
radii = half-thickness). Covers filled, outlined, textured, and gradient-filled variants.
@@ -28,21 +28,22 @@ modes dispatched by a push constant:
normals. Covers full rings, partial arcs, and pie slices (`inner_radius = 0`).
All SDF shapes support fill, outline, solid color, 2-color linear gradients, 2-color radial
gradients, and texture fills via `Shape_Flags` (see `pipeline_2d_base.odin`). Gradient and outline
parameters are packed into the same 16 bytes as the texture UV rect via a `Uv_Or_Effects` raw union
— zero size increase to the 80-byte `Primitive` struct. Gradient/outline and texture are mutually
exclusive.
gradients, and texture fills via `Shape_Flags` (see `pipeline_2d_base.odin`). The texture UV rect
(`uv_rect: [4]f32`) and the gradient/outline parameters (`effects: Gradient_Outline`) live in their
own 16-byte slots in `Base_2D_Primitive`, so a primitive can carry texture and outline simultaneously.
Gradient and texture remain mutually exclusive at the fill-source level (a Brush variant chooses one
or the other) since they share the worst-case fragment-shader register path.
All SDF shapes produce mathematically exact curves with analytical anti-aliasing via `smoothstep`:
no tessellation, no piecewise-linear approximation. A rounded rectangle is 1 primitive (80 bytes)
no tessellation, no piecewise-linear approximation. A rounded rectangle is 1 primitive (96 bytes)
instead of ~250 vertices (~5000 bytes).
The main pipeline's register budget is **≤24 registers** (see "Main/effects split: register pressure"
in the pipeline plan below for the full cliff/margin analysis and SBC architecture context). The
fragment shader's estimated peak footprint is ~22–26 fp32 VGPRs (~16–22 fp16 VGPRs on architectures
in the pipeline plan below for the full cliff/margin analysis and SBC architecture context).
The fragment shader's estimated peak footprint is ~22–26 fp32 VGPRs (~16–22 fp16 VGPRs on architectures
with native mediump) via manual live-range analysis. The dominant peak is the Ring_Arc kind path
(wedge normals + inner/outer radii + dot-product temporaries live simultaneously with carried state
like `f_color`, `f_uv_or_effects`, and `half_size`). RRect is 1–2 regs lower (`corner_radii` vec4
like `f_color`, `f_uv_rect`/`f_effects`, and `half_size`). RRect is 1–2 regs lower (`corner_radii` vec4
replaces the separate inner/outer + normal pairs). NGon and Ellipse are lighter still. Real compilers
apply live-range coalescing, mediump-to-fp16 promotion, and rematerialization that typically shave
2–4 regs from hand-counted estimates — the conservative 26-reg upper bound is expected to compile
@@ -439,12 +440,13 @@ vertex shader branches on this uniform to select the tessellated or SDF code pat
- **Tessellated mode** (`mode = 0`): direct vertex buffer with explicit geometry. Used for text
(SDL_ttf atlas sampling), triangles, triangle fans/strips, single-pixel points, and any
user-provided raw vertex geometry.
- **SDF mode** (`mode = 1`): shared unit-quad vertex buffer + GPU storage buffer of `Primitive`
structs, drawn instanced. Used for all shapes with closed-form signed distance functions.
- **SDF mode** (`mode = 1`): shared unit-quad vertex buffer + GPU storage buffer of
`Base_2D_Primitive` structs, drawn instanced. Used for all shapes with closed-form signed distance
functions.
Both modes use the same fragment shader. The fragment shader checks `Shape_Kind` (low byte of
`Primitive.flags`): kind 0 (`Solid`) is the tessellated path, which premultiplies the texture sample
and computes `out = color * t`; kinds 1–4 dispatch to one of four SDF functions (RRect, NGon,
`Base_2D_Primitive.flags`): kind 0 (`Solid`) is the tessellated path, which premultiplies the texture
sample and computes `out = color * t`; kinds 1–4 dispatch to one of four SDF functions (RRect, NGon,
Ellipse, Ring_Arc) and apply gradient/texture/outline/solid color based on `Shape_Flags` bits.
#### Why SDF for shapes
@@ -452,8 +454,8 @@ Ellipse, Ring_Arc) and apply gradient/texture/outline/solid color based on `Shap
CPU-side adaptive tessellation for curved shapes (the current approach) has three problems:
1. **Vertex bandwidth.** A rounded rectangle with four corner arcs produces ~250 vertices × 20 bytes
= 5 KB. An SDF rounded rectangle is one `Primitive` struct (~56 bytes) plus 4 shared unit-quad
vertices. That is roughly a 90× reduction per shape.
= 5 KB. An SDF rounded rectangle is one `Base_2D_Primitive` struct (96 bytes) plus 4 shared
unit-quad vertices. That is roughly a 50× reduction per shape.
2. **Quality.** Tessellated curves are piecewise-linear approximations. At high DPI or under
animation/zoom, faceting is visible at any practical segment count. SDF evaluation produces
@@ -484,14 +486,14 @@ SDF primitives are submitted via a GPU storage buffer indexed by `gl_InstanceInd
shader, rather than encoding per-primitive data redundantly in vertex attributes. This follows the
pattern used by both Zed GPUI and vger-rs.
Each SDF shape is described by a single `Primitive` struct (80 bytes) in the storage buffer. The
vertex shader reads `primitives[gl_InstanceIndex]`, computes the quad corner position from the unit
vertex and the primitive's bounds, and passes shape parameters to the fragment shader via `flat`
interpolated varyings.
Each SDF shape is described by a single `Base_2D_Primitive` struct (96 bytes) in the storage
buffer. The vertex shader reads `primitives[gl_InstanceIndex]`, computes the quad corner position
from the unit vertex and the primitive's bounds, and passes shape parameters to the fragment shader
via `flat` interpolated varyings.
Compared to encoding per-primitive data in vertex attributes (the "fat vertex" approach), storage-
buffer instancing eliminates the 4–6× data duplication across quad corners. A rounded rectangle costs
80 bytes instead of 4 vertices × 40+ bytes = 160+ bytes.
96 bytes instead of 4 vertices × 60+ bytes = 240+ bytes.
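The per-shape byte counts quoted in this document can be checked with a few lines of arithmetic. The sketch below uses only figures stated in the text (about 250 vertices at 20 bytes each, a 96-byte primitive, 4 fat vertices at 60 bytes or more); the constant names are illustrative and are not identifiers from the renderer.

```odin
package main

import "core:fmt"

// Figures taken from the text; the names are illustrative only.
TESSELLATED_VERTEX_COUNT :: 250 // approximate vertices for a tessellated rounded rectangle
TESSELLATED_VERTEX_BYTES :: 20  // bytes per vertex on the tessellated path
FAT_VERTEX_COUNT         :: 4   // quad corners in the "fat vertex" approach
FAT_VERTEX_BYTES         :: 60  // lower bound; the text says "60+"
PRIMITIVE_BYTES          :: 96  // one Base_2D_Primitive in the storage buffer

main :: proc() {
	tessellated := TESSELLATED_VERTEX_COUNT * TESSELLATED_VERTEX_BYTES // 5000
	fat_vertex  := FAT_VERTEX_COUNT * FAT_VERTEX_BYTES                 // 240

	fmt.println("tessellated bytes:   ", tessellated)
	fmt.println("fat-vertex bytes:    ", fat_vertex)
	fmt.println("storage-buffer bytes:", PRIMITIVE_BYTES)

	// Integer division: 5000 / 96 = 52, consistent with "roughly 50x".
	fmt.println("tessellated / SDF:   ", tessellated / PRIMITIVE_BYTES)
}
```

The shared unit-quad vertices are left out of the SDF figure because they are uploaded once and reused by every instance.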
The tessellated path retains the existing direct vertex buffer layout (20 bytes/vertex, no storage
buffer access). The vertex shader branch on `mode` (push constant) is warp-uniform — every invocation
@@ -499,15 +501,18 @@ in a draw call has the same mode — so it is effectively free on all modern GPU
#### Shape kinds and SDF dispatch
The fragment shader dispatches on `Shape_Kind` (low byte of `Primitive.flags`) to evaluate one of
four signed distance functions. The `Shape_Kind` enum and per-kind `*_Params` structs are defined in
`pipeline_2d_base.odin`. CPU-side drawing procs in `shapes.odin` build the appropriate `Primitive`
and set the kind automatically:
The fragment shader dispatches on `Shape_Kind` (low byte of `Base_2D_Primitive.flags`) to evaluate
one of four signed distance functions. The `Shape_Kind` enum and per-kind `*_Params` structs are
defined in `pipeline_2d_base.odin`. CPU-side drawing procs in `shapes.odin` build the appropriate
`Base_2D_Primitive` and set the kind automatically:
Each user-facing shape proc accepts a `Brush` union (color, linear gradient, radial gradient,
or textured fill) as its fill source, plus optional outline parameters. The procs map to SDF
kinds as follows:
| User-facing proc | Shape_Kind | SDF function | Notes |
| -------------------- | ---------- | ------------------ | ---------------------------------------------------------- |
| `rectangle` | `RRect` | `sdRoundedBox` | Per-corner radii from `radii` param |
| `rectangle_texture` | `RRect` | `sdRoundedBox` | Textured fill; `.Textured` flag set |
| `circle` | `RRect` | `sdRoundedBox` | Uniform radii = half-size (circle is a degenerate RRect) |
| `line`, `line_strip` | `RRect` | `sdRoundedBox` | Rotated capsule — stadium shape (radii = half-thickness) |
| `ellipse` | `Ellipse` | `sdEllipseApprox` | Approximate ellipse SDF (fast, suitable for UI) |
@@ -599,20 +604,21 @@ to is a hard GPU constraint; the only way to satisfy it is to end the current re
a new one. That render-pass boundary is what a “bracket” is.
**Multi-pass implementation.** Backdrop effects are implemented as separable multi-pass sequences
(downsample → horizontal blur → vertical-blur+composite), following the standard approach used by
iOS `UIVisualEffectView`, Android `RenderEffect`, and Flutter's `BackdropFilter`. Each individual
(downsample → horizontal blur → vertical blur → composite), following the standard approach used
by iOS `UIVisualEffectView`, Android `RenderEffect`, and Flutter's `BackdropFilter`. Each individual
sub-pass is budgeted at **≤24 registers** (same as the main pipeline — full Valhall occupancy). The
multi-pass approach avoids the monolithic 70+ register shader that a single-pass Gaussian blur would
require, keeping each sub-pass well under the 32-register cliff.
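To illustrate why the separable form stays small, each blur sub-pass only needs a one-dimensional set of weights. The following is a minimal sketch of such a kernel, assuming truncation at ±3σ followed by renormalization (as described in the `compute_blur_kernel` comment in `backdrop.odin`). It is not the renderer's routine: `gaussian_weights` and `MAX_RADIUS` are hypothetical names, and the real kernel additionally pairs taps for bilinear fetches.

```odin
package main

import "core:fmt"
import "core:math"

// Hypothetical cap on the kernel half-width, for this sketch only.
MAX_RADIUS :: 64

// Fills weights[0..=radius] with a one-sided, normalized Gaussian and returns radius.
// weights[0] is the center tap; weights[i] applies to both the +i and -i offsets.
gaussian_weights :: proc(sigma: f32, weights: ^[MAX_RADIUS + 1]f32) -> (radius: int) {
	if sigma <= 0 {
		weights[0] = 1 // degenerate kernel: sharp pass-through
		return 0
	}
	radius = min(int(math.ceil(3 * sigma)), MAX_RADIUS) // truncate at +/-3 sigma
	total: f32 = 0
	for i in 0 ..= radius {
		x := f32(i)
		w := math.exp(-(x * x) / (2 * sigma * sigma))
		weights[i] = w
		total += w if i == 0 else 2 * w // off-center taps are used twice
	}
	for i in 0 ..= radius {
		weights[i] /= total // renormalize so the full kernel sums to 1
	}
	return radius
}

main :: proc() {
	weights: [MAX_RADIUS + 1]f32
	radius := gaussian_weights(2.0, &weights)

	// Sum the full (two-sided) kernel to confirm it is normalized.
	sum := weights[0]
	for i in 1 ..= radius {
		sum += 2 * weights[i]
	}
	fmt.println("radius:", radius, "sum:", sum)
}
```

Running the same weights once horizontally and once vertically gives the 2-D Gaussian, which is what lets each sub-pass stay a short loop rather than a full 2-D convolution.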
**Approach B: render-target choice.** When any layer in the frame contains a backdrop draw, the
entire frame renders into `source_texture` (a full-resolution single-sample texture owned by the
backdrop pipeline) instead of directly into the swapchain. At the end of the frame, `source_texture`
is copied to the swapchain via a single `CopyGPUTextureToTexture` call. This means the bracket has
no mid-frame texture copy: by the time the bracket runs, `source_texture` already contains the pre-
bracket frame contents and is the natural sampler input. When no layer in the frame has a backdrop
draw, the existing fast path runs: the frame renders directly to the swapchain and the backdrop
pipeline's working textures are never touched. Zero cost for backdrop-free frames.
**Render-target choice.** When any layer in the frame contains a backdrop draw, the entire
frame renders into `source_texture` (a full-resolution single-sample texture owned by the
backdrop pipeline) instead of directly into the swapchain. At the end of the frame,
`source_texture` is copied to the swapchain via a single `CopyGPUTextureToTexture` call.
This means the bracket has no mid-frame texture copy: by the time the bracket runs,
`source_texture` already contains the pre-bracket frame contents and is the natural sampler
input. When no layer in the frame has a backdrop draw, the existing fast path runs: the frame
renders directly to the swapchain and the backdrop pipeline's working textures are never
touched. Zero cost for backdrop-free frames.
**Why not split the backdrop sub-passes into separate pipelines?** Each sub-pass is budgeted at ≤24
registers, well under Valhall's 32-register cliff, so there is no occupancy motivation for splitting.
@@ -638,13 +644,20 @@ submission order. Concretely, a layer with one or more backdrops splits into thr
range. If the layer has no backdrops, none of this kicks in and the layer renders in a single render
pass via the existing fast path.
The downsample runs once per layer, not once per sigma: it just copies `source_texture` to a ¼-
resolution working texture and doesn't depend on the kernel. Each unique sigma in the layer triggers
one H-blur (reads `downsample_texture`, writes `h_blur_texture`) and one V-composite (reads
`h_blur_texture`, writes `source_texture` per-primitive with the SDF mask). Sub-batch coalescing in
`append_or_extend_sub_batch` merges contiguous same-sigma backdrops into a single instanced V-
composite draw call; non-contiguous same-sigma backdrops still share the H-blur output but issue
separate V-composite draws.
Per-sigma-group execution. The bracket walks each layer's sub-batches and groups contiguous
`.Backdrop` sub-batches that share a sigma; each group picks its own downsample factor (1, 2, or 4)
based on `compute_backdrop_downsample_factor`. For each group it runs four sub-passes: a downsample
from `source_texture` to `downsample_texture`; an H-blur from `downsample_texture` to
`h_blur_texture`; a V-blur from `h_blur_texture` back into `downsample_texture` (ping-pong reuse);
and finally a composite that reads the fully-blurred `downsample_texture`, applies the SDF mask
and tint, and writes the result to `source_texture`. Sub-batch coalescing in
`append_or_extend_sub_batch` merges contiguous same-sigma backdrops into a single instanced
composite draw; non-contiguous same-sigma backdrops still share the blur output but issue separate
composite draws.
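The grouping step can be pictured as a run-length pass over the sigmas of a layer's backdrop sub-batches. The sketch below is a simplification under stated assumptions: `Sigma_Run` and `group_by_sigma` are hypothetical names, the input is just the ordered list of backdrop sigmas, and "contiguous" is taken to mean adjacent in that list. The actual bracket code in `backdrop.odin` may differ.

```odin
package main

import "core:fmt"

// Hypothetical record for one run of adjacent backdrop sub-batches sharing a sigma.
Sigma_Run :: struct {
	first: int, // index of the first backdrop sub-batch in the run
	count: int, // number of sub-batches in the run
	sigma: f32,
}

// Groups adjacent equal sigmas into runs. The caller owns the returned array.
group_by_sigma :: proc(sigmas: []f32) -> [dynamic]Sigma_Run {
	runs: [dynamic]Sigma_Run
	for sigma, i in sigmas {
		n := len(runs)
		if n > 0 && runs[n - 1].sigma == sigma {
			runs[n - 1].count += 1 // extend the current run
		} else {
			append(&runs, Sigma_Run{first = i, count = 1, sigma = sigma})
		}
	}
	return runs
}

main :: proc() {
	sigmas := []f32{12, 12, 4, 12}
	runs := group_by_sigma(sigmas)
	defer delete(runs)
	for run in runs {
		fmt.println(run)
	}
}
```

With the input above, the two leading sigma-12 entries form one run, while the trailing sigma-12 entry starts a new run because a different sigma sits between them.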
The working textures are sized at the full swapchain resolution; larger downsample factors only
fill a sub-rect via viewport-limited rendering (see the comment block at the top of `backdrop.odin`
for the factor-selection table and rationale).
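As a rough illustration of the sizing, the sketch below computes the sub-rect that a given downsample factor fills and the memory taken by full-resolution RGBA8 working textures. The proc names are hypothetical, rounding up for odd dimensions is an assumption of this sketch, and the texture count of 2 follows the memory-cost note in the `backdrop.odin` header comment.

```odin
package main

import "core:fmt"

BYTES_PER_TEXEL :: 4 // RGBA8

// Size in texels of the sub-rect that a downsample factor (1, 2, or 4) fills.
// Rounds up so odd dimensions stay fully covered (an assumption of this sketch).
work_region :: proc(width, height, factor: int) -> (w, h: int) {
	w = (width + factor - 1) / factor
	h = (height + factor - 1) / factor
	return
}

// Bytes needed for `count` full-resolution RGBA8 textures.
working_texture_bytes :: proc(width, height, count: int) -> int {
	return width * height * BYTES_PER_TEXEL * count
}

main :: proc() {
	w, h := work_region(1920, 1080, 4)
	fmt.println("factor 4 work region:", w, "x", h)                                   // 480 x 270
	fmt.println("1080p, 2 textures:", working_texture_bytes(1920, 1080, 2), "bytes") // 16588800
	fmt.println("4K, 2 textures:   ", working_texture_bytes(3840, 2160, 2), "bytes") // 66355200
}
```

The byte counts line up with the "roughly 16 MB at 1080p, 64 MB at 4K" figures quoted in the source comment.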
#### Submission-order trade-off
@@ -656,9 +669,9 @@ bracket), not at its submission position. Worked example:
```
draw.rectangle(layer, bg, GRAY) // 0 Tessellated → Pass A
draw.rectangle(layer, card_blue, BLUE) // 1 SDF → Pass A
draw.rectangle_backdrop(layer, panelA, 12) // 2 Backdrop → Bracket (sees: bg + blue card)
draw.gaussian_blur(layer, panelA, sigma=12) // 2 Backdrop → Bracket (sees: bg + blue card)
draw.rectangle(layer, card_red, RED) // 3 SDF → Pass B (drawn ON TOP of panelA)
draw.rectangle_backdrop(layer, panelB, 12) // 4 Backdrop → Bracket (sees: bg + blue card; same as panelA)
draw.gaussian_blur(layer, panelB, sigma=12) // 4 Backdrop → Bracket (sees: bg + blue card; same as panelA)
draw.text(layer, "label", ...) // 5 Text → Pass B (drawn ON TOP of both panels)
```
@@ -674,11 +687,11 @@ card_red:
base := draw.begin(...)
draw.rectangle(base, bg, GRAY)
draw.rectangle(base, card_blue, BLUE)
draw.rectangle_backdrop(base, panelA, 12) // panelA in base layer's bracket
draw.gaussian_blur(base, panelA, sigma=12) // panelA in base layer's bracket
top := draw.new_layer(base, ...)
draw.rectangle(top, card_red, RED)
draw.rectangle_backdrop(top, panelB, 12) // top layer's bracket; sees base + card_red
draw.gaussian_blur(top, panelB, sigma=12) // top layer's bracket; sees base + card_red
draw.text(top, "label", ...)
```
@@ -708,29 +721,30 @@ draws, `position` carries actual world-space geometry. For SDF draws, `position`
corners (0,0 to 1,1) and the vertex shader computes world-space position from the storage-buffer
primitive's bounds.
The `Primitive` struct for SDF shapes lives in the storage buffer, not in vertex attributes:
The `Base_2D_Primitive` struct for SDF shapes lives in the storage buffer, not in vertex attributes:
```
Primitive :: struct {
Base_2D_Primitive :: struct {
bounds: [4]f32, // 0: min_x, min_y, max_x, max_y
color: Color, // 16: u8x4, unpacked in shader via unpackUnorm4x8
flags: u32, // 20: low byte = Shape_Kind, bits 8+ = Shape_Flags
rotation_sc: u32, // 24: packed f16 pair (sin, cos). Requires .Rotated flag.
_pad: f32, // 28: reserved for future use
params: Shape_Params, // 32: per-kind params union (half_feather, radii, etc.) (32 bytes)
uv: Uv_Or_Effects, // 64: texture UV rect or gradient/outline parameters (16 bytes)
uv_rect: [4]f32, // 64: texture UV coordinates. Read when .Textured.
effects: Gradient_Outline, // 80: gradient and/or outline parameters (16 bytes).
}
// Total: 80 bytes (std430 aligned)
// Total: 96 bytes (std430 aligned)
```
`Shape_Params` is a `#raw_union` over `RRect_Params`, `NGon_Params`, `Ellipse_Params`, and
`Ring_Arc_Params` (plus a `raw: [8]f32` view), defined in `pipeline_2d_base.odin`. Each SDF kind
writes its own params variant; the fragment shader reads the appropriate fields based on `Shape_Kind`.
`Uv_Or_Effects` is a `#raw_union` that aliases `[4]f32` (texture UV rect: u_min, v_min, u_max,
v_max) with a `Gradient_Outline` struct containing `gradient_color: Color`, `outline_color: Color`,
`Gradient_Outline` is a 16-byte struct containing `gradient_color: Color`, `outline_color: Color`,
`gradient_dir_sc: u32` (packed f16 cos/sin pair), and `outline_packed: u32` (packed f16 outline
width). The `flags` field encodes the `Shape_Kind` in the low byte and `Shape_Flags` in bits 8+
via `pack_kind_flags`.
width). It is independent of `uv_rect`, so a primitive can carry texture and outline parameters at
the same time. The `flags` field encodes the `Shape_Kind` in the low byte and `Shape_Flags` in bits
8+ via `pack_kind_flags`.
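The layout and flag packing described above can be sanity-checked at compile time. The following is a sketch, not the definitions in `pipeline_2d_base.odin`: the ordering of kinds 2 to 4, the `Gradient` and `Outline` flag names, all flag bit positions, and the body of `pack_kind_flags` are assumptions, and `params` is a plain `[8]f32` standing in for the `Shape_Params` raw union.

```odin
package main

import "core:fmt"

Color :: [4]u8

// Kind 0 is Solid and kind 1 is RRect per the text; the order of the rest is assumed.
Shape_Kind :: enum u8 {
	Solid    = 0,
	RRect    = 1,
	NGon     = 2,
	Ellipse  = 3,
	Ring_Arc = 4,
}

// .Textured and .Rotated are named in the text; other members and bit positions are placeholders.
Shape_Flag :: enum u8 {
	Textured,
	Rotated,
	Gradient,
	Outline,
}
Shape_Flags :: bit_set[Shape_Flag; u8]

Gradient_Outline :: struct {
	gradient_color:  Color,
	outline_color:   Color,
	gradient_dir_sc: u32,
	outline_packed:  u32,
}

Base_2D_Primitive :: struct {
	bounds:      [4]f32,
	color:       Color,
	flags:       u32,
	rotation_sc: u32,
	_pad:        f32,
	params:      [8]f32, // stands in for the 32-byte Shape_Params raw union
	uv_rect:     [4]f32,
	effects:     Gradient_Outline,
}

// Compile-time checks against the offsets listed in the struct comments.
#assert(size_of(Gradient_Outline) == 16)
#assert(size_of(Base_2D_Primitive) == 96)
#assert(offset_of(Base_2D_Primitive, params) == 32)
#assert(offset_of(Base_2D_Primitive, uv_rect) == 64)
#assert(offset_of(Base_2D_Primitive, effects) == 80)

// Assumed packing: kind in the low byte, flag bits shifted up by 8.
pack_kind_flags :: proc(kind: Shape_Kind, flags: Shape_Flags) -> u32 {
	return u32(kind) | (u32(transmute(u8)flags) << 8)
}

unpack_kind :: proc(packed: u32) -> Shape_Kind {
	return Shape_Kind(u8(packed & 0xFF))
}

main :: proc() {
	packed := pack_kind_flags(.RRect, {.Textured, .Outline})
	fmt.printf("flags = 0x%x, kind = %v\n", packed, unpack_kind(packed))
}
```

Keeping the `#assert` lines next to the struct means a field added without updating the shader-side layout fails the build rather than silently shifting offsets.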
### Draw submission order
@@ -754,7 +768,7 @@ pair into bitmap atlases and emits indexed triangle data via `GetGPUTextDrawData
**unchanged** by the SDF migration — text continues to flow through the main pipeline's tessellated
mode with `mode = 0`, sampling the SDL_ttf atlas texture.
A future phase may evaluate MSDF (multi-channel signed distance field) text rendering, which would
MSDF (multi-channel signed distance field) text rendering may be evaluated later; it would
allow resolution-independent glyph rendering from a single small atlas per font. This would involve:
- Offline atlas generation via Chlumský's msdf-atlas-gen tool.
@@ -763,8 +777,7 @@ allow resolution-independent glyph rendering from a single small atlas per font.
already exists for the four current SDF kinds).
- Potential removal of the SDL_ttf dependency.
This is explicitly deferred. The SDF shape migration is independent of and does not block text
changes.
This is explicitly deferred.
**References:**
@@ -778,8 +791,8 @@ changes.
### Textures
Textures plug into the existing main pipeline — no additional GPU pipeline, no shader rewrite. The
work is a resource layer (registration, upload, sampling, lifecycle) plus two textured-draw procs
that route into the existing tessellated and SDF paths respectively.
work is a resource layer (registration, upload, sampling, lifecycle) plus a `Texture_Fill` Brush
variant that routes the existing shape procs through the SDF path with the `.Textured` flag set.
#### Why draw owns registered textures
@@ -829,22 +842,25 @@ with the same texture but different samplers produce separate draw calls, which
#### Textured draw procs
Textured rectangles route through the existing SDF path via `rectangle_texture`, which mirrors
`rectangle` exactly — same parameters for radii, origin, rotation, feather — with the `color`
parameter replaced by a `Texture_Id`, an optional `tint`, a `uv_rect`, and a `Sampler_Preset`.
Textures share the same shape procs as colors and gradients. Each shape proc takes a `Brush`
union as its fill source; passing a `Texture_Fill` value (carrying `Texture_Id`, `tint`,
`uv_rect`, and `Sampler_Preset`) routes the draw through the SDF path with the `.Textured`
flag set. There is no dedicated `rectangle_texture` / `circle_texture` proc — the same
`rectangle`, `circle`, `ellipse`, `polygon`, `ring`, `line`, and `line_strip` procs handle
all fill sources.
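A sketch of how a single proc might branch on its fill source is shown below. Everything here is a simplified, hypothetical stand-in: the gradient field names, the `Sampler_Preset` members, `Fill_Info`, and `fill_info` are assumptions for illustration, and only the four `Brush` variants and the `Texture_Fill` contents (`Texture_Id`, `tint`, `uv_rect`, `Sampler_Preset`) come from the text.

```odin
package main

import "core:fmt"

// Hypothetical stand-ins; not the renderer's actual definitions.
Color          :: [4]u8
Texture_Id     :: distinct u32
Sampler_Preset :: enum { Linear_Clamp, Nearest_Clamp }

Linear_Gradient :: struct { start, end: Color, angle: f32 }
Radial_Gradient :: struct { inner, outer: Color }
Texture_Fill    :: struct {
	texture: Texture_Id,
	tint:    Color,
	uv_rect: [4]f32,
	sampler: Sampler_Preset,
}

Brush :: union {
	Color,
	Linear_Gradient,
	Radial_Gradient,
	Texture_Fill,
}

Fill_Info :: struct {
	textured: bool,   // whether the .Textured flag would be set
	uv_rect:  [4]f32, // only meaningful when textured
}

// Resolves a brush into the bits a shape proc would need to fill in.
fill_info :: proc(brush: Brush) -> Fill_Info {
	switch b in brush {
	case Color, Linear_Gradient, Radial_Gradient:
		// Non-textured fills: the UV rect is unused.
	case Texture_Fill:
		return Fill_Info{textured = true, uv_rect = b.uv_rect}
	}
	return Fill_Info{}
}

main :: proc() {
	solid: Brush = Color{255, 0, 0, 255}
	image: Brush = Texture_Fill{texture = 1, tint = {255, 255, 255, 255}, uv_rect = {0, 0, 1, 1}}
	fmt.println(fill_info(solid).textured, fill_info(image).textured)
}
```

Because the variant is chosen by the value passed in, adding a new fill source means adding a union member and one `case`, with no new proc names.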
An earlier iteration of this design considered a separate tessellated proc for "simple" fullscreen
quads, on the theory that the tessellated path's lower register count would improve occupancy at
large fragment counts. Both paths are well within the ≤24-register main pipeline budget — both run at
full occupancy on every target architecture (Valhall and above). The remaining ALU difference (~15
extra instructions for the SDF evaluation) amounts to ~20μs at 4K — below noise. Meanwhile,
splitting into a separate pipeline would add ~15μs per pipeline bind on the CPU side per scissor,
matching or exceeding the GPU-side savings. Within the main pipeline, unified remains strictly better.
A separate tessellated proc for "simple" fullscreen quads was considered on the theory that
the tessellated path's lower register count would improve occupancy at large fragment counts.
Both paths are well within the ≤24-register main pipeline budget — both run at full
occupancy on every target architecture (Valhall and above). The remaining ALU difference
(~15 extra instructions for the SDF evaluation) amounts to ~20μs at 4K — below noise.
Meanwhile, splitting into a separate pipeline would add ~15μs per pipeline bind on the CPU
side per scissor, matching or exceeding the GPU-side savings. Within the main pipeline,
unified remains strictly better.
SDF drawing procs live in the `draw` package with unprefixed names (`rectangle`, `rectangle_texture`,
`circle`, `ellipse`, `polygon`, `ring`, `line`, `line_strip`). Gradients and outlines are optional
parameters on each proc rather than separate overloads. Future per-shape texture variants
(`circle_texture`, `ellipse_texture`) are additive.
SDF drawing procs live in the `draw` package with unprefixed names (`rectangle`, `circle`,
`ellipse`, `polygon`, `ring`, `line`, `line_strip`). Gradients, textures, and outlines are
selected via the `Brush` union and optional outline parameters rather than separate overloads.
#### What SDF anti-aliasing does and does not do for textured draws
@@ -858,8 +874,8 @@ depends on how closely the display size matches the SDL_ttf atlas's rasterized s
#### Fit modes are a computation layer, not a renderer concept
Standard image-fit behaviors (stretch, fill/cover, fit/contain, tile, center) are expressed as UV
sub-region computations on top of the `uv_rect` parameter that both textured-draw procs accept. The
renderer has no knowledge of fit modes — it samples whatever UV region it is given.
sub-region computations on top of the `uv_rect` field of `Texture_Fill`. The renderer has no
knowledge of fit modes — it samples whatever UV region it is given.
A `fit_params` helper computes the appropriate `uv_rect`, sampler preset, and (for letterbox/fit
mode) shrunken inner rect from a `Fit_Mode` enum, the target rect, and the texture's pixel size.
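For example, a fill/cover fit reduces to cropping the UV rect so the visible part of the texture has the target's aspect ratio. The sketch below shows that computation only; `cover_uv_rect` is a hypothetical helper, not the renderer's `fit_params`, and it assumes a centered crop with UVs in the 0 to 1 range.

```odin
package main

import "core:fmt"

// Returns {u_min, v_min, u_max, v_max} for a "fill/cover" fit: the texture is scaled
// uniformly until it covers the target, and the overflow is cropped symmetrically.
cover_uv_rect :: proc(target_w, target_h, tex_w, tex_h: f32) -> [4]f32 {
	target_aspect := target_w / target_h
	tex_aspect := tex_w / tex_h
	if tex_aspect > target_aspect {
		// Texture is relatively wider: crop left and right.
		visible := target_aspect / tex_aspect // fraction of the texture width shown
		margin := (1 - visible) / 2
		return [4]f32{margin, 0, 1 - margin, 1}
	}
	// Texture is relatively taller (or equal): crop top and bottom.
	visible := tex_aspect / target_aspect // fraction of the texture height shown
	margin := (1 - visible) / 2
	return [4]f32{0, margin, 1, 1 - margin}
}

main :: proc() {
	// A 200x100 texture covering a 100x100 target shows only its middle half.
	fmt.println(cover_uv_rect(100, 100, 200, 100))
}
```

The resulting rect would be passed as the `uv_rect` of a `Texture_Fill`, so the renderer itself never needs to know which fit mode produced it.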
@@ -883,13 +899,13 @@ textures onto a free list that is processed in `r_end_frame`, not at the call si
Clay's `RenderCommandType.Image` is handled by dereferencing `imageData: rawptr` as a pointer to a
`Clay_Image_Data` struct containing a `Texture_Id`, `Fit_Mode`, and tint color. Routing mirrors the
existing rectangle handling: `fit_params` computes UVs from the fit mode, then
`rectangle_texture` is called with the appropriate radii (zero for sharp corners, per-corner values
from Clay's `cornerRadius` otherwise).
existing rectangle handling: `fit_params` computes UVs from the fit mode, then `rectangle` is
called with a `Texture_Fill` brush and the appropriate radii (zero for sharp corners, per-corner
values from Clay's `cornerRadius` otherwise).
#### Deferred features
The following are plumbed in the descriptor but not implemented in phase 1:
The following are plumbed in `Texture_Desc` but not yet implemented:
- **Mipmaps**: `Texture_Desc.mip_levels` field exists; generation via SDL3 deferred.
- **Compressed formats**: `Texture_Desc.format` accepts BC/ASTC; upload path deferred.
@@ -897,7 +913,6 @@ The following are plumbed in the descriptor but not implemented in phase 1:
- **3D textures, arrays, cube maps**: `Texture_Desc.type` and `depth_or_layers` fields exist.
- **Additional samplers**: anisotropic, trilinear, clamp-to-border — additive enum values.
- **Atlas packing**: internal optimization for sub-batch coalescing; invisible to callers.
- **Per-shape texture variants**: `circle_texture`, `ellipse_texture`, `polygon_texture` — potential future additions, following the existing naming convention.
**References:**
+33 -32
@@ -21,16 +21,16 @@ import sdl "vendor:sdl3"
// sigma_phys ≤ 8 → factor = 2
// sigma_phys > 8 → factor = 4 (capped)
//
// Capped at factor=4: master's preference for visual quality over bandwidth at the high end.
// Larger factors (8 and 16) would lose more high-frequency detail than the kernel can mask
// even with the H+V split, and the bandwidth saving is small (the work region also shrinks
// quadratically, so most of the savings are already captured at factor=4).
// Capped at factor=4 to favor visual quality over bandwidth at the high end. Larger factors
// (8 and 16) would lose more high-frequency detail than the kernel can mask even with the
// H+V split, and the bandwidth saving is small (the work region also shrinks quadratically,
// so most of the savings are already captured at factor=4).
//
// Working textures are sized at full swapchain resolution to support factor=1. Larger factors
// just write to a smaller sub-rect via viewport-limited rendering. Memory cost: ½-res → full-
// res working textures means 4× more bytes per working texture (2 textures, RGBA8: roughly
// 16 MB at 1080p, 64 MB at 4K). On modern GPUs this is well within budget; on Mali Valhall
// SBCs it's negligible against unified-memory headroom.
// just write to a smaller sub-rect via viewport-limited rendering. Memory cost: full-res
// working textures (2 textures, RGBA8) is roughly 16 MB at 1080p, 64 MB at 4K. On modern
// GPUs this is well within budget; on Mali Valhall SBCs it's negligible against unified-
// memory headroom.
//
// The shaders read the factor as a uniform. The downsample shader has three paths (factor=1
// identity, factor=2 single bilinear tap, factor>=4 four bilinear taps with offsets scaling
@@ -86,7 +86,7 @@ Backdrop_Vert_Uniforms :: struct {
// shaders/source/backdrop_downsample.frag.
Backdrop_Downsample_Frag_Uniforms :: struct {
inv_source_size: [2]f32, // 0: 8 — 1.0 / source_texture pixel dimensions (full-res)
downsample_factor: u32, // 8: 4 — 2 or 4 (selects 1-tap vs 4-tap path in shader)
downsample_factor: u32, // 8: 4 — 1, 2, or 4 (selects identity / 1-tap / 4-tap path in shader)
_pad0: u32, // 12: 4
}
@@ -120,11 +120,12 @@ Pipeline_2D_Backdrop :: struct {
primitive_buffer: Buffer,
// Working textures, allocated once at swapchain resolution and recreated only on resize.
// `source_texture` is full-resolution; the other two are ¼-res. All single-sample.
// All three are sized at full swapchain resolution and single-sample. Larger downsample
// factors fill only a sub-rect via viewport-limited rendering (see file-header comment).
// source_texture — when any backdrop draw exists this frame, the entire frame renders
// here instead of the swapchain (Approach B). Copied to the swapchain
// at frame end. Acts as the bracket's snapshot input by virtue of
// already containing the pre-bracket frame.
// here instead of the swapchain. Copied to the swapchain at frame
// end. Acts as the bracket's snapshot input by virtue of already
// containing the pre-bracket frame.
// downsample_texture — written by the downsample PSO. Read by the blur PSO in mode 0.
// h_blur_texture — written by the blur PSO in mode 0. Read by the blur PSO in mode 1.
source_texture: ^sdl.GPUTexture,
@@ -243,7 +244,7 @@ create_pipeline_2d_backdrop :: proc(
//----- Downsample PSO ----------------------------------
// Single bilinear sample, blend disabled. No vertex buffer (gl_VertexIndex 0..2 emits the
// fullscreen triangle). Single-sample target (the ¼-res working textures are never MSAA).
// fullscreen triangle). Single-sample target (working textures are never MSAA).
downsample_target := sdl.GPUColorTargetDescription {
format = swapchain_format,
blend_state = sdl.GPUColorTargetBlendState{enable_blend = false},
@@ -350,9 +351,9 @@ destroy_pipeline_2d_backdrop :: proc(device: ^sdl.GPUDevice, pipeline: ^Pipeline
// ---------------------------------------------------------------------------------------------------------------------
// Allocate (or reallocate, on resize) the three working textures that the backdrop bracket
// uses. `source_texture` is full swapchain resolution; the other two are ¼-res. All single-
// sample, all share the swapchain format, all need {.COLOR_TARGET, .SAMPLER} usage so they
// can be written by render passes and read by subsequent passes.
// uses. All three are sized at full swapchain resolution, single-sample, share the swapchain
// format, and need {.COLOR_TARGET, .SAMPLER} usage so they can be written by render passes
// and read by subsequent passes.
//
// Recreates on dimension change only — same-size frames hit the early-out and skip GPU
// resource churn.
@@ -466,19 +467,19 @@ ensure_backdrop_textures :: proc(device: ^sdl.GPUDevice, format: sdl.GPUTextureF
// `i in [1, pair_count)` and does two texture fetches per pair — one at +offset, one at
// -offset — for a total of 1 + 2*(pair_count-1) bilinear fetches per fragment.
//
// `sigma` is the true Gaussian standard deviation in the kernel's working-space units (¼-res
// texels, after the caller has converted from logical pixels via dpi_scaling and the
// downsample factor). The kernel extent reaches ±3σ, capturing 99.7% of the Gaussian's
// `sigma` is the true Gaussian standard deviation in the kernel's working-space units
// (working-resolution texels, after the caller has converted from logical pixels via
// dpi_scaling and the downsample factor). The kernel extent reaches ±3σ, capturing 99.7% of
// the Gaussian's
// mass; weights beyond that contribute imperceptibly. sigma <= 0 produces a degenerate
// kernel `{1, 0}` that acts as a sharp pass-through. After the loop, the discrete weights
// are normalized so they sum to 1.0 (truncating at ±3σ loses a tiny amount of mass; we
// renormalize to preserve overall image brightness).
//
// Earlier versions of this routine ported RAD Debugger's algorithm verbatim, which derives
// stdev from a tap-count parameter (`stdev = (blur_count-1)/2`). That made the parameter
// name misleading: the user thought they were passing σ but were actually passing
// half-kernel-width. This version takes σ directly and derives the tap count from it,
// matching what callers expect when they read "gaussian_sigma".
// Note on the parameter contract: this routine takes σ directly and derives the tap count
// from it, rather than the inverse (RAD Debugger's algorithm passes a tap count and derives
// `stdev = (blur_count-1)/2`). Taking σ directly matches what callers expect when they read
// "gaussian_sigma" — passing tap count under that name was a footgun.
@(private)
compute_blur_kernel :: proc(sigma: f32, kernel: ^[MAX_BACKDROP_KERNEL_PAIRS][4]f32) -> (pair_count: u32) {
if sigma <= 0 {
@@ -624,7 +625,7 @@ upload_backdrop_primitives :: proc(device: ^sdl.GPUDevice, pass: ^sdl.GPUCopyPas
// ---------------------------------------------------------------------------------------------------------------------
// Returns true if any sub-batch in any layer this frame is .Backdrop kind. Called once at the
// top of `end()` to decide whether to route the whole frame to source_texture (Approach B).
// top of `end()` to decide whether to route the whole frame to source_texture.
// O(total sub-batches) but with an early-exit on the first hit, so typical cost is tiny.
@(private)
frame_has_backdrop :: proc() -> bool {
@@ -742,10 +743,10 @@ compute_backdrop_group_work_region :: proc(
// target viewport, per-primitive SDF discard handles masking and applies the tint. Each
// sub-batch in the group is one instanced draw.
//
// V-blur was historically combined with the composite into a single shader invocation, but
// that produced a horizontal-vs-vertical asymmetry artifact (horizontal source features
// looked sharper than vertical ones inside the panel). Splitting V-blur into its own
// working→working pass restores symmetry by making H and V blurs structurally identical.
// V-blur is run as its own working→working pass rather than folded into the composite. The
// folded variant produces a horizontal-vs-vertical asymmetry artifact (horizontal source
// features end up looking sharper than vertical ones inside the panel). Matching V's
// structure exactly to H's restores symmetry.
//
// On exit, source_texture contains the pre-bracket contents plus all backdrop primitives
// composited on top. The caller then runs Pass B (post-bracket non-backdrop sub-batches) on
@@ -1011,8 +1012,8 @@ run_backdrop_bracket :: proc(
// geometry. The caller sets `color` (tint) on the returned primitive before submitting.
//
// No rotation, no outline — backdrop primitives are intentionally limited to axis-aligned
// RRects in v1. Rotation breaks screen-space blur sampling visually; outline would be a
// specialized edge effect that belongs in its own primitive type.
// RRects. Rotation breaks screen-space blur sampling visually; outline would be a specialized
// edge effect that belongs in its own primitive type.
@(private)
build_backdrop_primitive :: proc(
rect: Rectangle,
+5 -5
@@ -830,9 +830,9 @@ end :: proc(device: ^sdl.GPUDevice, window: ^sdl.Window, clear_color: Color = DF
}
// Pre-scan: if any layer this frame has a backdrop sub-batch, route the entire frame to
// source_texture (Approach B) so the bracket can sample the pre-bracket framebuffer
// without a mid-frame texture copy. Frames without any backdrop hit the existing fast
// path and never touch the backdrop pipeline's working textures.
// source_texture so the bracket can sample the pre-bracket framebuffer without a
// mid-frame texture copy. Frames without any backdrop hit the existing fast path and
// never touch the backdrop pipeline's working textures.
has_backdrop := frame_has_backdrop()
// Upload primitives to GPU (vertices, indices, SDF prims, and backdrop prims share one
@@ -880,8 +880,8 @@ end :: proc(device: ^sdl.GPUDevice, window: ^sdl.Window, clear_color: Color = DF
draw_layer(device, window, cmd_buffer, render_texture, width, height, clear_color_f32, &layer)
}
// Approach B finalization: when we rendered into source_texture, copy it to the swapchain.
// Single CopyGPUTextureToTexture call per frame, only when backdrop content was present.
// When we rendered into source_texture, copy it to the swapchain. Single
// CopyGPUTextureToTexture call per frame, only when backdrop content was present.
if has_backdrop {
copy_pass := sdl.BeginGPUCopyPass(cmd_buffer)
sdl.CopyGPUTextureToTexture(
+1 -1
@@ -20,7 +20,7 @@ texture_size :: #force_inline proc(qrcode_buf: []u8) -> int {
//
// Returns ok=false when:
// - qrcode_buf is invalid (qrcode.get_size returns 0).
// - texture_buf is smaller than to_texture_size(qrcode_buf).
// - texture_buf is smaller than texture_size(qrcode_buf).
@(require_results)
to_texture :: proc(
qrcode_buf: []u8,
+4 -5
@@ -10,8 +10,8 @@ import cyber "../cybersteel"
// Backdrop example.
//
// Verifies the Stage D bracket scheduler end-to-end. The demo is structured as three zones in
// one window so we can stress-test the cases that matter:
// Exercises the bracket scheduler end-to-end. The demo is structured as three zones in one
// window so we can stress-test the cases that matter:
//
// Zone 1 (top, base layer): animated colorful background + two side-by-side frosted panels
// with DIFFERENT sigmas and DIFFERENT tints. Tests sigma grouping
@@ -269,9 +269,8 @@ gaussian_blur :: proc() {
// SPACE : reset to sigma=10
// T : toggle the test rectangle on top of the panel
//
// Sigma is printed to the console label and to the title bar so you can correlate visual
// behavior with kernel state (which is also logged via the [backdrop] debug print in
// backdrop.odin's compute_blur_kernel callsite).
// Sigma is printed to the title bar so you can correlate visual behavior with the numeric
// value as you adjust it.
gaussian_blur_debug :: proc() {
if !sdl.Init({.VIDEO}) do os.exit(1)
window := sdl.CreateWindow("Backdrop debug", 800, 600, {.HIGH_PIXEL_DENSITY})
+3 -3
@@ -116,9 +116,9 @@ Gradient_Outline :: struct {
// avoiding per-pixel trigonometry in the fragment shader. Only read when .Rotated is set.
//
// Named Base_2D_Primitive (not just Primitive) to disambiguate from Backdrop_Primitive in
// pipeline_2d_backdrop.odin. The two pipelines have unrelated GPU layouts and unrelated
// fragment-shader contracts; pairing each with its own primitive type keeps cross-references
// unambiguous when grepping the codebase.
// backdrop.odin. The two pipelines have unrelated GPU layouts and unrelated fragment-shader
// contracts; pairing each with its own primitive type keeps cross-references unambiguous
// when grepping the codebase.
Base_2D_Primitive :: struct {
bounds: [4]f32, // 0: min_x, min_y, max_x, max_y (world-space, pre-DPI)
color: Color, // 16: u8x4, fill color / gradient start color / texture tint
+22 -23
@@ -1,19 +1,18 @@
#version 450 core
// Unified backdrop blur fragment shader.
// Handles both H-blur (mode 0, blurs the ¼-resolution downsample texture into
// the ¼-resolution h_blur texture) and V-blur+composite (mode 1, blurs h_blur
// vertically, masks via RRect SDF, applies tint, composites outline, and writes
// to the main render target with premultiplied alpha).
// Handles both the 1D separable blur passes (mode 0, used for BOTH the H-pass and V-pass;
// `direction` picks the axis) and the composite pass (mode 1, reads the fully-blurred
// working texture, masks via RRect SDF, applies tint, and writes to source_texture with
// premultiplied-over blending). Working textures are sized at the full swapchain resolution;
// downsampled content occupies only a sub-rect at downsample factor > 1 (set via viewport).
//
// Following RAD's pattern, V-mode replaces a separate composite pass: the SDF
// discard limits V-blur work to the masked region, and the per-primitive tint
// is folded in. Output blends with the main render target via the standard
// premultiplied-over blend state (ONE, ONE_MINUS_SRC_ALPHA).
// The composite blends with source_texture via the standard premultiplied-over blend state
// (ONE, ONE_MINUS_SRC_ALPHA).
//
// Backdrop primitives are tint-only — there is no outline. A specialized edge
// effect (e.g. liquid-glass-style refraction outlines) would be implemented
// as a dedicated primitive type with its own pipeline.
// Backdrop primitives are tint-only — there is no outline. A specialized edge effect
// (e.g. liquid-glass-style refraction outlines) would be implemented as a dedicated
// primitive type with its own pipeline.
//
// Two modes, structurally distinct:
//
@@ -30,11 +29,11 @@
// (gl_FragCoord.xy * inv_downsample_factor) * inv_working_size.
// No kernel is applied here — the blur is already complete.
//
// Splitting V-blur out of the composite pass (an earlier version combined them) was needed
// to avoid a horizontal-vs-vertical asymmetry artifact: when the V-blur sampled the H-blur
// output through the bilinear-upsample/SDF-mask/tint pipeline in one shader invocation,
// horizontal source features ended up looking sharper than vertical ones. Running V-blur as
// its own working→working pass (matching H's structure exactly) restores symmetry.
// V-blur is run as its own working→working pass rather than folded into the composite. The
// folded variant produced a horizontal-vs-vertical asymmetry artifact: when V-blur sampled
// the H-blur output through the bilinear-upsample/SDF-mask/tint pipeline in one shader
// invocation, horizontal source features ended up looking sharper than vertical ones.
// Matching V's structure exactly to H's restores symmetry.
const uint MAX_KERNEL_PAIRS = 32;
@@ -140,16 +139,16 @@ void main() {
vec2 uv = (gl_FragCoord.xy * inv_downsample_factor) * inv_working_size;
vec3 color = texture(blur_input_tex, uv).rgb;
// Tint composition (Option B semantics): inside the masked region the panel is fully
// opaque — it completely hides the original framebuffer content, just like real frosted
// glass and like iOS UIBlurEffect / CSS backdrop-filter. f_color.rgb specifies the tint
// color; f_color.a specifies the tint *mix strength* (NOT panel opacity). At alpha=0 we
// see the pure blur; at alpha=255 we see the blur fully multiplied by the tint color.
// Tint composition: inside the masked region the panel is fully opaque — it completely
// hides the original framebuffer content, just like real frosted glass and like iOS
// UIBlurEffect / CSS backdrop-filter. f_color.rgb specifies the tint color; f_color.a
// specifies the tint *mix strength* (NOT panel opacity). At alpha=0 we see the pure
// blur; at alpha=255 we see the blur fully multiplied by the tint color.
//
// Output is premultiplied to match the ONE, ONE_MINUS_SRC_ALPHA blend state. Coverage
// (the SDF mask's edge AA) modulates only the alpha channel, never the panel-vs-source
// blend; that way edge pixels still feather correctly without re-introducing the bug
// where mid-panel pixels became semi-transparent.
// blend; that way edge pixels still feather correctly while mid-panel pixels stay fully
// opaque.
mediump vec3 tinted = mix(color, color * f_color.rgb, f_color.a);
mediump float coverage = sdf_alpha(d_n, h_n);
out_color = vec4(tinted * coverage, coverage);
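For reference, a CPU-side sketch (not part of this commit) of the tint math in the shader lines above, combined with the `ONE, ONE_MINUS_SRC_ALPHA` blend the comment names. The proc names are hypothetical, and `strength` stands for `f_color.a` already normalized to 0..1.

```odin
package main

import "core:fmt"

// Mirrors the shader: mix(color, color * tint, strength), then premultiply by coverage.
tint_composite :: proc(blur, tint_rgb: [3]f32, strength, coverage: f32) -> [4]f32 {
	tinted := blur + (blur * tint_rgb - blur) * strength
	pm := tinted * coverage
	return [4]f32{pm[0], pm[1], pm[2], coverage}
}

// Premultiplied-over blend: result = src.rgb * ONE + dst.rgb * (1 - src.a).
blend_over :: proc(src: [4]f32, dst: [3]f32) -> [3]f32 {
	s := [3]f32{src[0], src[1], src[2]}
	return s + dst * (1 - src[3])
}

main :: proc() {
	blur := [3]f32{0.5, 0.5, 0.5}
	dst := [3]f32{0, 1, 0}

	// Full coverage: the destination contributes nothing, whatever the tint strength.
	inside := tint_composite(blur, {1, 0, 0}, 1, 1)
	fmt.println("inside:", blend_over(inside, dst))

	// Edge pixel: coverage feathers between panel and destination.
	edge := tint_composite(blur, {1, 0, 0}, 0, 0.5)
	fmt.println("edge:  ", blend_over(edge, dst))
}
```

Only `coverage` reaches the alpha channel, so tint strength never makes mid-panel pixels translucent.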
+12 -11
@@ -1,18 +1,19 @@
#version 450 core
// Unified backdrop blur vertex shader.
// Handles both H-blur (fullscreen triangle, mode 0) and V-blur+composite (instanced
// unit-quad over Backdrop_Primitive storage buffer, mode 1) for the second PSO of
// the backdrop bracket. The first PSO (downsample) uses backdrop_fullscreen.vert.
// Handles both the 1D separable blur passes (fullscreen triangle, mode 0; used for
// BOTH the H-pass and V-pass) and the composite pass (instanced unit-quad over
// Backdrop_Primitive storage buffer, mode 1) for the second PSO of the backdrop bracket.
// The first PSO (downsample) uses backdrop_fullscreen.vert.
//
// No vertex buffer for either mode. Mode 0 uses gl_VertexIndex 0..2 for a single
// fullscreen triangle; mode 1 uses gl_VertexIndex 0..5 for a unit-quad (two
// triangles, TRIANGLELIST topology) and gl_InstanceIndex to select the primitive.
//
// Mode 0 viewport+scissor are CPU-set per layer-bracket to the work region (union
// AABB of backdrop primitives + 3*max_sigma, clamped to swapchain bounds). Mode 1
// renders into the main render target with the screen-space orthographic projection;
// the per-primitive bounds drive the quad in screen space.
// Mode 0 viewport+scissor are CPU-set per sigma group to the work region (union AABB
// of that group's backdrop primitives + halo, clamped to swapchain bounds). Mode 1
// renders into source_texture with the screen-space orthographic projection; the
// per-primitive bounds drive the quad in screen space.
//
// Backdrop primitives have NO rotation — backdrop sampling is in screen space, so
// a rotated mask over a stationary blur sample would look wrong.
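A rough sketch, not part of this commit, of the work-region rule the comment above describes: union AABB of a sigma group's primitives, grown by a halo, clamped to the swapchain. `Rect`, `sketch_work_region`, and the halo value are hypothetical; the actual logic lives in `compute_backdrop_group_work_region` and may differ.

```odin
package main

import "core:fmt"

Rect :: struct {
	min_x, min_y, max_x, max_y: f32,
}

// Union AABB of all primitives, expanded by `halo`, clamped to [0, width] x [0, height].
sketch_work_region :: proc(prims: []Rect, halo, width, height: f32) -> Rect {
	if len(prims) == 0 {
		return Rect{}
	}
	r := prims[0]
	for p in prims[1:] {
		r.min_x = min(r.min_x, p.min_x)
		r.min_y = min(r.min_y, p.min_y)
		r.max_x = max(r.max_x, p.max_x)
		r.max_y = max(r.max_y, p.max_y)
	}
	return Rect{
		max(r.min_x - halo, 0),
		max(r.min_y - halo, 0),
		min(r.max_x + halo, width),
		min(r.max_y + halo, height),
	}
}

main :: proc() {
	prims := []Rect{{100, 100, 300, 200}, {250, 150, 500, 400}}
	// Halo of 30 px is an arbitrary illustrative value.
	fmt.println(sketch_work_region(prims, 30, 800, 600))
}
```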
@@ -46,11 +47,11 @@ layout(set = 1, binding = 0) uniform Uniforms {
// vec2 and scalar tail packs tight to land the struct at a clean 48-byte
// stride (a multiple of 16, so the array stride needs no rounding either).
// Field semantics match the CPU-side Backdrop_Primitive declared in
// levlib/draw/pipeline_2d_backdrop.odin; keep both in sync.
// levlib/draw/backdrop.odin; keep both in sync.
//
// Backdrop primitives are tint-only in v1: outline is intentionally absent.
// Future specialized effects (e.g. liquid-glass-style edges) would be a
// dedicated primitive type with its own pipeline rather than a flag bit here.
// Backdrop primitives are tint-only: outline is intentionally absent. Specialized
// edge effects (e.g. liquid-glass-style refraction outlines) would be a dedicated
// primitive type with its own pipeline rather than a flag bit here.
struct Backdrop_Primitive {
vec4 bounds; // 0-15: min_xy, max_xy (world-space)
vec4 radii; // 16-31: per-corner radii (physical px)
+9 -12
@@ -2,9 +2,9 @@
// Backdrop downsample fragment shader.
// Reads source_texture (full-resolution snapshot of pre-bracket framebuffer contents) and
// writes a downsampled copy at factor 1, 2, 4, 8, or 16. The output is the working texture
// (sized at full swapchain resolution); larger factors only fill a sub-rect of it via the
// CPU-set viewport. See backdrop.odin for the factor selection table (Flutter-style).
// writes a downsampled copy at factor 1, 2, or 4. The output is the working texture (sized
// at full swapchain resolution); larger factors only fill a sub-rect of it via the CPU-set
// viewport. See backdrop.odin for the factor selection table (Flutter-style).
//
// Shader paths by factor:
//
@@ -15,15 +15,12 @@
// factor=2: each output covers a 2×2 source block. Single bilinear tap at the shared
// corner reads all 4 source pixels with 0.25 weight.
//
// factor>=4: each output covers a (factor)×(factor) source block. We use 4 bilinear taps,
// each at the shared corner of a (factor/2)×(factor/2) sub-block. Each tap reads
// 4 source pixels uniformly; combined, the 4 taps sample 16 source pixels arranged
// uniformly across the block. This is an approximation of a true (factor)² box
// filter — exact at factor=4 (16 pixels = full coverage), undersampled at factor=8
// (16 pixels of 64) and factor=16 (16 of 256). Flutter uses a richer 13-tap COD-
// style downsample shader at high factors; we accept the simpler 4-tap pattern
// for now since the high-factor cases come with large kernels that mask any
// residual aliasing.
// factor=4: each output covers a 4×4 source block. We use 4 bilinear taps, each at the
// shared corner of a 2×2 sub-block. Each tap reads 4 source pixels uniformly;
// combined, the 4 taps sample 16 source pixels arranged uniformly across the
// block (full coverage at factor=4). The factor>=4 path is structured so the
// same shader code would extend to factor=8 (16 pixels of 64) or factor=16 (16
// of 256) if the CPU-side cap is ever raised, though the current cap is 4.
//
// The viewport+scissor are set by the CPU to limit output to the layer's work region in
// working-texture coords (work_region_phys / factor), clamped to the texture bounds.
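To make the tap layout concrete, here is a small sketch (not part of this commit) that computes where the four bilinear taps would sit inside one output pixel's source block, in source-texel units. `tap_offsets` and `coverage_fraction` are hypothetical helpers, and the placement is inferred from the comment's "shared corner of a sub-block" wording rather than taken from the shader.

```odin
package main

import "core:fmt"

// Offsets of the 4 taps from the block origin, in source texels.
// Each tap sits at the center of a (factor/2) x (factor/2) sub-block, which is a
// texel corner, so one bilinear fetch averages the 4 texels around it.
tap_offsets :: proc(factor: int) -> [4][2]f32 {
	q := f32(factor) / 4 // center of the first sub-block along one axis
	h := f32(factor) / 2 // sub-block size
	return [4][2]f32{{q, q}, {q + h, q}, {q, q + h}, {q + h, q + h}}
}

// Fraction of the block's source texels that the 16 sampled texels cover.
coverage_fraction :: proc(factor: int) -> f32 {
	return 16.0 / f32(factor * factor)
}

main :: proc() {
	for factor in ([]int{4, 8, 16}) {
		fmt.println("factor", factor, "taps", tap_offsets(factor), "coverage", coverage_fraction(factor))
	}
}
```

At factor 4 the taps land at 1 and 3 on each axis and together touch all 16 texels; at 8 and 16 the same layout touches only the central 2×2 of each sub-block, which matches the undersampling the comment mentions.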
+1 -1
@@ -45,7 +45,7 @@ layout(std430, set = 0, binding = 0) readonly buffer Base_2D_Primitives {
// ---------- Entry point ----------
void main() {
if (mode == 0u) {
// ---- Mode 0: Tessellated (legacy) ----
// ---- Mode 0: Tessellated (used for text and arbitrary user geometry) ----
f_color = v_color;
f_local_or_uv = v_uv;
f_params = vec4(0.0);
+2 -1
@@ -53,7 +53,8 @@ emit_rectangle :: proc(x, y, width, height: f32, color: Color, vertices: []Verte
}
// Internal — submit an SDF primitive with optional texture binding.
// Replaces the old prepare_sdf_primitive and prepare_sdf_primitive_textured.
// The texture-aware counterpart of `draw.prepare_sdf_primitive`; lets shape procs route a
// texture_id and sampler into the sub-batch without growing the public API.
@(private)
prepare_sdf_primitive_ex :: proc(
layer: ^Layer,
+1 -2
@@ -9,7 +9,6 @@ import qr ".."
main :: proc() {
//----- General setup ----------------------------------
{
// Temp
track_temp: mem.Tracking_Allocator
mem.tracking_allocator_init(&track_temp, context.temp_allocator)
@@ -48,7 +47,7 @@ main :: proc() {
// Logger
context.logger = log.create_console_logger()
defer log.destroy_console_logger(context.logger)
}
args := os.args
if len(args) < 2 {
+1 -2
@@ -14,7 +14,6 @@ DB_PATH :: "out/debug/lmdb_example_db"
main :: proc() {
//----- General setup ----------------------------------
{
// Temp
track_temp: mem.Tracking_Allocator
mem.tracking_allocator_init(&track_temp, context.temp_allocator)
@@ -53,7 +52,7 @@ main :: proc() {
// Logger
context.logger = log.create_console_logger()
defer log.destroy_console_logger(context.logger)
}
environment: ^mdb.Env