Organization & cleanup

Zachary Levy
2026-04-30 16:52:55 -07:00
parent 16989cbb71
commit fd64bc01bf
13 changed files with 269 additions and 258 deletions
+107 -92
@@ -15,10 +15,10 @@ modes dispatched by a push constant:
shader premultiplies the texture sample (`t.rgb *= t.a`) and computes `out = color * t`.
- **Mode 1 (SDF):** A static 6-vertex unit-quad buffer is drawn instanced, with per-primitive
`Primitive` structs (80 bytes each) uploaded each frame to a GPU storage buffer. The vertex shader
reads `primitives[gl_InstanceIndex]`, computes world-space position from unit quad corners +
`Base_2D_Primitive` structs (96 bytes each) uploaded each frame to a GPU storage buffer. The vertex
shader reads `primitives[gl_InstanceIndex]`, computes world-space position from unit quad corners +
primitive bounds. The fragment shader dispatches on `Shape_Kind` (encoded in the low byte of
`Primitive.flags`) to evaluate one of four signed distance functions:
`Base_2D_Primitive.flags`) to evaluate one of four signed distance functions:
- **RRect** (kind 1) — `sdRoundedBox` with per-corner radii. Covers rectangles (sharp or rounded),
circles (uniform radii = half-size), and line segments / capsules (rotated RRect with uniform
radii = half-thickness). Covers filled, outlined, textured, and gradient-filled variants.
@@ -28,21 +28,22 @@ modes dispatched by a push constant:
normals. Covers full rings, partial arcs, and pie slices (`inner_radius = 0`).
All SDF shapes support fill, outline, solid color, 2-color linear gradients, 2-color radial
gradients, and texture fills via `Shape_Flags` (see `pipeline_2d_base.odin`). Gradient and outline
parameters are packed into the same 16 bytes as the texture UV rect via a `Uv_Or_Effects` raw union
— zero size increase to the 80-byte `Primitive` struct. Gradient/outline and texture are mutually
exclusive.
gradients, and texture fills via `Shape_Flags` (see `pipeline_2d_base.odin`). The texture UV rect
(`uv_rect: [4]f32`) and the gradient/outline parameters (`effects: Gradient_Outline`) live in their
own 16-byte slots in `Base_2D_Primitive`, so a primitive can carry texture and outline simultaneously.
Gradient and texture remain mutually exclusive at the fill-source level (a Brush variant chooses one
or the other) since they share the worst-case fragment-shader register path.
All SDF shapes produce mathematically exact curves with analytical anti-aliasing via `smoothstep`:
no tessellation, no piecewise-linear approximation. A rounded rectangle is 1 primitive (80 bytes)
no tessellation, no piecewise-linear approximation. A rounded rectangle is 1 primitive (96 bytes)
instead of ~250 vertices (~5000 bytes).
The main pipeline's register budget is **≤24 registers** (see "Main/effects split: register pressure"
in the pipeline plan below for the full cliff/margin analysis and SBC architecture context). The
fragment shader's estimated peak footprint is ~22–26 fp32 VGPRs (~16–22 fp16 VGPRs on architectures
in the pipeline plan below for the full cliff/margin analysis and SBC architecture context).
The fragment shader's estimated peak footprint is ~22–26 fp32 VGPRs (~16–22 fp16 VGPRs on architectures
with native mediump) via manual live-range analysis. The dominant peak is the Ring_Arc kind path
(wedge normals + inner/outer radii + dot-product temporaries live simultaneously with carried state
like `f_color`, `f_uv_or_effects`, and `half_size`). RRect is 1–2 regs lower (`corner_radii` vec4
like `f_color`, `f_uv_rect`/`f_effects`, and `half_size`). RRect is 1–2 regs lower (`corner_radii` vec4
replaces the separate inner/outer + normal pairs). NGon and Ellipse are lighter still. Real compilers
apply live-range coalescing, mediump-to-fp16 promotion, and rematerialization that typically shave
2–4 regs from hand-counted estimates; the conservative 26-reg upper bound is expected to compile
@@ -439,12 +440,13 @@ vertex shader branches on this uniform to select the tessellated or SDF code pat
- **Tessellated mode** (`mode = 0`): direct vertex buffer with explicit geometry. Used for text
(SDL_ttf atlas sampling), triangles, triangle fans/strips, single-pixel points, and any
user-provided raw vertex geometry.
- **SDF mode** (`mode = 1`): shared unit-quad vertex buffer + GPU storage buffer of `Primitive`
structs, drawn instanced. Used for all shapes with closed-form signed distance functions.
- **SDF mode** (`mode = 1`): shared unit-quad vertex buffer + GPU storage buffer of
`Base_2D_Primitive` structs, drawn instanced. Used for all shapes with closed-form signed distance
functions.
Both modes use the same fragment shader. The fragment shader checks `Shape_Kind` (low byte of
`Primitive.flags`): kind 0 (`Solid`) is the tessellated path, which premultiplies the texture sample
and computes `out = color * t`; kinds 1–4 dispatch to one of four SDF functions (RRect, NGon,
`Base_2D_Primitive.flags`): kind 0 (`Solid`) is the tessellated path, which premultiplies the texture
sample and computes `out = color * t`; kinds 1–4 dispatch to one of four SDF functions (RRect, NGon,
Ellipse, Ring_Arc) and apply gradient/texture/outline/solid color based on `Shape_Flags` bits.
#### Why SDF for shapes
@@ -452,8 +454,8 @@ Ellipse, Ring_Arc) and apply gradient/texture/outline/solid color based on `Shap
CPU-side adaptive tessellation for curved shapes (the current approach) has three problems:
1. **Vertex bandwidth.** A rounded rectangle with four corner arcs produces ~250 vertices × 20 bytes
= 5 KB. An SDF rounded rectangle is one `Primitive` struct (~56 bytes) plus 4 shared unit-quad
vertices. That is roughly a 90× reduction per shape.
= 5 KB. An SDF rounded rectangle is one `Base_2D_Primitive` struct (96 bytes) plus 4 shared
unit-quad vertices. That is roughly a 50× reduction per shape.
2. **Quality.** Tessellated curves are piecewise-linear approximations. At high DPI or under
animation/zoom, faceting is visible at any practical segment count. SDF evaluation produces
@@ -484,14 +486,14 @@ SDF primitives are submitted via a GPU storage buffer indexed by `gl_InstanceInd
shader, rather than encoding per-primitive data redundantly in vertex attributes. This follows the
pattern used by both Zed GPUI and vger-rs.
Each SDF shape is described by a single `Primitive` struct (80 bytes) in the storage buffer. The
vertex shader reads `primitives[gl_InstanceIndex]`, computes the quad corner position from the unit
vertex and the primitive's bounds, and passes shape parameters to the fragment shader via `flat`
interpolated varyings.
Each SDF shape is described by a single `Base_2D_Primitive` struct (96 bytes) in the storage
buffer. The vertex shader reads `primitives[gl_InstanceIndex]`, computes the quad corner position
from the unit vertex and the primitive's bounds, and passes shape parameters to the fragment shader
via `flat` interpolated varyings.
Compared to encoding per-primitive data in vertex attributes (the "fat vertex" approach), storage-
buffer instancing eliminates the 4–6× data duplication across quad corners. A rounded rectangle costs
80 bytes instead of 4 vertices × 40+ bytes = 160+ bytes.
96 bytes instead of 4 vertices × 60+ bytes = 240+ bytes.
The tessellated path retains the existing direct vertex buffer layout (20 bytes/vertex, no storage
buffer access). The vertex shader branch on `mode` (push constant) is warp-uniform — every invocation
@@ -499,15 +501,18 @@ in a draw call has the same mode — so it is effectively free on all modern GPU
#### Shape kinds and SDF dispatch
The fragment shader dispatches on `Shape_Kind` (low byte of `Primitive.flags`) to evaluate one of
four signed distance functions. The `Shape_Kind` enum and per-kind `*_Params` structs are defined in
`pipeline_2d_base.odin`. CPU-side drawing procs in `shapes.odin` build the appropriate `Primitive`
and set the kind automatically:
The fragment shader dispatches on `Shape_Kind` (low byte of `Base_2D_Primitive.flags`) to evaluate
one of four signed distance functions. The `Shape_Kind` enum and per-kind `*_Params` structs are
defined in `pipeline_2d_base.odin`. CPU-side drawing procs in `shapes.odin` build the appropriate
`Base_2D_Primitive` and set the kind automatically:
Each user-facing shape proc accepts a `Brush` union (color, linear gradient, radial gradient,
or textured fill) as its fill source, plus optional outline parameters. The procs map to SDF
kinds as follows:
| User-facing proc | Shape_Kind | SDF function | Notes |
| -------------------- | ---------- | ------------------ | ---------------------------------------------------------- |
| `rectangle` | `RRect` | `sdRoundedBox` | Per-corner radii from `radii` param |
| `rectangle_texture` | `RRect` | `sdRoundedBox` | Textured fill; `.Textured` flag set |
| `circle` | `RRect` | `sdRoundedBox` | Uniform radii = half-size (circle is a degenerate RRect) |
| `line`, `line_strip` | `RRect` | `sdRoundedBox` | Rotated capsule — stadium shape (radii = half-thickness) |
| `ellipse` | `Ellipse` | `sdEllipseApprox` | Approximate ellipse SDF (fast, suitable for UI) |
@@ -599,20 +604,21 @@ to is a hard GPU constraint; the only way to satisfy it is to end the current re
a new one. That render-pass boundary is what a “bracket” is.
**Multi-pass implementation.** Backdrop effects are implemented as separable multi-pass sequences
(downsample → horizontal blur → vertical-blur+composite), following the standard approach used by
iOS `UIVisualEffectView`, Android `RenderEffect`, and Flutter's `BackdropFilter`. Each individual
(downsample → horizontal blur → vertical blur → composite), following the standard approach used
by iOS `UIVisualEffectView`, Android `RenderEffect`, and Flutter's `BackdropFilter`. Each individual
sub-pass is budgeted at **≤24 registers** (same as the main pipeline — full Valhall occupancy). The
multi-pass approach avoids the monolithic 70+ register shader that a single-pass Gaussian blur would
require, keeping each sub-pass well under the 32-register cliff.
**Approach B: render-target choice.** When any layer in the frame contains a backdrop draw, the
entire frame renders into `source_texture` (a full-resolution single-sample texture owned by the
backdrop pipeline) instead of directly into the swapchain. At the end of the frame, `source_texture`
is copied to the swapchain via a single `CopyGPUTextureToTexture` call. This means the bracket has
no mid-frame texture copy: by the time the bracket runs, `source_texture` already contains the pre-
bracket frame contents and is the natural sampler input. When no layer in the frame has a backdrop
draw, the existing fast path runs: the frame renders directly to the swapchain and the backdrop
pipeline's working textures are never touched. Zero cost for backdrop-free frames.
**Render-target choice.** When any layer in the frame contains a backdrop draw, the entire
frame renders into `source_texture` (a full-resolution single-sample texture owned by the
backdrop pipeline) instead of directly into the swapchain. At the end of the frame,
`source_texture` is copied to the swapchain via a single `CopyGPUTextureToTexture` call.
This means the bracket has no mid-frame texture copy: by the time the bracket runs,
`source_texture` already contains the pre-bracket frame contents and is the natural sampler
input. When no layer in the frame has a backdrop draw, the existing fast path runs: the frame
renders directly to the swapchain and the backdrop pipeline's working textures are never
touched. Zero cost for backdrop-free frames.
**Why not split the backdrop sub-passes into separate pipelines?** Each sub-pass is budgeted at ≤24
registers, well under Valhall's 32-register cliff, so there is no occupancy motivation for splitting.
@@ -638,13 +644,20 @@ submission order. Concretely, a layer with one or more backdrops splits into thr
range. If the layer has no backdrops, none of this kicks in and the layer renders in a single render
pass via the existing fast path.
The downsample runs once per layer, not once per sigma: it just copies `source_texture` to a ¼-
resolution working texture and doesn't depend on the kernel. Each unique sigma in the layer triggers
one H-blur (reads `downsample_texture`, writes `h_blur_texture`) and one V-composite (reads
`h_blur_texture`, writes `source_texture` per-primitive with the SDF mask). Sub-batch coalescing in
`append_or_extend_sub_batch` merges contiguous same-sigma backdrops into a single instanced V-
composite draw call; non-contiguous same-sigma backdrops still share the H-blur output but issue
separate V-composite draws.
Per-sigma-group execution. The bracket walks each layer's sub-batches and groups contiguous
`.Backdrop` sub-batches that share a sigma; each group picks its own downsample factor (1, 2, or 4)
based on `compute_backdrop_downsample_factor`. For each group it runs four sub-passes: a downsample
from `source_texture` to `downsample_texture`; an H-blur from `downsample_texture` to
`h_blur_texture`; a V-blur from `h_blur_texture` back into `downsample_texture` (ping-pong reuse);
and finally a composite that reads the fully-blurred `downsample_texture`, applies the SDF mask
and tint, and writes the result to `source_texture`. Sub-batch coalescing in
`append_or_extend_sub_batch` merges contiguous same-sigma backdrops into a single instanced
composite draw; non-contiguous same-sigma backdrops still share the blur output but issue separate
composite draws.
The working textures are sized at the full swapchain resolution; larger downsample factors only
fill a sub-rect via viewport-limited rendering (see the comment block at the top of `backdrop.odin`
for the factor-selection table and rationale).
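
The contiguous-run grouping described above can be illustrated with a small sketch. It covers only the coalescing of adjacent same-sigma `.Backdrop` sub-batches. `Sub_Batch`, `Sigma_Group`, and `group_backdrops` are hypothetical names introduced for illustration, not the types in `backdrop.odin`. Downsample-factor selection and the sharing of blur output between non-contiguous groups are omitted.

```odin
package main

import "core:fmt"

// Hypothetical, simplified stand-ins for the real sub-batch types.
Sub_Batch_Kind :: enum {
	Tessellated,
	SDF,
	Backdrop,
}

Sub_Batch :: struct {
	kind:  Sub_Batch_Kind,
	sigma: f32, // only meaningful when kind == .Backdrop
}

// A run of adjacent backdrop sub-batches that share one sigma.
Sigma_Group :: struct {
	first: int, // index of the first sub-batch in the run
	count: int, // number of sub-batches in the run
	sigma: f32,
}

// Walks the sub-batches in submission order and extends the previous group
// only when the current backdrop is directly adjacent to it and uses the
// same sigma. Anything else starts a new group.
group_backdrops :: proc(batches: []Sub_Batch) -> [dynamic]Sigma_Group {
	groups: [dynamic]Sigma_Group
	for b, i in batches {
		if b.kind != .Backdrop {
			continue
		}
		n := len(groups)
		if n > 0 {
			last := &groups[n - 1]
			if last.first + last.count == i && last.sigma == b.sigma {
				last.count += 1
				continue
			}
		}
		append(&groups, Sigma_Group{first = i, count = 1, sigma = b.sigma})
	}
	return groups
}

main :: proc() {
	batches := []Sub_Batch{
		{.SDF, 0},
		{.Backdrop, 12},
		{.Backdrop, 12}, // adjacent, same sigma: merged with the previous one
		{.SDF, 0},
		{.Backdrop, 12}, // same sigma but not adjacent: its own group
	}
	groups := group_backdrops(batches)
	defer delete(groups)
	for g in groups {
		fmt.println(g.first, g.count, g.sigma)
	}
}
```

In this sketch the last backdrop forms its own group, which corresponds to the separate composite draw that non-contiguous same-sigma backdrops issue.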
#### Submission-order trade-off
@@ -654,12 +667,12 @@ layer. A non-backdrop sub-batch submitted between two backdrops still renders in
bracket), not at its submission position. Worked example:
```
draw.rectangle(layer, bg, GRAY) // 0 Tessellated → Pass A
draw.rectangle(layer, card_blue, BLUE) // 1 SDF → Pass A
draw.rectangle_backdrop(layer, panelA, 12) // 2 Backdrop → Bracket (sees: bg + blue card)
draw.rectangle(layer, card_red, RED) // 3 SDF → Pass B (drawn ON TOP of panelA)
draw.rectangle_backdrop(layer, panelB, 12) // 4 Backdrop → Bracket (sees: bg + blue card; same as panelA)
draw.text(layer, "label", ...) // 5 Text → Pass B (drawn ON TOP of both panels)
draw.rectangle(layer, bg, GRAY) // 0 Tessellated → Pass A
draw.rectangle(layer, card_blue, BLUE) // 1 SDF → Pass A
draw.gaussian_blur(layer, panelA, sigma=12) // 2 Backdrop → Bracket (sees: bg + blue card)
draw.rectangle(layer, card_red, RED) // 3 SDF → Pass B (drawn ON TOP of panelA)
draw.gaussian_blur(layer, panelB, sigma=12) // 4 Backdrop → Bracket (sees: bg + blue card; same as panelA)
draw.text(layer, "label", ...) // 5 Text → Pass B (drawn ON TOP of both panels)
```
In this layer, panelB does *not* see card_red — even though card_red was submitted before panelB —
@@ -674,11 +687,11 @@ card_red:
base := draw.begin(...)
draw.rectangle(base, bg, GRAY)
draw.rectangle(base, card_blue, BLUE)
draw.rectangle_backdrop(base, panelA, 12) // panelA in base layer's bracket
draw.gaussian_blur(base, panelA, sigma=12) // panelA in base layer's bracket
top := draw.new_layer(base, ...)
draw.rectangle(top, card_red, RED)
draw.rectangle_backdrop(top, panelB, 12) // top layer's bracket; sees base + card_red
draw.gaussian_blur(top, panelB, sigma=12) // top layer's bracket; sees base + card_red
draw.text(top, "label", ...)
```
@@ -708,29 +721,30 @@ draws, `position` carries actual world-space geometry. For SDF draws, `position`
corners (0,0 to 1,1) and the vertex shader computes world-space position from the storage-buffer
primitive's bounds.
The `Primitive` struct for SDF shapes lives in the storage buffer, not in vertex attributes:
The `Base_2D_Primitive` struct for SDF shapes lives in the storage buffer, not in vertex attributes:
```
Primitive :: struct {
bounds: [4]f32, // 0: min_x, min_y, max_x, max_y
color: Color, // 16: u8x4, unpacked in shader via unpackUnorm4x8
flags: u32, // 20: low byte = Shape_Kind, bits 8+ = Shape_Flags
rotation_sc: u32, // 24: packed f16 pair (sin, cos). Requires .Rotated flag.
_pad: f32, // 28: reserved for future use
params: Shape_Params, // 32: per-kind params union (half_feather, radii, etc.) (32 bytes)
uv: Uv_Or_Effects, // 64: texture UV rect or gradient/outline parameters (16 bytes)
Base_2D_Primitive :: struct {
bounds: [4]f32, // 0: min_x, min_y, max_x, max_y
color: Color, // 16: u8x4, unpacked in shader via unpackUnorm4x8
flags: u32, // 20: low byte = Shape_Kind, bits 8+ = Shape_Flags
rotation_sc: u32, // 24: packed f16 pair (sin, cos). Requires .Rotated flag.
_pad: f32, // 28: reserved for future use
params: Shape_Params, // 32: per-kind params union (half_feather, radii, etc.) (32 bytes)
uv_rect: [4]f32, // 64: texture UV coordinates. Read when .Textured.
effects: Gradient_Outline, // 80: gradient and/or outline parameters (16 bytes).
}
// Total: 80 bytes (std430 aligned)
// Total: 96 bytes (std430 aligned)
```
`Shape_Params` is a `#raw_union` over `RRect_Params`, `NGon_Params`, `Ellipse_Params`, and
`Ring_Arc_Params` (plus a `raw: [8]f32` view), defined in `pipeline_2d_base.odin`. Each SDF kind
writes its own params variant; the fragment shader reads the appropriate fields based on `Shape_Kind`.
`Uv_Or_Effects` is a `#raw_union` that aliases `[4]f32` (texture UV rect: u_min, v_min, u_max,
v_max) with a `Gradient_Outline` struct containing `gradient_color: Color`, `outline_color: Color`,
`Gradient_Outline` is a 16-byte struct containing `gradient_color: Color`, `outline_color: Color`,
`gradient_dir_sc: u32` (packed f16 cos/sin pair), and `outline_packed: u32` (packed f16 outline
width). The `flags` field encodes the `Shape_Kind` in the low byte and `Shape_Flags` in bits 8+
via `pack_kind_flags`.
width). It is independent of `uv_rect`, so a primitive can carry texture and outline parameters at
the same time. The `flags` field encodes the `Shape_Kind` in the low byte and `Shape_Flags` in bits
8+ via `pack_kind_flags`.
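
The documented offsets and the kind/flags packing can be checked at compile time. The following is a minimal sketch under stated assumptions, not the definitions in `pipeline_2d_base.odin`: `Color` is modeled as `[4]u8`, `Shape_Params` is reduced to its `raw: [8]f32` view, the `Shape_Kind` values for NGon, Ellipse, and Ring_Arc are assumed to follow the order in which the kinds are listed, the `Shape_Flag` bit positions are hypothetical, and the `pack_kind_flags` signature is a guess at the shape of the real proc.

```odin
package main

import "core:fmt"

Color :: [4]u8 // assumption: u8x4, matching unpackUnorm4x8 in the shader

Shape_Kind :: enum u8 {
	Solid    = 0,
	RRect    = 1,
	NGon     = 2, // assumed value
	Ellipse  = 3, // assumed value
	Ring_Arc = 4, // assumed value
}

// Hypothetical bit positions; only the flag names come from the text.
Shape_Flag :: enum u8 {
	Textured,
	Rotated,
}
Shape_Flags :: bit_set[Shape_Flag; u32]

Gradient_Outline :: struct {
	gradient_color:  Color,
	outline_color:   Color,
	gradient_dir_sc: u32, // packed f16 cos/sin pair
	outline_packed:  u32, // packed f16 outline width
}

// Reduced to the raw view; the per-kind variants are omitted in this sketch.
Shape_Params :: struct #raw_union {
	raw: [8]f32,
}

Base_2D_Primitive :: struct {
	bounds:      [4]f32,
	color:       Color,
	flags:       u32,
	rotation_sc: u32,
	_pad:        f32,
	params:      Shape_Params,
	uv_rect:     [4]f32,
	effects:     Gradient_Outline,
}

// Compile-time checks against the offsets listed in the struct comments.
#assert(size_of(Gradient_Outline) == 16)
#assert(offset_of(Base_2D_Primitive, color) == 16)
#assert(offset_of(Base_2D_Primitive, params) == 32)
#assert(offset_of(Base_2D_Primitive, uv_rect) == 64)
#assert(offset_of(Base_2D_Primitive, effects) == 80)
#assert(size_of(Base_2D_Primitive) == 96)

// Kind goes in the low byte, flag bits start at bit 8.
pack_kind_flags :: proc(kind: Shape_Kind, flags: Shape_Flags) -> u32 {
	return u32(kind) | ((transmute(u32)flags) << 8)
}

main :: proc() {
	word := pack_kind_flags(.RRect, {.Textured})
	kind := Shape_Kind(u8(word & 0xFF)) // what the fragment shader dispatches on
	fmt.println(kind, word >> 8, size_of(Base_2D_Primitive))
}
```

Keeping the `#assert` lines next to the struct means a later field change that shifts an offset fails the build instead of silently disagreeing with the shader-side layout.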
### Draw submission order
@@ -754,7 +768,7 @@ pair into bitmap atlases and emits indexed triangle data via `GetGPUTextDrawData
**unchanged** by the SDF migration — text continues to flow through the main pipeline's tessellated
mode with `mode = 0`, sampling the SDL_ttf atlas texture.
A future phase may evaluate MSDF (multi-channel signed distance field) text rendering, which would
MSDF (multi-channel signed distance field) text rendering may be evaluated later, which would
allow resolution-independent glyph rendering from a single small atlas per font. This would involve:
- Offline atlas generation via Chlumský's msdf-atlas-gen tool.
@@ -763,8 +777,7 @@ allow resolution-independent glyph rendering from a single small atlas per font.
already exists for the four current SDF kinds).
- Potential removal of the SDL_ttf dependency.
This is explicitly deferred. The SDF shape migration is independent of and does not block text
changes.
This is explicitly deferred.
**References:**
@@ -778,8 +791,8 @@ changes.
### Textures
Textures plug into the existing main pipeline — no additional GPU pipeline, no shader rewrite. The
work is a resource layer (registration, upload, sampling, lifecycle) plus two textured-draw procs
that route into the existing tessellated and SDF paths respectively.
work is a resource layer (registration, upload, sampling, lifecycle) plus a `Texture_Fill` Brush
variant that routes the existing shape procs through the SDF path with the `.Textured` flag set.
#### Why draw owns registered textures
@@ -829,22 +842,25 @@ with the same texture but different samplers produce separate draw calls, which
#### Textured draw procs
Textured rectangles route through the existing SDF path via `rectangle_texture`, which mirrors
`rectangle` exactly — same parameters for radii, origin, rotation, feather — with the `color`
parameter replaced by a `Texture_Id`, an optional `tint`, a `uv_rect`, and a `Sampler_Preset`.
Textures share the same shape procs as colors and gradients. Each shape proc takes a `Brush`
union as its fill source; passing a `Texture_Fill` value (carrying `Texture_Id`, `tint`,
`uv_rect`, and `Sampler_Preset`) routes the draw through the SDF path with the `.Textured`
flag set. There is no dedicated `rectangle_texture` / `circle_texture` proc — the same
`rectangle`, `circle`, `ellipse`, `polygon`, `ring`, `line`, and `line_strip` procs handle
all fill sources.
An earlier iteration of this design considered a separate tessellated proc for "simple" fullscreen
quads, on the theory that the tessellated path's lower register count would improve occupancy at
large fragment counts. Both paths are well within the ≤24-register main pipeline budget — both run at
full occupancy on every target architecture (Valhall and above). The remaining ALU difference (~15
extra instructions for the SDF evaluation) amounts to ~20μs at 4K — below noise. Meanwhile,
splitting into a separate pipeline would add ~15μs per pipeline bind on the CPU side per scissor,
matching or exceeding the GPU-side savings. Within the main pipeline, unified remains strictly better.
A separate tessellated proc for "simple" fullscreen quads was considered on the theory that
the tessellated path's lower register count would improve occupancy at large fragment counts.
Both paths are well within the ≤24-register main pipeline budget — both run at full
occupancy on every target architecture (Valhall and above). The remaining ALU difference
(~15 extra instructions for the SDF evaluation) amounts to ~20μs at 4K — below noise.
Meanwhile, splitting into a separate pipeline would add ~15μs per pipeline bind on the CPU
side per scissor, matching or exceeding the GPU-side savings. Within the main pipeline,
unified remains strictly better.
SDF drawing procs live in the `draw` package with unprefixed names (`rectangle`, `rectangle_texture`,
`circle`, `ellipse`, `polygon`, `ring`, `line`, `line_strip`). Gradients and outlines are optional
parameters on each proc rather than separate overloads. Future per-shape texture variants
(`circle_texture`, `ellipse_texture`) are additive.
SDF drawing procs live in the `draw` package with unprefixed names (`rectangle`, `circle`,
`ellipse`, `polygon`, `ring`, `line`, `line_strip`). Gradients, textures, and outlines are
selected via the `Brush` union and optional outline parameters rather than separate overloads.
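
As an illustration of how a single proc can serve every fill source, here is a hedged sketch of a `Brush` union and a type switch that derives shape flags from it. Only `Brush`, `Texture_Fill`, `Texture_Id`, `Sampler_Preset`, `tint`, `uv_rect`, and the `.Textured` flag are named in the text. The gradient variant names, every other field name, the `.Gradient` and `.Radial` flags, the sampler values, and `resolve_fill` itself are hypothetical.

```odin
package main

import "core:fmt"

Color      :: [4]u8
Texture_Id :: distinct u32

// Hypothetical preset values.
Sampler_Preset :: enum u8 {
	Linear_Clamp,
	Nearest_Clamp,
}

// Hypothetical gradient variants.
Linear_Gradient :: struct {
	start_color, end_color: Color,
	angle_radians:          f32,
}

Radial_Gradient :: struct {
	inner_color, outer_color: Color,
}

Texture_Fill :: struct {
	id:      Texture_Id,
	tint:    Color,
	uv_rect: [4]f32, // u_min, v_min, u_max, v_max
	sampler: Sampler_Preset,
}

// One fill source per draw: choosing a variant is what makes gradient and
// texture mutually exclusive.
Brush :: union {
	Color,
	Linear_Gradient,
	Radial_Gradient,
	Texture_Fill,
}

Shape_Flag :: enum u8 {
	Textured,
	Gradient, // hypothetical
	Radial,   // hypothetical
}
Shape_Flags :: bit_set[Shape_Flag; u32]

// Maps a brush to the flags and UV rect a primitive would carry.
resolve_fill :: proc(brush: Brush) -> (flags: Shape_Flags, uv_rect: [4]f32) {
	uv_rect = {0, 0, 1, 1} // default: whole texture, unused unless .Textured
	switch b in brush {
	case Color:
		// Solid color: no fill flags.
	case Linear_Gradient:
		flags = {.Gradient}
	case Radial_Gradient:
		flags = {.Gradient, .Radial}
	case Texture_Fill:
		flags = {.Textured}
		uv_rect = b.uv_rect
	}
	return
}

main :: proc() {
	fill := Texture_Fill{
		id      = 7,
		tint    = {255, 255, 255, 255},
		uv_rect = {0, 0, 0.5, 0.5},
		sampler = .Linear_Clamp,
	}
	flags, uv := resolve_fill(fill)
	fmt.println(flags, uv)
}
```

Because outline parameters travel separately from the brush, a textured fill and an outline can be combined without adding a variant to the union.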
#### What SDF anti-aliasing does and does not do for textured draws
@@ -858,8 +874,8 @@ depends on how closely the display size matches the SDL_ttf atlas's rasterized s
#### Fit modes are a computation layer, not a renderer concept
Standard image-fit behaviors (stretch, fill/cover, fit/contain, tile, center) are expressed as UV
sub-region computations on top of the `uv_rect` parameter that both textured-draw procs accept. The
renderer has no knowledge of fit modes — it samples whatever UV region it is given.
sub-region computations on top of the `uv_rect` field of `Texture_Fill`. The renderer has no
knowledge of fit modes — it samples whatever UV region it is given.
A `fit_params` helper computes the appropriate `uv_rect`, sampler preset, and (for letterbox/fit
mode) shrunken inner rect from a `Fit_Mode` enum, the target rect, and the texture's pixel size.
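
To make the "UV sub-region" idea concrete, here is a minimal sketch of the fill/cover case only. `cover_uv_rect` is a hypothetical helper, not the `fit_params` API. It assumes normalized UVs in the order `u_min, v_min, u_max, v_max`, a centered crop, and non-zero sizes.

```odin
package main

import "core:fmt"

// Returns the centered UV sub-rect whose aspect ratio matches the target,
// so the texture covers the target without distortion.
cover_uv_rect :: proc(target_size, texture_size: [2]f32) -> [4]f32 {
	target_aspect  := target_size.x / target_size.y
	texture_aspect := texture_size.x / texture_size.y

	if texture_aspect > target_aspect {
		// Texture is relatively wider: crop left and right.
		visible := target_aspect / texture_aspect
		u_min := (1 - visible) * 0.5
		return [4]f32{u_min, 0, 1 - u_min, 1}
	}

	// Texture is relatively taller (or equal): crop top and bottom.
	visible := texture_aspect / target_aspect
	v_min := (1 - visible) * 0.5
	return [4]f32{0, v_min, 1, 1 - v_min}
}

main :: proc() {
	// A square texture drawn into a 2:1 rect keeps the middle half vertically.
	fmt.println(cover_uv_rect({200, 100}, {100, 100}))
}
```

The result is passed as the `uv_rect` of a `Texture_Fill`; the renderer samples that region without knowing which fit mode produced it.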
@@ -883,13 +899,13 @@ textures onto a free list that is processed in `r_end_frame`, not at the call si
Clay's `RenderCommandType.Image` is handled by dereferencing `imageData: rawptr` as a pointer to a
`Clay_Image_Data` struct containing a `Texture_Id`, `Fit_Mode`, and tint color. Routing mirrors the
existing rectangle handling: `fit_params` computes UVs from the fit mode, then
`rectangle_texture` is called with the appropriate radii (zero for sharp corners, per-corner values
from Clay's `cornerRadius` otherwise).
existing rectangle handling: `fit_params` computes UVs from the fit mode, then `rectangle` is
called with a `Texture_Fill` brush and the appropriate radii (zero for sharp corners, per-corner
values from Clay's `cornerRadius` otherwise).
#### Deferred features
The following are plumbed in the descriptor but not implemented in phase 1:
The following are plumbed in `Texture_Desc` but not yet implemented:
- **Mipmaps**: `Texture_Desc.mip_levels` field exists; generation via SDL3 deferred.
- **Compressed formats**: `Texture_Desc.format` accepts BC/ASTC; upload path deferred.
@@ -897,7 +913,6 @@ The following are plumbed in the descriptor but not implemented in phase 1:
- **3D textures, arrays, cube maps**: `Texture_Desc.type` and `depth_or_layers` fields exist.
- **Additional samplers**: anisotropic, trilinear, clamp-to-border — additive enum values.
- **Atlas packing**: internal optimization for sub-batch coalescing; invisible to callers.
- **Per-shape texture variants**: `circle_texture`, `ellipse_texture`, `polygon_texture` — potential future additions, following the existing naming convention.
**References:**