draw-improvements (#17)

Major rework of the draw rendering system. We are building an SDF-first rendering system, with tessellation used only as a fallback for specific situations where SDF is particularly poorly suited.

Co-authored-by: Zachary Levy <zachary@sunforge.is>
Reviewed-on: #17
2026-04-24 07:57:44 +00:00
parent 37da2ea068
commit bca19277b3
15 changed files with 1773 additions and 1736 deletions
@@ -9,35 +9,51 @@ The renderer uses a single unified `Pipeline_2D_Base` (`TRIANGLELIST` pipeline)
 modes dispatched by a push constant:

 - **Mode 0 (Tessellated):** Vertex buffer contains real geometry. Used for text (indexed draws into
-  SDL_ttf atlas textures), axis-aligned sharp-corner rectangles (already optimal as 2 triangles),
-  per-vertex color gradients (`rectangle_gradient`, `circle_gradient`), angular-clipped circle
-  sectors (`circle_sector`), and arbitrary user geometry (`triangle`, `triangle_fan`,
-  `triangle_strip`). The fragment shader computes `out = color * texture(tex, uv)`.
+  SDL_ttf atlas textures), single-pixel points (`tes_pixel`), arbitrary user geometry (`tes_triangle`,
+  `tes_triangle_fan`, `tes_triangle_strip`), and shapes without a closed-form rounded-rectangle
+  reduction: ellipses (`tes_ellipse`), regular polygons (`tes_polygon`), and circle sectors
+  (`tes_sector`). The fragment shader computes `out = color * texture(tex, uv)`.

 - **Mode 1 (SDF):** A static 6-vertex unit-quad buffer is drawn instanced, with per-primitive
-  `Primitive` structs uploaded each frame to a GPU storage buffer. The vertex shader reads
-  `primitives[gl_InstanceIndex]`, computes world-space position from unit quad corners + primitive
-  bounds. The fragment shader dispatches on `Shape_Kind` to evaluate the correct signed distance
-  function analytically.
+  `Primitive` structs (80 bytes each) uploaded each frame to a GPU storage buffer. The vertex shader
+  reads `primitives[gl_InstanceIndex]` and computes each vertex's world-space position from the
+  unit-quad corner and the primitive's bounds. The fragment shader always evaluates `sdRoundedBox`;
+  there is no per-primitive kind dispatch.

-Seven SDF shape kinds are implemented:
+The SDF path handles all shapes that are algebraically reducible to a rounded rectangle:

-1. **RRect** — rounded rectangle with per-corner radii (iq's `sdRoundedBox`)
-2. **Circle** — filled or stroked circle
-3. **Ellipse** — exact signed-distance ellipse (iq's iterative `sdEllipse`)
-4. **Segment** — capsule-style line segment with rounded caps
-5. **Ring_Arc** — annular ring with angular clipping for arcs
-6. **NGon** — regular polygon with arbitrary side count and rotation
-7. **Polyline** — decomposed into independent `Segment` primitives per adjacent point pair
+- **Rounded rectangles** — per-corner radii via `sdRoundedBox` (iq). Covers filled, stroked,
+  textured, and gradient-filled rectangles.
+- **Circles** — uniform radii equal to half-size. Covers filled, stroked, and radial-gradient circles.
+- **Line segments / capsules** — rotated RRect with uniform radii equal to half-thickness (stadium shape).
+- **Full rings / annuli** — stroked circle (mid-radius with stroke thickness = outer - inner).

-All SDF shapes support fill and stroke modes via `Shape_Flags`, and produce mathematically exact
-curves with analytical anti-aliasing via `smoothstep` — no tessellation, no piecewise-linear
-approximation. A rounded rectangle is 1 primitive (64 bytes) instead of ~250 vertices (~5000 bytes).
+All SDF shapes support fill, stroke, solid color, bilinear 4-corner gradients, radial 2-color
+gradients, and texture fills via `Shape_Flags`. Gradient colors are packed into the same 16 bytes as
+the texture UV rect via a `Uv_Or_Gradient` raw union — zero size increase to the 80-byte `Primitive`
+struct. Gradient and texture are mutually exclusive.
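A minimal Python sketch of the two gradient fills follows. It assumes each primitive exposes a local `(u, v)` in `[0, 1]` with `(0, 0)` at the top-left corner, and that the radial falloff is normalized by a caller-supplied radius. Those conventions and the helper names are illustrative, not taken from the shader. Corner colors follow the TL, TR, BR, BL order of `Uv_Or_Gradient`. Channels stay in 0–255 here rather than the 0–1 floats that `unpackUnorm4x8` produces.

```python
def lerp(a, b, t):
    """Component-wise linear interpolation between two RGBA tuples."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def bilinear_gradient(corners, u, v):
    """Bilinear 4-corner gradient.

    corners: (TL, TR, BR, BL), clockwise from top-left.
    (u, v):  position in the primitive; (0, 0) = top-left, (1, 1) = bottom-right (assumed).
    """
    tl, tr, br, bl = corners
    top = lerp(tl, tr, u)        # blend along the top edge
    bottom = lerp(bl, br, u)     # blend along the bottom edge
    return lerp(top, bottom, v)  # then blend vertically


def radial_gradient(inner, outer, dist, radius):
    """Radial 2-color falloff: `inner` at the center, `outer` at `radius` and beyond (assumed)."""
    t = min(max(dist / radius, 0.0), 1.0)
    return lerp(inner, outer, t)
```

Because both fills only read four (or two) colors, they fit in the same 16 bytes that a textured primitive uses for its UV rect.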
+
+All SDF shapes produce mathematically exact curves with analytical anti-aliasing via `smoothstep` —
+no tessellation, no piecewise-linear approximation. A rounded rectangle is 1 primitive (80 bytes)
+instead of ~250 vertices (~5000 bytes).
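The analytic path can be sketched in Python as a direct port of iq's 2D `sdRoundedBox` plus a GLSL-style `smoothstep` coverage ramp. The `coverage` helper and its 0.5 px half-feather default are assumptions for illustration. The shader's actual feather width and edge convention may differ.

```python
import math


def sd_rounded_box(px, py, bx, by, r):
    """Port of iq's 2D sdRoundedBox.

    (px, py): sample point relative to the box center.
    (bx, by): half-size of the box.
    r: corner radii in iq's order: r[0] for the (+x, +y) quadrant, r[1] for (+x, -y),
       r[2] for (-x, +y), r[3] for (-x, -y).
    Returns a signed distance: negative inside, positive outside.
    """
    # Select the radius of the corner whose quadrant contains the point.
    rx, ry = (r[0], r[1]) if px > 0.0 else (r[2], r[3])
    rad = rx if py > 0.0 else ry
    qx = abs(px) - bx + rad
    qy = abs(py) - by + rad
    outside = math.hypot(max(qx, 0.0), max(qy, 0.0))
    inside = min(max(qx, qy), 0.0)
    return inside + outside - rad


def smoothstep(e0, e1, x):
    """Same definition as GLSL smoothstep (requires e0 < e1)."""
    t = min(max((x - e0) / (e1 - e0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


def coverage(d, half_feather=0.5):
    """Analytic AA: fade from fully covered to empty across about one pixel around d = 0."""
    return 1.0 - smoothstep(-half_feather, half_feather, d)
```

A pixel exactly on the edge (`d = 0`) gets 50% coverage, so the edge is smooth without any extra samples. This is also why MSAA adds nothing for SDF shapes.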
+
+The fragment shader's estimated register footprint is ~15–18 VGPRs via static live-range analysis.
+Peak pressure comes from the single `sdRoundedBox` evaluation, which carries `corner_radii` (4 regs)
+plus its temporaries, with flag-based gradient/texture/solid color selection on top. This lands
+comfortably under Mali Valhall's 32-register occupancy cliff (G57/G77/G78 and later) and well under
+desktop limits. On older Bifrost Mali (G71/G72/G76, 16-register cliff) the shader may incur partial
+occupancy reduction. These estimates are hand-counted; exact numbers require `malioc` or Radeon GPU
+Analyzer against the compiled SPIR-V.

 MSAA is opt-in (default `._1`, no MSAA) via `Init_Options.msaa_samples`. SDF rendering does not
 benefit from MSAA because fragment coverage is computed analytically. MSAA remains useful for text
 glyph edges and tessellated user geometry if desired.

+All public drawing procs use prefixed names for clarity: `sdf_*` for SDF-path shapes, `tes_*` for
+tessellated-path shapes. Proc groups provide a single entry point per shape concept (e.g.,
+`sdf_rectangle` dispatches to `sdf_rectangle_solid` or `sdf_rectangle_gradient` based on argument
+count).
+
 ## 2D rendering pipeline plan

 This section documents the planned architecture for levlib's 2D rendering system. The design is driven
@@ -91,19 +107,19 @@ Below the cliff, adding registers has zero occupancy cost.

 On consumer Ampere/Ada GPUs (RTX 30xx/40xx, 65,536 regs/SM, max 1,536 threads/SM, cliff at ~43 regs):

-| Register allocation     | Reg-limited threads | Actual (hw-capped) | Occupancy |
-| ----------------------- | ------------------- | ------------------ | --------- |
-| 20 regs (main pipeline) | 3,276               | 1,536              | 100%      |
-| 32 regs                 | 2,048               | 1,536              | 100%      |
-| 48 regs (effects)       | 1,365               | 1,365              | ~89%      |
+| Register allocation      | Reg-limited threads | Actual (hw-capped) | Occupancy |
+| ------------------------ | ------------------- | ------------------ | --------- |
+| ~16 regs (main pipeline) | 4,096               | 1,536              | 100%      |
+| 32 regs                  | 2,048               | 1,536              | 100%      |
+| 48 regs (effects)        | 1,365               | 1,365              | ~89%      |

 On Volta/A100 GPUs (65,536 regs/SM, max 2,048 threads/SM, cliff at ~32 regs):

-| Register allocation     | Reg-limited threads | Actual (hw-capped) | Occupancy |
-| ----------------------- | ------------------- | ------------------ | --------- |
-| 20 regs (main pipeline) | 3,276               | 2,048              | 100%      |
-| 32 regs                 | 2,048               | 2,048              | 100%      |
-| 48 regs (effects)       | 1,365               | 1,365              | ~67%      |
+| Register allocation      | Reg-limited threads | Actual (hw-capped) | Occupancy |
+| ------------------------ | ------------------- | ------------------ | --------- |
+| ~16 regs (main pipeline) | 4,096               | 2,048              | 100%      |
+| 32 regs                  | 2,048               | 2,048              | 100%      |
+| 48 regs (effects)        | 1,365               | 1,365              | ~67%      |
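The table rows follow from a simple thread-granular calculation, sketched below in Python. The `warp_size` option is an addition that shows one granularity effect the tables leave out. Under this model, rounding to whole warps drops the 48-register case on Ampere/Ada from 1,365 to 1,344 resident threads. Allocation granules and per-SM warp limits vary by architecture, so treat this as an approximation rather than a substitute for vendor occupancy tools.

```python
def occupancy(regs_per_thread, regs_per_sm=65_536, max_threads_per_sm=1_536, warp_size=None):
    """Simplified register-limited occupancy.

    Returns (register-limited threads, hardware-capped threads, occupancy fraction).
    With warp_size set, the register limit is rounded down to whole warps; otherwise the
    calculation is thread-granular. Register allocation granules are ignored.
    """
    if warp_size is None:
        reg_limited = regs_per_sm // regs_per_thread
    else:
        reg_limited = regs_per_sm // (regs_per_thread * warp_size) * warp_size
    actual = min(reg_limited, max_threads_per_sm)
    return reg_limited, actual, actual / max_threads_per_sm


def register_cliff(regs_per_sm, max_threads_per_sm):
    """Largest per-thread register count that still allows max_threads_per_sm resident threads."""
    return regs_per_sm / max_threads_per_sm
```

Anything at or below `register_cliff` runs at 100% occupancy, which is why the ~16-register main pipeline and the 32-register row are indistinguishable on these GPUs.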

 On low-end mobile (ARM Mali Bifrost/Valhall, 64 regs/thread, cliff fixed at 32 regs):

@@ -261,11 +277,12 @@ Our design has two branch points:
   Every thread in every warp of a draw call sees the same `mode` value. **Zero divergence, zero
   cost.**

-2. **`shape_kind` (flat varying from storage buffer): which SDF to evaluate.** This is category 3.
+2. **`flags` (flat varying from storage buffer): gradient/texture/stroke mode.** This is category 3.
   The `flat` interpolation qualifier ensures that all fragments rasterized from one primitive's quad
-   receive the same `shape_kind` value. Divergence can only occur at the **boundary between two
-   adjacent primitives of different kinds**, where the rasterizer might pack fragments from both
-   primitives into the same warp.
+   receive the same flag bits. However, since the SDF path now evaluates only `sdRoundedBox` with no
+   kind dispatch, the only flag-dependent branches are gradient vs. texture vs. solid color selection
+   — all lightweight (3–8 instructions per path). Divergence at primitive boundaries between
+   different flag combinations has negligible cost.

 For category 3, the divergence analysis depends on primitive size:

@@ -282,9 +299,10 @@ For category 3, the divergence analysis depends on primitive size:
  frame-level divergence is typically **1–3%** of all warps.

 At 1–3% divergence, the throughput impact is negligible. At 4K with 12.4M total fragments
-(~387,000 warps), divergent boundary warps number in the low thousands. Each divergent warp pays at
-most ~25 extra instructions (the cost of the longest untaken SDF branch). At ~12G instructions/sec
-on a mid-range GPU, that totals ~4μs — under 0.05% of an 8.3ms (120 FPS) frame budget. This is
+(~387,000 warps), so 1–3% divergence corresponds to roughly 3,900–11,600 divergent boundary warps.
+Without kind dispatch, the longest untaken branch is the gradient evaluation (~8 instructions), not
+a different SDF function, so each divergent warp pays at most ~8 extra instructions. At ~12G
+instructions/sec on a mid-range GPU, that totals ~2.6–7.7μs, under 0.1% of an 8.3ms (120 FPS)
+frame budget. This is
 confirmed by production renderers that use exactly this pattern:

 - **vger / vger-rs** (Audulus): single pipeline, 11 primitive kinds dispatched by a `switch` on a
@@ -309,9 +327,10 @@ our design:
   > have no per-fragment data-dependent branches in the main pipeline.

 2. **Branches where both paths are very long.** If both sides of a branch are 500+ instructions,
-   divergent warps pay double a large cost. Our SDF functions are 10–25 instructions each. Even
-   fully divergent, the penalty is ~25 extra instructions — less than a single texture sample's
-   latency.
+   divergent warps pay double a large cost. Without kind dispatch, the SDF path always evaluates
+   `sdRoundedBox`; the only branches are gradient/texture/solid color selection at 3–8 instructions
+   each. Even fully divergent, the penalty is ~8 extra instructions — less than a single texture
+   sample's latency.

 3. **Branches that prevent compiler optimizations.** Some compilers cannot schedule instructions
   across branch boundaries, reducing VLIW utilization on older architectures. Modern GPUs (NVIDIA
@@ -319,9 +338,9 @@ our design:
   concern.

 4. **Register pressure from the union of all branches.** This is the real cost, and it is why we
-   split heavy effects (shadows, glass) into separate pipelines. Within the main pipeline, all SDF
-   branches have similar register footprints (12–22 registers), so combining them causes negligible
-   occupancy loss.
+   split heavy effects (shadows, glass) into separate pipelines. Within the main pipeline, the SDF
+   path has a single evaluation (`sdRoundedBox`) with flag-based color selection, clustering at
+   ~15–18 registers, so occupancy loss is negligible.
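The 4K divergence estimate can be recomputed from its stated inputs: 12.4M fragments, 32-wide warps, 1–3% divergent warps, ~8 extra instructions per divergent warp, and ~12G instructions/sec. The Python helper below is a back-of-the-envelope model, not a profiler measurement. It divides warp-level instruction counts by the quoted rate, matching how the text combines them.

```python
def divergence_cost_us(fragments, divergent_fraction, extra_instructions,
                       instructions_per_sec, warp_size=32):
    """Upper-bound extra time (microseconds) spent by divergent warps.

    Assumes every divergent warp executes `extra_instructions` additional warp-level
    instructions and that the GPU retires `instructions_per_sec` of them.
    """
    warps = fragments / warp_size
    divergent_warps = warps * divergent_fraction
    return divergent_warps * extra_instructions / instructions_per_sec * 1e6


FRAME_US_120FPS = 1e6 / 120  # ~8,333 us per frame at 120 FPS
```

Even at the 3% upper bound, the cost stays under 0.1% of a 120 FPS frame.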

 **References:**

@@ -342,17 +361,19 @@ our design:
 ### Main pipeline: SDF + tessellated (unified)

 The main pipeline serves two submission modes through a single `TRIANGLELIST` pipeline and a single
-vertex input layout, distinguished by a push constant:
+vertex input layout, distinguished by a mode marker in the `Primitive.flags` field (low byte:
+0 = tessellated, 1 = SDF). The tessellated path sets this to 0 via zero-initialization in the vertex
+shader; the SDF path sets it to 1 via `pack_flags`.

- **Tessellated mode** (`mode = 0`): direct vertex buffer with explicit geometry. Unchanged from
-  today. Used for text (SDL_ttf atlas sampling), polylines, triangle fans/strips, gradient-filled
-  shapes, and any user-provided raw vertex geometry.
+- **Tessellated mode** (`mode = 0`): direct vertex buffer with explicit geometry. Used for text
+  (SDL_ttf atlas sampling), triangle fans/strips, ellipses, regular polygons, circle sectors, and
+  any user-provided raw vertex geometry.
 - **SDF mode** (`mode = 1`): shared unit-quad vertex buffer + GPU storage buffer of `Primitive`
  structs, drawn instanced. Used for all shapes with closed-form signed distance functions.

-Both modes converge on the same fragment shader, which dispatches on a `shape_kind` discriminant
-carried either in the vertex data (tessellated, always `Solid = 0`) or in the storage-buffer
-primitive struct (SDF modes).
+Both modes use the same fragment shader. The fragment shader checks the mode marker: mode 0 computes
+`out = color * texture(tex, uv)`; mode 1 always evaluates `sdRoundedBox` and applies
+gradient/texture/solid color based on flag bits.

 #### Why SDF for shapes

@@ -391,49 +412,60 @@ SDF primitives are submitted via a GPU storage buffer indexed by `gl_InstanceInd
 shader, rather than encoding per-primitive data redundantly in vertex attributes. This follows the
 pattern used by both Zed GPUI and vger-rs.

-Each SDF shape is described by a single `Primitive` struct (~56 bytes) in the storage buffer. The
+Each SDF shape is described by a single `Primitive` struct (80 bytes) in the storage buffer. The
 vertex shader reads `primitives[gl_InstanceIndex]`, computes the quad corner position from the unit
 vertex and the primitive's bounds, and passes shape parameters to the fragment shader via `flat`
 interpolated varyings.
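A CPU-side Python emulation of the instanced expansion follows. It assumes a `[0, 1]` unit quad split into two triangles and axis-aligned bounds, and it ignores rotation. `UNIT_QUAD`, `quad_vertex`, and `draw_instanced` are hypothetical names. The loops play the role of the vertex shader's per-vertex lookup.

```python
# Six vertices = two triangles covering the unit square (assumed [0, 1] corners).
UNIT_QUAD = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0),
             (0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]


def quad_vertex(bounds, vertex_index):
    """World-space corner for one vertex of one instance; bounds = (min_x, min_y, max_x, max_y)."""
    min_x, min_y, max_x, max_y = bounds
    cx, cy = UNIT_QUAD[vertex_index]
    return (min_x + cx * (max_x - min_x), min_y + cy * (max_y - min_y))


def draw_instanced(primitive_bounds):
    """Emulate one instanced draw: 6 vertices per instance, one instance per primitive.

    The outer loop stands in for gl_InstanceIndex (which storage-buffer record to read);
    the inner loop stands in for the unit-quad vertex index.
    """
    return [[quad_vertex(b, v) for v in range(len(UNIT_QUAD))] for b in primitive_bounds]
```

Each primitive contributes only its storage-buffer record. The 6-vertex quad is shared by every instance in the draw.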

 Compared to encoding per-primitive data in vertex attributes (the "fat vertex" approach), storage-
 buffer instancing eliminates the 4–6× data duplication across quad corners. A rounded rectangle costs
-56 bytes instead of 4 vertices × 40+ bytes = 160+ bytes.
+80 bytes instead of 4 vertices × 40+ bytes = 160+ bytes.

 The tessellated path retains the existing direct vertex buffer layout (20 bytes/vertex, no storage
 buffer access). The vertex shader branch on `mode` (push constant) is warp-uniform — every invocation
 in a draw call has the same mode — so it is effectively free on all modern GPUs.

-#### Shape kinds
+#### Shape folding

-Primitives in the main pipeline's storage buffer carry a `Shape_Kind` discriminant:
+The SDF path evaluates a single function — `sdRoundedBox` — for all primitives. There is no
+`Shape_Kind` enum or per-primitive kind dispatch in the fragment shader. Shapes that are algebraically
+special cases of a rounded rectangle are emitted as RRect primitives by the CPU-side drawing procs:

-| Kind       | SDF function                           | Notes                                                     |
-| ---------- | -------------------------------------- | --------------------------------------------------------- |
-| `RRect`    | `sdRoundedBox` (iq)                    | Per-corner radii. Covers all Clay rectangles and borders. |
-| `Circle`   | `sdCircle`                             | Filled and stroked.                                       |
-| `Ellipse`  | `sdEllipse`                            | Exact (iq's closed-form).                                 |
-| `Segment`  | `sdSegment` capsule                    | Rounded caps, correct sub-pixel thin lines.               |
-| `Ring_Arc` | `abs(sdCircle) - thickness` + arc mask | Rings, arcs, circle sectors unified.                      |
-| `NGon`     | `sdRegularPolygon`                     | Regular n-gon for n ≥ 5.                                  |
+| User-facing shape            | RRect mapping                                | Notes                                    |
+| ---------------------------- | -------------------------------------------- | ---------------------------------------- |
+| Rectangle (sharp or rounded) | Direct                                       | Per-corner radii from `radii` param      |
+| Circle                       | `half_size = (r, r)`, `radii = (r, r, r, r)` | Uniform radii = half-size                |
+| Line segment / capsule       | Rotated RRect, `radii = half_thickness`      | Stadium shape (fully-rounded minor axis) |
+| Full ring / annulus          | Stroked circle at mid-radius                 | `stroke_px = outer - inner`              |

-The `Solid` kind (value 0) is reserved for the tessellated path, where `shape_kind` is implicitly
-zero because the fragment shader receives it from zero-initialized vertex attributes.
+Shapes without a closed-form RRect reduction are drawn via the tessellated path:

-Stroke/outline variants of each shape are handled by the `Shape_Flags` bit set rather than separate
-shape kinds. The fragment shader transforms `d = abs(d) - stroke_width` when the `Stroke` flag is
-set.
+| Shape                     | Tessellated proc                   | Method                     |
+| ------------------------- | ---------------------------------- | -------------------------- |
+| Ellipse                   | `tes_ellipse`, `tes_ellipse_lines` | Triangle fan approximation |
+| Regular polygon (N-gon)   | `tes_polygon`, `tes_polygon_lines` | Triangle fan from center   |
+| Circle sector (pie slice) | `tes_sector`                       | Triangle fan arc           |
+
+The `Shape_Flags` bit set controls rendering mode per primitive:
+
+| Flag              | Bit | Effect                                                               |
+| ----------------- | --- | -------------------------------------------------------------------- |
+| `Stroke`          | 0   | Outline instead of fill (`d = abs(d) - stroke_width/2`)              |
+| `Textured`        | 1   | Sample texture using `uv.uv_rect` (mutually exclusive with Gradient) |
+| `Gradient`        | 2   | Bilinear 4-corner interpolation from `uv.corner_colors`              |
+| `Gradient_Radial` | 3   | Radial 2-color falloff (inner/outer) from `uv.corner_colors[0..1]`   |
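A Python sketch of how the mode marker and `Shape_Flags` bits might be packed into the 32-bit `flags` word follows. It uses the bit numbers from the table above and the low-byte mode marker described for `Primitive.flags`. The real `pack_flags` signature is not shown in this document, so `pack_flags` and `unpack_flags` here are hypothetical stand-ins. The check that rejects Textured combined with either gradient flag is also an assumption; it reflects that both read the same 16-byte union.

```python
from enum import IntFlag


class ShapeFlags(IntFlag):
    STROKE = 1 << 0
    TEXTURED = 1 << 1
    GRADIENT = 1 << 2
    GRADIENT_RADIAL = 1 << 3


MODE_TESSELLATED = 0
MODE_SDF = 1


def pack_flags(mode, flags):
    """Low byte = mode marker, bits 8+ = Shape_Flags (hypothetical signature)."""
    gradients = ShapeFlags.GRADIENT | ShapeFlags.GRADIENT_RADIAL
    if flags & ShapeFlags.TEXTURED and flags & gradients:
        raise ValueError("texture and gradient share the same 16 bytes; pick one")
    return (mode & 0xFF) | (int(flags) << 8)


def unpack_flags(packed):
    """Inverse of pack_flags: returns (mode, ShapeFlags)."""
    return packed & 0xFF, ShapeFlags(packed >> 8)


def stroke_distance(d, stroke_width):
    """Stroke transform from the table: the outline band is centered on the original edge."""
    return abs(d) - stroke_width / 2
```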

 **What stays tessellated:**

 - Text (SDL_ttf atlas, pending future MSDF evaluation)
- `rectangle_gradient`, `circle_gradient` (per-vertex color interpolation)
- `triangle_fan`, `triangle_strip` (arbitrary user-provided point lists)
- `line_strip` / polylines (SDF polyline rendering is possible but complex; deferred)
+- Ellipses (`tes_ellipse`, `tes_ellipse_lines`)
+- Regular polygons (`tes_polygon`, `tes_polygon_lines`)
+- Circle sectors / pie slices (`tes_sector`)
+- `tes_triangle`, `tes_triangle_fan`, `tes_triangle_strip` (arbitrary user-provided geometry)
 - Any raw vertex geometry submitted via `prepare_shape`

-The rule: if the shape has a closed-form SDF, it goes SDF. If it's described only by a vertex list or
-needs per-vertex color interpolation, it stays tessellated.
+The design rule: if the shape reduces to `sdRoundedBox`, it goes SDF. If it requires a different SDF
+function or is described by a vertex list, it stays tessellated.

 ### Effects pipeline

@@ -547,21 +579,21 @@ The `Primitive` struct for SDF shapes lives in the storage buffer, not in vertex

 ```
 Primitive :: struct {
-    bounds:     [4]f32,         //  0: min_x, min_y, max_x, max_y
-    color:      Color,          // 16: u8x4, unpacked in shader via unpackUnorm4x8
-    kind_flags: u32,            // 20: (kind as u32) | (flags as u32 << 8)
-    rotation:   f32,            // 24: shader self-rotation in radians
-    _pad:       f32,            // 28: alignment
-    params:     Shape_Params,   // 32: raw union, 32 bytes (two vec4s of shape-specific data)
-    uv_rect:    [4]f32,         // 64: texture UV sub-region (u_min, v_min, u_max, v_max)
+    bounds:      [4]f32,          //  0: min_x, min_y, max_x, max_y
+    color:       Color,           // 16: u8x4, unpacked in shader via unpackUnorm4x8
+    flags:       u32,             // 20: low byte = mode marker (0 = tessellated, 1 = SDF), bits 8+ = Shape_Flags
+    rotation_sc: u32,             // 24: packed f16 pair (sin, cos). Requires .Rotated flag.
+    _pad:        f32,             // 28: reserved for future use
+    params:      RRect_Params,    // 32: rounded-rectangle params (half_feather, radii, etc.) (32 bytes)
+    uv:          Uv_Or_Gradient,  // 64: texture UV rect or gradient corner colors (16 bytes)
 }
 // Total: 80 bytes (std430 aligned)
 ```

-`Shape_Params` is a `#raw_union` with named variants per shape kind (`rrect`, `circle`, `segment`,
-etc.), ensuring type safety on the CPU side and zero-cost reinterpretation on the GPU side. The
-`uv_rect` field is used by textured SDF primitives (Shape_Flag.Textured); non-textured primitives
-leave it zeroed.
+`RRect_Params` holds the rounded-rectangle parameters directly — there is no `Shape_Params` union.
+`Uv_Or_Gradient` is a `#raw_union` that aliases `[4]f32` (texture UV rect) with `[4]Color` (gradient
+corner colors, clockwise from top-left: TL, TR, BR, BL). The `flags` field encodes both the
+tessellated/SDF mode marker (low byte) and shape flags (bits 8+) via `pack_flags`.
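The 80-byte layout can be cross-checked with Python's `ctypes`, which follows C layout rules. For this field list, the resulting offsets coincide with the commented std430 offsets. The contents of `RRect_Params` are not listed here, so it is modeled as 32 opaque bytes. The `pack_half2` helper assumes `rotation_sc` follows GLSL `packHalf2x16` ordering, with the first component (sin) in the low 16 bits.

```python
import ctypes
import struct


class Color(ctypes.Structure):
    _fields_ = [("r", ctypes.c_uint8), ("g", ctypes.c_uint8),
                ("b", ctypes.c_uint8), ("a", ctypes.c_uint8)]


class RRectParams(ctypes.Structure):
    # Field list not given in this document: modeled as 32 opaque bytes.
    _fields_ = [("raw", ctypes.c_float * 8)]


class UvOrGradient(ctypes.Union):
    # Both views alias the same 16 bytes.
    _fields_ = [("uv_rect", ctypes.c_float * 4),   # texture UV sub-rect
                ("corner_colors", Color * 4)]      # TL, TR, BR, BL


class Primitive(ctypes.Structure):
    _fields_ = [
        ("bounds", ctypes.c_float * 4),    # min_x, min_y, max_x, max_y
        ("color", Color),
        ("flags", ctypes.c_uint32),        # low byte: mode marker; bits 8+: Shape_Flags
        ("rotation_sc", ctypes.c_uint32),  # packed f16 (sin, cos)
        ("_pad", ctypes.c_float),
        ("params", RRectParams),
        ("uv", UvOrGradient),
    ]


def pack_half2(first, second):
    """Pack two floats as IEEE half floats, first in the low 16 bits (packHalf2x16 order, assumed)."""
    return int.from_bytes(struct.pack("<ee", first, second), "little")
```

The explicit `_pad` keeps `params` and `uv` on 16-byte boundaries, which std430 requires for `vec4`-typed members on the GPU side.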

 ### Draw submission order

@@ -583,14 +615,15 @@ invariant is that each primitive is drawn exactly once, in the pipeline that own
 Text rendering currently uses SDL_ttf's GPU text engine, which rasterizes glyphs per `(font, size)`
 pair into bitmap atlases and emits indexed triangle data via `GetGPUTextDrawData`. This path is
 **unchanged** by the SDF migration — text continues to flow through the main pipeline's tessellated
-mode with `shape_kind = Solid`, sampling the SDL_ttf atlas texture.
+mode with `mode = 0`, sampling the SDL_ttf atlas texture.

 A future phase may evaluate MSDF (multi-channel signed distance field) text rendering, which would
 allow resolution-independent glyph rendering from a single small atlas per font. This would involve:

 - Offline atlas generation via Chlumský's msdf-atlas-gen tool.
 - Runtime glyph metrics via `vendor:stb/truetype` (already in the Odin distribution).
- A new `Shape_Kind.MSDF_Glyph` variant in the main pipeline's fragment shader.
+- A new MSDF glyph mode in the fragment shader, which would require reintroducing a mode/kind
+  distinction (the current shader evaluates only `sdRoundedBox` with no kind dispatch).
 - Potential removal of the SDL_ttf dependency.

 This is explicitly deferred. The SDF shape migration is independent of and does not block text
@@ -659,30 +692,26 @@ with the same texture but different samplers produce separate draw calls, which

 #### Textured draw procs

-Textured rectangles route through the existing SDF path via `draw.rectangle_texture` and
-`draw.rectangle_texture_corners`, mirroring `draw.rectangle` and `draw.rectangle_corners` exactly —
+Textured rectangles route through the existing SDF path via `sdf_rectangle_texture` and
+`sdf_rectangle_texture_corners`, mirroring `sdf_rectangle` and `sdf_rectangle_corners` exactly —
 same parameters, same naming — with the color parameter replaced by a texture ID plus an optional
 tint.

-An earlier iteration of this design considered a separate tessellated `draw.texture` proc for
-"simple" fullscreen quads, on the theory that the tessellated path's lower register count (~16 regs
-vs ~24 for the SDF textured branch) would improve occupancy at large fragment counts. Applying the
-register-pressure analysis from the pipeline-strategy section above shows this is wrong: both 16 and
-24 registers are well below the register cliff (~43 regs on consumer Ampere/Ada, ~32 on Volta/A100),
-so both run at 100% occupancy. The remaining ALU difference (~15 extra instructions for the SDF
-evaluation) amounts to ~20μs at 4K — below noise. Meanwhile, splitting into a separate pipeline
-would add ~1–5μs per pipeline bind on the CPU side per scissor, matching or exceeding the GPU-side
-savings. Within the main pipeline, unified remains strictly better.
+An earlier iteration of this design considered a separate tessellated proc for "simple" fullscreen
+quads, on the theory that the tessellated path's lower register count (~16 regs vs ~18 for the SDF
+textured branch) would improve occupancy at large fragment counts. Applying the register-pressure
+analysis from the pipeline-strategy section above shows this is wrong: both 16 and 18 registers are
+well below the register cliff (~43 regs on consumer Ampere/Ada, ~32 on Volta/A100), so both run at
+100% occupancy. The remaining ALU difference (~15 extra instructions for the SDF evaluation) amounts
+to ~20μs at 4K — below noise. Meanwhile, splitting into a separate pipeline would add ~1–5μs per
+pipeline bind on the CPU side per scissor, matching or exceeding the GPU-side savings. Within the
+main pipeline, unified remains strictly better.
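The ~20μs figure can be reproduced as fullscreen 4K fragments times the extra SDF instructions, divided by shader throughput. The throughput used below (6 × 10^12 per-thread instructions per second, roughly a mid-range desktop GPU) is an assumption chosen to illustrate the arithmetic. It is not a measured number from this project.

```python
def extra_alu_time_us(width, height, extra_instructions, thread_instr_per_sec):
    """Extra time (microseconds) if every covered fragment runs `extra_instructions` more."""
    fragments = width * height
    return fragments * extra_instructions / thread_instr_per_sec * 1e6
```

With `extra_alu_time_us(3840, 2160, 15, 6e12)` the result is about 20.7μs, roughly 0.25% of a 120 FPS frame.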

-The naming convention follows the existing shape API: `rectangle_texture` and
-`rectangle_texture_corners` sit alongside `rectangle` and `rectangle_corners`, mirroring the
-`rectangle_gradient` / `circle_gradient` pattern where the shape is the primary noun and the
-modifier (gradient, texture) is secondary. This groups related procs together in autocomplete
-(`rectangle_*`) and reads as natural English ("draw a rectangle with a texture").
-
-Future per-shape texture variants (`circle_texture`, `ellipse_texture`, `polygon_texture`) are
-reserved by this naming convention and require only a `Shape_Flag.Textured` bit plus a small
-per-shape UV mapping function in the fragment shader. These are additive.
+The naming convention uses `sdf_` and `tes_` prefixes to indicate the rendering path, with suffixes
+for modifiers: `sdf_rectangle_texture` and `sdf_rectangle_texture_corners` sit alongside
+`sdf_rectangle` (solid or gradient overload). Proc groups like `sdf_rectangle` dispatch to
+`sdf_rectangle_solid` or `sdf_rectangle_gradient` based on argument count. Future per-shape texture
+variants (`sdf_circle_texture`) are additive.

 #### What SDF anti-aliasing does and does not do for textured draws

@@ -721,9 +750,9 @@ textures onto a free list that is processed in `r_end_frame`, not at the call si

 Clay's `RenderCommandType.Image` is handled by dereferencing `imageData: rawptr` as a pointer to a
 `Clay_Image_Data` struct containing a `Texture_Id`, `Fit_Mode`, and tint color. Routing mirrors the
-existing rectangle handling: zero `cornerRadius` dispatches to `draw.texture` (tessellated), nonzero
-dispatches to `draw.rectangle_texture_corners` (SDF). A `fit_params` call computes UVs from the fit
-mode before dispatch.
+existing rectangle handling: zero `cornerRadius` dispatches to `sdf_rectangle_texture` (SDF, sharp
+corners), nonzero dispatches to `sdf_rectangle_texture_corners` (SDF, per-corner radii). A
+`fit_params` call computes UVs from the fit mode before dispatch.
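The following Python sketch shows what a `fit_params`-style helper might compute. The document does not list the `Fit_Mode` values, so the `"stretch"` and `"cover"` modes and the `fit_uv_rect` name are hypothetical. The output uses a `(u_min, v_min, u_max, v_max)` order, which is also an assumption here. A contain-style mode would shrink the destination bounds rather than the UVs, so it is omitted.

```python
def fit_uv_rect(mode, rect_w, rect_h, tex_w, tex_h):
    """Return (u_min, v_min, u_max, v_max) for a hypothetical fit mode.

    "stretch": use the whole texture, distorted to the rect's aspect ratio.
    "cover":   crop the texture (centered) so its visible part matches the rect's aspect.
    """
    if mode == "stretch":
        return (0.0, 0.0, 1.0, 1.0)
    if mode == "cover":
        rect_aspect = rect_w / rect_h
        tex_aspect = tex_w / tex_h
        if tex_aspect > rect_aspect:
            # Texture is relatively wider: crop left and right.
            span = rect_aspect / tex_aspect
            u0 = (1.0 - span) / 2
            return (u0, 0.0, u0 + span, 1.0)
        # Texture is relatively taller: crop top and bottom.
        span = tex_aspect / rect_aspect
        v0 = (1.0 - span) / 2
        return (0.0, v0, 1.0, v0 + span)
    raise ValueError(f"unknown fit mode: {mode}")
```

Because only the UVs change, the same SDF rectangle (sharp or rounded) still clips the image, so a cover-fit image with rounded corners needs no extra geometry.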

 #### Deferred features

@@ -735,7 +764,7 @@ The following are plumbed in the descriptor but not implemented in phase 1:
 - **3D textures, arrays, cube maps**: `Texture_Desc.type` and `depth_or_layers` fields exist.
 - **Additional samplers**: anisotropic, trilinear, clamp-to-border — additive enum values.
 - **Atlas packing**: internal optimization for sub-batch coalescing; invisible to callers.
- **Per-shape texture variants**: `circle_texture`, `ellipse_texture`, etc. — reserved by naming.
+- **Per-shape texture variants**: `sdf_circle_texture`, `tes_ellipse_texture`, `tes_polygon_texture`.
+  Potential future additions, reserved by the naming convention.

 **References:**