draw-improvements (#17)

Major rework of the draw rendering system. We are building an SDF-first renderer, with tessellated geometry only as a fallback for the specific situations where SDF is particularly poorly suited.

Co-authored-by: Zachary Levy <zachary@sunforge.is>
Reviewed-on: #17
This commit was merged in pull request #17.
2026-04-24 07:57:44 +00:00
parent 37da2ea068
commit bca19277b3
15 changed files with 1773 additions and 1736 deletions


@@ -9,35 +9,51 @@ The renderer uses a single unified `Pipeline_2D_Base` (`TRIANGLELIST` pipeline)
modes dispatched by a push constant:
- **Mode 0 (Tessellated):** Vertex buffer contains real geometry. Used for text (indexed draws into
SDL_ttf atlas textures), axis-aligned sharp-corner rectangles (already optimal as 2 triangles),
per-vertex color gradients (`rectangle_gradient`, `circle_gradient`), angular-clipped circle
sectors (`circle_sector`), and arbitrary user geometry (`triangle`, `triangle_fan`,
`triangle_strip`). The fragment shader computes `out = color * texture(tex, uv)`.
SDL_ttf atlas textures), single-pixel points (`tes_pixel`), arbitrary user geometry (`tes_triangle`,
`tes_triangle_fan`, `tes_triangle_strip`), and shapes without a closed-form rounded-rectangle
reduction: ellipses (`tes_ellipse`), regular polygons (`tes_polygon`), and circle sectors
(`tes_sector`). The fragment shader computes `out = color * texture(tex, uv)`.
- **Mode 1 (SDF):** A static 6-vertex unit-quad buffer is drawn instanced, with per-primitive
`Primitive` structs uploaded each frame to a GPU storage buffer. The vertex shader reads
`primitives[gl_InstanceIndex]`, computes world-space position from unit quad corners + primitive
bounds. The fragment shader dispatches on `Shape_Kind` to evaluate the correct signed distance
function analytically.
`Primitive` structs (80 bytes each) uploaded each frame to a GPU storage buffer. The vertex shader
reads `primitives[gl_InstanceIndex]`, computes world-space position from unit quad corners +
primitive bounds. The fragment shader always evaluates `sdRoundedBox` — there is no per-primitive
kind dispatch.
Seven SDF shape kinds are implemented:
The SDF path handles all shapes that are algebraically reducible to a rounded rectangle:
1. **RRect** — rounded rectangle with per-corner radii (iq's `sdRoundedBox`)
2. **Circle** — filled or stroked circle
3. **Ellipse** - exact signed-distance ellipse (iq's iterative `sdEllipse`)
4. **Segment** — capsule-style line segment with rounded caps
5. **Ring_Arc** — annular ring with angular clipping for arcs
6. **NGon** — regular polygon with arbitrary side count and rotation
7. **Polyline** — decomposed into independent `Segment` primitives per adjacent point pair
- **Rounded rectangles** — per-corner radii via `sdRoundedBox` (iq). Covers filled, stroked,
textured, and gradient-filled rectangles.
- **Circles** — uniform radii equal to half-size. Covers filled, stroked, and radial-gradient circles.
- **Line segments / capsules** — rotated RRect with uniform radii equal to half-thickness (stadium shape).
- **Full rings / annuli** — stroked circle (mid-radius with stroke thickness = outer - inner).
All SDF shapes support fill and stroke modes via `Shape_Flags`, and produce mathematically exact
curves with analytical anti-aliasing via `smoothstep` — no tessellation, no piecewise-linear
approximation. A rounded rectangle is 1 primitive (64 bytes) instead of ~250 vertices (~5000 bytes).
All SDF shapes support fill, stroke, solid color, bilinear 4-corner gradients, radial 2-color
gradients, and texture fills via `Shape_Flags`. Gradient colors are packed into the same 16 bytes as
the texture UV rect via a `Uv_Or_Gradient` raw union — zero size increase to the 80-byte `Primitive`
struct. Gradient and texture are mutually exclusive.
All SDF shapes produce mathematically exact curves with analytical anti-aliasing via `smoothstep`,
with no tessellation and no piecewise-linear approximation. A rounded rectangle is 1 primitive (80 bytes)
instead of ~250 vertices (~5000 bytes).
The fragment shader's estimated register footprint is ~20–23 VGPRs via static live-range analysis.
RRect and Ring_Arc are roughly tied at peak pressure — RRect carries `corner_radii` (4 regs) plus
`sdRoundedBox` temporaries, Ring_Arc carries wedge normals plus dot-product temporaries. Both land
comfortably under Mali Valhall's 32-register occupancy cliff (G57/G77/G78 and later) and well under
desktop limits. On older Bifrost Mali (G71/G72/G76, 16-register cliff) either shape kind may incur
partial occupancy reduction. These estimates are hand-counted; exact numbers require `malioc` or
Radeon GPU Analyzer against the compiled SPIR-V.
MSAA is opt-in (default `._1`, no MSAA) via `Init_Options.msaa_samples`. SDF rendering does not
benefit from MSAA because fragment coverage is computed analytically. MSAA remains useful for text
glyph edges and tessellated user geometry if desired.
All public drawing procs use prefixed names for clarity: `sdf_*` for SDF-path shapes, `tes_*` for
tessellated-path shapes. Proc groups provide a single entry point per shape concept (e.g.,
`sdf_rectangle` dispatches to `sdf_rectangle_solid` or `sdf_rectangle_gradient` based on argument
count).
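The proc-group dispatch described above can be sketched with Odin's explicit procedure overloading. Only the proc names come from the text; the `Rectangle` and `Color` types and both parameter lists are assumptions made for this example, not levlib's actual signatures.

```odin
package main

import "core:fmt"

// Hypothetical types for illustration; levlib's real definitions may differ.
Color     :: [4]u8
Rectangle :: struct { x, y, w, h: f32 }

// Solid fill: one color for the whole shape (assumed signature).
sdf_rectangle_solid :: proc(rect: Rectangle, color: Color) {
    fmt.println("solid fill", rect, color)
}

// Bilinear gradient: one color per corner, TL, TR, BR, BL (assumed signature).
sdf_rectangle_gradient :: proc(rect: Rectangle, tl, tr, br, bl: Color) {
    fmt.println("gradient fill", rect, tl, tr, br, bl)
}

// Proc group: the compiler picks the member whose parameters match the call.
sdf_rectangle :: proc {
    sdf_rectangle_solid,
    sdf_rectangle_gradient,
}

main :: proc() {
    r := Rectangle{10, 10, 200, 100}
    red := Color{255, 0, 0, 255}
    blue := Color{0, 0, 255, 255}
    sdf_rectangle(r, red)                  // resolves to sdf_rectangle_solid
    sdf_rectangle(r, red, red, blue, blue) // resolves to sdf_rectangle_gradient
}
```

Resolution happens at compile time, so the single entry point adds no runtime dispatch cost.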
## 2D rendering pipeline plan
This section documents the planned architecture for levlib's 2D rendering system. The design is driven
@@ -91,19 +107,19 @@ Below the cliff, adding registers has zero occupancy cost.
On consumer Ampere/Ada GPUs (RTX 30xx/40xx, 65,536 regs/SM, max 1,536 threads/SM, cliff at ~43 regs):
| Register allocation | Reg-limited threads | Actual (hw-capped) | Occupancy |
| ----------------------- | ------------------- | ------------------ | --------- |
| 20 regs (main pipeline) | 3,276 | 1,536 | 100% |
| 32 regs | 2,048 | 1,536 | 100% |
| 48 regs (effects) | 1,365 | 1,365 | ~89% |
| Register allocation | Reg-limited threads | Actual (hw-capped) | Occupancy |
| ------------------------ | ------------------- | ------------------ | --------- |
| ~16 regs (main pipeline) | 4,096 | 1,536 | 100% |
| 32 regs | 2,048 | 1,536 | 100% |
| 48 regs (effects) | 1,365 | 1,365 | ~89% |
On Volta/A100 GPUs (65,536 regs/SM, max 2,048 threads/SM, cliff at ~32 regs):
| Register allocation | Reg-limited threads | Actual (hw-capped) | Occupancy |
| ----------------------- | ------------------- | ------------------ | --------- |
| 20 regs (main pipeline) | 3,276 | 2,048 | 100% |
| 32 regs | 2,048 | 2,048 | 100% |
| 48 regs (effects) | 1,365 | 1,365 | ~67% |
| Register allocation | Reg-limited threads | Actual (hw-capped) | Occupancy |
| ------------------------ | ------------------- | ------------------ | --------- |
| ~16 regs (main pipeline) | 4,096 | 2,048 | 100% |
| 32 regs | 2,048 | 2,048 | 100% |
| 48 regs (effects) | 1,365 | 1,365 | ~67% |
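The occupancy columns in these tables follow from one division and one clamp. The sketch below reproduces that arithmetic using the per-SM figures quoted above; the `occupancy` helper is hypothetical, and it ignores warp-granular register allocation, which real hardware applies and which can shift the numbers slightly.

```odin
package main

import "core:fmt"

// Register-limited occupancy for one SM, using plain integer division
// exactly as the tables do (no allocation-granularity rounding).
occupancy :: proc(regs_per_thread, regs_per_sm, max_threads: int) -> (resident: int, percent: f64) {
    reg_limited := regs_per_sm / regs_per_thread // threads the register file can hold
    resident = min(reg_limited, max_threads)     // capped by the hardware thread limit
    percent = 100.0 * f64(resident) / f64(max_threads)
    return
}

main :: proc() {
    reg_counts := [3]int{16, 32, 48}
    for regs in reg_counts {
        ampere, ampere_pct := occupancy(regs, 65_536, 1_536) // consumer Ampere/Ada
        volta, volta_pct := occupancy(regs, 65_536, 2_048)   // Volta/A100
        fmt.printf("%d regs: Ampere/Ada %d threads (%.1f%%), Volta/A100 %d threads (%.1f%%)\n",
            regs, ampere, ampere_pct, volta, volta_pct)
    }
}
```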
On low-end mobile (ARM Mali Bifrost/Valhall, 64 regs/thread, cliff fixed at 32 regs):
@@ -261,11 +277,12 @@ Our design has two branch points:
Every thread in every warp of a draw call sees the same `mode` value. **Zero divergence, zero
cost.**
2. **`shape_kind` (flat varying from storage buffer): which SDF to evaluate.** This is category 3.
2. **`flags` (flat varying from storage buffer): gradient/texture/stroke mode.** This is category 3.
The `flat` interpolation qualifier ensures that all fragments rasterized from one primitive's quad
receive the same `shape_kind` value. Divergence can only occur at the **boundary between two
adjacent primitives of different kinds**, where the rasterizer might pack fragments from both
primitives into the same warp.
receive the same flag bits. However, since the SDF path now evaluates only `sdRoundedBox` with no
kind dispatch, the only flag-dependent branches are gradient vs. texture vs. solid color selection,
all lightweight (3–8 instructions per path). Divergence at primitive boundaries between
different flag combinations has negligible cost.
For category 3, the divergence analysis depends on primitive size:
@@ -282,9 +299,10 @@ For category 3, the divergence analysis depends on primitive size:
frame-level divergence is typically **1–3%** of all warps.
At 1–3% divergence, the throughput impact is negligible. At 4K with 12.4M total fragments
(~387,000 warps), divergent boundary warps number in the low thousands. Each divergent warp pays at
most ~25 extra instructions (the cost of the longest untaken SDF branch). At ~12G instructions/sec
on a mid-range GPU, that totals ~4μs — under 0.05% of an 8.3ms (120 FPS) frame budget. This is
(~387,000 warps), divergent boundary warps number in the low thousands. Without kind dispatch, the
longest untaken branch is the gradient evaluation (~8 instructions), not a different SDF function.
Each divergent warp pays at most ~8 extra instructions. At ~12G instructions/sec on a mid-range GPU,
that totals ~1.3μs — under 0.02% of an 8.3ms (120 FPS) frame budget. This is
confirmed by production renderers that use exactly this pattern:
- **vger / vger-rs** (Audulus): single pipeline, 11 primitive kinds dispatched by a `switch` on a
@@ -309,9 +327,10 @@ our design:
> have no per-fragment data-dependent branches in the main pipeline.
2. **Branches where both paths are very long.** If both sides of a branch are 500+ instructions,
divergent warps pay double a large cost. Our SDF functions are 10–25 instructions each. Even
fully divergent, the penalty is ~25 extra instructions — less than a single texture sample's
latency.
divergent warps pay double a large cost. Without kind dispatch, the SDF path always evaluates
`sdRoundedBox`; the only branches are gradient/texture/solid color selection at 3–8 instructions
each. Even fully divergent, the penalty is ~8 extra instructions — less than a single texture
sample's latency.
3. **Branches that prevent compiler optimizations.** Some compilers cannot schedule instructions
across branch boundaries, reducing VLIW utilization on older architectures. Modern GPUs (NVIDIA
@@ -319,9 +338,9 @@ our design:
concern.
4. **Register pressure from the union of all branches.** This is the real cost, and it is why we
split heavy effects (shadows, glass) into separate pipelines. Within the main pipeline, all SDF
branches have similar register footprints (12–22 registers), so combining them causes negligible
occupancy loss.
split heavy effects (shadows, glass) into separate pipelines. Within the main pipeline, the SDF
path has a single evaluation (`sdRoundedBox`) with flag-based color selection, clustering at ~15–18
registers, so there is negligible occupancy loss.
**References:**
@@ -342,17 +361,19 @@ our design:
### Main pipeline: SDF + tessellated (unified)
The main pipeline serves two submission modes through a single `TRIANGLELIST` pipeline and a single
vertex input layout, distinguished by a push constant:
vertex input layout, distinguished by a mode marker in the `Primitive.flags` field (low byte:
0 = tessellated, 1 = SDF). The tessellated path sets this to 0 via zero-initialization in the vertex
shader; the SDF path sets it to 1 via `pack_flags`.
- **Tessellated mode** (`mode = 0`): direct vertex buffer with explicit geometry. Unchanged from
today. Used for text (SDL_ttf atlas sampling), polylines, triangle fans/strips, gradient-filled
shapes, and any user-provided raw vertex geometry.
- **Tessellated mode** (`mode = 0`): direct vertex buffer with explicit geometry. Used for text
(SDL_ttf atlas sampling), triangle fans/strips, ellipses, regular polygons, circle sectors, and
any user-provided raw vertex geometry.
- **SDF mode** (`mode = 1`): shared unit-quad vertex buffer + GPU storage buffer of `Primitive`
structs, drawn instanced. Used for all shapes with closed-form signed distance functions.
Both modes converge on the same fragment shader, which dispatches on a `shape_kind` discriminant
carried either in the vertex data (tessellated, always `Solid = 0`) or in the storage-buffer
primitive struct (SDF modes).
Both modes use the same fragment shader. The fragment shader checks the mode marker: mode 0 computes
`out = color * texture(tex, uv)`; mode 1 always evaluates `sdRoundedBox` and applies
gradient/texture/solid color based on flag bits.
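One way to picture the mode marker and flag bits sharing a single `u32` is the sketch below. The bit positions follow the `Shape_Flags` table in this document, but the signature of `pack_flags` and the `unpack_flags` helper are assumptions for illustration; the real proc may take different arguments.

```odin
package main

import "core:fmt"

Shape_Flag :: enum u8 {
    Stroke,          // bit 0
    Textured,        // bit 1
    Gradient,        // bit 2
    Gradient_Radial, // bit 3
}
Shape_Flags :: bit_set[Shape_Flag; u8]

MODE_TESSELLATED :: 0 // what zero-initialized data yields
MODE_SDF         :: 1

// Hypothetical packing: mode marker in the low byte, flag bits from bit 8 up.
pack_flags :: proc(flags: Shape_Flags) -> u32 {
    return u32(MODE_SDF) | (u32(transmute(u8)flags) << 8)
}

// Mirror of what the shader side would do to recover both fields.
unpack_flags :: proc(packed: u32) -> (mode: u32, flags: Shape_Flags) {
    mode = packed & 0xFF
    flags = transmute(Shape_Flags)u8(packed >> 8)
    return
}

main :: proc() {
    packed := pack_flags(Shape_Flags{.Stroke, .Gradient})
    mode, flags := unpack_flags(packed)
    fmt.println("mode:", mode, "flags:", flags)
    assert(mode == MODE_SDF)
    assert(flags == Shape_Flags{.Stroke, .Gradient})
}
```

Because the tessellated path never calls `pack_flags`, its low byte stays 0 and the shader's mode check needs no extra state.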
#### Why SDF for shapes
@@ -391,49 +412,60 @@ SDF primitives are submitted via a GPU storage buffer indexed by `gl_InstanceInd
shader, rather than encoding per-primitive data redundantly in vertex attributes. This follows the
pattern used by both Zed GPUI and vger-rs.
Each SDF shape is described by a single `Primitive` struct (~56 bytes) in the storage buffer. The
Each SDF shape is described by a single `Primitive` struct (80 bytes) in the storage buffer. The
vertex shader reads `primitives[gl_InstanceIndex]`, computes the quad corner position from the unit
vertex and the primitive's bounds, and passes shape parameters to the fragment shader via `flat`
interpolated varyings.
Compared to encoding per-primitive data in vertex attributes (the "fat vertex" approach),
storage-buffer instancing eliminates the 4–6× data duplication across quad corners. A rounded rectangle costs
56 bytes instead of 4 vertices × 40+ bytes = 160+ bytes.
80 bytes instead of 4 vertices × 40+ bytes = 160+ bytes.
The tessellated path retains the existing direct vertex buffer layout (20 bytes/vertex, no storage
buffer access). The vertex shader branch on `mode` (push constant) is warp-uniform — every invocation
in a draw call has the same mode — so it is effectively free on all modern GPUs.
#### Shape kinds
#### Shape folding
Primitives in the main pipeline's storage buffer carry a `Shape_Kind` discriminant:
The SDF path evaluates a single function — `sdRoundedBox` — for all primitives. There is no
`Shape_Kind` enum or per-primitive kind dispatch in the fragment shader. Shapes that are algebraically
special cases of a rounded rectangle are emitted as RRect primitives by the CPU-side drawing procs:
| Kind | SDF function | Notes |
| ---------- | -------------------------------------- | --------------------------------------------------------- |
| `RRect` | `sdRoundedBox` (iq) | Per-corner radii. Covers all Clay rectangles and borders. |
| `Circle` | `sdCircle` | Filled and stroked. |
| `Ellipse` | `sdEllipse` | Exact (iq's closed-form). |
| `Segment` | `sdSegment` capsule | Rounded caps, correct sub-pixel thin lines. |
| `Ring_Arc` | `abs(sdCircle) - thickness` + arc mask | Rings, arcs, circle sectors unified. |
| `NGon` | `sdRegularPolygon` | Regular n-gon for n ≥ 5. |
| User-facing shape | RRect mapping | Notes |
| ---------------------------- | -------------------------------------------- | ---------------------------------------- |
| Rectangle (sharp or rounded) | Direct | Per-corner radii from `radii` param |
| Circle | `half_size = (r, r)`, `radii = (r, r, r, r)` | Uniform radii = half-size |
| Line segment / capsule | Rotated RRect, `radii = half_thickness` | Stadium shape (fully-rounded minor axis) |
| Full ring / annulus | Stroked circle at mid-radius | `stroke_px = outer - inner` |
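The mappings in this table amount to a few lines of CPU-side arithmetic. The sketch below is illustrative only: the `RRect` struct and the three helper procs are hypothetical names, and extending the capsule's major half-size by the cap radius is an assumption about how the rounded caps are placed.

```odin
package main

import "core:fmt"
import "core:math"

// Hypothetical CPU-side description of one rounded-rectangle primitive.
RRect :: struct {
    center:    [2]f32,
    half_size: [2]f32,
    radii:     [4]f32, // per-corner radii
    rotation:  f32,    // radians
    stroke_px: f32,    // 0 means filled
}

// Circle: half-size (r, r) with all four radii equal to r.
circle_as_rrect :: proc(center: [2]f32, r: f32) -> RRect {
    return RRect{center = center, half_size = {r, r}, radii = {r, r, r, r}}
}

// Capsule from a to b: a rotated box whose minor axis is fully rounded.
segment_as_rrect :: proc(a, b: [2]f32, thickness: f32) -> RRect {
    d := b - a
    half_len := 0.5 * math.sqrt(d.x*d.x + d.y*d.y)
    half_t := 0.5 * thickness
    return RRect{
        center    = 0.5 * (a + b),
        half_size = {half_len + half_t, half_t}, // assumed: caps extend past the endpoints
        radii     = {half_t, half_t, half_t, half_t},
        rotation  = math.atan2(d.y, d.x),
    }
}

// Full ring: a circle at the mid-radius, stroked with thickness outer - inner.
ring_as_rrect :: proc(center: [2]f32, inner, outer: f32) -> RRect {
    rr := circle_as_rrect(center, 0.5 * (inner + outer))
    rr.stroke_px = outer - inner
    return rr
}

main :: proc() {
    fmt.println(circle_as_rrect({100, 100}, 25))
    fmt.println(segment_as_rrect({0, 0}, {30, 40}, 4))
    fmt.println(ring_as_rrect({100, 100}, 20, 30))
}
```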
The `Solid` kind (value 0) is reserved for the tessellated path, where `shape_kind` is implicitly
zero because the fragment shader receives it from zero-initialized vertex attributes.
Shapes without a closed-form RRect reduction are drawn via the tessellated path:
Stroke/outline variants of each shape are handled by the `Shape_Flags` bit set rather than separate
shape kinds. The fragment shader transforms `d = abs(d) - stroke_width` when the `Stroke` flag is
set.
| Shape | Tessellated proc | Method |
| ------------------------- | ---------------------------------- | -------------------------- |
| Ellipse | `tes_ellipse`, `tes_ellipse_lines` | Triangle fan approximation |
| Regular polygon (N-gon) | `tes_polygon`, `tes_polygon_lines` | Triangle fan from center |
| Circle sector (pie slice) | `tes_sector` | Triangle fan arc |
The `Shape_Flags` bit set controls rendering mode per primitive:
| Flag | Bit | Effect |
| ----------------- | --- | -------------------------------------------------------------------- |
| `Stroke` | 0 | Outline instead of fill (`d = abs(d) - stroke_width/2`) |
| `Textured` | 1 | Sample texture using `uv.uv_rect` (mutually exclusive with Gradient) |
| `Gradient` | 2 | Bilinear 4-corner interpolation from `uv.corner_colors` |
| `Gradient_Radial` | 3 | Radial 2-color falloff (inner/outer) from `uv.corner_colors[0..1]` |
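To make the `Stroke` transform concrete, here is a CPU-side sketch of the per-fragment math: the rounded-box distance (a scalar transcription of iq's published `sdRoundedBox`), the stroke transform from the table, and a `smoothstep` coverage ramp. The one-pixel feather width and the `coverage` helper are assumptions; the actual shader may use a different feather.

```odin
package main

import "core:fmt"
import "core:math"

// Signed distance to a box with half-size b and per-corner radii r,
// following iq's sdRoundedBox (radius chosen by the quadrant of p).
sd_rounded_box :: proc(p, b: [2]f32, r: [4]f32) -> f32 {
    rx := r.x if p.x > 0 else r.z
    ry := r.y if p.x > 0 else r.w
    radius := rx if p.y > 0 else ry
    qx := abs(p.x) - b.x + radius
    qy := abs(p.y) - b.y + radius
    ox := max(qx, 0)
    oy := max(qy, 0)
    return min(max(qx, qy), 0) + math.sqrt(ox*ox + oy*oy) - radius
}

// Coverage in [0, 1] from a signed distance in pixels.
coverage :: proc(d, stroke_width: f32, stroked: bool) -> f32 {
    dist := d
    if stroked {
        dist = abs(dist) - 0.5*stroke_width // Stroke flag: band centered on the edge
    }
    // Equivalent to 1 - smoothstep(-0.5, 0.5, dist); the 1 px feather is an assumption.
    t := clamp(0.5 - dist, 0, 1)
    return t * t * (3 - 2*t)
}

main :: proc() {
    // A circle of radius 50 expressed as a rounded box.
    half_size := [2]f32{50, 50}
    radii := [4]f32{50, 50, 50, 50}
    xs := [4]f32{0, 49.5, 50, 50.5}
    for x in xs {
        d := sd_rounded_box({x, 0}, half_size, radii)
        fmt.println("x:", x, "d:", d, "fill:", coverage(d, 0, false), "stroke:", coverage(d, 4, true))
    }
}
```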
**What stays tessellated:**
- Text (SDL_ttf atlas, pending future MSDF evaluation)
- `rectangle_gradient`, `circle_gradient` (per-vertex color interpolation)
- `triangle_fan`, `triangle_strip` (arbitrary user-provided point lists)
- `line_strip` / polylines (SDF polyline rendering is possible but complex; deferred)
- Ellipses (`tes_ellipse`, `tes_ellipse_lines`)
- Regular polygons (`tes_polygon`, `tes_polygon_lines`)
- Circle sectors / pie slices (`tes_sector`)
- `tes_triangle`, `tes_triangle_fan`, `tes_triangle_strip` (arbitrary user-provided geometry)
- Any raw vertex geometry submitted via `prepare_shape`
The rule: if the shape has a closed-form SDF, it goes SDF. If it's described only by a vertex list or
needs per-vertex color interpolation, it stays tessellated.
The design rule: if the shape reduces to `sdRoundedBox`, it goes SDF. If it requires a different SDF
function or is described by a vertex list, it stays tessellated.
### Effects pipeline
@@ -547,21 +579,21 @@ The `Primitive` struct for SDF shapes lives in the storage buffer, not in vertex
```
Primitive :: struct {
bounds: [4]f32, // 0: min_x, min_y, max_x, max_y
color: Color, // 16: u8x4, unpacked in shader via unpackUnorm4x8
kind_flags: u32, // 20: (kind as u32) | (flags as u32 << 8)
rotation: f32, // 24: shader self-rotation in radians
_pad: f32, // 28: alignment
params: Shape_Params, // 32: raw union, 32 bytes (two vec4s of shape-specific data)
uv_rect: [4]f32, // 64: texture UV sub-region (u_min, v_min, u_max, v_max)
bounds: [4]f32, // 0: min_x, min_y, max_x, max_y
color: Color, // 16: u8x4, unpacked in shader via unpackUnorm4x8
flags: u32, // 20: low byte = mode marker (0 = tessellated, 1 = SDF), bits 8+ = Shape_Flags
rotation_sc: u32, // 24: packed f16 pair (sin, cos). Requires .Rotated flag.
_pad: f32, // 28: reserved for future use
params: Shape_Params, // 32: per-kind params union (half_feather, radii, etc.) (32 bytes)
uv: Uv_Or_Effects, // 64: texture UV rect or gradient/outline parameters (16 bytes)
}
// Total: 80 bytes (std430 aligned)
```
`Shape_Params` is a `#raw_union` with named variants per shape kind (`rrect`, `circle`, `segment`,
etc.), ensuring type safety on the CPU side and zero-cost reinterpretation on the GPU side. The
`uv_rect` field is used by textured SDF primitives (Shape_Flag.Textured); non-textured primitives
leave it zeroed.
`RRect_Params` holds the rounded-rectangle parameters directly — there is no `Shape_Params` union.
`Uv_Or_Gradient` is a `#raw_union` that aliases `[4]f32` (texture UV rect) with `[4]Color` (gradient
corner colors, clockwise from top-left: TL, TR, BR, BL). The `flags` field encodes both the
tessellated/SDF mode marker (low byte) and shape flags (bits 8+) via `pack_flags`.
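The 80-byte layout and the 16-byte aliasing can be checked at compile time. In the sketch below the field names of `Primitive` and `Uv_Or_Gradient` follow the text, while the contents of `RRect_Params` are hypothetical: the document only states that the parameters occupy 32 bytes.

```odin
package main

import "core:fmt"

Color :: [4]u8

// Hypothetical field layout; only the 32-byte total is taken from the text.
RRect_Params :: struct {
    half_size:    [2]f32,
    radii:        [4]f32,
    half_feather: f32,
    stroke_px:    f32,
}

// The UV rect and the four gradient corner colors share the same 16 bytes.
Uv_Or_Gradient :: struct #raw_union {
    uv_rect:       [4]f32,
    corner_colors: [4]Color, // TL, TR, BR, BL
}

Primitive :: struct {
    bounds:      [4]f32,
    color:       Color,
    flags:       u32,
    rotation_sc: u32,
    _pad:        f32,
    params:      RRect_Params,
    uv:          Uv_Or_Gradient,
}

// Compile-time layout checks: compilation fails if any offset drifts.
#assert(size_of(RRect_Params) == 32)
#assert(size_of(Uv_Or_Gradient) == 16)
#assert(offset_of(Primitive, params) == 32)
#assert(offset_of(Primitive, uv) == 64)
#assert(size_of(Primitive) == 80)

main :: proc() {
    p: Primitive
    // Gradient primitives write colors into the bytes a textured primitive uses for UVs.
    p.uv.corner_colors = {{255, 0, 0, 255}, {0, 255, 0, 255}, {0, 0, 255, 255}, {255, 255, 255, 255}}
    fmt.println(size_of(Primitive), "bytes per primitive")
}
```

Keeping the assertions next to the struct means a field added later cannot silently break the GPU-side layout.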
### Draw submission order
@@ -583,14 +615,15 @@ invariant is that each primitive is drawn exactly once, in the pipeline that own
Text rendering currently uses SDL_ttf's GPU text engine, which rasterizes glyphs per `(font, size)`
pair into bitmap atlases and emits indexed triangle data via `GetGPUTextDrawData`. This path is
**unchanged** by the SDF migration — text continues to flow through the main pipeline's tessellated
mode with `shape_kind = Solid`, sampling the SDL_ttf atlas texture.
mode with `mode = 0`, sampling the SDL_ttf atlas texture.
A future phase may evaluate MSDF (multi-channel signed distance field) text rendering, which would
allow resolution-independent glyph rendering from a single small atlas per font. This would involve:
- Offline atlas generation via Chlumský's msdf-atlas-gen tool.
- Runtime glyph metrics via `vendor:stb/truetype` (already in the Odin distribution).
- A new `Shape_Kind.MSDF_Glyph` variant in the main pipeline's fragment shader.
- A new MSDF glyph mode in the fragment shader, which would require reintroducing a mode/kind
distinction (the current shader evaluates only `sdRoundedBox` with no kind dispatch).
- Potential removal of the SDL_ttf dependency.
This is explicitly deferred. The SDF shape migration is independent of and does not block text
@@ -659,30 +692,26 @@ with the same texture but different samplers produce separate draw calls, which
#### Textured draw procs
Textured rectangles route through the existing SDF path via `draw.rectangle_texture` and
`draw.rectangle_texture_corners`, mirroring `draw.rectangle` and `draw.rectangle_corners` exactly —
Textured rectangles route through the existing SDF path via `sdf_rectangle_texture` and
`sdf_rectangle_texture_corners`, mirroring `sdf_rectangle` and `sdf_rectangle_corners` exactly —
same parameters, same naming — with the color parameter replaced by a texture ID plus an optional
tint.
An earlier iteration of this design considered a separate tessellated `draw.texture` proc for
"simple" fullscreen quads, on the theory that the tessellated path's lower register count (~16 regs
vs ~24 for the SDF textured branch) would improve occupancy at large fragment counts. Applying the
register-pressure analysis from the pipeline-strategy section above shows this is wrong: both 16 and
24 registers are well below the register cliff (~43 regs on consumer Ampere/Ada, ~32 on Volta/A100),
so both run at 100% occupancy. The remaining ALU difference (~15 extra instructions for the SDF
evaluation) amounts to ~20μs at 4K — below noise. Meanwhile, splitting into a separate pipeline
would add ~15μs per pipeline bind on the CPU side per scissor, matching or exceeding the GPU-side
savings. Within the main pipeline, unified remains strictly better.
An earlier iteration of this design considered a separate tessellated proc for "simple" fullscreen
quads, on the theory that the tessellated path's lower register count (~16 regs vs ~18 for the SDF
textured branch) would improve occupancy at large fragment counts. Applying the register-pressure
analysis from the pipeline-strategy section above shows this is wrong: both 16 and 18 registers are
well below the register cliff (~43 regs on consumer Ampere/Ada, ~32 on Volta/A100), so both run at
100% occupancy. The remaining ALU difference (~15 extra instructions for the SDF evaluation) amounts
to ~20μs at 4K — below noise. Meanwhile, splitting into a separate pipeline would add ~15μs per
pipeline bind on the CPU side per scissor, matching or exceeding the GPU-side savings. Within the
main pipeline, unified remains strictly better.
The naming convention follows the existing shape API: `rectangle_texture` and
`rectangle_texture_corners` sit alongside `rectangle` and `rectangle_corners`, mirroring the
`rectangle_gradient` / `circle_gradient` pattern where the shape is the primary noun and the
modifier (gradient, texture) is secondary. This groups related procs together in autocomplete
(`rectangle_*`) and reads as natural English ("draw a rectangle with a texture").
Future per-shape texture variants (`circle_texture`, `ellipse_texture`, `polygon_texture`) are
reserved by this naming convention and require only a `Shape_Flag.Textured` bit plus a small
per-shape UV mapping function in the fragment shader. These are additive.
The naming convention uses `sdf_` and `tes_` prefixes to indicate the rendering path, with suffixes
for modifiers: `sdf_rectangle_texture` and `sdf_rectangle_texture_corners` sit alongside
`sdf_rectangle` (solid or gradient overload). Proc groups like `sdf_rectangle` dispatch to
`sdf_rectangle_solid` or `sdf_rectangle_gradient` based on argument count. Future per-shape texture
variants (`sdf_circle_texture`) are additive.
#### What SDF anti-aliasing does and does not do for textured draws
@@ -721,9 +750,9 @@ textures onto a free list that is processed in `r_end_frame`, not at the call si
Clay's `RenderCommandType.Image` is handled by dereferencing `imageData: rawptr` as a pointer to a
`Clay_Image_Data` struct containing a `Texture_Id`, `Fit_Mode`, and tint color. Routing mirrors the
existing rectangle handling: zero `cornerRadius` dispatches to `draw.texture` (tessellated), nonzero
dispatches to `draw.rectangle_texture_corners` (SDF). A `fit_params` call computes UVs from the fit
mode before dispatch.
existing rectangle handling: zero `cornerRadius` dispatches to `sdf_rectangle_texture` (SDF, sharp
corners), nonzero dispatches to `sdf_rectangle_texture_corners` (SDF, per-corner radii). A
`fit_params` call computes UVs from the fit mode before dispatch.
#### Deferred features
@@ -735,7 +764,7 @@ The following are plumbed in the descriptor but not implemented in phase 1:
- **3D textures, arrays, cube maps**: `Texture_Desc.type` and `depth_or_layers` fields exist.
- **Additional samplers**: anisotropic, trilinear, clamp-to-border — additive enum values.
- **Atlas packing**: internal optimization for sub-batch coalescing; invisible to callers.
- **Per-shape texture variants**: `circle_texture`, `ellipse_texture`, etc. — reserved by naming.
- **Per-shape texture variants**: `sdf_circle_texture`, `tes_ellipse_texture`, `tes_polygon_texture` — potential future additions, reserved by naming convention.
**References:**