Major rendering updates
242
draw/README.md
@@ -9,35 +9,48 @@ The renderer uses a single unified `Pipeline_2D_Base` (`TRIANGLELIST` pipeline)
 modes dispatched by a push constant:

 - **Mode 0 (Tessellated):** Vertex buffer contains real geometry. Used for text (indexed draws into
-SDL_ttf atlas textures), axis-aligned sharp-corner rectangles (already optimal as 2 triangles),
-per-vertex color gradients (`rectangle_gradient`, `circle_gradient`), angular-clipped circle
-sectors (`circle_sector`), and arbitrary user geometry (`triangle`, `triangle_fan`,
-`triangle_strip`). The fragment shader computes `out = color * texture(tex, uv)`.
+SDL_ttf atlas textures), single-pixel points (`tes_pixel`), arbitrary user geometry (`tes_triangle`,
+`tes_triangle_fan`, `tes_triangle_strip`), and shapes without a closed-form rounded-rectangle
+reduction: ellipses (`tes_ellipse`), regular polygons (`tes_polygon`), and circle sectors
+(`tes_sector`). The fragment shader computes `out = color * texture(tex, uv)`.

 - **Mode 1 (SDF):** A static 6-vertex unit-quad buffer is drawn instanced, with per-primitive
-`Primitive` structs uploaded each frame to a GPU storage buffer. The vertex shader reads
-`primitives[gl_InstanceIndex]`, computes world-space position from unit quad corners + primitive
-bounds. The fragment shader dispatches on `Shape_Kind` to evaluate the correct signed distance
-function analytically.
+`Primitive` structs (80 bytes each) uploaded each frame to a GPU storage buffer. The vertex shader
+reads `primitives[gl_InstanceIndex]`, computes world-space position from unit quad corners +
+primitive bounds. The fragment shader always evaluates `sdRoundedBox` — there is no per-primitive
+kind dispatch.

-Seven SDF shape kinds are implemented:
+The SDF path handles all shapes that are algebraically reducible to a rounded rectangle:

-1. **RRect** — rounded rectangle with per-corner radii (iq's `sdRoundedBox`)
-2. **Circle** — filled or stroked circle
-3. **Ellipse** — exact signed-distance ellipse (iq's iterative `sdEllipse`)
-4. **Segment** — capsule-style line segment with rounded caps
-5. **Ring_Arc** — annular ring with angular clipping for arcs
-6. **NGon** — regular polygon with arbitrary side count and rotation
-7. **Polyline** — decomposed into independent `Segment` primitives per adjacent point pair
+- **Rounded rectangles** — per-corner radii via `sdRoundedBox` (iq). Covers filled, stroked,
+textured, and gradient-filled rectangles.
+- **Circles** — uniform radii equal to half-size. Covers filled, stroked, and radial-gradient circles.
+- **Line segments / capsules** — rotated RRect with uniform radii equal to half-thickness (stadium shape).
+- **Full rings / annuli** — stroked circle (mid-radius with stroke thickness = outer - inner).

-All SDF shapes support fill and stroke modes via `Shape_Flags`, and produce mathematically exact
-curves with analytical anti-aliasing via `smoothstep` — no tessellation, no piecewise-linear
-approximation. A rounded rectangle is 1 primitive (64 bytes) instead of ~250 vertices (~5000 bytes).
+All SDF shapes support fill, stroke, solid color, bilinear 4-corner gradients, radial 2-color
+gradients, and texture fills via `Shape_Flags`. Gradient colors are packed into the same 16 bytes as
+the texture UV rect via a `Uv_Or_Gradient` raw union — zero size increase to the 80-byte `Primitive`
+struct. Gradient and texture are mutually exclusive.
+
+All SDF shapes produce mathematically exact curves with analytical anti-aliasing via `smoothstep` —
+no tessellation, no piecewise-linear approximation. A rounded rectangle is 1 primitive (80 bytes)
+instead of ~250 vertices (~5000 bytes).

+The fragment shader's estimated register footprint is ~20–26 VGPRs, with Ring_Arc being the
+heaviest shape kind (pre-computed edge normals and branchless wedge evaluation require more live
+values than RRect's sdRoundedBox). Comfortably under the 32-register occupancy cliff on mobile
+Mali and Adreno, and well under desktop architectures' higher limits.
+
 MSAA is opt-in (default `._1`, no MSAA) via `Init_Options.msaa_samples`. SDF rendering does not
 benefit from MSAA because fragment coverage is computed analytically. MSAA remains useful for text
 glyph edges and tessellated user geometry if desired.

+All public drawing procs use prefixed names for clarity: `sdf_*` for SDF-path shapes, `tes_*` for
+tessellated-path shapes. Proc groups provide a single entry point per shape concept (e.g.,
+`sdf_rectangle` dispatches to `sdf_rectangle_solid` or `sdf_rectangle_gradient` based on argument
+count).
+
 ## 2D rendering pipeline plan

 This section documents the planned architecture for levlib's 2D rendering system. The design is driven
@@ -91,19 +104,19 @@ Below the cliff, adding registers has zero occupancy cost.

 On consumer Ampere/Ada GPUs (RTX 30xx/40xx, 65,536 regs/SM, max 1,536 threads/SM, cliff at ~43 regs):

-| Register allocation     | Reg-limited threads | Actual (hw-capped) | Occupancy |
-| ----------------------- | ------------------- | ------------------ | --------- |
-| 20 regs (main pipeline) | 3,276               | 1,536              | 100%      |
-| 32 regs                 | 2,048               | 1,536              | 100%      |
-| 48 regs (effects)       | 1,365               | 1,365              | ~89%      |
+| Register allocation      | Reg-limited threads | Actual (hw-capped) | Occupancy |
+| ------------------------ | ------------------- | ------------------ | --------- |
+| ~16 regs (main pipeline) | 4,096               | 1,536              | 100%      |
+| 32 regs                  | 2,048               | 1,536              | 100%      |
+| 48 regs (effects)        | 1,365               | 1,365              | ~89%      |

 On Volta/A100 GPUs (65,536 regs/SM, max 2,048 threads/SM, cliff at ~32 regs):

-| Register allocation     | Reg-limited threads | Actual (hw-capped) | Occupancy |
-| ----------------------- | ------------------- | ------------------ | --------- |
-| 20 regs (main pipeline) | 3,276               | 2,048              | 100%      |
-| 32 regs                 | 2,048               | 2,048              | 100%      |
-| 48 regs (effects)       | 1,365               | 1,365              | ~67%      |
+| Register allocation      | Reg-limited threads | Actual (hw-capped) | Occupancy |
+| ------------------------ | ------------------- | ------------------ | --------- |
+| ~16 regs (main pipeline) | 4,096               | 2,048              | 100%      |
+| 32 regs                  | 2,048               | 2,048              | 100%      |
+| 48 regs (effects)        | 1,365               | 1,365              | ~67%      |

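The occupancy columns in the two NVIDIA tables follow from integer arithmetic: registers per SM divided by registers per thread, capped at the hardware thread limit. The following is a minimal Odin sketch of that calculation. It assumes an idealized model that ignores register allocation granularity and warp rounding, and the `occupancy` proc and its parameter names are hypothetical, not part of levlib.

```odin
package occupancy_sketch

import "core:fmt"

// Hypothetical helper (not a levlib proc): idealized occupancy for one SM.
// reg_limited is how many threads the register file could hold on its own;
// actual is that figure capped by the hardware thread limit.
occupancy :: proc(regs_per_thread, regs_per_sm, max_threads_per_sm: int) -> (reg_limited, actual: int, percent: f64) {
	reg_limited = regs_per_sm / regs_per_thread // integer division rounds down
	actual = min(reg_limited, max_threads_per_sm)
	percent = 100 * f64(actual) / f64(max_threads_per_sm)
	return
}

main :: proc() {
	REGS_PER_SM :: 65_536

	thread_limits := [2]int{1_536, 2_048} // consumer Ampere/Ada, Volta/A100
	reg_counts := [3]int{16, 32, 48}      // rows of the updated tables

	for max_threads in thread_limits {
		fmt.printf("max %d threads/SM\n", max_threads)
		for regs in reg_counts {
			reg_limited, actual, percent := occupancy(regs, REGS_PER_SM, max_threads)
			fmt.printf("  %d regs: reg-limited %d, actual %d, occupancy %.1f%%\n", regs, reg_limited, actual, percent)
		}
	}
}
```

For 48 registers this gives 1,365 register-limited threads on both architectures, which is where the ~89% (1,365 / 1,536) and ~67% (1,365 / 2,048) rows come from.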
 On low-end mobile (ARM Mali Bifrost/Valhall, 64 regs/thread, cliff fixed at 32 regs):

@@ -261,11 +274,12 @@ Our design has two branch points:
 Every thread in every warp of a draw call sees the same `mode` value. **Zero divergence, zero
 cost.**

-2. **`shape_kind` (flat varying from storage buffer): which SDF to evaluate.** This is category 3.
+2. **`flags` (flat varying from storage buffer): gradient/texture/stroke mode.** This is category 3.
 The `flat` interpolation qualifier ensures that all fragments rasterized from one primitive's quad
-receive the same `shape_kind` value. Divergence can only occur at the **boundary between two
-adjacent primitives of different kinds**, where the rasterizer might pack fragments from both
-primitives into the same warp.
+receive the same flag bits. However, since the SDF path now evaluates only `sdRoundedBox` with no
+kind dispatch, the only flag-dependent branches are gradient vs. texture vs. solid color selection
+— all lightweight (3–8 instructions per path). Divergence at primitive boundaries between
+different flag combinations has negligible cost.

 For category 3, the divergence analysis depends on primitive size:

@@ -282,9 +296,10 @@ For category 3, the divergence analysis depends on primitive size:
 frame-level divergence is typically **1–3%** of all warps.

 At 1–3% divergence, the throughput impact is negligible. At 4K with 12.4M total fragments
-(~387,000 warps), divergent boundary warps number in the low thousands. Each divergent warp pays at
-most ~25 extra instructions (the cost of the longest untaken SDF branch). At ~12G instructions/sec
-on a mid-range GPU, that totals ~4μs — under 0.05% of an 8.3ms (120 FPS) frame budget. This is
+(~387,000 warps), divergent boundary warps number in the low thousands. Without kind dispatch, the
+longest untaken branch is the gradient evaluation (~8 instructions), not a different SDF function.
+Each divergent warp pays at most ~8 extra instructions. At ~12G instructions/sec on a mid-range GPU,
+that totals ~1.3μs — under 0.02% of an 8.3ms (120 FPS) frame budget. This is
 confirmed by production renderers that use exactly this pattern:

 - **vger / vger-rs** (Audulus): single pipeline, 11 primitive kinds dispatched by a `switch` on a
@@ -309,9 +324,10 @@ our design:
 > have no per-fragment data-dependent branches in the main pipeline.

 2. **Branches where both paths are very long.** If both sides of a branch are 500+ instructions,
-divergent warps pay double a large cost. Our SDF functions are 10–25 instructions each. Even
-fully divergent, the penalty is ~25 extra instructions — less than a single texture sample's
-latency.
+divergent warps pay double a large cost. Without kind dispatch, the SDF path always evaluates
+`sdRoundedBox`; the only branches are gradient/texture/solid color selection at 3–8 instructions
+each. Even fully divergent, the penalty is ~8 extra instructions — less than a single texture
+sample's latency.

 3. **Branches that prevent compiler optimizations.** Some compilers cannot schedule instructions
 across branch boundaries, reducing VLIW utilization on older architectures. Modern GPUs (NVIDIA
@@ -319,9 +335,9 @@ our design:
 concern.

 4. **Register pressure from the union of all branches.** This is the real cost, and it is why we
-split heavy effects (shadows, glass) into separate pipelines. Within the main pipeline, all SDF
-branches have similar register footprints (12–22 registers), so combining them causes negligible
-occupancy loss.
+split heavy effects (shadows, glass) into separate pipelines. Within the main pipeline, the SDF
+path has a single evaluation (sdRoundedBox) with flag-based color selection, clustering at ~15–18
+registers, so there is negligible occupancy loss.

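The frame-level divergence cost quoted earlier in this section can be reproduced with a few lines of arithmetic. The sketch below is an illustration only. It takes the README's figures as given (12.4M fragments, 32-wide warps, ~8 extra instructions per divergent warp, ~12G instructions/sec, with one warp instruction counted as one instruction), and the `divergence_cost_us` proc and the sample divergent-warp counts are hypothetical.

```odin
package divergence_sketch

import "core:fmt"

// Hypothetical helper: microseconds spent on the extra instructions executed
// by divergent warps, under the README's simplified throughput model.
divergence_cost_us :: proc(divergent_warps, extra_instructions, instructions_per_sec: f64) -> f64 {
	return divergent_warps * extra_instructions / instructions_per_sec * 1e6
}

main :: proc() {
	total_warps := 12.4e6 / 32.0 // ~387,500 warps at 4K
	frame_budget_us := 8300.0    // 8.3 ms at 120 FPS

	// 2,000 stands in for "low thousands" (an assumption); the other two
	// entries are 1% and 3% of all warps.
	warp_counts := [3]f64{2000, 0.01 * total_warps, 0.03 * total_warps}

	for warps in warp_counts {
		cost := divergence_cost_us(warps, 8, 12e9)
		fmt.printf("%.0f divergent warps: %.2f us (%.3f%% of frame)\n", warps, cost, 100 * cost / frame_budget_us)
	}
}
```

With about 2,000 divergent warps the cost is ~1.3 μs. Across the full 1–3% range it is roughly 2.6–7.8 μs, which is still under 0.1% of the 8.3 ms frame budget.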
 **References:**

@@ -342,17 +358,19 @@ our design:
 ### Main pipeline: SDF + tessellated (unified)

 The main pipeline serves two submission modes through a single `TRIANGLELIST` pipeline and a single
-vertex input layout, distinguished by a push constant:
+vertex input layout, distinguished by a mode marker in the `Primitive.flags` field (low byte:
+0 = tessellated, 1 = SDF). The tessellated path sets this to 0 via zero-initialization in the vertex
+shader; the SDF path sets it to 1 via `pack_flags`.

-- **Tessellated mode** (`mode = 0`): direct vertex buffer with explicit geometry. Unchanged from
-today. Used for text (SDL_ttf atlas sampling), polylines, triangle fans/strips, gradient-filled
-shapes, and any user-provided raw vertex geometry.
+- **Tessellated mode** (`mode = 0`): direct vertex buffer with explicit geometry. Used for text
+(SDL_ttf atlas sampling), triangle fans/strips, ellipses, regular polygons, circle sectors, and
+any user-provided raw vertex geometry.
 - **SDF mode** (`mode = 1`): shared unit-quad vertex buffer + GPU storage buffer of `Primitive`
 structs, drawn instanced. Used for all shapes with closed-form signed distance functions.

-Both modes converge on the same fragment shader, which dispatches on a `shape_kind` discriminant
-carried either in the vertex data (tessellated, always `Solid = 0`) or in the storage-buffer
-primitive struct (SDF modes).
+Both modes use the same fragment shader. The fragment shader checks the mode marker: mode 0 computes
+`out = color * texture(tex, uv)`; mode 1 always evaluates `sdRoundedBox` and applies
+gradient/texture/solid color based on flag bits.

 #### Why SDF for shapes

@@ -391,49 +409,60 @@ SDF primitives are submitted via a GPU storage buffer indexed by `gl_InstanceInd
 shader, rather than encoding per-primitive data redundantly in vertex attributes. This follows the
 pattern used by both Zed GPUI and vger-rs.

-Each SDF shape is described by a single `Primitive` struct (~56 bytes) in the storage buffer. The
+Each SDF shape is described by a single `Primitive` struct (80 bytes) in the storage buffer. The
 vertex shader reads `primitives[gl_InstanceIndex]`, computes the quad corner position from the unit
 vertex and the primitive's bounds, and passes shape parameters to the fragment shader via `flat`
 interpolated varyings.

 Compared to encoding per-primitive data in vertex attributes (the "fat vertex" approach), storage-
 buffer instancing eliminates the 4–6× data duplication across quad corners. A rounded rectangle costs
-56 bytes instead of 4 vertices × 40+ bytes = 160+ bytes.
+80 bytes instead of 4 vertices × 40+ bytes = 160+ bytes.

 The tessellated path retains the existing direct vertex buffer layout (20 bytes/vertex, no storage
 buffer access). The vertex shader branch on `mode` (push constant) is warp-uniform — every invocation
 in a draw call has the same mode — so it is effectively free on all modern GPUs.

-#### Shape kinds
+#### Shape folding

-Primitives in the main pipeline's storage buffer carry a `Shape_Kind` discriminant:
+The SDF path evaluates a single function — `sdRoundedBox` — for all primitives. There is no
+`Shape_Kind` enum or per-primitive kind dispatch in the fragment shader. Shapes that are algebraically
+special cases of a rounded rectangle are emitted as RRect primitives by the CPU-side drawing procs:

-| Kind       | SDF function                           | Notes                                                     |
-| ---------- | -------------------------------------- | --------------------------------------------------------- |
-| `RRect`    | `sdRoundedBox` (iq)                    | Per-corner radii. Covers all Clay rectangles and borders. |
-| `Circle`   | `sdCircle`                             | Filled and stroked.                                       |
-| `Ellipse`  | `sdEllipse`                            | Exact (iq's closed-form).                                 |
-| `Segment`  | `sdSegment` capsule                    | Rounded caps, correct sub-pixel thin lines.               |
-| `Ring_Arc` | `abs(sdCircle) - thickness` + arc mask | Rings, arcs, circle sectors unified.                      |
-| `NGon`     | `sdRegularPolygon`                     | Regular n-gon for n ≥ 5.                                  |
+| User-facing shape            | RRect mapping                                | Notes                                    |
+| ---------------------------- | -------------------------------------------- | ---------------------------------------- |
+| Rectangle (sharp or rounded) | Direct                                       | Per-corner radii from `radii` param      |
+| Circle                       | `half_size = (r, r)`, `radii = (r, r, r, r)` | Uniform radii = half-size                |
+| Line segment / capsule       | Rotated RRect, `radii = half_thickness`      | Stadium shape (fully-rounded minor axis) |
+| Full ring / annulus          | Stroked circle at mid-radius                 | `stroke_px = outer - inner`              |

-The `Solid` kind (value 0) is reserved for the tessellated path, where `shape_kind` is implicitly
-zero because the fragment shader receives it from zero-initialized vertex attributes.
+Shapes without a closed-form RRect reduction are drawn via the tessellated path:

-Stroke/outline variants of each shape are handled by the `Shape_Flags` bit set rather than separate
-shape kinds. The fragment shader transforms `d = abs(d) - stroke_width` when the `Stroke` flag is
-set.
+| Shape                     | Tessellated proc                   | Method                     |
+| ------------------------- | ---------------------------------- | -------------------------- |
+| Ellipse                   | `tes_ellipse`, `tes_ellipse_lines` | Triangle fan approximation |
+| Regular polygon (N-gon)   | `tes_polygon`, `tes_polygon_lines` | Triangle fan from center   |
+| Circle sector (pie slice) | `tes_sector`                       | Triangle fan arc           |
+
+The `Shape_Flags` bit set controls rendering mode per primitive:
+
+| Flag              | Bit | Effect                                                               |
+| ----------------- | --- | -------------------------------------------------------------------- |
+| `Stroke`          | 0   | Outline instead of fill (`d = abs(d) - stroke_width/2`)              |
+| `Textured`        | 1   | Sample texture using `uv.uv_rect` (mutually exclusive with Gradient) |
+| `Gradient`        | 2   | Bilinear 4-corner interpolation from `uv.corner_colors`              |
+| `Gradient_Radial` | 3   | Radial 2-color falloff (inner/outer) from `uv.corner_colors[0..1]`   |

 **What stays tessellated:**

 - Text (SDL_ttf atlas, pending future MSDF evaluation)
-- `rectangle_gradient`, `circle_gradient` (per-vertex color interpolation)
-- `triangle_fan`, `triangle_strip` (arbitrary user-provided point lists)
-- `line_strip` / polylines (SDF polyline rendering is possible but complex; deferred)
+- Ellipses (`tes_ellipse`, `tes_ellipse_lines`)
+- Regular polygons (`tes_polygon`, `tes_polygon_lines`)
+- Circle sectors / pie slices (`tes_sector`)
+- `tes_triangle`, `tes_triangle_fan`, `tes_triangle_strip` (arbitrary user-provided geometry)
 - Any raw vertex geometry submitted via `prepare_shape`

-The rule: if the shape has a closed-form SDF, it goes SDF. If it's described only by a vertex list or
-needs per-vertex color interpolation, it stays tessellated.
+The design rule: if the shape reduces to `sdRoundedBox`, it goes SDF. If it requires a different SDF
+function or is described by a vertex list, it stays tessellated.

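The folding rule can be checked numerically on the CPU. The sketch below ports iq's `sdRoundedBox` to Odin and shows that a circle and a full ring both fall out of it with the mappings from the shape-folding table. It is an illustration under assumptions: the proc names `sd_rounded_box` and `stroke` are hypothetical, the radii follow iq's corner ordering (levlib's own ordering may differ), and the real evaluation happens in the fragment shader.

```odin
package folding_sketch

import "core:fmt"
import "core:math"

// CPU port of iq's sdRoundedBox, for illustration only. p is relative to the
// box center; radii use iq's ordering (x/y for p.x > 0, z/w otherwise).
sd_rounded_box :: proc(p, half_size: [2]f32, radii: [4]f32) -> f32 {
	// Select the radius of the corner in p's quadrant.
	rx, ry := radii.x, radii.y
	if p.x <= 0 {
		rx, ry = radii.z, radii.w
	}
	r := rx if p.y > 0 else ry

	qx := abs(p.x) - half_size.x + r
	qy := abs(p.y) - half_size.y + r
	ox, oy := max(qx, 0), max(qy, 0)
	return min(max(qx, qy), 0) + math.sqrt(ox * ox + oy * oy) - r
}

// Stroke transform from the Shape_Flags table: d = abs(d) - stroke_width/2.
stroke :: proc(d, stroke_width: f32) -> f32 {
	return abs(d) - stroke_width / 2
}

main :: proc() {
	// Circle of radius 10: half_size = (r, r), radii = (r, r, r, r).
	r: f32 = 10
	p := [2]f32{3, 4} // 5 units from the center
	d_circle := sd_rounded_box(p, {r, r}, {r, r, r, r})
	fmt.println("circle distance:", d_circle) // equals length(p) - r, i.e. -5

	// Ring with inner radius 6 and outer radius 10: a stroked circle at
	// mid-radius 8 with stroke width = outer - inner = 4.
	mid: f32 = 8
	d_ring := stroke(sd_rounded_box({8, 0}, {mid, mid}, {mid, mid, mid, mid}), 4)
	fmt.println("ring distance:", d_ring) // -2: the point lies on the ring's centerline
}
```

With uniform radii equal to the half-size, the box term vanishes and the function reduces to `length(p) - r`, which is why no separate circle SDF is needed.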
 ### Effects pipeline

@@ -547,21 +576,21 @@ The `Primitive` struct for SDF shapes lives in the storage buffer, not in vertex

 ```
 Primitive :: struct {
-    bounds:     [4]f32,       //  0: min_x, min_y, max_x, max_y
-    color:      Color,        // 16: u8x4, unpacked in shader via unpackUnorm4x8
-    kind_flags: u32,          // 20: (kind as u32) | (flags as u32 << 8)
-    rotation:   f32,          // 24: shader self-rotation in radians
-    _pad:       f32,          // 28: alignment
-    params:     Shape_Params, // 32: raw union, 32 bytes (two vec4s of shape-specific data)
-    uv_rect:    [4]f32,       // 64: texture UV sub-region (u_min, v_min, u_max, v_max)
+    bounds:      [4]f32,        //  0: min_x, min_y, max_x, max_y
+    color:       Color,         // 16: u8x4, unpacked in shader via unpackUnorm4x8
+    flags:       u32,           // 20: low byte = Shape_Kind, bits 8+ = Shape_Flags
+    rotation_sc: u32,           // 24: packed f16 pair (sin, cos). Requires .Rotated flag.
+    _pad:        f32,           // 28: reserved for future use
+    params:      Shape_Params,  // 32: per-kind params union (half_feather, radii, etc.) (32 bytes)
+    uv:          Uv_Or_Effects, // 64: texture UV rect or gradient/outline parameters (16 bytes)
 }
 // Total: 80 bytes (std430 aligned)
 ```

-`Shape_Params` is a `#raw_union` with named variants per shape kind (`rrect`, `circle`, `segment`,
-etc.), ensuring type safety on the CPU side and zero-cost reinterpretation on the GPU side. The
-`uv_rect` field is used by textured SDF primitives (Shape_Flag.Textured); non-textured primitives
-leave it zeroed.
+`RRect_Params` holds the rounded-rectangle parameters directly — there is no `Shape_Params` union.
+`Uv_Or_Gradient` is a `#raw_union` that aliases `[4]f32` (texture UV rect) with `[4]Color` (gradient
+corner colors, clockwise from top-left: TL, TR, BR, BL). The `flags` field encodes both the
+tessellated/SDF mode marker (low byte) and shape flags (bits 8+) via `pack_flags`.

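The layout described above can be sanity-checked at compile time. The following Odin sketch mirrors the field sizes from the struct listing and shows one plausible way `pack_flags` could combine the mode marker and the flag bits. Treat it as an assumption-laden illustration: the body of `pack_flags`, the `data` field of `RRect_Params`, and the enum ordering of `Shape_Flag` are inferred from the prose and the flag table, not copied from levlib.

```odin
package primitive_layout_sketch

import "core:fmt"

Color :: [4]u8

// Flag bits as listed in the Shape_Flags table (Stroke = bit 0, and so on).
Shape_Flag :: enum u8 {
	Stroke,
	Textured,
	Gradient,
	Gradient_Radial,
}
Shape_Flags :: bit_set[Shape_Flag; u8]

// Texture UV rect and gradient corner colors alias the same 16 bytes.
Uv_Or_Gradient :: struct #raw_union {
	uv_rect:       [4]f32,
	corner_colors: [4]Color,
}

// Hypothetical stand-in for the 32-byte parameter block; the field is an assumption.
RRect_Params :: struct {
	data: [8]f32,
}

Primitive :: struct {
	bounds:      [4]f32,         //  0
	color:       Color,          // 16
	flags:       u32,            // 20
	rotation_sc: u32,            // 24
	_pad:        f32,            // 28
	params:      RRect_Params,   // 32
	uv:          Uv_Or_Gradient, // 64
}

// Compile-time checks of the sizes and offsets stated in the README.
#assert(size_of(Uv_Or_Gradient) == 16)
#assert(offset_of(Primitive, uv) == 64)
#assert(size_of(Primitive) == 80)

// Assumed shape of pack_flags: mode marker 1 (SDF) in the low byte,
// Shape_Flags shifted into bits 8 and above.
pack_flags :: proc(flags: Shape_Flags) -> u32 {
	SDF_MODE :: 1
	return u32(SDF_MODE) | (u32(transmute(u8)flags) << 8)
}

main :: proc() {
	// Stroke (bit 0) and Gradient (bit 2) give 0b101 in bits 8+, plus mode 1.
	fmt.printf("0x%x\n", pack_flags({.Stroke, .Gradient})) // 0x501
}
```

Because the union members are both 16 bytes, switching a primitive from textured to gradient-filled changes only how the last 16 bytes are interpreted, not the struct size.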
 ### Draw submission order

@@ -583,14 +612,15 @@ invariant is that each primitive is drawn exactly once, in the pipeline that own
 Text rendering currently uses SDL_ttf's GPU text engine, which rasterizes glyphs per `(font, size)`
 pair into bitmap atlases and emits indexed triangle data via `GetGPUTextDrawData`. This path is
 **unchanged** by the SDF migration — text continues to flow through the main pipeline's tessellated
-mode with `shape_kind = Solid`, sampling the SDL_ttf atlas texture.
+mode with `mode = 0`, sampling the SDL_ttf atlas texture.

 A future phase may evaluate MSDF (multi-channel signed distance field) text rendering, which would
 allow resolution-independent glyph rendering from a single small atlas per font. This would involve:

 - Offline atlas generation via Chlumský's msdf-atlas-gen tool.
 - Runtime glyph metrics via `vendor:stb/truetype` (already in the Odin distribution).
-- A new `Shape_Kind.MSDF_Glyph` variant in the main pipeline's fragment shader.
+- A new MSDF glyph mode in the fragment shader, which would require reintroducing a mode/kind
+distinction (the current shader evaluates only `sdRoundedBox` with no kind dispatch).
 - Potential removal of the SDL_ttf dependency.

 This is explicitly deferred. The SDF shape migration is independent of and does not block text
@@ -659,30 +689,26 @@ with the same texture but different samplers produce separate draw calls, which

 #### Textured draw procs

-Textured rectangles route through the existing SDF path via `draw.rectangle_texture` and
-`draw.rectangle_texture_corners`, mirroring `draw.rectangle` and `draw.rectangle_corners` exactly —
+Textured rectangles route through the existing SDF path via `sdf_rectangle_texture` and
+`sdf_rectangle_texture_corners`, mirroring `sdf_rectangle` and `sdf_rectangle_corners` exactly —
 same parameters, same naming — with the color parameter replaced by a texture ID plus an optional
 tint.

-An earlier iteration of this design considered a separate tessellated `draw.texture` proc for
-"simple" fullscreen quads, on the theory that the tessellated path's lower register count (~16 regs
-vs ~24 for the SDF textured branch) would improve occupancy at large fragment counts. Applying the
-register-pressure analysis from the pipeline-strategy section above shows this is wrong: both 16 and
-24 registers are well below the register cliff (~43 regs on consumer Ampere/Ada, ~32 on Volta/A100),
-so both run at 100% occupancy. The remaining ALU difference (~15 extra instructions for the SDF
-evaluation) amounts to ~20μs at 4K — below noise. Meanwhile, splitting into a separate pipeline
-would add ~1–5μs per pipeline bind on the CPU side per scissor, matching or exceeding the GPU-side
-savings. Within the main pipeline, unified remains strictly better.
+An earlier iteration of this design considered a separate tessellated proc for "simple" fullscreen
+quads, on the theory that the tessellated path's lower register count (~16 regs vs ~18 for the SDF
+textured branch) would improve occupancy at large fragment counts. Applying the register-pressure
+analysis from the pipeline-strategy section above shows this is wrong: both 16 and 18 registers are
+well below the register cliff (~43 regs on consumer Ampere/Ada, ~32 on Volta/A100), so both run at
+100% occupancy. The remaining ALU difference (~15 extra instructions for the SDF evaluation) amounts
+to ~20μs at 4K — below noise. Meanwhile, splitting into a separate pipeline would add ~1–5μs per
+pipeline bind on the CPU side per scissor, matching or exceeding the GPU-side savings. Within the
+main pipeline, unified remains strictly better.

-The naming convention follows the existing shape API: `rectangle_texture` and
-`rectangle_texture_corners` sit alongside `rectangle` and `rectangle_corners`, mirroring the
-`rectangle_gradient` / `circle_gradient` pattern where the shape is the primary noun and the
-modifier (gradient, texture) is secondary. This groups related procs together in autocomplete
-(`rectangle_*`) and reads as natural English ("draw a rectangle with a texture").
-
-Future per-shape texture variants (`circle_texture`, `ellipse_texture`, `polygon_texture`) are
-reserved by this naming convention and require only a `Shape_Flag.Textured` bit plus a small
-per-shape UV mapping function in the fragment shader. These are additive.
+The naming convention uses `sdf_` and `tes_` prefixes to indicate the rendering path, with suffixes
+for modifiers: `sdf_rectangle_texture` and `sdf_rectangle_texture_corners` sit alongside
+`sdf_rectangle` (solid or gradient overload). Proc groups like `sdf_rectangle` dispatch to
+`sdf_rectangle_solid` or `sdf_rectangle_gradient` based on argument count. Future per-shape texture
+variants (`sdf_circle_texture`) are additive.

 #### What SDF anti-aliasing does and does not do for textured draws

@@ -721,9 +747,9 @@ textures onto a free list that is processed in `r_end_frame`, not at the call si

 Clay's `RenderCommandType.Image` is handled by dereferencing `imageData: rawptr` as a pointer to a
 `Clay_Image_Data` struct containing a `Texture_Id`, `Fit_Mode`, and tint color. Routing mirrors the
-existing rectangle handling: zero `cornerRadius` dispatches to `draw.texture` (tessellated), nonzero
-dispatches to `draw.rectangle_texture_corners` (SDF). A `fit_params` call computes UVs from the fit
-mode before dispatch.
+existing rectangle handling: zero `cornerRadius` dispatches to `sdf_rectangle_texture` (SDF, sharp
+corners), nonzero dispatches to `sdf_rectangle_texture_corners` (SDF, per-corner radii). A
+`fit_params` call computes UVs from the fit mode before dispatch.

 #### Deferred features

@@ -735,7 +761,7 @@ The following are plumbed in the descriptor but not implemented in phase 1:
 - **3D textures, arrays, cube maps**: `Texture_Desc.type` and `depth_or_layers` fields exist.
 - **Additional samplers**: anisotropic, trilinear, clamp-to-border — additive enum values.
 - **Atlas packing**: internal optimization for sub-batch coalescing; invisible to callers.
-- **Per-shape texture variants**: `circle_texture`, `ellipse_texture`, etc. — reserved by naming.
+- **Per-shape texture variants**: `sdf_circle_texture`, `tes_ellipse_texture`, `tes_polygon_texture` — potential future additions, reserved by naming convention.

 **References:**
