Improved consistency with naming of init / create / destroy and when to propagate allocation errors and (#18)

Co-authored-by: Zachary Levy <zachary@sunforge.is>
Reviewed-on: #18
draw-improvements (#17)
2026-04-24 21:46:21 +00:00 · 2026-04-24 07:57:44 +00:00 · 2026-04-22 06:03:10 +00:00 · 2026-04-22 04:47:59 +00:00 · 2026-04-22 00:05:08 +00:00
33 changed files with 4667 additions and 3017 deletions
@@ -70,6 +70,11 @@
    "command": "odin run draw/examples -debug -out=out/debug/draw-examples -- hellope-custom",
    "cwd": "$ZED_WORKTREE_ROOT",
  },
  {
    "label": "Run draw textures example",
    "command": "odin run draw/examples -debug -out=out/debug/draw-examples -- textures",
    "cwd": "$ZED_WORKTREE_ROOT",
  },
  {
    "label": "Run qrcode basic example",
    "command": "odin run qrcode/examples -debug -out=out/debug/qrcode-examples -- basic",
@@ -9,35 +9,51 @@ The renderer uses a single unified `Pipeline_2D_Base` (`TRIANGLELIST` pipeline)
 modes dispatched by a push constant:
 - **Mode 0 (Tessellated):** Vertex buffer contains real geometry. Used for text (indexed draws into
-  SDL_ttf atlas textures), axis-aligned sharp-corner rectangles (already optimal as 2 triangles),
+  SDL_ttf atlas textures), single-pixel points (`tes_pixel`), arbitrary user geometry (`tes_triangle`,
-  per-vertex color gradients (`rectangle_gradient`, `circle_gradient`), angular-clipped circle
+  `tes_triangle_fan`, `tes_triangle_strip`), and shapes without a closed-form rounded-rectangle
-  sectors (`circle_sector`), and arbitrary user geometry (`triangle`, `triangle_fan`,
+  reduction: ellipses (`tes_ellipse`), regular polygons (`tes_polygon`), and circle sectors
-  `triangle_strip`). The fragment shader computes `out = color * texture(tex, uv)`.
+  (`tes_sector`). The fragment shader computes `out = color * texture(tex, uv)`.
 - **Mode 1 (SDF):** A static 6-vertex unit-quad buffer is drawn instanced, with per-primitive
-  `Primitive` structs uploaded each frame to a GPU storage buffer. The vertex shader reads
+  `Primitive` structs (80 bytes each) uploaded each frame to a GPU storage buffer. The vertex shader
-  `primitives[gl_InstanceIndex]`, computes world-space position from unit quad corners + primitive
+  reads `primitives[gl_InstanceIndex]`, computes world-space position from unit quad corners +
-  bounds. The fragment shader dispatches on `Shape_Kind` to evaluate the correct signed distance
+  primitive bounds. The fragment shader always evaluates `sdRoundedBox` — there is no per-primitive
-  function analytically.
+  kind dispatch.
-Seven SDF shape kinds are implemented:
+The SDF path handles all shapes that are algebraically reducible to a rounded rectangle:
-1. **RRect** — rounded rectangle with per-corner radii (iq's `sdRoundedBox`)
+- **Rounded rectangles** — per-corner radii via `sdRoundedBox` (iq). Covers filled, stroked,
-2. **Circle** — filled or stroked circle
+  textured, and gradient-filled rectangles.
-3. **Ellipse** — exact signed-distance ellipse (iq's iterative `sdEllipse`)
+- **Circles** — uniform radii equal to half-size. Covers filled, stroked, and radial-gradient circles.
-4. **Segment** — capsule-style line segment with rounded caps
+- **Line segments / capsules** — rotated RRect with uniform radii equal to half-thickness (stadium shape).
-5. **Ring_Arc** — annular ring with angular clipping for arcs
+- **Full rings / annuli** — stroked circle (mid-radius with stroke thickness = outer - inner).
-6. **NGon** — regular polygon with arbitrary side count and rotation
-7. **Polyline** — decomposed into independent `Segment` primitives per adjacent point pair
-All SDF shapes support fill and stroke modes via `Shape_Flags`, and produce mathematically exact
+All SDF shapes support fill, stroke, solid color, bilinear 4-corner gradients, radial 2-color
-curves with analytical anti-aliasing via `smoothstep` — no tessellation, no piecewise-linear
+gradients, and texture fills via `Shape_Flags`. Gradient colors are packed into the same 16 bytes as
-approximation. A rounded rectangle is 1 primitive (64 bytes) instead of ~250 vertices (~5000 bytes).
+the texture UV rect via a `Uv_Or_Gradient` raw union — zero size increase to the 80-byte `Primitive`
 struct. Gradient and texture are mutually exclusive.
 All SDF shapes produce mathematically exact curves with analytical anti-aliasing via `smoothstep` —
 no tessellation, no piecewise-linear approximation. A rounded rectangle is 1 primitive (80 bytes)
 instead of ~250 vertices (~5000 bytes).
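
The "zero size increase" claim for `Uv_Or_Gradient` can be illustrated with a minimal sketch. The field names `uv_rect` and `corner_colors` come from the flag table in this README; the element types (four 32-bit floats, four packed 32-bit colors) are assumptions made for illustration and are not taken from the levlib source.

```python
import ctypes


class UvOrGradient(ctypes.Union):
    """Hypothetical stand-in for the `Uv_Or_Gradient` raw union."""

    _fields_ = [
        # Texture path: one UV rectangle as four 32-bit floats (assumed layout).
        ("uv_rect", ctypes.c_float * 4),
        # Gradient path: four corner colors, each assumed to be one packed 32-bit value.
        ("corner_colors", ctypes.c_uint32 * 4),
    ]


def union_size() -> int:
    """Size of the union in bytes; both members overlap in the same storage."""
    return ctypes.sizeof(UvOrGradient)


if __name__ == "__main__":
    u = UvOrGradient()
    u.corner_colors[0] = 0x3F800000  # bit pattern of the float 1.0
    print(union_size(), u.uv_rect[0])
```

Because both members alias the same storage, a primitive can carry either a UV rect or gradient colors but not both, which is consistent with the statement that gradient and texture are mutually exclusive.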
 The fragment shader's estimated register footprint is ~20–23 VGPRs via static live-range analysis.
 RRect and Ring_Arc are roughly tied at peak pressure — RRect carries `corner_radii` (4 regs) plus
 `sdRoundedBox` temporaries, Ring_Arc carries wedge normals plus dot-product temporaries. Both land
 comfortably under Mali Valhall's 32-register occupancy cliff (G57/G77/G78 and later) and well under
 desktop limits. On older Bifrost Mali (G71/G72/G76, 16-register cliff) either shape kind may incur
 partial occupancy reduction. These estimates are hand-counted; exact numbers require `malioc` or
 Radeon GPU Analyzer against the compiled SPIR-V.
 MSAA is opt-in (default `._1`, no MSAA) via `Init_Options.msaa_samples`. SDF rendering does not
 benefit from MSAA because fragment coverage is computed analytically. MSAA remains useful for text
 glyph edges and tessellated user geometry if desired.
 All public drawing procs use prefixed names for clarity: `sdf_*` for SDF-path shapes, `tes_*` for
 tessellated-path shapes. Proc groups provide a single entry point per shape concept (e.g.,
 `sdf_rectangle` dispatches to `sdf_rectangle_solid` or `sdf_rectangle_gradient` based on argument
 count).
 ## 2D rendering pipeline plan
 This section documents the planned architecture for levlib's 2D rendering system. The design is driven
@@ -47,68 +63,107 @@ primitives and effects can be added to the library without architectural changes
 ### Overview: three pipelines
-The 2D renderer will use three GPU pipelines, split by **register pressure compatibility** and
+The 2D renderer uses three GPU pipelines, split by **register pressure** (main vs effects) and
-**render-state requirements**:
+**render-pass structure** (everything vs backdrop):
-1. **Main pipeline** — shapes (SDF and tessellated) and text. Low register footprint (~18–22
+1. **Main pipeline** — shapes (SDF and tessellated), text, and textured rectangles. Low register
-   registers per thread). Runs at high GPU occupancy. Handles 90%+ of all fragments in a typical
+   footprint (~18–24 registers per thread). Runs at full GPU occupancy on every architecture.
-   frame.
+   Handles 90%+ of all fragments in a typical frame.
 2. **Effects pipeline** — drop shadows, inner shadows, outer glow, and similar ALU-bound blur
   effects. Medium register footprint (~48–60 registers). Each effects primitive includes the base
   shape's SDF so that it can draw both the effect and the shape in a single fragment pass, avoiding
-   redundant overdraw.
+   redundant overdraw. Separated from the main pipeline to protect main-pipeline occupancy on
   low-end hardware (see register analysis below).
-3. **Backdrop-effects pipeline** — frosted glass, refraction, and any effect that samples the current
+3. **Backdrop pipeline** — frosted glass, refraction, and any effect that samples the current render
-   render target as input. High register footprint (~70–80 registers) and structurally requires a
+   target as input. Implemented as a multi-pass sequence (downsample, separable blur, composite),
-   `CopyGPUTextureToTexture` from the render target before drawing. Separated both for register
+   where each individual pass has a low-to-medium register footprint (~15–40 registers). Separated
-   pressure and because the texture-copy requirement forces a render-pass-level state change.
+   from the other pipelines because it structurally requires ending the current render pass and
   copying the render target before any backdrop-sampling fragment can execute — a command-buffer-
   level boundary that cannot be avoided regardless of shader complexity.
 A typical UI frame with no effects uses 1 pipeline bind and 0 switches. A frame with drop shadows
 uses 2 pipelines and 1 switch. A frame with shadows and frosted glass uses all 3 pipelines and 2
-switches plus 1 texture copy. At ~5μs per pipeline bind on modern APIs, worst-case switching overhead
+switches plus 1 texture copy. At ~1–5μs per pipeline bind on modern APIs, worst-case switching
-is under 0.15% of an 8.3ms (120 FPS) frame budget.
+overhead is negligible relative to an 8.3ms (120 FPS) frame budget.
 ### Why three pipelines, not one or seven
 The natural question is whether we should use a single unified pipeline (fewer state changes, simpler
 code) or many per-primitive-type pipelines (no branching overhead, lean per-shader register usage).
-The dominant cost factor is **GPU register pressure**, not pipeline switching overhead or fragment
+#### Main/effects split: register pressure
 shader branching. A GPU shader core has a fixed register pool shared among all concurrent threads. The
 compiler allocates registers pessimistically based on the worst-case path through the shader. If the
 shader contains both a 20-register RRect SDF and a 72-register frosted-glass blur, _every_ fragment
 — even trivial RRects — is allocated 72 registers. This directly reduces **occupancy** (the number of
 warps that can run simultaneously), which reduces the GPU's ability to hide memory latency.
-Concrete example on a modern NVIDIA SM with 65,536 registers:
+A GPU shader core has a fixed register pool shared among all concurrent threads. The compiler
 allocates registers pessimistically based on the worst-case path through the shader. If the shader
 contains both a 20-register RRect SDF and a 48-register drop-shadow blur, _every_ fragment — even
 trivial RRects — is allocated 48 registers. This directly reduces **occupancy** (the number of
 warps/wavefronts that can run simultaneously), which reduces the GPU's ability to hide memory
 latency.
-| Register allocation       | Max concurrent threads | Occupancy |
+Each GPU architecture has a **register cliff** — a threshold above which occupancy starts dropping.
-| ------------------------- | ---------------------- | --------- |
+Below the cliff, adding registers has zero occupancy cost.
-| 20 regs (RRect only)      | 3,276                  | ~100%     |
-| 48 regs (+ drop shadow)   | 1,365                  | ~42%      |
-| 72 regs (+ frosted glass) | 910                    | ~28%      |
-For a 4K frame (3840×2160) at 1.5× overdraw (~12.4M fragments), running all fragments at 28%
+On consumer Ampere/Ada GPUs (RTX 30xx/40xx, 65,536 regs/SM, max 1,536 threads/SM, cliff at ~43 regs):
 occupancy instead of 100% roughly triples fragment shading time. At 4K this is severe: if the main
 pipeline's fragment work at full occupancy takes ~2ms, a single unified shader containing the glass
 branch would push it to ~6ms — consuming 72% of the 8.3ms budget available at 120 FPS and leaving
 almost nothing for CPU work, uploads, and presentation. This is a per-frame multiplier, not a
 per-primitive cost — it applies even when the heavy branch is never taken.
-The three-pipeline split groups primitives by register footprint so that:
+| Register allocation      | Reg-limited threads | Actual (hw-capped) | Occupancy |
 | ------------------------ | ------------------- | ------------------ | --------- |
 | ~16 regs (main pipeline) | 4,096               | 1,536              | 100%      |
 | 32 regs                  | 2,048               | 1,536              | 100%      |
 | 48 regs (effects)        | 1,365               | 1,365              | ~89%      |
- Main pipeline (~20 regs): 90%+ of fragments run at near-full occupancy.
+On Volta/A100 GPUs (65,536 regs/SM, max 2,048 threads/SM, cliff at ~32 regs):
 - Effects pipeline (~55 regs): shadow/glow fragments run at moderate occupancy; unavoidable given the
  blur math complexity.
 - Backdrop-effects pipeline (~75 regs): glass fragments run at low occupancy; also unavoidable, and
  structurally separated anyway by the texture-copy requirement.
-This avoids the register-pressure tax of a single unified shader while keeping pipeline count minimal
+| Register allocation      | Reg-limited threads | Actual (hw-capped) | Occupancy |
-(3 vs. Zed GPUI's 7). The effects that drag occupancy down are isolated to the fragments that
+| ------------------------ | ------------------- | ------------------ | --------- |
-actually need them.
+| ~16 regs (main pipeline) | 4,096               | 2,048              | 100%      |
 | 32 regs                  | 2,048               | 2,048              | 100%      |
 | 48 regs (effects)        | 1,365               | 1,365              | ~67%      |
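
The two NVIDIA occupancy tables can be reproduced with a few lines of arithmetic. This is a simplified sketch: it divides the register file by the per-thread allocation and caps the result at the hardware thread limit, and it deliberately ignores register allocation granularity and warp-level rounding, which real hardware applies. The function name and defaults are illustrative.

```python
def occupancy(regs_per_thread, regfile_regs=65_536, max_threads=1_536):
    """Return (register-limited threads, resident threads, occupancy fraction)."""
    reg_limited = regfile_regs // regs_per_thread
    resident = min(reg_limited, max_threads)
    return reg_limited, resident, resident / max_threads


if __name__ == "__main__":
    for label, max_threads in (("Ampere/Ada", 1_536), ("Volta/A100", 2_048)):
        # The cliff is the allocation at which the register file, not the
        # hardware thread cap, becomes the limiting factor.
        cliff = 65_536 / max_threads
        print(f"{label}: cliff at ~{cliff:.1f} regs/thread")
        for regs in (16, 32, 48):
            reg_limited, resident, occ = occupancy(regs, max_threads=max_threads)
            print(f"  {regs:>2} regs: {reg_limited:>5} -> {resident:>5} threads, {occ:.0%}")
```

The cliff values (about 42.7 and exactly 32 registers) fall out of the same division, which is why the 48-register effects shader loses more occupancy on the 2,048-thread parts than on the 1,536-thread parts.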
-**Why not per-primitive-type pipelines (GPUI's approach)?** Zed's GPUI uses 7 separate shader pairs:
+On low-end mobile (ARM Mali Bifrost/Valhall, 64 regs/thread, cliff fixed at 32 regs):
 | Register allocation  | Occupancy                  |
 | -------------------- | -------------------------- |
 | 0–32 regs (main)     | 100% (full thread count)   |
 | 33–64 regs (effects) | ~50% (thread count halves) |
 Mali's cliff at 32 registers is the binding constraint. On consumer desktop GPUs the occupancy cost
 of going from 20 to 48 registers is modest (100% down to ~89%); on Mali it is a hard 2× throughput
 reduction. The
 main/effects split protects 90%+ of a frame's fragments (shapes, text, textures) from the effects
 pipeline's register cost.
 For the effects pipeline's drop-shadow shader — erf-approximation blur math with several texture
 fetches — 50% occupancy on Mali roughly halves throughput. At 4K with 1.5× overdraw (~12.4M
 fragments), a single unified shader containing the shadow branch would cost ~4ms instead of ~2ms on
 low-end mobile. This is a per-frame multiplier even when the heavy branch is never taken, because the
 compiler allocates registers for the worst-case path.
 All main-pipeline members (SDF shapes, tessellated geometry, text, textured rectangles) cluster at
 12–24 registers — below the cliff on every architecture — so unifying them costs nothing in
 occupancy.
 **Note on Apple M3+ GPUs:** Apple's M3 introduces Dynamic Caching (register file virtualization),
 which allocates registers at runtime based on actual usage rather than worst-case. This weakens the
 static register-pressure argument on M3 and later, but the split remains useful for isolating blur
 ALU complexity and keeping the backdrop texture-copy out of the main render pass.
 #### Backdrop split: render-pass structure
 The backdrop pipeline (frosted glass, refraction, mirror surfaces) is separated for a structural
 reason unrelated to register pressure. Before any backdrop-sampling fragment can execute, the current
 render target must be copied to a separate texture via `CopyGPUTextureToTexture` — a command-buffer-
 level operation that requires ending the current render pass. This boundary exists regardless of
 shader complexity and cannot be optimized away.
 The backdrop pipeline's individual shader passes (downsample, separable blur, composite) are
 register-light (~15–40 regs each), so merging them into the effects pipeline would cause no occupancy
 problem. But the render-pass boundary makes merging structurally impossible — effects draws happen
 inside the main render pass, backdrop draws happen inside their own bracketed pass sequence.
 #### Why not per-primitive-type pipelines (GPUI's approach)
 Zed's GPUI uses 7 separate shader pairs:
 quad, shadow, underline, monochrome sprite, polychrome sprite, path, surface. This eliminates all
 branching and gives each shader minimal register usage. Three concrete costs make this approach wrong
 for our use case:
@@ -120,7 +175,7 @@ typical UI frame with 15 scissors and 3–4 primitive kinds per scissor, per-kin
 ~45–60 draw calls and pipeline binds; our unified approach produces ~15–20 draw calls and 1–5
 pipeline binds. At ~5μs each for CPU-side command encoding on modern APIs, per-kind splitting adds
 375–500μs of CPU overhead per frame — **4.5–6% of an 8.3ms (120 FPS) budget** — with no
-compensating GPU-side benefit, because the register-pressure savings within the simple-SDF tier are
+compensating GPU-side benefit, because the register-pressure savings within the simple-SDF range are
 negligible (all members cluster at 12–22 registers).
 **Z-order preservation forces the API to expose layers.** With a single pipeline drawing all kinds
@@ -159,10 +214,10 @@ in submission order:
  ~60 boundary warps at ~80 extra instructions each), unified divergence costs ~13μs — still 3.5×
  cheaper than the pipeline-switching alternative.
-The split we _do_ perform (main / effects / backdrop-effects) is motivated by register-pressure tier
+The split we _do_ perform (main / effects / backdrop) is motivated by register-pressure boundaries
-boundaries where occupancy differences are catastrophic at 4K (see numbers above). Within a tier,
+and structural render-pass requirements (see analysis above). Within a pipeline, unified is
-unified is strictly better by every measure: fewer draw calls, simpler Z-order, lower CPU overhead,
+strictly better by every measure: fewer draw calls, simpler Z-order, lower CPU overhead, and
-and negligible GPU-side branching cost.
+negligible GPU-side branching cost.
 **References:**
@@ -172,6 +227,16 @@ and negligible GPU-side branching cost.
  https://github.com/zed-industries/zed/blob/cb6fc11/crates/gpui/src/platform/mac/shaders.metal
 - NVIDIA Nsight Graphics 2024.3 documentation on active-threads-per-warp and divergence analysis:
  https://developer.nvidia.com/blog/optimize-gpu-workloads-for-graphics-applications-with-nvidia-nsight-graphics/
 - NVIDIA Ampere GPU Architecture Tuning Guide — SM specs, max warps per SM (48 for cc 8.6, 64 for
  cc 8.0), register file size (64K), occupancy factors:
  https://docs.nvidia.com/cuda/ampere-tuning-guide/index.html
 - NVIDIA Ada GPU Architecture Tuning Guide — SM specs, max warps per SM (48 for cc 8.9):
  https://docs.nvidia.com/cuda/ada-tuning-guide/index.html
 - CUDA Occupancy Calculation walkthrough (register allocation granularity, worked examples):
  https://leimao.github.io/blog/CUDA-Occupancy-Calculation/
 - Apple M3 GPU architecture — Dynamic Caching (register file virtualization) eliminates static
  worst-case register allocation, reducing the occupancy penalty for high-register shaders:
  https://asplos.dev/wiki/m3-chip-explainer/gpu/index.html
 ### Why fragment shader branching is safe in this design
@@ -212,11 +277,12 @@ Our design has two branch points:
   Every thread in every warp of a draw call sees the same `mode` value. **Zero divergence, zero
   cost.**
-2. **`shape_kind` (flat varying from storage buffer): which SDF to evaluate.** This is category 3.
+2. **`flags` (flat varying from storage buffer): gradient/texture/stroke mode.** This is category 3.
   The `flat` interpolation qualifier ensures that all fragments rasterized from one primitive's quad
-   receive the same `shape_kind` value. Divergence can only occur at the **boundary between two
+   receive the same flag bits. However, since the SDF path now evaluates only `sdRoundedBox` with no
-   adjacent primitives of different kinds**, where the rasterizer might pack fragments from both
+   kind dispatch, the only flag-dependent branches are gradient vs. texture vs. solid color selection
-   primitives into the same warp.
+   — all lightweight (3–8 instructions per path). Divergence at primitive boundaries between
   different flag combinations has negligible cost.
 For category 3, the divergence analysis depends on primitive size:
@@ -233,9 +299,10 @@ For category 3, the divergence analysis depends on primitive size:
  frame-level divergence is typically **1–3%** of all warps.
 At 1–3% divergence, the throughput impact is negligible. At 4K with 12.4M total fragments
-(~387,000 warps), divergent boundary warps number in the low thousands. Each divergent warp pays at
+(~387,000 warps), divergent boundary warps number in the low thousands. Without kind dispatch, the
-most ~25 extra instructions (the cost of the longest untaken SDF branch). At ~12G instructions/sec
+longest untaken branch is the gradient evaluation (~8 instructions), not a different SDF function.
-on a mid-range GPU, that totals ~4μs — under 0.05% of an 8.3ms (120 FPS) frame budget. This is
+Each divergent warp pays at most ~8 extra instructions. At ~12G instructions/sec on a mid-range GPU,
 that totals ~1.3μs — under 0.02% of an 8.3ms (120 FPS) frame budget. This is
 confirmed by production renderers that use exactly this pattern:
 - **vger / vger-rs** (Audulus): single pipeline, 11 primitive kinds dispatched by a `switch` on a
@@ -260,9 +327,10 @@ our design:
   > have no per-fragment data-dependent branches in the main pipeline.
 2. **Branches where both paths are very long.** If both sides of a branch are 500+ instructions,
-   divergent warps pay double a large cost. Our SDF functions are 10–25 instructions each. Even
+   divergent warps pay double a large cost. Without kind dispatch, the SDF path always evaluates
-   fully divergent, the penalty is ~25 extra instructions — less than a single texture sample's
+   `sdRoundedBox`; the only branches are gradient/texture/solid color selection at 3–8 instructions
-   latency.
+   each. Even fully divergent, the penalty is ~8 extra instructions — less than a single texture
   sample's latency.
 3. **Branches that prevent compiler optimizations.** Some compilers cannot schedule instructions
   across branch boundaries, reducing VLIW utilization on older architectures. Modern GPUs (NVIDIA
@@ -270,9 +338,9 @@ our design:
   concern.
 4. **Register pressure from the union of all branches.** This is the real cost, and it is why we
-   split heavy effects (shadows, glass) into separate pipelines. Within the main pipeline, all SDF
+   split heavy effects (shadows, glass) into separate pipelines. Within the main pipeline, the SDF
-   branches have similar register footprints (12–22 registers), so combining them causes negligible
+   path has a single evaluation (sdRoundedBox) with flag-based color selection, clustering at ~15–18
-   occupancy loss.
+   registers, so there is negligible occupancy loss.
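
The boundary-divergence estimate used in this section (about 8 extra instructions per divergent warp at 4K) can be checked with back-of-envelope arithmetic. The sketch below assumes 32-wide warps and picks 2,000 divergent warps as a stand-in for "low thousands"; both values, and the function name, are assumptions for illustration rather than measurements.

```python
WARP_SIZE = 32  # assumed warp width


def divergence_cost(
    width=3840,
    height=2160,
    overdraw=1.5,
    divergent_warps=2_000,   # assumption: a value in the "low thousands"
    extra_instructions=8,    # longest untaken branch, per the text
    instr_per_second=12e9,   # mid-range GPU figure used in the text
    frame_budget_s=1 / 120,
):
    """Return (fragments, total warps, cost in microseconds, share of frame budget)."""
    fragments = int(width * height * overdraw)
    warps = fragments // WARP_SIZE
    cost_s = divergent_warps * extra_instructions / instr_per_second
    return fragments, warps, cost_s * 1e6, cost_s / frame_budget_s


if __name__ == "__main__":
    fragments, warps, cost_us, share = divergence_cost()
    print(f"{fragments:,} fragments, {warps:,} warps")
    print(f"divergence cost: {cost_us:.2f} us ({share:.3%} of a 120 FPS frame)")
```

With these inputs the cost comes out near 1.3 μs, under 0.02% of the frame budget. Raising `extra_instructions` to 25 gives roughly 4 μs, the figure quoted for the earlier kind-dispatch design.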
 **References:**
@@ -293,17 +361,19 @@ our design:
 ### Main pipeline: SDF + tessellated (unified)
 The main pipeline serves two submission modes through a single `TRIANGLELIST` pipeline and a single
-vertex input layout, distinguished by a push constant:
+vertex input layout, distinguished by a mode marker in the `Primitive.flags` field (low byte:
 0 = tessellated, 1 = SDF). The tessellated path sets this to 0 via zero-initialization in the vertex
 shader; the SDF path sets it to 1 via `pack_flags`.
- **Tessellated mode** (`mode = 0`): direct vertex buffer with explicit geometry. Unchanged from
+- **Tessellated mode** (`mode = 0`): direct vertex buffer with explicit geometry. Used for text
-  today. Used for text (SDL_ttf atlas sampling), polylines, triangle fans/strips, gradient-filled
+  (SDL_ttf atlas sampling), triangle fans/strips, ellipses, regular polygons, circle sectors, and
-  shapes, and any user-provided raw vertex geometry.
+  any user-provided raw vertex geometry.
 - **SDF mode** (`mode = 1`): shared unit-quad vertex buffer + GPU storage buffer of `Primitive`
  structs, drawn instanced. Used for all shapes with closed-form signed distance functions.
-Both modes converge on the same fragment shader, which dispatches on a `shape_kind` discriminant
+Both modes use the same fragment shader. The fragment shader checks the mode marker: mode 0 computes
-carried either in the vertex data (tessellated, always `Solid = 0`) or in the storage-buffer
+`out = color * texture(tex, uv)`; mode 1 always evaluates `sdRoundedBox` and applies
-primitive struct (SDF modes).
+gradient/texture/solid color based on flag bits.
 #### Why SDF for shapes
@@ -342,49 +412,60 @@ SDF primitives are submitted via a GPU storage buffer indexed by `gl_InstanceInd
 shader, rather than encoding per-primitive data redundantly in vertex attributes. This follows the
 pattern used by both Zed GPUI and vger-rs.
-Each SDF shape is described by a single `Primitive` struct (~56 bytes) in the storage buffer. The
+Each SDF shape is described by a single `Primitive` struct (80 bytes) in the storage buffer. The
 vertex shader reads `primitives[gl_InstanceIndex]`, computes the quad corner position from the unit
 vertex and the primitive's bounds, and passes shape parameters to the fragment shader via `flat`
 interpolated varyings.
 Compared to encoding per-primitive data in vertex attributes (the "fat vertex" approach), storage-
 buffer instancing eliminates the 4–6× data duplication across quad corners. A rounded rectangle costs
-56 bytes instead of 4 vertices × 40+ bytes = 160+ bytes.
+80 bytes instead of 4 vertices × 40+ bytes = 160+ bytes.
 The tessellated path retains the existing direct vertex buffer layout (20 bytes/vertex, no storage
 buffer access). The vertex shader branch on `mode` (push constant) is warp-uniform — every invocation
 in a draw call has the same mode — so it is effectively free on all modern GPUs.
-#### Shape kinds
+#### Shape folding
-Primitives in the main pipeline's storage buffer carry a `Shape_Kind` discriminant:
+The SDF path evaluates a single function — `sdRoundedBox` — for all primitives. There is no
 `Shape_Kind` enum or per-primitive kind dispatch in the fragment shader. Shapes that are algebraically
 special cases of a rounded rectangle are emitted as RRect primitives by the CPU-side drawing procs:
-| Kind       | SDF function                           | Notes                                                     |
+| User-facing shape            | RRect mapping                                | Notes                                    |
-| ---------- | -------------------------------------- | --------------------------------------------------------- |
+| ---------------------------- | -------------------------------------------- | ---------------------------------------- |
-| `RRect`    | `sdRoundedBox` (iq)                    | Per-corner radii. Covers all Clay rectangles and borders. |
+| Rectangle (sharp or rounded) | Direct                                       | Per-corner radii from `radii` param      |
-| `Circle`   | `sdCircle`                             | Filled and stroked.                                       |
+| Circle                       | `half_size = (r, r)`, `radii = (r, r, r, r)` | Uniform radii = half-size                |
-| `Ellipse`  | `sdEllipse`                            | Exact (iq's closed-form).                                 |
+| Line segment / capsule       | Rotated RRect, `radii = half_thickness`      | Stadium shape (fully-rounded minor axis) |
-| `Segment`  | `sdSegment` capsule                    | Rounded caps, correct sub-pixel thin lines.               |
+| Full ring / annulus          | Stroked circle at mid-radius                 | `stroke_px = outer - inner`              |
-| `Ring_Arc` | `abs(sdCircle) - thickness` + arc mask | Rings, arcs, circle sectors unified.                      |
-| `NGon`     | `sdRegularPolygon`                     | Regular n-gon for n ≥ 5.                                  |
-The `Solid` kind (value 0) is reserved for the tessellated path, where `shape_kind` is implicitly
+Shapes without a closed-form RRect reduction are drawn via the tessellated path:
-zero because the fragment shader receives it from zero-initialized vertex attributes.
-Stroke/outline variants of each shape are handled by the `Shape_Flags` bit set rather than separate
+| Shape                     | Tessellated proc                   | Method                     |
-shape kinds. The fragment shader transforms `d = abs(d) - stroke_width` when the `Stroke` flag is
+| ------------------------- | ---------------------------------- | -------------------------- |
-set.
+| Ellipse                   | `tes_ellipse`, `tes_ellipse_lines` | Triangle fan approximation |
 | Regular polygon (N-gon)   | `tes_polygon`, `tes_polygon_lines` | Triangle fan from center   |
 | Circle sector (pie slice) | `tes_sector`                       | Triangle fan arc           |
 The `Shape_Flags` bit set controls rendering mode per primitive:
 | Flag              | Bit | Effect                                                               |
 | ----------------- | --- | -------------------------------------------------------------------- |
 | `Stroke`          | 0   | Outline instead of fill (`d = abs(d) - stroke_width/2`)              |
 | `Textured`        | 1   | Sample texture using `uv.uv_rect` (mutually exclusive with Gradient) |
 | `Gradient`        | 2   | Bilinear 4-corner interpolation from `uv.corner_colors`              |
 | `Gradient_Radial` | 3   | Radial 2-color falloff (inner/outer) from `uv.corner_colors[0..1]`   |
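
The folding rules and the `Stroke` transform in the tables above can be sanity-checked on the CPU. The sketch below ports iq's `sdRoundedBox` to Python and expresses a circle and a full ring through it. The helper names are hypothetical, and the corner order follows iq's convention (top-right, bottom-right, top-left, bottom-left), which may differ from the order levlib uses for its `radii` parameter.

```python
import math


def sd_rounded_box(p, half_size, radii):
    """Signed distance to a rounded box centered at the origin (port of iq's sdRoundedBox)."""
    px, py = p
    # Select the corner radius for the quadrant that contains p.
    rx, ry = (radii[0], radii[1]) if px > 0.0 else (radii[2], radii[3])
    r = rx if py > 0.0 else ry
    qx = abs(px) - half_size[0] + r
    qy = abs(py) - half_size[1] + r
    outside = math.hypot(max(qx, 0.0), max(qy, 0.0))
    inside = min(max(qx, qy), 0.0)
    return inside + outside - r


def sd_circle_as_rrect(p, r):
    """Circle folding: half_size = (r, r) with all four radii equal to r."""
    return sd_rounded_box(p, (r, r), (r, r, r, r))


def sd_ring_as_stroked_circle(p, inner, outer):
    """Ring folding: stroke a circle at the mid-radius with stroke width = outer - inner."""
    mid = 0.5 * (inner + outer)
    stroke = outer - inner
    d = sd_circle_as_rrect(p, mid)
    return abs(d) - 0.5 * stroke  # same form as the Stroke flag transform


if __name__ == "__main__":
    print(sd_circle_as_rrect((3.0, 4.0), 5.0))               # point on the circle
    print(sd_ring_as_stroked_circle((5.0, 0.0), 4.0, 6.0))   # point in the middle of the ring
```

With uniform radii equal to the half-size, the box term vanishes and the function reduces to `length(p) - r`, which is why the circle needs no separate SDF. The ring result is negative exactly for points whose distance from the center lies between `inner` and `outer`.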
 **What stays tessellated:**
 - Text (SDL_ttf atlas, pending future MSDF evaluation)
- `rectangle_gradient`, `circle_gradient` (per-vertex color interpolation)
+- Ellipses (`tes_ellipse`, `tes_ellipse_lines`)
- `triangle_fan`, `triangle_strip` (arbitrary user-provided point lists)
+- Regular polygons (`tes_polygon`, `tes_polygon_lines`)
- `line_strip` / polylines (SDF polyline rendering is possible but complex; deferred)
+- Circle sectors / pie slices (`tes_sector`)
 - `tes_triangle`, `tes_triangle_fan`, `tes_triangle_strip` (arbitrary user-provided geometry)
 - Any raw vertex geometry submitted via `prepare_shape`
-The rule: if the shape has a closed-form SDF, it goes SDF. If it's described only by a vertex list or
+The design rule: if the shape reduces to `sdRoundedBox`, it goes SDF. If it requires a different SDF
-needs per-vertex color interpolation, it stays tessellated.
+function or is described by a vertex list, it stays tessellated.
 ### Effects pipeline
@@ -442,25 +523,40 @@ Wallace's variant) and vger-rs.
 - Vello's implementation of blurred rounded rectangle as a gradient type:
  https://github.com/linebender/vello/pull/665
-### Backdrop-effects pipeline
+### Backdrop pipeline
-The backdrop-effects pipeline handles effects that sample the current render target as input: frosted
+The backdrop pipeline handles effects that sample the current render target as input: frosted glass,
-glass, refraction, mirror surfaces. It is structurally separated from the effects pipeline for two
+refraction, mirror surfaces. It is separated from the effects pipeline for a structural reason, not
-reasons:
+register pressure.
-1. **Render-state requirement.** Before any backdrop-sampling fragment can run, the current render
+**Render-pass boundary.** Before any backdrop-sampling fragment can run, the current render target
-   target must be copied to a separate texture via `CopyGPUTextureToTexture`. This is a command-
+must be copied to a separate texture via `CopyGPUTextureToTexture`. This is a command-buffer-level
-   buffer-level operation that cannot happen mid-render-pass. The copy naturally creates a pipeline
+operation that cannot happen mid-render-pass. The copy naturally creates a pipeline boundary that no
-   boundary.
+amount of shader optimization can eliminate — it is a fundamental requirement of sampling a surface
 while also writing to it.
-2. **Register pressure.** Backdrop-sampling shaders read from a texture with Gaussian kernel weights
+**Multi-pass implementation.** Backdrop effects are implemented as separable multi-pass sequences
-   (multiple texture fetches per fragment), pushing register usage to ~70–80. Including this in the
+(downsample → horizontal blur → vertical blur → composite), following the standard approach used by
-   effects pipeline would reduce occupancy for all shadow/glow fragments from ~30% to ~20%, costing
+iOS `UIVisualEffectView`, Android `RenderEffect`, and Flutter's `BackdropFilter`. Each individual
-   measurable throughput on the common case.
+pass has a low-to-medium register footprint (~15–40 registers), well within the main pipeline's
 occupancy range. The multi-pass approach avoids the monolithic 70+ register shader that a single-pass
 Gaussian blur would require, making backdrop effects viable on low-end mobile GPUs (including
 Mali-G31 and VideoCore VI) where per-thread register limits are tight.
-The backdrop-effects pipeline binds a secondary sampler pointing at the captured backdrop texture. When
+**Bracketed execution.** All backdrop draws in a frame share a single bracketed region of the command
-no backdrop effects are present in a frame, this pipeline is never bound and the texture copy never
+buffer: end the current render pass, copy the render target, execute all backdrop sub-passes, then
-happens — zero cost.
+resume normal drawing. The entry/exit cost (texture copy + render-pass break) is paid once per frame
 regardless of how many backdrop effects are visible. When no backdrop effects are present, the bracket
 is never entered and the texture copy never happens — zero cost.
 **Why not split the backdrop sub-passes into separate pipelines?** The individual passes range from
 ~15 to ~40 registers, which does cross Mali's 32-register cliff. However, the register-pressure argument
 that justifies the main/effects split does not apply here. The main/effects split protects the
 _common path_ (90%+ of frame fragments) from the uncommon path's register cost. Inside the backdrop
 pipeline there is no common-vs-uncommon distinction — if backdrop effects are active, every sub-pass
 runs; if not, none run. The backdrop pipeline either executes as a complete unit or not at all.
 Additionally, backdrop effects cover a small fraction of the frame's total fragments (~5% at typical
 UI scales), so the occupancy variation within the bracket has negligible impact on frame time.
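The bracketed execution described above can be sketched as a small planner that emits the command-buffer steps for one frame. This is an illustration only: `Step` and `plan_backdrop_bracket` are hypothetical names, and running the blur chain once with one composite per backdrop draw is an assumption, not something the design fixes.

```odin
package main

import "core:fmt"

// Hypothetical step names for the command-buffer bracket.
Step :: enum {
	End_Render_Pass,
	Copy_Render_Target,
	Downsample,
	Blur_Horizontal,
	Blur_Vertical,
	Composite,
	Resume_Render_Pass,
}

// Hypothetical planner. With no backdrop draws the bracket is never entered,
// so the returned list is empty and no texture copy is scheduled.
plan_backdrop_bracket :: proc(backdrop_draws: int, allocator := context.allocator) -> [dynamic]Step {
	steps := make([dynamic]Step, 0, 8, allocator)
	if backdrop_draws <= 0 do return steps

	// Entry cost: paid once per frame, independent of backdrop_draws.
	append(&steps, Step.End_Render_Pass, Step.Copy_Render_Target)
	append(&steps, Step.Downsample, Step.Blur_Horizontal, Step.Blur_Vertical)

	// Assumption: one composite per backdrop draw.
	for _ in 0 ..< backdrop_draws do append(&steps, Step.Composite)

	append(&steps, Step.Resume_Render_Pass)
	return steps
}

main :: proc() {
	steps := plan_backdrop_bracket(3)
	defer delete(steps)

	copies := 0
	for step in steps {
		if step == .Copy_Render_Target do copies += 1
	}
	fmt.println("steps:", len(steps), "render-target copies:", copies)
}
```

However many backdrop draws are queued, the plan contains exactly one render-pass break and one copy, which is the property the bracket is meant to guarantee.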
 ### Vertex layout
@@ -483,19 +579,21 @@ The `Primitive` struct for SDF shapes lives in the storage buffer, not in vertex
 ```
 Primitive :: struct {
-    kind:   Shape_Kind,     //  0: enum u8
+    bounds:      [4]f32,          //  0: min_x, min_y, max_x, max_y
-    flags:  Shape_Flags,    //  1: bit_set[Shape_Flag; u8]
+    color:       Color,           // 16: u8x4, unpacked in shader via unpackUnorm4x8
-    _pad:   u16,            //  2: reserved
+    flags:       u32,             // 20: low byte = Shape_Kind, bits 8+ = Shape_Flags
-    bounds: [4]f32,         //  4: min_x, min_y, max_x, max_y
+    rotation_sc: u32,             // 24: packed f16 pair (sin, cos). Requires .Rotated flag.
-    color:  Color,          // 20: u8x4
+    _pad:        f32,             // 28: reserved for future use
-    _pad2:  [3]u8,          // 24: alignment
+    params:      Shape_Params,    // 32: per-kind params union (half_feather, radii, etc.) (32 bytes)
-    params: Shape_Params,   // 28: raw union, 32 bytes
+    uv:          Uv_Or_Effects,   // 64: texture UV rect or gradient/outline parameters (16 bytes)
 }
-// Total: 60 bytes (padded to 64 for GPU alignment)
+// Total: 80 bytes (std430 aligned)
 ```
-`Shape_Params` is a `#raw_union` with named variants per shape kind (`rrect`, `circle`, `segment`,
+`RRect_Params` holds the rounded-rectangle parameters directly — there is no `Shape_Params` union.
-etc.), ensuring type safety on the CPU side and zero-cost reinterpretation on the GPU side.
+`Uv_Or_Gradient` is a `#raw_union` that aliases `[4]f32` (texture UV rect) with `[4]Color` (gradient
 corner colors, clockwise from top-left: TL, TR, BR, BL). The `flags` field encodes both the
 tessellated/SDF mode marker (low byte) and shape flags (bits 8+) via `pack_flags`.
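The 80-byte layout and the flag packing can be checked at compile time. The sketch below is not the library's code: the `params` and `uv` fields are plain array stand-ins of the documented sizes, `Shape_Flag` is a hypothetical two-value enum, the `pack_flags` signature is assumed, and placing sin in the low half of `rotation_sc` is a guess.

```odin
package main

import "core:fmt"

// Hypothetical stand-ins; only the field order and sizes follow the layout above.
Color :: [4]u8

Shape_Flag :: enum u8 {
	Textured,
	Rotated,
}
Shape_Flags :: bit_set[Shape_Flag; u8]

Primitive :: struct {
	bounds:      [4]f32, //  0: min_x, min_y, max_x, max_y
	color:       Color,  // 16
	flags:       u32,    // 20
	rotation_sc: u32,    // 24
	_pad:        f32,    // 28
	params:      [8]f32, // 32: stand-in for the 32-byte params block
	uv:          [4]f32, // 64: stand-in for the 16-byte UV/gradient block
}

// Compile-time checks that the CPU struct matches the documented GPU layout.
#assert(size_of(Primitive) == 80)
#assert(offset_of(Primitive, params) == 32)
#assert(offset_of(Primitive, uv) == 64)

// Assumed signature: kind (or mode marker) in the low byte, shape flags from bit 8 upward.
pack_flags :: proc(kind: u8, flags: Shape_Flags) -> u32 {
	return u32(kind) | (u32(transmute(u8)flags) << 8)
}

// Packs sin/cos as two f16 values. Putting sin in the low half is an assumption.
pack_rotation_sc :: proc(sin_a, cos_a: f32) -> u32 {
	sin_h := f16(sin_a)
	cos_h := f16(cos_a)
	return u32(transmute(u16)sin_h) | (u32(transmute(u16)cos_h) << 16)
}

main :: proc() {
	fmt.printf("flags       = 0x%08x\n", pack_flags(1, {.Rotated}))
	fmt.printf("rotation_sc = 0x%08x\n", pack_rotation_sc(0, 1))
	fmt.printf("size        = %d bytes\n", size_of(Primitive))
}
```

Keeping the `#assert` lines next to the struct means a field reorder that breaks the shader-side layout fails the build rather than corrupting draws at runtime.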
 ### Draw submission order
@@ -506,7 +604,7 @@ Within each scissor region, draws are issued in submission order to preserve the
 2. Bind **main pipeline, tessellated mode** → draw all queued tessellated vertices (non-indexed for
   shapes, indexed for text). Pipeline state unchanged from today.
 3. Bind **main pipeline, SDF mode** → draw all queued SDF primitives (instanced, one draw call).
-4. If backdrop effects are present: copy render target, bind **backdrop-effects pipeline** → draw
+4. If backdrop effects are present: copy render target, bind **backdrop pipeline** → draw
   backdrop primitives.
 The exact ordering within a scissor may be refined based on actual Z-ordering requirements. The key
@@ -517,14 +615,15 @@ invariant is that each primitive is drawn exactly once, in the pipeline that own
 Text rendering currently uses SDL_ttf's GPU text engine, which rasterizes glyphs per `(font, size)`
 pair into bitmap atlases and emits indexed triangle data via `GetGPUTextDrawData`. This path is
 **unchanged** by the SDF migration — text continues to flow through the main pipeline's tessellated
-mode with `shape_kind = Solid`, sampling the SDL_ttf atlas texture.
+mode with `mode = 0`, sampling the SDL_ttf atlas texture.
 A future phase may evaluate MSDF (multi-channel signed distance field) text rendering, which would
 allow resolution-independent glyph rendering from a single small atlas per font. This would involve:
 - Offline atlas generation via Chlumský's msdf-atlas-gen tool.
 - Runtime glyph metrics via `vendor:stb/truetype` (already in the Odin distribution).
- A new `Shape_Kind.MSDF_Glyph` variant in the main pipeline's fragment shader.
+- A new MSDF glyph mode in the fragment shader, which would require reintroducing a mode/kind
  distinction (the current shader evaluates only `sdRoundedBox` with no kind dispatch).
 - Potential removal of the SDL_ttf dependency.
 This is explicitly deferred. The SDF shape migration is independent of and does not block text
@@ -539,12 +638,176 @@ changes.
 - Valve's original SDF text rendering paper (SIGGRAPH 2007):
  https://steamcdn-a.akamaihd.net/apps/valve/2007/SIGGRAPH2007_AlphaTestedMagnification.pdf
 ### Textures
 Textures plug into the existing main pipeline — no additional GPU pipeline, no shader rewrite. The
 work is a resource layer (registration, upload, sampling, lifecycle) plus two textured-draw procs
 that both route into the existing SDF path.
 #### Why draw owns registered textures
 A texture's GPU resource (the `^sdl.GPUTexture`, transfer buffer, shader resource view) is created
 and destroyed by draw. The user provides raw bytes and a descriptor at registration time; draw
 uploads synchronously and returns an opaque `Texture_Id` handle. The user can free their CPU-side
 bytes immediately after `register_texture` returns.
 This follows the model used by the RAD Debugger's render layer (`src/render/render_core.h` in
 EpicGamesExt/raddebugger, MIT license), where `r_tex2d_alloc` takes `(kind, size, format, data)`
 and returns an opaque handle that the renderer owns and releases. The single-owner model eliminates
 an entire class of lifecycle bugs (double-free, use-after-free across subsystems, unclear cleanup
 responsibility) that dual-ownership designs introduce.
 If advanced interop is ever needed (e.g., a future 3D pipeline or compute shader sharing the same
 GPU texture), the clean extension is a borrowed-reference accessor (`get_gpu_texture(id)`) that
 returns the underlying handle without transferring ownership. This is purely additive and does not
 require changing the registration API.
 #### Why `Texture_Kind` exists
 `Texture_Kind` (Static / Dynamic / Stream) is a driver hint for update frequency, adopted from the
 RAD Debugger's `R_ResourceKind`. It maps directly to SDL3 GPU usage patterns:
 - **Static**: uploaded once, never changes. Covers QR codes, decoded PNGs, icons — the 90% case.
 - **Dynamic**: updatable via `update_texture_region`. Covers font atlas growth, procedural updates.
 - **Stream**: frequent full re-uploads. Covers video playback, per-frame procedural generation.
 This costs one byte in the descriptor and lets the backend pick optimal memory placement without a
 future API change.
 #### Why samplers are per-draw, not per-texture
 A sampler describes how to filter and address a texture during sampling — nearest vs bilinear, clamp
 vs repeat. This is a property of the _draw_, not the texture. The same QR code texture should be
 sampled with `Nearest_Clamp` when displayed at native resolution but could reasonably be sampled
 with `Linear_Clamp` in a zoomed-out thumbnail. The same icon atlas might be sampled with
 `Nearest_Clamp` for pixel art or `Linear_Clamp` for smooth scaling.
 The RAD Debugger follows this pattern: `R_BatchGroup2DParams` carries `tex_sample_kind` alongside
 the texture handle, chosen per batch group at draw time. We do the same — `Sampler_Preset` is a
 parameter on the draw procs, not a field on `Texture_Desc`.
 Internally, draw keeps a small pool of pre-created `^sdl.GPUSampler` objects (one per preset,
 lazily initialized). Sub-batch coalescing keys on `(kind, texture_id, sampler_preset)` — draws
 with the same texture but different samplers produce separate draw calls, which is correct.
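A minimal sketch of the coalescing rule, assuming a hypothetical `push_sub_batch` helper and a two-value `Sampler_Preset`. The `Sub_Batch` fields mirror the struct in this change; the merge condition (same key and contiguous range) is an assumption about how the real batching works.

```odin
package main

import "core:fmt"

Texture_Id :: distinct u32

Sampler_Preset :: enum u8 {
	Nearest_Clamp,
	Linear_Clamp,
}

Sub_Batch_Kind :: enum u8 {
	Tessellated,
	Text,
	SDF,
}

Sub_Batch :: struct {
	kind:       Sub_Batch_Kind,
	offset:     u32,
	count:      u32,
	texture_id: Texture_Id,
	sampler:    Sampler_Preset,
}

// Hypothetical helper: extend the last record when (kind, texture_id, sampler) matches
// and the new range is contiguous; otherwise start a new record (a new draw call).
// Text records are never merged because their count is always 1.
push_sub_batch :: proc(
	batches: ^[dynamic]Sub_Batch,
	kind: Sub_Batch_Kind,
	offset, count: u32,
	texture_id: Texture_Id,
	sampler: Sampler_Preset,
) {
	if n := len(batches^); n > 0 && kind != .Text {
		last := &batches^[n - 1]
		same_key := last.kind == kind && last.texture_id == texture_id && last.sampler == sampler
		contiguous := last.offset + last.count == offset
		if same_key && contiguous {
			last.count += count
			return
		}
	}
	append(batches, Sub_Batch{kind, offset, count, texture_id, sampler})
}

main :: proc() {
	batches: [dynamic]Sub_Batch
	defer delete(batches)

	qr :: Texture_Id(1)
	push_sub_batch(&batches, .SDF, 0, 1, qr, .Nearest_Clamp)
	push_sub_batch(&batches, .SDF, 1, 1, qr, .Nearest_Clamp) // same key: merged
	push_sub_batch(&batches, .SDF, 2, 1, qr, .Linear_Clamp)  // same texture, new sampler: new record

	fmt.println("draw calls:", len(batches))
}
```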
 #### Textured draw procs
 Textured rectangles route through the existing SDF path via `sdf_rectangle_texture` and
 `sdf_rectangle_texture_corners`, mirroring `sdf_rectangle` and `sdf_rectangle_corners` exactly —
 same parameters, same naming — with the color parameter replaced by a texture ID plus an optional
 tint.
 An earlier iteration of this design considered a separate tessellated proc for "simple" fullscreen
 quads, on the theory that the tessellated path's lower register count (~16 regs vs ~18 for the SDF
 textured branch) would improve occupancy at large fragment counts. Applying the register-pressure
 analysis from the pipeline-strategy section above shows this is wrong: both 16 and 18 registers are
 well below the register cliff (~43 regs on consumer Ampere/Ada, ~32 on Volta/A100), so both run at
 100% occupancy. The remaining ALU difference (~15 extra instructions for the SDF evaluation) amounts
 to ~20μs at 4K — below noise. Meanwhile, splitting into a separate pipeline would add ~1–5μs per
 pipeline bind on the CPU side per scissor, matching or exceeding the GPU-side savings. Within the
 main pipeline, unified remains strictly better.
 The naming convention uses `sdf_` and `tes_` prefixes to indicate the rendering path, with suffixes
 for modifiers: `sdf_rectangle_texture` and `sdf_rectangle_texture_corners` sit alongside
 `sdf_rectangle` (solid or gradient overload). Proc groups like `sdf_rectangle` dispatch to
 `sdf_rectangle_solid` or `sdf_rectangle_gradient` based on argument count. Future per-shape texture
 variants (`sdf_circle_texture`) are additive.
 #### What SDF anti-aliasing does and does not do for textured draws
 The SDF path anti-aliases the **shape's outer silhouette** — rounded-corner edges, rotated edges,
 stroke outlines. It does not anti-alias or sharpen the texture content. Inside the shape, fragments
 sample through the chosen `Sampler_Preset`, and image quality is whatever the sampler produces from
 the source texels. A low-resolution texture displayed at a large size shows bilinear blur regardless
 of which draw proc is used. This matches the current text-rendering model, where glyph sharpness
 depends on how closely the display size matches the SDL_ttf atlas's rasterized size.
 #### Fit modes are a computation layer, not a renderer concept
 Standard image-fit behaviors (stretch, fill/cover, fit/contain, tile, center) are expressed as UV
 sub-region computations on top of the `uv_rect` parameter that both textured-draw procs accept. The
 renderer has no knowledge of fit modes — it samples whatever UV region it is given.
 A `fit_params` helper computes the appropriate `uv_rect`, sampler preset, and (for letterbox/fit
 mode) shrunken inner rect from a `Fit_Mode` enum, the target rect, and the texture's pixel size.
 Users who need custom UV control (sprite atlas sub-regions, UV animation, nine-patch slicing) skip
 the helper and compute `uv_rect` directly. This keeps the renderer primitive minimal while making
 the common cases convenient.
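As an illustration of fit modes being pure UV and rect arithmetic, here is a sketch of a `fit_params`-style helper covering stretch, fill (cover), and fit (contain). The signature, the `Fit_Mode` value names, and the `{x, y, width, height}` interpretation of `uv_rect` are assumptions. The real helper also returns a sampler preset, which is omitted here.

```odin
package main

import "core:fmt"

Rectangle :: struct {
	x, y, width, height: f32,
}

// Hypothetical subset of the fit modes named above.
Fit_Mode :: enum {
	Stretch,
	Fill, // cover: crop the texture so it fills the target
	Fit,  // contain: shrink the drawn rect so the whole texture is visible
}

// Assumed signature. Returns the UV sub-region to sample and the rect to draw into.
fit_params :: proc(mode: Fit_Mode, target: Rectangle, tex_w, tex_h: f32) -> (uv_rect: Rectangle, inner: Rectangle) {
	uv_rect = Rectangle{0, 0, 1, 1}
	inner = target
	if tex_w <= 0 || tex_h <= 0 || target.width <= 0 || target.height <= 0 do return

	target_aspect := target.width / target.height
	tex_aspect := tex_w / tex_h

	switch mode {
	case .Stretch:
	// Full UV range into the full target; aspect ratio is not preserved.
	case .Fill:
		if tex_aspect > target_aspect {
			// Texture is wider than the target: crop left and right.
			w := target_aspect / tex_aspect
			uv_rect = Rectangle{(1 - w) * 0.5, 0, w, 1}
		} else {
			// Texture is taller than the target: crop top and bottom.
			h := tex_aspect / target_aspect
			uv_rect = Rectangle{0, (1 - h) * 0.5, 1, h}
		}
	case .Fit:
		if tex_aspect > target_aspect {
			// Texture is wider: letterbox above and below.
			h := target.width / tex_aspect
			inner = Rectangle{target.x, target.y + (target.height - h) * 0.5, target.width, h}
		} else {
			// Texture is taller: pillarbox left and right.
			w := target.height * tex_aspect
			inner = Rectangle{target.x + (target.width - w) * 0.5, target.y, w, target.height}
		}
	}
	return
}

main :: proc() {
	target := Rectangle{0, 0, 100, 100}
	uv, _ := fit_params(.Fill, target, 200, 100)
	_, inner := fit_params(.Fit, target, 200, 100)
	fmt.println("fill uv:", uv)
	fmt.println("fit inner:", inner)
}
```

The renderer never sees `Fit_Mode`; a caller would pass the returned `uv_rect` and rect straight to a textured draw proc.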
 #### Deferred release
 `unregister_texture` does not immediately release the GPU texture. It queues the slot for release at
 the end of the current frame, after `SubmitGPUCommandBuffer` has handed work to the GPU. This
 prevents a race condition where a texture is freed while the GPU is still sampling from it in an
 already-submitted command buffer. The same deferred-release pattern is applied to `clear_text_cache`
 and `clear_text_cache_entry`, fixing a pre-existing latent bug where destroying a cached
 `^sdl_ttf.Text` mid-frame could free an atlas texture still referenced by in-flight draw batches.
 This pattern is standard in production renderers — the RAD Debugger's `r_tex2d_release` queues
 textures onto a free list that is processed in `r_end_frame`, not at the call site.
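The deferred-release queue can be modeled without a GPU. In this sketch `Registry` is a hypothetical container and a `live` flag stands in for the real texture handle. The field names echo `texture_slots`, `texture_free_list`, and `pending_texture_releases` from the global state, but the procedure bodies are assumptions.

```odin
package main

import "core:fmt"

Texture_Id :: distinct u32

// Hypothetical slot: `live` stands in for owning a real ^sdl.GPUTexture.
Texture_Slot :: struct {
	live: bool,
}

Registry :: struct {
	slots:            [dynamic]Texture_Slot,
	free_list:        [dynamic]u32,
	pending_releases: [dynamic]Texture_Id,
}

// Only queues the release; the slot stays valid for draws already recorded this frame.
unregister_texture :: proc(r: ^Registry, id: Texture_Id) {
	append(&r.pending_releases, id)
}

// Runs at the frame boundary, after the command buffer has been submitted.
process_pending_releases :: proc(r: ^Registry) {
	for id in r.pending_releases {
		r.slots[int(id)].live = false // real code would call sdl.ReleaseGPUTexture here
		append(&r.free_list, u32(id)) // slot index becomes available for reuse
	}
	clear(&r.pending_releases)
}

main :: proc() {
	r: Registry
	defer {
		delete(r.slots)
		delete(r.free_list)
		delete(r.pending_releases)
	}

	// Slot 0 is reserved as the invalid texture; slot 1 is a registered texture.
	append(&r.slots, Texture_Slot{}, Texture_Slot{live = true})

	unregister_texture(&r, 1)
	fmt.println("live after unregister:", r.slots[1].live)

	process_pending_releases(&r)
	fmt.println("live after frame boundary:", r.slots[1].live)
}
```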
 #### Clay integration
 Clay's `RenderCommandType.Image` is handled by dereferencing `imageData: rawptr` as a pointer to a
 `Clay_Image_Data` struct containing a `Texture_Id`, `Fit_Mode`, and tint color. Routing mirrors the
 existing rectangle handling: zero `cornerRadius` dispatches to `sdf_rectangle_texture` (SDF, sharp
 corners), nonzero dispatches to `sdf_rectangle_texture_corners` (SDF, per-corner radii). A
 `fit_params` call computes UVs from the fit mode before dispatch.
 #### Deferred features
 The following are plumbed in the descriptor but not implemented in phase 1:
 - **Mipmaps**: `Texture_Desc.mip_levels` field exists; generation via SDL3 deferred.
 - **Compressed formats**: `Texture_Desc.format` accepts BC/ASTC; upload path deferred.
 - **Render-to-texture**: `Texture_Desc.usage` accepts `.COLOR_TARGET`; render-pass refactor deferred.
 - **3D textures, arrays, cube maps**: `Texture_Desc.type` and `depth_or_layers` fields exist.
 - **Additional samplers**: anisotropic, trilinear, clamp-to-border — additive enum values.
 - **Atlas packing**: internal optimization for sub-batch coalescing; invisible to callers.
 - **Per-shape texture variants**: `sdf_circle_texture`, `tes_ellipse_texture`, `tes_polygon_texture` — potential future additions, reserved by naming convention.
 **References:**
 - RAD Debugger render layer (ownership model, deferred release, sampler-at-draw-time):
  https://github.com/EpicGamesExt/raddebugger — `src/render/render_core.h`, `src/render/d3d11/render_d3d11.c`
 - Casey Muratori, Handmade Hero day 472 — texture handling as a renderer-owned resource concern,
  atlases as a separate layer above the renderer.
 ## 3D rendering
 3D pipeline architecture is under consideration and will be documented separately. The current
 expectation is that 3D rendering will use dedicated pipelines (separate from the 2D pipelines)
 sharing GPU resources (textures, samplers, command buffer lifecycle) with the 2D renderer.
 ## Multi-window support
 The renderer currently assumes a single window via the global `GLOB` state. Multi-window support is
 deferred but anticipated. When revisited, the RAD Debugger's bucket + pass-list model
 (`src/draw/draw.h`, `src/draw/draw.c` in EpicGamesExt/raddebugger) is worth studying as a reference.
 RAD separates draw submission from rendering via **buckets**. A `DR_Bucket` is an explicit handle
 that accumulates an ordered list of render passes (`R_PassList`). The user creates a bucket, pushes
 it onto a thread-local stack, issues draw calls (which target the top-of-stack bucket), then submits
 the bucket to a specific window. Multiple buckets can exist simultaneously — one per window, or one
 per UI panel that gets composited into a parent bucket via `dr_sub_bucket`. Implicit draw parameters
 (clip rect, 2D transform, sampler mode, transparency) are managed via push/pop stacks scoped to each
 bucket, so different windows can have independent clip and transform state without interference.
 The key properties this gives RAD:
 - **Per-window isolation.** Each window builds its own bucket with its own pass list and state stacks.
  No global contention.
 - **Thread-parallel building.** Each thread has its own draw context and arena. Multiple threads can
  build buckets concurrently, then submit them to the render backend sequentially.
 - **Compositing.** A pre-built bucket (e.g., a tooltip or overlay) can be injected into another
  bucket with a transform applied, without rebuilding its draw calls.
 For our library, the likely adaptation would be replacing the single `GLOB` with a per-window draw
 context that users create and pass to `begin`/`end`, while keeping the explicit-parameter draw call
 style rather than adopting RAD's implicit state stacks. Texture and sampler resources would remain
 global (shared across windows), with only the per-frame staging buffers and layer/scissor state
 becoming per-context.
 ## Building shaders
 GLSL shader sources live in `shaders/source/`. Compiled outputs (SPIR-V and Metal Shading Language)
@@ -4,6 +4,7 @@ import "base:runtime"
 import "core:c"
 import "core:log"
 import "core:math"
 import "core:strings"
 import sdl "vendor:sdl3"
 import sdl_ttf "vendor:sdl3/ttf"
@@ -27,11 +28,109 @@ BUFFER_INIT_SIZE :: 256
 INITIAL_LAYER_SIZE :: 5
 INITIAL_SCISSOR_SIZE :: 10
 // Sentinel value: when passed as msaa_samples, `init` will use the maximum MSAA sample count
 // supported by the GPU for the swapchain format.
 MSAA_MAX :: sdl.GPUSampleCount(0xFF)
 // ----- Default parameter values -----
 // Named constants for non-zero default procedure parameters. Centralizes magic numbers
 // so they can be tuned in one place and referenced by name in proc signatures.
 DFT_FEATHER_PX :: 1 // Total AA feather width in physical pixels (half on each side of boundary).
 DFT_STROKE_THICKNESS :: 1 // Default line/stroke thickness in logical pixels.
 DFT_FONT_SIZE :: 44 // Default font size in points for text rendering.
 DFT_CIRC_END_ANGLE :: 360 // Full-circle end angle in degrees (ring/arc).
 DFT_UV_RECT :: Rectangle{0, 0, 1, 1} // Full-texture UV rect (rectangle_texture).
 DFT_TINT :: WHITE // Default texture tint (rectangle_texture, clay_image).
 DFT_TEXT_COLOR :: BLACK // Default text color.
 DFT_CLEAR_COLOR :: BLACK // Default clear color for end().
 DFT_SAMPLER :: Sampler_Preset.Linear_Clamp // Default texture sampler preset.
 GLOB: Global
 Global :: struct {
 	// -- Per-frame staging (hottest — touched by every prepare/upload/clear cycle) --
 	tmp_shape_verts:          [dynamic]Vertex, // Tessellated shape vertices staged for GPU upload.
 	tmp_text_verts:           [dynamic]Vertex, // Text vertices staged for GPU upload.
 	tmp_text_indices:         [dynamic]c.int, // Text index buffer staged for GPU upload.
 	tmp_text_batches:         [dynamic]TextBatch, // Text atlas batch metadata for indexed drawing.
 	tmp_primitives:           [dynamic]Primitive, // SDF primitives staged for GPU storage buffer upload.
 	tmp_sub_batches:          [dynamic]Sub_Batch, // Sub-batch records that drive draw call dispatch.
 	tmp_uncached_text:        [dynamic]^sdl_ttf.Text, // Uncached TTF_Text objects destroyed after end() submits.
 	layers:                   [dynamic]Layer, // Draw layers, each with its own scissor stack.
 	scissors:                 [dynamic]Scissor, // Scissor rects that clip drawing within each layer.
 	// -- Per-frame scalars (accessed during prepare and draw_layer) --
 	curr_layer_index:         uint, // Index of the currently active layer.
 	dpi_scaling:              f32, // Window DPI scale factor applied to all pixel coordinates.
 	clay_z_index:             i16, // Tracks z-index for layer splitting during Clay batch processing.
 	cleared:                  bool, // Whether the render target has been cleared this frame.
 	// -- Pipeline (accessed every draw_layer call) --
 	pipeline_2d_base:         Pipeline_2D_Base, // The unified 2D GPU pipeline (shaders, buffers, samplers).
 	device:                   ^sdl.GPUDevice, // GPU device handle, stored at init.
 	samplers:                 [SAMPLER_PRESET_COUNT]^sdl.GPUSampler, // Lazily-created sampler objects, one per Sampler_Preset.
 	// -- Deferred release (processed once per frame at frame boundary) --
 	pending_texture_releases: [dynamic]Texture_Id, // Deferred GPU texture releases, processed next frame.
 	pending_text_releases:    [dynamic]^sdl_ttf.Text, // Deferred TTF_Text destroys, processed next frame.
 	// -- Textures (registration is occasional, binding is per draw call) --
 	texture_slots:            [dynamic]Texture_Slot, // Registered texture slots indexed by Texture_Id.
 	texture_free_list:        [dynamic]u32, // Recycled slot indices available for reuse.
 	// -- MSAA (once per frame in end()) --
 	msaa_texture:             ^sdl.GPUTexture, // Intermediate render target for multi-sample resolve.
 	msaa_width:               u32, // Cached width to detect when MSAA texture needs recreation.
 	msaa_height:              u32, // Cached height to detect when MSAA texture needs recreation.
 	sample_count:             sdl.GPUSampleCount, // Sample count chosen at init (._1 means MSAA disabled).
 	// -- Clay (once per frame in prepare_clay_batch) --
 	clay_memory:              [^]u8, // Raw memory block backing Clay's internal arena.
 	// -- Text (occasional — font registration and text cache lookups) --
 	text_cache:               Text_Cache, // Font registry, SDL_ttf engine, and cached TTF_Text objects.
 	// -- Resize tracking (cold — checked once per frame in resize_global) --
 	max_layers:               int, // High-water marks for dynamic array shrink heuristic.
 	max_scissors:             int,
 	max_shape_verts:          int,
 	max_text_verts:           int,
 	max_text_indices:         int,
 	max_text_batches:         int,
 	max_primitives:           int,
 	max_sub_batches:          int,
 	// -- Init-only (coldest — set once at init, never written again) --
 	odin_context:             runtime.Context, // Odin context captured at init for use in callbacks.
 }
 // ---------------------------------------------------------------------------------------------------------------------
-// ----- Color -------------------------
+// ----- Core types --------------------
 // ---------------------------------------------------------------------------------------------------------------------
-Color :: distinct [4]u8
+// A 2D position in world space. Non-distinct alias for [2]f32 — bare literals like {100, 200}
 // work at non-ambiguous call sites.
 //
 // Coordinate system: origin is the top-left corner of the window/layer. X increases rightward,
 // Y increases downward. This matches SDL, HTML Canvas, and most 2D UI coordinate conventions.
 // All position parameters in the draw API (center, origin, start_position, end_position, etc.)
 // use this coordinate system.
 //
 // Units are logical pixels (pre-DPI-scaling). The renderer multiplies by dpi_scaling internally
 // before uploading to the GPU. A Vec2{100, 50} refers to the same visual location regardless of
 // display DPI.
 Vec2 :: [2]f32
 // An RGBA color with 8 bits per channel. Distinct type over [4]u8 so that proc-group
 // overloads can disambiguate Color from other 4-byte structs.
 //
 // Channel order: R, G, B, A (indices 0, 1, 2, 3). Alpha 255 is fully opaque, 0 is fully
 // transparent. This matches the GPU-side layout: the shader unpacks via unpackUnorm4x8 which
 // reads the bytes in memory order as R, G, B, A and normalizes each to [0, 1].
 //
 // When used in the Primitive struct (Primitive.color), the 4 bytes are stored as a u32 in
 // native byte order and unpacked by the shader.
 Color :: [4]u8
 BLACK :: Color{0, 0, 0, 255}
 WHITE :: Color{255, 255, 255, 255}
@@ -40,8 +139,43 @@ GREEN :: Color{0, 255, 0, 255}
 BLUE :: Color{0, 0, 255, 255}
 BLANK :: Color{0, 0, 0, 0}
 // Per-corner rounding radii for rectangles, specified clockwise from top-left.
 // All values are in logical pixels (pre-DPI-scaling).
 Rectangle_Radii :: struct {
 	top_left:     f32,
 	top_right:    f32,
 	bottom_right: f32,
 	bottom_left:  f32,
 }
 // A linear gradient between two colors along an arbitrary angle.
 // The `end_color` is the color at the end of the gradient direction; the shape's fill `color`
 // parameter acts as the start color. `angle` is in degrees: 0 = left-to-right, 90 = top-to-bottom.
 Linear_Gradient :: struct {
 	end_color: Color,
 	angle:     f32,
 }
 // A radial gradient between two colors from center to edge.
 // The `outer_color` is the color at the shape's edge; the shape's fill `color` parameter
 // acts as the inner (center) color.
 Radial_Gradient :: struct {
 	outer_color: Color,
 }
 // Tagged union for specifying a gradient on any shape. Defaults to `nil` (no gradient).
 // When a gradient is active, the shape's `color` parameter becomes the start/inner color,
 // and the gradient struct carries the end/outer color plus any type-specific parameters.
 //
 // Gradient and Textured are mutually exclusive on the same primitive. If a shape uses
 // `rectangle_texture`, gradients are not applicable — use the tint color instead.
 Gradient :: union {
 	Linear_Gradient,
 	Radial_Gradient,
 }
 // Convert clay.Color ([4]c.float in 0–255 range) to Color.
-color_from_clay :: proc(clay_color: clay.Color) -> Color {
+color_from_clay :: #force_inline proc(clay_color: clay.Color) -> Color {
 	return Color{u8(clay_color[0]), u8(clay_color[1]), u8(clay_color[2]), u8(clay_color[3])}
 }
@@ -51,9 +185,19 @@ color_to_f32 :: proc(color: Color) -> [4]f32 {
 	return {f32(color[0]) * INV, f32(color[1]) * INV, f32(color[2]) * INV, f32(color[3]) * INV}
 }
-// ---------------------------------------------------------------------------------------------------------------------
+// Pre-multiply RGB channels by alpha. The tessellated vertex path and text path require
-// ----- Core types --------------------
+// premultiplied colors because the blend state is ONE, ONE_MINUS_SRC_ALPHA and the
-// ---------------------------------------------------------------------------------------------------------------------
+// tessellated fragment shader passes vertex color through without further modification.
 // Users who construct Vertex structs manually for prepare_shape must premultiply their colors.
 premultiply_color :: #force_inline proc(color: Color) -> Color {
 	a := u32(color[3])
 	return Color {
 		u8((u32(color[0]) * a + 127) / 255),
 		u8((u32(color[1]) * a + 127) / 255),
 		u8((u32(color[2]) * a + 127) / 255),
 		color[3],
 	}
 }
 Rectangle :: struct {
 	x:      f32,
@@ -63,15 +207,17 @@ Rectangle :: struct {
 }
 Sub_Batch_Kind :: enum u8 {
-	Shapes, // non-indexed, white texture, mode 0
+	Tessellated, // non-indexed, white texture or user texture, mode 0
 	Text, // indexed, atlas texture, mode 0
-	SDF, // instanced unit quad, white texture, mode 1
+	SDF, // instanced unit quad, white texture or user texture, mode 1
 }
 Sub_Batch :: struct {
-	kind:   Sub_Batch_Kind,
+	kind:       Sub_Batch_Kind,
-	offset: u32, // Shapes: vertex offset; Text: text_batch index; SDF: primitive index
+	offset:     u32, // Tessellated: vertex offset; Text: text_batch index; SDF: primitive index
-	count:  u32, // Shapes: vertex count; Text: always 1; SDF: primitive count
+	count:      u32, // Tessellated: vertex count; Text: always 1; SDF: primitive count
 	texture_id: Texture_Id,
 	sampler:    Sampler_Preset,
 }
 Layer :: struct {
@@ -88,44 +234,6 @@ Scissor :: struct {
 	sub_batch_len:   u32,
 }
 // ---------------------------------------------------------------------------------------------------------------------
 // ----- Global state ------------------
 // ---------------------------------------------------------------------------------------------------------------------
 GLOB: Global
 Global :: struct {
 	odin_context:      runtime.Context,
 	pipeline_2d_base:  Pipeline_2D_Base,
 	text_cache:        Text_Cache,
 	layers:            [dynamic]Layer,
 	scissors:          [dynamic]Scissor,
 	tmp_shape_verts:   [dynamic]Vertex,
 	tmp_text_verts:    [dynamic]Vertex,
 	tmp_text_indices:  [dynamic]c.int,
 	tmp_text_batches:  [dynamic]TextBatch,
 	tmp_primitives:    [dynamic]Primitive,
 	tmp_sub_batches:   [dynamic]Sub_Batch,
 	tmp_uncached_text: [dynamic]^sdl_ttf.Text, // Uncached TTF_Text objects to destroy after end()
 	clay_memory:       [^]u8,
 	msaa_texture:      ^sdl.GPUTexture,
 	curr_layer_index:  uint,
 	max_layers:        int,
 	max_scissors:      int,
 	max_shape_verts:   int,
 	max_text_verts:    int,
 	max_text_indices:  int,
 	max_text_batches:  int,
 	max_primitives:    int,
 	max_sub_batches:   int,
 	dpi_scaling:       f32,
 	msaa_width:        u32,
 	msaa_height:       u32,
 	sample_count:      sdl.GPUSampleCount,
 	clay_z_index:      i16,
 	cleared:           bool,
 }
 Init_Options :: struct {
 	// MSAA sample count. Default is ._1 (no MSAA). SDF rendering does not benefit from MSAA
 	// because SDF fragments compute coverage analytically via `smoothstep`. MSAA helps for
@@ -135,10 +243,6 @@ Init_Options :: struct {
 	msaa_samples: sdl.GPUSampleCount,
 }
 // Sentinel value: when passed as msaa_samples, `init` will use the maximum MSAA sample count
 // supported by the GPU for the swapchain format.
 MSAA_MAX :: sdl.GPUSampleCount(0xFF)
 // Initialize the renderer. Returns false if GPU pipeline or text engine creation fails.
@(require_results)
 init :: proc(
@@ -168,22 +272,30 @@ init :: proc(
 	}
 	GLOB = Global {
-		layers            = make([dynamic]Layer, 0, INITIAL_LAYER_SIZE, allocator = allocator),
+		layers                   = make([dynamic]Layer, 0, INITIAL_LAYER_SIZE, allocator = allocator),
-		scissors          = make([dynamic]Scissor, 0, INITIAL_SCISSOR_SIZE, allocator = allocator),
+		scissors                 = make([dynamic]Scissor, 0, INITIAL_SCISSOR_SIZE, allocator = allocator),
-		tmp_shape_verts   = make([dynamic]Vertex, 0, BUFFER_INIT_SIZE, allocator = allocator),
+		tmp_shape_verts          = make([dynamic]Vertex, 0, BUFFER_INIT_SIZE, allocator = allocator),
-		tmp_text_verts    = make([dynamic]Vertex, 0, BUFFER_INIT_SIZE, allocator = allocator),
+		tmp_text_verts           = make([dynamic]Vertex, 0, BUFFER_INIT_SIZE, allocator = allocator),
-		tmp_text_indices  = make([dynamic]c.int, 0, BUFFER_INIT_SIZE, allocator = allocator),
+		tmp_text_indices         = make([dynamic]c.int, 0, BUFFER_INIT_SIZE, allocator = allocator),
-		tmp_text_batches  = make([dynamic]TextBatch, 0, BUFFER_INIT_SIZE, allocator = allocator),
+		tmp_text_batches         = make([dynamic]TextBatch, 0, BUFFER_INIT_SIZE, allocator = allocator),
-		tmp_primitives    = make([dynamic]Primitive, 0, BUFFER_INIT_SIZE, allocator = allocator),
+		tmp_primitives           = make([dynamic]Primitive, 0, BUFFER_INIT_SIZE, allocator = allocator),
-		tmp_sub_batches   = make([dynamic]Sub_Batch, 0, BUFFER_INIT_SIZE, allocator = allocator),
+		tmp_sub_batches          = make([dynamic]Sub_Batch, 0, BUFFER_INIT_SIZE, allocator = allocator),
-		tmp_uncached_text = make([dynamic]^sdl_ttf.Text, 0, 16, allocator = allocator),
+		tmp_uncached_text        = make([dynamic]^sdl_ttf.Text, 0, 16, allocator = allocator),
-		odin_context      = odin_context,
+		device                   = device,
-		dpi_scaling       = sdl.GetWindowDisplayScale(window),
+		texture_slots            = make([dynamic]Texture_Slot, 0, 16, allocator = allocator),
-		clay_memory       = make([^]u8, min_memory_size, allocator = allocator),
+		texture_free_list        = make([dynamic]u32, 0, 16, allocator = allocator),
-		sample_count      = resolved_sample_count,
+		pending_texture_releases = make([dynamic]Texture_Id, 0, 16, allocator = allocator),
-		pipeline_2d_base  = pipeline,
+		pending_text_releases    = make([dynamic]^sdl_ttf.Text, 0, 16, allocator = allocator),
-		text_cache        = text_cache,
+		odin_context             = odin_context,
 		dpi_scaling              = sdl.GetWindowDisplayScale(window),
 		clay_memory              = make([^]u8, min_memory_size, allocator = allocator),
 		sample_count             = resolved_sample_count,
 		pipeline_2d_base         = pipeline,
 		text_cache               = text_cache,
 	}
 	// Reserve slot 0 for INVALID_TEXTURE
 	append(&GLOB.texture_slots, Texture_Slot{})
 	log.debug("Window DPI scaling:", GLOB.dpi_scaling)
 	arena := clay.CreateArenaWithCapacityAndMemory(min_memory_size, GLOB.clay_memory)
 	window_width, window_height: c.int
@@ -230,12 +342,23 @@ destroy :: proc(device: ^sdl.GPUDevice, allocator := context.allocator) {
 	if GLOB.msaa_texture != nil {
 		sdl.ReleaseGPUTexture(device, GLOB.msaa_texture)
 	}
 	process_pending_texture_releases()
 	destroy_all_textures()
 	destroy_sampler_pool()
 	for ttf_text in GLOB.pending_text_releases do sdl_ttf.DestroyText(ttf_text)
 	delete(GLOB.pending_text_releases)
 	destroy_pipeline_2d_base(device, &GLOB.pipeline_2d_base)
 	destroy_text_cache()
 }
 // Internal
 clear_global :: proc() {
 	// Process deferred texture releases from the previous frame
 	process_pending_texture_releases()
 	// Process deferred text releases from the previous frame
 	for ttf_text in GLOB.pending_text_releases do sdl_ttf.DestroyText(ttf_text)
 	clear(&GLOB.pending_text_releases)
 	GLOB.curr_layer_index = 0
 	GLOB.clay_z_index = 0
 	GLOB.cleared = false
@@ -265,6 +388,7 @@ measure_text_clay :: proc "c" (
 	context = GLOB.odin_context
 	text := string(text.chars[:text.length])
 	c_text := strings.clone_to_cstring(text, context.temp_allocator)
 	defer delete(c_text, context.temp_allocator)
 	width, height: c.int
 	if !sdl_ttf.GetStringSize(get_font(config.fontId, config.fontSize), c_text, 0, &width, &height) {
 		log.panicf("Failed to measure text: %s", sdl.GetError())
@@ -331,12 +455,13 @@ new_layer :: proc(prev_layer: ^Layer, bounds: Rectangle) -> ^Layer {
 // ---------------------------------------------------------------------------------------------------------------------
 // Submit shape vertices (colored triangles) to the given layer for rendering.
 // TODO: Should probably be renamed to better match tessellated naming conventions in the library.
 prepare_shape :: proc(layer: ^Layer, vertices: []Vertex) {
 	if len(vertices) == 0 do return
 	offset := u32(len(GLOB.tmp_shape_verts))
 	append(&GLOB.tmp_shape_verts, ..vertices)
 	scissor := &GLOB.scissors[layer.scissor_start + layer.scissor_len - 1]
-	append_or_extend_sub_batch(scissor, layer, .Shapes, offset, u32(len(vertices)))
+	append_or_extend_sub_batch(scissor, layer, .Tessellated, offset, u32(len(vertices)))
 }
 // Submit an SDF primitive to the given layer for rendering.
@@ -362,6 +487,9 @@ prepare_text :: proc(layer: ^Layer, text: Text) {
 	base_x := math.round(text.position[0] * GLOB.dpi_scaling)
 	base_y := math.round(text.position[1] * GLOB.dpi_scaling)
 	// Premultiply text color once — reused across all glyph vertices.
 	pm_color := premultiply_color(text.color)
 	for data != nil {
 		vertex_start := u32(len(GLOB.tmp_text_verts))
 		index_start := u32(len(GLOB.tmp_text_indices))
@@ -372,7 +500,7 @@ prepare_text :: proc(layer: ^Layer, text: Text) {
 			uv := data.uv[i]
 			append(
 				&GLOB.tmp_text_verts,
-				Vertex{position = {pos.x + base_x, -pos.y + base_y}, uv = {uv.x, uv.y}, color = text.color},
+				Vertex{position = {pos.x + base_x, -pos.y + base_y}, uv = {uv.x, uv.y}, color = pm_color},
 			)
 		}
@@ -410,6 +538,9 @@ prepare_text_transformed :: proc(layer: ^Layer, text: Text, transform: Transform
 	scissor := &GLOB.scissors[layer.scissor_start + layer.scissor_len - 1]
 	// Premultiply text color once — reused across all glyph vertices.
 	pm_color := premultiply_color(text.color)
 	for data != nil {
 		vertex_start := u32(len(GLOB.tmp_text_verts))
 		index_start := u32(len(GLOB.tmp_text_indices))
@@ -422,7 +553,7 @@ prepare_text_transformed :: proc(layer: ^Layer, text: Text, transform: Transform
 			// so we apply directly — no per-vertex DPI divide/multiply.
 			append(
 				&GLOB.tmp_text_verts,
-				Vertex{position = apply_transform(transform, {pos.x, -pos.y}), uv = {uv.x, uv.y}, color = text.color},
+				Vertex{position = apply_transform(transform, {pos.x, -pos.y}), uv = {uv.x, uv.y}, color = pm_color},
 			)
 		}
@@ -454,15 +585,24 @@ append_or_extend_sub_batch :: proc(
 	kind: Sub_Batch_Kind,
 	offset: u32,
 	count: u32,
 	texture_id: Texture_Id = INVALID_TEXTURE,
 	sampler: Sampler_Preset = DFT_SAMPLER,
 ) {
 	if scissor.sub_batch_len > 0 {
 		last := &GLOB.tmp_sub_batches[scissor.sub_batch_start + scissor.sub_batch_len - 1]
-		if last.kind == kind && kind != .Text && last.offset + last.count == offset {
+		if last.kind == kind &&
 		   kind != .Text &&
 		   last.offset + last.count == offset &&
 		   last.texture_id == texture_id &&
 		   last.sampler == sampler {
 			last.count += count
 			return
 		}
 	}
-	append(&GLOB.tmp_sub_batches, Sub_Batch{kind = kind, offset = offset, count = count})
+	append(
 		&GLOB.tmp_sub_batches,
 		Sub_Batch{kind = kind, offset = offset, count = count, texture_id = texture_id, sampler = sampler},
 	)
 	scissor.sub_batch_len += 1
 	layer.sub_batch_len += 1
 }
@@ -502,6 +642,7 @@ prepare_clay_batch :: proc(
 	mouse_wheel_delta: [2]f32,
 	frame_time: f32 = 0,
 	custom_draw: Custom_Draw = nil,
 	temp_allocator := context.temp_allocator,
 ) {
 	mouse_pos: [2]f32
 	mouse_flags := sdl.GetMouseState(&mouse_pos.x, &mouse_pos.y)
@@ -538,10 +679,14 @@ prepare_clay_batch :: proc(
 		switch (render_command.commandType) {
 		case clay.RenderCommandType.None:
 			log.errorf(
 					"Received render command with type None. This generally means we're in some kind of fucked up state.",
 				)
 		case clay.RenderCommandType.Text:
 			render_data := render_command.renderData.text
 			txt := string(render_data.stringContents.chars[:render_data.stringContents.length])
-			c_text := strings.clone_to_cstring(txt, context.temp_allocator)
+			c_text := strings.clone_to_cstring(txt, temp_allocator)
 			defer delete(c_text, temp_allocator)
 			// Clay render-command IDs are derived via Clay's internal HashNumber (Jenkins-family)
 			// and namespaced with .Clay so they can never collide with user-provided custom text IDs.
 			sdl_text := cache_get_or_update(
@@ -551,6 +696,29 @@ prepare_clay_batch :: proc(
 			)
 			prepare_text(layer, Text{sdl_text, {bounds.x, bounds.y}, color_from_clay(render_data.textColor)})
 		case clay.RenderCommandType.Image:
 			// Any texture
 			render_data := render_command.renderData.image
 			if render_data.imageData == nil do continue
 			img_data := (^Clay_Image_Data)(render_data.imageData)^
 			cr := render_data.cornerRadius
 			radii := Rectangle_Radii {
 				top_left     = cr.topLeft,
 				top_right    = cr.topRight,
 				bottom_right = cr.bottomRight,
 				bottom_left  = cr.bottomLeft,
 			}
 			// Background color behind the image (Clay allows it)
 			bg := color_from_clay(render_data.backgroundColor)
 			if bg[3] > 0 {
 				rectangle(layer, bounds, bg, radii = radii)
 			}
 			// Compute fit UVs
 			uv, sampler, inner := fit_params(img_data.fit, bounds, img_data.texture_id)
 			// Draw the image
 			rectangle_texture(layer, inner, img_data.texture_id, img_data.tint, uv, sampler, radii)
 		case clay.RenderCommandType.ScissorStart:
 			if bounds.width == 0 || bounds.height == 0 do continue
@@ -582,34 +750,38 @@ prepare_clay_batch :: proc(
 			render_data := render_command.renderData.rectangle
 			cr := render_data.cornerRadius
 			color := color_from_clay(render_data.backgroundColor)
-			radii := [4]f32{cr.topLeft, cr.topRight, cr.bottomRight, cr.bottomLeft}
-
-			if radii == {0, 0, 0, 0} {
-				rectangle(layer, bounds, color)
-			} else {
-				rectangle_corners(layer, bounds, radii, color)
-			}
+			radii := Rectangle_Radii {
+				top_left     = cr.topLeft,
+				top_right    = cr.topRight,
+				bottom_right = cr.bottomRight,
+				bottom_left  = cr.bottomLeft,
+			}
+			rectangle(layer, bounds, color, radii = radii)
 		case clay.RenderCommandType.Border:
 			render_data := render_command.renderData.border
 			cr := render_data.cornerRadius
 			color := color_from_clay(render_data.color)
 			thickness := f32(render_data.width.top)
-			radii := [4]f32{cr.topLeft, cr.topRight, cr.bottomRight, cr.bottomLeft}
-
-			if radii == {0, 0, 0, 0} {
-				rectangle_lines(layer, bounds, color, thickness)
-			} else {
-				rectangle_corners_lines(layer, bounds, radii, color, thickness)
-			}
+			radii := Rectangle_Radii {
+				top_left     = cr.topLeft,
+				top_right    = cr.topRight,
+				bottom_right = cr.bottomRight,
+				bottom_left  = cr.bottomLeft,
+			}
+			rectangle(layer, bounds, BLANK, outline_color = color, outline_width = thickness, radii = radii)
 		case clay.RenderCommandType.Custom: if custom_draw != nil {
 					custom_draw(layer, bounds, render_command.renderData.custom)
 				} else {
 					log.error("Received clay render command of type custom but no custom_draw proc provided.")
 				}
 		}
 	}
 }
 // Render primitives. clear_color is the background fill before any layers are drawn.
-end :: proc(device: ^sdl.GPUDevice, window: ^sdl.Window, clear_color: Color = BLACK) {
+end :: proc(device: ^sdl.GPUDevice, window: ^sdl.Window, clear_color: Color = DFT_CLEAR_COLOR) {
 	cmd_buffer := sdl.AcquireGPUCommandBuffer(device)
 	if cmd_buffer == nil {
 		log.panicf("Failed to acquire GPU command buffer: %s", sdl.GetError())
@@ -642,7 +814,16 @@ end :: proc(device: ^sdl.GPUDevice, window: ^sdl.Window, clear_color: Color = BL
 		render_texture = GLOB.msaa_texture
 	}
-	clear_color_f32 := color_to_f32(clear_color)
+	// Premultiply clear color: the blend state is ONE, ONE_MINUS_SRC_ALPHA (premultiplied),
 	// so the clear color must also be premultiplied for correct background compositing.
 	clear_color_straight := color_to_f32(clear_color)
 	clear_alpha := clear_color_straight[3]
 	clear_color_f32 := [4]f32 {
 		clear_color_straight[0] * clear_alpha,
 		clear_color_straight[1] * clear_alpha,
 		clear_color_straight[2] * clear_alpha,
 		clear_alpha,
 	}
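 	// Worked example (illustrative; assumes color_to_f32 maps each 0..255 channel to 0..1):
 	// a straight-alpha clear color of {255, 0, 0, 128} converts to roughly {1.0, 0.0, 0.0, 0.502},
 	// and multiplying RGB by alpha gives roughly {0.502, 0.0, 0.0, 0.502}. A fully opaque clear
 	// color (alpha = 255) is unchanged by this step.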
 	// Draw layers. One render pass per layer; sub-batches draw in submission order within each scissor.
 	for &layer, index in GLOB.layers {
@@ -850,10 +1031,20 @@ Transform_2D :: struct {
 //   origin       – pivot point in local space (measured from the shape's natural reference point).
 //   rotation_deg – rotation in degrees, counter-clockwise.
 //
-build_pivot_rotation :: proc(position: [2]f32, origin: [2]f32, rotation_deg: f32) -> Transform_2D {
+build_pivot_rotation :: proc(position: Vec2, origin: Vec2, rotation_deg: f32) -> Transform_2D {
 	radians := math.to_radians(rotation_deg)
 	cos_angle := math.cos(radians)
 	sin_angle := math.sin(radians)
 	return build_pivot_rotation_sc(position, origin, cos_angle, sin_angle)
 }
 // Variant of build_pivot_rotation that accepts pre-computed cos/sin values,
 // avoiding redundant trigonometry when the caller has already computed them.
 build_pivot_rotation_sc :: #force_inline proc(
 	position: Vec2,
 	origin: Vec2,
 	cos_angle, sin_angle: f32,
 ) -> Transform_2D {
 	return Transform_2D {
 		m00 = cos_angle,
 		m01 = -sin_angle,
@@ -865,7 +1056,7 @@ build_pivot_rotation :: proc(position: [2]f32, origin: [2]f32, rotation_deg: f32
 }
 // Apply the transform to a local-space point, producing a world-space point.
-apply_transform :: #force_inline proc(transform: Transform_2D, point: [2]f32) -> [2]f32 {
+apply_transform :: #force_inline proc(transform: Transform_2D, point: Vec2) -> Vec2 {
 	return {
 		transform.m00 * point.x + transform.m01 * point.y + transform.tx,
 		transform.m10 * point.x + transform.m11 * point.y + transform.ty,
@@ -875,7 +1066,7 @@ apply_transform :: #force_inline proc(transform: Transform_2D, point: [2]f32) ->
 // Fast-path check callers use BEFORE building a transform.
 // Returns true if either the origin is non-zero or rotation is non-zero,
 // meaning a transform actually needs to be computed.
-needs_transform :: #force_inline proc(origin: [2]f32, rotation: f32) -> bool {
+needs_transform :: #force_inline proc(origin: Vec2, rotation: f32) -> bool {
 	return origin != {0, 0} || rotation != 0
 }
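 // Illustrative sketch only, not part of this change: one way the helpers above might be combined
 // to map a local-space point into world space. `example_rotate_point` is a hypothetical name.
 // A caller that wants the fast path could check needs_transform(origin, rotation_deg) first and
 // skip building the transform when it returns false.
 example_rotate_point :: proc(position, origin: Vec2, rotation_deg: f32, local_point: Vec2) -> Vec2 {
 	// Build the pivot rotation once, then apply it to the point.
 	transform := build_pivot_rotation(position, origin, rotation_deg)
 	return apply_transform(transform, local_point)
 }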
@@ -0,0 +1,179 @@
 package draw_qr
 import draw ".."
 import "../../qrcode"
 DFT_QR_DARK :: draw.BLACK // Default QR code dark module color.
 DFT_QR_LIGHT :: draw.WHITE // Default QR code light module color.
 DFT_QR_BOOST_ECL :: true // Default QR error correction level boost.
 // Returns the number of bytes to_texture will write for the given encoded
 // QR buffer. Equivalent to size*size*4 where size = qrcode.get_size(qrcode_buf).
 texture_size :: #force_inline proc(qrcode_buf: []u8) -> int {
 	size := qrcode.get_size(qrcode_buf)
 	return size * size * 4
 }
 // Decodes an encoded QR buffer into tightly-packed RGBA pixel data written to
 // texture_buf. No allocations, no GPU calls. Returns the Texture_Desc the
 // caller should pass to draw.register_texture alongside texture_buf.
 //
 // Returns ok=false when:
 //   - qrcode_buf is invalid (qrcode.get_size returns 0).
 //   - texture_buf is smaller than texture_size(qrcode_buf).
@(require_results)
 to_texture :: proc(
 	qrcode_buf: []u8,
 	texture_buf: []u8,
 	dark: draw.Color = DFT_QR_DARK,
 	light: draw.Color = DFT_QR_LIGHT,
 ) -> (
 	desc: draw.Texture_Desc,
 	ok: bool,
 ) {
 	size := qrcode.get_size(qrcode_buf)
 	if size == 0 do return {}, false
 	if len(texture_buf) < size * size * 4 do return {}, false
 	for y in 0 ..< size {
 		for x in 0 ..< size {
 			i := (y * size + x) * 4
 			c := dark if qrcode.get_module(qrcode_buf, x, y) else light
 			texture_buf[i + 0] = c[0]
 			texture_buf[i + 1] = c[1]
 			texture_buf[i + 2] = c[2]
 			texture_buf[i + 3] = c[3]
 		}
 	}
 	return draw.Texture_Desc {
 			width = u32(size),
 			height = u32(size),
 			depth_or_layers = 1,
 			type = .D2,
 			format = .R8G8B8A8_UNORM,
 			usage = {.SAMPLER},
 			mip_levels = 1,
 			kind = .Static,
 		},
 		true
 }
 // Allocates pixel buffer via temp_allocator, decodes qrcode_buf into it, and
 // registers with the GPU. The pixel allocation is freed before return.
 //
 // Returns ok=false when:
 //   - qrcode_buf is invalid (qrcode.get_size returns 0).
 //   - temp_allocator fails to allocate the pixel buffer.
 //   - GPU texture registration fails.
@(require_results)
 register_texture_from_raw :: proc(
 	qrcode_buf: []u8,
 	dark: draw.Color = DFT_QR_DARK,
 	light: draw.Color = DFT_QR_LIGHT,
 	temp_allocator := context.temp_allocator,
 ) -> (
 	texture: draw.Texture_Id,
 	ok: bool,
 ) {
 	tex_size := texture_size(qrcode_buf)
 	if tex_size == 0 do return draw.INVALID_TEXTURE, false
 	pixels, alloc_err := make([]u8, tex_size, temp_allocator)
 	if alloc_err != nil do return draw.INVALID_TEXTURE, false
 	defer delete(pixels, temp_allocator)
 	desc := to_texture(qrcode_buf, pixels, dark, light) or_return
 	return draw.register_texture(desc, pixels)
 }
 // Encodes text as a QR Code and registers the result as an RGBA texture.
 //
 // Returns ok=false when:
 //   - temp_allocator fails to allocate.
 //   - The text cannot fit in any version within [min_version, max_version] at the given ECL.
 //   - GPU texture registration fails.
@(require_results)
 register_texture_from_text :: proc(
 	text: string,
 	ecl: qrcode.Ecc = .Low,
 	min_version: int = qrcode.VERSION_MIN,
 	max_version: int = qrcode.VERSION_MAX,
 	mask: Maybe(qrcode.Mask) = nil,
 	boost_ecl: bool = DFT_QR_BOOST_ECL,
 	dark: draw.Color = DFT_QR_DARK,
 	light: draw.Color = DFT_QR_LIGHT,
 	temp_allocator := context.temp_allocator,
 ) -> (
 	texture: draw.Texture_Id,
 	ok: bool,
 ) {
 	qrcode_buf, alloc_err := make([]u8, qrcode.buffer_len_for_version(max_version), temp_allocator)
 	if alloc_err != nil do return draw.INVALID_TEXTURE, false
 	defer delete(qrcode_buf, temp_allocator)
 	qrcode.encode_auto(
 		text,
 		qrcode_buf,
 		ecl,
 		min_version,
 		max_version,
 		mask,
 		boost_ecl,
 		temp_allocator,
 	) or_return
 	return register_texture_from_raw(qrcode_buf, dark, light, temp_allocator)
 }
 // Encodes arbitrary binary data as a QR Code and registers the result as an RGBA texture.
 //
 // Returns ok=false when:
 //   - temp_allocator fails to allocate.
 //   - The payload cannot fit in any version within [min_version, max_version] at the given ECL.
 //   - GPU texture registration fails.
@(require_results)
 register_texture_from_binary :: proc(
 	bin_data: []u8,
 	ecl: qrcode.Ecc = .Low,
 	min_version: int = qrcode.VERSION_MIN,
 	max_version: int = qrcode.VERSION_MAX,
 	mask: Maybe(qrcode.Mask) = nil,
 	boost_ecl: bool = DFT_QR_BOOST_ECL,
 	dark: draw.Color = DFT_QR_DARK,
 	light: draw.Color = DFT_QR_LIGHT,
 	temp_allocator := context.temp_allocator,
 ) -> (
 	texture: draw.Texture_Id,
 	ok: bool,
 ) {
 	qrcode_buf, alloc_err := make([]u8, qrcode.buffer_len_for_version(max_version), temp_allocator)
 	if alloc_err != nil do return draw.INVALID_TEXTURE, false
 	defer delete(qrcode_buf, temp_allocator)
 	qrcode.encode_auto(
 		bin_data,
 		qrcode_buf,
 		ecl,
 		min_version,
 		max_version,
 		mask,
 		boost_ecl,
 		temp_allocator,
 	) or_return
 	return register_texture_from_raw(qrcode_buf, dark, light, temp_allocator)
 }
 register_texture_from :: proc {
 	register_texture_from_text,
 	register_texture_from_binary,
 }
 // Default fit=.Fit preserves the QR's square aspect; override as needed.
 clay_image :: #force_inline proc(
 	texture: draw.Texture_Id,
 	tint: draw.Color = draw.DFT_TINT,
 ) -> draw.Clay_Image_Data {
 	return draw.clay_image_data(texture, fit = .Fit, tint = tint)
 }
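 // Illustrative sketch only, not part of this change: the allocation-free path, where the caller
 // owns the pixel buffer. `example_register_qr` is a hypothetical name, and the sketch assumes
 // draw.register_texture returns (draw.Texture_Id, bool), as its use in register_texture_from_raw
 // suggests. `pixels` must hold at least texture_size(qrcode_buf) bytes; to_texture checks this
 // and reports ok=false otherwise.
 example_register_qr :: proc(qrcode_buf: []u8, pixels: []u8) -> (texture: draw.Texture_Id, ok: bool) {
 	// Decode with the default dark/light colors; bail out early on an invalid or undersized buffer.
 	desc := to_texture(qrcode_buf, pixels) or_return
 	return draw.register_texture(desc, pixels)
 }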
@@ -1,19 +1,18 @@
 package examples
 import "core:fmt"
 import "core:log"
 import "core:mem"
 import "core:os"
 main :: proc() {
-	//----- Tracking allocator ----------------------------------
+	//----- General setup ----------------------------------
 	{
 		tracking_temp_allocator := false
 		// Temp
 		track_temp: mem.Tracking_Allocator
-		if tracking_temp_allocator {
+		mem.tracking_allocator_init(&track_temp, context.temp_allocator)
-			mem.tracking_allocator_init(&track_temp, context.temp_allocator)
+		context.temp_allocator = mem.tracking_allocator(&track_temp)
-			context.temp_allocator = mem.tracking_allocator(&track_temp)
+
 		}
 		// Default
 		track: mem.Tracking_Allocator
 		mem.tracking_allocator_init(&track, context.allocator)
@@ -22,18 +21,10 @@ main :: proc() {
 		// This could be fine for some global state or it could be a memory leak.
 		defer {
 			// Temp allocator
-			if tracking_temp_allocator {
+			if len(track_temp.bad_free_array) > 0 {
-				if len(track_temp.allocation_map) > 0 {
+				fmt.eprintf("=== %v incorrect frees - temp allocator: ===\n", len(track_temp.bad_free_array))
-					fmt.eprintf("=== %v allocations not freed - temp allocator: ===\n", len(track_temp.allocation_map))
+				for entry in track_temp.bad_free_array {
-					for _, entry in track_temp.allocation_map {
+					fmt.eprintf("- %p @ %v\n", entry.memory, entry.location)
 						fmt.eprintf("- %v bytes @ %v\n", entry.size, entry.location)
 					}
 				}
 				if len(track_temp.bad_free_array) > 0 {
 					fmt.eprintf("=== %v incorrect frees - temp allocator: ===\n", len(track_temp.bad_free_array))
 					for entry in track_temp.bad_free_array {
 						fmt.eprintf("- %p @ %v\n", entry.memory, entry.location)
 					}
 				}
 				mem.tracking_allocator_destroy(&track_temp)
 			}
@@ -52,12 +43,15 @@ main :: proc() {
 			}
 			mem.tracking_allocator_destroy(&track)
 		}
 		// Logger
 		context.logger = log.create_console_logger()
 		defer log.destroy_console_logger(context.logger)
 	}
 	args := os.args
 	if len(args) < 2 {
 		fmt.eprintln("Usage: examples <example_name>")
-		fmt.eprintln("Available examples: hellope-shapes, hellope-text, hellope-clay, hellope-custom")
+		fmt.eprintln("Available examples: hellope-shapes, hellope-text, hellope-clay, hellope-custom, textures")
 		os.exit(1)
 	}
@@ -66,9 +60,10 @@ main :: proc() {
 	case "hellope-custom": hellope_custom()
 	case "hellope-shapes": hellope_shapes()
 	case "hellope-text": hellope_text()
 	case "textures": textures()
 	case:
 		fmt.eprintf("Unknown example: %v\n", args[1])
-		fmt.eprintln("Available examples: hellope-shapes, hellope-text, hellope-clay, hellope-custom")
+		fmt.eprintln("Available examples: hellope-shapes, hellope-text, hellope-clay, hellope-custom, textures")
 		os.exit(1)
 	}
 }
@@ -1,6 +1,7 @@
 package examples
 import "../../draw"
 import "../../draw/tess"
 import "../../vendor/clay"
 import "core:math"
 import "core:os"
@@ -28,19 +29,26 @@ hellope_shapes :: proc() {
 		base_layer := draw.begin({width = 500, height = 500})
 		// Background
-		draw.rectangle(base_layer, {0, 0, 500, 500}, {40, 40, 40, 255})
+		draw.rectangle(base_layer, {0, 0, 500, 500}, draw.Color{40, 40, 40, 255})
 		// ----- Shapes without rotation (existing demo) -----
-		draw.rectangle(base_layer, {20, 20, 200, 120}, {80, 120, 200, 255})
+		draw.rectangle(
-		draw.rectangle_lines(base_layer, {20, 20, 200, 120}, draw.WHITE, thickness = 2)
+			base_layer,
-		draw.rectangle(base_layer, {240, 20, 240, 120}, {200, 80, 80, 255}, roundness = 0.3)
+			{20, 20, 200, 120},
-		draw.rectangle_gradient(
+			draw.Color{80, 120, 200, 255},
 			outline_color = draw.WHITE,
 			outline_width = 2,
 			radii = {top_right = 15, top_left = 5},
 		)
 		red_rect_raddi := draw.uniform_radii({240, 20, 240, 120}, 0.3)
 		red_rect_raddi.bottom_left = 0
 		draw.rectangle(base_layer, {240, 20, 240, 120}, draw.Color{200, 80, 80, 255}, radii = red_rect_raddi)
 		draw.rectangle(
 			base_layer,
 			{20, 160, 460, 60},
 			{255, 0, 0, 255},
-			{0, 255, 0, 255},
+			gradient = draw.Linear_Gradient{end_color = {0, 0, 255, 255}, angle = 0},
 			{0, 0, 255, 255},
 			{255, 255, 0, 255},
 		)
 		// ----- Rotation demos -----
@@ -50,17 +58,12 @@ hellope_shapes :: proc() {
 		draw.rectangle(
 			base_layer,
 			rect,
-			{100, 200, 100, 255},
+			draw.Color{100, 200, 100, 255},
-			origin = draw.center_of(rect),
+			outline_color = draw.WHITE,
-			rotation = spin_angle,
+			outline_width = 2,
 		)
 		draw.rectangle_lines(
 			base_layer,
 			rect,
 			draw.WHITE,
 			thickness = 2,
 			origin = draw.center_of(rect),
 			rotation = spin_angle,
 			feather_px = 1,
 		)
 		// Rounded rectangle rotating around its center
@@ -68,8 +71,8 @@ hellope_shapes :: proc() {
 		draw.rectangle(
 			base_layer,
 			rrect,
-			{200, 100, 200, 255},
+			draw.Color{200, 100, 200, 255},
-			roundness = 0.4,
+			radii = draw.uniform_radii(rrect, 0.4),
 			origin = draw.center_of(rrect),
 			rotation = spin_angle,
 		)
@@ -78,19 +81,36 @@ hellope_shapes :: proc() {
 		draw.ellipse(base_layer, {410, 340}, 50, 30, {255, 200, 50, 255}, rotation = spin_angle)
 		// Circle orbiting a point (moon orbiting planet)
-		planet_pos := [2]f32{100, 450}
+		// Convention B: center = pivot point (planet), origin = offset from moon center to pivot.
-		moon_pos := planet_pos + {0, -40}
+		// Moon's visual center at rotation=0: planet_pos - origin = (100, 450) - (0, 40) = (100, 410).
 		planet_pos := draw.Vec2{100, 450}
 		draw.circle(base_layer, planet_pos, 8, {200, 200, 200, 255}) // planet (stationary)
-		draw.circle(base_layer, moon_pos, 5, {100, 150, 255, 255}, origin = {0, 40}, rotation = spin_angle) // moon orbiting
+		draw.circle(
 			base_layer,
 			planet_pos,
 			5,
 			{100, 150, 255, 255},
 			origin = draw.Vec2{0, 40},
 			rotation = spin_angle,
 		) // moon orbiting
-		// Ring arc rotating in place
+		// Sector (pie slice) rotating in place
-		draw.ring(base_layer, {250, 450}, 15, 30, 0, 270, {100, 100, 220, 255}, rotation = spin_angle)
+		draw.ring(
 			base_layer,
 			draw.Vec2{250, 450},
 			0,
 			30,
 			{100, 100, 220, 255},
 			start_angle = 0,
 			end_angle = 270,
 			rotation = spin_angle,
 		)
 		// Triangle rotating around its center
-		tv1 := [2]f32{350, 420}
+		tv1 := draw.Vec2{350, 420}
-		tv2 := [2]f32{420, 480}
+		tv2 := draw.Vec2{420, 480}
-		tv3 := [2]f32{340, 480}
+		tv3 := draw.Vec2{340, 480}
-		draw.triangle(
+		tess.triangle_aa(
 			base_layer,
 			tv1,
 			tv2,
@@ -101,8 +121,16 @@ hellope_shapes :: proc() {
 		)
 		// Polygon rotating around its center (already had rotation; now with origin for orbit)
-		draw.polygon(base_layer, {460, 450}, 6, 30, {180, 100, 220, 255}, rotation = spin_angle)
+		draw.polygon(
-		draw.polygon_lines(base_layer, {460, 450}, 6, 30, draw.WHITE, rotation = spin_angle, thickness = 2)
+			base_layer,
 			{460, 450},
 			6,
 			30,
 			{180, 100, 220, 255},
 			outline_color = draw.WHITE,
 			outline_width = 2,
 			rotation = spin_angle,
 		)
 		draw.end(gpu, window)
 	}
@@ -133,9 +161,6 @@ hellope_text :: proc() {
 		spin_angle += 0.5
 		base_layer := draw.begin({width = 600, height = 600})
 		// Grey background
 		draw.rectangle(base_layer, {0, 0, 600, 600}, {127, 127, 127, 255})
 		// ----- Text API demos -----
 		// Cached text with id — TTF_Text reused across frames (good for text-heavy apps)
@@ -175,7 +200,7 @@ hellope_text :: proc() {
 		// Measure text for manual layout
 		size := draw.measure_text("Measured!", JETBRAINS_MONO_REGULAR, FONT_SIZE)
-		draw.rectangle(base_layer, {300 - size.x / 2, 380, size.x, size.y}, {60, 60, 60, 200})
+		draw.rectangle(base_layer, {300 - size.x / 2, 380, size.x, size.y}, draw.Color{60, 60, 60, 200})
 		draw.text(
 			base_layer,
 			"Measured!",
@@ -199,7 +224,7 @@ hellope_text :: proc() {
 			id = CORNER_SPIN_ID,
 		)
-		draw.end(gpu, window)
+		draw.end(gpu, window, draw.Color{127, 127, 127, 255})
 	}
 }
@@ -337,15 +362,21 @@ hellope_custom :: proc() {
 	draw_custom :: proc(layer: ^draw.Layer, bounds: draw.Rectangle, render_data: clay.CustomRenderData) {
 		gauge := cast(^Gauge)render_data.customData
-		// Background from clay's backgroundColor
+		border_width: f32 = 2
-		draw.rectangle(layer, bounds, draw.color_from_clay(render_data.backgroundColor), roundness = 0.25)
+		draw.rectangle(
 			layer,
 			bounds,
 			draw.color_from_clay(render_data.backgroundColor),
 			outline_color = draw.WHITE,
 			outline_width = border_width,
 		)
-		// Fill bar
+		fill := draw.Rectangle {
-		fill := bounds
+			x      = bounds.x,
-		fill.width *= gauge.value
+			y      = bounds.y,
-		draw.rectangle(layer, fill, gauge.color, roundness = 0.25)
+			width  = bounds.width * gauge.value,
-
+			height = bounds.height,
-		// Border
+		}
-		draw.rectangle_lines(layer, bounds, draw.WHITE, thickness = 2, roundness = 0.25)
+		draw.rectangle(layer, fill, gauge.color)
 	}
 }
@@ -0,0 +1,272 @@
 package examples
 import "../../draw"
 import "../../draw/draw_qr"
 import "core:os"
 import sdl "vendor:sdl3"
 textures :: proc() {
 	if !sdl.Init({.VIDEO}) do os.exit(1)
 	window := sdl.CreateWindow("Textures", 800, 600, {.HIGH_PIXEL_DENSITY})
 	gpu := sdl.CreateGPUDevice(draw.PLATFORM_SHADER_FORMAT, true, nil)
 	if !sdl.ClaimWindowForGPUDevice(gpu, window) do os.exit(1)
 	if !draw.init(gpu, window) do os.exit(1)
 	JETBRAINS_MONO_REGULAR = draw.register_font(JETBRAINS_MONO_REGULAR_RAW)
 	FONT_SIZE :: u16(14)
 	LABEL_OFFSET :: f32(8) // gap between item and its label
 	//----- Texture registration ----------------------------------
 	checker_size :: 8
 	checker_pixels: [checker_size * checker_size * 4]u8
 	for y in 0 ..< checker_size {
 		for x in 0 ..< checker_size {
 			i := (y * checker_size + x) * 4
 			is_dark := ((x + y) % 2) == 0
 			val: u8 = 40 if is_dark else 220
 			checker_pixels[i + 0] = val // R
 			checker_pixels[i + 1] = val / 2 // G — slight color tint
 			checker_pixels[i + 2] = val // B
 			checker_pixels[i + 3] = 255 // A
 		}
 	}
 	checker_texture, _ := draw.register_texture(
 		draw.Texture_Desc {
 			width = checker_size,
 			height = checker_size,
 			depth_or_layers = 1,
 			type = .D2,
 			format = .R8G8B8A8_UNORM,
 			usage = {.SAMPLER},
 			mip_levels = 1,
 		},
 		checker_pixels[:],
 	)
 	defer draw.unregister_texture(checker_texture)
 	stripe_w :: 16
 	stripe_h :: 8
 	stripe_pixels: [stripe_w * stripe_h * 4]u8
 	for y in 0 ..< stripe_h {
 		for x in 0 ..< stripe_w {
 			i := (y * stripe_w + x) * 4
 			stripe_pixels[i + 0] = u8(x * 255 / (stripe_w - 1)) // R gradient left→right
 			stripe_pixels[i + 1] = u8(y * 255 / (stripe_h - 1)) // G gradient top→bottom
 			stripe_pixels[i + 2] = 128 // B constant
 			stripe_pixels[i + 3] = 255 // A
 		}
 	}
 	stripe_texture, _ := draw.register_texture(
 		draw.Texture_Desc {
 			width = stripe_w,
 			height = stripe_h,
 			depth_or_layers = 1,
 			type = .D2,
 			format = .R8G8B8A8_UNORM,
 			usage = {.SAMPLER},
 			mip_levels = 1,
 		},
 		stripe_pixels[:],
 	)
 	defer draw.unregister_texture(stripe_texture)
 	qr_texture, _ := draw_qr.register_texture_from("https://x.com/miiilato/status/1880241066471051443")
 	defer draw.unregister_texture(qr_texture)
 	spin_angle: f32 = 0
 	//----- Draw loop ----------------------------------
 	for {
 		defer free_all(context.temp_allocator)
 		ev: sdl.Event
 		for sdl.PollEvent(&ev) {
 			if ev.type == .QUIT do return
 		}
 		spin_angle += 1
 		base_layer := draw.begin({width = 800, height = 600})
 		// Background
 		draw.rectangle(base_layer, {0, 0, 800, 600}, draw.Color{30, 30, 30, 255})
 		//----- Row 1: Sampler presets (y=30) ----------------------------------
 		ROW1_Y :: f32(30)
 		ITEM_SIZE :: f32(120)
 		COL1 :: f32(30)
 		COL2 :: f32(180)
 		COL3 :: f32(330)
 		COL4 :: f32(480)
 		// Nearest (sharp pixel edges)
 		draw.rectangle_texture(
 			base_layer,
 			{COL1, ROW1_Y, ITEM_SIZE, ITEM_SIZE},
 			checker_texture,
 			sampler = .Nearest_Clamp,
 		)
 		draw.text(
 			base_layer,
 			"Nearest",
 			{COL1, ROW1_Y + ITEM_SIZE + LABEL_OFFSET},
 			JETBRAINS_MONO_REGULAR,
 			FONT_SIZE,
 			color = draw.WHITE,
 		)
 		// Linear (bilinear blur)
 		draw.rectangle_texture(
 			base_layer,
 			{COL2, ROW1_Y, ITEM_SIZE, ITEM_SIZE},
 			checker_texture,
 			sampler = .Linear_Clamp,
 		)
 		draw.text(
 			base_layer,
 			"Linear",
 			{COL2, ROW1_Y + ITEM_SIZE + LABEL_OFFSET},
 			JETBRAINS_MONO_REGULAR,
 			FONT_SIZE,
 			color = draw.WHITE,
 		)
 		// Tiled (4x repeat)
 		draw.rectangle_texture(
 			base_layer,
 			{COL3, ROW1_Y, ITEM_SIZE, ITEM_SIZE},
 			checker_texture,
 			sampler = .Nearest_Repeat,
 			uv_rect = {0, 0, 4, 4},
 		)
 		draw.text(
 			base_layer,
 			"Tiled 4x",
 			{COL3, ROW1_Y + ITEM_SIZE + LABEL_OFFSET},
 			JETBRAINS_MONO_REGULAR,
 			FONT_SIZE,
 			color = draw.WHITE,
 		)
 		//----- Row 2: QR code, rounded corners, rotation (y=190) ----------------------------------
 		ROW2_Y :: f32(190)
 		// QR code (RGBA texture with baked colors, nearest sampling)
 		draw.rectangle(base_layer, {COL1, ROW2_Y, ITEM_SIZE, ITEM_SIZE}, draw.Color{255, 255, 255, 255}) // white bg
 		draw.rectangle_texture(
 			base_layer,
 			{COL1, ROW2_Y, ITEM_SIZE, ITEM_SIZE},
 			qr_texture,
 			sampler = .Nearest_Clamp,
 		)
 		draw.text(
 			base_layer,
 			"QR Code",
 			{COL1, ROW2_Y + ITEM_SIZE + LABEL_OFFSET},
 			JETBRAINS_MONO_REGULAR,
 			FONT_SIZE,
 			color = draw.WHITE,
 		)
 		// Rounded corners
 		draw.rectangle_texture(
 			base_layer,
 			{COL2, ROW2_Y, ITEM_SIZE, ITEM_SIZE},
 			checker_texture,
 			sampler = .Nearest_Clamp,
 			radii = draw.uniform_radii({COL2, ROW2_Y, ITEM_SIZE, ITEM_SIZE}, 0.3),
 		)
 		draw.text(
 			base_layer,
 			"Rounded",
 			{COL2, ROW2_Y + ITEM_SIZE + LABEL_OFFSET},
 			JETBRAINS_MONO_REGULAR,
 			FONT_SIZE,
 			color = draw.WHITE,
 		)
 		// Rotating
 		rot_rect := draw.Rectangle{COL3, ROW2_Y, ITEM_SIZE, ITEM_SIZE}
 		draw.rectangle_texture(
 			base_layer,
 			rot_rect,
 			checker_texture,
 			sampler = .Nearest_Clamp,
 			origin = draw.center_of(rot_rect),
 			rotation = spin_angle,
 		)
 		draw.text(
 			base_layer,
 			"Rotating",
 			{COL3, ROW2_Y + ITEM_SIZE + LABEL_OFFSET},
 			JETBRAINS_MONO_REGULAR,
 			FONT_SIZE,
 			color = draw.WHITE,
 		)
 		//----- Row 3: Fit modes + Per-corner radii (y=360) ----------------------------------
 		ROW3_Y :: f32(360)
 		FIT_SIZE :: f32(120) // square target rect
 		// Stretch
 		uv_s, sampler_s, inner_s := draw.fit_params(.Stretch, {COL1, ROW3_Y, FIT_SIZE, FIT_SIZE}, stripe_texture)
 		draw.rectangle(base_layer, {COL1, ROW3_Y, FIT_SIZE, FIT_SIZE}, draw.Color{60, 60, 60, 255}) // bg
 		draw.rectangle_texture(base_layer, inner_s, stripe_texture, uv_rect = uv_s, sampler = sampler_s)
 		draw.text(
 			base_layer,
 			"Stretch",
 			{COL1, ROW3_Y + FIT_SIZE + LABEL_OFFSET},
 			JETBRAINS_MONO_REGULAR,
 			FONT_SIZE,
 			color = draw.WHITE,
 		)
 		// Fill (center-crop)
 		uv_f, sampler_f, inner_f := draw.fit_params(.Fill, {COL2, ROW3_Y, FIT_SIZE, FIT_SIZE}, stripe_texture)
 		draw.rectangle(base_layer, {COL2, ROW3_Y, FIT_SIZE, FIT_SIZE}, draw.Color{60, 60, 60, 255})
 		draw.rectangle_texture(base_layer, inner_f, stripe_texture, uv_rect = uv_f, sampler = sampler_f)
 		draw.text(
 			base_layer,
 			"Fill",
 			{COL2, ROW3_Y + FIT_SIZE + LABEL_OFFSET},
 			JETBRAINS_MONO_REGULAR,
 			FONT_SIZE,
 			color = draw.WHITE,
 		)
 		// Fit (letterbox)
 		uv_ft, sampler_ft, inner_ft := draw.fit_params(.Fit, {COL3, ROW3_Y, FIT_SIZE, FIT_SIZE}, stripe_texture)
 		draw.rectangle(base_layer, {COL3, ROW3_Y, FIT_SIZE, FIT_SIZE}, draw.Color{60, 60, 60, 255}) // visible margin bg
 		draw.rectangle_texture(base_layer, inner_ft, stripe_texture, uv_rect = uv_ft, sampler = sampler_ft)
 		draw.text(
 			base_layer,
 			"Fit",
 			{COL3, ROW3_Y + FIT_SIZE + LABEL_OFFSET},
 			JETBRAINS_MONO_REGULAR,
 			FONT_SIZE,
 			color = draw.WHITE,
 		)
 		// Per-corner radii
 		draw.rectangle_texture(
 			base_layer,
 			{COL4, ROW3_Y, FIT_SIZE, FIT_SIZE},
 			checker_texture,
 			sampler = .Nearest_Clamp,
 			radii = {20, 0, 20, 0},
 		)
 		draw.text(
 			base_layer,
 			"Per-corner",
 			{COL4, ROW3_Y + FIT_SIZE + LABEL_OFFSET},
 			JETBRAINS_MONO_REGULAR,
 			FONT_SIZE,
 			color = draw.WHITE,
 		)
 		draw.end(gpu, window)
 	}
 }
@@ -5,8 +5,13 @@ import "core:log"
 import "core:mem"
 import sdl "vendor:sdl3"
+// Vertex layout for tessellated and text geometry.
+// IMPORTANT: `color` must be premultiplied alpha (RGB channels pre-scaled by alpha).
+// The tessellated fragment shader passes vertex color through directly — it does NOT
+// premultiply. The blend state is ONE, ONE_MINUS_SRC_ALPHA (premultiplied-over).
+// Use `premultiply_color` when constructing vertices manually for `prepare_shape`.
 Vertex :: struct {
-	position: [2]f32,
+	position: Vec2,
 	uv:       [2]f32,
 	color:    Color,
 }
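Review note: the new `Vertex` comment asks callers to supply premultiplied colors, which pairs with the `ONE` / `ONE_MINUS_SRC_ALPHA` blend state set later in this file. The sketch below illustrates the arithmetic only. `Color :: [4]u8` is an assumption about the library's layout, and `premultiply_sketch` and `blend_over` are hypothetical names, not the library's `premultiply_color`.

```odin
package main

import "core:fmt"

// Assumption: RGBA with 8 bits per channel, similar to the library's Color.
Color :: [4]u8

// Hypothetical helper (not the library's premultiply_color): scales RGB by
// alpha with rounding and leaves alpha unchanged.
premultiply_sketch :: proc(c: Color) -> Color {
	a := u32(c[3])
	return Color {
		u8((u32(c[0]) * a + 127) / 255),
		u8((u32(c[1]) * a + 127) / 255),
		u8((u32(c[2]) * a + 127) / 255),
		c[3],
	}
}

// Premultiplied "over" for one normalized channel, mirroring the
// ONE / ONE_MINUS_SRC_ALPHA factors: out = src + dst * (1 - src_alpha).
blend_over :: proc(src, dst, src_alpha: f32) -> f32 {
	return src + dst * (1 - src_alpha)
}

main :: proc() {
	half_red := premultiply_sketch(Color{255, 0, 0, 128})
	fmt.println(half_red) // the red channel is scaled down to 128

	// An opaque source replaces the destination; a fully transparent
	// (premultiplied, so all-zero) source leaves it untouched.
	assert(blend_over(1, 0.25, 1) == 1)
	assert(blend_over(0, 0.25, 0) == 0.25)
}
```

Because the source factor is `ONE`, a color that is not premultiplied would be added at full strength regardless of its alpha, which is why the comment treats this as a hard requirement.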
@@ -23,97 +28,127 @@ TextBatch :: struct {
 // ----- SDF primitive types -----------
 // ----------------------------------------------------------------------------------------------------------------
+// The SDF path evaluates one of four signed distance functions per primitive, dispatched
+// by Shape_Kind encoded in the low byte of Primitive.flags:
+//
+//   RRect    — rounded rectangle with per-corner radii (sdRoundedBox). Also covers circles
+//              (uniform radii = half-size), capsule-style line segments (rotated, max rounding),
+//              and other RRect-reducible shapes.
+//   NGon     — regular polygon with N sides and optional rounding.
+//   Ellipse  — approximate ellipse (non-exact SDF, suitable for UI but not for shape merging).
+//   Ring_Arc — annular ring with optional angular clipping. Covers full rings, partial arcs,
+//              pie slices (inner_radius = 0), and loading spinners.
 Shape_Kind :: enum u8 {
-	Solid    = 0,
+	Solid    = 0, // tessellated path (mode marker; not a real SDF kind)
 	RRect    = 1,
-	Circle   = 2,
+	NGon     = 2,
 	Ellipse  = 3,
-	Segment  = 4,
+	Ring_Arc = 4,
-	Ring_Arc = 5,
-	NGon     = 6,
 }
 Shape_Flag :: enum u8 {
-	Stroke,
+	Textured, // bit 0: sample texture using uv.uv_rect (mutually exclusive with Gradient)
+	Gradient, // bit 1: 2-color gradient using uv.effects.gradient_color as end/outer color
+	Gradient_Radial, // bit 2: if set with Gradient, radial from center; else linear at angle
+	Outline, // bit 3: outer outline band using uv.effects.outline_color; CPU expands bounds by outline_width
+	Rotated, // bit 4: shape has non-zero rotation; rotation_sc contains packed sin/cos
+	Arc_Narrow, // bit 5: ring arc span ≤ π — intersect half-planes. Neither Arc bit = full ring.
+	Arc_Wide, // bit 6: ring arc span > π — union half-planes. Neither Arc bit = full ring.
 }
 Shape_Flags :: bit_set[Shape_Flag;u8]
 RRect_Params :: struct {
-	half_size: [2]f32,
+	half_size:    [2]f32,
-	radii:     [4]f32,
+	radii:        [4]f32,
-	soft_px:   f32,
+	half_feather: f32, // feather_px * 0.5; shader uses smoothstep(-h, h, d)
-	stroke_px: f32,
+	_:            f32,
 }
-Circle_Params :: struct {
-	radius:    f32,
-	soft_px:   f32,
-	stroke_px: f32,
-	_:         [5]f32,
-}
-Ellipse_Params :: struct {
-	radii:     [2]f32,
-	soft_px:   f32,
-	stroke_px: f32,
-	_:         [4]f32,
-}
-Segment_Params :: struct {
-	a:       [2]f32,
-	b:       [2]f32,
-	width:   f32,
-	soft_px: f32,
-	_:       [2]f32,
-}
-Ring_Arc_Params :: struct {
-	inner_radius: f32,
-	outer_radius: f32,
-	start_rad:    f32,
-	end_rad:      f32,
-	soft_px:      f32,
-	_:            [3]f32,
-}
 NGon_Params :: struct {
-	radius:    f32,
-	rotation:  f32,
-	sides:     f32,
-	soft_px:   f32,
-	stroke_px: f32,
-	_:         [3]f32,
+	radius:       f32,
+	sides:        f32,
+	half_feather: f32, // feather_px * 0.5; shader uses smoothstep(-h, h, d)
+	_:            [5]f32,
 }
+
+Ellipse_Params :: struct {
+	radii:        [2]f32,
+	half_feather: f32, // feather_px * 0.5; shader uses smoothstep(-h, h, d)
+	_:            [5]f32,
+}
+Ring_Arc_Params :: struct {
+	inner_radius: f32, // inner radius in physical pixels (0 for pie slice)
+	outer_radius: f32, // outer radius in physical pixels
+	normal_start: [2]f32, // pre-computed outward normal of start edge: (sin(start), -cos(start))
+	normal_end:   [2]f32, // pre-computed outward normal of end edge: (-sin(end), cos(end))
+	half_feather: f32, // feather_px * 0.5; shader uses smoothstep(-h, h, d)
+	_:            f32,
+}
 Shape_Params :: struct #raw_union {
 	rrect:    RRect_Params,
-	circle:   Circle_Params,
-	ellipse:  Ellipse_Params,
-	segment:  Segment_Params,
-	ring_arc: Ring_Arc_Params,
 	ngon:     NGon_Params,
+	ellipse:  Ellipse_Params,
+	ring_arc: Ring_Arc_Params,
 	raw:      [8]f32,
 }
 #assert(size_of(Shape_Params) == 32)
-// GPU layout: 64 bytes, std430-compatible. The shader declares this as a storage buffer struct.
-Primitive :: struct {
-	bounds:     [4]f32, //  0: min_x, min_y, max_x, max_y (world-space, pre-DPI)
-	color:      Color, // 16: u8x4, unpacked in shader via unpackUnorm4x8
-	kind_flags: u32, // 20: (kind as u32) | (flags as u32 << 8)
-	rotation:   f32, // 24: shader self-rotation in radians (used by RRect, Ellipse)
-	_pad:       f32, // 28: alignment to vec4 boundary
-	params:     Shape_Params, // 32: two vec4s of shape params
+// GPU-side storage for 2-color gradient parameters and/or outline parameters.
+// Packed into 16 bytes to alias with uv_rect in the Uv_Or_Effects raw union.
+// The shader reads gradient_color and outline_color via unpackUnorm4x8.
+// gradient_dir_sc stores the pre-computed gradient direction as (cos, sin) in f16 pair
+// via unpackHalf2x16. outline_packed stores outline_width as f16 via unpackHalf2x16.
+Gradient_Outline :: struct {
+	gradient_color:  Color, //  0: end (linear) or outer (radial) gradient color
+	outline_color:   Color, //  4: outline band color
+	gradient_dir_sc: u32, //  8: packed f16 pair: low = cos(angle), high = sin(angle) — pre-computed gradient direction
+	outline_packed:  u32, // 12: packed f16 pair: low = outline_width (f16, physical pixels), high = reserved
 }
-#assert(size_of(Primitive) == 64)
+#assert(size_of(Gradient_Outline) == 16)
+// Uv_Or_Effects aliases the final 16 bytes of a Primitive. When .Textured is set,
+// uv_rect holds texture-atlas coordinates. When .Gradient or .Outline is set,
+// effects holds 2-color gradient parameters and/or outline parameters.
+// Textured and Gradient are mutually exclusive; if both are set, Gradient takes precedence.
+Uv_Or_Effects :: struct #raw_union {
+	uv_rect: [4]f32, // u_min, v_min, u_max, v_max (default {0,0,1,1})
+	effects: Gradient_Outline, // gradient + outline parameters
+}
+// GPU layout: 80 bytes, std430-compatible. The shader declares this as a storage buffer struct.
+// The low byte of `flags` encodes the Shape_Kind (0 = tessellated, 1-4 = SDF kinds).
+// Bits 8-15 encode Shape_Flags (Textured, Gradient, Gradient_Radial, Outline, Rotated, Arc_Narrow, Arc_Wide).
+// rotation_sc stores pre-computed sin/cos of the rotation angle as a packed f16 pair,
+// avoiding per-pixel trigonometry in the fragment shader. Only read when .Rotated is set.
+Primitive :: struct {
+	bounds:      [4]f32, //  0: min_x, min_y, max_x, max_y (world-space, pre-DPI)
+	color:       Color, // 16: u8x4, fill color / gradient start color / texture tint
+	flags:       u32, // 20: low byte = Shape_Kind, bits 8+ = Shape_Flags
+	rotation_sc: u32, // 24: packed f16 pair: low = sin(angle), high = cos(angle). Requires .Rotated flag.
+	_pad:        f32, // 28: reserved for future use
+	params:      Shape_Params, // 32: per-kind shape parameters (raw union, 32 bytes)
+	uv:          Uv_Or_Effects, // 64: texture coords or gradient/outline parameters
+}
+#assert(size_of(Primitive) == 80)
 // Pack shape kind and flags into the Primitive.flags field. The low byte encodes the Shape_Kind
 // (which also serves as the SDF mode marker — kind > 0 means SDF path). The tessellated path
 // leaves the field at 0 (Solid kind, set by vertex shader zero-initialization).
 pack_kind_flags :: #force_inline proc(kind: Shape_Kind, flags: Shape_Flags) -> u32 {
 	return u32(kind) | (u32(transmute(u8)flags) << 8)
 }
 // Pack two f16 values into a single u32 for GPU consumption via unpackHalf2x16.
 // Used to pack gradient_dir_sc (cos/sin) and outline_packed (width/reserved) in Gradient_Outline.
 pack_f16_pair :: #force_inline proc(low, high: f16) -> u32 {
 	return u32(transmute(u16)low) | (u32(transmute(u16)high) << 16)
 }
 Pipeline_2D_Base :: struct {
 	sdl_pipeline:     ^sdl.GPUGraphicsPipeline,
 	vertex_buffer:    Buffer,
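Review note: the packing helpers in this hunk define the bit layout that the fragment shader decodes with `f_flags & 255u` and `(f_flags >> 8u) & 255u`. The following is a minimal standalone sketch of a round trip through that layout. It re-declares the enums and helpers locally, copied from the diff, so it compiles on its own. The chosen kind, flags, and zero rotation are illustrative values only.

```odin
package main

import "core:fmt"

// Local copies of the declarations shown in the diff.
Shape_Kind :: enum u8 {
	Solid    = 0,
	RRect    = 1,
	NGon     = 2,
	Ellipse  = 3,
	Ring_Arc = 4,
}

Shape_Flag :: enum u8 {
	Textured,
	Gradient,
	Gradient_Radial,
	Outline,
	Rotated,
	Arc_Narrow,
	Arc_Wide,
}

Shape_Flags :: bit_set[Shape_Flag;u8]

pack_kind_flags :: proc(kind: Shape_Kind, flags: Shape_Flags) -> u32 {
	return u32(kind) | (u32(transmute(u8)flags) << 8)
}

pack_f16_pair :: proc(low, high: f16) -> u32 {
	return u32(transmute(u16)low) | (u32(transmute(u16)high) << 16)
}

main :: proc() {
	packed := pack_kind_flags(.RRect, {.Outline, .Rotated})

	// Decode the same way the fragment shader does.
	kind := packed & 0xFF
	flags := (packed >> 8) & 0xFF
	assert(kind == 1)
	assert(flags == 0b0001_1000) // Outline = bit 3, Rotated = bit 4

	// Zero rotation: low half = sin(0) = 0, high half = cos(0) = 1.
	rotation_sc := pack_f16_pair(0, 1)
	assert(rotation_sc == 0x3C00_0000) // f16 1.0 has the bit pattern 0x3C00

	fmt.printf("flags=0x%x rotation_sc=0x%x\n", packed, rotation_sc)
}
```

Keeping the kind in the low byte means a zero-initialized `flags` word decodes as `Solid`, which is what the tessellated path relies on.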
@@ -206,19 +241,23 @@ create_pipeline_2d_base :: proc(
 		target_info = sdl.GPUGraphicsPipelineTargetInfo {
 			color_target_descriptions = &sdl.GPUColorTargetDescription {
 				format = sdl.GetGPUSwapchainTextureFormat(device, window),
+				// Premultiplied-alpha blending: src outputs RGB pre-multiplied by alpha,
+				// so src factor is ONE (not SRC_ALPHA). This eliminates the per-pixel
+				// divide in the outline path and is the standard blend mode used by
+				// Skia, Flutter, and GPUI.
 				blend_state = sdl.GPUColorTargetBlendState {
 					enable_blend = true,
 					enable_color_write_mask = true,
-					src_color_blendfactor = .SRC_ALPHA,
+					src_color_blendfactor = .ONE,
 					dst_color_blendfactor = .ONE_MINUS_SRC_ALPHA,
 					color_blend_op = .ADD,
-					src_alpha_blendfactor = .SRC_ALPHA,
+					src_alpha_blendfactor = .ONE,
 					dst_alpha_blendfactor = .ONE_MINUS_SRC_ALPHA,
 					alpha_blend_op = .ADD,
 					color_write_mask = sdl.GPUColorComponentFlags{.R, .G, .B, .A},
 				},
 			},
-			num_color_targets = 1,
+			num_color_targets         = 1,
 		},
 		vertex_input_state = sdl.GPUVertexInputState {
 			vertex_buffer_descriptions = &sdl.GPUVertexBufferDescription {
@@ -298,7 +337,7 @@ create_pipeline_2d_base :: proc(
 	}
 	// Upload white pixel and unit quad data in a single command buffer
-	white_pixel := [4]u8{255, 255, 255, 255}
+	white_pixel := Color{255, 255, 255, 255}
 	white_transfer_buf := sdl.CreateGPUTransferBuffer(
 		device,
 		sdl.GPUTransferBufferCreateInfo{usage = .UPLOAD, size = size_of(white_pixel)},
@@ -566,6 +605,7 @@ draw_layer :: proc(
 	current_mode: Draw_Mode = .Tessellated
 	current_vert_buf := main_vert_buf
 	current_atlas: ^sdl.GPUTexture
+	current_sampler := sampler
 	// Text vertices live after shape vertices in the GPU vertex buffer
 	text_vertex_gpu_base := u32(len(GLOB.tmp_shape_verts))
@@ -575,7 +615,7 @@ draw_layer :: proc(
 		for &batch in GLOB.tmp_sub_batches[scissor.sub_batch_start:][:scissor.sub_batch_len] {
 			switch batch.kind {
-			case .Shapes:
+			case .Tessellated:
 				if current_mode != .Tessellated {
 					push_globals(cmd_buffer, width, height, .Tessellated)
 					current_mode = .Tessellated
@@ -584,14 +624,24 @@ draw_layer :: proc(
 					sdl.BindGPUVertexBuffers(render_pass, 0, &sdl.GPUBufferBinding{buffer = main_vert_buf, offset = 0}, 1)
 					current_vert_buf = main_vert_buf
 				}
-				if current_atlas != white_texture {
+				// Determine texture and sampler for this batch
+				batch_texture: ^sdl.GPUTexture = white_texture
+				batch_sampler: ^sdl.GPUSampler = sampler
+				if batch.texture_id != INVALID_TEXTURE {
+					if bound_texture := texture_gpu_handle(batch.texture_id); bound_texture != nil {
+						batch_texture = bound_texture
+					}
+					batch_sampler = get_sampler(batch.sampler)
+				}
+				if current_atlas != batch_texture || current_sampler != batch_sampler {
 					sdl.BindGPUFragmentSamplers(
 						render_pass,
 						0,
-						&sdl.GPUTextureSamplerBinding{texture = white_texture, sampler = sampler},
+						&sdl.GPUTextureSamplerBinding{texture = batch_texture, sampler = batch_sampler},
 						1,
 					)
-					current_atlas = white_texture
+					current_atlas = batch_texture
+					current_sampler = batch_sampler
 				}
 				sdl.DrawGPUPrimitives(render_pass, batch.count, 1, batch.offset, 0)
@@ -632,14 +682,24 @@ draw_layer :: proc(
 					sdl.BindGPUVertexBuffers(render_pass, 0, &sdl.GPUBufferBinding{buffer = unit_quad, offset = 0}, 1)
 					current_vert_buf = unit_quad
 				}
-				if current_atlas != white_texture {
+				// Determine texture and sampler for this batch
+				batch_texture: ^sdl.GPUTexture = white_texture
+				batch_sampler: ^sdl.GPUSampler = sampler
+				if batch.texture_id != INVALID_TEXTURE {
+					if bound_texture := texture_gpu_handle(batch.texture_id); bound_texture != nil {
+						batch_texture = bound_texture
+					}
+					batch_sampler = get_sampler(batch.sampler)
+				}
+				if current_atlas != batch_texture || current_sampler != batch_sampler {
 					sdl.BindGPUFragmentSamplers(
 						render_pass,
 						0,
-						&sdl.GPUTextureSamplerBinding{texture = white_texture, sampler = sampler},
+						&sdl.GPUTextureSamplerBinding{texture = batch_texture, sampler = batch_sampler},
 						1,
 					)
-					current_atlas = white_texture
+					current_atlas = batch_texture
+					current_sampler = batch_sampler
 				}
 				sdl.DrawGPUPrimitives(render_pass, 6, batch.count, 0, batch.offset)
 			}
@@ -23,274 +23,225 @@ struct main0_in
    float2 f_local_or_uv [[user(locn1)]];
    float4 f_params [[user(locn2)]];
    float4 f_params2 [[user(locn3)]];
-    uint f_kind_flags [[user(locn4)]];
+    uint f_flags [[user(locn4)]];
-    float f_rotation [[user(locn5), flat]];
+    uint f_rotation_sc [[user(locn5)]];
+    uint4 f_uv_or_effects [[user(locn6)]];
 };
 static inline __attribute__((always_inline))
-float2 apply_rotation(thread const float2& p, thread const float& angle)
+float sdRoundedBox(thread const float2& p, thread const float2& b, thread const float4& r)
 {
-    float cr = cos(-angle);
+    float2 _48;
-    float sr = sin(-angle);
-    return float2x2(float2(cr, sr), float2(-sr, cr)) * p;
-}
-static inline __attribute__((always_inline))
-float sdRoundedBox(thread const float2& p, thread const float2& b, thread float4& r)
-{
-    float2 _61;
    if (p.x > 0.0)
    {
-        _61 = r.xy;
+        _48 = r.xy;
    }
    else
    {
-        _61 = r.zw;
+        _48 = r.zw;
    }
-    r.x = _61.x;
+    float2 rxy = _48;
-    r.y = _61.y;
+    float _62;
-    float _78;
    if (p.y > 0.0)
    {
-        _78 = r.x;
+        _62 = rxy.x;
    }
    else
    {
-        _78 = r.y;
+        _62 = rxy.y;
    }
-    r.x = _78;
+    float rr = _62;
-    float2 q = (abs(p) - b) + float2(r.x);
+    float2 q = abs(p) - b;
-    return (fast::min(fast::max(q.x, q.y), 0.0) + length(fast::max(q, float2(0.0)))) - r.x;
+    if (rr == 0.0)
-}
-static inline __attribute__((always_inline))
-float sdf_stroke(thread const float& d, thread const float& stroke_width)
-{
-    return abs(d) - (stroke_width * 0.5);
-}
-static inline __attribute__((always_inline))
-float sdCircle(thread const float2& p, thread const float& r)
-{
-    return length(p) - r;
-}
-static inline __attribute__((always_inline))
-float sdEllipse(thread float2& p, thread float2& ab)
-{
-    p = abs(p);
-    if (p.x > p.y)
    {
-        p = p.yx;
+        return fast::max(q.x, q.y);
-        ab = ab.yx;
    }
-    float l = (ab.y * ab.y) - (ab.x * ab.x);
+    q += float2(rr);
-    float m = (ab.x * p.x) / l;
+    return (fast::min(fast::max(q.x, q.y), 0.0) + length(fast::max(q, float2(0.0)))) - rr;
-    float m2 = m * m;
-    float n = (ab.y * p.y) / l;
-    float n2 = n * n;
-    float c = ((m2 + n2) - 1.0) / 3.0;
-    float c3 = (c * c) * c;
-    float q = c3 + ((m2 * n2) * 2.0);
-    float d = c3 + (m2 * n2);
-    float g = m + (m * n2);
-    float co;
-    if (d < 0.0)
-    {
-        float h = acos(q / c3) / 3.0;
-        float s = cos(h);
-        float t = sin(h) * 1.73205077648162841796875;
-        float rx = sqrt(((-c) * ((s + t) + 2.0)) + m2);
-        float ry = sqrt(((-c) * ((s - t) + 2.0)) + m2);
-        co = (((ry + (sign(l) * rx)) + (abs(g) / (rx * ry))) - m) / 2.0;
-    }
-    else
-    {
-        float h_1 = ((2.0 * m) * n) * sqrt(d);
-        float s_1 = sign(q + h_1) * powr(abs(q + h_1), 0.3333333432674407958984375);
-        float u = sign(q - h_1) * powr(abs(q - h_1), 0.3333333432674407958984375);
-        float rx_1 = (((-s_1) - u) - (c * 4.0)) + (2.0 * m2);
-        float ry_1 = (s_1 - u) * 1.73205077648162841796875;
-        float rm = sqrt((rx_1 * rx_1) + (ry_1 * ry_1));
-        co = (((ry_1 / sqrt(rm - rx_1)) + ((2.0 * g) / rm)) - m) / 2.0;
-    }
-    float2 r = ab * float2(co, sqrt(1.0 - (co * co)));
-    return length(r - p) * sign(p.y - r.y);
 }
 static inline __attribute__((always_inline))
-float sdSegment(thread const float2& p, thread const float2& a, thread const float2& b)
+float sdRegularPolygon(thread const float2& p, thread const float& r, thread const float& n)
 {
-    float2 pa = p - a;
+    float an = 3.1415927410125732421875 / n;
-    float2 ba = b - a;
+    float bn = mod(precise::atan2(p.y, p.x), 2.0 * an) - an;
-    float h = fast::clamp(dot(pa, ba) / dot(ba, ba), 0.0, 1.0);
+    return (length(p) * cos(bn)) - r;
-    return length(pa - (ba * h));
 }
 static inline __attribute__((always_inline))
-float sdf_alpha(thread const float& d, thread const float& soft)
+float sdEllipseApprox(thread const float2& p, thread const float2& ab)
 {
-    return 1.0 - smoothstep(-soft, soft, d);
+    float k0 = length(p / ab);
+    float k1 = length(p / (ab * ab));
+    return (k0 * (k0 - 1.0)) / k1;
 }
+static inline __attribute__((always_inline))
+float4 gradient_2color(thread const float4& start_color, thread const float4& end_color, thread const float& t)
+{
+    return mix(start_color, end_color, float4(fast::clamp(t, 0.0, 1.0)));
+}
+static inline __attribute__((always_inline))
+float sdf_alpha(thread const float& d, thread const float& h)
+{
+    return 1.0 - smoothstep(-h, h, d);
+}
 fragment main0_out main0(main0_in in [[stage_in]], texture2d<float> tex [[texture(0)]], sampler texSmplr [[sampler(0)]])
 {
    main0_out out = {};
-    uint kind = in.f_kind_flags & 255u;
+    uint kind = in.f_flags & 255u;
-    uint flags = (in.f_kind_flags >> 8u) & 255u;
+    uint flags = (in.f_flags >> 8u) & 255u;
    if (kind == 0u)
    {
-        out.out_color = in.f_color * tex.sample(texSmplr, in.f_local_or_uv);
+        float4 t = tex.sample(texSmplr, in.f_local_or_uv);
+        float _195 = t.w;
+        float4 _197 = t;
+        float3 _199 = _197.xyz * _195;
+        t.x = _199.x;
+        t.y = _199.y;
+        t.z = _199.z;
+        out.out_color = in.f_color * t;
        return out;
    }
    float d = 1000000015047466219876688855040.0;
-    float soft = 1.0;
+    float h = 0.5;
+    float2 half_size = in.f_params.xy;
+    float2 p_local = in.f_local_or_uv;
+    if ((flags & 16u) != 0u)
+    {
+        float2 sc = float2(as_type<half2>(in.f_rotation_sc));
+        p_local = float2((sc.y * p_local.x) + (sc.x * p_local.y), ((-sc.x) * p_local.x) + (sc.y * p_local.y));
+    }
    if (kind == 1u)
    {
-        float2 b = in.f_params.xy;
+        float4 corner_radii = float4(in.f_params.zw, in.f_params2.xy);
-        float4 r = float4(in.f_params.zw, in.f_params2.xy);
+        h = in.f_params2.z;
-        soft = fast::max(in.f_params2.z, 1.0);
+        float2 param = p_local;
-        float stroke_px = in.f_params2.w;
+        float2 param_1 = half_size;
-        float2 p_local = in.f_local_or_uv;
+        float4 param_2 = corner_radii;
-        if (in.f_rotation != 0.0)
+        d = sdRoundedBox(param, param_1, param_2);
-        {
-            float2 param = p_local;
-            float param_1 = in.f_rotation;
-            p_local = apply_rotation(param, param_1);
-        }
-        float2 param_2 = p_local;
-        float2 param_3 = b;
-        float4 param_4 = r;
-        float _491 = sdRoundedBox(param_2, param_3, param_4);
-        d = _491;
-        if ((flags & 1u) != 0u)
-        {
-            float param_5 = d;
-            float param_6 = stroke_px;
-            d = sdf_stroke(param_5, param_6);
-        }
    }
    else
    {
        if (kind == 2u)
        {
            float radius = in.f_params.x;
-            soft = fast::max(in.f_params.y, 1.0);
+            float sides = in.f_params.y;
-            float stroke_px_1 = in.f_params.z;
+            h = in.f_params.z;
-            float2 param_7 = in.f_local_or_uv;
+            float2 param_3 = p_local;
-            float param_8 = radius;
+            float param_4 = radius;
-            d = sdCircle(param_7, param_8);
+            float param_5 = sides;
-            if ((flags & 1u) != 0u)
+            d = sdRegularPolygon(param_3, param_4, param_5);
-            {
+            half_size = float2(radius);
-                float param_9 = d;
-                float param_10 = stroke_px_1;
-                d = sdf_stroke(param_9, param_10);
-            }
        }
        else
        {
            if (kind == 3u)
            {
                float2 ab = in.f_params.xy;
-                soft = fast::max(in.f_params.z, 1.0);
+                h = in.f_params.z;
-                float stroke_px_2 = in.f_params.w;
+                float2 param_6 = p_local;
-                float2 p_local_1 = in.f_local_or_uv;
+                float2 param_7 = ab;
-                if (in.f_rotation != 0.0)
+                d = sdEllipseApprox(param_6, param_7);
-                {
+                half_size = ab;
-                    float2 param_11 = p_local_1;
-                    float param_12 = in.f_rotation;
-                    p_local_1 = apply_rotation(param_11, param_12);
-                }
-                float2 param_13 = p_local_1;
-                float2 param_14 = ab;
-                float _560 = sdEllipse(param_13, param_14);
-                d = _560;
-                if ((flags & 1u) != 0u)
-                {
-                    float param_15 = d;
-                    float param_16 = stroke_px_2;
-                    d = sdf_stroke(param_15, param_16);
-                }
            }
            else
            {
                if (kind == 4u)
                {
-                    float2 a = in.f_params.xy;
+                    float inner = in.f_params.x;
-                    float2 b_1 = in.f_params.zw;
+                    float outer = in.f_params.y;
-                    float width = in.f_params2.x;
+                    float2 n_start = in.f_params.zw;
-                    soft = fast::max(in.f_params2.y, 1.0);
+                    float2 n_end = in.f_params2.xy;
-                    float2 param_17 = in.f_local_or_uv;
+                    uint arc_bits = (flags >> 5u) & 3u;
-                    float2 param_18 = a;
+                    h = in.f_params2.z;
-                    float2 param_19 = b_1;
+                    float r = length(p_local);
-                    d = sdSegment(param_17, param_18, param_19) - (width * 0.5);
+                    d = fast::max(inner - r, r - outer);
-                }
+                    if (arc_bits != 0u)
-                else
-                {
-                    if (kind == 5u)
                    {
-                        float inner = in.f_params.x;
+                        float d_start = dot(p_local, n_start);
-                        float outer = in.f_params.y;
+                        float d_end = dot(p_local, n_end);
-                        float start_rad = in.f_params.z;
+                        float _372;
-                        float end_rad = in.f_params.w;
+                        if (arc_bits == 1u)
-                        soft = fast::max(in.f_params2.x, 1.0);
-                        float r_1 = length(in.f_local_or_uv);
-                        float d_ring = fast::max(inner - r_1, r_1 - outer);
-                        float angle = precise::atan2(in.f_local_or_uv.y, in.f_local_or_uv.x);
-                        if (angle < 0.0)
                        {
-                            angle += 6.283185482025146484375;
+                            _372 = fast::max(d_start, d_end);
                        }
-                        float ang_start = mod(start_rad, 6.283185482025146484375);
-                        float ang_end = mod(end_rad, 6.283185482025146484375);
-                        float _654;
-                        if (ang_end > ang_start)
-                        {
-                            _654 = float((angle >= ang_start) && (angle <= ang_end));
-                        }
                        else
                        {
-                            _654 = float((angle >= ang_start) || (angle <= ang_end));
+                            _372 = fast::min(d_start, d_end);
                        }
-                        float in_arc = _654;
-                        if (abs(ang_end - ang_start) >= 6.282185077667236328125)
-                        {
-                            in_arc = 1.0;
-                        }
-                        d = (in_arc > 0.5) ? d_ring : 1000000015047466219876688855040.0;
-                    }
-                    else
-                    {
-                        if (kind == 6u)
-                        {
-                            float radius_1 = in.f_params.x;
-                            float rotation = in.f_params.y;
-                            float sides = in.f_params.z;
-                            soft = fast::max(in.f_params.w, 1.0);
-                            float stroke_px_3 = in.f_params2.x;
-                            float2 p = in.f_local_or_uv;
-                            float c = cos(rotation);
-                            float s = sin(rotation);
-                            p = float2x2(float2(c, -s), float2(s, c)) * p;
-                            float an = 3.1415927410125732421875 / sides;
-                            float bn = mod(precise::atan2(p.y, p.x), 2.0 * an) - an;
-                            d = (length(p) * cos(bn)) - radius_1;
-                            if ((flags & 1u) != 0u)
-                            {
-                                float param_20 = d;
-                                float param_21 = stroke_px_3;
-                                d = sdf_stroke(param_20, param_21);
-                            }
-                        }
+                        float d_wedge = _372;
+                        d = fast::max(d, d_wedge);
                    }
+                    half_size = float2(outer);
                }
            }
        }
    }
-    float param_22 = d;
+    float grad_magnitude = fast::max(fwidth(d), 9.9999999747524270787835121154785e-07);
-    float param_23 = soft;
+    d /= grad_magnitude;
-    float alpha = sdf_alpha(param_22, param_23);
+    h /= grad_magnitude;
-    out.out_color = float4(in.f_color.xyz, in.f_color.w * alpha);
+    float4 shape_color;
+    if ((flags & 2u) != 0u)
+    {
+        float4 gradient_start = in.f_color;
+        float4 gradient_end = unpack_unorm4x8_to_float(in.f_uv_or_effects.x);
+        if ((flags & 4u) != 0u)
+        {
+            float t_1 = length(p_local / half_size);
+            float4 param_8 = gradient_start;
+            float4 param_9 = gradient_end;
+            float param_10 = t_1;
+            shape_color = gradient_2color(param_8, param_9, param_10);
+        }
+        else
+        {
+            float2 direction = float2(as_type<half2>(in.f_uv_or_effects.z));
+            float t_2 = (dot(p_local / half_size, direction) * 0.5) + 0.5;
+            float4 param_11 = gradient_start;
+            float4 param_12 = gradient_end;
+            float param_13 = t_2;
+            shape_color = gradient_2color(param_11, param_12, param_13);
+        }
+    }
+    else
+    {
+        if ((flags & 1u) != 0u)
+        {
+            float4 uv_rect = as_type<float4>(in.f_uv_or_effects);
+            float2 local_uv = ((p_local / half_size) * 0.5) + float2(0.5);
+            float2 uv = mix(uv_rect.xy, uv_rect.zw, local_uv);
+            shape_color = in.f_color * tex.sample(texSmplr, uv);
+        }
+        else
+        {
+            shape_color = in.f_color;
+        }
+    }
+    if ((flags & 8u) != 0u)
+    {
+        float4 ol_color = unpack_unorm4x8_to_float(in.f_uv_or_effects.y);
+        float ol_width = float2(as_type<half2>(in.f_uv_or_effects.w)).x / grad_magnitude;
+        float param_14 = d;
+        float param_15 = h;
+        float fill_cov = sdf_alpha(param_14, param_15);
+        float param_16 = d - ol_width;
+        float param_17 = h;
+        float total_cov = sdf_alpha(param_16, param_17);
+        float outline_cov = fast::max(total_cov - fill_cov, 0.0);
+        float3 rgb_pm = ((shape_color.xyz * shape_color.w) * fill_cov) + ((ol_color.xyz * ol_color.w) * outline_cov);
+        float alpha_pm = (shape_color.w * fill_cov) + (ol_color.w * outline_cov);
+        out.out_color = float4(rgb_pm, alpha_pm);
+    }
+    else
+    {
+        float param_18 = d;
+        float param_19 = h;
+        float alpha = sdf_alpha(param_18, param_19);
+        out.out_color = float4((shape_color.xyz * shape_color.w) * alpha, shape_color.w * alpha);
+    }
    return out;
 }
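Review note: the new fragment path computes a signed distance `d`, normalizes it by `fwidth(d)`, and converts it to coverage with `1 - smoothstep(-h, h, d)`. The CPU-side sketch below mirrors that math so the falloff can be inspected by hand. It is written under two assumptions: distances are already in pixels (gradient magnitude 1, so the `fwidth` division is skipped), and the box uses one radius for all corners, where the shader selects a per-corner radius. `sd_rounded_box`, `smoothstep`, and the sample positions are illustrative names and values.

```odin
package main

import "core:fmt"
import "core:math"

// Rounded-box distance following the structure of the new sdRoundedBox,
// simplified to a single radius shared by all four corners.
sd_rounded_box :: proc(p, half_size: [2]f32, radius: f32) -> f32 {
	q := [2]f32{abs(p.x) - half_size.x, abs(p.y) - half_size.y}
	if radius == 0 {
		return max(q.x, q.y) // sharp-corner fast path, as in the shader
	}
	q += [2]f32{radius, radius}
	ox := max(q.x, 0)
	oy := max(q.y, 0)
	return min(max(q.x, q.y), 0) + math.sqrt(ox * ox + oy * oy) - radius
}

// Hermite smoothstep, written out so the sketch has no hidden dependencies.
smoothstep :: proc(edge0, edge1, x: f32) -> f32 {
	t := clamp((x - edge0) / (edge1 - edge0), 0, 1)
	return t * t * (3 - 2 * t)
}

// Same formula as the shader's sdf_alpha: h is half the feather width.
sdf_alpha :: proc(d, h: f32) -> f32 {
	return 1 - smoothstep(-h, h, d)
}

main :: proc() {
	half_size := [2]f32{50, 50}
	xs := [?]f32{40, 49.75, 50, 50.25, 60}
	for x in xs {
		d := sd_rounded_box({x, 0}, half_size, 10)
		fmt.printf("x=%v d=%v coverage=%v\n", x, d, sdf_alpha(d, 0.5))
	}

	// Exactly on the edge the coverage is one half; well inside it is 1,
	// well outside it is 0.
	assert(sdf_alpha(0, 0.5) == 0.5)
	assert(sdf_alpha(-10, 0.5) == 1)
	assert(sdf_alpha(10, 0.5) == 0)
}
```

With `h = 0.5`, the shader's default, the transition spans one pixel centered on the shape edge, which matches the `half_feather` comment in the parameter structs.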
@@ -14,22 +14,24 @@ struct Primitive
 {
    float4 bounds;
    uint color;
-    uint kind_flags;
+    uint flags;
-    float rotation;
+    uint rotation_sc;
    float _pad;
    float4 params;
    float4 params2;
+    uint4 uv_or_effects;
 };
 struct Primitive_1
 {
    float4 bounds;
    uint color;
-    uint kind_flags;
+    uint flags;
-    float rotation;
+    uint rotation_sc;
    float _pad;
    float4 params;
    float4 params2;
+    uint4 uv_or_effects;
 };
 struct Primitives
@@ -43,8 +45,9 @@ struct main0_out
    float2 f_local_or_uv [[user(locn1)]];
    float4 f_params [[user(locn2)]];
    float4 f_params2 [[user(locn3)]];
-    uint f_kind_flags [[user(locn4)]];
+    uint f_flags [[user(locn4)]];
-    float f_rotation [[user(locn5)]];
+    uint f_rotation_sc [[user(locn5)]];
+    uint4 f_uv_or_effects [[user(locn6)]];
    float4 gl_Position [[position]];
 };
@@ -55,7 +58,7 @@ struct main0_in
    float4 v_color [[attribute(2)]];
 };
-vertex main0_out main0(main0_in in [[stage_in]], constant Uniforms& _12 [[buffer(0)]], const device Primitives& _72 [[buffer(1)]], uint gl_InstanceIndex [[instance_id]])
+vertex main0_out main0(main0_in in [[stage_in]], constant Uniforms& _12 [[buffer(0)]], const device Primitives& _75 [[buffer(1)]], uint gl_InstanceIndex [[instance_id]])
 {
    main0_out out = {};
    if (_12.mode == 0u)
@@ -64,20 +67,22 @@ vertex main0_out main0(main0_in in [[stage_in]], constant Uniforms& _12 [[buffer
        out.f_local_or_uv = in.v_uv;
        out.f_params = float4(0.0);
        out.f_params2 = float4(0.0);
-        out.f_kind_flags = 0u;
+        out.f_flags = 0u;
-        out.f_rotation = 0.0;
+        out.f_rotation_sc = 0u;
+        out.f_uv_or_effects = uint4(0u);
        out.gl_Position = _12.projection * float4(in.v_position * _12.dpi_scale, 0.0, 1.0);
    }
    else
    {
        Primitive p;
-        p.bounds = _72.primitives[int(gl_InstanceIndex)].bounds;
+        p.bounds = _75.primitives[int(gl_InstanceIndex)].bounds;
-        p.color = _72.primitives[int(gl_InstanceIndex)].color;
+        p.color = _75.primitives[int(gl_InstanceIndex)].color;
-        p.kind_flags = _72.primitives[int(gl_InstanceIndex)].kind_flags;
+        p.flags = _75.primitives[int(gl_InstanceIndex)].flags;
-        p.rotation = _72.primitives[int(gl_InstanceIndex)].rotation;
+        p.rotation_sc = _75.primitives[int(gl_InstanceIndex)].rotation_sc;
-        p._pad = _72.primitives[int(gl_InstanceIndex)]._pad;
+        p._pad = _75.primitives[int(gl_InstanceIndex)]._pad;
-        p.params = _72.primitives[int(gl_InstanceIndex)].params;
+        p.params = _75.primitives[int(gl_InstanceIndex)].params;
-        p.params2 = _72.primitives[int(gl_InstanceIndex)].params2;
+        p.params2 = _75.primitives[int(gl_InstanceIndex)].params2;
        p.uv_or_effects = _75.primitives[int(gl_InstanceIndex)].uv_or_effects;
        float2 corner = in.v_position;
        float2 world_pos = mix(p.bounds.xy, p.bounds.zw, corner);
        float2 center = (p.bounds.xy + p.bounds.zw) * 0.5;
@@ -85,8 +90,9 @@ vertex main0_out main0(main0_in in [[stage_in]], constant Uniforms& _12 [[buffer
        out.f_local_or_uv = (world_pos - center) * _12.dpi_scale;
        out.f_params = p.params;
        out.f_params2 = p.params2;
-        out.f_kind_flags = p.kind_flags;
+        out.f_flags = p.flags;
-        out.f_rotation = p.rotation;
+        out.f_rotation_sc = p.rotation_sc;
        out.f_uv_or_effects = p.uv_or_effects;
        out.gl_Position = _12.projection * float4(world_pos * _12.dpi_scale, 0.0, 1.0);
    }
    return out;
@@ -1,12 +1,13 @@
 #version 450 core
 // --- Inputs from vertex shader ---
-layout(location = 0) in vec4 f_color;
+layout(location = 0) in mediump vec4 f_color;
 layout(location = 1) in vec2 f_local_or_uv;
 layout(location = 2) in vec4 f_params;
 layout(location = 3) in vec4 f_params2;
-layout(location = 4) flat in uint f_kind_flags;
+layout(location = 4) flat in uint f_flags;
-layout(location = 5) flat in float f_rotation;
+layout(location = 5) flat in uint f_rotation_sc;
 layout(location = 6) flat in uvec4 f_uv_or_effects;
 // --- Output ---
 layout(location = 0) out vec4 out_color;
@@ -19,77 +20,43 @@ layout(set = 2, binding = 0) uniform sampler2D tex;
 // All operate in physical pixel space — no dpi_scale needed here.
 // ---------------------------------------------------------------------------
 const float PI = 3.14159265358979;
 float sdCircle(vec2 p, float r) {
    return length(p) - r;
 }
 float sdRoundedBox(vec2 p, vec2 b, vec4 r) {
-    r.xy = (p.x > 0.0) ? r.xy : r.zw;
+    vec2 rxy = (p.x > 0.0) ? r.xy : r.zw;
-    r.x = (p.y > 0.0) ? r.x : r.y;
+    float rr = (p.y > 0.0) ? rxy.x : rxy.y;
-    vec2 q = abs(p) - b + r.x;
+    vec2 q = abs(p) - b;
-    return min(max(q.x, q.y), 0.0) + length(max(q, vec2(0.0))) - r.x;
+    if (rr == 0.0) {
-}
+        return max(q.x, q.y);
 float sdSegment(vec2 p, vec2 a, vec2 b) {
    vec2 pa = p - a, ba = b - a;
    float h = clamp(dot(pa, ba) / dot(ba, ba), 0.0, 1.0);
    return length(pa - ba * h);
 }
 float sdEllipse(vec2 p, vec2 ab) {
    p = abs(p);
    if (p.x > p.y) {
        p = p.yx;
        ab = ab.yx;
    }
-    float l = ab.y * ab.y - ab.x * ab.x;
+    q += rr;
-    float m = ab.x * p.x / l;
+    return min(max(q.x, q.y), 0.0) + length(max(q, vec2(0.0))) - rr;
    float m2 = m * m;
    float n = ab.y * p.y / l;
    float n2 = n * n;
    float c = (m2 + n2 - 1.0) / 3.0;
    float c3 = c * c * c;
    float q = c3 + m2 * n2 * 2.0;
    float d = c3 + m2 * n2;
    float g = m + m * n2;
    float co;
    if (d < 0.0) {
        float h = acos(q / c3) / 3.0;
        float s = cos(h);
        float t = sin(h) * sqrt(3.0);
        float rx = sqrt(-c * (s + t + 2.0) + m2);
        float ry = sqrt(-c * (s - t + 2.0) + m2);
        co = (ry + sign(l) * rx + abs(g) / (rx * ry) - m) / 2.0;
    } else {
        float h = 2.0 * m * n * sqrt(d);
        float s = sign(q + h) * pow(abs(q + h), 1.0 / 3.0);
        float u = sign(q - h) * pow(abs(q - h), 1.0 / 3.0);
        float rx = -s - u - c * 4.0 + 2.0 * m2;
        float ry = (s - u) * sqrt(3.0);
        float rm = sqrt(rx * rx + ry * ry);
        co = (ry / sqrt(rm - rx) + 2.0 * g / rm - m) / 2.0;
    }
    vec2 r = ab * vec2(co, sqrt(1.0 - co * co));
    return length(r - p) * sign(p.y - r.y);
 }
-float sdf_alpha(float d, float soft) {
+// Approximate ellipse SDF — fast, suitable for UI, NOT a true Euclidean distance.
-    return 1.0 - smoothstep(-soft, soft, d);
+float sdEllipseApprox(vec2 p, vec2 ab) {
    float k0 = length(p / ab);
    float k1 = length(p / (ab * ab));
    return k0 * (k0 - 1.0) / k1;
 }
-float sdf_stroke(float d, float stroke_width) {
+// Regular N-gon SDF (Inigo Quilez).
-    return abs(d) - stroke_width * 0.5;
+float sdRegularPolygon(vec2 p, float r, float n) {
    float an = 3.141592653589793 / n;
    float bn = mod(atan(p.y, p.x), 2.0 * an) - an;
    return length(p) * cos(bn) - r;
 }
-// Rotate a 2D point by the negative of the given angle (inverse rotation).
+// Coverage from SDF distance using half-feather width (feather_px * 0.5, pre-computed on CPU).
-// Used to rotate the sampling frame opposite to the shape's rotation so that
+// Produces a symmetric transition centered on d=0: smoothstep(-h, h, d).
-// the SDF evaluates correctly for the rotated shape.
+float sdf_alpha(float d, float h) {
-vec2 apply_rotation(vec2 p, float angle) {
+    return 1.0 - smoothstep(-h, h, d);
-    float cr = cos(-angle);
+}
-    float sr = sin(-angle);
+
-    return mat2(cr, sr, -sr, cr) * p;
+// ---------------------------------------------------------------------------
 // Gradient helpers
 // ---------------------------------------------------------------------------
 mediump vec4 gradient_2color(mediump vec4 start_color, mediump vec4 end_color, mediump float t) {
    return mix(start_color, end_color, clamp(t, 0.0, 1.0));
 }
 // ---------------------------------------------------------------------------
@@ -97,114 +64,137 @@ vec2 apply_rotation(vec2 p, float angle) {
 // ---------------------------------------------------------------------------
 void main() {
-    uint kind = f_kind_flags & 0xFFu;
+    uint kind = f_flags & 0xFFu;
-    uint flags = (f_kind_flags >> 8u) & 0xFFu;
+    uint flags = (f_flags >> 8u) & 0xFFu;
-    // -----------------------------------------------------------------------
+    // Kind 0: Tessellated path — vertex colors arrive premultiplied from CPU.
-    // Kind 0: Tessellated path. Texture multiply for text atlas,
+    // Texture samples are straight-alpha (SDL_ttf glyph atlas: rgb=1, a=coverage;
-    //         white pixel for solid shapes.
+    // or the 1x1 white texture: rgba=1). Convert to premultiplied form so the
-    // -----------------------------------------------------------------------
+    // blend state (ONE, ONE_MINUS_SRC_ALPHA) composites correctly.
    if (kind == 0u) {
-        out_color = f_color * texture(tex, f_local_or_uv);
+        vec4 t = texture(tex, f_local_or_uv);
        t.rgb *= t.a;
        out_color = f_color * t;
        return;
    }
-    // -----------------------------------------------------------------------
+    // SDF path — dispatch on kind
    // SDF path. f_local_or_uv = shape-centered position in physical pixels.
    // All dimensional params are already in physical pixels (CPU pre-scaled).
    // -----------------------------------------------------------------------
    float d = 1e30;
-    float soft = 1.0;
+    float h = 0.5; // half-feather width; overwritten per shape kind
    vec2 half_size = f_params.xy; // used by RRect and as reference size for gradients
    vec2 p_local = f_local_or_uv;
    // Apply inverse rotation using pre-computed sin/cos (no per-pixel trig).
    // .Rotated flag = bit 4 = 16u
    if ((flags & 16u) != 0u) {
        vec2 sc = unpackHalf2x16(f_rotation_sc); // .x = sin(angle), .y = cos(angle)
        // Inverse rotation matrix R(-angle) = [[cos, sin], [-sin, cos]]
        p_local = vec2(sc.y * p_local.x + sc.x * p_local.y,
                -sc.x * p_local.x + sc.y * p_local.y);
    }
    if (kind == 1u) {
-        // RRect: rounded box
+        // RRect — half_feather in params2.z
-        vec2 b = f_params.xy; // half_size (phys px)
+        vec4 corner_radii = vec4(f_params.zw, f_params2.xy);
-        vec4 r = vec4(f_params.zw, f_params2.xy); // corner radii: tr, br, tl, bl
+        h = f_params2.z;
-        soft = max(f_params2.z, 1.0);
+        d = sdRoundedBox(p_local, half_size, corner_radii);
        float stroke_px = f_params2.w;
        vec2 p_local = f_local_or_uv;
        if (f_rotation != 0.0) {
            p_local = apply_rotation(p_local, f_rotation);
        }
        d = sdRoundedBox(p_local, b, r);
        if ((flags & 1u) != 0u) d = sdf_stroke(d, stroke_px);
    }
    else if (kind == 2u) {
-        // Circle — rotationally symmetric, no rotation needed
+        // NGon — half_feather in params.z
        float radius = f_params.x;
-        soft = max(f_params.y, 1.0);
+        float sides = f_params.y;
-        float stroke_px = f_params.z;
+        h = f_params.z;
-
+        d = sdRegularPolygon(p_local, radius, sides);
-        d = sdCircle(f_local_or_uv, radius);
+        half_size = vec2(radius); // for gradient UV computation
        if ((flags & 1u) != 0u) d = sdf_stroke(d, stroke_px);
    }
    else if (kind == 3u) {
-        // Ellipse
+        // Ellipse — half_feather in params.z
        vec2 ab = f_params.xy;
-        soft = max(f_params.z, 1.0);
+        h = f_params.z;
-        float stroke_px = f_params.w;
+        d = sdEllipseApprox(p_local, ab);
-
+        half_size = ab; // for gradient UV computation
        vec2 p_local = f_local_or_uv;
        if (f_rotation != 0.0) {
            p_local = apply_rotation(p_local, f_rotation);
        }
        d = sdEllipse(p_local, ab);
        if ((flags & 1u) != 0u) d = sdf_stroke(d, stroke_px);
    }
    else if (kind == 4u) {
-        // Segment (capsule line) — no rotation (excluded)
+        // Ring_Arc — half_feather in params2.z
-        vec2 a = f_params.xy; // already in local physical pixels
+        // Arc mode from flag bits 5-6: 0 = full, 1 = narrow (≤π), 2 = wide (>π)
        vec2 b = f_params.zw;
        float width = f_params2.x;
        soft = max(f_params2.y, 1.0);
        d = sdSegment(f_local_or_uv, a, b) - width * 0.5;
    }
    else if (kind == 5u) {
        // Ring / Arc — rotation handled by CPU angle offset, no shader rotation
        float inner = f_params.x;
        float outer = f_params.y;
-        float start_rad = f_params.z;
+        vec2 n_start = f_params.zw;
-        float end_rad = f_params.w;
+        vec2 n_end = f_params2.xy;
-        soft = max(f_params2.x, 1.0);
+        uint arc_bits = (flags >> 5u) & 3u;
-        float r = length(f_local_or_uv);
+        h = f_params2.z;
        float d_ring = max(inner - r, r - outer);
-        // Angular clip
+        float r = length(p_local);
-        float angle = atan(f_local_or_uv.y, f_local_or_uv.x);
+        d = max(inner - r, r - outer);
        if (angle < 0.0) angle += 2.0 * PI;
        float ang_start = mod(start_rad, 2.0 * PI);
        float ang_end = mod(end_rad, 2.0 * PI);
-        float in_arc = (ang_end > ang_start)
+        if (arc_bits != 0u) {
-            ? ((angle >= ang_start && angle <= ang_end) ? 1.0 : 0.0) : ((angle >= ang_start || angle <= ang_end) ? 1.0 : 0.0);
+            float d_start = dot(p_local, n_start);
-        if (abs(ang_end - ang_start) >= 2.0 * PI - 0.001) in_arc = 1.0;
+            float d_end = dot(p_local, n_end);
            float d_wedge = (arc_bits == 1u)
                ? max(d_start, d_end) // arc ≤ π: intersect half-planes
                : min(d_start, d_end); // arc > π: union half-planes
            d = max(d, d_wedge);
        }
-        d = in_arc > 0.5 ? d_ring : 1e30;
+        half_size = vec2(outer); // for gradient UV computation
    }
    else if (kind == 6u) {
        // Regular N-gon — has its own rotation in params, no Primitive.rotation used
        float radius = f_params.x;
        float rotation = f_params.y;
        float sides = f_params.z;
        soft = max(f_params.w, 1.0);
        float stroke_px = f_params2.x;
        vec2 p = f_local_or_uv;
        float c = cos(rotation), s = sin(rotation);
        p = mat2(c, -s, s, c) * p;
        float an = PI / sides;
        float bn = mod(atan(p.y, p.x), 2.0 * an) - an;
        d = length(p) * cos(bn) - radius;
        if ((flags & 1u) != 0u) d = sdf_stroke(d, stroke_px);
    }
-    float alpha = sdf_alpha(d, soft);
+    // --- fwidth-based normalization for correct AA and stroke width ---
-    out_color = vec4(f_color.rgb, f_color.a * alpha);
+    float grad_magnitude = max(fwidth(d), 1e-6);
    d = d / grad_magnitude;
    h = h / grad_magnitude;
    // --- Determine shape color based on flags ---
    mediump vec4 shape_color;
    if ((flags & 2u) != 0u) {
        // Gradient active (bit 1)
        mediump vec4 gradient_start = f_color;
        mediump vec4 gradient_end = unpackUnorm4x8(f_uv_or_effects.x);
        if ((flags & 4u) != 0u) {
            // Radial gradient (bit 2): t from distance to center
            mediump float t = length(p_local / half_size);
            shape_color = gradient_2color(gradient_start, gradient_end, t);
        } else {
            // Linear gradient: direction pre-computed on CPU as (cos, sin) f16 pair
            vec2 direction = unpackHalf2x16(f_uv_or_effects.z);
            mediump float t = dot(p_local / half_size, direction) * 0.5 + 0.5;
            shape_color = gradient_2color(gradient_start, gradient_end, t);
        }
    } else if ((flags & 1u) != 0u) {
        // Textured (bit 0) — RRect only in practice
        vec4 uv_rect = uintBitsToFloat(f_uv_or_effects);
        vec2 local_uv = p_local / half_size * 0.5 + 0.5;
        vec2 uv = mix(uv_rect.xy, uv_rect.zw, local_uv);
        shape_color = f_color * texture(tex, uv);
    } else {
        // Solid color
        shape_color = f_color;
    }
    // --- Outline (bit 3) — outer outline via premultiplied compositing ---
    // The outline band sits OUTSIDE the original shape boundary (d=0 to d=+ol_width).
    // fill_cov covers the interior with AA at d=0; total_cov covers interior+outline with
    // AA at d=ol_width. The outline band's coverage is total_cov - fill_cov.
    // Output is premultiplied: blend state is ONE, ONE_MINUS_SRC_ALPHA.
    if ((flags & 8u) != 0u) {
        mediump vec4 ol_color = unpackUnorm4x8(f_uv_or_effects.y);
        // Outline width in f_uv_or_effects.w (low f16 half)
        float ol_width = unpackHalf2x16(f_uv_or_effects.w).x / grad_magnitude;
        float fill_cov = sdf_alpha(d, h);
        float total_cov = sdf_alpha(d - ol_width, h);
        float outline_cov = max(total_cov - fill_cov, 0.0);
        // Premultiplied output — no divide, no threshold check
        vec3 rgb_pm = shape_color.rgb * shape_color.a * fill_cov
                + ol_color.rgb * ol_color.a * outline_cov;
        float alpha_pm = shape_color.a * fill_cov + ol_color.a * outline_cov;
        out_color = vec4(rgb_pm, alpha_pm);
    } else {
        mediump float alpha = sdf_alpha(d, h);
        out_color = vec4(shape_color.rgb * shape_color.a * alpha, shape_color.a * alpha);
    }
 }
@@ -6,12 +6,13 @@ layout(location = 1) in vec2 v_uv;
 layout(location = 2) in vec4 v_color;
 // ---------- Outputs to fragment shader ----------
-layout(location = 0) out vec4 f_color;
+layout(location = 0) out mediump vec4 f_color;
 layout(location = 1) out vec2 f_local_or_uv;
 layout(location = 2) out vec4 f_params;
 layout(location = 3) out vec4 f_params2;
-layout(location = 4) flat out uint f_kind_flags;
+layout(location = 4) flat out uint f_flags;
-layout(location = 5) flat out float f_rotation;
+layout(location = 5) flat out uint f_rotation_sc;
 layout(location = 6) flat out uvec4 f_uv_or_effects;
 // ---------- Uniforms (single block — avoids spirv-cross reordering on Metal) ----------
 layout(set = 1, binding = 0) uniform Uniforms {
@@ -22,13 +23,14 @@ layout(set = 1, binding = 0) uniform Uniforms {
 // ---------- SDF primitive storage buffer ----------
 struct Primitive {
-    vec4 bounds; // 0-15:  min_x, min_y, max_x, max_y
+    vec4 bounds; // 0-15
-    uint color; // 16-19: packed u8x4 (unpack with unpackUnorm4x8)
+    uint color; // 16-19
-    uint kind_flags; // 20-23: kind | (flags << 8)
+    uint flags; // 20-23
-    float rotation; // 24-27: shader self-rotation in radians
+    uint rotation_sc; // 24-27: packed f16 pair (sin, cos)
-    float _pad; // 28-31: alignment padding
+    float _pad; // 28-31
-    vec4 params; // 32-47: shape params part 1
+    vec4 params; // 32-47
-    vec4 params2; // 48-63: shape params part 2
+    vec4 params2; // 48-63
    uvec4 uv_or_effects; // 64-79
 };
 layout(std430, set = 0, binding = 0) readonly buffer Primitives {
@@ -43,8 +45,9 @@ void main() {
        f_local_or_uv = v_uv;
        f_params = vec4(0.0);
        f_params2 = vec4(0.0);
-        f_kind_flags = 0u;
+        f_flags = 0u;
-        f_rotation = 0.0;
+        f_rotation_sc = 0u;
        f_uv_or_effects = uvec4(0);
        gl_Position = projection * vec4(v_position * dpi_scale, 0.0, 1.0);
    } else {
@@ -59,8 +62,9 @@ void main() {
        f_local_or_uv = (world_pos - center) * dpi_scale; // shape-centered physical pixels
        f_params = p.params;
        f_params2 = p.params2;
-        f_kind_flags = p.kind_flags;
+        f_flags = p.flags;
-        f_rotation = p.rotation;
+        f_rotation_sc = p.rotation_sc;
        f_uv_or_effects = p.uv_or_effects;
        gl_Position = projection * vec4(world_pos * dpi_scale, 0.0, 1.0);
    }
@@ -0,0 +1,330 @@
 package tess
 import "core:math"
 import draw ".."
 SMOOTH_CIRCLE_ERROR_RATE :: 0.1
 auto_segments :: proc(radius: f32, arc_degrees: f32) -> int {
 	if radius <= 0 do return 4
 	phys_radius := radius * draw.GLOB.dpi_scaling
 	acos_arg := clamp(2 * math.pow(1 - SMOOTH_CIRCLE_ERROR_RATE / phys_radius, 2) - 1, -1, 1)
 	theta := math.acos(acos_arg)
 	if theta <= 0 do return 4
 	full_circle_segments := int(math.ceil(2 * math.PI / theta))
 	segments := int(f32(full_circle_segments) * arc_degrees / 360.0)
 	min_segments := max(int(math.ceil(f64(arc_degrees / 90.0))), 4)
 	return max(segments, min_segments)
 }
 // ----- Internal helpers -----
 // Color is premultiplied: the tessellated fragment shader passes it through directly
 // and the blend state is ONE, ONE_MINUS_SRC_ALPHA.
 solid_vertex :: proc(position: draw.Vec2, color: draw.Color) -> draw.Vertex {
 	return draw.Vertex{position = position, color = draw.premultiply_color(color)}
 }
 emit_rectangle :: proc(x, y, width, height: f32, color: draw.Color, vertices: []draw.Vertex, offset: int) {
 	vertices[offset + 0] = solid_vertex({x, y}, color)
 	vertices[offset + 1] = solid_vertex({x + width, y}, color)
 	vertices[offset + 2] = solid_vertex({x + width, y + height}, color)
 	vertices[offset + 3] = solid_vertex({x, y}, color)
 	vertices[offset + 4] = solid_vertex({x + width, y + height}, color)
 	vertices[offset + 5] = solid_vertex({x, y + height}, color)
 }
 extrude_line :: proc(
 	start, end_pos: draw.Vec2,
 	thickness: f32,
 	color: draw.Color,
 	vertices: []draw.Vertex,
 	offset: int,
 ) -> int {
 	direction := end_pos - start
 	delta_x := direction[0]
 	delta_y := direction[1]
 	length := math.sqrt(delta_x * delta_x + delta_y * delta_y)
 	if length < 0.0001 do return 0
 	scale := thickness / (2 * length)
 	perpendicular := draw.Vec2{-delta_y * scale, delta_x * scale}
 	p0 := start + perpendicular
 	p1 := start - perpendicular
 	p2 := end_pos - perpendicular
 	p3 := end_pos + perpendicular
 	vertices[offset + 0] = solid_vertex(p0, color)
 	vertices[offset + 1] = solid_vertex(p1, color)
 	vertices[offset + 2] = solid_vertex(p2, color)
 	vertices[offset + 3] = solid_vertex(p0, color)
 	vertices[offset + 4] = solid_vertex(p2, color)
 	vertices[offset + 5] = solid_vertex(p3, color)
 	return 6
 }
 // ----- Public draw -----
 pixel :: proc(layer: ^draw.Layer, pos: draw.Vec2, color: draw.Color) {
 	vertices: [6]draw.Vertex
 	emit_rectangle(pos[0], pos[1], 1, 1, color, vertices[:], 0)
 	draw.prepare_shape(layer, vertices[:])
 }
 triangle :: proc(
 	layer: ^draw.Layer,
 	v1, v2, v3: draw.Vec2,
 	color: draw.Color,
 	origin: draw.Vec2 = {},
 	rotation: f32 = 0,
 ) {
 	if !draw.needs_transform(origin, rotation) {
 		vertices := [3]draw.Vertex{solid_vertex(v1, color), solid_vertex(v2, color), solid_vertex(v3, color)}
 		draw.prepare_shape(layer, vertices[:])
 		return
 	}
 	bounds_min := draw.Vec2{min(v1.x, v2.x, v3.x), min(v1.y, v2.y, v3.y)}
 	transform := draw.build_pivot_rotation(bounds_min, origin, rotation)
 	local_v1 := v1 - bounds_min
 	local_v2 := v2 - bounds_min
 	local_v3 := v3 - bounds_min
 	vertices := [3]draw.Vertex {
 		solid_vertex(draw.apply_transform(transform, local_v1), color),
 		solid_vertex(draw.apply_transform(transform, local_v2), color),
 		solid_vertex(draw.apply_transform(transform, local_v3), color),
 	}
 	draw.prepare_shape(layer, vertices[:])
 }
 // Draw an anti-aliased triangle via extruded edge quads.
 // Interior vertices get the full premultiplied color; outer fringe vertices get BLANK (0,0,0,0).
 // The rasterizer linearly interpolates between them, producing a smooth 1-pixel AA band.
 // `aa_px` controls the extrusion width in logical pixels (default 1.0).
 // This proc emits 21 vertices (3 interior + 3 edge quads × 2 triangles × 3 verts each).
 triangle_aa :: proc(
 	layer: ^draw.Layer,
 	v1, v2, v3: draw.Vec2,
 	color: draw.Color,
 	aa_px: f32 = draw.DFT_FEATHER_PX,
 	origin: draw.Vec2 = {},
 	rotation: f32 = 0,
 ) {
 	// Apply rotation if needed, then work in world space.
 	p0, p1, p2: draw.Vec2
 	if !draw.needs_transform(origin, rotation) {
 		p0 = v1
 		p1 = v2
 		p2 = v3
 	} else {
 		bounds_min := draw.Vec2{min(v1.x, v2.x, v3.x), min(v1.y, v2.y, v3.y)}
 		transform := draw.build_pivot_rotation(bounds_min, origin, rotation)
 		p0 = draw.apply_transform(transform, v1 - bounds_min)
 		p1 = draw.apply_transform(transform, v2 - bounds_min)
 		p2 = draw.apply_transform(transform, v3 - bounds_min)
 	}
 	// Compute outward edge normals (unit length, pointing away from triangle interior).
 	// Winding-independent: we check against the centroid to ensure normals point outward.
 	centroid_x := (p0.x + p1.x + p2.x) / 3.0
 	centroid_y := (p0.y + p1.y + p2.y) / 3.0
 	edge_normal :: proc(edge_start, edge_end: draw.Vec2, centroid_x, centroid_y: f32) -> draw.Vec2 {
 		delta_x := edge_end.x - edge_start.x
 		delta_y := edge_end.y - edge_start.y
 		length := math.sqrt(delta_x * delta_x + delta_y * delta_y)
 		if length < 0.0001 do return {0, 0}
 		inverse_length := 1.0 / length
 		// Perpendicular: (-delta_y, delta_x) normalized
 		normal_x := -delta_y * inverse_length
 		normal_y := delta_x * inverse_length
 		// Midpoint of the edge
 		midpoint_x := (edge_start.x + edge_end.x) * 0.5
 		midpoint_y := (edge_start.y + edge_end.y) * 0.5
 		// If normal points toward centroid, flip it
 		if normal_x * (centroid_x - midpoint_x) + normal_y * (centroid_y - midpoint_y) > 0 {
 			normal_x = -normal_x
 			normal_y = -normal_y
 		}
 		return {normal_x, normal_y}
 	}
 	normal_01 := edge_normal(p0, p1, centroid_x, centroid_y)
 	normal_12 := edge_normal(p1, p2, centroid_x, centroid_y)
 	normal_20 := edge_normal(p2, p0, centroid_x, centroid_y)
 	extrude_distance := aa_px * draw.GLOB.dpi_scaling
 	// Outer fringe vertices: each edge vertex extruded outward
 	outer_0_01 := p0 + normal_01 * extrude_distance
 	outer_1_01 := p1 + normal_01 * extrude_distance
 	outer_1_12 := p1 + normal_12 * extrude_distance
 	outer_2_12 := p2 + normal_12 * extrude_distance
 	outer_2_20 := p2 + normal_20 * extrude_distance
 	outer_0_20 := p0 + normal_20 * extrude_distance
 	// Premultiplied interior color (solid_vertex does premul internally).
 	// Outer fringe is BLANK = {0,0,0,0} which is already premul.
 	transparent := draw.BLANK
 	// 3 interior + 3 edge quads × 6 verts = 21 vertices
 	vertices: [21]draw.Vertex
 	// Interior triangle
 	vertices[0] = solid_vertex(p0, color)
 	vertices[1] = solid_vertex(p1, color)
 	vertices[2] = solid_vertex(p2, color)
 	// Edge quad: p0→p1 (2 triangles)
 	vertices[3] = solid_vertex(p0, color)
 	vertices[4] = solid_vertex(p1, color)
 	vertices[5] = solid_vertex(outer_1_01, transparent)
 	vertices[6] = solid_vertex(p0, color)
 	vertices[7] = solid_vertex(outer_1_01, transparent)
 	vertices[8] = solid_vertex(outer_0_01, transparent)
 	// Edge quad: p1→p2 (2 triangles)
 	vertices[9] = solid_vertex(p1, color)
 	vertices[10] = solid_vertex(p2, color)
 	vertices[11] = solid_vertex(outer_2_12, transparent)
 	vertices[12] = solid_vertex(p1, color)
 	vertices[13] = solid_vertex(outer_2_12, transparent)
 	vertices[14] = solid_vertex(outer_1_12, transparent)
 	// Edge quad: p2→p0 (2 triangles)
 	vertices[15] = solid_vertex(p2, color)
 	vertices[16] = solid_vertex(p0, color)
 	vertices[17] = solid_vertex(outer_0_20, transparent)
 	vertices[18] = solid_vertex(p2, color)
 	vertices[19] = solid_vertex(outer_0_20, transparent)
 	vertices[20] = solid_vertex(outer_2_20, transparent)
 	draw.prepare_shape(layer, vertices[:])
 }
 triangle_lines :: proc(
 	layer: ^draw.Layer,
 	v1, v2, v3: draw.Vec2,
 	color: draw.Color,
 	thickness: f32 = draw.DFT_STROKE_THICKNESS,
 	origin: draw.Vec2 = {},
 	rotation: f32 = 0,
 	temp_allocator := context.temp_allocator,
 ) {
 	vertices := make([]draw.Vertex, 18, temp_allocator)
 	defer delete(vertices, temp_allocator)
 	write_offset := 0
 	if !draw.needs_transform(origin, rotation) {
 		write_offset += extrude_line(v1, v2, thickness, color, vertices, write_offset)
 		write_offset += extrude_line(v2, v3, thickness, color, vertices, write_offset)
 		write_offset += extrude_line(v3, v1, thickness, color, vertices, write_offset)
 	} else {
 		bounds_min := draw.Vec2{min(v1.x, v2.x, v3.x), min(v1.y, v2.y, v3.y)}
 		transform := draw.build_pivot_rotation(bounds_min, origin, rotation)
 		transformed_v1 := draw.apply_transform(transform, v1 - bounds_min)
 		transformed_v2 := draw.apply_transform(transform, v2 - bounds_min)
 		transformed_v3 := draw.apply_transform(transform, v3 - bounds_min)
 		write_offset += extrude_line(transformed_v1, transformed_v2, thickness, color, vertices, write_offset)
 		write_offset += extrude_line(transformed_v2, transformed_v3, thickness, color, vertices, write_offset)
 		write_offset += extrude_line(transformed_v3, transformed_v1, thickness, color, vertices, write_offset)
 	}
 	if write_offset > 0 {
 		draw.prepare_shape(layer, vertices[:write_offset])
 	}
 }
 triangle_fan :: proc(
 	layer: ^draw.Layer,
 	points: []draw.Vec2,
 	color: draw.Color,
 	origin: draw.Vec2 = {},
 	rotation: f32 = 0,
 	temp_allocator := context.temp_allocator,
 ) {
 	if len(points) < 3 do return
 	triangle_count := len(points) - 2
 	vertex_count := triangle_count * 3
 	vertices := make([]draw.Vertex, vertex_count, temp_allocator)
 	defer delete(vertices, temp_allocator)
 	if !draw.needs_transform(origin, rotation) {
 		for i in 1 ..< len(points) - 1 {
 			idx := (i - 1) * 3
 			vertices[idx + 0] = solid_vertex(points[0], color)
 			vertices[idx + 1] = solid_vertex(points[i], color)
 			vertices[idx + 2] = solid_vertex(points[i + 1], color)
 		}
 	} else {
 		bounds_min := draw.Vec2{max(f32), max(f32)}
 		for point in points {
 			bounds_min.x = min(bounds_min.x, point.x)
 			bounds_min.y = min(bounds_min.y, point.y)
 		}
 		transform := draw.build_pivot_rotation(bounds_min, origin, rotation)
 		for i in 1 ..< len(points) - 1 {
 			idx := (i - 1) * 3
 			vertices[idx + 0] = solid_vertex(draw.apply_transform(transform, points[0] - bounds_min), color)
 			vertices[idx + 1] = solid_vertex(draw.apply_transform(transform, points[i] - bounds_min), color)
 			vertices[idx + 2] = solid_vertex(draw.apply_transform(transform, points[i + 1] - bounds_min), color)
 		}
 	}
 	draw.prepare_shape(layer, vertices)
 }
 triangle_strip :: proc(
 	layer: ^draw.Layer,
 	points: []draw.Vec2,
 	color: draw.Color,
 	origin: draw.Vec2 = {},
 	rotation: f32 = 0,
 	temp_allocator := context.temp_allocator,
 ) {
 	if len(points) < 3 do return
 	triangle_count := len(points) - 2
 	vertex_count := triangle_count * 3
 	vertices := make([]draw.Vertex, vertex_count, temp_allocator)
 	defer delete(vertices, temp_allocator)
 	if !draw.needs_transform(origin, rotation) {
 		for i in 0 ..< triangle_count {
 			idx := i * 3
 			if i % 2 == 0 {
 				vertices[idx + 0] = solid_vertex(points[i], color)
 				vertices[idx + 1] = solid_vertex(points[i + 1], color)
 				vertices[idx + 2] = solid_vertex(points[i + 2], color)
 			} else {
 				vertices[idx + 0] = solid_vertex(points[i + 1], color)
 				vertices[idx + 1] = solid_vertex(points[i], color)
 				vertices[idx + 2] = solid_vertex(points[i + 2], color)
 			}
 		}
 	} else {
 		bounds_min := draw.Vec2{max(f32), max(f32)}
 		for point in points {
 			bounds_min.x = min(bounds_min.x, point.x)
 			bounds_min.y = min(bounds_min.y, point.y)
 		}
 		transform := draw.build_pivot_rotation(bounds_min, origin, rotation)
 		for i in 0 ..< triangle_count {
 			idx := i * 3
 			if i % 2 == 0 {
 				vertices[idx + 0] = solid_vertex(draw.apply_transform(transform, points[i] - bounds_min), color)
 				vertices[idx + 1] = solid_vertex(draw.apply_transform(transform, points[i + 1] - bounds_min), color)
 				vertices[idx + 2] = solid_vertex(draw.apply_transform(transform, points[i + 2] - bounds_min), color)
 			} else {
 				vertices[idx + 0] = solid_vertex(draw.apply_transform(transform, points[i + 1] - bounds_min), color)
 				vertices[idx + 1] = solid_vertex(draw.apply_transform(transform, points[i] - bounds_min), color)
 				vertices[idx + 2] = solid_vertex(draw.apply_transform(transform, points[i + 2] - bounds_min), color)
 			}
 		}
 	}
 	draw.prepare_shape(layer, vertices)
 }
@@ -79,7 +79,7 @@ register_font :: proc(bytes: []u8) -> (id: Font_Id, ok: bool) #optional_ok {
 Text :: struct {
 	sdl_text: ^sdl_ttf.Text,
-	position: [2]f32,
+	position: Vec2,
 	color:    Color,
 }
@@ -129,16 +129,17 @@ cache_get_or_update :: proc(key: Cache_Key, c_str: cstring, font: ^sdl_ttf.Font)
 text :: proc(
 	layer: ^Layer,
 	text_string: string,
-	position: [2]f32,
+	position: Vec2,
 	font_id: Font_Id,
-	font_size: u16 = 44,
+	font_size: u16 = DFT_FONT_SIZE,
-	color: Color = BLACK,
+	color: Color = DFT_TEXT_COLOR,
-	origin: [2]f32 = {0, 0},
+	origin: Vec2 = {},
 	rotation: f32 = 0,
 	id: Maybe(u32) = nil,
 	temp_allocator := context.temp_allocator,
 ) {
 	c_str := strings.clone_to_cstring(text_string, temp_allocator)
 	defer delete(c_str, temp_allocator)
 	sdl_text: ^sdl_ttf.Text
 	cached := false
@@ -176,10 +177,11 @@ text :: proc(
 measure_text :: proc(
 	text_string: string,
 	font_id: Font_Id,
-	font_size: u16 = 44,
+	font_size: u16 = DFT_FONT_SIZE,
 	allocator := context.temp_allocator,
-) -> [2]f32 {
+) -> Vec2 {
 	c_str := strings.clone_to_cstring(text_string, allocator)
 	defer delete(c_str, allocator)
 	width, height: c.int
 	if !sdl_ttf.GetStringSize(get_font(font_id, font_size), c_str, 0, &width, &height) {
 		log.panicf("Failed to measure text: %s", sdl.GetError())
@@ -191,46 +193,46 @@ measure_text :: proc(
 // ----- Text anchor helpers -----------
 // ---------------------------------------------------------------------------------------------------------------------
-center_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u16 = 44) -> [2]f32 {
+center_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u16 = DFT_FONT_SIZE) -> Vec2 {
 	size := measure_text(text_string, font_id, font_size)
 	return size * 0.5
 }
-top_left_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u16 = 44) -> [2]f32 {
+top_left_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u16 = DFT_FONT_SIZE) -> Vec2 {
 	return {0, 0}
 }
-top_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u16 = 44) -> [2]f32 {
+top_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u16 = DFT_FONT_SIZE) -> Vec2 {
 	size := measure_text(text_string, font_id, font_size)
 	return {size.x * 0.5, 0}
 }
-top_right_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u16 = 44) -> [2]f32 {
+top_right_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u16 = DFT_FONT_SIZE) -> Vec2 {
 	size := measure_text(text_string, font_id, font_size)
 	return {size.x, 0}
 }
-left_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u16 = 44) -> [2]f32 {
+left_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u16 = DFT_FONT_SIZE) -> Vec2 {
 	size := measure_text(text_string, font_id, font_size)
 	return {0, size.y * 0.5}
 }
-right_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u16 = 44) -> [2]f32 {
+right_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u16 = DFT_FONT_SIZE) -> Vec2 {
 	size := measure_text(text_string, font_id, font_size)
 	return {size.x, size.y * 0.5}
 }
-bottom_left_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u16 = 44) -> [2]f32 {
+bottom_left_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u16 = DFT_FONT_SIZE) -> Vec2 {
 	size := measure_text(text_string, font_id, font_size)
 	return {0, size.y}
 }
-bottom_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u16 = 44) -> [2]f32 {
+bottom_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u16 = DFT_FONT_SIZE) -> Vec2 {
 	size := measure_text(text_string, font_id, font_size)
 	return {size.x * 0.5, size.y}
 }
-bottom_right_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u16 = 44) -> [2]f32 {
+bottom_right_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u16 = DFT_FONT_SIZE) -> Vec2 {
 	size := measure_text(text_string, font_id, font_size)
 	return size
 }
@@ -244,7 +246,7 @@ bottom_right_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u
 // After calling this, subsequent text draws with an `id` will re-create their cache entries.
 clear_text_cache :: proc() {
 	for _, sdl_text in GLOB.text_cache.cache {
-		sdl_ttf.DestroyText(sdl_text)
+		append(&GLOB.pending_text_releases, sdl_text)
 	}
 	clear(&GLOB.text_cache.cache)
 }
@@ -257,7 +259,7 @@ clear_text_cache_entry :: proc(id: u32) {
 	key := Cache_Key{id, .Custom}
 	sdl_text, ok := GLOB.text_cache.cache[key]
 	if ok {
-		sdl_ttf.DestroyText(sdl_text)
+		append(&GLOB.pending_text_releases, sdl_text)
 		delete_key(&GLOB.text_cache.cache, key)
 	}
 }
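 // Sketch (assumption; the drain side is not shown in this diff): the queued handles would presumably be
 // destroyed once the frame's GPU work no longer references them, mirroring process_pending_texture_releases.
 // The placement of this loop (e.g. at the end of a frame) is hypothetical:
 //   for sdl_text in GLOB.pending_text_releases do sdl_ttf.DestroyText(sdl_text)
 //   clear(&GLOB.pending_text_releases)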
@@ -0,0 +1,414 @@
 package draw
 import "core:log"
 import "core:mem"
 import sdl "vendor:sdl3"
 Texture_Id :: distinct u32
 INVALID_TEXTURE :: Texture_Id(0) // Slot 0 is reserved/unused
 Texture_Kind :: enum u8 {
 	Static, // Uploaded once, never changes (QR codes, decoded PNGs, icons)
 	Dynamic, // Updatable via update_texture_region
 	Stream, // Frequent full re-uploads (video, procedural)
 }
 Sampler_Preset :: enum u8 {
 	Nearest_Clamp,
 	Linear_Clamp,
 	Nearest_Repeat,
 	Linear_Repeat,
 }
 SAMPLER_PRESET_COUNT :: 4
 Fit_Mode :: enum u8 {
 	Stretch, // Fill rect, may distort aspect ratio (default)
 	Fit, // Preserve aspect, letterbox (may leave margins)
 	Fill, // Preserve aspect, center-crop (may crop edges)
 	Tile, // Repeat at native texture size
 	Center, // 1:1 pixel size, centered, no scaling
 }
 Texture_Desc :: struct {
 	width:           u32,
 	height:          u32,
 	depth_or_layers: u32,
 	type:            sdl.GPUTextureType,
 	format:          sdl.GPUTextureFormat,
 	usage:           sdl.GPUTextureUsageFlags,
 	mip_levels:      u32,
 	kind:            Texture_Kind,
 }
 // Internal slot — not exported.
@(private)
 Texture_Slot :: struct {
 	gpu_texture: ^sdl.GPUTexture,
 	desc:        Texture_Desc,
 	generation:  u32,
 }
 // State stored in GLOB
 // This file references:
 //   GLOB.device                 : ^sdl.GPUDevice
 //   GLOB.texture_slots          : [dynamic]Texture_Slot
 //   GLOB.texture_free_list      : [dynamic]u32
 //   GLOB.pending_texture_releases : [dynamic]Texture_Id
 //   GLOB.samplers               : [SAMPLER_PRESET_COUNT]^sdl.GPUSampler
 Clay_Image_Data :: struct {
 	texture_id: Texture_Id,
 	fit:        Fit_Mode,
 	tint:       Color,
 }
 clay_image_data :: proc(id: Texture_Id, fit: Fit_Mode = .Stretch, tint: Color = WHITE) -> Clay_Image_Data {
 	return {texture_id = id, fit = fit, tint = tint}
 }
 // ---------------------------------------------------------------------------------------------------------------------
 // ----- Registration -------------
 // ---------------------------------------------------------------------------------------------------------------------
 // Register a texture. Draw owns the GPU resource and releases it on unregister.
 // `data` is tightly-packed row-major bytes matching desc.format.
 // The caller may free `data` immediately after this proc returns.
@(require_results)
 register_texture :: proc(desc: Texture_Desc, data: []u8) -> (id: Texture_Id, ok: bool) {
 	device := GLOB.device
 	if device == nil {
 		log.error("register_texture called before draw.init()")
 		return INVALID_TEXTURE, false
 	}
 	assert(desc.width > 0, "Texture_Desc.width must be > 0")
 	assert(desc.height > 0, "Texture_Desc.height must be > 0")
 	assert(desc.depth_or_layers > 0, "Texture_Desc.depth_or_layers must be > 0")
 	assert(desc.mip_levels > 0, "Texture_Desc.mip_levels must be > 0")
 	assert(desc.usage != {}, "Texture_Desc.usage must not be empty (e.g. {.SAMPLER})")
 	// Create the GPU texture
 	gpu_texture := sdl.CreateGPUTexture(
 		device,
 		sdl.GPUTextureCreateInfo {
 			type = desc.type,
 			format = desc.format,
 			usage = desc.usage,
 			width = desc.width,
 			height = desc.height,
 			layer_count_or_depth = desc.depth_or_layers,
 			num_levels = desc.mip_levels,
 			sample_count = ._1,
 		},
 	)
 	if gpu_texture == nil {
 		log.errorf("Failed to create GPU texture (%dx%d): %s", desc.width, desc.height, sdl.GetError())
 		return INVALID_TEXTURE, false
 	}
 	// Upload pixel data via a transfer buffer
 	if len(data) > 0 {
 		data_size := u32(len(data))
 		transfer := sdl.CreateGPUTransferBuffer(
 			device,
 			sdl.GPUTransferBufferCreateInfo{usage = .UPLOAD, size = data_size},
 		)
 		if transfer == nil {
 			log.errorf("Failed to create texture transfer buffer: %s", sdl.GetError())
 			sdl.ReleaseGPUTexture(device, gpu_texture)
 			return INVALID_TEXTURE, false
 		}
 		defer sdl.ReleaseGPUTransferBuffer(device, transfer)
 		mapped := sdl.MapGPUTransferBuffer(device, transfer, false)
 		if mapped == nil {
 			log.errorf("Failed to map texture transfer buffer: %s", sdl.GetError())
 			sdl.ReleaseGPUTexture(device, gpu_texture)
 			return INVALID_TEXTURE, false
 		}
 		mem.copy(mapped, raw_data(data), int(data_size))
 		sdl.UnmapGPUTransferBuffer(device, transfer)
 		cmd_buffer := sdl.AcquireGPUCommandBuffer(device)
 		if cmd_buffer == nil {
 			log.errorf("Failed to acquire command buffer for texture upload: %s", sdl.GetError())
 			sdl.ReleaseGPUTexture(device, gpu_texture)
 			return INVALID_TEXTURE, false
 		}
 		copy_pass := sdl.BeginGPUCopyPass(cmd_buffer)
 		sdl.UploadToGPUTexture(
 			copy_pass,
 			sdl.GPUTextureTransferInfo{transfer_buffer = transfer},
 			sdl.GPUTextureRegion{texture = gpu_texture, w = desc.width, h = desc.height, d = desc.depth_or_layers},
 			false,
 		)
 		sdl.EndGPUCopyPass(copy_pass)
 		if !sdl.SubmitGPUCommandBuffer(cmd_buffer) {
 			log.errorf("Failed to submit texture upload: %s", sdl.GetError())
 			sdl.ReleaseGPUTexture(device, gpu_texture)
 			return INVALID_TEXTURE, false
 		}
 	}
 	// Allocate a slot (reuse from free list or append)
 	slot_index: u32
 	if len(GLOB.texture_free_list) > 0 {
 		slot_index = pop(&GLOB.texture_free_list)
 		GLOB.texture_slots[slot_index] = Texture_Slot {
 			gpu_texture = gpu_texture,
 			desc        = desc,
 			generation  = GLOB.texture_slots[slot_index].generation + 1,
 		}
 	} else {
 		slot_index = u32(len(GLOB.texture_slots))
 		append(&GLOB.texture_slots, Texture_Slot{gpu_texture = gpu_texture, desc = desc, generation = 1})
 	}
 	return Texture_Id(slot_index), true
 }
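 // Usage sketch (illustrative; the 2x2 RGBA8 pixel data and the type/format/usage choices are assumptions):
 //   pixels := [16]u8{255, 0, 0, 255, 0, 255, 0, 255, 0, 0, 255, 255, 255, 255, 255, 255}
 //   desc := Texture_Desc {
 //   	width = 2, height = 2, depth_or_layers = 1, mip_levels = 1,
 //   	type = .D2, format = .R8G8B8A8_UNORM, usage = {.SAMPLER}, kind = .Static,
 //   }
 //   id, ok := register_texture(desc, pixels[:])
 //   if !ok do return
 //   defer unregister_texture(id) // Only queues the release; the GPU resource is freed later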
 // Queue a texture for release at the end of the current frame.
 // The GPU resource is not freed immediately — see "Deferred release" in the README.
 unregister_texture :: proc(id: Texture_Id) {
 	if id == INVALID_TEXTURE do return
 	append(&GLOB.pending_texture_releases, id)
 }
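 // Re-upload a sub-region of a Dynamic texture.
 // `region` is in texel coordinates; `data` is tightly-packed row-major bytes covering exactly that region.
 // `INVALID_TEXTURE`, already-released ids, and empty `data` are ignored.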
 update_texture_region :: proc(id: Texture_Id, region: Rectangle, data: []u8) {
 	if id == INVALID_TEXTURE do return
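 	idx := u32(id)
 	if idx >= u32(len(GLOB.texture_slots)) do return // Out-of-range id, same guard as texture_gpu_handle
 	slot := &GLOB.texture_slots[idx]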
 	if slot.gpu_texture == nil do return
 	device := GLOB.device
 	data_size := u32(len(data))
 	if data_size == 0 do return
 	transfer := sdl.CreateGPUTransferBuffer(
 		device,
 		sdl.GPUTransferBufferCreateInfo{usage = .UPLOAD, size = data_size},
 	)
 	if transfer == nil {
 		log.errorf("Failed to create transfer buffer for texture region update: %s", sdl.GetError())
 		return
 	}
 	defer sdl.ReleaseGPUTransferBuffer(device, transfer)
 	mapped := sdl.MapGPUTransferBuffer(device, transfer, false)
 	if mapped == nil {
 		log.errorf("Failed to map transfer buffer for texture region update: %s", sdl.GetError())
 		return
 	}
 	mem.copy(mapped, raw_data(data), int(data_size))
 	sdl.UnmapGPUTransferBuffer(device, transfer)
 	cmd_buffer := sdl.AcquireGPUCommandBuffer(device)
 	if cmd_buffer == nil {
 		log.errorf("Failed to acquire command buffer for texture region update: %s", sdl.GetError())
 		return
 	}
 	copy_pass := sdl.BeginGPUCopyPass(cmd_buffer)
 	sdl.UploadToGPUTexture(
 		copy_pass,
 		sdl.GPUTextureTransferInfo{transfer_buffer = transfer},
 		sdl.GPUTextureRegion {
 			texture = slot.gpu_texture,
 			x = u32(region.x),
 			y = u32(region.y),
 			w = u32(region.width),
 			h = u32(region.height),
 			d = 1,
 		},
 		false,
 	)
 	sdl.EndGPUCopyPass(copy_pass)
 	if !sdl.SubmitGPUCommandBuffer(cmd_buffer) {
 		log.errorf("Failed to submit texture region update: %s", sdl.GetError())
 	}
 }
 // ---------------------------------------------------------------------------------------------------------------------
 // ----- Helpers -------------
 // ---------------------------------------------------------------------------------------------------------------------
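 // Compute UV rect, recommended sampler, and inner rect for a given fit mode.
 // `rect` is the target drawing area; `texture_id` identifies the texture whose
 // pixel dimensions are looked up via texture_size().
 // `uv_rect` holds UV corners {u0, v0, u1, v1} in the Rectangle fields {x, y, width, height}, not an extent.
 // For Fit mode, `inner_rect` may be smaller than `rect` (centered). For all other modes, `inner_rect == rect`.
 // If the texture or `rect` has a zero dimension, the full-texture Stretch result is returned.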
 fit_params :: proc(
 	fit: Fit_Mode,
 	rect: Rectangle,
 	texture_id: Texture_Id,
 ) -> (
 	uv_rect: Rectangle,
 	sampler: Sampler_Preset,
 	inner_rect: Rectangle,
 ) {
 	size := texture_size(texture_id)
 	texture_width := f32(size.x)
 	texture_height := f32(size.y)
 	rect_width := rect.width
 	rect_height := rect.height
 	inner_rect = rect
 	if texture_width == 0 || texture_height == 0 || rect_width == 0 || rect_height == 0 {
 		return {0, 0, 1, 1}, .Linear_Clamp, inner_rect
 	}
 	texture_aspect := texture_width / texture_height
 	rect_aspect := rect_width / rect_height
 	switch fit {
 	case .Stretch: return {0, 0, 1, 1}, .Linear_Clamp, inner_rect
 	case .Fill:
 		if texture_aspect > rect_aspect {
 			// Texture wider than rect: crop sides
 			scale := rect_aspect / texture_aspect
 			margin := (1 - scale) * 0.5
 			return {margin, 0, 1 - margin, 1}, .Linear_Clamp, inner_rect
 		} else {
 			// Texture taller than rect: crop top/bottom
 			scale := texture_aspect / rect_aspect
 			margin := (1 - scale) * 0.5
 			return {0, margin, 1, 1 - margin}, .Linear_Clamp, inner_rect
 		}
 	case .Fit:
 		// Preserve aspect, fit inside rect. Returns a shrunken inner_rect.
 		if texture_aspect > rect_aspect {
 			// Image wider — letterbox top/bottom
 			fit_height := rect_width / texture_aspect
 			padding := (rect_height - fit_height) * 0.5
 			inner_rect = Rectangle{rect.x, rect.y + padding, rect_width, fit_height}
 		} else {
 			// Image taller — letterbox left/right
 			fit_width := rect_height * texture_aspect
 			padding := (rect_width - fit_width) * 0.5
 			inner_rect = Rectangle{rect.x + padding, rect.y, fit_width, rect_height}
 		}
 		return {0, 0, 1, 1}, .Linear_Clamp, inner_rect
 	case .Tile:
 		uv_width := rect_width / texture_width
 		uv_height := rect_height / texture_height
 		return {0, 0, uv_width, uv_height}, .Linear_Repeat, inner_rect
 	case .Center:
 		u_half := rect_width / (2 * texture_width)
 		v_half := rect_height / (2 * texture_height)
 		return {0.5 - u_half, 0.5 - v_half, 0.5 + u_half, 0.5 + v_half}, .Nearest_Clamp, inner_rect
 	}
 	return {0, 0, 1, 1}, .Linear_Clamp, inner_rect
 }
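 // Worked example (illustrative numbers, not from this change): a 200x100 texture drawn into the
 // 100x100 rect {0, 0, 100, 100}, so texture_aspect = 2 and rect_aspect = 1.
 //   .Stretch: uv = {0, 0, 1, 1},       inner_rect = rect
 //   .Fill:    scale = 1/2, margin = 0.25, uv = {0.25, 0, 0.75, 1}, inner_rect = rect
 //   .Fit:     fit_height = 100/2 = 50, padding = (100 - 50)/2 = 25, uv = {0, 0, 1, 1},
 //             inner_rect = {0, 25, 100, 50}
 //   .Tile:    uv = {0, 0, 100/200, 100/100} = {0, 0, 0.5, 1}, sampler = .Linear_Repeat
 //   .Center:  u_half = 100/(2*200) = 0.25, v_half = 100/(2*100) = 0.5,
 //             uv = {0.25, 0, 0.75, 1}, sampler = .Nearest_Clamp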
 texture_size :: proc(id: Texture_Id) -> [2]u32 {
 	if id == INVALID_TEXTURE do return {0, 0}
 	slot := &GLOB.texture_slots[u32(id)]
 	return {slot.desc.width, slot.desc.height}
 }
 texture_format :: proc(id: Texture_Id) -> sdl.GPUTextureFormat {
 	if id == INVALID_TEXTURE do return .INVALID
 	return GLOB.texture_slots[u32(id)].desc.format
 }
 texture_kind :: proc(id: Texture_Id) -> Texture_Kind {
 	if id == INVALID_TEXTURE do return .Static
 	return GLOB.texture_slots[u32(id)].desc.kind
 }
 // Internal: get the raw GPU texture pointer for binding during draw.
@(private)
 texture_gpu_handle :: proc(id: Texture_Id) -> ^sdl.GPUTexture {
 	if id == INVALID_TEXTURE do return nil
 	idx := u32(id)
 	if idx >= u32(len(GLOB.texture_slots)) do return nil
 	return GLOB.texture_slots[idx].gpu_texture
 }
 // Deferred release (called from draw.end / clear_global)
@(private)
 process_pending_texture_releases :: proc() {
 	device := GLOB.device
 	for id in GLOB.pending_texture_releases {
 		idx := u32(id)
 		if idx >= u32(len(GLOB.texture_slots)) do continue
 		slot := &GLOB.texture_slots[idx]
 		if slot.gpu_texture != nil {
 			sdl.ReleaseGPUTexture(device, slot.gpu_texture)
 			slot.gpu_texture = nil
 		}
 		slot.generation += 1
 		append(&GLOB.texture_free_list, idx)
 	}
 	clear(&GLOB.pending_texture_releases)
 }
@(private)
 get_sampler :: proc(preset: Sampler_Preset) -> ^sdl.GPUSampler {
 	idx := int(preset)
 	if GLOB.samplers[idx] != nil do return GLOB.samplers[idx]
 	// Lazily create
 	min_filter, mag_filter: sdl.GPUFilter
 	address_mode: sdl.GPUSamplerAddressMode
 	switch preset {
 	case .Nearest_Clamp:
 		min_filter = .NEAREST; mag_filter = .NEAREST; address_mode = .CLAMP_TO_EDGE
 	case .Linear_Clamp:
 		min_filter = .LINEAR; mag_filter = .LINEAR; address_mode = .CLAMP_TO_EDGE
 	case .Nearest_Repeat:
 		min_filter = .NEAREST; mag_filter = .NEAREST; address_mode = .REPEAT
 	case .Linear_Repeat:
 		min_filter = .LINEAR; mag_filter = .LINEAR; address_mode = .REPEAT
 	}
 	sampler := sdl.CreateGPUSampler(
 		GLOB.device,
 		sdl.GPUSamplerCreateInfo {
 			min_filter = min_filter,
 			mag_filter = mag_filter,
 			mipmap_mode = .LINEAR,
 			address_mode_u = address_mode,
 			address_mode_v = address_mode,
 			address_mode_w = address_mode,
 		},
 	)
 	if sampler == nil {
 		log.errorf("Failed to create sampler preset %v: %s", preset, sdl.GetError())
 		return GLOB.pipeline_2d_base.sampler // fallback to existing default sampler
 	}
 	GLOB.samplers[idx] = sampler
 	return sampler
 }
 // Internal: destroy all sampler pool entries. Called from draw.destroy().
@(private)
 destroy_sampler_pool :: proc() {
 	device := GLOB.device
 	for &s in GLOB.samplers {
 		if s != nil {
 			sdl.ReleaseGPUSampler(device, s)
 			s = nil
 		}
 	}
 }
 // Internal: destroy all registered textures. Called from draw.destroy().
@(private)
 destroy_all_textures :: proc() {
 	device := GLOB.device
 	for &slot in GLOB.texture_slots {
 		if slot.gpu_texture != nil {
 			sdl.ReleaseGPUTexture(device, slot.gpu_texture)
 			slot.gpu_texture = nil
 		}
 	}
 	delete(GLOB.texture_slots)
 	delete(GLOB.texture_free_list)
 	delete(GLOB.pending_texture_releases)
 }
@@ -2,6 +2,7 @@ package many_bits
 import "base:builtin"
 import "base:intrinsics"
 import "base:runtime"
 import "core:fmt"
 import "core:slice"
@@ -25,15 +26,20 @@ Bits :: struct {
 	length:    int, // Total number of bits being stored
 }
-delete :: proc(bits: Bits, allocator := context.allocator) {
-	delete_slice(bits.int_array, allocator)
+destroy :: proc(bits: Bits, allocator := context.allocator) -> runtime.Allocator_Error {
+	return delete_slice(bits.int_array, allocator)
 }
-make :: proc(#any_int length: int, allocator := context.allocator) -> Bits {
-	return Bits {
-		int_array = make_slice([]Int_Bits, ((length - 1) >> INDEX_SHIFT) + 1, allocator),
-		length = length,
-	}
+create :: proc(
+	#any_int length: int,
+	allocator := context.allocator,
+) -> (
+	bits: Bits,
+	err: runtime.Allocator_Error,
+) #optional_allocator_error {
+	bits.int_array, err = make_slice([]Int_Bits, ((length - 1) >> INDEX_SHIFT) + 1, allocator)
+	bits.length = length
+	return bits, err
 }
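 // Usage sketch (illustrative): because of #optional_allocator_error, callers may either
 // capture the allocation error or omit it, as the tests in this file do.
 //   bits, err := create(128) // Capture and propagate the error
 //   if err != nil do return err
 //   defer destroy(bits)
 //   scratch := create(64, context.temp_allocator) // Error omitted; storage belongs to the temp allocator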
 // Sets all bits to 0 (false)
@@ -507,8 +513,8 @@ import "core:testing"
@(test)
 test_set :: proc(t: ^testing.T) {
-	bits := make(128)
+	bits := create(128)
-	defer delete(bits)
+	defer destroy(bits)
 	set(bits, 0, true)
 	testing.expect_value(t, bits.int_array[0], Int_Bits{0})
@@ -524,8 +530,8 @@ test_set :: proc(t: ^testing.T) {
@(test)
 test_get :: proc(t: ^testing.T) {
-	bits := make(128)
+	bits := create(128)
-	defer delete(bits)
+	defer destroy(bits)
 	// Default is false
 	testing.expect(t, !get(bits, 0))
@@ -560,8 +566,8 @@ test_get :: proc(t: ^testing.T) {
@(test)
 test_set_true_set_false :: proc(t: ^testing.T) {
-	bits := make(128)
+	bits := create(128)
-	defer delete(bits)
+	defer destroy(bits)
 	// set_true within first uint
 	set_true(bits, 0)
@@ -605,8 +611,8 @@ all_true_test :: proc(t: ^testing.T) {
 	uint_max := UINT_MAX
 	all_ones := transmute(Int_Bits)uint_max
-	bits := make(132)
+	bits := create(132)
-	defer delete(bits)
+	defer destroy(bits)
 	bits.int_array[0] = all_ones
 	bits.int_array[1] = all_ones
@@ -616,8 +622,8 @@ all_true_test :: proc(t: ^testing.T) {
 	bits.int_array[2] = {0, 1, 2}
 	testing.expect(t, !all_true(bits))
-	bits2 := make(1)
+	bits2 := create(1)
-	defer delete(bits2)
+	defer destroy(bits2)
 	bits2.int_array[0] = {0}
 	testing.expect(t, all_true(bits2))
@@ -628,8 +634,8 @@ test_range_true :: proc(t: ^testing.T) {
 	uint_max := UINT_MAX
 	all_ones := transmute(Int_Bits)uint_max
-	bits := make(192)
+	bits := create(192)
-	defer delete(bits)
+	defer destroy(bits)
 	// Empty range is vacuously true
 	testing.expect(t, range_true(bits, 0, 0))
@@ -676,7 +682,7 @@ test_range_true :: proc(t: ^testing.T) {
@(test)
 nearest_true_handles_same_word_and_boundaries :: proc(t: ^testing.T) {
-	bits := make(128, context.temp_allocator)
+	bits := create(128, context.temp_allocator)
 	set_true(bits, 0)
 	set_true(bits, 10)
@@ -710,7 +716,7 @@ nearest_true_handles_same_word_and_boundaries :: proc(t: ^testing.T) {
@(test)
 nearest_false_handles_same_word_and_boundaries :: proc(t: ^testing.T) {
-	bits := make(128, context.temp_allocator)
+	bits := create(128, context.temp_allocator)
 	// Start with all bits true, then clear a few to false.
 	for i := 0; i < bits.length; i += 1 {
@@ -749,7 +755,7 @@ nearest_false_handles_same_word_and_boundaries :: proc(t: ^testing.T) {
@(test)
 nearest_false_scans_across_words_and_returns_false_when_all_true :: proc(t: ^testing.T) {
-	bits := make(192, context.temp_allocator)
+	bits := create(192, context.temp_allocator)
 	// Start with all bits true, then clear a couple far apart.
 	for i := 0; i < bits.length; i += 1 {
@@ -773,7 +779,7 @@ nearest_false_scans_across_words_and_returns_false_when_all_true :: proc(t: ^tes
@(test)
 nearest_true_scans_across_words_and_returns_false_when_empty :: proc(t: ^testing.T) {
-	bits := make(192, context.temp_allocator)
+	bits := create(192, context.temp_allocator)
 	set_true(bits, 5)
 	set_true(bits, 130)
@@ -790,7 +796,7 @@ nearest_true_scans_across_words_and_returns_false_when_empty :: proc(t: ^testing
@(test)
 nearest_false_handles_last_word_partial_length :: proc(t: ^testing.T) {
-	bits := make(130, context.temp_allocator)
+	bits := create(130, context.temp_allocator)
 	// Start with all bits true, then clear the first and last valid bits.
 	for i := 0; i < bits.length; i += 1 {
@@ -811,7 +817,7 @@ nearest_false_handles_last_word_partial_length :: proc(t: ^testing.T) {
@(test)
 nearest_true_handles_last_word_partial_length :: proc(t: ^testing.T) {
-	bits := make(130, context.temp_allocator)
+	bits := create(130, context.temp_allocator)
 	set_true(bits, 0)
 	set_true(bits, 129)
@@ -828,7 +834,7 @@ nearest_true_handles_last_word_partial_length :: proc(t: ^testing.T) {
@(test)
 iterator_basic_mixed_bits :: proc(t: ^testing.T) {
 	// Use non-word-aligned length to test partial last word handling
-	bits := make(100, context.temp_allocator)
+	bits := create(100, context.temp_allocator)
 	// Set specific bits: 0, 3, 64, 99 (last valid index)
 	set_true(bits, 0)
@@ -903,7 +909,7 @@ iterator_basic_mixed_bits :: proc(t: ^testing.T) {
@(test)
 iterator_all_false_bits :: proc(t: ^testing.T) {
 	// Use non-word-aligned length
-	bits := make(100, context.temp_allocator)
+	bits := create(100, context.temp_allocator)
 	// All bits default to false, no need to set anything
 	// Test iterate - should return all 100 bits as false
@@ -944,7 +950,7 @@ iterator_all_false_bits :: proc(t: ^testing.T) {
@(test)
 iterator_all_true_bits :: proc(t: ^testing.T) {
 	// Use non-word-aligned length
-	bits := make(100, context.temp_allocator)
+	bits := create(100, context.temp_allocator)
 	// Set all bits to true
 	for i := 0; i < bits.length; i += 1 {
 		set_true(bits, i)
@@ -1,6 +1,8 @@
 package meta
 import "core:fmt"
 import "core:log"
 import "core:mem"
 import "core:os"
 Command :: struct {
@@ -20,6 +22,48 @@ COMMANDS :: []Command {
 }
 main :: proc() {
 	//----- General setup ----------------------------------
 	when ODIN_DEBUG {
 		// Temp
 		track_temp: mem.Tracking_Allocator
 		mem.tracking_allocator_init(&track_temp, context.temp_allocator)
 		context.temp_allocator = mem.tracking_allocator(&track_temp)
 		// Default
 		track: mem.Tracking_Allocator
 		mem.tracking_allocator_init(&track, context.allocator)
 		context.allocator = mem.tracking_allocator(&track)
 		// Log a warning about any memory that was not freed by the end of the program.
 		// This could be fine for some global state or it could be a memory leak.
 		defer {
 			// Temp allocator
 			if len(track_temp.bad_free_array) > 0 {
 				fmt.eprintf("=== %v incorrect frees - temp allocator: ===\n", len(track_temp.bad_free_array))
 				for entry in track_temp.bad_free_array {
 					fmt.eprintf("- %p @ %v\n", entry.memory, entry.location)
 				}
 			}
 			mem.tracking_allocator_destroy(&track_temp)
 			// Default allocator
 			if len(track.allocation_map) > 0 {
 				fmt.eprintf("=== %v allocations not freed - main allocator: ===\n", len(track.allocation_map))
 				for _, entry in track.allocation_map {
 					fmt.eprintf("- %v bytes @ %v\n", entry.size, entry.location)
 				}
 			}
 			if len(track.bad_free_array) > 0 {
 				fmt.eprintf("=== %v incorrect frees - main allocator: ===\n", len(track.bad_free_array))
 				for entry in track.bad_free_array {
 					fmt.eprintf("- %p @ %v\n", entry.memory, entry.location)
 				}
 			}
 			mem.tracking_allocator_destroy(&track)
 		}
 		// Logger
 		context.logger = log.create_console_logger()
 		defer log.destroy_console_logger(context.logger)
 	}
 	args := os.args[1:]
 	if len(args) == 0 {
@@ -4,7 +4,8 @@
 package phased_executor
 import "base:intrinsics"
-import q "core:container/queue"
+import "base:runtime"
 import que "core:container/queue"
 import "core:prof/spall"
 import "core:sync"
 import "core:thread"
@@ -18,7 +19,7 @@ DEFT_SPIN_LIMIT :: 2_500_000
 Harness :: struct($T: typeid) where intrinsics.type_has_nil(T) {
 	mutex:      sync.Mutex,
 	condition:  sync.Cond,
-	cmd_queue:  q.Queue(T),
+	cmd_queue:  que.Queue(T),
 	spin:       bool,
 	lock:       levsync.Spinlock,
 	_pad:       [64 - size_of(uint)]u8, // We want join_count to have its own cache line
@@ -42,13 +43,13 @@ Executor :: struct($T: typeid) where intrinsics.type_has_nil(T) {
 }
 //TODO: Provide a way to set some aspects of context for the executor threads. Namely a logger.
-init_executor :: proc(
+init :: proc(
 	executor: ^Executor($T),
 	#any_int num_threads: int,
 	$on_command_received: proc(command: T),
 	#any_int spin_limit: uint = DEFT_SPIN_LIMIT,
 	allocator := context.allocator,
-) {
+) -> runtime.Allocator_Error {
 	was_initialized, _ := intrinsics.atomic_compare_exchange_strong_explicit(
 		&executor.initialized,
 		false,
@@ -60,9 +61,9 @@ init_executor :: proc(
 	slave_task := build_task(on_command_received)
 	executor.spin_limit = spin_limit
-	executor.harnesses = make([]Harness(T), num_threads, allocator)
+	executor.harnesses = make([]Harness(T), num_threads, allocator) or_return
 	for &harness in executor.harnesses {
-		q.init(&harness.cmd_queue, allocator = allocator)
+		que.init(&harness.cmd_queue, allocator = allocator) or_return
 		harness.spin = true
 	}
@@ -72,11 +73,11 @@ init_executor :: proc(
 	}
 	thread.pool_start(&executor.thread_pool)
-	return
+	return nil
 }
 // Cleanly shuts down all executor tasks then destroys the executor
-destroy_executor :: proc(executor: ^Executor($T), allocator := context.allocator) {
+destroy :: proc(executor: ^Executor($T), allocator := context.allocator) -> runtime.Allocator_Error {
 	was_initialized, _ := intrinsics.atomic_compare_exchange_strong_explicit(
 		&executor.initialized,
 		true,
@@ -90,7 +91,7 @@ destroy_executor :: proc(executor: ^Executor($T), allocator := context.allocator
 	for &harness in executor.harnesses {
 		for {
 			if levsync.try_lock(&harness.lock) {
-				q.push_back(&harness.cmd_queue, nil)
+				que.push_back(&harness.cmd_queue, nil)
 				if !harness.spin {
 					sync.mutex_lock(&harness.mutex)
 					sync.cond_signal(&harness.condition)
@@ -105,9 +106,11 @@ destroy_executor :: proc(executor: ^Executor($T), allocator := context.allocator
 	thread.pool_join(&executor.thread_pool)
 	thread.pool_destroy(&executor.thread_pool)
 	for &harness in executor.harnesses {
-		q.destroy(&harness.cmd_queue)
+		que.destroy(&harness.cmd_queue)
 	}
-	delete(executor.harnesses, allocator)
+	delete(executor.harnesses, allocator) or_return
 	return nil
 }
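 // Usage sketch (illustrative; `Cmd` and `handle_cmd` are hypothetical caller-side names, and `Cmd`
 // must be a type that has nil, since nil is the shutdown sentinel):
 //   executor: Executor(Cmd)
 //   if err := init(&executor, 4, handle_cmd); err != nil {
 //   	return err // Allocation of the harnesses or their queues failed
 //   }
 //   defer destroy(&executor)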
 build_task :: proc(
@@ -131,10 +134,10 @@ build_task :: proc(
 			spin_count: uint = 0
 			spin_loop: for {
 				if levsync.try_lock(&harness.lock) {
-					if q.len(harness.cmd_queue) > 0 {
+					if que.len(harness.cmd_queue) > 0 {
 						// Execute command
-						command := q.pop_front(&harness.cmd_queue)
+						command := que.pop_front(&harness.cmd_queue)
 						levsync.unlock(&harness.lock)
 						if command == nil do return
 						on_command_received(command)
@@ -163,7 +166,7 @@ build_task :: proc(
 					defer intrinsics.cpu_relax()
 					if levsync.try_lock(&harness.lock) {
 						defer levsync.unlock(&harness.lock)
-						if q.len(harness.cmd_queue) > 0 {
+						if que.len(harness.cmd_queue) > 0 {
 							harness.spin = true
 							break cond_loop
 						} else {
@@ -190,9 +193,9 @@ exec_command :: proc(executor: ^Executor($T), command: T) {
 		}
 		harness := &executor.harnesses[executor.harness_index]
 		if levsync.try_lock(&harness.lock) {
-			if q.len(harness.cmd_queue) <= executor.cmd_queue_floor {
+			if que.len(harness.cmd_queue) <= executor.cmd_queue_floor {
-				q.push_back(&harness.cmd_queue, command)
+				que.push_back(&harness.cmd_queue, command)
-				executor.cmd_queue_floor = q.len(harness.cmd_queue)
+				executor.cmd_queue_floor = que.len(harness.cmd_queue)
 				slave_sleeping := !harness.spin
 				// Must release lock before signalling to avoid race from slave spurious wakeup
 				levsync.unlock(&harness.lock)
@@ -258,7 +261,7 @@ stress_test_executor :: proc(t: ^testing.T) {
 	defer free(exec_counts)
 	executor: Executor(Stress_Cmd)
-	init_executor(&executor, STRESS_NUM_THREADS, stress_handler, spin_limit = 500)
+	init(&executor, STRESS_NUM_THREADS, stress_handler, spin_limit = 500)
 	for round in 0 ..< STRESS_NUM_ROUNDS {
 		base := round * STRESS_CMDS_PER_ROUND
@@ -281,6 +284,6 @@ stress_test_executor :: proc(t: ^testing.T) {
 	// Explicitly destroy to verify clean shutdown.
 	// If destroy returns, all threads received the nil sentinel and exited,
 	// and thread.pool_join completed without deadlock.
-	destroy_executor(&executor)
+	destroy(&executor)
 	testing.expect(t, !executor.initialized, "Executor still marked initialized after destroy")
 }
@@ -1,39 +1,32 @@
 package examples
 import "core:fmt"
 import "core:log"
 import "core:mem"
 import "core:os"
 import qr ".."
 main :: proc() {
-	//----- Tracking allocator ----------------------------------
+	//----- General setup ----------------------------------
 	{
 		tracking_temp_allocator := false
 		// Temp
 		track_temp: mem.Tracking_Allocator
-		if tracking_temp_allocator {
+		mem.tracking_allocator_init(&track_temp, context.temp_allocator)
-			mem.tracking_allocator_init(&track_temp, context.temp_allocator)
+		context.temp_allocator = mem.tracking_allocator(&track_temp)
-			context.temp_allocator = mem.tracking_allocator(&track_temp)
+
 		}
 		// Default
 		track: mem.Tracking_Allocator
 		mem.tracking_allocator_init(&track, context.allocator)
 		context.allocator = mem.tracking_allocator(&track)
 		// Log a warning about any memory that was not freed by the end of the program.
 		// This could be fine for some global state or it could be a memory leak.
 		defer {
 			// Temp allocator
-			if tracking_temp_allocator {
+			if len(track_temp.bad_free_array) > 0 {
-				if len(track_temp.allocation_map) > 0 {
+				fmt.eprintf("=== %v incorrect frees - temp allocator: ===\n", len(track_temp.bad_free_array))
-					fmt.eprintf("=== %v allocations not freed - temp allocator: ===\n", len(track_temp.allocation_map))
+				for entry in track_temp.bad_free_array {
-					for _, entry in track_temp.allocation_map {
+					fmt.eprintf("- %p @ %v\n", entry.memory, entry.location)
 						fmt.eprintf("- %v bytes @ %v\n", entry.size, entry.location)
 					}
 				}
 				if len(track_temp.bad_free_array) > 0 {
 					fmt.eprintf("=== %v incorrect frees - temp allocator: ===\n", len(track_temp.bad_free_array))
 					for entry in track_temp.bad_free_array {
 						fmt.eprintf("- %p @ %v\n", entry.memory, entry.location)
 					}
 				}
 				mem.tracking_allocator_destroy(&track_temp)
 			}
@@ -52,6 +45,9 @@ main :: proc() {
 			}
 			mem.tracking_allocator_destroy(&track)
 		}
 		// Logger
 		context.logger = log.create_console_logger()
 		defer log.destroy_console_logger(context.logger)
 	}
 	args := os.args
@@ -73,57 +69,32 @@ main :: proc() {
 	}
 }
 // -------------------------------------------------------------------------------------------------
 // Utilities
 // -------------------------------------------------------------------------------------------------
 // Prints the given QR Code to the console.
 print_qr :: proc(qrcode: []u8) {
 	size := qr.get_size(qrcode)
 	border :: 4
 	for y in -border ..< size + border {
 		for x in -border ..< size + border {
 			fmt.print("##" if qr.get_module(qrcode, x, y) else "  ")
 		}
 		fmt.println()
 	}
 	fmt.println()
 }
 // -------------------------------------------------------------------------------------------------
 // Demo: Basic
 // -------------------------------------------------------------------------------------------------
 // Creates a single QR Code, then prints it to the console.
 basic :: proc() {
 	text :: "Hello, world!"
 	ecl :: qr.Ecc.Low
 	qrcode: [qr.BUFFER_LEN_MAX]u8
-	ok := qr.encode(text, qrcode[:], ecl)
+	ok := qr.encode_auto(text, qrcode[:], ecl)
 	if ok do print_qr(qrcode[:])
 }
 // -------------------------------------------------------------------------------------------------
 // Demo: Variety
 // -------------------------------------------------------------------------------------------------
 // Creates a variety of QR Codes that exercise different features of the library.
 variety :: proc() {
 	qrcode: [qr.BUFFER_LEN_MAX]u8
 	{ 	// Numeric mode encoding (3.33 bits per digit)
-		ok := qr.encode("314159265358979323846264338327950288419716939937510", qrcode[:], qr.Ecc.Medium)
+		ok := qr.encode_auto("314159265358979323846264338327950288419716939937510", qrcode[:], qr.Ecc.Medium)
 		if ok do print_qr(qrcode[:])
 	}
 	{ 	// Alphanumeric mode encoding (5.5 bits per character)
-		ok := qr.encode("DOLLAR-AMOUNT:$39.87 PERCENTAGE:100.00% OPERATIONS:+-*/", qrcode[:], qr.Ecc.High)
+		ok := qr.encode_auto("DOLLAR-AMOUNT:$39.87 PERCENTAGE:100.00% OPERATIONS:+-*/", qrcode[:], qr.Ecc.High)
 		if ok do print_qr(qrcode[:])
 	}
 	{ 	// Unicode text as UTF-8
-		ok := qr.encode(
+		ok := qr.encode_auto(
 			"\xE3\x81\x93\xE3\x82\x93\xE3\x81\xAB\xE3\x81\xA1wa\xE3\x80\x81" +
 			"\xE4\xB8\x96\xE7\x95\x8C\xEF\xBC\x81\x20\xCE\xB1\xCE\xB2\xCE\xB3\xCE\xB4",
 			qrcode[:],
@@ -133,7 +104,7 @@ variety :: proc() {
 	}
 	{ 	// Moderately large QR Code using longer text (from Lewis Carroll's Alice in Wonderland)
-		ok := qr.encode(
+		ok := qr.encode_auto(
 			"Alice was beginning to get very tired of sitting by her sister on the bank, " +
 			"and of having nothing to do: once or twice she had peeped into the book her sister was reading, " +
 			"but it had no pictures or conversations in it, 'and what is the use of a book,' thought Alice " +
@@ -148,10 +119,6 @@ variety :: proc() {
 	}
 }
-// -------------------------------------------------------------------------------------------------
-// Demo: Segment
-// -------------------------------------------------------------------------------------------------
 // Creates QR Codes with manually specified segments for better compactness.
 segment :: proc() {
 	qrcode: [qr.BUFFER_LEN_MAX]u8
@@ -163,7 +130,7 @@ segment :: proc() {
 		// Encode as single text (auto mode selection)
 		{
 			concat :: silver0 + silver1
-			ok := qr.encode(concat, qrcode[:], qr.Ecc.Low)
+			ok := qr.encode_auto(concat, qrcode[:], qr.Ecc.Low)
 			if ok do print_qr(qrcode[:])
 		}
@@ -172,7 +139,7 @@ segment :: proc() {
 			seg_buf0: [qr.BUFFER_LEN_MAX]u8
 			seg_buf1: [qr.BUFFER_LEN_MAX]u8
 			segs := [2]qr.Segment{qr.make_alphanumeric(silver0, seg_buf0[:]), qr.make_numeric(silver1, seg_buf1[:])}
-			ok := qr.encode(segs[:], qr.Ecc.Low, qrcode[:])
+			ok := qr.encode_auto(segs[:], qr.Ecc.Low, qrcode[:])
 			if ok do print_qr(qrcode[:])
 		}
 	}
@@ -185,7 +152,7 @@ segment :: proc() {
 		// Encode as single text (auto mode selection)
 		{
 			concat :: golden0 + golden1 + golden2
-			ok := qr.encode(concat, qrcode[:], qr.Ecc.Low)
+			ok := qr.encode_auto(concat, qrcode[:], qr.Ecc.Low)
 			if ok do print_qr(qrcode[:])
 		}
@@ -201,7 +168,7 @@ segment :: proc() {
 				qr.make_numeric(golden1, seg_buf1[:]),
 				qr.make_alphanumeric(golden2, seg_buf2[:]),
 			}
-			ok := qr.encode(segs[:], qr.Ecc.Low, qrcode[:])
+			ok := qr.encode_auto(segs[:], qr.Ecc.Low, qrcode[:])
 			if ok do print_qr(qrcode[:])
 		}
 	}
@@ -219,7 +186,7 @@ segment :: proc() {
 				"\xEF\xBD\x84\xEF\xBD\x85\xEF\xBD\x93\xEF" +
 				"\xBD\x95\xE3\x80\x80\xCE\xBA\xCE\xB1\xEF" +
 				"\xBC\x9F"
-			ok := qr.encode(madoka, qrcode[:], qr.Ecc.Low)
+			ok := qr.encode_auto(madoka, qrcode[:], qr.Ecc.Low)
 			if ok do print_qr(qrcode[:])
 		}
@@ -254,16 +221,12 @@ segment :: proc() {
 			seg.data = seg_buf[:(seg.bit_length + 7) / 8]
 			segs := [1]qr.Segment{seg}
-			ok := qr.encode(segs[:], qr.Ecc.Low, qrcode[:])
+			ok := qr.encode_auto(segs[:], qr.Ecc.Low, qrcode[:])
 			if ok do print_qr(qrcode[:])
 		}
 	}
 }
-// -------------------------------------------------------------------------------------------------
-// Demo: Mask
-// -------------------------------------------------------------------------------------------------
 // Creates QR Codes with the same size and contents but different mask patterns.
 mask :: proc() {
 	qrcode: [qr.BUFFER_LEN_MAX]u8
@@ -271,10 +234,10 @@ mask :: proc() {
 	{ 	// Project Nayuki URL
 		ok: bool
-		ok = qr.encode("https://www.nayuki.io/", qrcode[:], qr.Ecc.High)
+		ok = qr.encode_auto("https://www.nayuki.io/", qrcode[:], qr.Ecc.High)
 		if ok do print_qr(qrcode[:])
-		ok = qr.encode("https://www.nayuki.io/", qrcode[:], qr.Ecc.High, mask = qr.Mask.M3)
+		ok = qr.encode_auto("https://www.nayuki.io/", qrcode[:], qr.Ecc.High, mask = qr.Mask.M3)
 		if ok do print_qr(qrcode[:])
 	}
@@ -290,16 +253,29 @@ mask :: proc() {
 		ok: bool
-		ok = qr.encode(text, qrcode[:], qr.Ecc.Medium, mask = qr.Mask.M0)
+		ok = qr.encode_auto(text, qrcode[:], qr.Ecc.Medium, mask = qr.Mask.M0)
 		if ok do print_qr(qrcode[:])
-		ok = qr.encode(text, qrcode[:], qr.Ecc.Medium, mask = qr.Mask.M1)
+		ok = qr.encode_auto(text, qrcode[:], qr.Ecc.Medium, mask = qr.Mask.M1)
 		if ok do print_qr(qrcode[:])
-		ok = qr.encode(text, qrcode[:], qr.Ecc.Medium, mask = qr.Mask.M5)
+		ok = qr.encode_auto(text, qrcode[:], qr.Ecc.Medium, mask = qr.Mask.M5)
 		if ok do print_qr(qrcode[:])
-		ok = qr.encode(text, qrcode[:], qr.Ecc.Medium, mask = qr.Mask.M7)
+		ok = qr.encode_auto(text, qrcode[:], qr.Ecc.Medium, mask = qr.Mask.M7)
 		if ok do print_qr(qrcode[:])
 	}
 }
+// Prints the given QR Code to the console.
+print_qr :: proc(qrcode: []u8) {
+	size := qr.get_size(qrcode)
+	border :: 4
+	for y in -border ..< size + border {
+		for x in -border ..< size + border {
+			fmt.print("##" if qr.get_module(qrcode, x, y) else "  ")
+		}
+		fmt.println()
+	}
+	fmt.println()
+}
@@ -2,10 +2,30 @@ package qrcode
 import "core:slice"
+VERSION_MIN :: 1
+VERSION_MAX :: 40
-// -------------------------------------------------------------------------------------------------
+// The worst-case number of bytes needed to store one QR Code, up to and including version 40.
-// Types
+BUFFER_LEN_MAX :: 3918 // buffer_len_for_version(VERSION_MAX)
-// -------------------------------------------------------------------------------------------------
+
+// Returns the number of bytes needed to store any QR Code up to and including the given version.
+buffer_len_for_version :: #force_inline proc(n: int) -> int {
+	size := n * 4 + 17
+	return (size * size + 7) / 8 + 1
+}
+@(private)
+LENGTH_OVERFLOW :: -1
+@(private)
+REED_SOLOMON_DEGREE_MAX :: 30
+@(private)
+PENALTY_N1 :: 3
+@(private)
+PENALTY_N2 :: 3
+@(private)
+PENALTY_N3 :: 40
+@(private)
+PENALTY_N4 :: 10
 // The error correction level in a QR Code symbol.
 Ecc :: enum {
@@ -44,39 +64,6 @@ Segment :: struct {
 	bit_length: int,
 }
-// -------------------------------------------------------------------------------------------------
-// Constants
-// -------------------------------------------------------------------------------------------------
-VERSION_MIN :: 1
-VERSION_MAX :: 40
-// The worst-case number of bytes needed to store one QR Code, up to and including version 40.
-BUFFER_LEN_MAX :: 3918 // buffer_len_for_version(VERSION_MAX)
-// Returns the number of bytes needed to store any QR Code up to and including the given version.
-buffer_len_for_version :: #force_inline proc(n: int) -> int {
-	size := n * 4 + 17
-	return (size * size + 7) / 8 + 1
-}
-// -------------------------------------------------------------------------------------------------
-// Private constants
-// -------------------------------------------------------------------------------------------------
-@(private)
-LENGTH_OVERFLOW :: -1
-@(private)
-REED_SOLOMON_DEGREE_MAX :: 30
-@(private)
-PENALTY_N1 :: 3
-@(private)
-PENALTY_N2 :: 3
-@(private)
-PENALTY_N3 :: 40
-@(private)
-PENALTY_N4 :: 10
 //odinfmt: disable
 // For generating error correction codes. Index 0 is padding (set to illegal value).
@(private)
@@ -96,10 +83,9 @@ NUM_ERROR_CORRECTION_BLOCKS := [4][41]i8{
 }
 //odinfmt: enable
-
-// -------------------------------------------------------------------------------------------------
-// Encode procedures
-// -------------------------------------------------------------------------------------------------
+// ---------------------------------------------------------------------------------------------------------------------
+// ----- Encode Procedures ------------------------
+// ---------------------------------------------------------------------------------------------------------------------
 // Encodes the given text string to a QR Code, automatically selecting
 // numeric, alphanumeric, or byte mode based on content.
@@ -117,7 +103,7 @@ NUM_ERROR_CORRECTION_BLOCKS := [4][41]i8{
 //   - The text cannot fit in any version within [min_version, max_version] at the given ECL.
 //   - The encoded segment data exceeds the buffer capacity.
@(require_results)
-encode_text_explicit_temp :: proc(
+encode_text_manual :: proc(
 	text: string,
 	temp_buffer, qrcode: []u8,
 	ecl: Ecc,
@@ -130,7 +116,7 @@ encode_text_explicit_temp :: proc(
 ) {
 	text_len := len(text)
 	if text_len == 0 {
-		return encode_segments_advanced_explicit_temp(
+		return encode_segments_advanced_manual(
 			nil,
 			ecl,
 			min_version,
@@ -162,7 +148,7 @@ encode_text_explicit_temp :: proc(
 			seg.data = temp_buffer[:text_len]
 		}
 		segs := [1]Segment{seg}
-		return encode_segments_advanced_explicit_temp(
+		return encode_segments_advanced_manual(
 			segs[:],
 			ecl,
 			min_version,
@@ -211,13 +197,9 @@ encode_text_auto :: proc(
 		return false
 	}
 	defer delete(temp_buffer, temp_allocator)
-	return encode_text_explicit_temp(text, temp_buffer, qrcode, ecl, min_version, max_version, mask, boost_ecl)
+	return encode_text_manual(text, temp_buffer, qrcode, ecl, min_version, max_version, mask, boost_ecl)
 }
-encode_text :: proc {
-	encode_text_explicit_temp,
-	encode_text_auto,
-}
 // Encodes arbitrary binary data to a QR Code using byte mode.
 //
@@ -234,7 +216,7 @@ encode_text :: proc {
 // Returns ok=false when:
 //   - The payload cannot fit in any version within [min_version, max_version] at the given ECL.
@(require_results)
-encode_binary :: proc(
+encode_binary_manual :: proc(
 	data_and_temp: []u8,
 	data_len: int,
 	qrcode: []u8,
@@ -256,7 +238,7 @@ encode_binary :: proc(
 	seg.num_chars = data_len
 	seg.data = data_and_temp[:data_len]
 	segs := [1]Segment{seg}
-	return encode_segments_advanced(
+	return encode_segments_advanced_manual(
 		segs[:],
 		ecl,
 		min_version,
@@ -268,6 +250,55 @@ encode_binary :: proc(
 	)
 }
+// Encodes arbitrary binary data to a QR Code using byte mode,
+// automatically allocating and freeing the temp buffer.
+//
+// Parameters:
+//   bin_data       - [in]  Payload bytes (aliased by the internal segment; not modified).
+//   qrcode         - [out] On success, contains the encoded QR Code. On failure, qrcode[0] is
+//                          set to 0.
+//   temp_allocator - Allocator used for the internal scratch buffer. Freed before return.
+//
+// qrcode must have length >= buffer_len_for_version(max_version).
+//
+// Returns ok=false when:
+//   - The payload cannot fit in any version within [min_version, max_version] at the given ECL.
+//   - The temp_allocator fails to allocate.
+@(require_results)
+encode_binary_auto :: proc(
+	bin_data: []u8,
+	qrcode: []u8,
+	ecl: Ecc,
+	min_version: int = VERSION_MIN,
+	max_version: int = VERSION_MAX,
+	mask: Maybe(Mask) = nil,
+	boost_ecl: bool = true,
+	temp_allocator := context.temp_allocator,
+) -> (
+	ok: bool,
+) {
+	seg: Segment
+	seg.mode = .Byte
+	seg.bit_length = calc_segment_bit_length(.Byte, len(bin_data))
+	if seg.bit_length == LENGTH_OVERFLOW {
+		qrcode[0] = 0
+		return false
+	}
+	seg.num_chars = len(bin_data)
+	seg.data = bin_data
+	segs := [1]Segment{seg}
+	return encode_segments_advanced_auto(
+		segs[:],
+		ecl,
+		min_version,
+		max_version,
+		mask,
+		boost_ecl,
+		qrcode,
+		temp_allocator,
+	)
+}
 // Encodes the given segments to a QR Code using default parameters
 // (VERSION_MIN..VERSION_MAX, auto mask, boost ECL).
 //
@@ -282,17 +313,8 @@ encode_binary :: proc(
 // Returns ok=false when:
 //   - The total segment data exceeds the capacity of version 40 at the given ECL.
@(require_results)
-encode_segments_explicit_temp :: proc(segs: []Segment, ecl: Ecc, temp_buffer, qrcode: []u8) -> (ok: bool) {
+encode_segments_manual :: proc(segs: []Segment, ecl: Ecc, temp_buffer, qrcode: []u8) -> (ok: bool) {
-	return encode_segments_advanced_explicit_temp(
+	return encode_segments_advanced_manual(segs, ecl, VERSION_MIN, VERSION_MAX, nil, true, temp_buffer, qrcode)
-		segs,
-		ecl,
-		VERSION_MIN,
-		VERSION_MAX,
-		nil,
-		true,
-		temp_buffer,
-		qrcode,
-	)
 }
 // Encodes segments to a QR Code using default parameters, automatically allocating the temp buffer.
@@ -328,13 +350,9 @@ encode_segments_auto :: proc(
 		return false
 	}
 	defer delete(temp_buffer, temp_allocator)
-	return encode_segments_explicit_temp(segs, ecl, temp_buffer, qrcode)
+	return encode_segments_manual(segs, ecl, temp_buffer, qrcode)
 }
-encode_segments :: proc {
-	encode_segments_explicit_temp,
-	encode_segments_auto,
-}
 // Encodes the given segments to a QR Code with full control over version range, mask, and ECL boosting.
 //
@@ -353,7 +371,7 @@ encode_segments :: proc {
 //   - The total segment data exceeds the capacity of every version in [min_version, max_version]
 //     at the given ECL.
@(require_results)
-encode_segments_advanced_explicit_temp :: proc(
+encode_segments_advanced_manual :: proc(
 	segs: []Segment,
 	ecl: Ecc,
 	min_version, max_version: int,
@@ -490,7 +508,7 @@ encode_segments_advanced_auto :: proc(
 		return false
 	}
 	defer delete(temp_buffer, temp_allocator)
-	return encode_segments_advanced_explicit_temp(
+	return encode_segments_advanced_manual(
 		segs,
 		ecl,
 		min_version,
@@ -502,24 +520,24 @@ encode_segments_advanced_auto :: proc(
 	)
 }
-encode_segments_advanced :: proc {
-	encode_segments_advanced_explicit_temp,
-	encode_segments_advanced_auto,
+encode_manual :: proc {
+	encode_text_manual,
+	encode_binary_manual,
+	encode_segments_manual,
+	encode_segments_advanced_manual,
 }
-encode :: proc {
-	encode_text_explicit_temp,
+encode_auto :: proc {
 	encode_text_auto,
-	encode_binary,
-	encode_segments_explicit_temp,
+	encode_binary_auto,
 	encode_segments_auto,
-	encode_segments_advanced_explicit_temp,
 	encode_segments_advanced_auto,
 }
-// -------------------------------------------------------------------------------------------------
-// Error correction code generation
-// -------------------------------------------------------------------------------------------------
+
+// ---------------------------------------------------------------------------------------------------------------------
+// ----- Error Correction Code Generation ------------------------
+// ---------------------------------------------------------------------------------------------------------------------
 // Appends error correction bytes to each block of data, then interleaves bytes from all blocks.
@(private)
@@ -587,10 +605,6 @@ get_num_raw_data_modules :: proc(ver: int) -> int {
 	return result
 }
-// -------------------------------------------------------------------------------------------------
-// Reed-Solomon ECC generator
-// -------------------------------------------------------------------------------------------------
@(private)
 reed_solomon_compute_divisor :: proc(degree: int, result: []u8) {
 	assert(1 <= degree && degree <= REED_SOLOMON_DEGREE_MAX, "reed-solomon degree out of range")
@@ -637,9 +651,9 @@ reed_solomon_multiply :: proc(x, y: u8) -> u8 {
 	return z
 }
-// -------------------------------------------------------------------------------------------------
+// ---------------------------------------------------------------------------------------------------------------------
-// Drawing function modules
+// ----- Drawing Function Modules ------------------------
-// -------------------------------------------------------------------------------------------------
+// ---------------------------------------------------------------------------------------------------------------------
 // Clears the QR Code grid and marks every function module as dark.
@(private)
@@ -785,9 +799,9 @@ fill_rectangle :: proc(left, top, width, height: int, qrcode: []u8) {
 	}
 }
-// -------------------------------------------------------------------------------------------------
+// ---------------------------------------------------------------------------------------------------------------------
-// Drawing data modules and masking
+// ----- Drawing data modules and masking ------------------------
-// -------------------------------------------------------------------------------------------------
+// ---------------------------------------------------------------------------------------------------------------------
@(private)
 draw_codewords :: proc(data: []u8, data_len: int, qrcode: []u8) {
@@ -965,9 +979,9 @@ finder_penalty_add_history :: proc(current_run_length: int, run_history: ^[7]int
 	run_history[0] = current_run_length
 }
-// -------------------------------------------------------------------------------------------------
+// ---------------------------------------------------------------------------------------------------------------------
-// Basic QR Code information
+// ----- Basic QR code information ------------------------
-// -------------------------------------------------------------------------------------------------
+// ---------------------------------------------------------------------------------------------------------------------
 // Returns the minimum buffer size (in bytes) needed for both temp_buffer and qrcode
 // to encode the given content at the given ECC level within the given version range.
@@ -981,7 +995,7 @@ min_buffer_size :: proc {
 	min_buffer_size_segments,
 }
-// Text path: auto-selects numeric/alphanumeric/byte mode the same way encode_text does.
+// Text path: auto-selects numeric/alphanumeric/byte mode the same way encode_text_manual does.
 //
 // Returns ok=false when:
 //   - The text exceeds QR Code capacity for every version in the range at the given ECL.
@@ -1127,9 +1141,9 @@ get_bit :: #force_inline proc(x: int, i: uint) -> bool {
 	return ((x >> i) & 1) != 0
 }
-// -------------------------------------------------------------------------------------------------
+// ---------------------------------------------------------------------------------------------------------------------
-// Segment handling
+// ----- Segment Handling ------------------------
-// -------------------------------------------------------------------------------------------------
+// ---------------------------------------------------------------------------------------------------------------------
 // Tests whether the given string can be encoded in numeric mode.
 is_numeric :: proc(text: string) -> bool {
@@ -1162,7 +1176,6 @@ calc_segment_buffer_size :: proc(mode: Mode, num_chars: int) -> int {
 	return (temp + 7) / 8
 }
-@(private)
 calc_segment_bit_length :: proc(mode: Mode, num_chars: int) -> int {
 	if num_chars < 0 || num_chars > 32767 {
 		return LENGTH_OVERFLOW
@@ -1319,11 +1332,11 @@ make_eci :: proc(assign_val: int, buf: []u8) -> Segment {
 	return result
 }
-// -------------------------------------------------------------------------------------------------
+// ---------------------------------------------------------------------------------------------------------------------
-// Private helpers
+// ----- Helpers ------------------------
-// -------------------------------------------------------------------------------------------------
+// ---------------------------------------------------------------------------------------------------------------------
-@(private)
+// Internal
 append_bits_to_buffer :: proc(val: uint, num_bits: int, buffer: []u8, bit_len: ^int) {
 	assert(0 <= num_bits && num_bits <= 16 && val >> uint(num_bits) == 0, "invalid bit count or value overflow")
 	for i := num_bits - 1; i >= 0; i -= 1 {
@@ -1332,7 +1345,7 @@ append_bits_to_buffer :: proc(val: uint, num_bits: int, buffer: []u8, bit_len: ^
 	}
 }
-@(private)
+// Internal
 get_total_bits :: proc(segs: []Segment, version: int) -> int {
 	result := 0
 	for &seg in segs {
@@ -1354,7 +1367,7 @@ get_total_bits :: proc(segs: []Segment, version: int) -> int {
 	return result
 }
-@(private)
+// Internal
 num_char_count_bits :: proc(mode: Mode, version: int) -> int {
 	assert(VERSION_MIN <= version && version <= VERSION_MAX, "version out of bounds")
 	i := (version + 7) / 17
@@ -1376,8 +1389,8 @@ num_char_count_bits :: proc(mode: Mode, version: int) -> int {
 	unreachable()
 }
+// Internal
 // Returns the index of c in the alphanumeric charset (0-44), or -1 if not found.
-@(private)
 alphanumeric_index :: proc(c: u8) -> int {
 	switch c {
 	case '0' ..= '9': return int(c - '0')
@@ -2487,7 +2500,7 @@ test_min_buffer_size_text :: proc(t: ^testing.T) {
 		testing.expect(t, planned > 0)
 		qrcode: [BUFFER_LEN_MAX]u8
 		temp: [BUFFER_LEN_MAX]u8
-		ok := encode_text(text, temp[:], qrcode[:], Ecc.Low)
+		ok := encode_text_manual(text, temp[:], qrcode[:], Ecc.Low)
 		testing.expect(t, ok)
 		actual_version_size := get_size(qrcode[:])
 		actual_buf_len := buffer_len_for_version((actual_version_size - 17) / 4)
@@ -2538,7 +2551,7 @@ test_min_buffer_size_binary :: proc(t: ^testing.T) {
 	testing.expect(t, size > 0)
 	testing.expect(t, size <= buffer_len_for_version(2))
-	// Verify agreement with encode_binary
+	// Verify agreement with encode_binary_manual
 	{
 		data_len :: 100
 		planned, planned_ok := min_buffer_size(data_len, .Medium)
@@ -2549,7 +2562,7 @@ test_min_buffer_size_binary :: proc(t: ^testing.T) {
 		for i in 0 ..< data_len {
 			dat[i] = u8(i)
 		}
-		ok := encode_binary(dat[:], data_len, qrcode[:], .Medium)
+		ok := encode_binary_manual(dat[:], data_len, qrcode[:], .Medium)
 		testing.expect(t, ok)
 		actual_version_size := get_size(qrcode[:])
 		actual_buf_len := buffer_len_for_version((actual_version_size - 17) / 4)
@@ -2609,7 +2622,7 @@ test_min_buffer_size_segments :: proc(t: ^testing.T) {
 		// Verify against actual encode
 		qrcode: [BUFFER_LEN_MAX]u8
 		temp: [BUFFER_LEN_MAX]u8
-		ok := encode_segments(segs[:], Ecc.Low, temp[:], qrcode[:])
+		ok := encode_segments_manual(segs[:], Ecc.Low, temp[:], qrcode[:])
 		testing.expect(t, ok)
 		actual_version_size := get_size(qrcode[:])
 		actual_buf_len := buffer_len_for_version((actual_version_size - 17) / 4)
@@ -2631,7 +2644,7 @@ test_encode_text_auto :: proc(t: ^testing.T) {
 		text :: "Hello, world!"
 		qr_explicit: [BUFFER_LEN_MAX]u8
 		temp: [BUFFER_LEN_MAX]u8
-		ok_explicit := encode_text_explicit_temp(text, temp[:], qr_explicit[:], .Low)
+		ok_explicit := encode_text_manual(text, temp[:], qr_explicit[:], .Low)
 		testing.expect(t, ok_explicit)
 		qr_auto: [BUFFER_LEN_MAX]u8
@@ -2650,7 +2663,7 @@ test_encode_text_auto :: proc(t: ^testing.T) {
 		text :: "314159265358979323846264338327950288419716939937510"
 		qr_explicit: [BUFFER_LEN_MAX]u8
 		temp: [BUFFER_LEN_MAX]u8
-		ok_explicit := encode_text_explicit_temp(text, temp[:], qr_explicit[:], .Medium)
+		ok_explicit := encode_text_manual(text, temp[:], qr_explicit[:], .Medium)
 		testing.expect(t, ok_explicit)
 		qr_auto: [BUFFER_LEN_MAX]u8
@@ -2669,7 +2682,7 @@ test_encode_text_auto :: proc(t: ^testing.T) {
 		text :: "HELLO WORLD"
 		qr_explicit: [BUFFER_LEN_MAX]u8
 		temp: [BUFFER_LEN_MAX]u8
-		ok_explicit := encode_text_explicit_temp(text, temp[:], qr_explicit[:], .Quartile)
+		ok_explicit := encode_text_manual(text, temp[:], qr_explicit[:], .Quartile)
 		testing.expect(t, ok_explicit)
 		qr_auto: [BUFFER_LEN_MAX]u8
@@ -2695,7 +2708,7 @@ test_encode_text_auto :: proc(t: ^testing.T) {
 		text :: "https://www.nayuki.io/"
 		qr_explicit: [BUFFER_LEN_MAX]u8
 		temp: [BUFFER_LEN_MAX]u8
-		ok_explicit := encode_text_explicit_temp(text, temp[:], qr_explicit[:], .High, mask = .M3)
+		ok_explicit := encode_text_manual(text, temp[:], qr_explicit[:], .High, mask = .M3)
 		testing.expect(t, ok_explicit)
 		qr_auto: [BUFFER_LEN_MAX]u8
@@ -2732,7 +2745,7 @@ test_encode_segments_auto :: proc(t: ^testing.T) {
 		qr_explicit: [BUFFER_LEN_MAX]u8
 		temp: [BUFFER_LEN_MAX]u8
-		ok_explicit := encode_segments_explicit_temp(segs[:], .Low, temp[:], qr_explicit[:])
+		ok_explicit := encode_segments_manual(segs[:], .Low, temp[:], qr_explicit[:])
 		testing.expect(t, ok_explicit)
 		qr_auto: [BUFFER_LEN_MAX]u8
@@ -2764,7 +2777,7 @@ test_encode_segments_advanced_auto :: proc(t: ^testing.T) {
 		qr_explicit: [BUFFER_LEN_MAX]u8
 		temp: [BUFFER_LEN_MAX]u8
-		ok_explicit := encode_segments_advanced_explicit_temp(
+		ok_explicit := encode_segments_advanced_manual(
 			segs[:],
 			.Medium,
 			VERSION_MIN,
@@ -2795,7 +2808,7 @@ test_encode_segments_advanced_auto :: proc(t: ^testing.T) {
 		qr_explicit: [BUFFER_LEN_MAX]u8
 		temp: [BUFFER_LEN_MAX]u8
-		ok_explicit := encode_segments_advanced_explicit_temp(
+		ok_explicit := encode_segments_advanced_manual(
 			segs[:],
 			.High,
 			1,
@@ -1,103 +1,139 @@
 package ring
 import "base:runtime"
 import "core:fmt"
@(private)
 ODIN_BOUNDS_CHECK :: !ODIN_NO_BOUNDS_CHECK
-Ring :: struct($T: typeid) {
+Ring :: struct($E: typeid) {
-	data:            []T,
+	data:                  []E,
-	_end_index, len: int,
+	next_write_index, len: int,
 }
-Ring_Soa :: struct($T: typeid) {
+Ring_Soa :: struct($E: typeid) {
-	data:            #soa[]T,
+	data:                  #soa[]E,
-	_end_index, len: int,
+	next_write_index, len: int,
 }
-from_slice_raos :: #force_inline proc(data: $T/[]$E) -> Ring(E) {
+destroy_aos :: #force_inline proc(
-	return {data = data, _end_index = -1}
+	ring: ^Ring($E),
+	allocator := context.allocator,
+) -> runtime.Allocator_Error {
+	return delete(ring.data, allocator)
 }
-from_slice_rsoa :: #force_inline proc(data: $T/#soa[]$E) -> Ring_Soa(E) {
+destroy_soa :: #force_inline proc(
-	return {data = data, _end_index = -1}
+	ring: ^Ring_Soa($E),
+	allocator := context.allocator,
+) -> runtime.Allocator_Error {
+	return delete(ring.data, allocator)
 }
-from_slice :: proc {
+destroy :: proc {
-	from_slice_raos,
+	destroy_aos,
-	from_slice_rsoa,
+	destroy_soa,
 }
+create_aos :: #force_inline proc(
+	$E: typeid,
+	capacity: int,
+	allocator := context.allocator,
+) -> (
+	ring: Ring(E),
+	err: runtime.Allocator_Error,
+) #optional_allocator_error {
+	ring.data, err = make([]E, capacity, allocator)
+	return ring, err
+}
+create_soa :: #force_inline proc(
+	$E: typeid,
+	capacity: int,
+	allocator := context.allocator,
+) -> (
+	ring: Ring_Soa(E),
+	err: runtime.Allocator_Error,
+) #optional_allocator_error {
+	ring.data, err = make(#soa[]E, capacity, allocator)
+	return ring, err
+}
+// All contents of `data` will be completely ignored, `data` is treated as an empty slice.
+init_from_slice_aos :: #force_inline proc(ring: ^Ring($E), data: $T/[]E) {
+	ring.data = data
+	ring.len = 0
+	ring.next_write_index = 0
+	return
+}
+// All contents of `data` will be completely ignored, `data` is treated as an empty slice.
+init_from_slice_soa :: #force_inline proc(ring: ^Ring_Soa($E), data: $T/#soa[]E) {
+	ring.data = data
+	ring.len = 0
+	ring.next_write_index = 0
+	return
+}
+init_from_slice :: proc {
+	init_from_slice_aos,
+	init_from_slice_soa,
+}
 // Internal
 // Index in the backing array where the ring starts
-_start_index_raos :: proc(ring: Ring($T)) -> int {
+start_index_aos :: #force_inline proc(ring: Ring($E)) -> int {
-	if ring.len < len(ring.data) {
+	return ring.len < len(ring.data) ? 0 : ring.next_write_index
-		return 0
-	} else {
-		start_index := ring._end_index + 1
-		return 0 if start_index == len(ring.data) else start_index
-	}
 }
 // Internal
 // Index in the backing array where the ring starts
-_start_index_rsoa :: proc(ring: Ring_Soa($T)) -> int {
+start_index_soa :: #force_inline proc(ring: Ring_Soa($E)) -> int {
-	if ring.len < len(ring.data) {
+	return ring.len < len(ring.data) ? 0 : ring.next_write_index
-		return 0
-	} else {
-		start_index := ring._end_index + 1
-		return 0 if start_index == len(ring.data) else start_index
-	}
 }
-advance_raos :: proc(ring: ^Ring($T)) {
+advance_aos :: #force_inline proc(ring: ^Ring($E)) {
 	// Length
 	if ring.len != len(ring.data) do ring.len += 1
-	// End index
+	// Write index
-	if ring._end_index == len(ring.data) - 1 { 	// If we are at the end of the backing array
+	ring.next_write_index += 1
-		ring._end_index = 0 // Overflow end to 0
+	if ring.next_write_index == len(ring.data) do ring.next_write_index = 0
-	} else {
-		ring._end_index += 1
-	}
 }
-advance_rsoa :: proc(ring: ^Ring_Soa($T)) {
+advance_soa :: #force_inline proc(ring: ^Ring_Soa($E)) {
 	// Length
 	if ring.len != len(ring.data) do ring.len += 1
-	// End index
+	// Write index
-	if ring._end_index == len(ring.data) - 1 { 	// If we are at the end of the backing array
+	ring.next_write_index += 1
-		ring._end_index = 0 // Overflow end to 0
+	if ring.next_write_index == len(ring.data) do ring.next_write_index = 0
-	} else {
-		ring._end_index += 1
-	}
 }
 advance :: proc {
-	advance_raos,
+	advance_aos,
-	advance_rsoa,
+	advance_soa,
 }
-append_raos :: proc(ring: ^Ring($T), element: T) {
+append_aos :: #force_inline proc(ring: ^Ring($E), element: E) {
+	ring.data[ring.next_write_index] = element
 	advance(ring)
-	ring.data[ring._end_index] = element
 }
-append_rsoa :: proc(ring: ^Ring_Soa($T), element: T) {
+append_soa :: #force_inline proc(ring: ^Ring_Soa($E), element: E) {
+	ring.data[ring.next_write_index] = element
 	advance(ring)
-	ring.data[ring._end_index] = element
 }
 append :: proc {
-	append_raos,
+	append_aos,
-	append_rsoa,
+	append_soa,
 }
-get_raos :: proc(ring: Ring($T), index: int) -> ^T {
+get_aos :: #force_inline proc(ring: Ring($E), index: int) -> ^E {
 	when ODIN_BOUNDS_CHECK {
-		if index >= ring.len {
+		fmt.assertf(index < ring.len, "Ring index %i out of bounds for length %i", index, ring.len)
-			panic(fmt.tprintf("Ring index %i out of bounds for length %i", index, ring.len))
-		}
 	}
-	array_index := _start_index_raos(ring) + index
+	array_index := start_index_aos(ring) + index
 	if array_index < len(ring.data) {
 		return &ring.data[array_index]
 	} else {
@@ -107,14 +143,12 @@ get_raos :: proc(ring: Ring($T), index: int) -> ^T {
 }
 // SOA can't return soa pointer to parapoly T.
-get_rsoa :: proc(ring: Ring_Soa($T), index: int) -> T {
+get_soa :: #force_inline proc(ring: Ring_Soa($E), index: int) -> E {
 	when ODIN_BOUNDS_CHECK {
-		if index >= ring.len {
+		fmt.assertf(index < ring.len, "Ring index %i out of bounds for length %i", index, ring.len)
-			panic(fmt.tprintf("Ring index %i out of bounds for length %i", index, ring.len))
-		}
 	}
-	array_index := _start_index_rsoa(ring) + index
+	array_index := start_index_soa(ring) + index
 	if array_index < len(ring.data) {
 		return ring.data[array_index]
 	} else {
@@ -124,36 +158,36 @@ get_rsoa :: proc(ring: Ring_Soa($T), index: int) -> T {
 }
 get :: proc {
-	get_raos,
+	get_aos,
-	get_rsoa,
+	get_soa,
 }
-get_last_raos :: #force_inline proc(ring: Ring($T)) -> ^T {
+get_last_aos :: #force_inline proc(ring: Ring($E)) -> ^E {
 	return get(ring, ring.len - 1)
 }
-get_last_rsoa :: #force_inline proc(ring: Ring_Soa($T)) -> T {
+get_last_soa :: #force_inline proc(ring: Ring_Soa($E)) -> E {
 	return get(ring, ring.len - 1)
 }
 get_last :: proc {
-	get_last_raos,
+	get_last_aos,
-	get_last_rsoa,
+	get_last_soa,
 }
-clear_raos :: #force_inline proc "contextless" (ring: ^Ring($T)) {
+clear_aos :: #force_inline proc "contextless" (ring: ^Ring($E)) {
 	ring.len = 0
-	ring._end_index = -1
+	ring.next_write_index = 0
 }
-clear_rsoa :: #force_inline proc "contextless" (ring: ^Ring_Soa($T)) {
+clear_soa :: #force_inline proc "contextless" (ring: ^Ring_Soa($E)) {
 	ring.len = 0
-	ring._end_index = -1
+	ring.next_write_index = 0
 }
 clear :: proc {
-	clear_raos,
+	clear_aos,
-	clear_rsoa,
+	clear_soa,
 }
 // ---------------------------------------------------------------------------------------------------------------------
@@ -164,28 +198,27 @@ import "core:testing"
@(test)
 test_ring_aos :: proc(t: ^testing.T) {
-	data := make_slice([]int, 10)
+	ring := create_aos(int, 10)
-	ring := from_slice(data)
+	defer destroy(&ring)
-	defer delete(ring.data)
 	for i in 1 ..= 5 {
 		append(&ring, i)
 		log.debug("Length:", ring.len)
-		log.debug("Start index:", _start_index_raos(ring))
+		log.debug("Start index:", start_index_aos(ring))
-		log.debug("End index:", ring._end_index)
+		log.debug("Next write index:", ring.next_write_index)
 		log.debug(ring.data)
 	}
 	testing.expect_value(t, get(ring, 0)^, 1)
 	testing.expect_value(t, get(ring, 4)^, 5)
 	testing.expect_value(t, ring.len, 5)
-	testing.expect_value(t, ring._end_index, 4)
+	testing.expect_value(t, ring.next_write_index, 5)
-	testing.expect_value(t, _start_index_raos(ring), 0)
+	testing.expect_value(t, start_index_aos(ring), 0)
 	for i in 6 ..= 15 {
 		append(&ring, i)
 		log.debug("Length:", ring.len)
-		log.debug("Start index:", _start_index_raos(ring))
+		log.debug("Start index:", start_index_aos(ring))
-		log.debug("End index:", ring._end_index)
+		log.debug("Next write index:", ring.next_write_index)
 		log.debug(ring.data)
 	}
 	testing.expect_value(t, get(ring, 0)^, 6)
@@ -193,18 +226,18 @@ test_ring_aos :: proc(t: ^testing.T) {
 	testing.expect_value(t, get(ring, 9)^, 15)
 	testing.expect_value(t, get_last(ring)^, 15)
 	testing.expect_value(t, ring.len, 10)
-	testing.expect_value(t, ring._end_index, 4)
+	testing.expect_value(t, ring.next_write_index, 5)
-	testing.expect_value(t, _start_index_raos(ring), 5)
+	testing.expect_value(t, start_index_aos(ring), 5)
 	for i in 15 ..= 25 {
 		append(&ring, i)
 		log.debug("Length:", ring.len)
-		log.debug("Start index:", _start_index_raos(ring))
+		log.debug("Start index:", start_index_aos(ring))
-		log.debug("End index:", ring._end_index)
+		log.debug("Next write index:", ring.next_write_index)
 		log.debug(ring.data)
 	}
 	testing.expect_value(t, get(ring, 0)^, 16)
-	testing.expect_value(t, ring._end_index, 5)
+	testing.expect_value(t, ring.next_write_index, 6)
 	testing.expect_value(t, get_last(ring)^, 25)
 	clear(&ring)
@@ -219,28 +252,27 @@ test_ring_soa :: proc(t: ^testing.T) {
 		x, y: int,
 	}
-	data := make_soa_slice(#soa[]Ints, 10)
+	ring := create_soa(Ints, 10)
-	ring := from_slice(data)
+	defer destroy(&ring)
-	defer delete(ring.data)
 	for i in 1 ..= 5 {
 		append(&ring, Ints{i, i})
 		log.debug("Length:", ring.len)
-		log.debug("Start index:", _start_index_rsoa(ring))
+		log.debug("Start index:", start_index_soa(ring))
-		log.debug("End index:", ring._end_index)
+		log.debug("Next write index:", ring.next_write_index)
 		log.debug(ring.data)
 	}
 	testing.expect_value(t, get(ring, 0), Ints{1, 1})
 	testing.expect_value(t, get(ring, 4), Ints{5, 5})
 	testing.expect_value(t, ring.len, 5)
-	testing.expect_value(t, ring._end_index, 4)
+	testing.expect_value(t, ring.next_write_index, 5)
-	testing.expect_value(t, _start_index_rsoa(ring), 0)
+	testing.expect_value(t, start_index_soa(ring), 0)
 	for i in 6 ..= 15 {
 		append(&ring, Ints{i, i})
 		log.debug("Length:", ring.len)
-		log.debug("Start index:", _start_index_rsoa(ring))
+		log.debug("Start index:", start_index_soa(ring))
-		log.debug("End index:", ring._end_index)
+		log.debug("Next write index:", ring.next_write_index)
 		log.debug(ring.data)
 	}
 	testing.expect_value(t, get(ring, 0), Ints{6, 6})
@@ -248,18 +280,18 @@ test_ring_soa :: proc(t: ^testing.T) {
 	testing.expect_value(t, get(ring, 9), Ints{15, 15})
 	testing.expect_value(t, get_last(ring), Ints{15, 15})
 	testing.expect_value(t, ring.len, 10)
-	testing.expect_value(t, ring._end_index, 4)
+	testing.expect_value(t, ring.next_write_index, 5)
-	testing.expect_value(t, _start_index_rsoa(ring), 5)
+	testing.expect_value(t, start_index_soa(ring), 5)
 	for i in 15 ..= 25 {
 		append(&ring, Ints{i, i})
 		log.debug("Length:", ring.len)
-		log.debug("Start index:", _start_index_rsoa(ring))
+		log.debug("Start index:", start_index_soa(ring))
-		log.debug("End index:", ring._end_index)
+		log.debug("Next write index:", ring.next_write_index)
 		log.debug(ring.data)
 	}
 	testing.expect_value(t, get(ring, 0), Ints{16, 16})
-	testing.expect_value(t, ring._end_index, 5)
+	testing.expect_value(t, ring.next_write_index, 6)
 	testing.expect_value(t, get_last(ring), Ints{25, 25})
 	clear(&ring)
@@ -267,3 +299,141 @@ test_ring_soa :: proc(t: ^testing.T) {
 	testing.expect_value(t, ring.len, 1)
 	testing.expect_value(t, get(ring, 0), Ints{1, 1})
 }
@(test)
 test_ring_aos_init_from_slice :: proc(t: ^testing.T) {
 	// Stack-allocated backing with pre-existing garbage and odd capacity.
 	backing: [7]int = {99, 99, 99, 99, 99, 99, 99}
 	ring: Ring(int)
 	init_from_slice(&ring, backing[:])
 	// Empty ring invariants after init_from_slice.
 	testing.expect_value(t, ring.len, 0)
 	testing.expect_value(t, ring.next_write_index, 0)
 	testing.expect_value(t, start_index_aos(ring), 0)
 	// Partial fill (3 / 7).
 	for i in 1 ..= 3 do append(&ring, i)
 	testing.expect_value(t, ring.len, 3)
 	testing.expect_value(t, ring.next_write_index, 3)
 	testing.expect_value(t, start_index_aos(ring), 0)
 	testing.expect_value(t, get(ring, 0)^, 1)
 	testing.expect_value(t, get(ring, 2)^, 3)
 	testing.expect_value(t, get_last(ring)^, 3)
 	// Fill exactly to capacity. Pushing element 7 must make len == cap
 	// AND wrap next_write_index from 6 back to 0 in the same step.
 	for i in 4 ..= 7 do append(&ring, i)
 	testing.expect_value(t, ring.len, 7)
 	testing.expect_value(t, ring.next_write_index, 0)
 	testing.expect_value(t, start_index_aos(ring), 0)
 	testing.expect_value(t, get(ring, 0)^, 1)
 	testing.expect_value(t, get(ring, 6)^, 7)
 	testing.expect_value(t, get_last(ring)^, 7)
 	// First overwrite — oldest element shifts by one.
 	append(&ring, 8)
 	testing.expect_value(t, ring.len, 7)
 	testing.expect_value(t, ring.next_write_index, 1)
 	testing.expect_value(t, start_index_aos(ring), 1)
 	testing.expect_value(t, get(ring, 0)^, 2)
 	testing.expect_value(t, get(ring, 6)^, 8)
 	testing.expect_value(t, get_last(ring)^, 8)
 	// Stress: 3 more complete wrap cycles (21 more pushes).
 	// After 29 total pushes, ring contains the last 7 (23..=29),
 	// and next_write_index = 29 mod 7 = 1.
 	for i in 9 ..= 29 do append(&ring, i)
 	testing.expect_value(t, ring.len, 7)
 	testing.expect_value(t, ring.next_write_index, 1)
 	testing.expect_value(t, start_index_aos(ring), 1)
 	testing.expect_value(t, get(ring, 0)^, 23)
 	testing.expect_value(t, get(ring, 3)^, 26)
 	testing.expect_value(t, get(ring, 6)^, 29)
 	testing.expect_value(t, get_last(ring)^, 29)
 	// Clear returns ring to empty-equivalent state.
 	clear(&ring)
 	testing.expect_value(t, ring.len, 0)
 	testing.expect_value(t, ring.next_write_index, 0)
 	testing.expect_value(t, start_index_aos(ring), 0)
 	// Single-element edge case: get_last(len==1) routes through get(ring, 0).
 	append(&ring, 42)
 	testing.expect_value(t, ring.len, 1)
 	testing.expect_value(t, ring.next_write_index, 1)
 	testing.expect_value(t, get(ring, 0)^, 42)
 	testing.expect_value(t, get_last(ring)^, 42)
 }
@(test)
 test_ring_soa_init_from_slice :: proc(t: ^testing.T) {
 	Ints :: struct {
 		x, y: int,
 	}
 	// Stack-allocated backing with pre-existing garbage and odd capacity.
 	backing: #soa[7]Ints = {{99, 99}, {99, 99}, {99, 99}, {99, 99}, {99, 99}, {99, 99}, {99, 99}}
 	ring: Ring_Soa(Ints)
 	init_from_slice(&ring, backing[:])
 	// Empty ring invariants after init_from_slice.
 	testing.expect_value(t, ring.len, 0)
 	testing.expect_value(t, ring.next_write_index, 0)
 	testing.expect_value(t, start_index_soa(ring), 0)
 	// Partial fill (3 / 7).
 	for i in 1 ..= 3 do append(&ring, Ints{i, i})
 	testing.expect_value(t, ring.len, 3)
 	testing.expect_value(t, ring.next_write_index, 3)
 	testing.expect_value(t, start_index_soa(ring), 0)
 	testing.expect_value(t, get(ring, 0), Ints{1, 1})
 	testing.expect_value(t, get(ring, 2), Ints{3, 3})
 	testing.expect_value(t, get_last(ring), Ints{3, 3})
 	// Fill exactly to capacity. Pushing element 7 must make len == cap
 	// AND wrap next_write_index from 6 back to 0 in the same step.
 	for i in 4 ..= 7 do append(&ring, Ints{i, i})
 	testing.expect_value(t, ring.len, 7)
 	testing.expect_value(t, ring.next_write_index, 0)
 	testing.expect_value(t, start_index_soa(ring), 0)
 	testing.expect_value(t, get(ring, 0), Ints{1, 1})
 	testing.expect_value(t, get(ring, 6), Ints{7, 7})
 	testing.expect_value(t, get_last(ring), Ints{7, 7})
 	// First overwrite — oldest element shifts by one.
 	append(&ring, Ints{8, 8})
 	testing.expect_value(t, ring.len, 7)
 	testing.expect_value(t, ring.next_write_index, 1)
 	testing.expect_value(t, start_index_soa(ring), 1)
 	testing.expect_value(t, get(ring, 0), Ints{2, 2})
 	testing.expect_value(t, get(ring, 6), Ints{8, 8})
 	testing.expect_value(t, get_last(ring), Ints{8, 8})
 	// Stress: 3 more complete wrap cycles (21 more pushes).
 	// After 29 total pushes, ring contains the last 7 (23..=29),
 	// and next_write_index = 29 mod 7 = 1.
 	for i in 9 ..= 29 do append(&ring, Ints{i, i})
 	testing.expect_value(t, ring.len, 7)
 	testing.expect_value(t, ring.next_write_index, 1)
 	testing.expect_value(t, start_index_soa(ring), 1)
 	testing.expect_value(t, get(ring, 0), Ints{23, 23})
 	testing.expect_value(t, get(ring, 3), Ints{26, 26})
 	testing.expect_value(t, get(ring, 6), Ints{29, 29})
 	testing.expect_value(t, get_last(ring), Ints{29, 29})
 	// Clear returns ring to empty-equivalent state.
 	clear(&ring)
 	testing.expect_value(t, ring.len, 0)
 	testing.expect_value(t, ring.next_write_index, 0)
 	testing.expect_value(t, start_index_soa(ring), 0)
 	// Single-element edge case: get_last(len==1) routes through get(ring, 0).
 	append(&ring, Ints{42, 42})
 	testing.expect_value(t, ring.len, 1)
 	testing.expect_value(t, ring.next_write_index, 1)
 	testing.expect_value(t, get(ring, 0), Ints{42, 42})
 	testing.expect_value(t, get_last(ring), Ints{42, 42})
 }
@@ -1,8 +1,11 @@
 package examples
 import "core:fmt"
 import "core:log"
 import "core:mem"
 import "core:os"
 import "core:sys/posix"
 import mdb "../../lmdb"
 // 0o660
@@ -10,34 +13,75 @@ DB_MODE :: posix.mode_t{.IWGRP, .IRGRP, .IWUSR, .IRUSR}
 DB_PATH :: "out/debug/lmdb_example_db"
 main :: proc() {
-    environment: ^mdb.Env
+	//----- General setup ----------------------------------
 	{
 		// Temp
 		track_temp: mem.Tracking_Allocator
 		mem.tracking_allocator_init(&track_temp, context.temp_allocator)
 		context.temp_allocator = mem.tracking_allocator(&track_temp)
-    // Create environment for lmdb
+		// Default
-    mdb.panic_on_err(mdb.env_create(&environment))
+		track: mem.Tracking_Allocator
-    // Create directory for databases. Won't do anything if it already exists.
+		mem.tracking_allocator_init(&track, context.allocator)
-    // 0o774 gives all permissions for owner and group, read for everyone else.
+		context.allocator = mem.tracking_allocator(&track)
-    os.make_directory(DB_PATH, 0o774)
+		// Log a warning about any memory that was not freed by the end of the program.
-    // Open the database files (creates them if they don't already exist)
+		// This could be fine for some global state or it could be a memory leak.
-    mdb.panic_on_err(mdb.env_open(environment, DB_PATH, 0, DB_MODE))
+		defer {
 			// Temp allocator
 			if len(track_temp.bad_free_array) > 0 {
 				fmt.eprintf("=== %v incorrect frees - temp allocator: ===\n", len(track_temp.bad_free_array))
 				for entry in track_temp.bad_free_array {
 					fmt.eprintf("- %p @ %v\n", entry.memory, entry.location)
 				}
 			}
 			mem.tracking_allocator_destroy(&track_temp)
 			// Default allocator
 			if len(track.allocation_map) > 0 {
 				fmt.eprintf("=== %v allocations not freed - main allocator: ===\n", len(track.allocation_map))
 				for _, entry in track.allocation_map {
 					fmt.eprintf("- %v bytes @ %v\n", entry.size, entry.location)
 				}
 			}
 			if len(track.bad_free_array) > 0 {
 				fmt.eprintf("=== %v incorrect frees - main allocator: ===\n", len(track.bad_free_array))
 				for entry in track.bad_free_array {
 					fmt.eprintf("- %p @ %v\n", entry.memory, entry.location)
 				}
 			}
 			mem.tracking_allocator_destroy(&track)
 		}
 		// Logger
 		context.logger = log.create_console_logger()
 		defer log.destroy_console_logger(context.logger)
 	}
-    // Transactions
+	environment: ^mdb.Env
-    txn_handle: ^mdb.Txn
-    db_handle: mdb.Dbi
-    // Put transaction
-    key := 7
-    key_val := mdb.autoval(&key)
-    put_data := 12
-    put_data_val := mdb.autoval(&put_data)
-    mdb.panic_on_err(mdb.txn_begin(environment, nil, 0, &txn_handle))
-    mdb.panic_on_err(mdb.dbi_open(txn_handle, nil, 0, &db_handle))
-    mdb.panic_on_err(mdb.put(txn_handle, db_handle, &key_val.raw, &put_data_val.raw, 0))
-    mdb.panic_on_err(mdb.txn_commit(txn_handle))
-    // Get transaction
+	// Create environment for lmdb
-    get_data_val := mdb.nil_autoval(int)
+	mdb.panic_on_err(mdb.env_create(&environment))
-    mdb.panic_on_err(mdb.txn_begin(environment, nil, 0, &txn_handle))
+	// Create directory for databases. Won't do anything if it already exists.
-    mdb.panic_on_err(mdb.get(txn_handle, db_handle, &key_val.raw, &get_data_val.raw))
+	os.make_directory(DB_PATH)
-    mdb.panic_on_err(mdb.txn_commit(txn_handle))
+	// Open the database files (creates them if they don't already exist)
-    data_cpy := mdb.autoval_get_data(&get_data_val)^
+	mdb.panic_on_err(mdb.env_open(environment, DB_PATH, {}, DB_MODE))
-    fmt.println("Get result:", data_cpy)
+
 	// Transactions
 	txn_handle: ^mdb.Txn
 	db_handle: mdb.Dbi
 	// Put transaction
 	key := 7
 	key_val := mdb.blittable_val(&key)
 	put_data := 12
 	put_data_val := mdb.blittable_val(&put_data)
 	mdb.panic_on_err(mdb.txn_begin(environment, nil, {}, &txn_handle))
 	mdb.panic_on_err(mdb.dbi_open(txn_handle, nil, {}, &db_handle))
 	mdb.panic_on_err(mdb.put(txn_handle, db_handle, &key_val, &put_data_val, {}))
 	mdb.panic_on_err(mdb.txn_commit(txn_handle))
 	// Get transaction
 	data_val: mdb.Val
 	mdb.panic_on_err(mdb.txn_begin(environment, nil, {}, &txn_handle))
 	mdb.panic_on_err(mdb.get(txn_handle, db_handle, &key_val, &data_val))
 	data_cpy := mdb.blittable_copy(&data_val, int)
 	mdb.panic_on_err(mdb.txn_commit(txn_handle))
 	fmt.println("Get result:", data_cpy)
 }
@@ -164,24 +164,123 @@
 */
 package lmdb
 foreign import lib "system:lmdb"
 import "core:c"
 import "core:fmt"
 import "core:reflect"
 import "core:sys/posix"
 // ---------------------------------------------------------------------------------------------------------------------
 // ----- Added Odin Helpers ------------------------
 // ---------------------------------------------------------------------------------------------------------------------
 // Wrap a blittable value's bytes as an LMDB Val.
 // T must be a contiguous type with no indirection (no pointers, slices, strings, maps, etc.).
 blittable_val :: #force_inline proc(val_ptr: ^$T) -> Val {
 	fmt.assertf(
 		reflect.has_no_indirections(type_info_of(T)),
 		"blittable_val: type '%v' contains indirection and cannot be stored directly in LMDB",
 		typeid_of(T),
 	)
 	return Val{size_of(T), val_ptr}
 }
 // Reads a blittable T out of the LMDB memory map by copying it into caller
 // storage. The returned T has no lifetime tie to the transaction.
 blittable_copy :: #force_inline proc(val: ^Val, $T: typeid) -> T {
 	fmt.assertf(
 		reflect.has_no_indirections(type_info_of(T)),
 		"blittable_copy: type '%v' contains indirection and cannot be read directly from LMDB",
 		typeid_of(T),
 	)
 	return (cast(^T)val.data)^
 }
 // Zero-copy pointer view into the LMDB memory map as a ^T.
 // Useful for large blittable types where you want to read individual fields
 // without copying the entire value (e.g. ptr.timestamp, ptr.flags).
 // MUST NOT be written through — writes either segfault (default env mode)
 // or silently corrupt the database (ENV_WRITEMAP).
 // MUST NOT be retained past txn_commit, txn_abort, or any subsequent write
 // operation on the same env — the pointer is invalidated.
 blittable_view :: #force_inline proc(val: ^Val, $T: typeid) -> ^T {
 	fmt.assertf(
 		reflect.has_no_indirections(type_info_of(T)),
 		"blittable_view: type '%v' contains indirection and cannot be viewed directly from LMDB",
 		typeid_of(T),
 	)
 	return cast(^T)val.data
 }
 // Wrap a slice of blittable elements as an LMDB Val for use with put/get.
 // T must be a contiguous type with no indirection.
 // The caller's slice must remain valid (not freed, not resized) for the
 // duration of the put call that consumes this Val.
 slice_val :: #force_inline proc(s: []$T) -> Val {
 	fmt.assertf(
 		reflect.has_no_indirections(type_info_of(T)),
 		"slice_val: element type '%v' contains indirection and cannot be stored directly in LMDB",
 		typeid_of(T),
 	)
 	return Val{uint(len(s) * size_of(T)), raw_data(s)}
 }
 // Zero-copy slice view into the LMDB memory map.
 // T must match the element type that was originally stored.
 // MUST NOT be modified — writes through this slice either segfault (default
 // env mode) or silently corrupt the database (ENV_WRITEMAP).
 // MUST be copied (e.g. slice.clone) if it needs to outlive the current
 // transaction; the view is invalidated by txn_commit, txn_abort, or any
 // subsequent write operation on the same env.
 slice_view :: #force_inline proc(val: ^Val, $T: typeid) -> []T {
 	fmt.assertf(
 		reflect.has_no_indirections(type_info_of(T)),
 		"slice_view: element type '%v' contains indirection and cannot be read directly from LMDB",
 		typeid_of(T),
 	)
 	return (cast([^]T)val.data)[:val.size / size_of(T)]
 }
 // Wrap a string's bytes as an LMDB Val for use with put/get.
 // The caller's string must remain valid (backing memory not freed) for the
 // duration of the put call that consumes this Val.
 string_val :: #force_inline proc(s: string) -> Val {
 	return Val{uint(len(s)), raw_data(s)}
 }
 // Zero-copy string view into the LMDB memory map.
 // MUST NOT be modified — writes through the underlying bytes either segfault
 // (default env mode) or silently corrupt the database (ENV_WRITEMAP).
 // MUST be copied (e.g. strings.clone) if it needs to outlive the current
 // transaction; the view is invalidated by txn_commit, txn_abort, or any
 // subsequent write operation on the same env.
 string_view :: #force_inline proc(val: ^Val) -> string {
 	return string((cast([^]u8)val.data)[:val.size])
 }
 // Panic if there is an error
 panic_on_err :: #force_inline proc(error: Error, loc := #caller_location) {
 	if error != .NONE {
 		fmt.panicf("LMDB error %v: %s", error, strerror(i32(error)), loc = loc)
 	}
 }
 // ---------------------------------------------------------------------------------------------------------------------
 // ----- Bindings ------------------------
 // ---------------------------------------------------------------------------------------------------------------------
 _ :: c
 when ODIN_OS == .Windows {
 	#panic("TODO: Compile windows .lib for lmdb")
 	mode_t :: c.int
 } else {
 	mode_t :: posix.mode_t
 }
 when ODIN_OS == .Windows {
 	filehandle_t :: rawptr
-} else {
+} else when ODIN_OS ==
 	.Linux || ODIN_OS == .Darwin || ODIN_OS == .FreeBSD || ODIN_OS == .OpenBSD || ODIN_OS == .NetBSD {
 	foreign import lib "system:lmdb"
 	mode_t :: posix.mode_t
 	filehandle_t :: c.int
 } else {
 	#panic("levlib/vendor/lmdb: unsupported OS target")
 }
 Env :: struct {}
@@ -189,7 +288,7 @@ Env :: struct {}
 Txn :: struct {}
 /** @brief A handle for an individual database in the DB environment. */
-Dbi :: u32
+Dbi :: c.uint
 Cursor :: struct {}
@@ -205,33 +304,8 @@ Cursor :: struct {}
 * Other data items can in theory be from 0 to 0xffffffff bytes long.
 */
 Val :: struct {
-	mv_size: uint, /**< size of the data item */
+	size: uint, /**< size of the data item */
-	mv_data: rawptr, /**< address of the data item */
+	data: rawptr, /**< address of the data item */
 }
-// Automatic `Val` handling for a given type 'T'.
-// Will not traverse pointers. If `T` stores pointers, you probably don't want to use this.
-Auto_Val :: struct($T: typeid) {
-	raw: Val,
-}
-autoval :: #force_inline proc "contextless" (val_ptr: ^$T) -> Auto_Val(T) {
-	return Auto_Val(T){Val{size_of(T), val_ptr}}
-}
-nil_autoval :: #force_inline proc "contextless" ($T: typeid) -> Auto_Val(T) {
-	return Auto_Val(T){Val{size_of(T), nil}}
-}
-autoval_get_data :: #force_inline proc "contextless" (val: ^Auto_Val($T)) -> ^T {
-	return cast(^T)val.raw.mv_data
-}
-// Panic if there is an error
-panic_on_err :: #force_inline proc(error: Error) {
-	if error != .NONE {
-		fmt.panicf("Irrecoverable LMDB error", strerror(i32(error)))
-	}
-}
 /** @brief A callback function used to compare two keys in a database */
@@ -253,85 +327,65 @@ Cmp_Func :: #type proc "c" (_: ^Val, _: ^Val) -> i32
 */
 Rel_Func :: #type proc "c" (item: ^Val, oldptr, newptr, relctx: rawptr)
-/** @defgroup	mdb_env	Environment Flags
+/** @defgroup mdb_env Environment Flags
 *	@{
 */
-/** mmap at a fixed address (experimental) */
+Env_Flag :: enum u32 {
-ENV_FIXEDMAP :: 0x01
+	FIXEDMAP     = 0, /**< mmap at a fixed address (experimental) */
-/** no environment directory */
+	NOSUBDIR     = 14, /**< no environment directory */
-ENV_NOSUBDIR :: 0x4000
+	NOSYNC       = 16, /**< don't fsync after commit */
-/** don't fsync after commit */
+	RDONLY       = 17, /**< read only */
-ENV_NOSYNC :: 0x10000
+	NOMETASYNC   = 18, /**< don't fsync metapage after commit */
-/** read only */
+	WRITEMAP     = 19, /**< use writable mmap */
-ENV_RDONLY :: 0x20000
+	MAPASYNC     = 20, /**< use asynchronous msync when WRITEMAP is used */
-/** don't fsync metapage after commit */
+	NOTLS        = 21, /**< tie reader locktable slots to Txn objects instead of to threads */
-ENV_NOMETASYNC :: 0x40000
+	NOLOCK       = 22, /**< don't do any locking, caller must manage their own locks */
-/** use writable mmap */
+	NORDAHEAD    = 23, /**< don't do readahead (no effect on Windows) */
-ENV_WRITEMAP :: 0x80000
+	NOMEMINIT    = 24, /**< don't initialize malloc'd memory before writing to datafile */
-/** use asynchronous msync when #MDB_WRITEMAP is used */
+	PREVSNAPSHOT = 25, /**< use the previous snapshot rather than the latest one */
-ENV_MAPASYNC :: 0x100000
+}
-/** tie reader locktable slots to #MDB_txn objects instead of to threads */
+Env_Flags :: distinct bit_set[Env_Flag;c.uint]
-ENV_NOTLS :: 0x200000
-/** don't do any locking, caller must manage their own locks */
-ENV_NOLOCK :: 0x400000
-/** don't do readahead (no effect on Windows) */
-ENV_NORDAHEAD :: 0x800000
-/** don't initialize malloc'd memory before writing to datafile */
-ENV_NOMEMINIT :: 0x1000000
 /** @} */
-/**	@defgroup	mdb_dbi_open	Database Flags
+/** @defgroup mdb_dbi_open Database Flags
 *	@{
 */
-/** use reverse string keys */
+Db_Flag :: enum u32 {
-DB_REVERSEKEY :: 0x02
+	REVERSEKEY = 1, /**< use reverse string keys */
-/** use sorted duplicates */
+	DUPSORT    = 2, /**< use sorted duplicates */
-DB_DUPSORT :: 0x04
+	INTEGERKEY = 3, /**< numeric keys in native byte order */
-/** numeric keys in native byte order: either unsigned int or size_t.
+	DUPFIXED   = 4, /**< with DUPSORT, sorted dup items have fixed size */
-	 *  The keys must all be of the same size. */
+	INTEGERDUP = 5, /**< with DUPSORT, dups are INTEGERKEY-style integers */
-DB_INTEGERKEY :: 0x08
+	REVERSEDUP = 6, /**< with DUPSORT, use reverse string dups */
-/** with #MDB_DUPSORT, sorted dup items have fixed size */
+	CREATE     = 18, /**< create DB if not already existing */
-DB_DUPFIXED :: 0x10
+}
-/** with #MDB_DUPSORT, dups are #MDB_INTEGERKEY-style integers */
+Db_Flags :: distinct bit_set[Db_Flag;c.uint]
-DB_INTEGERDUP :: 0x20
-/** with #MDB_DUPSORT, use reverse string dups */
-DB_REVERSEDUP :: 0x40
-/** create DB if not already existing */
-DB_CREATE :: 0x40000
 /** @} */
-/**	@defgroup mdb_put	Write Flags
+/** @defgroup mdb_put Write Flags
 *	@{
 */
-/** For put: Don't write if the key already exists. */
+Write_Flag :: enum u32 {
-WRITE_NOOVERWRITE :: 0x10
+	NOOVERWRITE = 4, /**< For put: Don't write if the key already exists */
-/** Only for #MDB_DUPSORT<br>
+	NODUPDATA   = 5, /**< For DUPSORT: don't write if the key and data pair already exist.
- * For put: don't write if the key and data pair already exist.<br>
+	                       For mdb_cursor_del: remove all duplicate data items. */
- * For mdb_cursor_del: remove all duplicate data items.
+	CURRENT     = 6, /**< For mdb_cursor_put: overwrite the current key/data pair */
- */
+	RESERVE     = 16, /**< For put: Just reserve space for data, don't copy it */
-WRITE_NODUPDATA :: 0x20
+	APPEND      = 17, /**< Data is being appended, don't split full pages */
-/** For mdb_cursor_put: overwrite the current key/data pair */
+	APPENDDUP   = 18, /**< Duplicate data is being appended, don't split full pages */
-WRITE_CURRENT :: 0x40
+	MULTIPLE    = 19, /**< Store multiple data items in one call. Only for DUPFIXED. */
-/** For put: Just reserve space for data, don't copy it. Return a
+}
- * pointer to the reserved space.
+Write_Flags :: distinct bit_set[Write_Flag;c.uint]
- */
+/** @} */
-WRITE_RESERVE :: 0x10000
-/** Data is being appended, don't split full pages. */
-WRITE_APPEND :: 0x20000
-/** Duplicate data is being appended, don't split full pages. */
-WRITE_APPENDDUP :: 0x40000
-/** Store multiple data items in one call. Only for #MDB_DUPFIXED. */
-WRITE_MULTIPLE :: 0x80000
-/*	@} */
-/**	@defgroup mdb_copy	Copy Flags
+/** @defgroup mdb_copy Copy Flags
 *	@{
 */
-/** Compacting copy: Omit free space from copy, and renumber all
+Copy_Flag :: enum u32 {
- * pages sequentially.
+	COMPACT = 0, /**< Compacting copy: Omit free space from copy, and renumber all pages sequentially. */
- */
+}
-CP_COMPACT :: 0x01
+Copy_Flags :: distinct bit_set[Copy_Flag;c.uint]
-/*	@} */
+/** @} */
 /** @brief Cursor Get operations.
 *
@@ -340,33 +394,24 @@ CP_COMPACT :: 0x01
 */
 Cursor_Op :: enum c.int {
 	FIRST, /**< Position at first key/data item */
-	FIRST_DUP, /**< Position at first data item of current key.
+	FIRST_DUP, /**< Position at first data item of current key. Only for DUPSORT */
-								Only for #MDB_DUPSORT */
+	GET_BOTH, /**< Position at key/data pair. Only for DUPSORT */
-	GET_BOTH, /**< Position at key/data pair. Only for #MDB_DUPSORT */
+	GET_BOTH_RANGE, /**< Position at key, nearest data. Only for DUPSORT */
-	GET_BOTH_RANGE, /**< position at key, nearest data. Only for #MDB_DUPSORT */
 	GET_CURRENT, /**< Return key/data at current cursor position */
-	GET_MULTIPLE, /**< Return up to a page of duplicate data items
+	GET_MULTIPLE, /**< Return up to a page of duplicate data items from current cursor position. Only for DUPFIXED */
-								from current cursor position. Move cursor to prepare
-								for #MDB_NEXT_MULTIPLE. Only for #MDB_DUPFIXED */
 	LAST, /**< Position at last key/data item */
-	LAST_DUP, /**< Position at last data item of current key.
+	LAST_DUP, /**< Position at last data item of current key. Only for DUPSORT */
-								Only for #MDB_DUPSORT */
 	NEXT, /**< Position at next data item */
-	NEXT_DUP, /**< Position at next data item of current key.
+	NEXT_DUP, /**< Position at next data item of current key. Only for DUPSORT */
-								Only for #MDB_DUPSORT */
+	NEXT_MULTIPLE, /**< Return up to a page of duplicate data items from next cursor position. Only for DUPFIXED */
-	NEXT_MULTIPLE, /**< Return up to a page of duplicate data items
-								from next cursor position. Move cursor to prepare
-								for #MDB_NEXT_MULTIPLE. Only for #MDB_DUPFIXED */
 	NEXT_NODUP, /**< Position at first data item of next key */
 	PREV, /**< Position at previous data item */
-	PREV_DUP, /**< Position at previous data item of current key.
+	PREV_DUP, /**< Position at previous data item of current key. Only for DUPSORT */
-								Only for #MDB_DUPSORT */
 	PREV_NODUP, /**< Position at last data item of previous key */
 	SET, /**< Position at specified key */
 	SET_KEY, /**< Position at specified key, return key + data */
-	SET_RANGE, /**< Position at first key greater than or equal to specified key. */
+	SET_RANGE, /**< Position at first key greater than or equal to specified key */
-	PREV_MULTIPLE, /**< Position at previous page and return up to
+	PREV_MULTIPLE, /**< Position at previous page and return up to a page of duplicate data items. Only for DUPFIXED */
-								a page of duplicate data items. Only for #MDB_DUPFIXED */
 }
 Error :: enum c.int {
@@ -419,33 +464,28 @@ Error :: enum c.int {
 	BAD_VALSIZE      = -30781,
 	/** The specified DBI was changed unexpectedly */
 	BAD_DBI          = -30780,
 	/** Unexpected problem - txn should abort */
 	PROBLEM          = -30779,
 }
 /** @brief Statistics for a database in the environment */
 Stat :: struct {
-	ms_psize:          u32,
+	psize:          u32, /**< Size of a database page. This is currently the same for all databases. */
-	/**< Size of a database page.
+	depth:          u32, /**< Depth (height) of the B-tree */
-											This is currently the same for all databases. */
+	branch_pages:   uint, /**< Number of internal (non-leaf) pages */
-	ms_depth:          u32,
+	leaf_pages:     uint, /**< Number of leaf pages */
-	/**< Depth (height) of the B-tree */
+	overflow_pages: uint, /**< Number of overflow pages */
-	ms_branch_pages:   uint,
+	entries:        uint, /**< Number of data items */
-	/**< Number of internal (non-leaf) pages */
-	ms_leaf_pages:     uint,
-	/**< Number of leaf pages */
-	ms_overflow_pages: uint,
-	/**< Number of overflow pages */
-	ms_entries:        uint,
-	/**< Number of data items */
 }
 /** @brief Information about the environment */
 Env_Info :: struct {
-	me_mapaddr:    rawptr, /**< Address of map, if fixed */
+	mapaddr:    rawptr, /**< Address of map, if fixed */
-	me_mapsize:    uint, /**< Size of the data memory map */
+	mapsize:    uint, /**< Size of the data memory map */
-	me_last_pgno:  uint, /**< ID of the last used page */
+	last_pgno:  uint, /**< ID of the last used page */
-	me_last_txnid: uint, /**< ID of the last committed transaction */
+	last_txnid: uint, /**< ID of the last committed transaction */
-	me_maxreaders: u32, /**< max reader slots in the environment */
+	maxreaders: u32, /**< max reader slots in the environment */
-	me_numreaders: u32, /**< max reader slots used in the environment */
+	numreaders: u32, /**< max reader slots used in the environment */
 }
 /** @brief A callback function for most LMDB assert() failures,
@@ -454,7 +494,7 @@ Env_Info :: struct {
 * @param[in] env An environment handle returned by #mdb_env_create().
 * @param[in] msg The assertion message, not including newline.
 */
-Assert_Func :: proc "c" (_: ^Env, _: cstring)
+Assert_Func :: #type proc "c" (_: ^Env, _: cstring)
 /** @brief A callback function used to print a message from the library.
 *
@@ -462,7 +502,7 @@ Assert_Func :: proc "c" (_: ^Env, _: cstring)
 * @param[in] ctx An arbitrary context pointer for the callback.
 * @return < 0 on failure, >= 0 on success.
 */
-Msg_Func :: proc "c" (_: cstring, _: rawptr) -> i32
+Msg_Func :: #type proc "c" (_: cstring, _: rawptr) -> i32
@(default_calling_convention = "c", link_prefix = "mdb_")
 foreign lib {
@@ -623,7 +663,7 @@ foreign lib {
 	* </ul>
 	*/
 	@(require_results)
-	env_open :: proc(env: ^Env, path: cstring, flags: u32, mode: mode_t) -> Error ---
+	env_open :: proc(env: ^Env, path: cstring, flags: Env_Flags, mode: mode_t) -> Error ---
 	/** @brief Copy an LMDB environment to the specified path.
 	*
@@ -682,7 +722,7 @@ foreign lib {
 	* @return A non-zero error value on failure and 0 on success.
 	*/
 	@(require_results)
-	env_copy2 :: proc(env: ^Env, path: cstring, flags: u32) -> Error ---
+	env_copy2 :: proc(env: ^Env, path: cstring, flags: Copy_Flags) -> Error ---
 	/** @brief Copy an LMDB environment to the specified file descriptor,
 	*	with options.
@@ -702,7 +742,7 @@ foreign lib {
 	* @return A non-zero error value on failure and 0 on success.
 	*/
 	@(require_results)
-	env_copyfd2 :: proc(env: ^Env, fd: filehandle_t, flags: u32) -> Error ---
+	env_copyfd2 :: proc(env: ^Env, fd: filehandle_t, flags: Copy_Flags) -> Error ---
 	/** @brief Return statistics about the LMDB environment.
 	*
@@ -767,7 +807,7 @@ foreign lib {
 	* </ul>
 	*/
 	@(require_results)
-	env_set_flags :: proc(env: ^Env, flags: u32, onoff: i32) -> Error ---
+	env_set_flags :: proc(env: ^Env, flags: Env_Flags, onoff: i32) -> Error ---
 	/** @brief Get environment flags.
 	*
@@ -780,7 +820,7 @@ foreign lib {
 	* </ul>
 	*/
 	@(require_results)
-	env_get_flags :: proc(env: ^Env, flags: ^u32) -> Error ---
+	env_get_flags :: proc(env: ^Env, flags: ^Env_Flags) -> Error ---
 	/** @brief Return the path that was used in #mdb_env_open().
 	*
@@ -973,7 +1013,7 @@ foreign lib {
 	* </ul>
 	*/
 	@(require_results)
-	txn_begin :: proc(env: ^Env, parent: ^Txn, flags: u32, txn: ^^Txn) -> Error ---
+	txn_begin :: proc(env: ^Env, parent: ^Txn, flags: Env_Flags, txn: ^^Txn) -> Error ---
 	/** @brief Returns the transaction's #MDB_env
 	*
@@ -1126,7 +1166,7 @@ foreign lib {
 	* </ul>
 	*/
 	@(require_results)
-	dbi_open :: proc(txn: ^Txn, name: cstring, flags: u32, dbi: ^Dbi) -> Error ---
+	dbi_open :: proc(txn: ^Txn, name: cstring, flags: Db_Flags, dbi: ^Dbi) -> Error ---
 	/** @brief Retrieve statistics for a database.
 	*
@@ -1151,7 +1191,7 @@ foreign lib {
 	* @return A non-zero error value on failure and 0 on success.
 	*/
 	@(require_results)
-	dbi_flags :: proc(txn: ^Txn, dbi: Dbi, flags: ^u32) -> Error ---
+	dbi_flags :: proc(txn: ^Txn, dbi: Dbi, flags: ^Db_Flags) -> Error ---
 	/** @brief Close a database handle. Normally unnecessary. Use with care:
 	*
@@ -1229,6 +1269,7 @@ foreign lib {
 	@(require_results)
 	set_dupsort :: proc(txn: ^Txn, dbi: Dbi, cmp: Cmp_Func) -> Error ---
+	// NOTE: Unimplemented in current LMDB — this function has no effect.
 	/** @brief Set a relocation function for a #MDB_FIXEDMAP database.
 	*
 	* @todo The relocation function is called whenever it is necessary to move the data
@@ -1250,6 +1291,7 @@ foreign lib {
 	@(require_results)
 	set_relfunc :: proc(txn: ^Txn, dbi: Dbi, rel: Rel_Func) -> Error ---
+	// NOTE: Unimplemented in current LMDB — this function has no effect.
 	/** @brief Set a context pointer for a #MDB_FIXEDMAP database's relocation function.
 	*
 	* See #mdb_set_relfunc and #MDB_rel_func for more details.
@@ -1344,7 +1386,7 @@ foreign lib {
 	* </ul>
 	*/
 	@(require_results)
-	put :: proc(txn: ^Txn, dbi: Dbi, key: ^Val, data: ^Val, flags: u32) -> Error ---
+	put :: proc(txn: ^Txn, dbi: Dbi, key: ^Val, data: ^Val, flags: Write_Flags) -> Error ---
 	/** @brief Delete items from a database.
 	*
@@ -1517,7 +1559,7 @@ foreign lib {
 	* </ul>
 	*/
 	@(require_results)
-	cursor_put :: proc(cursor: ^Cursor, key: ^Val, data: ^Val, flags: u32) -> Error ---
+	cursor_put :: proc(cursor: ^Cursor, key: ^Val, data: ^Val, flags: Write_Flags) -> Error ---
 	/** @brief Delete current key/data pair
 	*
@@ -1541,7 +1583,7 @@ foreign lib {
 	* </ul>
 	*/
 	@(require_results)
-	cursor_del :: proc(cursor: ^Cursor, flags: u32) -> Error ---
+	cursor_del :: proc(cursor: ^Cursor, flags: Write_Flags) -> Error ---
 	/** @brief Return count of duplicates for current key.
 	*
Author	SHA1	Message	Date
zack	e36229a3ef	Improved consistency with naming of init / create / destroy and when to propagate allocation errors and (#18 ) Co-authored-by: Zachary Levy <zachary@sunforge.is> Reviewed-on: #18	2026-04-24 21:46:21 +00:00
zack	bca19277b3	draw-improvements (#17 ) Major rework to draw rendering system. We are making an SDF-first rendering system, with tessellated stuff only as a fallback strategy for specific situations where SDF is particularly poorly suited. Co-authored-by: Zachary Levy <zachary@sunforge.is> Reviewed-on: #17	2026-04-24 07:57:44 +00:00
zack	37da2ea068	Tweaked general setup tracking allocator and added logger (#11 ) Co-authored-by: Zachary Levy <zachary@sunforge.is> Reviewed-on: #11	2026-04-22 06:03:10 +00:00
zack	cfd9e504e1	vendor-cleanup (#10 ) Major rework of libusb and lmdb bindings Co-authored-by: Zachary Levy <zachary@sunforge.is> Reviewed-on: #10	2026-04-22 04:47:59 +00:00
zack	0d424cbd6e	Texture Rendering (#9 ) Co-authored-by: Zachary Levy <zachary@sunforge.is> Reviewed-on: #9	2026-04-22 00:05:08 +00:00