Cleaned up phased_executor test

Removed using statement from many_bits
Added test all task
2026-04-02 18:30:12 -07:00 · 2026-04-02 18:26:06 -07:00 · 2026-04-02 18:24:38 -07:00 · 2026-04-02 18:19:42 -07:00
33 changed files with 30 additions and 9928 deletions
@@ -32,69 +32,19 @@
    "command": "odin test phased_executor -out=out/debug/test_phased_executor",
    "cwd": "$ZED_WORKTREE_ROOT",
  },
-  {
-    "label": "Test qrcode",
-    "command": "odin test qrcode -out=out/debug/test_qrcode",
-    "cwd": "$ZED_WORKTREE_ROOT",
-  },
  {
    "label": "Test all",
-    "command": "odin test many_bits -out=out/debug/test_many_bits && odin test ring -out=out/debug/test_ring && odin test levsort -out=out/debug/test_levsort && odin test levsync -out=out/debug/test_levsync && odin test levmath -out=out/debug/test_levmath && odin test phased_executor -out=out/debug/test_phased_executor && odin test qrcode -out=out/debug/test_qrcode",
+    "command": "odin test many_bits -out=out/debug/test_many_bits && odin test ring -out=out/debug/test_ring && odin test levsort -out=out/debug/test_levsort && odin test levsync -out=out/debug/test_levsync && odin test levmath -out=out/debug/test_levmath && odin test phased_executor -out=out/debug/test_phased_executor",
    "cwd": "$ZED_WORKTREE_ROOT",
  },
  // ---------------------------------------------------------------------------------------------------------------------
-  // ----- Examples ------------------------
+  // ----- LMDB Examples ------------------------
  // ---------------------------------------------------------------------------------------------------------------------
  {
    "label": "Run lmdb example",
    "command": "odin run vendor/lmdb/examples -debug -out=out/debug/lmdb-examples",
    "cwd": "$ZED_WORKTREE_ROOT",
  },
-  {
-    "label": "Run draw hellope-clay example",
-    "command": "odin run draw/examples -debug -out=out/debug/draw-examples -- hellope-clay",
-    "cwd": "$ZED_WORKTREE_ROOT",
-  },
-  {
-    "label": "Run draw hellope-shapes example",
-    "command": "odin run draw/examples -debug -out=out/debug/draw-examples -- hellope-shapes",
-    "cwd": "$ZED_WORKTREE_ROOT",
-  },
-  {
-    "label": "Run draw hellope-text example",
-    "command": "odin run draw/examples -debug -out=out/debug/draw-examples -- hellope-text",
-    "cwd": "$ZED_WORKTREE_ROOT",
-  },
-  {
-    "label": "Run draw hellope-custom example",
-    "command": "odin run draw/examples -debug -out=out/debug/draw-examples -- hellope-custom",
-    "cwd": "$ZED_WORKTREE_ROOT",
-  },
-  {
-    "label": "Run draw textures example",
-    "command": "odin run draw/examples -debug -out=out/debug/draw-examples -- textures",
-    "cwd": "$ZED_WORKTREE_ROOT",
-  },
-  {
-    "label": "Run qrcode basic example",
-    "command": "odin run qrcode/examples -debug -out=out/debug/qrcode-examples -- basic",
-    "cwd": "$ZED_WORKTREE_ROOT",
-  },
-  {
-    "label": "Run qrcode variety example",
-    "command": "odin run qrcode/examples -debug -out=out/debug/qrcode-examples -- variety",
-    "cwd": "$ZED_WORKTREE_ROOT",
-  },
-  {
-    "label": "Run qrcode segment example",
-    "command": "odin run qrcode/examples -debug -out=out/debug/qrcode-examples -- segment",
-    "cwd": "$ZED_WORKTREE_ROOT",
-  },
-  {
-    "label": "Run qrcode mask example",
-    "command": "odin run qrcode/examples -debug -out=out/debug/qrcode-examples -- mask",
-    "cwd": "$ZED_WORKTREE_ROOT",
-  },
  // ---------------------------------------------------------------------------------------------------------------------
  // ----- Other ------------------------
  // ---------------------------------------------------------------------------------------------------------------------
@@ -1,19 +1,3 @@
 # LevLib

 Narya + BFPOWER unified Odin library collection.
-
-## Meta Tools
-
-The `meta/` package contains build tools that can be run from the project root:
-
-```
-odin run meta -- <command>
-```
-
-Running with no arguments prints available commands.
-
-### Commands
-
-| Command       | Description                                                                                                                                                                                   |
-| ------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `gen-shaders` | Compile all GLSL shaders in `draw/shaders/source/` to SPIR-V and Metal Shading Language, writing results to `draw/shaders/generated/`. Requires `glslangValidator` and `spirv-cross` on PATH. |
@@ -1,814 +0,0 @@
-# draw
-
-2D rendering library built on SDL3 GPU, providing a unified shape-drawing and text-rendering API with
-Clay UI integration.
-
-## Current state
-
-The renderer uses a single unified `Pipeline_2D_Base` (`TRIANGLELIST` pipeline) with two submission
-modes dispatched by a push constant:
-
-- **Mode 0 (Tessellated):** Vertex buffer contains real geometry. Used for text (indexed draws into
-  SDL_ttf atlas textures), axis-aligned sharp-corner rectangles (already optimal as 2 triangles),
-  per-vertex color gradients (`rectangle_gradient`, `circle_gradient`), angular-clipped circle
-  sectors (`circle_sector`), and arbitrary user geometry (`triangle`, `triangle_fan`,
-  `triangle_strip`). The fragment shader computes `out = color * texture(tex, uv)`.
-
-- **Mode 1 (SDF):** A static 6-vertex unit-quad buffer is drawn instanced, with per-primitive
-  `Primitive` structs uploaded each frame to a GPU storage buffer. The vertex shader reads
-  `primitives[gl_InstanceIndex]`, computes world-space position from unit quad corners + primitive
-  bounds. The fragment shader dispatches on `Shape_Kind` to evaluate the correct signed distance
-  function analytically.
-
-Seven SDF shape kinds are implemented:
-
-1. **RRect** — rounded rectangle with per-corner radii (iq's `sdRoundedBox`)
-2. **Circle** — filled or stroked circle
-3. **Ellipse** — exact signed-distance ellipse (iq's iterative `sdEllipse`)
-4. **Segment** — capsule-style line segment with rounded caps
-5. **Ring_Arc** — annular ring with angular clipping for arcs
-6. **NGon** — regular polygon with arbitrary side count and rotation
-7. **Polyline** — decomposed into independent `Segment` primitives per adjacent point pair
-
-All SDF shapes support fill and stroke modes via `Shape_Flags`, and produce mathematically exact
-curves with analytical anti-aliasing via `smoothstep` — no tessellation, no piecewise-linear
-approximation. A rounded rectangle is 1 primitive (64 bytes) instead of ~250 vertices (~5000 bytes).
-
-MSAA is opt-in (default `._1`, no MSAA) via `Init_Options.msaa_samples`. SDF rendering does not
-benefit from MSAA because fragment coverage is computed analytically. MSAA remains useful for text
-glyph edges and tessellated user geometry if desired.
-
-## 2D rendering pipeline plan
-
-This section documents the planned architecture for levlib's 2D rendering system. The design is driven
-by three goals: **draw quality** (mathematically exact curves with perfect anti-aliasing), **efficiency**
-(minimal vertex bandwidth, high GPU occupancy, low draw-call count), and **extensibility** (new
-primitives and effects can be added to the library without architectural changes).
-
-### Overview: three pipelines
-
-The 2D renderer uses three GPU pipelines, split by **register pressure** (main vs effects) and
-**render-pass structure** (everything vs backdrop):
-
-1. **Main pipeline** — shapes (SDF and tessellated), text, and textured rectangles. Low register
-   footprint (~18–24 registers per thread). Runs at full GPU occupancy on every architecture.
-   Handles 90%+ of all fragments in a typical frame.
-
-2. **Effects pipeline** — drop shadows, inner shadows, outer glow, and similar ALU-bound blur
-   effects. Medium register footprint (~48–60 registers). Each effects primitive includes the base
-   shape's SDF so that it can draw both the effect and the shape in a single fragment pass, avoiding
-   redundant overdraw. Separated from the main pipeline to protect main-pipeline occupancy on
-   low-end hardware (see register analysis below).
-
-3. **Backdrop pipeline** — frosted glass, refraction, and any effect that samples the current render
-   target as input. Implemented as a multi-pass sequence (downsample, separable blur, composite),
-   where each individual pass has a low-to-medium register footprint (~15–40 registers). Separated
-   from the other pipelines because it structurally requires ending the current render pass and
-   copying the render target before any backdrop-sampling fragment can execute — a command-buffer-
-   level boundary that cannot be avoided regardless of shader complexity.
-
-A typical UI frame with no effects uses 1 pipeline bind and 0 switches. A frame with drop shadows
-uses 2 pipelines and 1 switch. A frame with shadows and frosted glass uses all 3 pipelines and 2
-switches plus 1 texture copy. At ~1–5μs per pipeline bind on modern APIs, worst-case switching
-overhead is negligible relative to an 8.3ms (120 FPS) frame budget.
-
-### Why three pipelines, not one or seven
-
-The natural question is whether we should use a single unified pipeline (fewer state changes, simpler
-code) or many per-primitive-type pipelines (no branching overhead, lean per-shader register usage).
-
-#### Main/effects split: register pressure
-
-A GPU shader core has a fixed register pool shared among all concurrent threads. The compiler
-allocates registers pessimistically based on the worst-case path through the shader. If the shader
-contains both a 20-register RRect SDF and a 48-register drop-shadow blur, _every_ fragment — even
-trivial RRects — is allocated 48 registers. This directly reduces **occupancy** (the number of
-warps/wavefronts that can run simultaneously), which reduces the GPU's ability to hide memory
-latency.
-
-Each GPU architecture has a **register cliff** — a threshold above which occupancy starts dropping.
-Below the cliff, adding registers has zero occupancy cost.
-
-On consumer Ampere/Ada GPUs (RTX 30xx/40xx, 65,536 regs/SM, max 1,536 threads/SM, cliff at ~43 regs):
-
-| Register allocation     | Reg-limited threads | Actual (hw-capped) | Occupancy |
-| ----------------------- | ------------------- | ------------------ | --------- |
-| 20 regs (main pipeline) | 3,276               | 1,536              | 100%      |
-| 32 regs                 | 2,048               | 1,536              | 100%      |
-| 48 regs (effects)       | 1,365               | 1,365              | ~89%      |
-
-On Volta/A100 GPUs (65,536 regs/SM, max 2,048 threads/SM, cliff at ~32 regs):
-
-| Register allocation     | Reg-limited threads | Actual (hw-capped) | Occupancy |
-| ----------------------- | ------------------- | ------------------ | --------- |
-| 20 regs (main pipeline) | 3,276               | 2,048              | 100%      |
-| 32 regs                 | 2,048               | 2,048              | 100%      |
-| 48 regs (effects)       | 1,365               | 1,365              | ~67%      |
-
-On low-end mobile (ARM Mali Bifrost/Valhall, 64 regs/thread, cliff fixed at 32 regs):
-
-| Register allocation  | Occupancy                  |
-| -------------------- | -------------------------- |
-| 0–32 regs (main)     | 100% (full thread count)   |
-| 33–64 regs (effects) | ~50% (thread count halves) |
-
-Mali's cliff at 32 registers is the binding constraint. On consumer desktop GPUs the occupancy
-difference between 20 and 48 registers is modest (89–100%) and on Volta/A100 it is ~67%; on Mali it
-is a hard 2× throughput reduction. The main/effects split protects 90%+ of a frame's fragments
-(shapes, text, textures) from the effects pipeline's register cost.
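
The occupancy figures in the tables above follow from dividing the register file by the per-thread allocation and capping the result at the hardware thread limit. A minimal Python sketch of that arithmetic, assuming the same simplified model as the tables (no per-warp allocation granularity); the name `occupancy` is illustrative and not part of levlib:

```python
REGISTER_FILE = 65_536  # registers per SM, as quoted in the tables above


def occupancy(regs_per_thread: int, max_threads: int) -> tuple[int, int, float]:
    """Return (register-limited threads, resident threads, occupancy fraction)."""
    reg_limited = REGISTER_FILE // regs_per_thread
    resident = min(reg_limited, max_threads)  # hardware cap on threads per SM
    return reg_limited, resident, resident / max_threads


for name, max_threads in (("Ampere/Ada", 1_536), ("Volta/A100", 2_048)):
    for regs in (20, 32, 48):
        reg_limited, resident, frac = occupancy(regs, max_threads)
        print(f"{name}: {regs} regs -> {reg_limited} reg-limited, "
              f"{resident} resident, {frac:.0%}")
```

Real hardware allocates registers in fixed-size blocks per warp, so measured occupancy can come out slightly lower than this model predicts.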
-
-For the effects pipeline's drop-shadow shader — erf-approximation blur math with several texture
-fetches — 50% occupancy on Mali roughly halves throughput. At 4K with 1.5× overdraw (~12.4M
-fragments), a single unified shader containing the shadow branch would cost ~4ms instead of ~2ms on
-low-end mobile. This is a per-frame multiplier even when the heavy branch is never taken, because the
-compiler allocates registers for the worst-case path.
-
-All main-pipeline members (SDF shapes, tessellated geometry, text, textured rectangles) cluster at
-12–24 registers — below the cliff on every architecture — so unifying them costs nothing in
-occupancy.
-
-**Note on Apple M3+ GPUs:** Apple's M3 introduces Dynamic Caching (register file virtualization),
-which allocates registers at runtime based on actual usage rather than worst-case. This weakens the
-static register-pressure argument on M3 and later, but the split remains useful for isolating blur
-ALU complexity and keeping the backdrop texture-copy out of the main render pass.
-
-#### Backdrop split: render-pass structure
-
-The backdrop pipeline (frosted glass, refraction, mirror surfaces) is separated for a structural
-reason unrelated to register pressure. Before any backdrop-sampling fragment can execute, the current
-render target must be copied to a separate texture via `CopyGPUTextureToTexture` — a command-buffer-
-level operation that requires ending the current render pass. This boundary exists regardless of
-shader complexity and cannot be optimized away.
-
-The backdrop pipeline's individual shader passes (downsample, separable blur, composite) are
-register-light (~15–40 regs each), so merging them into the effects pipeline would cause no occupancy
-problem. But the render-pass boundary makes merging structurally impossible — effects draws happen
-inside the main render pass, backdrop draws happen inside their own bracketed pass sequence.
-
-#### Why not per-primitive-type pipelines (GPUI's approach)
-
-Zed's GPUI uses 7 separate shader pairs: quad, shadow, underline, monochrome sprite, polychrome
-sprite, path, and surface. This eliminates all branching and gives each shader minimal register
-usage. Three concrete costs make this approach wrong for our use case:
-
-**Draw call count scales with kind variety, not just scissor count.** With a unified pipeline,
-one instanced draw call per scissor covers all primitive kinds from a single storage buffer. With
-per-kind pipelines, each scissor requires one draw call and one pipeline bind per kind used. For a
-typical UI frame with 15 scissors and 3–4 primitive kinds per scissor, per-kind splitting produces
-~45–60 draw calls and pipeline binds; our unified approach produces ~15–20 draw calls and 1–5
-pipeline binds. At ~5μs each for CPU-side command encoding on modern APIs, per-kind splitting adds
-375–500μs of CPU overhead per frame — **4.5–6% of an 8.3ms (120 FPS) budget** — with no
-compensating GPU-side benefit, because the register-pressure savings within the simple-SDF range are
-negligible (all members cluster at 12–22 registers).
-
-**Z-order preservation forces the API to expose layers.** With a single pipeline drawing all kinds
-from one storage buffer, submission order equals draw order — Clay's painterly render commands flow
-through without reordering. With separate pipelines per kind, primitives can only batch with
-same-kind neighbors, which means interleaved kinds (e.g., `[rrect, circle, text, rrect, text]`) must
-either issue one draw call per primitive (defeating batching entirely) or force the user to pre-sort
-by kind and reason about explicit layers. GPUI chose the latter, baking layer semantics into their
-API where each layer draws shadows before quads before glyphs. Our design avoids this constraint:
-submission order is draw order, no layer juggling required.
-
-**PSO compilation costs multiply.** Each pipeline takes 1–50ms to compile on Metal/Vulkan/D3D12 at
-first use. Assuming ~25ms per pipeline, 7 pipelines is ~175ms cold startup; 3 pipelines is ~75ms.
-Adding state axes (MSAA variants, blend modes, color formats) multiplies combinatorially: for any
-set of axes, the variant matrix with 7 pipelines is 2.3× larger than with 3.
-
-**Branching cost comparison: unified vs per-kind in the effects pipeline.** The effects pipeline is
-the strongest candidate for per-kind splitting because effect branches are heavier than shape
-branches (~80 instructions for drop shadow vs ~20 for an SDF). Even here, per-kind splitting loses.
-Consider a worst-case scissor with 15 drop-shadowed cards and 2 inner-shadowed elements interleaved
-in submission order:
-
-- _Unified effects pipeline (our plan):_ 1 pipeline bind, 1 instanced draw call. Category-3
-  divergence occurs at drop-shadow/inner-shadow boundaries where ~4 warps straddle per boundary × 2
-  boundaries = ~8 divergent warps out of ~19,924 total (0.04%). Each divergent warp pays ~80 extra
-  instructions. Total divergence cost: 8 × 32 × 80 / 12G inst/sec ≈ **1.7μs**.
-
-- _Per-kind effects pipelines (GPUI-style):_ 2 pipeline binds + 2 draw calls. But submission order
-  is `[drop, drop, inner, drop, drop, inner, drop, ...]` — the two inner-shadow primitives split the
-  drop-shadow run into three segments. To preserve Z-order, this requires 5 draw calls and 4 pipeline
-  switches, not 2. Cost: 5 × 5μs + 4 × 5μs = **45μs**.
-
-  The per-kind approach costs **26× more** than the unified approach's divergence penalty (45μs vs
-  1.7μs), while eliminating only 0.04% warp divergence that was already negligible. Even in the most
-  extreme stacked-effects scenario (10 cards each with both drop shadow and inner shadow, producing
-  ~60 boundary warps at ~80 extra instructions each), unified divergence costs ~13μs — still 3.5×
-  cheaper than the pipeline-switching alternative.
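
The comparison above can be reproduced with a few lines of arithmetic. This is a sketch under the same assumptions as the text (~12G instructions/sec, 32-thread warps, ~5μs of CPU time per draw call or pipeline switch); the function names are hypothetical:

```python
INSTR_PER_SEC = 12e9   # ~12G instructions/sec, as assumed in the text
WARP_SIZE = 32
CPU_COST_US = 5.0      # ~5 us per draw call or pipeline switch, as assumed in the text


def divergence_cost_us(divergent_warps: int, extra_instr: int) -> float:
    """GPU time spent executing the extra branch path in divergent warps."""
    return divergent_warps * WARP_SIZE * extra_instr / INSTR_PER_SEC * 1e6


def switching_cost_us(draw_calls: int, pipeline_switches: int) -> float:
    """CPU time spent encoding draw calls and pipeline switches."""
    return (draw_calls + pipeline_switches) * CPU_COST_US


unified = divergence_cost_us(divergent_warps=8, extra_instr=80)
per_kind = switching_cost_us(draw_calls=5, pipeline_switches=4)
stacked = divergence_cost_us(divergent_warps=60, extra_instr=80)

print(f"unified {unified:.1f} us, per-kind {per_kind:.0f} us, "
      f"ratio {per_kind / unified:.0f}x")
print(f"stacked {stacked:.1f} us, ratio {per_kind / stacked:.1f}x")
```

Changing the warp counts or the per-call cost shows how much margin the unified approach has before pipeline switching becomes competitive.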
-
-The split we _do_ perform (main / effects / backdrop) is motivated by register-pressure boundaries
-and structural render-pass requirements (see analysis above). Within a pipeline, unified is
-strictly better by every measure: fewer draw calls, simpler Z-order, lower CPU overhead, and
-negligible GPU-side branching cost.
-
-**References:**
-
-- Zed GPUI blog post on their per-primitive pipeline architecture:
-  https://zed.dev/blog/videogame
-- Zed GPUI Metal shader source (7 shader pairs):
-  https://github.com/zed-industries/zed/blob/cb6fc11/crates/gpui/src/platform/mac/shaders.metal
-- NVIDIA Nsight Graphics 2024.3 documentation on active-threads-per-warp and divergence analysis:
-  https://developer.nvidia.com/blog/optimize-gpu-workloads-for-graphics-applications-with-nvidia-nsight-graphics/
-- NVIDIA Ampere GPU Architecture Tuning Guide: SM specs, max warps per SM (48 for cc 8.6, 64 for
-  cc 8.0), register file size (64K), occupancy factors:
-  https://docs.nvidia.com/cuda/ampere-tuning-guide/index.html
-- NVIDIA Ada GPU Architecture Tuning Guide: SM specs, max warps per SM (48 for cc 8.9):
-  https://docs.nvidia.com/cuda/ada-tuning-guide/index.html
-- CUDA Occupancy Calculation walkthrough (register allocation granularity, worked examples):
-  https://leimao.github.io/blog/CUDA-Occupancy-Calculation/
-- Apple M3 GPU architecture: Dynamic Caching (register file virtualization) eliminates static
-  worst-case register allocation, reducing the occupancy penalty for high-register shaders:
-  https://asplos.dev/wiki/m3-chip-explainer/gpu/index.html
-
-### Why fragment shader branching is safe in this design
-
-There is longstanding folklore that "branches in shaders are bad." This was true on pre-2010 hardware
-where shader cores had no branch instructions at all — compilers emitted code for both sides of every
-branch and used conditional select to pick the result. On modern GPUs (everything from ~2012 onward),
-this is no longer the case. Native dynamic branching is fully supported on all current hardware.
-However, branching _can_ still be costly in specific circumstances. Understanding which circumstances
-apply to our design — and which do not — is critical to justifying the unified-pipeline approach.
-
-#### How GPU branching works
-
-GPUs execute fragment shaders in **warps** (NVIDIA, 32 threads) or **wavefronts** (AMD, 32 or
-64 threads); Intel GPUs use SIMD groups of 8, 16, or 32 lanes. All threads in a warp execute the
-same instruction simultaneously (SIMT model). When a
-branch condition evaluates the same way for every thread in a warp, the GPU simply jumps to the taken
-path and skips the other — **zero cost**, identical to a CPU branch. This is called a **uniform
-branch** or **warp-coherent branch**.
-
-When threads within the same warp disagree on which path to take, the warp must execute both paths
-sequentially, masking off threads that don't belong to the active path. This is called **warp
-divergence** and it causes the warp to pay the cost of both sides of the branch. In the worst case
-(50/50 split), throughput halves for that warp.
-
-There are four categories of branch condition in a fragment shader, ranked by cost:
-
-| Category                         | Condition source                                                  | GPU behavior                                                                                   | Cost                  |
-| -------------------------------- | ----------------------------------------------------------------- | ---------------------------------------------------------------------------------------------- | --------------------- |
-| **Compile-time constant**        | `#ifdef`, `const bool`                                            | Dead code eliminated by compiler                                                               | Zero                  |
-| **Uniform / push constant**      | Same value for entire draw call                                   | Warp-coherent; GPU skips dead path                                                             | Effectively zero      |
-| **Per-primitive `flat` varying** | Same value across all fragments of a primitive                    | Warp-coherent for all warps fully inside one primitive; divergent only at primitive boundaries | Near zero (see below) |
-| **Per-fragment varying**         | Different value per pixel (e.g., texture lookup, screen position) | Potentially divergent within every warp                                                        | Can be expensive      |
-
-#### Which category our branches fall into
-
-Our design has two branch points:
-
-1. **`mode` (push constant): tessellated vs. SDF.** This is category 2 — uniform per draw call.
-   Every thread in every warp of a draw call sees the same `mode` value. **Zero divergence, zero
-   cost.**
-
-2. **`shape_kind` (flat varying from storage buffer): which SDF to evaluate.** This is category 3.
-   The `flat` interpolation qualifier ensures that all fragments rasterized from one primitive's quad
-   receive the same `shape_kind` value. Divergence can only occur at the **boundary between two
-   adjacent primitives of different kinds**, where the rasterizer might pack fragments from both
-   primitives into the same warp.
-
-For category 3, the divergence analysis depends on primitive size:
-
-- **Large primitives** (buttons, panels, containers; 50+ pixels on a side): a 200×100 rect
-  produces ~20,000 fragments = ~625 warps. At most ~4 boundary warps might straddle a neighbor of a
-  different kind. Divergence rate: **0.6%** of warps.
-
-- **Small primitives** (icons, dots; 16×16): 256 fragments = ~8 warps. At most 2 boundary warps
-  diverge. Divergence rate: **25%** of warps for that primitive, but the primitive itself covers a
-  tiny fraction of the frame's total fragments.
-
-- **Worst realistic case**: a dense grid of alternating shape kinds (e.g., circle-rect-circle-rect
-  icons). Even here, the interior warps of each primitive are coherent. Only the edges diverge. Total
-  frame-level divergence is typically **1–3%** of all warps.
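
The per-primitive divergence rates quoted above are simple ratios of boundary warps to total warps. A small sketch, assuming 32-thread warps and taking the boundary-warp counts from the text as given estimates (they are not derived here, and real rasterizers pack fragments in 2×2 quads and tiles); `divergence_rate` is a hypothetical helper:

```python
import math

WARP_SIZE = 32


def divergence_rate(width: int, height: int, boundary_warps: int) -> tuple[int, float]:
    """Return (warps covering the primitive, fraction of them that may diverge)."""
    warps = math.ceil(width * height / WARP_SIZE)
    return warps, boundary_warps / warps


large_warps, large_rate = divergence_rate(200, 100, boundary_warps=4)
small_warps, small_rate = divergence_rate(16, 16, boundary_warps=2)

print(f"200x100: {large_warps} warps, {large_rate:.1%} may diverge")
print(f"16x16:   {small_warps} warps, {small_rate:.1%} may diverge")
```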
-
-At 1–3% divergence, the throughput impact is negligible. At 4K with 12.4M total fragments
-(~387,000 warps), divergent boundary warps number in the low thousands. Each divergent warp pays at
-most ~25 extra instructions (the cost of the longest untaken SDF branch). At ~12G instructions/sec
-on a mid-range GPU, that totals ~4μs — under 0.05% of an 8.3ms (120 FPS) frame budget. This is
-confirmed by production renderers that use exactly this pattern:
-
-- **vger / vger-rs** (Audulus): single pipeline, 11 primitive kinds dispatched by a `switch` on a
-  flat varying `prim_type`. Ships at 120 FPS on iPads. The author (Taylor Holliday) replaced nanovg
-  specifically because CPU-side tessellation was the bottleneck, not fragment branching:
-  https://github.com/audulus/vger-rs
-
-- **Randy Gaul's 2D renderer**: single pipeline with `shape_type` encoded as a vertex attribute.
-  Reports that warp divergence "really hasn't been an issue for any game I've seen so far" because
-  "games tend to draw a lot of the same shape type":
-  https://randygaul.github.io/graphics/2025/03/04/2D-Rendering-SDF-and-Atlases.html
-
-#### What kind of branching IS expensive
-
-For completeness, here are the cases where shader branching genuinely hurts — none of which apply to
-our design:
-
-1. **Per-fragment data-dependent branches with high divergence.** Example:
-   `if (texture(noise, uv).r > 0.5)` where the noise texture produces a random pattern. Every warp
-   has ~50% divergence and pays for both paths. This is the scenario the "branches are bad"
-   folklore warns about. We have no per-fragment data-dependent branches in the main pipeline.
-
-2. **Branches where both paths are very long.** If both sides of a branch are 500+ instructions,
-   divergent warps pay double a large cost. Our SDF functions are 10–25 instructions each. Even
-   fully divergent, the penalty is ~25 extra instructions — less than a single texture sample's
-   latency.
-
-3. **Branches that prevent compiler optimizations.** Some compilers cannot schedule instructions
-   across branch boundaries, reducing VLIW utilization on older architectures. Modern GPUs (NVIDIA
-   Volta+, AMD RDNA+, Apple M-series) use scalar+vector execution models where this is not a
-   concern.
-
-4. **Register pressure from the union of all branches.** This is the real cost, and it is why we
-   split heavy effects (shadows, glass) into separate pipelines. Within the main pipeline, all SDF
-   branches have similar register footprints (12–22 registers), so combining them causes negligible
-   occupancy loss.
-
-**References:**
-
-- ARM solidpixel blog on branches in mobile shaders: comprehensive taxonomy of branch execution
-  models across GPU generations, confirms uniform and warp-coherent branches are free on modern
-  hardware:
-  https://solidpixel.github.io/2021/12/09/branches_in_shaders.html
-- Peter Stefek's "A Note on Branching Within a Shader": practical measurements showing that
-  warp-coherent branches have zero overhead on Pascal/Volta/Ampere, with clear explanation of the
-  SIMT divergence mechanism:
-  https://www.peterstefek.me/shader-branch.html
-- NVIDIA Volta architecture whitepaper: documents independent thread scheduling which allows
-  divergent threads to reconverge more efficiently than older architectures:
-  https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf
-- Randy Gaul on warp divergence in practice with per-primitive shape_type branching:
-  https://randygaul.github.io/graphics/2025/03/04/2D-Rendering-SDF-and-Atlases.html
-
-### Main pipeline: SDF + tessellated (unified)
-
-The main pipeline serves two submission modes through a single `TRIANGLELIST` pipeline and a single
-vertex input layout, distinguished by a push constant:
-
-- **Tessellated mode** (`mode = 0`): direct vertex buffer with explicit geometry. Unchanged from
-  today. Used for text (SDL_ttf atlas sampling), polylines, triangle fans/strips, gradient-filled
-  shapes, and any user-provided raw vertex geometry.
-- **SDF mode** (`mode = 1`): shared unit-quad vertex buffer + GPU storage buffer of `Primitive`
-  structs, drawn instanced. Used for all shapes with closed-form signed distance functions.
-
-Both modes converge on the same fragment shader, which dispatches on a `shape_kind` discriminant
-carried either in the vertex data (tessellated, always `Solid = 0`) or in the storage-buffer
-primitive struct (SDF mode).
-
-#### Why SDF for shapes
-
-CPU-side adaptive tessellation for curved shapes (the current approach) has three problems:
-
-1. **Vertex bandwidth.** A rounded rectangle with four corner arcs produces ~250 vertices × 20 bytes
-   = 5 KB. An SDF rounded rectangle is one `Primitive` struct (~56 bytes) plus 4 shared unit-quad
-   vertices. That is roughly a 90× reduction per shape.
-
-2. **Quality.** Tessellated curves are piecewise-linear approximations. At high DPI or under
-   animation/zoom, faceting is visible at any practical segment count. SDF evaluation produces
-   mathematically exact boundaries with perfect anti-aliasing via `smoothstep` in the fragment
-   shader.
-
-3. **Feature cost.** Adding soft edges, outlines, stroke effects, or rounded-cap line segments
-   requires extensive per-shape tessellation code. With SDF, these are trivial fragment shader
-   operations: `abs(d) - thickness` for stroke, `smoothstep(-soft, soft, d)` for soft edges.
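
To make the "trivial fragment shader operations" concrete, here is a CPU-side Python sketch of the same math. It is not the library's shader code: the rounded box uses a single uniform corner radius rather than per-corner radii, and all function names are illustrative.

```python
import math


def sd_circle(px: float, py: float, radius: float) -> float:
    """Signed distance from (px, py) to a circle centered at the origin."""
    return math.hypot(px, py) - radius


def sd_round_box(px: float, py: float, half_w: float, half_h: float, r: float) -> float:
    """Signed distance to an origin-centered box with one uniform corner radius."""
    qx = abs(px) - half_w + r
    qy = abs(py) - half_h + r
    outside = math.hypot(max(qx, 0.0), max(qy, 0.0))
    inside = min(max(qx, qy), 0.0)
    return outside + inside - r


def smoothstep(edge0: float, edge1: float, x: float) -> float:
    """GLSL-style smoothstep: cubic Hermite ramp from 0 to 1 between the edges."""
    t = min(max((x - edge0) / (edge1 - edge0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


def stroke(d: float, thickness: float) -> float:
    """Turn a fill distance into a stroke distance centered on the outline."""
    return abs(d) - thickness


def coverage(d: float, soft: float) -> float:
    """Alpha for a signed distance: 1 inside, 0 outside, soft ramp across the edge."""
    return 1.0 - smoothstep(-soft, soft, d)


# A filled circle is opaque at its center; its 2-pixel stroke is empty there.
print(coverage(sd_circle(0.0, 0.0, 10.0), soft=1.0))
print(coverage(stroke(sd_circle(0.0, 0.0, 10.0), 2.0), soft=1.0))
```

The stroke and soft-edge variants reuse the same distance value, which is why they need no extra geometry.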
-
-**References:**
-
-- Inigo Quilez's 2D SDF primitive catalog (primary source for all SDF functions used):
-  https://iquilezles.org/articles/distfunctions2d/
-- Valve's 2007 SIGGRAPH paper on SDF for vector textures and glyphs (foundational reference):
-  https://steamcdn-a.akamaihd.net/apps/valve/2007/SIGGRAPH2007_AlphaTestedMagnification.pdf
-- Randy Gaul's practical writeup on SDF 2D rendering with shape-type branching, attribute layout,
-  warp divergence tradeoffs, and polyline rendering:
-  https://randygaul.github.io/graphics/2025/03/04/2D-Rendering-SDF-and-Atlases.html
-- Audulus vger-rs: production 2D renderer using a single unified pipeline with SDF type
-  discriminant, same architecture as this plan. Replaced nanovg, achieving 120 FPS where nanovg fell
-  to 30 FPS due to CPU-side tessellation:
-  https://github.com/audulus/vger-rs
-
-#### Storage-buffer instancing for SDF primitives
-
-SDF primitives are submitted via a GPU storage buffer indexed by `gl_InstanceIndex` in the vertex
-shader, rather than encoding per-primitive data redundantly in vertex attributes. This follows the
-pattern used by both Zed GPUI and vger-rs.
-
-Each SDF shape is described by a single `Primitive` struct (~56 bytes) in the storage buffer. The
-vertex shader reads `primitives[gl_InstanceIndex]`, computes the quad corner position from the unit
-vertex and the primitive's bounds, and passes shape parameters to the fragment shader via `flat`
-interpolated varyings.
-
-Compared to encoding per-primitive data in vertex attributes (the "fat vertex" approach), storage-
-buffer instancing eliminates the 4–6× data duplication across quad corners. A rounded rectangle costs
-56 bytes instead of 4 vertices × 40+ bytes = 160+ bytes.
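
The vertex-shader step described above (unit-quad corner plus primitive bounds) can be sketched on the CPU as follows. The bounds layout `(min_x, min_y, max_x, max_y)` and the name `quad_corners` are assumptions for illustration; the actual `Primitive` field layout is not shown in this document.

```python
# Shared unit quad: the same four corners are reused by every instance.
UNIT_QUAD = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Per-shape upload sizes quoted in the text.
PRIMITIVE_BYTES = 56
FAT_VERTEX_BYTES = 4 * 40


def quad_corners(bounds: tuple[float, float, float, float]) -> list[tuple[float, float]]:
    """Expand the unit quad to a primitive's bounds (min_x, min_y, max_x, max_y)."""
    min_x, min_y, max_x, max_y = bounds
    return [(min_x + u * (max_x - min_x), min_y + v * (max_y - min_y))
            for u, v in UNIT_QUAD]


print(quad_corners((10.0, 20.0, 210.0, 120.0)))
print(f"fat vertex / storage buffer = {FAT_VERTEX_BYTES / PRIMITIVE_BYTES:.1f}x")
```

Because the corner positions are derived per instance, only the bounds and shape parameters need to be uploaded each frame.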
-
-The tessellated path retains the existing direct vertex buffer layout (20 bytes/vertex, no storage
-buffer access). The vertex shader branch on `mode` (push constant) is warp-uniform — every invocation
-in a draw call has the same mode — so it is effectively free on all modern GPUs.
-
-#### Shape kinds
-
-Primitives in the main pipeline's storage buffer carry a `Shape_Kind` discriminant:
-
-| Kind       | SDF function                           | Notes                                                     |
-| ---------- | -------------------------------------- | --------------------------------------------------------- |
-| `RRect`    | `sdRoundedBox` (iq)                    | Per-corner radii. Covers all Clay rectangles and borders. |
-| `Circle`   | `sdCircle`                             | Filled and stroked.                                       |
-| `Ellipse`  | `sdEllipse`                            | Exact (iq's closed-form).                                 |
-| `Segment`  | `sdSegment` capsule                    | Rounded caps, correct sub-pixel thin lines.               |
-| `Ring_Arc` | `abs(sdCircle) - thickness` + arc mask | Rings, arcs, circle sectors unified.                      |
-| `NGon`     | `sdRegularPolygon`                     | Regular n-gon for n ≥ 5.                                  |
-
-The `Solid` kind (value 0) is reserved for the tessellated path, where `shape_kind` is implicitly
-zero because the fragment shader receives it from zero-initialized vertex attributes.
-
-Stroke/outline variants of each shape are handled by the `Shape_Flags` bit set rather than separate
-shape kinds. The fragment shader transforms `d = abs(d) - stroke_width` when the `Stroke` flag is
-set.
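As a rough illustration of the stroke transform described above, the following Odin sketch applies `abs(d) - stroke_width` to a circle distance on the CPU. The names `sd_circle` and `apply_stroke` are hypothetical stand-ins chosen for this example; the real evaluation happens in the fragment shader, and nothing here is taken from the library's source.

```odin
package stroke_sketch

import "core:fmt"
import "core:math"

// Signed distance from point p to a circle of the given radius centered at the
// origin: negative inside, zero on the edge, positive outside.
sd_circle :: proc(p: [2]f32, radius: f32) -> f32 {
	return math.sqrt(p.x * p.x + p.y * p.y) - radius
}

// Hypothetical CPU mirror of the shader's stroke transform. The filled shape's
// distance d becomes the distance to a band that extends stroke_width to either
// side of the original edge.
apply_stroke :: proc(d: f32, stroke_width: f32) -> f32 {
	return abs(d) - stroke_width
}

main :: proc() {
	radius: f32 = 5
	stroke_width: f32 = 2

	// A point on the circle's edge lands inside the stroke band (negative distance).
	on_edge := apply_stroke(sd_circle({3, 4}, radius), stroke_width)
	// The circle's center lands outside the band (positive distance), so the
	// stroked shape is hollow.
	at_center := apply_stroke(sd_circle({0, 0}, radius), stroke_width)

	fmt.println(on_edge, at_center)
}
```

With these inputs the edge point evaluates to -2 and the center to 3, which is why a single flag can turn any filled SDF shape into its outline without a separate shape kind.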
-
-**What stays tessellated:**
-
-- Text (SDL_ttf atlas, pending future MSDF evaluation)
-- `rectangle_gradient`, `circle_gradient` (per-vertex color interpolation)
-- `triangle_fan`, `triangle_strip` (arbitrary user-provided point lists)
-- `line_strip` / polylines (SDF polyline rendering is possible but complex; deferred)
-- Any raw vertex geometry submitted via `prepare_shape`
-
-The rule: if the shape has a closed-form SDF, it goes SDF. If it's described only by a vertex list or
-needs per-vertex color interpolation, it stays tessellated.
-
-### Effects pipeline
-
-The effects pipeline handles blur-based visual effects: drop shadows, inner shadows, outer glow, and
-similar. It uses the same storage-buffer instancing pattern as the main pipeline's SDF path, with a
-dedicated pipeline state object that has its own compiled fragment shader.
-
-#### Combined shape + effect rendering
-
-When a shape has an effect (e.g., a rounded rectangle with a drop shadow), the shape is drawn
-**once**, entirely in the effects pipeline. The effects fragment shader evaluates both the effect
-(blur math) and the base shape's SDF, compositing them in a single pass. The shape is not duplicated
-across pipelines.
-
-This avoids redundant overdraw. Consider a 200×100 rounded rect with a drop shadow offset by (5, 5)
-and blur sigma 10:
-
-- **Separate-primitive approach** (shape in main pipeline + shadow in effects pipeline): the shadow
-  quad covers ~230×130 = 29,900 pixels, the shape quad covers 200×100 = 20,000 pixels. The ~18,500
-  shadow fragments underneath the shape run the expensive blur shader only to be overwritten by the
-  shape. Total fragment invocations: ~49,900.
-
-- **Combined approach** (one primitive in effects pipeline): one quad covers ~230×130 = 29,900
-  pixels. The fragment shader evaluates the blur, then evaluates the shape SDF, composites the shape
-  on top. Total fragment invocations: ~29,900. The 20,000 shape-region fragments run the blur+shape
-  shader, but the shape SDF evaluation adds only ~15 ops to an ~80 op blur shader.
-
-The combined approach uses **~40% fewer fragment invocations** per effected shape (29,900 vs 49,900)
-in the common opaque case. The shape-region fragments pay a small additional cost for shape SDF
-evaluation in the effects shader (~15 ops), but this is far cheaper than running 18,500 fragments
-through the full blur shader (~80 ops each) and then discarding their output. For a UI with 10
-shadowed elements, the combined approach saves roughly 200,000 fragment invocations per frame.
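The fragment-count comparison above can be reproduced with a few lines of integer arithmetic. This is only a worked check of the numbers quoted in the text; the proc name `fragment_counts` is made up for the example, and the quad sizes are the approximate ones given above.

```odin
package overdraw_sketch

import "core:fmt"

// Returns the fragment invocations for the separate-primitive approach (shape
// quad plus shadow quad) and for the combined approach (shadow-sized quad only).
fragment_counts :: proc(shape_w, shape_h, quad_w, quad_h: int) -> (separate, combined: int) {
	shape_fragments := shape_w * shape_h
	quad_fragments := quad_w * quad_h
	return shape_fragments + quad_fragments, quad_fragments
}

main :: proc() {
	// 200x100 rounded rect, shadow quad of roughly 230x130.
	separate, combined := fragment_counts(200, 100, 230, 130)
	saved := separate - combined
	saved_percent := 100 * f64(saved) / f64(separate)

	fmt.printf("separate=%d combined=%d saved=%d (%.1f%%)\n", separate, combined, saved, saved_percent)
	fmt.printf("saved across 10 shadowed elements: %d\n", 10 * saved)
}
```

The saving per shape equals the shape quad's own area (20,000 fragments here), which is where both the ~40% figure and the 200,000-per-frame estimate for ten shadowed elements come from.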
-
-An `Effect_Flag.Draw_Base_Shape` flag controls whether the sharp shape layer composites on top
-(default true for drop shadow, always true for inner shadow). Standalone effects (e.g., a glow with
-no shape on top) clear this flag.
-
-Shapes without effects are submitted to the main pipeline as normal. Only shapes that have effects
-are routed to the effects pipeline.
-
-#### Drop shadow implementation
-
-Drop shadows use the analytical blurred-rounded-rectangle technique. Raph Levien's 2020 blog post
-describes an erf-based approximation that computes a Gaussian-blurred rounded rectangle in closed
-form along one axis and with a 4-sample numerical integration along the other. Total fragment cost is
-~80 FLOPs, one sqrt, no texture samples. This is the same technique used by Zed GPUI (via Evan
-Wallace's variant) and vger-rs.
-
-**References:**
-
-- Raph Levien's blurred rounded rectangles post (erf approximation, squircle contour refinement):
-  https://raphlinus.github.io/graphics/2020/04/21/blurred-rounded-rects.html
-- Evan Wallace's original WebGL implementation (used by Figma):
-  https://madebyevan.com/shaders/fast-rounded-rectangle-shadows/
-- Vello's implementation of blurred rounded rectangle as a gradient type:
-  https://github.com/linebender/vello/pull/665
-
-### Backdrop pipeline
-
-The backdrop pipeline handles effects that sample the current render target as input: frosted glass,
-refraction, mirror surfaces. It is separated from the effects pipeline for a structural reason, not
-register pressure.
-
-**Render-pass boundary.** Before any backdrop-sampling fragment can run, the current render target
-must be copied to a separate texture via `CopyGPUTextureToTexture`. This is a command-buffer-level
-operation that cannot happen mid-render-pass. The copy naturally creates a pipeline boundary that no
-amount of shader optimization can eliminate — it is a fundamental requirement of sampling a surface
-while also writing to it.
-
-**Multi-pass implementation.** Backdrop effects are implemented as separable multi-pass sequences
-(downsample → horizontal blur → vertical blur → composite), following the standard approach used by
-iOS `UIVisualEffectView`, Android `RenderEffect`, and Flutter's `BackdropFilter`. Each individual
-pass has a low-to-medium register footprint (~15–40 registers), well within the main pipeline's
-occupancy range. The multi-pass approach avoids the monolithic 70+ register shader that a single-pass
-Gaussian blur would require, making backdrop effects viable on low-end mobile GPUs (including
-Mali-G31 and VideoCore VI) where per-thread register limits are tight.
-
-**Bracketed execution.** All backdrop draws in a frame share a single bracketed region of the command
-buffer: end the current render pass, copy the render target, execute all backdrop sub-passes, then
-resume normal drawing. The entry/exit cost (texture copy + render-pass break) is paid once per frame
-regardless of how many backdrop effects are visible. When no backdrop effects are present, the bracket
-is never entered and the texture copy never happens — zero cost.
-
-**Why not split the backdrop sub-passes into separate pipelines?** The individual passes range from
-~15 to ~40 registers, which does cross Mali's 32-register cliff. However, the register-pressure argument
-that justifies the main/effects split does not apply here. The main/effects split protects the
-_common path_ (90%+ of frame fragments) from the uncommon path's register cost. Inside the backdrop
-pipeline there is no common-vs-uncommon distinction — if backdrop effects are active, every sub-pass
-runs; if not, none run. The backdrop pipeline either executes as a complete unit or not at all.
-Additionally, backdrop effects cover a small fraction of the frame's total fragments (~5% at typical
-UI scales), so the occupancy variation within the bracket has negligible impact on frame time.
-
-### Vertex layout
-
-The vertex struct is unchanged from the current 20-byte layout:
-
-```
-Vertex :: struct {
-    position: [2]f32,  //  0: screen-space position
-    uv:       [2]f32,  //  8: atlas UV (text) or unused (shapes)
-    color:    Color,   // 16: u8x4, GPU-normalized to float
-}
-```
-
-This layout is shared between the tessellated path and the SDF unit-quad vertices. For tessellated
-draws, `position` carries actual world-space geometry. For SDF draws, `position` carries unit-quad
-corners (0,0 to 1,1) and the vertex shader computes world-space position from the storage-buffer
-primitive's bounds.
-
-The `Primitive` struct for SDF shapes lives in the storage buffer, not in vertex attributes:
-
-```
-Primitive :: struct {
-    bounds:     [4]f32,         //  0: min_x, min_y, max_x, max_y
-    color:      Color,          // 16: u8x4, unpacked in shader via unpackUnorm4x8
-    kind_flags: u32,            // 20: (kind as u32) | (flags as u32 << 8)
-    rotation:   f32,            // 24: shader self-rotation in radians
-    _pad:       f32,            // 28: alignment
-    params:     Shape_Params,   // 32: raw union, 32 bytes (two vec4s of shape-specific data)
-    uv_rect:    [4]f32,         // 64: texture UV sub-region (u_min, v_min, u_max, v_max)
-}
-// Total: 80 bytes (std430 aligned)
-```
-
-`Shape_Params` is a `#raw_union` with named variants per shape kind (`rrect`, `circle`, `segment`,
-etc.), ensuring type safety on the CPU side and zero-cost reinterpretation on the GPU side. The
-`uv_rect` field is used by textured SDF primitives (Shape_Flag.Textured); non-textured primitives
-leave it zeroed.
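The `kind_flags` packing noted in the struct comment can be sketched as follows. This is a minimal illustration under assumptions: only `Solid = 0` is stated in the text, so the other enum values, the bit positions of `Stroke` and `Textured`, and the helper names `pack_kind_flags`, `unpack_kind`, and `unpack_flags` are all hypothetical.

```odin
package kind_flags_sketch

import "core:fmt"

// Enum order after Solid is an assumption made for this sketch.
Shape_Kind :: enum u8 {
	Solid = 0,
	RRect,
	Circle,
	Ellipse,
	Segment,
	Ring_Arc,
	NGon,
}

// Flag names appear in the text; their bit positions here are assumed.
Shape_Flag :: enum u8 {
	Stroke,
	Textured,
}
Shape_Flags :: bit_set[Shape_Flag; u8]

// Kind occupies bits 0..7, flags occupy bits 8..15.
pack_kind_flags :: proc(kind: Shape_Kind, flags: Shape_Flags) -> u32 {
	flag_bits := transmute(u8)flags
	return u32(kind) | (u32(flag_bits) << 8)
}

unpack_kind :: proc(packed: u32) -> Shape_Kind {
	return Shape_Kind(packed & 0xFF)
}

unpack_flags :: proc(packed: u32) -> Shape_Flags {
	flag_bits := u8((packed >> 8) & 0xFF)
	return transmute(Shape_Flags)flag_bits
}

main :: proc() {
	packed := pack_kind_flags(.Circle, {.Stroke})
	fmt.println(packed, unpack_kind(packed), unpack_flags(packed))
}
```

Keeping both values in one `u32` lets the shader recover them with a mask and a shift, and leaves the upper 16 bits free for later use.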
-
-### Draw submission order
-
-Within each scissor region, draws are issued in submission order to preserve the painter's algorithm:
-
-1. Bind **effects pipeline** → draw all queued effects primitives for this scissor (instanced, one
-   draw call). Each effects primitive includes its base shape and composites internally.
-2. Bind **main pipeline, tessellated mode** → draw all queued tessellated vertices (non-indexed for
-   shapes, indexed for text). Pipeline state unchanged from today.
-3. Bind **main pipeline, SDF mode** → draw all queued SDF primitives (instanced, one draw call).
-4. If backdrop effects are present: copy render target, bind **backdrop pipeline** → draw
-   backdrop primitives.
-
-The exact ordering within a scissor may be refined based on actual Z-ordering requirements. The key
-invariant is that each primitive is drawn exactly once, in the pipeline that owns it.
-
-### Text rendering
-
-Text rendering currently uses SDL_ttf's GPU text engine, which rasterizes glyphs per `(font, size)`
-pair into bitmap atlases and emits indexed triangle data via `GetGPUTextDrawData`. This path is
-**unchanged** by the SDF migration — text continues to flow through the main pipeline's tessellated
-mode with `shape_kind = Solid`, sampling the SDL_ttf atlas texture.
-
-A future phase may evaluate MSDF (multi-channel signed distance field) text rendering, which would
-allow resolution-independent glyph rendering from a single small atlas per font. This would involve:
-
-- Offline atlas generation via Chlumský's msdf-atlas-gen tool.
-- Runtime glyph metrics via `vendor:stb/truetype` (already in the Odin distribution).
-- A new `Shape_Kind.MSDF_Glyph` variant in the main pipeline's fragment shader.
-- Potential removal of the SDL_ttf dependency.
-
-This is explicitly deferred. The SDF shape migration is independent of and does not block text
-changes.
-
-**References:**
-
-- Viktor Chlumský's MSDF master's thesis and msdfgen tool:
-  https://github.com/Chlumsky/msdfgen
-- MSDF atlas generator for font atlas packing:
-  https://github.com/Chlumsky/msdf-atlas-gen
-- Valve's original SDF text rendering paper (SIGGRAPH 2007):
-  https://steamcdn-a.akamaihd.net/apps/valve/2007/SIGGRAPH2007_AlphaTestedMagnification.pdf
-
-### Textures
-
-Textures plug into the existing main pipeline: no additional GPU pipeline, no shader rewrite. The
-work is a resource layer (registration, upload, sampling, lifecycle) plus two textured-draw procs
-that route into the existing SDF path.
-
-#### Why draw owns registered textures
-
-A texture's GPU resource (the `^sdl.GPUTexture`, transfer buffer, shader resource view) is created
-and destroyed by draw. The user provides raw bytes and a descriptor at registration time; draw
-uploads synchronously and returns an opaque `Texture_Id` handle. The user can free their CPU-side
-bytes immediately after `register_texture` returns.
-
-This follows the model used by the RAD Debugger's render layer (`src/render/render_core.h` in
-EpicGamesExt/raddebugger, MIT license), where `r_tex2d_alloc` takes `(kind, size, format, data)`
-and returns an opaque handle that the renderer owns and releases. The single-owner model eliminates
-an entire class of lifecycle bugs (double-free, use-after-free across subsystems, unclear cleanup
-responsibility) that dual-ownership designs introduce.
-
-If advanced interop is ever needed (e.g., a future 3D pipeline or compute shader sharing the same
-GPU texture), the clean extension is a borrowed-reference accessor (`get_gpu_texture(id)`) that
-returns the underlying handle without transferring ownership. This is purely additive and does not
-require changing the registration API.
-
-#### Why `Texture_Kind` exists
-
-`Texture_Kind` (Static / Dynamic / Stream) is a driver hint for update frequency, adopted from the
-RAD Debugger's `R_ResourceKind`. It maps directly to SDL3 GPU usage patterns:
-
-- **Static**: uploaded once, never changes. Covers QR codes, decoded PNGs, icons (the 90% case).
-- **Dynamic**: updatable via `update_texture_region`. Covers font atlas growth, procedural updates.
-- **Stream**: frequent full re-uploads. Covers video playback, per-frame procedural generation.
-
-This costs one byte in the descriptor and lets the backend pick optimal memory placement without a
-future API change.
-
-#### Why samplers are per-draw, not per-texture
-
-A sampler describes how to filter and address a texture during sampling — nearest vs bilinear, clamp
-vs repeat. This is a property of the _draw_, not the texture. The same QR code texture should be
-sampled with `Nearest_Clamp` when displayed at native resolution but could reasonably be sampled
-with `Linear_Clamp` in a zoomed-out thumbnail. The same icon atlas might be sampled with
-`Nearest_Clamp` for pixel art or `Linear_Clamp` for smooth scaling.
-
-The RAD Debugger follows this pattern: `R_BatchGroup2DParams` carries `tex_sample_kind` alongside
-the texture handle, chosen per batch group at draw time. We do the same — `Sampler_Preset` is a
-parameter on the draw procs, not a field on `Texture_Desc`.
-
-Internally, draw keeps a small pool of pre-created `^sdl.GPUSampler` objects (one per preset,
-lazily initialized). Sub-batch coalescing keys on `(kind, texture_id, sampler_preset)` — draws
-with the same texture but different samplers produce separate draw calls, which is correct.
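A coalescing key of the kind described above might look like the sketch below. The struct name `Sub_Batch_Key`, the `Sub_Batch_Kind` enum, and the `can_coalesce` helper are assumptions made for illustration; only the three key components and the two preset names come from the text.

```odin
package coalesce_sketch

import "core:fmt"

Texture_Id :: distinct u32

// Preset names taken from the text; the enum layout is assumed.
Sampler_Preset :: enum u8 {
	Nearest_Clamp,
	Linear_Clamp,
}

// Hypothetical: the text only says the key includes a "kind".
Sub_Batch_Kind :: enum u8 {
	Tessellated,
	SDF,
}

Sub_Batch_Key :: struct {
	kind:       Sub_Batch_Kind,
	texture_id: Texture_Id,
	sampler:    Sampler_Preset,
}

// Two consecutive draws can share a draw call only if every key field matches.
can_coalesce :: proc(a, b: Sub_Batch_Key) -> bool {
	return a == b
}

main :: proc() {
	qr_sharp := Sub_Batch_Key{kind = .SDF, texture_id = 1, sampler = .Nearest_Clamp}
	qr_smooth := Sub_Batch_Key{kind = .SDF, texture_id = 1, sampler = .Linear_Clamp}

	fmt.println(can_coalesce(qr_sharp, qr_sharp))  // same texture, same sampler
	fmt.println(can_coalesce(qr_sharp, qr_smooth)) // same texture, different sampler
}
```

Because the sampler is part of the key, drawing the same texture with two presets splits into two draw calls, which matches the behavior the text calls correct.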
-
-#### Textured draw procs
-
-Textured rectangles route through the existing SDF path via `draw.rectangle_texture` and
-`draw.rectangle_texture_corners`, mirroring `draw.rectangle` and `draw.rectangle_corners` exactly —
-same parameters, same naming — with the color parameter replaced by a texture ID plus an optional
-tint.
-
-An earlier iteration of this design considered a separate tessellated `draw.texture` proc for
-"simple" fullscreen quads, on the theory that the tessellated path's lower register count (~16 regs
-vs ~24 for the SDF textured branch) would improve occupancy at large fragment counts. Applying the
-register-pressure analysis from the pipeline-strategy section above shows this is wrong: both 16 and
-24 registers are well below the register cliff (~43 regs on consumer Ampere/Ada, ~32 on Volta/A100),
-so both run at 100% occupancy. The remaining ALU difference (~15 extra instructions for the SDF
-evaluation) amounts to ~20μs at 4K — below noise. Meanwhile, splitting into a separate pipeline
-would add ~1–5μs per pipeline bind on the CPU side per scissor, matching or exceeding the GPU-side
-savings. Within the main pipeline, unified remains strictly better.
-
-The naming convention follows the existing shape API: `rectangle_texture` and
-`rectangle_texture_corners` sit alongside `rectangle` and `rectangle_corners`, mirroring the
-`rectangle_gradient` / `circle_gradient` pattern where the shape is the primary noun and the
-modifier (gradient, texture) is secondary. This groups related procs together in autocomplete
-(`rectangle_*`) and reads as natural English ("draw a rectangle with a texture").
-
-Future per-shape texture variants (`circle_texture`, `ellipse_texture`, `polygon_texture`) are
-reserved by this naming convention and require only a `Shape_Flag.Textured` bit plus a small
-per-shape UV mapping function in the fragment shader. These are additive.
-
-#### What SDF anti-aliasing does and does not do for textured draws
-
-The SDF path anti-aliases the **shape's outer silhouette** — rounded-corner edges, rotated edges,
-stroke outlines. It does not anti-alias or sharpen the texture content. Inside the shape, fragments
-sample through the chosen `Sampler_Preset`, and image quality is whatever the sampler produces from
-the source texels. A low-resolution texture displayed at a large size shows bilinear blur regardless
-of which draw proc is used. This matches the current text-rendering model, where glyph sharpness
-depends on how closely the display size matches the SDL_ttf atlas's rasterized size.
-
-#### Fit modes are a computation layer, not a renderer concept
-
-Standard image-fit behaviors (stretch, fill/cover, fit/contain, tile, center) are expressed as UV
-sub-region computations on top of the `uv_rect` parameter that both textured-draw procs accept. The
-renderer has no knowledge of fit modes — it samples whatever UV region it is given.
-
-A `fit_params` helper computes the appropriate `uv_rect`, sampler preset, and (for letterbox/fit
-mode) shrunken inner rect from a `Fit_Mode` enum, the target rect, and the texture's pixel size.
-Users who need custom UV control (sprite atlas sub-regions, UV animation, nine-patch slicing) skip
-the helper and compute `uv_rect` directly. This keeps the renderer primitive minimal while making
-the common cases convenient.
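To make the fit-mode computation concrete, here is a minimal Odin sketch of three of the modes. The text does not give the actual signature of `fit_params`, so `compute_fit`, the `Rect` layout `{x, y, w, h}`, and the enum member names are assumptions; the sampler preset the real helper returns is left out.

```odin
package fit_mode_sketch

import "core:fmt"

// Member names are assumed; the text lists stretch, fill/cover and fit/contain.
Fit_Mode :: enum u8 {
	Stretch,
	Fill,
	Fit,
}

Rect :: struct {
	x, y, w, h: f32,
}

// Returns the UV sub-region (u_min, v_min, u_max, v_max) to sample and the
// rect to draw into. Only Fit shrinks the rect; only Fill crops the UVs.
compute_fit :: proc(mode: Fit_Mode, target: Rect, tex_w, tex_h: f32) -> (uv_rect: [4]f32, inner: Rect) {
	uv_rect = {0, 0, 1, 1}
	inner = target
	target_aspect := target.w / target.h
	tex_aspect := tex_w / tex_h

	switch mode {
	case .Stretch:
		// Full texture over the full rect; aspect ratio is not preserved.
	case .Fill:
		if tex_aspect > target_aspect {
			// Texture is relatively wider: crop left and right.
			visible := target_aspect / tex_aspect
			uv_rect[0] = 0.5 - visible * 0.5
			uv_rect[2] = 0.5 + visible * 0.5
		} else {
			// Texture is relatively taller: crop top and bottom.
			visible := tex_aspect / target_aspect
			uv_rect[1] = 0.5 - visible * 0.5
			uv_rect[3] = 0.5 + visible * 0.5
		}
	case .Fit:
		if tex_aspect > target_aspect {
			// Full width, reduced height, centered vertically.
			h := target.w / tex_aspect
			inner.y += (target.h - h) * 0.5
			inner.h = h
		} else {
			// Full height, reduced width, centered horizontally.
			w := target.h * tex_aspect
			inner.x += (target.w - w) * 0.5
			inner.w = w
		}
	}
	return
}

main :: proc() {
	target := Rect{0, 0, 200, 100}
	fmt.println(compute_fit(.Fill, target, 100, 100))
	fmt.println(compute_fit(.Fit, target, 100, 100))
}
```

For a square texture in a 200×100 rect, Fill samples the middle half of the texture vertically (v from 0.25 to 0.75), while Fit keeps the full UV range and draws into a centered 100×100 rect. Either way the renderer only ever sees a `uv_rect` and a rectangle.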
-
-#### Deferred release
-
-`unregister_texture` does not immediately release the GPU texture. It queues the slot for release at
-the end of the current frame, after `SubmitGPUCommandBuffer` has handed work to the GPU. This
-prevents a race condition where a texture is freed while the GPU is still sampling from it in an
-already-submitted command buffer. The same deferred-release pattern is applied to `clear_text_cache`
-and `clear_text_cache_entry`, fixing a pre-existing latent bug where destroying a cached
-`^sdl_ttf.Text` mid-frame could free an atlas texture still referenced by in-flight draw batches.
-
-This pattern is standard in production renderers — the RAD Debugger's `r_tex2d_release` queues
-textures onto a free list that is processed in `r_end_frame`, not at the call site.
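The deferred-release pattern can be sketched independently of the GPU as a queue that is drained at end of frame. Everything below is a simplified stand-in: `unregister_texture_sketch`, `end_frame_sketch`, and `release_gpu_texture_stub` are hypothetical names, and the stub only counts calls where the real code would release the GPU texture after the command buffer has been submitted.

```odin
package deferred_release_sketch

import "core:fmt"

Texture_Id :: distinct u32

// Slots queued for release; drained once per frame.
pending_release: [dynamic]Texture_Id
released_count: int

// Stand-in for the real GPU release call; it only counts invocations.
release_gpu_texture_stub :: proc(id: Texture_Id) {
	released_count += 1
}

// Queues the texture instead of releasing it, so draws already recorded this
// frame can still reference it.
unregister_texture_sketch :: proc(id: Texture_Id) {
	append(&pending_release, id)
}

// Called after the frame's work has been handed to the GPU.
end_frame_sketch :: proc() {
	for id in pending_release {
		release_gpu_texture_stub(id)
	}
	clear(&pending_release)
}

main :: proc() {
	unregister_texture_sketch(7)
	unregister_texture_sketch(9)
	fmt.println(released_count, len(pending_release)) // nothing released yet

	end_frame_sketch()
	fmt.println(released_count, len(pending_release)) // queue drained
}
```

The caller-facing API stays synchronous in appearance; only the actual release is postponed to a point where no recorded draw can still reference the texture.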
-
-#### Clay integration
-
-Clay's `RenderCommandType.Image` is handled by dereferencing `imageData: rawptr` as a pointer to a
-`Clay_Image_Data` struct containing a `Texture_Id`, `Fit_Mode`, and tint color. Routing mirrors the
-existing rectangle handling: zero `cornerRadius` dispatches to `draw.rectangle_texture`, nonzero
-dispatches to `draw.rectangle_texture_corners`; both go through the SDF path. A `fit_params` call
-computes UVs from the fit mode before dispatch.
-
-#### Deferred features
-
-The following are plumbed in the descriptor but not implemented in phase 1:
-
-- **Mipmaps**: `Texture_Desc.mip_levels` field exists; generation via SDL3 deferred.
-- **Compressed formats**: `Texture_Desc.format` accepts BC/ASTC; upload path deferred.
-- **Render-to-texture**: `Texture_Desc.usage` accepts `.COLOR_TARGET`; render-pass refactor deferred.
-- **3D textures, arrays, cube maps**: `Texture_Desc.type` and `depth_or_layers` fields exist.
-- **Additional samplers**: anisotropic, trilinear, clamp-to-border (additive enum values).
-- **Atlas packing**: internal optimization for sub-batch coalescing; invisible to callers.
-- **Per-shape texture variants**: `circle_texture`, `ellipse_texture`, etc.; reserved by naming.
-
-**References:**
-
-- RAD Debugger render layer (ownership model, deferred release, sampler-at-draw-time):
-  https://github.com/EpicGamesExt/raddebugger (`src/render/render_core.h`, `src/render/d3d11/render_d3d11.c`)
-- Casey Muratori, Handmade Hero day 472: texture handling as a renderer-owned resource concern,
-  atlases as a separate layer above the renderer.
-
-## 3D rendering
-
-3D pipeline architecture is under consideration and will be documented separately. The current
-expectation is that 3D rendering will use dedicated pipelines (separate from the 2D pipelines)
-sharing GPU resources (textures, samplers, command buffer lifecycle) with the 2D renderer.
-
-## Multi-window support
-
-The renderer currently assumes a single window via the global `GLOB` state. Multi-window support is
-deferred but anticipated. When revisited, the RAD Debugger's bucket + pass-list model
-(`src/draw/draw.h`, `src/draw/draw.c` in EpicGamesExt/raddebugger) is worth studying as a reference.
-
-RAD separates draw submission from rendering via **buckets**. A `DR_Bucket` is an explicit handle
-that accumulates an ordered list of render passes (`R_PassList`). The user creates a bucket, pushes
-it onto a thread-local stack, issues draw calls (which target the top-of-stack bucket), then submits
-the bucket to a specific window. Multiple buckets can exist simultaneously — one per window, or one
-per UI panel that gets composited into a parent bucket via `dr_sub_bucket`. Implicit draw parameters
-(clip rect, 2D transform, sampler mode, transparency) are managed via push/pop stacks scoped to each
-bucket, so different windows can have independent clip and transform state without interference.
-
-The key properties this gives RAD:
-
-- **Per-window isolation.** Each window builds its own bucket with its own pass list and state stacks.
-  No global contention.
-- **Thread-parallel building.** Each thread has its own draw context and arena. Multiple threads can
-  build buckets concurrently, then submit them to the render backend sequentially.
-- **Compositing.** A pre-built bucket (e.g., a tooltip or overlay) can be injected into another
-  bucket with a transform applied, without rebuilding its draw calls.
-
-For our library, the likely adaptation would be replacing the single `GLOB` with a per-window draw
-context that users create and pass to `begin`/`end`, while keeping the explicit-parameter draw call
-style rather than adopting RAD's implicit state stacks. Texture and sampler resources would remain
-global (shared across windows), with only the per-frame staging buffers and layer/scissor state
-becoming per-context.
-
-## Building shaders
-
-GLSL shader sources live in `shaders/source/`. Compiled outputs (SPIR-V and Metal Shading Language)
-are generated into `shaders/generated/` via the meta tool:
-
-```
-odin run meta -- gen-shaders
-```
-
-Requires `glslangValidator` and `spirv-cross` on PATH.
-
-### Shader format selection
-
-The library embeds shader bytecode per compile target — MSL + `main0` entry point on Darwin (via
-`spirv-cross --msl`, which renames `main` because it is reserved in Metal), SPIR-V + `main` entry
-point elsewhere. Three compile-time constants in `draw.odin` expose the build's shader configuration:
-
-| Constant                      | Type                      | Darwin    | Other      |
-| ----------------------------- | ------------------------- | --------- | ---------- |
-| `PLATFORM_SHADER_FORMAT_FLAG` | `sdl.GPUShaderFormatFlag` | `.MSL`    | `.SPIRV`   |
-| `PLATFORM_SHADER_FORMAT`      | `sdl.GPUShaderFormat`     | `{.MSL}`  | `{.SPIRV}` |
-| `SHADER_ENTRY`                | `cstring`                 | `"main0"` | `"main"`   |
-
-Pass `PLATFORM_SHADER_FORMAT` to `sdl.CreateGPUDevice` so SDL selects a backend compatible with the
-embedded bytecode:
-
-```
-gpu := sdl.CreateGPUDevice(draw.PLATFORM_SHADER_FORMAT, true, nil)
-```
-
-At init time the library calls `sdl.GetGPUShaderFormats(device)` to verify the active backend
-accepts `PLATFORM_SHADER_FORMAT_FLAG`. If it does not, `draw.init` returns `false` with a
-descriptive log message showing both the embedded and active format sets.
@@ -1,175 +0,0 @@
-package draw_qr
-
-import draw ".."
-import "../../qrcode"
-
-// Returns the number of bytes to_texture will write for the given encoded
-// QR buffer. Equivalent to size*size*4 where size = qrcode.get_size(qrcode_buf).
-texture_size :: #force_inline proc(qrcode_buf: []u8) -> int {
-	size := qrcode.get_size(qrcode_buf)
-	return size * size * 4
-}
-
-// Decodes an encoded QR buffer into tightly-packed RGBA pixel data written to
-// texture_buf. No allocations, no GPU calls. Returns the Texture_Desc the
-// caller should pass to draw.register_texture alongside texture_buf.
-//
-// Returns ok=false when:
-//   - qrcode_buf is invalid (qrcode.get_size returns 0).
-//   - texture_buf is smaller than texture_size(qrcode_buf).
-@(require_results)
-to_texture :: proc(
-	qrcode_buf: []u8,
-	texture_buf: []u8,
-	dark: draw.Color = draw.BLACK,
-	light: draw.Color = draw.WHITE,
-) -> (
-	desc: draw.Texture_Desc,
-	ok: bool,
-) {
-	size := qrcode.get_size(qrcode_buf)
-	if size == 0 do return {}, false
-	if len(texture_buf) < size * size * 4 do return {}, false
-
-	for y in 0 ..< size {
-		for x in 0 ..< size {
-			i := (y * size + x) * 4
-			c := dark if qrcode.get_module(qrcode_buf, x, y) else light
-			texture_buf[i + 0] = c[0]
-			texture_buf[i + 1] = c[1]
-			texture_buf[i + 2] = c[2]
-			texture_buf[i + 3] = c[3]
-		}
-	}
-
-	return draw.Texture_Desc {
-			width = u32(size),
-			height = u32(size),
-			depth_or_layers = 1,
-			type = .D2,
-			format = .R8G8B8A8_UNORM,
-			usage = {.SAMPLER},
-			mip_levels = 1,
-			kind = .Static,
-		},
-		true
-}
-
-// Allocates pixel buffer via temp_allocator, decodes qrcode_buf into it, and
-// registers with the GPU. The pixel allocation is freed before return.
-//
-// Returns ok=false when:
-//   - qrcode_buf is invalid (qrcode.get_size returns 0).
-//   - temp_allocator fails to allocate the pixel buffer.
-//   - GPU texture registration fails.
-@(require_results)
-register_texture_from_raw :: proc(
-	qrcode_buf: []u8,
-	dark: draw.Color = draw.BLACK,
-	light: draw.Color = draw.WHITE,
-	temp_allocator := context.temp_allocator,
-) -> (
-	texture: draw.Texture_Id,
-	ok: bool,
-) {
-	tex_size := texture_size(qrcode_buf)
-	if tex_size == 0 do return draw.INVALID_TEXTURE, false
-
-	pixels, alloc_err := make([]u8, tex_size, temp_allocator)
-	if alloc_err != nil do return draw.INVALID_TEXTURE, false
-	defer delete(pixels, temp_allocator)
-
-	desc := to_texture(qrcode_buf, pixels, dark, light) or_return
-	return draw.register_texture(desc, pixels)
-}
-
-// Encodes text as a QR Code and registers the result as an RGBA texture.
-//
-// Returns ok=false when:
-//   - temp_allocator fails to allocate.
-//   - The text cannot fit in any version within [min_version, max_version] at the given ECL.
-//   - GPU texture registration fails.
-@(require_results)
-register_texture_from_text :: proc(
-	text: string,
-	ecl: qrcode.Ecc = .Low,
-	min_version: int = qrcode.VERSION_MIN,
-	max_version: int = qrcode.VERSION_MAX,
-	mask: Maybe(qrcode.Mask) = nil,
-	boost_ecl: bool = true,
-	dark: draw.Color = draw.BLACK,
-	light: draw.Color = draw.WHITE,
-	temp_allocator := context.temp_allocator,
-) -> (
-	texture: draw.Texture_Id,
-	ok: bool,
-) {
-	qrcode_buf, alloc_err := make([]u8, qrcode.buffer_len_for_version(max_version), temp_allocator)
-	if alloc_err != nil do return draw.INVALID_TEXTURE, false
-	defer delete(qrcode_buf, temp_allocator)
-
-	qrcode.encode_auto(
-		text,
-		qrcode_buf,
-		ecl,
-		min_version,
-		max_version,
-		mask,
-		boost_ecl,
-		temp_allocator,
-	) or_return
-
-	return register_texture_from_raw(qrcode_buf, dark, light, temp_allocator)
-}
-
-// Encodes arbitrary binary data as a QR Code and registers the result as an RGBA texture.
-//
-// Returns ok=false when:
-//   - temp_allocator fails to allocate.
-//   - The payload cannot fit in any version within [min_version, max_version] at the given ECL.
-//   - GPU texture registration fails.
-@(require_results)
-register_texture_from_binary :: proc(
-	bin_data: []u8,
-	ecl: qrcode.Ecc = .Low,
-	min_version: int = qrcode.VERSION_MIN,
-	max_version: int = qrcode.VERSION_MAX,
-	mask: Maybe(qrcode.Mask) = nil,
-	boost_ecl: bool = true,
-	dark: draw.Color = draw.BLACK,
-	light: draw.Color = draw.WHITE,
-	temp_allocator := context.temp_allocator,
-) -> (
-	texture: draw.Texture_Id,
-	ok: bool,
-) {
-	qrcode_buf, alloc_err := make([]u8, qrcode.buffer_len_for_version(max_version), temp_allocator)
-	if alloc_err != nil do return draw.INVALID_TEXTURE, false
-	defer delete(qrcode_buf, temp_allocator)
-
-	qrcode.encode_auto(
-		bin_data,
-		qrcode_buf,
-		ecl,
-		min_version,
-		max_version,
-		mask,
-		boost_ecl,
-		temp_allocator,
-	) or_return
-
-	return register_texture_from_raw(qrcode_buf, dark, light, temp_allocator)
-}
-
-register_texture_from :: proc {
-	register_texture_from_text,
-	register_texture_from_binary
-}
-
-// Default fit=.Fit preserves the QR's square aspect; override as needed.
-clay_image :: #force_inline proc(
-	texture: draw.Texture_Id,
-	tint: draw.Color = draw.WHITE,
-) -> draw.Clay_Image_Data {
-	return draw.clay_image_data(texture, fit = .Fit, tint = tint)
-}
@@ -1,352 +0,0 @@
-package examples
-
-import "../../draw"
-import "../../vendor/clay"
-import "core:math"
-import "core:os"
-import sdl "vendor:sdl3"
-
-JETBRAINS_MONO_REGULAR_RAW :: #load("fonts/JetBrainsMono-Regular.ttf")
-JETBRAINS_MONO_REGULAR: draw.Font_Id = max(draw.Font_Id) // Max so we crash if registration is forgotten
-
-hellope_shapes :: proc() {
-	if !sdl.Init({.VIDEO}) do os.exit(1)
-	window := sdl.CreateWindow("Hellope!", 500, 500, {.HIGH_PIXEL_DENSITY})
-	gpu := sdl.CreateGPUDevice(draw.PLATFORM_SHADER_FORMAT, true, nil)
-	if !sdl.ClaimWindowForGPUDevice(gpu, window) do os.exit(1)
-	if !draw.init(gpu, window) do os.exit(1)
-
-	spin_angle: f32 = 0
-
-	for {
-		defer free_all(context.temp_allocator)
-		ev: sdl.Event
-		for sdl.PollEvent(&ev) {
-			if ev.type == .QUIT do return
-		}
-		spin_angle += 1
-		base_layer := draw.begin({width = 500, height = 500})
-
-		// Background
-		draw.rectangle(base_layer, {0, 0, 500, 500}, {40, 40, 40, 255})
-
-		// ----- Shapes without rotation (existing demo) -----
-		draw.rectangle(base_layer, {20, 20, 200, 120}, {80, 120, 200, 255})
-		draw.rectangle_lines(base_layer, {20, 20, 200, 120}, draw.WHITE, thickness = 2)
-		draw.rectangle(base_layer, {240, 20, 240, 120}, {200, 80, 80, 255}, roundness = 0.3)
-		draw.rectangle_gradient(
-			base_layer,
-			{20, 160, 460, 60},
-			{255, 0, 0, 255},
-			{0, 255, 0, 255},
-			{0, 0, 255, 255},
-			{255, 255, 0, 255},
-		)
-
-		// ----- Rotation demos -----
-
-		// Rectangle rotating around its center
-		rect := draw.Rectangle{100, 320, 80, 50}
-		draw.rectangle(
-			base_layer,
-			rect,
-			{100, 200, 100, 255},
-			origin = draw.center_of(rect),
-			rotation = spin_angle,
-		)
-		draw.rectangle_lines(
-			base_layer,
-			rect,
-			draw.WHITE,
-			thickness = 2,
-			origin = draw.center_of(rect),
-			rotation = spin_angle,
-		)
-
-		// Rounded rectangle rotating around its center
-		rrect := draw.Rectangle{230, 300, 100, 80}
-		draw.rectangle(
-			base_layer,
-			rrect,
-			{200, 100, 200, 255},
-			roundness = 0.4,
-			origin = draw.center_of(rrect),
-			rotation = spin_angle,
-		)
-
-		// Ellipse rotating around its center (tilted ellipse)
-		draw.ellipse(base_layer, {410, 340}, 50, 30, {255, 200, 50, 255}, rotation = spin_angle)
-
-		// Circle orbiting a point (moon orbiting planet)
-		// Convention B: center = pivot point (planet), origin = offset from moon center to pivot.
-		// Moon's visual center at rotation=0: planet_pos - origin = (100, 450) - (0, 40) = (100, 410).
-		planet_pos := [2]f32{100, 450}
-		draw.circle(base_layer, planet_pos, 8, {200, 200, 200, 255}) // planet (stationary)
-		draw.circle(base_layer, planet_pos, 5, {100, 150, 255, 255}, origin = {0, 40}, rotation = spin_angle) // moon orbiting
-
-		// Ring arc rotating in place
-		draw.ring(base_layer, {250, 450}, 15, 30, 0, 270, {100, 100, 220, 255}, rotation = spin_angle)
-
-		// Triangle rotating around its center
-		tv1 := [2]f32{350, 420}
-		tv2 := [2]f32{420, 480}
-		tv3 := [2]f32{340, 480}
-		draw.triangle(
-			base_layer,
-			tv1,
-			tv2,
-			tv3,
-			{220, 180, 60, 255},
-			origin = draw.center_of(tv1, tv2, tv3),
-			rotation = spin_angle,
-		)
-
-		// Polygon rotating around its center (already had rotation; now with origin for orbit)
-		draw.polygon(base_layer, {460, 450}, 6, 30, {180, 100, 220, 255}, rotation = spin_angle)
-		draw.polygon_lines(base_layer, {460, 450}, 6, 30, draw.WHITE, rotation = spin_angle, thickness = 2)
-
-		draw.end(gpu, window)
-	}
-}
-
-hellope_text :: proc() {
-	HELLOPE_ID :: 1
-	ROTATING_SENTENCE_ID :: 2
-	MEASURED_ID :: 3
-	CORNER_SPIN_ID :: 4
-
-	if !sdl.Init({.VIDEO}) do os.exit(1)
-	window := sdl.CreateWindow("Hellope!", 600, 600, {.HIGH_PIXEL_DENSITY})
-	gpu := sdl.CreateGPUDevice(draw.PLATFORM_SHADER_FORMAT, true, nil)
-	if !sdl.ClaimWindowForGPUDevice(gpu, window) do os.exit(1)
-	if !draw.init(gpu, window) do os.exit(1)
-	JETBRAINS_MONO_REGULAR = draw.register_font(JETBRAINS_MONO_REGULAR_RAW)
-
-	FONT_SIZE :: u16(24)
-	spin_angle: f32 = 0
-
-	for {
-		defer free_all(context.temp_allocator)
-		ev: sdl.Event
-		for sdl.PollEvent(&ev) {
-			if ev.type == .QUIT do return
-		}
-		spin_angle += 0.5
-		base_layer := draw.begin({width = 600, height = 600})
-
-		// Grey background
-		draw.rectangle(base_layer, {0, 0, 600, 600}, {127, 127, 127, 255})
-
-		// ----- Text API demos -----
-
-		// Cached text with id — TTF_Text reused across frames (good for text-heavy apps)
-		draw.text(
-			base_layer,
-			"Hellope!",
-			{300, 80},
-			JETBRAINS_MONO_REGULAR,
-			FONT_SIZE,
-			color = draw.WHITE,
-			origin = draw.center_of("Hellope!", JETBRAINS_MONO_REGULAR, FONT_SIZE),
-			id = HELLOPE_ID,
-		)
-
-		// Rotating sentence — verifies multi-word text rotation around center
-		draw.text(
-			base_layer,
-			"Hellope World!",
-			{300, 250},
-			JETBRAINS_MONO_REGULAR,
-			FONT_SIZE,
-			color = {255, 200, 50, 255},
-			origin = draw.center_of("Hellope World!", JETBRAINS_MONO_REGULAR, FONT_SIZE),
-			rotation = spin_angle,
-			id = ROTATING_SENTENCE_ID,
-		)
-
-		// Uncached text (no id) — created and destroyed each frame, simplest usage
-		draw.text(
-			base_layer,
-			"Top-left anchored",
-			{20, 450},
-			JETBRAINS_MONO_REGULAR,
-			FONT_SIZE,
-			color = draw.WHITE,
-		)
-
-		// Measure text for manual layout
-		size := draw.measure_text("Measured!", JETBRAINS_MONO_REGULAR, FONT_SIZE)
-		draw.rectangle(base_layer, {300 - size.x / 2, 380, size.x, size.y}, {60, 60, 60, 200})
-		draw.text(
-			base_layer,
-			"Measured!",
-			{300, 380},
-			JETBRAINS_MONO_REGULAR,
-			FONT_SIZE,
-			color = draw.WHITE,
-			origin = draw.top_of("Measured!", JETBRAINS_MONO_REGULAR, FONT_SIZE),
-			id = MEASURED_ID,
-		)
-
-		// Rotating text anchored at top-left (no origin offset) — spins around top-left corner
-		draw.text(
-			base_layer,
-			"Corner spin",
-			{150, 530},
-			JETBRAINS_MONO_REGULAR,
-			FONT_SIZE,
-			color = {100, 200, 255, 255},
-			rotation = spin_angle,
-			id = CORNER_SPIN_ID,
-		)
-
-		draw.end(gpu, window)
-	}
-}
-
-hellope_clay :: proc() {
-	if !sdl.Init({.VIDEO}) do os.exit(1)
-	window := sdl.CreateWindow("Hellope!", 500, 500, {.HIGH_PIXEL_DENSITY})
-	gpu := sdl.CreateGPUDevice(draw.PLATFORM_SHADER_FORMAT, true, nil)
-	if !sdl.ClaimWindowForGPUDevice(gpu, window) do os.exit(1)
-	if !draw.init(gpu, window) do os.exit(1)
-	JETBRAINS_MONO_REGULAR = draw.register_font(JETBRAINS_MONO_REGULAR_RAW)
-
-	text_config := clay.TextElementConfig {
-		fontId    = JETBRAINS_MONO_REGULAR,
-		fontSize  = 36,
-		textColor = {255, 255, 255, 255},
-	}
-
-	for {
-		defer free_all(context.temp_allocator)
-		ev: sdl.Event
-		for sdl.PollEvent(&ev) {
-			if ev.type == .QUIT do return
-		}
-		base_layer := draw.begin({width = 500, height = 500})
-		clay.SetLayoutDimensions({width = base_layer.bounds.width, height = base_layer.bounds.height})
-		clay.BeginLayout()
-		if clay.UI()(
-		{
-			id = clay.ID("outer"),
-			layout = {
-				sizing = {clay.SizingGrow({}), clay.SizingGrow({})},
-				childAlignment = {x = .Center, y = .Center},
-			},
-			backgroundColor = {127, 127, 127, 255},
-		},
-		) {
-			clay.Text("Hellope!", &text_config)
-		}
-		clay_batch := draw.ClayBatch {
-			bounds = base_layer.bounds,
-			cmds   = clay.EndLayout(),
-		}
-		draw.prepare_clay_batch(base_layer, &clay_batch, {0, 0})
-		draw.end(gpu, window)
-	}
-}
-
-hellope_custom :: proc() {
-	if !sdl.Init({.VIDEO}) do os.exit(1)
-	window := sdl.CreateWindow("Hellope Custom!", 600, 400, {.HIGH_PIXEL_DENSITY})
-	gpu := sdl.CreateGPUDevice(draw.PLATFORM_SHADER_FORMAT, true, nil)
-	if !sdl.ClaimWindowForGPUDevice(gpu, window) do os.exit(1)
-	if !draw.init(gpu, window) do os.exit(1)
-	JETBRAINS_MONO_REGULAR = draw.register_font(JETBRAINS_MONO_REGULAR_RAW)
-
-	text_config := clay.TextElementConfig {
-		fontId    = JETBRAINS_MONO_REGULAR,
-		fontSize  = 24,
-		textColor = {255, 255, 255, 255},
-	}
-
-	gauge := Gauge {
-		value = 0.73,
-		color = {50, 200, 100, 255},
-	}
-	gauge2 := Gauge {
-		value = 0.45,
-		color = {200, 100, 50, 255},
-	}
-	spin_angle: f32 = 0
-
-	for {
-		defer free_all(context.temp_allocator)
-		ev: sdl.Event
-		for sdl.PollEvent(&ev) {
-			if ev.type == .QUIT do return
-		}
-
-		spin_angle += 1
-		gauge.value = (math.sin(spin_angle * 0.02) + 1) * 0.5
-		gauge2.value = (math.cos(spin_angle * 0.03) + 1) * 0.5
-
-		base_layer := draw.begin({width = 600, height = 400})
-		clay.SetLayoutDimensions({width = base_layer.bounds.width, height = base_layer.bounds.height})
-		clay.BeginLayout()
-
-		if clay.UI()(
-		{
-			id = clay.ID("outer"),
-			layout = {
-				sizing = {clay.SizingGrow({}), clay.SizingGrow({})},
-				childAlignment = {x = .Center, y = .Center},
-				layoutDirection = .TopToBottom,
-				childGap = 20,
-			},
-			backgroundColor = {50, 50, 50, 255},
-		},
-		) {
-			if clay.UI()({id = clay.ID("title"), layout = {sizing = {clay.SizingFit({}), clay.SizingFit({})}}}) {
-				clay.Text("Custom Draw Demo", &text_config)
-			}
-
-			if clay.UI()(
-			{
-				id = clay.ID("gauge"),
-				layout = {sizing = {clay.SizingFixed(300), clay.SizingFixed(30)}},
-				custom = {customData = &gauge},
-				backgroundColor = {80, 80, 80, 255},
-			},
-			) {}
-
-			if clay.UI()(
-			{
-				id = clay.ID("gauge2"),
-				layout = {sizing = {clay.SizingFixed(300), clay.SizingFixed(30)}},
-				custom = {customData = &gauge2},
-				backgroundColor = {80, 80, 80, 255},
-			},
-			) {}
-		}
-
-		clay_batch := draw.ClayBatch {
-			bounds = base_layer.bounds,
-			cmds   = clay.EndLayout(),
-		}
-		draw.prepare_clay_batch(base_layer, &clay_batch, {0, 0}, custom_draw = draw_custom)
-		draw.end(gpu, window)
-	}
-
-	Gauge :: struct {
-		value: f32,
-		color: draw.Color,
-	}
-
-	draw_custom :: proc(layer: ^draw.Layer, bounds: draw.Rectangle, render_data: clay.CustomRenderData) {
-		gauge := cast(^Gauge)render_data.customData
-
-		// Background from clay's backgroundColor
-		draw.rectangle(layer, bounds, draw.color_from_clay(render_data.backgroundColor), roundness = 0.25)
-
-		// Fill bar
-		fill := bounds
-		fill.width *= gauge.value
-		draw.rectangle(layer, fill, gauge.color, roundness = 0.25)
-
-		// Border
-		draw.rectangle_lines(layer, bounds, draw.WHITE, thickness = 2, roundness = 0.25)
-	}
-}
@@ -1,75 +0,0 @@
-package examples
-
-import "core:fmt"
-import "core:mem"
-import "core:os"
-
-main :: proc() {
-	//----- Tracking allocator ----------------------------------
-	{
-		tracking_temp_allocator := false
-		// Temp
-		track_temp: mem.Tracking_Allocator
-		if tracking_temp_allocator {
-			mem.tracking_allocator_init(&track_temp, context.temp_allocator)
-			context.temp_allocator = mem.tracking_allocator(&track_temp)
-		}
-		// Default
-		track: mem.Tracking_Allocator
-		mem.tracking_allocator_init(&track, context.allocator)
-		context.allocator = mem.tracking_allocator(&track)
-		// Log a warning about any memory that was not freed by the end of the program.
-		// This could be fine for some global state or it could be a memory leak.
-		defer {
-			// Temp allocator
-			if tracking_temp_allocator {
-				if len(track_temp.allocation_map) > 0 {
-					fmt.eprintf("=== %v allocations not freed - temp allocator: ===\n", len(track_temp.allocation_map))
-					for _, entry in track_temp.allocation_map {
-						fmt.eprintf("- %v bytes @ %v\n", entry.size, entry.location)
-					}
-				}
-				if len(track_temp.bad_free_array) > 0 {
-					fmt.eprintf("=== %v incorrect frees - temp allocator: ===\n", len(track_temp.bad_free_array))
-					for entry in track_temp.bad_free_array {
-						fmt.eprintf("- %p @ %v\n", entry.memory, entry.location)
-					}
-				}
-				mem.tracking_allocator_destroy(&track_temp)
-			}
-			// Default allocator
-			if len(track.allocation_map) > 0 {
-				fmt.eprintf("=== %v allocations not freed - main allocator: ===\n", len(track.allocation_map))
-				for _, entry in track.allocation_map {
-					fmt.eprintf("- %v bytes @ %v\n", entry.size, entry.location)
-				}
-			}
-			if len(track.bad_free_array) > 0 {
-				fmt.eprintf("=== %v incorrect frees - main allocator: ===\n", len(track.bad_free_array))
-				for entry in track.bad_free_array {
-					fmt.eprintf("- %p @ %v\n", entry.memory, entry.location)
-				}
-			}
-			mem.tracking_allocator_destroy(&track)
-		}
-	}
-
-	args := os.args
-	if len(args) < 2 {
-		fmt.eprintln("Usage: examples <example_name>")
-		fmt.eprintln("Available examples: hellope-shapes, hellope-text, hellope-clay, hellope-custom, textures")
-		os.exit(1)
-	}
-
-	switch args[1] {
-	case "hellope-clay": hellope_clay()
-	case "hellope-custom": hellope_custom()
-	case "hellope-shapes": hellope_shapes()
-	case "hellope-text": hellope_text()
-	case "textures": textures()
-	case:
-		fmt.eprintf("Unknown example: %v\n", args[1])
-		fmt.eprintln("Available examples: hellope-shapes, hellope-text, hellope-clay, hellope-custom, textures")
-		os.exit(1)
-	}
-}
@@ -1,272 +0,0 @@
-package examples
-
-import "../../draw"
-import "../../draw/draw_qr"
-import "core:os"
-import sdl "vendor:sdl3"
-
-textures :: proc() {
-	if !sdl.Init({.VIDEO}) do os.exit(1)
-	window := sdl.CreateWindow("Textures", 800, 600, {.HIGH_PIXEL_DENSITY})
-	gpu := sdl.CreateGPUDevice(draw.PLATFORM_SHADER_FORMAT, true, nil)
-	if !sdl.ClaimWindowForGPUDevice(gpu, window) do os.exit(1)
-	if !draw.init(gpu, window) do os.exit(1)
-	JETBRAINS_MONO_REGULAR = draw.register_font(JETBRAINS_MONO_REGULAR_RAW)
-
-	FONT_SIZE :: u16(14)
-	LABEL_OFFSET :: f32(8) // gap between item and its label
-
-	//----- Texture registration ----------------------------------
-
-	checker_size :: 8
-	checker_pixels: [checker_size * checker_size * 4]u8
-	for y in 0 ..< checker_size {
-		for x in 0 ..< checker_size {
-			i := (y * checker_size + x) * 4
-			is_dark := ((x + y) % 2) == 0
-			val: u8 = 40 if is_dark else 220
-			checker_pixels[i + 0] = val // R
-			checker_pixels[i + 1] = val / 2 // G — slight color tint
-			checker_pixels[i + 2] = val // B
-			checker_pixels[i + 3] = 255 // A
-		}
-	}
-	checker_texture, _ := draw.register_texture(
-		draw.Texture_Desc {
-			width = checker_size,
-			height = checker_size,
-			depth_or_layers = 1,
-			type = .D2,
-			format = .R8G8B8A8_UNORM,
-			usage = {.SAMPLER},
-			mip_levels = 1,
-		},
-		checker_pixels[:],
-	)
-	defer draw.unregister_texture(checker_texture)
-
-	stripe_w :: 16
-	stripe_h :: 8
-	stripe_pixels: [stripe_w * stripe_h * 4]u8
-	for y in 0 ..< stripe_h {
-		for x in 0 ..< stripe_w {
-			i := (y * stripe_w + x) * 4
-			stripe_pixels[i + 0] = u8(x * 255 / (stripe_w - 1)) // R gradient left→right
-			stripe_pixels[i + 1] = u8(y * 255 / (stripe_h - 1)) // G gradient top→bottom
-			stripe_pixels[i + 2] = 128 // B constant
-			stripe_pixels[i + 3] = 255 // A
-		}
-	}
-	stripe_texture, _ := draw.register_texture(
-		draw.Texture_Desc {
-			width = stripe_w,
-			height = stripe_h,
-			depth_or_layers = 1,
-			type = .D2,
-			format = .R8G8B8A8_UNORM,
-			usage = {.SAMPLER},
-			mip_levels = 1,
-		},
-		stripe_pixels[:],
-	)
-	defer draw.unregister_texture(stripe_texture)
-
-	qr_texture, _ := draw_qr.register_texture_from("https://x.com/miiilato/status/1880241066471051443")
-	defer draw.unregister_texture(qr_texture)
-
-	spin_angle: f32 = 0
-
-	//----- Draw loop ----------------------------------
-
-	for {
-		defer free_all(context.temp_allocator)
-		ev: sdl.Event
-		for sdl.PollEvent(&ev) {
-			if ev.type == .QUIT do return
-		}
-		spin_angle += 1
-
-		base_layer := draw.begin({width = 800, height = 600})
-
-		// Background
-		draw.rectangle(base_layer, {0, 0, 800, 600}, {30, 30, 30, 255})
-
-		//----- Row 1: Sampler presets (y=30) ----------------------------------
-
-		ROW1_Y :: f32(30)
-		ITEM_SIZE :: f32(120)
-		COL1 :: f32(30)
-		COL2 :: f32(180)
-		COL3 :: f32(330)
-		COL4 :: f32(480)
-
-		// Nearest (sharp pixel edges)
-		draw.rectangle_texture(
-			base_layer,
-			{COL1, ROW1_Y, ITEM_SIZE, ITEM_SIZE},
-			checker_texture,
-			sampler = .Nearest_Clamp,
-		)
-		draw.text(
-			base_layer,
-			"Nearest",
-			{COL1, ROW1_Y + ITEM_SIZE + LABEL_OFFSET},
-			JETBRAINS_MONO_REGULAR,
-			FONT_SIZE,
-			color = draw.WHITE,
-		)
-
-		// Linear (bilinear blur)
-		draw.rectangle_texture(
-			base_layer,
-			{COL2, ROW1_Y, ITEM_SIZE, ITEM_SIZE},
-			checker_texture,
-			sampler = .Linear_Clamp,
-		)
-		draw.text(
-			base_layer,
-			"Linear",
-			{COL2, ROW1_Y + ITEM_SIZE + LABEL_OFFSET},
-			JETBRAINS_MONO_REGULAR,
-			FONT_SIZE,
-			color = draw.WHITE,
-		)
-
-		// Tiled (4x repeat)
-		draw.rectangle_texture(
-			base_layer,
-			{COL3, ROW1_Y, ITEM_SIZE, ITEM_SIZE},
-			checker_texture,
-			sampler = .Nearest_Repeat,
-			uv_rect = {0, 0, 4, 4},
-		)
-		draw.text(
-			base_layer,
-			"Tiled 4x",
-			{COL3, ROW1_Y + ITEM_SIZE + LABEL_OFFSET},
-			JETBRAINS_MONO_REGULAR,
-			FONT_SIZE,
-			color = draw.WHITE,
-		)
-
-		//----- Row 2: QR code, rounded corners, rotation (y=190) ----------------------------------
-
-		ROW2_Y :: f32(190)
-
-		// QR code (RGBA texture with baked colors, nearest sampling)
-		draw.rectangle(base_layer, {COL1, ROW2_Y, ITEM_SIZE, ITEM_SIZE}, {255, 255, 255, 255}) // white bg
-		draw.rectangle_texture(
-			base_layer,
-			{COL1, ROW2_Y, ITEM_SIZE, ITEM_SIZE},
-			qr_texture,
-			sampler = .Nearest_Clamp,
-		)
-		draw.text(
-			base_layer,
-			"QR Code",
-			{COL1, ROW2_Y + ITEM_SIZE + LABEL_OFFSET},
-			JETBRAINS_MONO_REGULAR,
-			FONT_SIZE,
-			color = draw.WHITE,
-		)
-
-		// Rounded corners
-		draw.rectangle_texture(
-			base_layer,
-			{COL2, ROW2_Y, ITEM_SIZE, ITEM_SIZE},
-			checker_texture,
-			sampler = .Nearest_Clamp,
-			roundness = 0.3,
-		)
-		draw.text(
-			base_layer,
-			"Rounded",
-			{COL2, ROW2_Y + ITEM_SIZE + LABEL_OFFSET},
-			JETBRAINS_MONO_REGULAR,
-			FONT_SIZE,
-			color = draw.WHITE,
-		)
-
-		// Rotating
-		rot_rect := draw.Rectangle{COL3, ROW2_Y, ITEM_SIZE, ITEM_SIZE}
-		draw.rectangle_texture(
-			base_layer,
-			rot_rect,
-			checker_texture,
-			sampler = .Nearest_Clamp,
-			origin = draw.center_of(rot_rect),
-			rotation = spin_angle,
-		)
-		draw.text(
-			base_layer,
-			"Rotating",
-			{COL3, ROW2_Y + ITEM_SIZE + LABEL_OFFSET},
-			JETBRAINS_MONO_REGULAR,
-			FONT_SIZE,
-			color = draw.WHITE,
-		)
-
-		//----- Row 3: Fit modes + Per-corner radii (y=360) ----------------------------------
-
-		ROW3_Y :: f32(360)
-		FIT_SIZE :: f32(120) // square target rect
-
-		// Stretch
-		uv_s, sampler_s, inner_s := draw.fit_params(.Stretch, {COL1, ROW3_Y, FIT_SIZE, FIT_SIZE}, stripe_texture)
-		draw.rectangle(base_layer, {COL1, ROW3_Y, FIT_SIZE, FIT_SIZE}, {60, 60, 60, 255}) // bg
-		draw.rectangle_texture(base_layer, inner_s, stripe_texture, uv_rect = uv_s, sampler = sampler_s)
-		draw.text(
-			base_layer,
-			"Stretch",
-			{COL1, ROW3_Y + FIT_SIZE + LABEL_OFFSET},
-			JETBRAINS_MONO_REGULAR,
-			FONT_SIZE,
-			color = draw.WHITE,
-		)
-
-		// Fill (center-crop)
-		uv_f, sampler_f, inner_f := draw.fit_params(.Fill, {COL2, ROW3_Y, FIT_SIZE, FIT_SIZE}, stripe_texture)
-		draw.rectangle(base_layer, {COL2, ROW3_Y, FIT_SIZE, FIT_SIZE}, {60, 60, 60, 255})
-		draw.rectangle_texture(base_layer, inner_f, stripe_texture, uv_rect = uv_f, sampler = sampler_f)
-		draw.text(
-			base_layer,
-			"Fill",
-			{COL2, ROW3_Y + FIT_SIZE + LABEL_OFFSET},
-			JETBRAINS_MONO_REGULAR,
-			FONT_SIZE,
-			color = draw.WHITE,
-		)
-
-		// Fit (letterbox)
-		uv_ft, sampler_ft, inner_ft := draw.fit_params(.Fit, {COL3, ROW3_Y, FIT_SIZE, FIT_SIZE}, stripe_texture)
-		draw.rectangle(base_layer, {COL3, ROW3_Y, FIT_SIZE, FIT_SIZE}, {60, 60, 60, 255}) // visible margin bg
-		draw.rectangle_texture(base_layer, inner_ft, stripe_texture, uv_rect = uv_ft, sampler = sampler_ft)
-		draw.text(
-			base_layer,
-			"Fit",
-			{COL3, ROW3_Y + FIT_SIZE + LABEL_OFFSET},
-			JETBRAINS_MONO_REGULAR,
-			FONT_SIZE,
-			color = draw.WHITE,
-		)
-
-		// Per-corner radii
-		draw.rectangle_texture_corners(
-			base_layer,
-			{COL4, ROW3_Y, FIT_SIZE, FIT_SIZE},
-			{20, 0, 20, 0},
-			checker_texture,
-			sampler = .Nearest_Clamp,
-		)
-		draw.text(
-			base_layer,
-			"Per-corner",
-			{COL4, ROW3_Y + FIT_SIZE + LABEL_OFFSET},
-			JETBRAINS_MONO_REGULAR,
-			FONT_SIZE,
-			color = draw.WHITE,
-		)
-
-		draw.end(gpu, window)
-	}
-}
@@ -1,685 +0,0 @@
-package draw
-
-import "core:c"
-import "core:log"
-import "core:mem"
-import sdl "vendor:sdl3"
-
-Vertex :: struct {
-	position: [2]f32,
-	uv:       [2]f32,
-	color:    Color,
-}
-
-TextBatch :: struct {
-	atlas_texture: ^sdl.GPUTexture,
-	vertex_start:  u32,
-	vertex_count:  u32,
-	index_start:   u32,
-	index_count:   u32,
-}
-
-// ----------------------------------------------------------------------------------------------------------------
-// ----- SDF primitive types -----------
-// ----------------------------------------------------------------------------------------------------------------
-
-Shape_Kind :: enum u8 {
-	Solid    = 0,
-	RRect    = 1,
-	Circle   = 2,
-	Ellipse  = 3,
-	Segment  = 4,
-	Ring_Arc = 5,
-	NGon     = 6,
-}
-
-Shape_Flag :: enum u8 {
-	Stroke,
-	Textured,
-}
-
-Shape_Flags :: bit_set[Shape_Flag;u8]
-
-RRect_Params :: struct {
-	half_size: [2]f32,
-	radii:     [4]f32,
-	soft_px:   f32,
-	stroke_px: f32,
-}
-
-Circle_Params :: struct {
-	radius:    f32,
-	soft_px:   f32,
-	stroke_px: f32,
-	_:         [5]f32,
-}
-
-Ellipse_Params :: struct {
-	radii:     [2]f32,
-	soft_px:   f32,
-	stroke_px: f32,
-	_:         [4]f32,
-}
-
-Segment_Params :: struct {
-	a:       [2]f32,
-	b:       [2]f32,
-	width:   f32,
-	soft_px: f32,
-	_:       [2]f32,
-}
-
-Ring_Arc_Params :: struct {
-	inner_radius: f32,
-	outer_radius: f32,
-	start_rad:    f32,
-	end_rad:      f32,
-	soft_px:      f32,
-	_:            [3]f32,
-}
-
-NGon_Params :: struct {
-	radius:    f32,
-	rotation:  f32,
-	sides:     f32,
-	soft_px:   f32,
-	stroke_px: f32,
-	_:         [3]f32,
-}
-
-Shape_Params :: struct #raw_union {
-	rrect:    RRect_Params,
-	circle:   Circle_Params,
-	ellipse:  Ellipse_Params,
-	segment:  Segment_Params,
-	ring_arc: Ring_Arc_Params,
-	ngon:     NGon_Params,
-	raw:      [8]f32,
-}
-
-#assert(size_of(Shape_Params) == 32)
-
-// GPU layout: 80 bytes, std430-compatible. The shader declares this as a storage buffer struct.
-Primitive :: struct {
-	bounds:     [4]f32, //  0: min_x, min_y, max_x, max_y (world-space, pre-DPI)
-	color:      Color, // 16: u8x4, unpacked in shader via unpackUnorm4x8
-	kind_flags: u32, // 20: (kind as u32) | (flags as u32 << 8)
-	rotation:   f32, // 24: shader self-rotation in radians (used by RRect, Ellipse)
-	_pad:       f32, // 28: alignment to vec4 boundary
-	params:     Shape_Params, // 32: two vec4s of shape params
-	uv_rect:    [4]f32, // 64: u_min, v_min, u_max, v_max (default {0,0,1,1})
-}
-
-#assert(size_of(Primitive) == 80)
-
-pack_kind_flags :: #force_inline proc(kind: Shape_Kind, flags: Shape_Flags) -> u32 {
-	return u32(kind) | (u32(transmute(u8)flags) << 8)
-}
-
-Pipeline_2D_Base :: struct {
-	sdl_pipeline:     ^sdl.GPUGraphicsPipeline,
-	vertex_buffer:    Buffer,
-	index_buffer:     Buffer,
-	unit_quad_buffer: ^sdl.GPUBuffer,
-	primitive_buffer: Buffer,
-	white_texture:    ^sdl.GPUTexture,
-	sampler:          ^sdl.GPUSampler,
-}
-
-@(private)
-create_pipeline_2d_base :: proc(
-	device: ^sdl.GPUDevice,
-	window: ^sdl.Window,
-	sample_count: sdl.GPUSampleCount,
-) -> (
-	pipeline: Pipeline_2D_Base,
-	ok: bool,
-) {
-	// On failure, clean up any partially-created resources
-	defer if !ok {
-		if pipeline.sampler != nil do sdl.ReleaseGPUSampler(device, pipeline.sampler)
-		if pipeline.white_texture != nil do sdl.ReleaseGPUTexture(device, pipeline.white_texture)
-		if pipeline.unit_quad_buffer != nil do sdl.ReleaseGPUBuffer(device, pipeline.unit_quad_buffer)
-		if pipeline.primitive_buffer.gpu != nil do destroy_buffer(device, &pipeline.primitive_buffer)
-		if pipeline.index_buffer.gpu != nil do destroy_buffer(device, &pipeline.index_buffer)
-		if pipeline.vertex_buffer.gpu != nil do destroy_buffer(device, &pipeline.vertex_buffer)
-		if pipeline.sdl_pipeline != nil do sdl.ReleaseGPUGraphicsPipeline(device, pipeline.sdl_pipeline)
-	}
-
-	active_shader_formats := sdl.GetGPUShaderFormats(device)
-	if PLATFORM_SHADER_FORMAT_FLAG not_in active_shader_formats {
-		log.errorf(
-			"draw: no embedded shader matches active GPU formats; this build supports %v but device reports %v",
-			PLATFORM_SHADER_FORMAT,
-			active_shader_formats,
-		)
-		return pipeline, false
-	}
-
-	log.debug("Loaded", len(BASE_VERT_2D_RAW), "vert bytes")
-	log.debug("Loaded", len(BASE_FRAG_2D_RAW), "frag bytes")
-
-	vert_info := sdl.GPUShaderCreateInfo {
-		code_size           = len(BASE_VERT_2D_RAW),
-		code                = raw_data(BASE_VERT_2D_RAW),
-		entrypoint          = SHADER_ENTRY,
-		format              = {PLATFORM_SHADER_FORMAT_FLAG},
-		stage               = .VERTEX,
-		num_uniform_buffers = 1,
-		num_storage_buffers = 1,
-	}
-
-	frag_info := sdl.GPUShaderCreateInfo {
-		code_size    = len(BASE_FRAG_2D_RAW),
-		code         = raw_data(BASE_FRAG_2D_RAW),
-		entrypoint   = SHADER_ENTRY,
-		format       = {PLATFORM_SHADER_FORMAT_FLAG},
-		stage        = .FRAGMENT,
-		num_samplers = 1,
-	}
-
-	vert_shader := sdl.CreateGPUShader(device, vert_info)
-	if vert_shader == nil {
-		log.errorf("Could not create draw vertex shader: %s", sdl.GetError())
-		return pipeline, false
-	}
-
-	frag_shader := sdl.CreateGPUShader(device, frag_info)
-	if frag_shader == nil {
-		sdl.ReleaseGPUShader(device, vert_shader)
-		log.errorf("Could not create draw fragment shader: %s", sdl.GetError())
-		return pipeline, false
-	}
-
-	vertex_attributes: [3]sdl.GPUVertexAttribute = {
-		// position (GLSL location 0)
-		sdl.GPUVertexAttribute{buffer_slot = 0, location = 0, format = .FLOAT2, offset = 0},
-		// uv (GLSL location 1)
-		sdl.GPUVertexAttribute{buffer_slot = 0, location = 1, format = .FLOAT2, offset = size_of([2]f32)},
-		// color (GLSL location 2, u8x4 normalized to float by GPU)
-		sdl.GPUVertexAttribute{buffer_slot = 0, location = 2, format = .UBYTE4_NORM, offset = size_of([2]f32) * 2},
-	}
-
-	pipeline_info := sdl.GPUGraphicsPipelineCreateInfo {
-		vertex_shader = vert_shader,
-		fragment_shader = frag_shader,
-		primitive_type = .TRIANGLELIST,
-		multisample_state = sdl.GPUMultisampleState{sample_count = sample_count},
-		target_info = sdl.GPUGraphicsPipelineTargetInfo {
-			color_target_descriptions = &sdl.GPUColorTargetDescription {
-				format = sdl.GetGPUSwapchainTextureFormat(device, window),
-				blend_state = sdl.GPUColorTargetBlendState {
-					enable_blend = true,
-					enable_color_write_mask = true,
-					src_color_blendfactor = .SRC_ALPHA,
-					dst_color_blendfactor = .ONE_MINUS_SRC_ALPHA,
-					color_blend_op = .ADD,
-					src_alpha_blendfactor = .SRC_ALPHA,
-					dst_alpha_blendfactor = .ONE_MINUS_SRC_ALPHA,
-					alpha_blend_op = .ADD,
-					color_write_mask = sdl.GPUColorComponentFlags{.R, .G, .B, .A},
-				},
-			},
-			num_color_targets = 1,
-		},
-		vertex_input_state = sdl.GPUVertexInputState {
-			vertex_buffer_descriptions = &sdl.GPUVertexBufferDescription {
-				slot = 0,
-				input_rate = .VERTEX,
-				pitch = size_of(Vertex),
-			},
-			num_vertex_buffers = 1,
-			vertex_attributes = raw_data(vertex_attributes[:]),
-			num_vertex_attributes = 3,
-		},
-	}
-
-	pipeline.sdl_pipeline = sdl.CreateGPUGraphicsPipeline(device, pipeline_info)
-	// Shaders are no longer needed regardless of pipeline creation success
-	sdl.ReleaseGPUShader(device, vert_shader)
-	sdl.ReleaseGPUShader(device, frag_shader)
-	if pipeline.sdl_pipeline == nil {
-		log.errorf("Failed to create draw graphics pipeline: %s", sdl.GetError())
-		return pipeline, false
-	}
-
-	// Create vertex buffer
-	vert_buf_ok: bool
-	pipeline.vertex_buffer, vert_buf_ok = create_buffer(
-		device,
-		size_of(Vertex) * BUFFER_INIT_SIZE,
-		sdl.GPUBufferUsageFlags{.VERTEX},
-	)
-	if !vert_buf_ok do return pipeline, false
-
-	// Create index buffer (used by text)
-	idx_buf_ok: bool
-	pipeline.index_buffer, idx_buf_ok = create_buffer(
-		device,
-		size_of(c.int) * BUFFER_INIT_SIZE,
-		sdl.GPUBufferUsageFlags{.INDEX},
-	)
-	if !idx_buf_ok do return pipeline, false
-
-	// Create primitive storage buffer (used by SDF instanced drawing)
-	prim_buf_ok: bool
-	pipeline.primitive_buffer, prim_buf_ok = create_buffer(
-		device,
-		size_of(Primitive) * BUFFER_INIT_SIZE,
-		sdl.GPUBufferUsageFlags{.GRAPHICS_STORAGE_READ},
-	)
-	if !prim_buf_ok do return pipeline, false
-
-	// Create static 6-vertex unit quad buffer (two triangles, TRIANGLELIST)
-	pipeline.unit_quad_buffer = sdl.CreateGPUBuffer(
-		device,
-		sdl.GPUBufferCreateInfo{usage = {.VERTEX}, size = 6 * size_of(Vertex)},
-	)
-	if pipeline.unit_quad_buffer == nil {
-		log.errorf("Failed to create unit quad buffer: %s", sdl.GetError())
-		return pipeline, false
-	}
-
-	// Create 1x1 white pixel texture
-	pipeline.white_texture = sdl.CreateGPUTexture(
-		device,
-		sdl.GPUTextureCreateInfo {
-			type = .D2,
-			format = .R8G8B8A8_UNORM,
-			usage = {.SAMPLER},
-			width = 1,
-			height = 1,
-			layer_count_or_depth = 1,
-			num_levels = 1,
-			sample_count = ._1,
-		},
-	)
-	if pipeline.white_texture == nil {
-		log.errorf("Failed to create white pixel texture: %s", sdl.GetError())
-		return pipeline, false
-	}
-
-	// Upload white pixel and unit quad data in a single command buffer
-	white_pixel := [4]u8{255, 255, 255, 255}
-	white_transfer_buf := sdl.CreateGPUTransferBuffer(
-		device,
-		sdl.GPUTransferBufferCreateInfo{usage = .UPLOAD, size = size_of(white_pixel)},
-	)
-	if white_transfer_buf == nil {
-		log.errorf("Failed to create white pixel transfer buffer: %s", sdl.GetError())
-		return pipeline, false
-	}
-	defer sdl.ReleaseGPUTransferBuffer(device, white_transfer_buf)
-
-	white_ptr := sdl.MapGPUTransferBuffer(device, white_transfer_buf, false)
-	if white_ptr == nil {
-		log.errorf("Failed to map white pixel transfer buffer: %s", sdl.GetError())
-		return pipeline, false
-	}
-	mem.copy(white_ptr, &white_pixel, size_of(white_pixel))
-	sdl.UnmapGPUTransferBuffer(device, white_transfer_buf)
-
-	quad_verts := [6]Vertex {
-		{position = {0, 0}},
-		{position = {1, 0}},
-		{position = {0, 1}},
-		{position = {0, 1}},
-		{position = {1, 0}},
-		{position = {1, 1}},
-	}
-	quad_transfer_buf := sdl.CreateGPUTransferBuffer(
-		device,
-		sdl.GPUTransferBufferCreateInfo{usage = .UPLOAD, size = size_of(quad_verts)},
-	)
-	if quad_transfer_buf == nil {
-		log.errorf("Failed to create unit quad transfer buffer: %s", sdl.GetError())
-		return pipeline, false
-	}
-	defer sdl.ReleaseGPUTransferBuffer(device, quad_transfer_buf)
-
-	quad_ptr := sdl.MapGPUTransferBuffer(device, quad_transfer_buf, false)
-	if quad_ptr == nil {
-		log.errorf("Failed to map unit quad transfer buffer: %s", sdl.GetError())
-		return pipeline, false
-	}
-	mem.copy(quad_ptr, &quad_verts, size_of(quad_verts))
-	sdl.UnmapGPUTransferBuffer(device, quad_transfer_buf)
-
-	upload_cmd_buffer := sdl.AcquireGPUCommandBuffer(device)
-	if upload_cmd_buffer == nil {
-		log.errorf("Failed to acquire command buffer for init upload: %s", sdl.GetError())
-		return pipeline, false
-	}
-	upload_pass := sdl.BeginGPUCopyPass(upload_cmd_buffer)
-
-	sdl.UploadToGPUTexture(
-		upload_pass,
-		sdl.GPUTextureTransferInfo{transfer_buffer = white_transfer_buf},
-		sdl.GPUTextureRegion{texture = pipeline.white_texture, w = 1, h = 1, d = 1},
-		false,
-	)
-
-	sdl.UploadToGPUBuffer(
-		upload_pass,
-		sdl.GPUTransferBufferLocation{transfer_buffer = quad_transfer_buf},
-		sdl.GPUBufferRegion{buffer = pipeline.unit_quad_buffer, offset = 0, size = size_of(quad_verts)},
-		false,
-	)
-
-	sdl.EndGPUCopyPass(upload_pass)
-	if !sdl.SubmitGPUCommandBuffer(upload_cmd_buffer) {
-		log.errorf("Failed to submit init upload command buffer: %s", sdl.GetError())
-		return pipeline, false
-	}
-
-	log.debug("White pixel texture and unit quad buffer created and uploaded")
-
-	// Create sampler (shared by shapes and text)
-	pipeline.sampler = sdl.CreateGPUSampler(
-		device,
-		sdl.GPUSamplerCreateInfo {
-			min_filter = .LINEAR,
-			mag_filter = .LINEAR,
-			mipmap_mode = .LINEAR,
-			address_mode_u = .CLAMP_TO_EDGE,
-			address_mode_v = .CLAMP_TO_EDGE,
-			address_mode_w = .CLAMP_TO_EDGE,
-		},
-	)
-	if pipeline.sampler == nil {
-		log.errorf("Could not create GPU sampler: %s", sdl.GetError())
-		return pipeline, false
-	}
-
-	log.debug("Done creating unified draw pipeline")
-	return pipeline, true
-}
-
-@(private)
-upload :: proc(device: ^sdl.GPUDevice, pass: ^sdl.GPUCopyPass) {
-	// Upload vertices (shapes then text into one buffer)
-	shape_vert_count := u32(len(GLOB.tmp_shape_verts))
-	text_vert_count := u32(len(GLOB.tmp_text_verts))
-	total_vert_count := shape_vert_count + text_vert_count
-
-	if total_vert_count > 0 {
-		total_vert_size := total_vert_count * size_of(Vertex)
-		shape_vert_size := shape_vert_count * size_of(Vertex)
-		text_vert_size := text_vert_count * size_of(Vertex)
-
-		grow_buffer_if_needed(
-			device,
-			&GLOB.pipeline_2d_base.vertex_buffer,
-			total_vert_size,
-			sdl.GPUBufferUsageFlags{.VERTEX},
-		)
-
-		vert_array := sdl.MapGPUTransferBuffer(device, GLOB.pipeline_2d_base.vertex_buffer.transfer, false)
-		if vert_array == nil {
-			log.panicf("Failed to map vertex transfer buffer: %s", sdl.GetError())
-		}
-		if shape_vert_size > 0 {
-			mem.copy(vert_array, raw_data(GLOB.tmp_shape_verts), int(shape_vert_size))
-		}
-		if text_vert_size > 0 {
-			mem.copy(
-				rawptr(uintptr(vert_array) + uintptr(shape_vert_size)),
-				raw_data(GLOB.tmp_text_verts),
-				int(text_vert_size),
-			)
-		}
-		sdl.UnmapGPUTransferBuffer(device, GLOB.pipeline_2d_base.vertex_buffer.transfer)
-
-		sdl.UploadToGPUBuffer(
-			pass,
-			sdl.GPUTransferBufferLocation{transfer_buffer = GLOB.pipeline_2d_base.vertex_buffer.transfer},
-			sdl.GPUBufferRegion{buffer = GLOB.pipeline_2d_base.vertex_buffer.gpu, offset = 0, size = total_vert_size},
-			false,
-		)
-	}
-
-	// Upload text indices
-	index_count := u32(len(GLOB.tmp_text_indices))
-	if index_count > 0 {
-		index_size := index_count * size_of(c.int)
-
-		grow_buffer_if_needed(
-			device,
-			&GLOB.pipeline_2d_base.index_buffer,
-			index_size,
-			sdl.GPUBufferUsageFlags{.INDEX},
-		)
-
-		idx_array := sdl.MapGPUTransferBuffer(device, GLOB.pipeline_2d_base.index_buffer.transfer, false)
-		if idx_array == nil {
-			log.panicf("Failed to map index transfer buffer: %s", sdl.GetError())
-		}
-		mem.copy(idx_array, raw_data(GLOB.tmp_text_indices), int(index_size))
-		sdl.UnmapGPUTransferBuffer(device, GLOB.pipeline_2d_base.index_buffer.transfer)
-
-		sdl.UploadToGPUBuffer(
-			pass,
-			sdl.GPUTransferBufferLocation{transfer_buffer = GLOB.pipeline_2d_base.index_buffer.transfer},
-			sdl.GPUBufferRegion{buffer = GLOB.pipeline_2d_base.index_buffer.gpu, offset = 0, size = index_size},
-			false,
-		)
-	}
-
-	// Upload SDF primitives
-	prim_count := u32(len(GLOB.tmp_primitives))
-	if prim_count > 0 {
-		prim_size := prim_count * size_of(Primitive)
-
-		grow_buffer_if_needed(
-			device,
-			&GLOB.pipeline_2d_base.primitive_buffer,
-			prim_size,
-			sdl.GPUBufferUsageFlags{.GRAPHICS_STORAGE_READ},
-		)
-
-		prim_array := sdl.MapGPUTransferBuffer(device, GLOB.pipeline_2d_base.primitive_buffer.transfer, false)
-		if prim_array == nil {
-			log.panicf("Failed to map primitive transfer buffer: %s", sdl.GetError())
-		}
-		mem.copy(prim_array, raw_data(GLOB.tmp_primitives), int(prim_size))
-		sdl.UnmapGPUTransferBuffer(device, GLOB.pipeline_2d_base.primitive_buffer.transfer)
-
-		sdl.UploadToGPUBuffer(
-			pass,
-			sdl.GPUTransferBufferLocation{transfer_buffer = GLOB.pipeline_2d_base.primitive_buffer.transfer},
-			sdl.GPUBufferRegion{buffer = GLOB.pipeline_2d_base.primitive_buffer.gpu, offset = 0, size = prim_size},
-			false,
-		)
-	}
-}
-
-@(private)
-draw_layer :: proc(
-	device: ^sdl.GPUDevice,
-	window: ^sdl.Window,
-	cmd_buffer: ^sdl.GPUCommandBuffer,
-	render_texture: ^sdl.GPUTexture,
-	swapchain_width: u32,
-	swapchain_height: u32,
-	clear_color: [4]f32,
-	layer: ^Layer,
-) {
-	if layer.sub_batch_len == 0 {
-		if !GLOB.cleared {
-			pass := sdl.BeginGPURenderPass(
-				cmd_buffer,
-				&sdl.GPUColorTargetInfo {
-					texture = render_texture,
-					clear_color = sdl.FColor{clear_color[0], clear_color[1], clear_color[2], clear_color[3]},
-					load_op = .CLEAR,
-					store_op = .STORE,
-				},
-				1,
-				nil,
-			)
-			sdl.EndGPURenderPass(pass)
-			GLOB.cleared = true
-		}
-		return
-	}
-
-	render_pass := sdl.BeginGPURenderPass(
-		cmd_buffer,
-		&sdl.GPUColorTargetInfo {
-			texture = render_texture,
-			clear_color = sdl.FColor{clear_color[0], clear_color[1], clear_color[2], clear_color[3]},
-			load_op = GLOB.cleared ? .LOAD : .CLEAR,
-			store_op = .STORE,
-		},
-		1,
-		nil,
-	)
-	GLOB.cleared = true
-
-	sdl.BindGPUGraphicsPipeline(render_pass, GLOB.pipeline_2d_base.sdl_pipeline)
-
-	// Bind storage buffer (read by vertex shader in SDF mode)
-	sdl.BindGPUVertexStorageBuffers(
-		render_pass,
-		0,
-		([^]^sdl.GPUBuffer)(&GLOB.pipeline_2d_base.primitive_buffer.gpu),
-		1,
-	)
-
-	// Always bind index buffer — harmless if no indexed draws are issued
-	sdl.BindGPUIndexBuffer(
-		render_pass,
-		sdl.GPUBufferBinding{buffer = GLOB.pipeline_2d_base.index_buffer.gpu, offset = 0},
-		._32BIT,
-	)
-
-	// Shorthand aliases for frequently-used pipeline resources
-	main_vert_buf := GLOB.pipeline_2d_base.vertex_buffer.gpu
-	unit_quad := GLOB.pipeline_2d_base.unit_quad_buffer
-	white_texture := GLOB.pipeline_2d_base.white_texture
-	sampler := GLOB.pipeline_2d_base.sampler
-	width := f32(swapchain_width)
-	height := f32(swapchain_height)
-
-	// Initial GPU state: tessellated mode, main vertex buffer, no atlas bound yet
-	push_globals(cmd_buffer, width, height, .Tessellated)
-	sdl.BindGPUVertexBuffers(render_pass, 0, &sdl.GPUBufferBinding{buffer = main_vert_buf, offset = 0}, 1)
-
-	current_mode: Draw_Mode = .Tessellated
-	current_vert_buf := main_vert_buf
-	current_atlas: ^sdl.GPUTexture
-	current_sampler := sampler
-
-	// Text vertices live after shape vertices in the GPU vertex buffer
-	text_vertex_gpu_base := u32(len(GLOB.tmp_shape_verts))
-
-	for &scissor in GLOB.scissors[layer.scissor_start:][:layer.scissor_len] {
-		sdl.SetGPUScissor(render_pass, scissor.bounds)
-
-		for &batch in GLOB.tmp_sub_batches[scissor.sub_batch_start:][:scissor.sub_batch_len] {
-			switch batch.kind {
-			case .Shapes:
-				if current_mode != .Tessellated {
-					push_globals(cmd_buffer, width, height, .Tessellated)
-					current_mode = .Tessellated
-				}
-				if current_vert_buf != main_vert_buf {
-					sdl.BindGPUVertexBuffers(render_pass, 0, &sdl.GPUBufferBinding{buffer = main_vert_buf, offset = 0}, 1)
-					current_vert_buf = main_vert_buf
-				}
-				// Determine texture and sampler for this batch
-				batch_texture: ^sdl.GPUTexture = white_texture
-				batch_sampler: ^sdl.GPUSampler = sampler
-				if batch.texture_id != INVALID_TEXTURE {
-					if bound_texture := texture_gpu_handle(batch.texture_id); bound_texture != nil {
-						batch_texture = bound_texture
-					}
-					batch_sampler = get_sampler(batch.sampler)
-				}
-				if current_atlas != batch_texture || current_sampler != batch_sampler {
-					sdl.BindGPUFragmentSamplers(
-						render_pass,
-						0,
-						&sdl.GPUTextureSamplerBinding{texture = batch_texture, sampler = batch_sampler},
-						1,
-					)
-					current_atlas = batch_texture
-					current_sampler = batch_sampler
-				}
-				sdl.DrawGPUPrimitives(render_pass, batch.count, 1, batch.offset, 0)
-
-			case .Text:
-				if current_mode != .Tessellated {
-					push_globals(cmd_buffer, width, height, .Tessellated)
-					current_mode = .Tessellated
-				}
-				if current_vert_buf != main_vert_buf {
-					sdl.BindGPUVertexBuffers(render_pass, 0, &sdl.GPUBufferBinding{buffer = main_vert_buf, offset = 0}, 1)
-					current_vert_buf = main_vert_buf
-				}
-				text_batch := &GLOB.tmp_text_batches[batch.offset]
-				if current_atlas != text_batch.atlas_texture {
-					sdl.BindGPUFragmentSamplers(
-						render_pass,
-						0,
-						&sdl.GPUTextureSamplerBinding{texture = text_batch.atlas_texture, sampler = sampler},
-						1,
-					)
-					current_atlas = text_batch.atlas_texture
-				}
-				sdl.DrawGPUIndexedPrimitives(
-					render_pass,
-					text_batch.index_count,
-					1,
-					text_batch.index_start,
-					i32(text_vertex_gpu_base + text_batch.vertex_start),
-					0,
-				)
-
-			case .SDF:
-				if current_mode != .SDF {
-					push_globals(cmd_buffer, width, height, .SDF)
-					current_mode = .SDF
-				}
-				if current_vert_buf != unit_quad {
-					sdl.BindGPUVertexBuffers(render_pass, 0, &sdl.GPUBufferBinding{buffer = unit_quad, offset = 0}, 1)
-					current_vert_buf = unit_quad
-				}
-				// Determine texture and sampler for this batch
-				batch_texture: ^sdl.GPUTexture = white_texture
-				batch_sampler: ^sdl.GPUSampler = sampler
-				if batch.texture_id != INVALID_TEXTURE {
-					if bound_texture := texture_gpu_handle(batch.texture_id); bound_texture != nil {
-						batch_texture = bound_texture
-					}
-					batch_sampler = get_sampler(batch.sampler)
-				}
-				if current_atlas != batch_texture || current_sampler != batch_sampler {
-					sdl.BindGPUFragmentSamplers(
-						render_pass,
-						0,
-						&sdl.GPUTextureSamplerBinding{texture = batch_texture, sampler = batch_sampler},
-						1,
-					)
-					current_atlas = batch_texture
-					current_sampler = batch_sampler
-				}
-				sdl.DrawGPUPrimitives(render_pass, 6, batch.count, 0, batch.offset)
-			}
-		}
-	}
-
-	sdl.EndGPURenderPass(render_pass)
-}
-
-destroy_pipeline_2d_base :: proc(device: ^sdl.GPUDevice, pipeline: ^Pipeline_2D_Base) {
-	destroy_buffer(device, &pipeline.vertex_buffer)
-	destroy_buffer(device, &pipeline.index_buffer)
-	destroy_buffer(device, &pipeline.primitive_buffer)
-	if pipeline.unit_quad_buffer != nil {
-		sdl.ReleaseGPUBuffer(device, pipeline.unit_quad_buffer)
-	}
-	sdl.ReleaseGPUTexture(device, pipeline.white_texture)
-	sdl.ReleaseGPUSampler(device, pipeline.sampler)
-	sdl.ReleaseGPUGraphicsPipeline(device, pipeline.sdl_pipeline)
-}
@@ -1,315 +0,0 @@
-#pragma clang diagnostic ignored "-Wmissing-prototypes"
-
-#include <metal_stdlib>
-#include <simd/simd.h>
-
-using namespace metal;
-
-// Implementation of the GLSL mod() function, which is slightly different than Metal fmod()
-template<typename Tx, typename Ty>
-inline Tx mod(Tx x, Ty y)
-{
-    return x - y * floor(x / y);
-}
-
-struct main0_out
-{
-    float4 out_color [[color(0)]];
-};
-
-struct main0_in
-{
-    float4 f_color [[user(locn0)]];
-    float2 f_local_or_uv [[user(locn1)]];
-    float4 f_params [[user(locn2)]];
-    float4 f_params2 [[user(locn3)]];
-    uint f_kind_flags [[user(locn4)]];
-    float f_rotation [[user(locn5), flat]];
-    float4 f_uv_rect [[user(locn6), flat]];
-};
-
-static inline __attribute__((always_inline))
-float2 apply_rotation(thread const float2& p, thread const float& angle)
-{
-    float cr = cos(-angle);
-    float sr = sin(-angle);
-    return float2x2(float2(cr, sr), float2(-sr, cr)) * p;
-}
-
-static inline __attribute__((always_inline))
-float sdRoundedBox(thread const float2& p, thread const float2& b, thread float4& r)
-{
-    float2 _61;
-    if (p.x > 0.0)
-    {
-        _61 = r.xy;
-    }
-    else
-    {
-        _61 = r.zw;
-    }
-    r.x = _61.x;
-    r.y = _61.y;
-    float _78;
-    if (p.y > 0.0)
-    {
-        _78 = r.x;
-    }
-    else
-    {
-        _78 = r.y;
-    }
-    r.x = _78;
-    float2 q = (abs(p) - b) + float2(r.x);
-    return (fast::min(fast::max(q.x, q.y), 0.0) + length(fast::max(q, float2(0.0)))) - r.x;
-}
-
-static inline __attribute__((always_inline))
-float sdf_stroke(thread const float& d, thread const float& stroke_width)
-{
-    return abs(d) - (stroke_width * 0.5);
-}
-
-static inline __attribute__((always_inline))
-float sdf_alpha(thread const float& d, thread const float& soft)
-{
-    return 1.0 - smoothstep(-soft, soft, d);
-}
-
-static inline __attribute__((always_inline))
-float sdCircle(thread const float2& p, thread const float& r)
-{
-    return length(p) - r;
-}
-
-static inline __attribute__((always_inline))
-float sdEllipse(thread float2& p, thread float2& ab)
-{
-    p = abs(p);
-    if (p.x > p.y)
-    {
-        p = p.yx;
-        ab = ab.yx;
-    }
-    float l = (ab.y * ab.y) - (ab.x * ab.x);
-    float m = (ab.x * p.x) / l;
-    float m2 = m * m;
-    float n = (ab.y * p.y) / l;
-    float n2 = n * n;
-    float c = ((m2 + n2) - 1.0) / 3.0;
-    float c3 = (c * c) * c;
-    float q = c3 + ((m2 * n2) * 2.0);
-    float d = c3 + (m2 * n2);
-    float g = m + (m * n2);
-    float co;
-    if (d < 0.0)
-    {
-        float h = acos(q / c3) / 3.0;
-        float s = cos(h);
-        float t = sin(h) * 1.73205077648162841796875;
-        float rx = sqrt(((-c) * ((s + t) + 2.0)) + m2);
-        float ry = sqrt(((-c) * ((s - t) + 2.0)) + m2);
-        co = (((ry + (sign(l) * rx)) + (abs(g) / (rx * ry))) - m) / 2.0;
-    }
-    else
-    {
-        float h_1 = ((2.0 * m) * n) * sqrt(d);
-        float s_1 = sign(q + h_1) * powr(abs(q + h_1), 0.3333333432674407958984375);
-        float u = sign(q - h_1) * powr(abs(q - h_1), 0.3333333432674407958984375);
-        float rx_1 = (((-s_1) - u) - (c * 4.0)) + (2.0 * m2);
-        float ry_1 = (s_1 - u) * 1.73205077648162841796875;
-        float rm = sqrt((rx_1 * rx_1) + (ry_1 * ry_1));
-        co = (((ry_1 / sqrt(rm - rx_1)) + ((2.0 * g) / rm)) - m) / 2.0;
-    }
-    float2 r = ab * float2(co, sqrt(1.0 - (co * co)));
-    return length(r - p) * sign(p.y - r.y);
-}
-
-static inline __attribute__((always_inline))
-float sdSegment(thread const float2& p, thread const float2& a, thread const float2& b)
-{
-    float2 pa = p - a;
-    float2 ba = b - a;
-    float h = fast::clamp(dot(pa, ba) / dot(ba, ba), 0.0, 1.0);
-    return length(pa - (ba * h));
-}
-
-fragment main0_out main0(main0_in in [[stage_in]], texture2d<float> tex [[texture(0)]], sampler texSmplr [[sampler(0)]])
-{
-    main0_out out = {};
-    uint kind = in.f_kind_flags & 255u;
-    uint flags = (in.f_kind_flags >> 8u) & 255u;
-    if (kind == 0u)
-    {
-        out.out_color = in.f_color * tex.sample(texSmplr, in.f_local_or_uv);
-        return out;
-    }
-    float d = 1000000015047466219876688855040.0;
-    float soft = 1.0;
-    if (kind == 1u)
-    {
-        float2 b = in.f_params.xy;
-        float4 r = float4(in.f_params.zw, in.f_params2.xy);
-        soft = fast::max(in.f_params2.z, 1.0);
-        float stroke_px = in.f_params2.w;
-        float2 p_local = in.f_local_or_uv;
-        if (in.f_rotation != 0.0)
-        {
-            float2 param = p_local;
-            float param_1 = in.f_rotation;
-            p_local = apply_rotation(param, param_1);
-        }
-        float2 param_2 = p_local;
-        float2 param_3 = b;
-        float4 param_4 = r;
-        float _491 = sdRoundedBox(param_2, param_3, param_4);
-        d = _491;
-        if ((flags & 1u) != 0u)
-        {
-            float param_5 = d;
-            float param_6 = stroke_px;
-            d = sdf_stroke(param_5, param_6);
-        }
-        float4 shape_color = in.f_color;
-        if ((flags & 2u) != 0u)
-        {
-            float2 p_for_uv = in.f_local_or_uv;
-            if (in.f_rotation != 0.0)
-            {
-                float2 param_7 = p_for_uv;
-                float param_8 = in.f_rotation;
-                p_for_uv = apply_rotation(param_7, param_8);
-            }
-            float2 local_uv = ((p_for_uv / b) * 0.5) + float2(0.5);
-            float2 uv = mix(in.f_uv_rect.xy, in.f_uv_rect.zw, local_uv);
-            shape_color *= tex.sample(texSmplr, uv);
-        }
-        float param_9 = d;
-        float param_10 = soft;
-        float alpha = sdf_alpha(param_9, param_10);
-        out.out_color = float4(shape_color.xyz, shape_color.w * alpha);
-        return out;
-    }
-    else
-    {
-        if (kind == 2u)
-        {
-            float radius = in.f_params.x;
-            soft = fast::max(in.f_params.y, 1.0);
-            float stroke_px_1 = in.f_params.z;
-            float2 param_11 = in.f_local_or_uv;
-            float param_12 = radius;
-            d = sdCircle(param_11, param_12);
-            if ((flags & 1u) != 0u)
-            {
-                float param_13 = d;
-                float param_14 = stroke_px_1;
-                d = sdf_stroke(param_13, param_14);
-            }
-        }
-        else
-        {
-            if (kind == 3u)
-            {
-                float2 ab = in.f_params.xy;
-                soft = fast::max(in.f_params.z, 1.0);
-                float stroke_px_2 = in.f_params.w;
-                float2 p_local_1 = in.f_local_or_uv;
-                if (in.f_rotation != 0.0)
-                {
-                    float2 param_15 = p_local_1;
-                    float param_16 = in.f_rotation;
-                    p_local_1 = apply_rotation(param_15, param_16);
-                }
-                float2 param_17 = p_local_1;
-                float2 param_18 = ab;
-                float _616 = sdEllipse(param_17, param_18);
-                d = _616;
-                if ((flags & 1u) != 0u)
-                {
-                    float param_19 = d;
-                    float param_20 = stroke_px_2;
-                    d = sdf_stroke(param_19, param_20);
-                }
-            }
-            else
-            {
-                if (kind == 4u)
-                {
-                    float2 a = in.f_params.xy;
-                    float2 b_1 = in.f_params.zw;
-                    float width = in.f_params2.x;
-                    soft = fast::max(in.f_params2.y, 1.0);
-                    float2 param_21 = in.f_local_or_uv;
-                    float2 param_22 = a;
-                    float2 param_23 = b_1;
-                    d = sdSegment(param_21, param_22, param_23) - (width * 0.5);
-                }
-                else
-                {
-                    if (kind == 5u)
-                    {
-                        float inner = in.f_params.x;
-                        float outer = in.f_params.y;
-                        float start_rad = in.f_params.z;
-                        float end_rad = in.f_params.w;
-                        soft = fast::max(in.f_params2.x, 1.0);
-                        float r_1 = length(in.f_local_or_uv);
-                        float d_ring = fast::max(inner - r_1, r_1 - outer);
-                        float angle = precise::atan2(in.f_local_or_uv.y, in.f_local_or_uv.x);
-                        if (angle < 0.0)
-                        {
-                            angle += 6.283185482025146484375;
-                        }
-                        float ang_start = mod(start_rad, 6.283185482025146484375);
-                        float ang_end = mod(end_rad, 6.283185482025146484375);
-                        float _710;
-                        if (ang_end > ang_start)
-                        {
-                            _710 = float((angle >= ang_start) && (angle <= ang_end));
-                        }
-                        else
-                        {
-                            _710 = float((angle >= ang_start) || (angle <= ang_end));
-                        }
-                        float in_arc = _710;
-                        if (abs(ang_end - ang_start) >= 6.282185077667236328125)
-                        {
-                            in_arc = 1.0;
-                        }
-                        d = (in_arc > 0.5) ? d_ring : 1000000015047466219876688855040.0;
-                    }
-                    else
-                    {
-                        if (kind == 6u)
-                        {
-                            float radius_1 = in.f_params.x;
-                            float rotation = in.f_params.y;
-                            float sides = in.f_params.z;
-                            soft = fast::max(in.f_params.w, 1.0);
-                            float stroke_px_3 = in.f_params2.x;
-                            float2 p = in.f_local_or_uv;
-                            float c = cos(rotation);
-                            float s = sin(rotation);
-                            p = float2x2(float2(c, -s), float2(s, c)) * p;
-                            float an = 3.1415927410125732421875 / sides;
-                            float bn = mod(precise::atan2(p.y, p.x), 2.0 * an) - an;
-                            d = (length(p) * cos(bn)) - radius_1;
-                            if ((flags & 1u) != 0u)
-                            {
-                                float param_24 = d;
-                                float param_25 = stroke_px_3;
-                                d = sdf_stroke(param_24, param_25);
-                            }
-                        }
-                    }
-                }
-            }
-        }
-    }
-    float param_26 = d;
-    float param_27 = soft;
-    float alpha_1 = sdf_alpha(param_26, param_27);
-    out.out_color = float4(in.f_color.xyz, in.f_color.w * alpha_1);
-    return out;
-}
@@ -1,99 +0,0 @@
-#include <metal_stdlib>
-#include <simd/simd.h>
-
-using namespace metal;
-
-struct Uniforms
-{
-    float4x4 projection;
-    float dpi_scale;
-    uint mode;
-};
-
-struct Primitive
-{
-    float4 bounds;
-    uint color;
-    uint kind_flags;
-    float rotation;
-    float _pad;
-    float4 params;
-    float4 params2;
-    float4 uv_rect;
-};
-
-struct Primitive_1
-{
-    float4 bounds;
-    uint color;
-    uint kind_flags;
-    float rotation;
-    float _pad;
-    float4 params;
-    float4 params2;
-    float4 uv_rect;
-};
-
-struct Primitives
-{
-    Primitive_1 primitives[1];
-};
-
-struct main0_out
-{
-    float4 f_color [[user(locn0)]];
-    float2 f_local_or_uv [[user(locn1)]];
-    float4 f_params [[user(locn2)]];
-    float4 f_params2 [[user(locn3)]];
-    uint f_kind_flags [[user(locn4)]];
-    float f_rotation [[user(locn5)]];
-    float4 f_uv_rect [[user(locn6)]];
-    float4 gl_Position [[position]];
-};
-
-struct main0_in
-{
-    float2 v_position [[attribute(0)]];
-    float2 v_uv [[attribute(1)]];
-    float4 v_color [[attribute(2)]];
-};
-
-vertex main0_out main0(main0_in in [[stage_in]], constant Uniforms& _12 [[buffer(0)]], const device Primitives& _74 [[buffer(1)]], uint gl_InstanceIndex [[instance_id]])
-{
-    main0_out out = {};
-    if (_12.mode == 0u)
-    {
-        out.f_color = in.v_color;
-        out.f_local_or_uv = in.v_uv;
-        out.f_params = float4(0.0);
-        out.f_params2 = float4(0.0);
-        out.f_kind_flags = 0u;
-        out.f_rotation = 0.0;
-        out.f_uv_rect = float4(0.0, 0.0, 1.0, 1.0);
-        out.gl_Position = _12.projection * float4(in.v_position * _12.dpi_scale, 0.0, 1.0);
-    }
-    else
-    {
-        Primitive p;
-        p.bounds = _74.primitives[int(gl_InstanceIndex)].bounds;
-        p.color = _74.primitives[int(gl_InstanceIndex)].color;
-        p.kind_flags = _74.primitives[int(gl_InstanceIndex)].kind_flags;
-        p.rotation = _74.primitives[int(gl_InstanceIndex)].rotation;
-        p._pad = _74.primitives[int(gl_InstanceIndex)]._pad;
-        p.params = _74.primitives[int(gl_InstanceIndex)].params;
-        p.params2 = _74.primitives[int(gl_InstanceIndex)].params2;
-        p.uv_rect = _74.primitives[int(gl_InstanceIndex)].uv_rect;
-        float2 corner = in.v_position;
-        float2 world_pos = mix(p.bounds.xy, p.bounds.zw, corner);
-        float2 center = (p.bounds.xy + p.bounds.zw) * 0.5;
-        out.f_color = unpack_unorm4x8_to_float(p.color);
-        out.f_local_or_uv = (world_pos - center) * _12.dpi_scale;
-        out.f_params = p.params;
-        out.f_params2 = p.params2;
-        out.f_kind_flags = p.kind_flags;
-        out.f_rotation = p.rotation;
-        out.f_uv_rect = p.uv_rect;
-        out.gl_Position = _12.projection * float4(world_pos * _12.dpi_scale, 0.0, 1.0);
-    }
-    return out;
-}
@@ -1,228 +0,0 @@
-#version 450 core
-
-// --- Inputs from vertex shader ---
-layout(location = 0) in vec4 f_color;
-layout(location = 1) in vec2 f_local_or_uv;
-layout(location = 2) in vec4 f_params;
-layout(location = 3) in vec4 f_params2;
-layout(location = 4) flat in uint f_kind_flags;
-layout(location = 5) flat in float f_rotation;
-layout(location = 6) flat in vec4 f_uv_rect;
-
-// --- Output ---
-layout(location = 0) out vec4 out_color;
-
-// --- Texture sampler (for tessellated/text path) ---
-layout(set = 2, binding = 0) uniform sampler2D tex;
-
-// ---------------------------------------------------------------------------
-// SDF helper functions (Inigo Quilez)
-// All operate in physical pixel space — no dpi_scale needed here.
-// ---------------------------------------------------------------------------
-
-const float PI = 3.14159265358979;
-
-float sdCircle(vec2 p, float r) {
-    return length(p) - r;
-}
-
-float sdRoundedBox(vec2 p, vec2 b, vec4 r) {
-    r.xy = (p.x > 0.0) ? r.xy : r.zw;
-    r.x = (p.y > 0.0) ? r.x : r.y;
-    vec2 q = abs(p) - b + r.x;
-    return min(max(q.x, q.y), 0.0) + length(max(q, vec2(0.0))) - r.x;
-}
-
-float sdSegment(vec2 p, vec2 a, vec2 b) {
-    vec2 pa = p - a, ba = b - a;
-    float h = clamp(dot(pa, ba) / dot(ba, ba), 0.0, 1.0);
-    return length(pa - ba * h);
-}
-
-float sdEllipse(vec2 p, vec2 ab) {
-    p = abs(p);
-    if (p.x > p.y) {
-        p = p.yx;
-        ab = ab.yx;
-    }
-    float l = ab.y * ab.y - ab.x * ab.x;
-    float m = ab.x * p.x / l;
-    float m2 = m * m;
-    float n = ab.y * p.y / l;
-    float n2 = n * n;
-    float c = (m2 + n2 - 1.0) / 3.0;
-    float c3 = c * c * c;
-    float q = c3 + m2 * n2 * 2.0;
-    float d = c3 + m2 * n2;
-    float g = m + m * n2;
-    float co;
-    if (d < 0.0) {
-        float h = acos(q / c3) / 3.0;
-        float s = cos(h);
-        float t = sin(h) * sqrt(3.0);
-        float rx = sqrt(-c * (s + t + 2.0) + m2);
-        float ry = sqrt(-c * (s - t + 2.0) + m2);
-        co = (ry + sign(l) * rx + abs(g) / (rx * ry) - m) / 2.0;
-    } else {
-        float h = 2.0 * m * n * sqrt(d);
-        float s = sign(q + h) * pow(abs(q + h), 1.0 / 3.0);
-        float u = sign(q - h) * pow(abs(q - h), 1.0 / 3.0);
-        float rx = -s - u - c * 4.0 + 2.0 * m2;
-        float ry = (s - u) * sqrt(3.0);
-        float rm = sqrt(rx * rx + ry * ry);
-        co = (ry / sqrt(rm - rx) + 2.0 * g / rm - m) / 2.0;
-    }
-    vec2 r = ab * vec2(co, sqrt(1.0 - co * co));
-    return length(r - p) * sign(p.y - r.y);
-}
-
-float sdf_alpha(float d, float soft) {
-    return 1.0 - smoothstep(-soft, soft, d);
-}
-
-float sdf_stroke(float d, float stroke_width) {
-    return abs(d) - stroke_width * 0.5;
-}
-
-// Rotate a 2D point by the negative of the given angle (inverse rotation).
-// Used to rotate the sampling frame opposite to the shape's rotation so that
-// the SDF evaluates correctly for the rotated shape.
-vec2 apply_rotation(vec2 p, float angle) {
-    float cr = cos(-angle);
-    float sr = sin(-angle);
-    return mat2(cr, sr, -sr, cr) * p;
-}
-
-// ---------------------------------------------------------------------------
-// main
-// ---------------------------------------------------------------------------
-
-void main() {
-    uint kind = f_kind_flags & 0xFFu;
-    uint flags = (f_kind_flags >> 8u) & 0xFFu;
-
-    // -----------------------------------------------------------------------
-    // Kind 0: Tessellated path. Texture multiply for text atlas,
-    //         white pixel for solid shapes.
-    // -----------------------------------------------------------------------
-    if (kind == 0u) {
-        out_color = f_color * texture(tex, f_local_or_uv);
-        return;
-    }
-
-    // -----------------------------------------------------------------------
-    // SDF path. f_local_or_uv = shape-centered position in physical pixels.
-    // All dimensional params are already in physical pixels (CPU pre-scaled).
-    // -----------------------------------------------------------------------
-    float d = 1e30;
-    float soft = 1.0;
-
-    if (kind == 1u) {
-        // RRect: rounded box
-        vec2 b = f_params.xy; // half_size (phys px)
-        vec4 r = vec4(f_params.zw, f_params2.xy); // corner radii: tr, br, tl, bl
-        soft = max(f_params2.z, 1.0);
-        float stroke_px = f_params2.w;
-
-        vec2 p_local = f_local_or_uv;
-        if (f_rotation != 0.0) {
-            p_local = apply_rotation(p_local, f_rotation);
-        }
-
-        d = sdRoundedBox(p_local, b, r);
-        if ((flags & 1u) != 0u) d = sdf_stroke(d, stroke_px);
-
-        // Texture sampling for textured SDF primitives
-        vec4 shape_color = f_color;
-        if ((flags & 2u) != 0u) {
-            // Compute UV from local position and half_size
-            vec2 p_for_uv = f_local_or_uv;
-            if (f_rotation != 0.0) {
-                p_for_uv = apply_rotation(p_for_uv, f_rotation);
-            }
-            vec2 local_uv = p_for_uv / b * 0.5 + 0.5;
-            vec2 uv = mix(f_uv_rect.xy, f_uv_rect.zw, local_uv);
-            shape_color *= texture(tex, uv);
-        }
-
-        float alpha = sdf_alpha(d, soft);
-        out_color = vec4(shape_color.rgb, shape_color.a * alpha);
-        return;
-    }
-    else if (kind == 2u) {
-        // Circle — rotationally symmetric, no rotation needed
-        float radius = f_params.x;
-        soft = max(f_params.y, 1.0);
-        float stroke_px = f_params.z;
-
-        d = sdCircle(f_local_or_uv, radius);
-        if ((flags & 1u) != 0u) d = sdf_stroke(d, stroke_px);
-    }
-    else if (kind == 3u) {
-        // Ellipse
-        vec2 ab = f_params.xy;
-        soft = max(f_params.z, 1.0);
-        float stroke_px = f_params.w;
-
-        vec2 p_local = f_local_or_uv;
-        if (f_rotation != 0.0) {
-            p_local = apply_rotation(p_local, f_rotation);
-        }
-
-        d = sdEllipse(p_local, ab);
-        if ((flags & 1u) != 0u) d = sdf_stroke(d, stroke_px);
-    }
-    else if (kind == 4u) {
-        // Segment (capsule line) — no rotation (excluded)
-        vec2 a = f_params.xy; // already in local physical pixels
-        vec2 b = f_params.zw;
-        float width = f_params2.x;
-        soft = max(f_params2.y, 1.0);
-
-        d = sdSegment(f_local_or_uv, a, b) - width * 0.5;
-    }
-    else if (kind == 5u) {
-        // Ring / Arc — rotation handled by CPU angle offset, no shader rotation
-        float inner = f_params.x;
-        float outer = f_params.y;
-        float start_rad = f_params.z;
-        float end_rad = f_params.w;
-        soft = max(f_params2.x, 1.0);
-
-        float r = length(f_local_or_uv);
-        float d_ring = max(inner - r, r - outer);
-
-        // Angular clip
-        float angle = atan(f_local_or_uv.y, f_local_or_uv.x);
-        if (angle < 0.0) angle += 2.0 * PI;
-        float ang_start = mod(start_rad, 2.0 * PI);
-        float ang_end = mod(end_rad, 2.0 * PI);
-
-        float in_arc = (ang_end > ang_start) ? ((angle >= ang_start && angle <= ang_end) ? 1.0 : 0.0)
-            : ((angle >= ang_start || angle <= ang_end) ? 1.0 : 0.0);
-        if (abs(ang_end - ang_start) >= 2.0 * PI - 0.001) in_arc = 1.0;
-
-        d = in_arc > 0.5 ? d_ring : 1e30;
-    }
-    else if (kind == 6u) {
-        // Regular N-gon — has its own rotation in params, no Primitive.rotation used
-        float radius = f_params.x;
-        float rotation = f_params.y;
-        float sides = f_params.z;
-        soft = max(f_params.w, 1.0);
-        float stroke_px = f_params2.x;
-
-        vec2 p = f_local_or_uv;
-        float c = cos(rotation), s = sin(rotation);
-        p = mat2(c, -s, s, c) * p;
-
-        float an = PI / sides;
-        float bn = mod(atan(p.y, p.x), 2.0 * an) - an;
-        d = length(p) * cos(bn) - radius;
-
-        if ((flags & 1u) != 0u) d = sdf_stroke(d, stroke_px);
-    }
-
-    float alpha = sdf_alpha(d, soft);
-    out_color = vec4(f_color.rgb, f_color.a * alpha);
-}
@@ -1,71 +0,0 @@
-#version 450 core
-
-// ---------- Vertex attributes (used in both modes) ----------
-layout(location = 0) in vec2 v_position;
-layout(location = 1) in vec2 v_uv;
-layout(location = 2) in vec4 v_color;
-
-// ---------- Outputs to fragment shader ----------
-layout(location = 0) out vec4 f_color;
-layout(location = 1) out vec2 f_local_or_uv;
-layout(location = 2) out vec4 f_params;
-layout(location = 3) out vec4 f_params2;
-layout(location = 4) flat out uint f_kind_flags;
-layout(location = 5) flat out float f_rotation;
-layout(location = 6) flat out vec4 f_uv_rect;
-
-// ---------- Uniforms (single block — avoids spirv-cross reordering on Metal) ----------
-layout(set = 1, binding = 0) uniform Uniforms {
-    mat4 projection;
-    float dpi_scale;
-    uint mode; // 0 = tessellated, 1 = SDF
-};
-
-// ---------- SDF primitive storage buffer ----------
-struct Primitive {
-    vec4 bounds; // 0-15:  min_x, min_y, max_x, max_y
-    uint color; // 16-19: packed u8x4 (unpack with unpackUnorm4x8)
-    uint kind_flags; // 20-23: kind | (flags << 8)
-    float rotation; // 24-27: shader self-rotation in radians
-    float _pad; // 28-31: alignment padding
-    vec4 params; // 32-47: shape params part 1
-    vec4 params2; // 48-63: shape params part 2
-    vec4 uv_rect; // 64-79: u_min, v_min, u_max, v_max
-};
-
-layout(std430, set = 0, binding = 0) readonly buffer Primitives {
-    Primitive primitives[];
-};
-
-// ---------- Entry point ----------
-void main() {
-    if (mode == 0u) {
-        // ---- Mode 0: Tessellated (legacy) ----
-        f_color = v_color;
-        f_local_or_uv = v_uv;
-        f_params = vec4(0.0);
-        f_params2 = vec4(0.0);
-        f_kind_flags = 0u;
-        f_rotation = 0.0;
-        f_uv_rect = vec4(0.0, 0.0, 1.0, 1.0);
-
-        gl_Position = projection * vec4(v_position * dpi_scale, 0.0, 1.0);
-    } else {
-        // ---- Mode 1: SDF instanced quads ----
-        Primitive p = primitives[gl_InstanceIndex];
-
-        vec2 corner = v_position; // unit quad corners: (0,0)-(1,1)
-        vec2 world_pos = mix(p.bounds.xy, p.bounds.zw, corner);
-        vec2 center = 0.5 * (p.bounds.xy + p.bounds.zw);
-
-        f_color = unpackUnorm4x8(p.color);
-        f_local_or_uv = (world_pos - center) * dpi_scale; // shape-centered physical pixels
-        f_params = p.params;
-        f_params2 = p.params2;
-        f_kind_flags = p.kind_flags;
-        f_rotation = p.rotation;
-        f_uv_rect = p.uv_rect;
-
-        gl_Position = projection * vec4(world_pos * dpi_scale, 0.0, 1.0);
-    }
-}
@@ -1,314 +0,0 @@
-package draw
-
-import "core:c"
-import "core:log"
-import "core:strings"
-import sdl "vendor:sdl3"
-import sdl_ttf "vendor:sdl3/ttf"
-
-Font_Id :: u16
-
-Font_Key :: struct {
-	id:   Font_Id,
-	size: u16,
-}
-
-Cache_Source :: enum u8 {
-	Custom,
-	Clay,
-}
-
-Cache_Key :: struct {
-	id:     u32,
-	source: Cache_Source,
-}
-
-Text_Cache :: struct {
-	engine:     ^sdl_ttf.TextEngine,
-	font_bytes: [dynamic][]u8,
-	sdl_fonts:  map[Font_Key]^sdl_ttf.Font,
-	cache:      map[Cache_Key]^sdl_ttf.Text,
-}
-
-// Internal helper that fetches (and lazily creates) the SDL_ttf font pointer used for rendering
-get_font :: proc(id: Font_Id, size: u16) -> ^sdl_ttf.Font {
-	assert(int(id) < len(GLOB.text_cache.font_bytes), "Invalid font ID.")
-	key := Font_Key{id, size}
-	font := GLOB.text_cache.sdl_fonts[key]
-
-	if font == nil {
-		log.debug("Font with id:", id, "and size:", size, "not found. Adding..")
-
-		font_bytes := GLOB.text_cache.font_bytes[id]
-		if font_bytes == nil {
-			log.panicf("Font must first be registered with register_font before using (id=%d)", id)
-		}
-
-		font_io := sdl.IOFromConstMem(raw_data(font_bytes[:]), len(font_bytes))
-		if font_io == nil {
-			log.panicf("Failed to create IOStream for font id=%d: %s", id, sdl.GetError())
-		}
-
-		sdl_font := sdl_ttf.OpenFontIO(font_io, true, f32(size))
-		if sdl_font == nil {
-			log.panicf("Failed to create SDL font for font id=%d size=%d: %s", id, size, sdl.GetError())
-		}
-
-		if !sdl_ttf.SetFontSizeDPI(sdl_font, f32(size), 72 * i32(GLOB.dpi_scaling), 72 * i32(GLOB.dpi_scaling)) {
-			log.panicf("Failed to set font DPI for font id=%d size=%d: %s", id, size, sdl.GetError())
-		}
-
-		GLOB.text_cache.sdl_fonts[key] = sdl_font
-		return sdl_font
-	} else {
-		return font
-	}
-}
-
-// Returns `false` if there are more than max(u16) fonts
-register_font :: proc(bytes: []u8) -> (id: Font_Id, ok: bool) #optional_ok {
-	if GLOB.text_cache.engine == nil {
-		log.panicf("Cannot register font: text system not initialized. Call init() first.")
-	}
-	if len(GLOB.text_cache.font_bytes) > int(max(Font_Id)) do return 0, false
-
-	log.debug("Registering font...")
-	append(&GLOB.text_cache.font_bytes, bytes)
-	return Font_Id(len(GLOB.text_cache.font_bytes) - 1), true
-}
-
-Text :: struct {
-	sdl_text: ^sdl_ttf.Text,
-	position: [2]f32,
-	color:    Color,
-}
-
-// ---------------------------------------------------------------------------------------------------------------------
-// ----- Text cache lookup -------------
-// ---------------------------------------------------------------------------------------------------------------------
-
-// Shared cache lookup/create/update logic used by both the `text` proc and the Clay render path.
-// Returns the cached (or newly created) TTF_Text pointer.
-@(private)
-cache_get_or_update :: proc(key: Cache_Key, c_str: cstring, font: ^sdl_ttf.Font) -> ^sdl_ttf.Text {
-	existing, found := GLOB.text_cache.cache[key]
-	if !found {
-		sdl_text := sdl_ttf.CreateText(GLOB.text_cache.engine, font, c_str, 0)
-		if sdl_text == nil {
-			log.panicf("Failed to create SDL text: %s", sdl.GetError())
-		}
-		GLOB.text_cache.cache[key] = sdl_text
-		return sdl_text
-	} else {
-		if !sdl_ttf.SetTextString(existing, c_str, 0) {
-			log.panicf("Failed to update SDL text string: %s", sdl.GetError())
-		}
-		return existing
-	}
-}
-
-// ---------------------------------------------------------------------------------------------------------------------
-// ----- Text drawing ------------------
-// ---------------------------------------------------------------------------------------------------------------------
-
-// Draw text at a position with optional rotation and origin.
-//
-// When `id` is nil (the default), the text is created and destroyed each frame — simple and
-// leak-free, appropriate for HUDs and moderate UI (up to ~50 text elements per frame).
-//
-// When `id` is set, the TTF_Text object is cached across frames keyed by the provided u32.
-// This avoids per-frame HarfBuzz shaping and allocation, which matters for text-heavy apps
-// (editors, terminals, chat). The user is responsible for choosing unique IDs per logical text
-// element and calling `clear_text_cache` or `clear_text_cache_entry` when cached entries are
-// no longer needed. Custom text IDs occupy a separate namespace from Clay text IDs, so
-// collisions between the two are impossible.
-//
-// `origin` is in pixels from the text block's top-left corner (raylib convention).
-// The point whose local coords equal `origin` lands at `position` in world space.
-// `rotation` is in degrees, counter-clockwise.
-text :: proc(
-	layer: ^Layer,
-	text_string: string,
-	position: [2]f32,
-	font_id: Font_Id,
-	font_size: u16 = 44,
-	color: Color = BLACK,
-	origin: [2]f32 = {0, 0},
-	rotation: f32 = 0,
-	id: Maybe(u32) = nil,
-	temp_allocator := context.temp_allocator,
-) {
-	c_str := strings.clone_to_cstring(text_string, temp_allocator)
-	defer delete(c_str, temp_allocator)
-
-	sdl_text: ^sdl_ttf.Text
-	cached := false
-
-	if cache_id, ok := id.?; ok {
-		cached = true
-		sdl_text = cache_get_or_update(Cache_Key{cache_id, .Custom}, c_str, get_font(font_id, font_size))
-	} else {
-		sdl_text = sdl_ttf.CreateText(GLOB.text_cache.engine, get_font(font_id, font_size), c_str, 0)
-		if sdl_text == nil {
-			log.panicf("Failed to create SDL text: %s", sdl.GetError())
-		}
-	}
-
-	if needs_transform(origin, rotation) {
-		dpi_scale := GLOB.dpi_scaling
-		transform := build_pivot_rotation(position * dpi_scale, origin * dpi_scale, rotation)
-		prepare_text_transformed(layer, Text{sdl_text, {0, 0}, color}, transform)
-	} else {
-		prepare_text(layer, Text{sdl_text, position, color})
-	}
-
-	if !cached {
-		// Don't destroy now — the draw data (atlas texture, vertices) is still referenced
-		// by the batch buffers until end() submits to the GPU. Deferred to clear_global().
-		append(&GLOB.tmp_uncached_text, sdl_text)
-	}
-}
-
-// ---------------------------------------------------------------------------------------------------------------------
-// ----- Public text measurement -------
-// ---------------------------------------------------------------------------------------------------------------------
-
-// Measure a string in logical pixels (pre-DPI-scaling) using the same font backend as the renderer.
-measure_text :: proc(
-	text_string: string,
-	font_id: Font_Id,
-	font_size: u16 = 44,
-	allocator := context.temp_allocator,
-) -> [2]f32 {
-	c_str := strings.clone_to_cstring(text_string, allocator)
-	defer delete(c_str, allocator)
-	width, height: c.int
-	if !sdl_ttf.GetStringSize(get_font(font_id, font_size), c_str, 0, &width, &height) {
-		log.panicf("Failed to measure text: %s", sdl.GetError())
-	}
-	return {f32(width) / GLOB.dpi_scaling, f32(height) / GLOB.dpi_scaling}
-}
-
-// ---------------------------------------------------------------------------------------------------------------------
-// ----- Text anchor helpers -----------
-// ---------------------------------------------------------------------------------------------------------------------
-
-center_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u16 = 44) -> [2]f32 {
-	size := measure_text(text_string, font_id, font_size)
-	return size * 0.5
-}
-
-top_left_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u16 = 44) -> [2]f32 {
-	return {0, 0}
-}
-
-top_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u16 = 44) -> [2]f32 {
-	size := measure_text(text_string, font_id, font_size)
-	return {size.x * 0.5, 0}
-}
-
-top_right_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u16 = 44) -> [2]f32 {
-	size := measure_text(text_string, font_id, font_size)
-	return {size.x, 0}
-}
-
-left_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u16 = 44) -> [2]f32 {
-	size := measure_text(text_string, font_id, font_size)
-	return {0, size.y * 0.5}
-}
-
-right_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u16 = 44) -> [2]f32 {
-	size := measure_text(text_string, font_id, font_size)
-	return {size.x, size.y * 0.5}
-}
-
-bottom_left_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u16 = 44) -> [2]f32 {
-	size := measure_text(text_string, font_id, font_size)
-	return {0, size.y}
-}
-
-bottom_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u16 = 44) -> [2]f32 {
-	size := measure_text(text_string, font_id, font_size)
-	return {size.x * 0.5, size.y}
-}
-
-bottom_right_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u16 = 44) -> [2]f32 {
-	size := measure_text(text_string, font_id, font_size)
-	return size
-}
-
-// ---------------------------------------------------------------------------------------------------------------------
-// ----- Cache management --------------
-// ---------------------------------------------------------------------------------------------------------------------
-
-// Destroy all cached text objects (both custom and Clay entries). Call on scene transitions,
-// view changes, or periodically in apps that produce many distinct cached text entries over time.
-// After calling this, subsequent text draws with an `id` will re-create their cache entries.
-clear_text_cache :: proc() {
-	for _, sdl_text in GLOB.text_cache.cache {
-		append(&GLOB.pending_text_releases, sdl_text)
-	}
-	clear(&GLOB.text_cache.cache)
-}
-
-// Destroy a specific cached custom text entry by its u32 id (the same value passed to the
-// `text` proc's `id` parameter). This only affects custom text entries — Clay text entries
-// are managed internally and are not addressable by the user.
-// No-op if the id is not in the cache.
-clear_text_cache_entry :: proc(id: u32) {
-	key := Cache_Key{id, .Custom}
-	sdl_text, ok := GLOB.text_cache.cache[key]
-	if ok {
-		append(&GLOB.pending_text_releases, sdl_text)
-		delete_key(&GLOB.text_cache.cache, key)
-	}
-}
-
-// ---------------------------------------------------------------------------------------------------------------------
-// ----- Internal cache lifecycle ------
-// ---------------------------------------------------------------------------------------------------------------------
-
-@(private, require_results)
-init_text_cache :: proc(
-	device: ^sdl.GPUDevice,
-	allocator := context.allocator,
-) -> (
-	text_cache: Text_Cache,
-	ok: bool,
-) {
-	log.debug("Initializing text state")
-	if !sdl_ttf.Init() {
-		log.errorf("Failed to initialize SDL_ttf: %s", sdl.GetError())
-		return text_cache, false
-	}
-
-	engine := sdl_ttf.CreateGPUTextEngine(device)
-	if engine == nil {
-		log.errorf("Failed to create GPU text engine: %s", sdl.GetError())
-		sdl_ttf.Quit()
-		return text_cache, false
-	}
-	sdl_ttf.SetGPUTextEngineWinding(engine, .COUNTER_CLOCKWISE)
-
-	text_cache = Text_Cache {
-		engine = engine,
-		cache  = make(map[Cache_Key]^sdl_ttf.Text, allocator = allocator),
-	}
-
-	log.debug("Done initializing text cache")
-	return text_cache, true
-}
-
-destroy_text_cache :: proc() {
-	for _, font in GLOB.text_cache.sdl_fonts {
-		sdl_ttf.CloseFont(font)
-	}
-	for _, sdl_text in GLOB.text_cache.cache {
-		sdl_ttf.DestroyText(sdl_text)
-	}
-	delete(GLOB.text_cache.sdl_fonts)
-	delete(GLOB.text_cache.font_bytes)
-	delete(GLOB.text_cache.cache)
-	sdl_ttf.DestroyGPUTextEngine(GLOB.text_cache.engine)
-	sdl_ttf.Quit()
-}
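The removed text cache keyed its entries on `Cache_Key{id, source}`, which is why a custom text id and a Clay text id with the same number could not collide. A minimal sketch of that keying, assuming a plain `string` value in place of `^sdl_ttf.Text` so it runs without SDL (the `main` proc and the string values are illustrative only, not part of the original file):

```odin
package main

import "core:fmt"

Cache_Source :: enum u8 {
	Custom,
	Clay,
}

Cache_Key :: struct {
	id:     u32,
	source: Cache_Source,
}

main :: proc() {
	// Assumption: strings stand in for ^sdl_ttf.Text to keep the sketch dependency-free.
	cache := make(map[Cache_Key]string)
	defer delete(cache)

	cache[Cache_Key{7, .Custom}] = "custom entry"
	cache[Cache_Key{7, .Clay}] = "clay entry"

	// Same numeric id, different source: two distinct entries.
	assert(len(cache) == 2)

	// Removing the custom entry leaves the Clay entry untouched,
	// which mirrors what clear_text_cache_entry did for .Custom keys.
	delete_key(&cache, Cache_Key{7, .Custom})
	_, clay_still_present := cache[Cache_Key{7, .Clay}]
	assert(clay_still_present)

	fmt.println("entries left:", len(cache))
}
```

Because the source tag is part of the key, the per-id removal proc only needs to build a `.Custom` key and never has to inspect Clay entries.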
@@ -1,414 +0,0 @@
-package draw
-
-import "core:log"
-import "core:mem"
-import sdl "vendor:sdl3"
-
-Texture_Id :: distinct u32
-INVALID_TEXTURE :: Texture_Id(0) // Slot 0 is reserved/unused
-
-Texture_Kind :: enum u8 {
-	Static, // Uploaded once, never changes (QR codes, decoded PNGs, icons)
-	Dynamic, // Updatable via update_texture_region
-	Stream, // Frequent full re-uploads (video, procedural)
-}
-
-Sampler_Preset :: enum u8 {
-	Nearest_Clamp,
-	Linear_Clamp,
-	Nearest_Repeat,
-	Linear_Repeat,
-}
-
-SAMPLER_PRESET_COUNT :: 4
-
-Fit_Mode :: enum u8 {
-	Stretch, // Fill rect, may distort aspect ratio (default)
-	Fit, // Preserve aspect, letterbox (may leave margins)
-	Fill, // Preserve aspect, center-crop (may crop edges)
-	Tile, // Repeat at native texture size
-	Center, // 1:1 pixel size, centered, no scaling
-}
-
-Texture_Desc :: struct {
-	width:           u32,
-	height:          u32,
-	depth_or_layers: u32,
-	type:            sdl.GPUTextureType,
-	format:          sdl.GPUTextureFormat,
-	usage:           sdl.GPUTextureUsageFlags,
-	mip_levels:      u32,
-	kind:            Texture_Kind,
-}
-
-// Internal slot — not exported.
-@(private)
-Texture_Slot :: struct {
-	gpu_texture: ^sdl.GPUTexture,
-	desc:        Texture_Desc,
-	generation:  u32,
-}
-
-// State stored in GLOB
-// This file references:
-//   GLOB.device                 : ^sdl.GPUDevice
-//   GLOB.texture_slots          : [dynamic]Texture_Slot
-//   GLOB.texture_free_list      : [dynamic]u32
-//   GLOB.pending_texture_releases : [dynamic]Texture_Id
-//   GLOB.samplers               : [SAMPLER_PRESET_COUNT]^sdl.GPUSampler
-
-Clay_Image_Data :: struct {
-	texture_id: Texture_Id,
-	fit:        Fit_Mode,
-	tint:       Color,
-}
-
-clay_image_data :: proc(id: Texture_Id, fit: Fit_Mode = .Stretch, tint: Color = WHITE) -> Clay_Image_Data {
-	return {texture_id = id, fit = fit, tint = tint}
-}
-
-// ---------------------------------------------------------------------------------------------------------------------
-// ----- Registration -------------
-// ---------------------------------------------------------------------------------------------------------------------
-
-// Register a texture. Draw owns the GPU resource and releases it on unregister.
-// `data` is tightly-packed row-major bytes matching desc.format.
-// The caller may free `data` immediately after this proc returns.
-@(require_results)
-register_texture :: proc(desc: Texture_Desc, data: []u8) -> (id: Texture_Id, ok: bool) {
-	device := GLOB.device
-	if device == nil {
-		log.error("register_texture called before draw.init()")
-		return INVALID_TEXTURE, false
-	}
-
-	assert(desc.width > 0, "Texture_Desc.width must be > 0")
-	assert(desc.height > 0, "Texture_Desc.height must be > 0")
-	assert(desc.depth_or_layers > 0, "Texture_Desc.depth_or_layers must be > 0")
-	assert(desc.mip_levels > 0, "Texture_Desc.mip_levels must be > 0")
-	assert(desc.usage != {}, "Texture_Desc.usage must not be empty (e.g. {.SAMPLER})")
-
-	// Create the GPU texture
-	gpu_texture := sdl.CreateGPUTexture(
-		device,
-		sdl.GPUTextureCreateInfo {
-			type = desc.type,
-			format = desc.format,
-			usage = desc.usage,
-			width = desc.width,
-			height = desc.height,
-			layer_count_or_depth = desc.depth_or_layers,
-			num_levels = desc.mip_levels,
-			sample_count = ._1,
-		},
-	)
-	if gpu_texture == nil {
-		log.errorf("Failed to create GPU texture (%dx%d): %s", desc.width, desc.height, sdl.GetError())
-		return INVALID_TEXTURE, false
-	}
-
-	// Upload pixel data via a transfer buffer
-	if len(data) > 0 {
-		data_size := u32(len(data))
-		transfer := sdl.CreateGPUTransferBuffer(
-			device,
-			sdl.GPUTransferBufferCreateInfo{usage = .UPLOAD, size = data_size},
-		)
-		if transfer == nil {
-			log.errorf("Failed to create texture transfer buffer: %s", sdl.GetError())
-			sdl.ReleaseGPUTexture(device, gpu_texture)
-			return INVALID_TEXTURE, false
-		}
-		defer sdl.ReleaseGPUTransferBuffer(device, transfer)
-
-		mapped := sdl.MapGPUTransferBuffer(device, transfer, false)
-		if mapped == nil {
-			log.errorf("Failed to map texture transfer buffer: %s", sdl.GetError())
-			sdl.ReleaseGPUTexture(device, gpu_texture)
-			return INVALID_TEXTURE, false
-		}
-		mem.copy(mapped, raw_data(data), int(data_size))
-		sdl.UnmapGPUTransferBuffer(device, transfer)
-
-		cmd_buffer := sdl.AcquireGPUCommandBuffer(device)
-		if cmd_buffer == nil {
-			log.errorf("Failed to acquire command buffer for texture upload: %s", sdl.GetError())
-			sdl.ReleaseGPUTexture(device, gpu_texture)
-			return INVALID_TEXTURE, false
-		}
-		copy_pass := sdl.BeginGPUCopyPass(cmd_buffer)
-		sdl.UploadToGPUTexture(
-			copy_pass,
-			sdl.GPUTextureTransferInfo{transfer_buffer = transfer},
-			sdl.GPUTextureRegion{texture = gpu_texture, w = desc.width, h = desc.height, d = desc.depth_or_layers},
-			false,
-		)
-		sdl.EndGPUCopyPass(copy_pass)
-		if !sdl.SubmitGPUCommandBuffer(cmd_buffer) {
-			log.errorf("Failed to submit texture upload: %s", sdl.GetError())
-			sdl.ReleaseGPUTexture(device, gpu_texture)
-			return INVALID_TEXTURE, false
-		}
-	}
-
-	// Allocate a slot (reuse from free list or append)
-	slot_index: u32
-	if len(GLOB.texture_free_list) > 0 {
-		slot_index = pop(&GLOB.texture_free_list)
-		GLOB.texture_slots[slot_index] = Texture_Slot {
-			gpu_texture = gpu_texture,
-			desc        = desc,
-			generation  = GLOB.texture_slots[slot_index].generation + 1,
-		}
-	} else {
-		slot_index = u32(len(GLOB.texture_slots))
-		append(&GLOB.texture_slots, Texture_Slot{gpu_texture = gpu_texture, desc = desc, generation = 1})
-	}
-
-	return Texture_Id(slot_index), true
-}
-
-// Queue a texture for release at the end of the current frame.
-// The GPU resource is not freed immediately — see "Deferred release" in the README.
-unregister_texture :: proc(id: Texture_Id) {
-	if id == INVALID_TEXTURE do return
-	append(&GLOB.pending_texture_releases, id)
-}
-
-// Re-upload a sub-region of a Dynamic texture.
-update_texture_region :: proc(id: Texture_Id, region: Rectangle, data: []u8) {
-	if id == INVALID_TEXTURE do return
-	slot := &GLOB.texture_slots[u32(id)]
-	if slot.gpu_texture == nil do return
-
-	device := GLOB.device
-	data_size := u32(len(data))
-	if data_size == 0 do return
-
-	transfer := sdl.CreateGPUTransferBuffer(
-		device,
-		sdl.GPUTransferBufferCreateInfo{usage = .UPLOAD, size = data_size},
-	)
-	if transfer == nil {
-		log.errorf("Failed to create transfer buffer for texture region update: %s", sdl.GetError())
-		return
-	}
-	defer sdl.ReleaseGPUTransferBuffer(device, transfer)
-
-	mapped := sdl.MapGPUTransferBuffer(device, transfer, false)
-	if mapped == nil {
-		log.errorf("Failed to map transfer buffer for texture region update: %s", sdl.GetError())
-		return
-	}
-	mem.copy(mapped, raw_data(data), int(data_size))
-	sdl.UnmapGPUTransferBuffer(device, transfer)
-
-	cmd_buffer := sdl.AcquireGPUCommandBuffer(device)
-	if cmd_buffer == nil {
-		log.errorf("Failed to acquire command buffer for texture region update: %s", sdl.GetError())
-		return
-	}
-	copy_pass := sdl.BeginGPUCopyPass(cmd_buffer)
-	sdl.UploadToGPUTexture(
-		copy_pass,
-		sdl.GPUTextureTransferInfo{transfer_buffer = transfer},
-		sdl.GPUTextureRegion {
-			texture = slot.gpu_texture,
-			x = u32(region.x),
-			y = u32(region.y),
-			w = u32(region.width),
-			h = u32(region.height),
-			d = 1,
-		},
-		false,
-	)
-	sdl.EndGPUCopyPass(copy_pass)
-	if !sdl.SubmitGPUCommandBuffer(cmd_buffer) {
-		log.errorf("Failed to submit texture region update: %s", sdl.GetError())
-	}
-}
-
-// ---------------------------------------------------------------------------------------------------------------------
-// ----- Helpers -------------
-// ---------------------------------------------------------------------------------------------------------------------
-
-// Compute UV rect, recommended sampler, and inner rect for a given fit mode.
-// `rect` is the target drawing area; `texture_id` identifies the texture whose
-// pixel dimensions are looked up via texture_size().
-// For Fit mode, `inner_rect` is smaller than `rect` (centered). For all other modes, `inner_rect == rect`.
-fit_params :: proc(
-	fit: Fit_Mode,
-	rect: Rectangle,
-	texture_id: Texture_Id,
-) -> (
-	uv_rect: Rectangle,
-	sampler: Sampler_Preset,
-	inner_rect: Rectangle,
-) {
-	size := texture_size(texture_id)
-	texture_width := f32(size.x)
-	texture_height := f32(size.y)
-	rect_width := rect.width
-	rect_height := rect.height
-	inner_rect = rect
-
-	if texture_width == 0 || texture_height == 0 || rect_width == 0 || rect_height == 0 {
-		return {0, 0, 1, 1}, .Linear_Clamp, inner_rect
-	}
-
-	texture_aspect := texture_width / texture_height
-	rect_aspect := rect_width / rect_height
-
-	switch fit {
-	case .Stretch: return {0, 0, 1, 1}, .Linear_Clamp, inner_rect
-
-	case .Fill: if texture_aspect > rect_aspect {
-				// Texture wider than rect — crop sides
-				scale := rect_aspect / texture_aspect
-				margin := (1 - scale) * 0.5
-				return {margin, 0, 1 - margin, 1}, .Linear_Clamp, inner_rect
-			} else {
-				// Texture taller than rect — crop top/bottom
-				scale := texture_aspect / rect_aspect
-				margin := (1 - scale) * 0.5
-				return {0, margin, 1, 1 - margin}, .Linear_Clamp, inner_rect
-			}
-
-	case .Fit:
-		// Preserve aspect, fit inside rect. Returns a shrunken inner_rect.
-		if texture_aspect > rect_aspect {
-			// Image wider — letterbox top/bottom
-			fit_height := rect_width / texture_aspect
-			padding := (rect_height - fit_height) * 0.5
-			inner_rect = Rectangle{rect.x, rect.y + padding, rect_width, fit_height}
-		} else {
-			// Image taller — letterbox left/right
-			fit_width := rect_height * texture_aspect
-			padding := (rect_width - fit_width) * 0.5
-			inner_rect = Rectangle{rect.x + padding, rect.y, fit_width, rect_height}
-		}
-		return {0, 0, 1, 1}, .Linear_Clamp, inner_rect
-
-	case .Tile:
-		uv_width := rect_width / texture_width
-		uv_height := rect_height / texture_height
-		return {0, 0, uv_width, uv_height}, .Linear_Repeat, inner_rect
-
-	case .Center:
-		u_half := rect_width / (2 * texture_width)
-		v_half := rect_height / (2 * texture_height)
-		return {0.5 - u_half, 0.5 - v_half, 0.5 + u_half, 0.5 + v_half}, .Nearest_Clamp, inner_rect
-	}
-
-	return {0, 0, 1, 1}, .Linear_Clamp, inner_rect
-}
-
-texture_size :: proc(id: Texture_Id) -> [2]u32 {
-	if id == INVALID_TEXTURE do return {0, 0}
-	slot := &GLOB.texture_slots[u32(id)]
-	return {slot.desc.width, slot.desc.height}
-}
-
-texture_format :: proc(id: Texture_Id) -> sdl.GPUTextureFormat {
-	if id == INVALID_TEXTURE do return .INVALID
-	return GLOB.texture_slots[u32(id)].desc.format
-}
-
-texture_kind :: proc(id: Texture_Id) -> Texture_Kind {
-	if id == INVALID_TEXTURE do return .Static
-	return GLOB.texture_slots[u32(id)].desc.kind
-}
-
-// Internal: get the raw GPU texture pointer for binding during draw.
-@(private)
-texture_gpu_handle :: proc(id: Texture_Id) -> ^sdl.GPUTexture {
-	if id == INVALID_TEXTURE do return nil
-	idx := u32(id)
-	if idx >= u32(len(GLOB.texture_slots)) do return nil
-	return GLOB.texture_slots[idx].gpu_texture
-}
-
-// Deferred release (called from draw.end / clear_global)
-@(private)
-process_pending_texture_releases :: proc() {
-	device := GLOB.device
-	for id in GLOB.pending_texture_releases {
-		idx := u32(id)
-		if idx >= u32(len(GLOB.texture_slots)) do continue
-		slot := &GLOB.texture_slots[idx]
-		if slot.gpu_texture != nil {
-			sdl.ReleaseGPUTexture(device, slot.gpu_texture)
-			slot.gpu_texture = nil
-		}
-		slot.generation += 1
-		append(&GLOB.texture_free_list, idx)
-	}
-	clear(&GLOB.pending_texture_releases)
-}
-
-@(private)
-get_sampler :: proc(preset: Sampler_Preset) -> ^sdl.GPUSampler {
-	idx := int(preset)
-	if GLOB.samplers[idx] != nil do return GLOB.samplers[idx]
-
-	// Lazily create
-	min_filter, mag_filter: sdl.GPUFilter
-	address_mode: sdl.GPUSamplerAddressMode
-
-	switch preset {
-	case .Nearest_Clamp:
-		min_filter = .NEAREST; mag_filter = .NEAREST; address_mode = .CLAMP_TO_EDGE
-	case .Linear_Clamp:
-		min_filter = .LINEAR; mag_filter = .LINEAR; address_mode = .CLAMP_TO_EDGE
-	case .Nearest_Repeat:
-		min_filter = .NEAREST; mag_filter = .NEAREST; address_mode = .REPEAT
-	case .Linear_Repeat:
-		min_filter = .LINEAR; mag_filter = .LINEAR; address_mode = .REPEAT
-	}
-
-	sampler := sdl.CreateGPUSampler(
-		GLOB.device,
-		sdl.GPUSamplerCreateInfo {
-			min_filter = min_filter,
-			mag_filter = mag_filter,
-			mipmap_mode = .LINEAR,
-			address_mode_u = address_mode,
-			address_mode_v = address_mode,
-			address_mode_w = address_mode,
-		},
-	)
-	if sampler == nil {
-		log.errorf("Failed to create sampler preset %v: %s", preset, sdl.GetError())
-		return GLOB.pipeline_2d_base.sampler // fallback to existing default sampler
-	}
-
-	GLOB.samplers[idx] = sampler
-	return sampler
-}
-
-// Internal: destroy all sampler pool entries. Called from draw.destroy().
-@(private)
-destroy_sampler_pool :: proc() {
-	device := GLOB.device
-	for &s in GLOB.samplers {
-		if s != nil {
-			sdl.ReleaseGPUSampler(device, s)
-			s = nil
-		}
-	}
-}
-
-// Internal: destroy all registered textures. Called from draw.destroy().
-@(private)
-destroy_all_textures :: proc() {
-	device := GLOB.device
-	for &slot in GLOB.texture_slots {
-		if slot.gpu_texture != nil {
-			sdl.ReleaseGPUTexture(device, slot.gpu_texture)
-			slot.gpu_texture = nil
-		}
-	}
-	delete(GLOB.texture_slots)
-	delete(GLOB.texture_free_list)
-	delete(GLOB.pending_texture_releases)
-}
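The `.Fit` branch of the removed `fit_params` proc shrinks the target rect to the texture's aspect ratio and centers it. A small standalone sketch of just that arithmetic follows; `Rectangle` is redefined here as an assumption (the real type lives elsewhere in the package), and `fit_inner_rect` is a hypothetical name, not a proc from the original file:

```odin
package main

import "core:fmt"

// Assumption: field names match how the removed code reads them (rect.x, rect.width, ...).
Rectangle :: struct {
	x, y, width, height: f32,
}

// Returns the largest centered rectangle inside `rect` that keeps the texture's aspect ratio.
fit_inner_rect :: proc(rect: Rectangle, texture_width, texture_height: f32) -> Rectangle {
	if texture_width == 0 || texture_height == 0 || rect.width == 0 || rect.height == 0 {
		return rect
	}
	texture_aspect := texture_width / texture_height
	rect_aspect := rect.width / rect.height
	if texture_aspect > rect_aspect {
		// Texture is wider than the rect: full width, bars above and below.
		fit_height := rect.width / texture_aspect
		padding := (rect.height - fit_height) * 0.5
		return Rectangle{rect.x, rect.y + padding, rect.width, fit_height}
	}
	// Texture is taller (or equal): full height, bars left and right.
	fit_width := rect.height * texture_aspect
	padding := (rect.width - fit_width) * 0.5
	return Rectangle{rect.x + padding, rect.y, fit_width, rect.height}
}

main :: proc() {
	// A 200x100 texture in a 100x100 rect: aspect 2 > 1, so the result is 100x50, offset 25 down.
	inner := fit_inner_rect(Rectangle{0, 0, 100, 100}, 200, 100)
	fmt.println(inner)
}
```

The UV rect stays `{0, 0, 1, 1}` in this mode, since the whole texture is shown and only the destination rectangle changes.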
@@ -124,14 +124,6 @@ spinlock_unlock :: #force_inline proc "contextless" (lock: ^Spinlock) {
 	intrinsics.atomic_store_explicit(lock, false, .Release)
 }

-try_lock :: proc {
-	spinlock_try_lock,
-}
-
-unlock :: proc {
-	spinlock_unlock,
-}
-
 // ---------------------------------------------------------------------------------------------------------------------
 // ----- Tests ------------------------
 // ---------------------------------------------------------------------------------------------------------------------
@@ -1,141 +0,0 @@
-package meta
-
-import "core:fmt"
-import "core:os"
-import "core:strings"
-
-// Compiles all GLSL shaders in source_dir to both SPIR-V (.spv) and
-// Metal Shading Language (.metal), writing results to generated_dir.
-// Overwrites any previously generated files with matching names.
-// Requires `glslangValidator` and `spirv-cross` on PATH.
-gen_shaders :: proc(source_dir, generated_dir: string) -> (success: bool) {
-	if !verify_shader_tool("glslangValidator") do return false
-	if !verify_shader_tool("spirv-cross") do return false
-
-	source_entries, read_err := os.read_all_directory_by_path(source_dir, context.temp_allocator)
-	if read_err != nil {
-		fmt.eprintfln("Failed to read shader source directory '%s': %v", source_dir, read_err)
-		return false
-	}
-	shader_names := make([dynamic]string, len = 0, cap = 24, allocator = context.temp_allocator)
-
-	for entry in source_entries {
-		if strings.has_suffix(entry.name, ".vert") || strings.has_suffix(entry.name, ".frag") {
-			append(&shader_names, entry.name)
-		}
-	}
-
-	if len(shader_names) == 0 {
-		fmt.eprintfln("No shader source files (.vert, .frag) found in '%s'.", source_dir)
-		return false
-	}
-	if os.exists(generated_dir) {
-		rmdir_err := os.remove_all(generated_dir)
-		if rmdir_err != nil {
-			fmt.eprintfln("Failed to remove old output directory '%s': %v", generated_dir, rmdir_err)
-			return false
-		}
-	}
-	mkdir_err := os.mkdir(generated_dir)
-	if mkdir_err != nil {
-		fmt.eprintfln("Failed to create output directory '%s': %v", generated_dir, mkdir_err)
-		return false
-	}
-
-	compiled_count := 0
-	for shader_name in shader_names {
-		source_path := fmt.tprintf("%s/%s", source_dir, shader_name)
-		spv_path := fmt.tprintf("%s/%s.spv", generated_dir, shader_name)
-		metal_path := fmt.tprintf("%s/%s.metal", generated_dir, shader_name)
-
-		fmt.printfln("[GLSL -> SPIR-V]  %s", shader_name)
-		if !compile_glsl_to_spirv(source_path, spv_path) do continue
-
-		fmt.printfln("[SPIR-V -> MSL]   %s", shader_name)
-		if !compile_spirv_to_msl(spv_path, metal_path) do continue
-
-		compiled_count += 1
-	}
-
-	total := len(shader_names)
-	if compiled_count == total {
-		fmt.printfln("Successfully compiled all %d shaders.", total)
-		return true
-	}
-
-	fmt.eprintfln("%d of %d shaders failed to compile.", total - compiled_count, total)
-	return false
-}
-
-verify_shader_tool :: proc(tool_name: string) -> bool {
-	_, _, _, err := os.process_exec(
-		os.Process_Desc{command = []string{tool_name, "--version"}},
-		context.temp_allocator,
-	)
-
-	if err != nil {
-		fmt.eprintfln("Required tool '%s' not found on PATH.", tool_name)
-		if tool_name == "glslangValidator" {
-			fmt.eprintln("\tInstall the Vulkan SDK or the glslang package:")
-			fmt.eprintln("\t  macOS:   brew install glslang")
-			fmt.eprintln("\t  Arch:    sudo pacman -S glslang")
-			fmt.eprintln("\t  Debian:  sudo apt install glslang-tools")
-		} else if tool_name == "spirv-cross" {
-			fmt.eprintln("\tInstall SPIRV-Cross:")
-			fmt.eprintln("\t  macOS:   brew install spirv-cross")
-			fmt.eprintln("\t  Arch:    sudo pacman -S spirv-cross")
-			fmt.eprintln("\t  Debian:  sudo apt install spirv-cross")
-		}
-		return false
-	}
-
-	return true
-}
-
-compile_glsl_to_spirv :: proc(source_path, output_path: string) -> bool {
-	state, stdout_bytes, stderr_bytes, err := os.process_exec(
-		os.Process_Desc{command = []string{"glslangValidator", "-V", source_path, "-o", output_path}},
-		context.temp_allocator,
-	)
-
-	if err != nil {
-		fmt.eprintfln("\tFailed to run glslangValidator for '%s': %v", source_path, err)
-		return false
-	}
-
-	if !state.success {
-		fmt.eprintfln("\tglslangValidator failed for '%s' (exit code %d):", source_path, state.exit_code)
-		print_tool_output(stdout_bytes, stderr_bytes)
-		return false
-	}
-
-	return true
-}
-
-compile_spirv_to_msl :: proc(spv_path, output_path: string) -> bool {
-	state, stdout_bytes, stderr_bytes, err := os.process_exec(
-		os.Process_Desc{command = []string{"spirv-cross", "--msl", spv_path, "--output", output_path}},
-		context.temp_allocator,
-	)
-
-	if err != nil {
-		fmt.eprintfln("\tFailed to run spirv-cross for '%s': %v", spv_path, err)
-		return false
-	}
-
-	if !state.success {
-		fmt.eprintfln("\tspirv-cross failed for '%s' (exit code %d):", spv_path, state.exit_code)
-		print_tool_output(stdout_bytes, stderr_bytes)
-		return false
-	}
-
-	return true
-}
-
-print_tool_output :: proc(stdout_bytes, stderr_bytes: []u8) {
-	stderr_text := strings.trim_right_space(transmute(string)stderr_bytes)
-	stdout_text := strings.trim_right_space(transmute(string)stdout_bytes)
-
-	if len(stderr_text) > 0 do fmt.eprintfln("\t%s", stderr_text)
-	if len(stdout_text) > 0 do fmt.eprintfln("\t%s", stdout_text)
-}
@@ -1,51 +0,0 @@
-package meta
-
-import "core:fmt"
-import "core:os"
-
-Command :: struct {
-	name:        string,
-	description: string,
-	run:         proc() -> bool,
-}
-
-COMMANDS :: []Command {
-	{
-		name = "gen-shaders",
-		description = "Compile GLSL shaders to SPIR-V and Metal Shading Language.",
-		run = proc() -> bool {
-			return gen_shaders("draw/shaders/source", "draw/shaders/generated")
-		},
-	},
-}
-
-main :: proc() {
-	args := os.args[1:]
-
-	if len(args) == 0 {
-		print_usage()
-		return
-	}
-
-	command_name := args[0]
-	for command in COMMANDS {
-		if command.name == command_name {
-			if !command.run() do os.exit(1)
-			return
-		}
-	}
-
-	fmt.eprintfln("Unknown command '%s'.", command_name)
-	fmt.eprintln()
-	print_usage()
-	os.exit(1)
-}
-
-print_usage :: proc() {
-	fmt.eprintln("Usage: meta <command>")
-	fmt.eprintln()
-	fmt.eprintln("Commands:")
-	for command in COMMANDS {
-		fmt.eprintfln("  %-20s %s", command.name, command.description)
-	}
-}
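The phased_executor hunks in this commit drop the `levsync.Spinlock` field in favor of a plain `bool` flag guarded by an atomic compare-exchange. A minimal sketch of that try-lock pattern is below. Two things here are assumptions: the `base:intrinsics` import path (older Odin releases use `core:intrinsics`), and the use of the strong compare-exchange variant so that the single-shot demo in `main` cannot fail spuriously. The commit itself uses the weak variant, which is fine when the caller retries in a loop. `try_lock_flag` and `unlock_flag` are hypothetical names.

```odin
package main

import "base:intrinsics"
import "core:fmt"

// Returns true if this call flipped the flag from false to true.
try_lock_flag :: #force_inline proc "contextless" (locked: ^bool) -> bool {
	_, acquired := intrinsics.atomic_compare_exchange_strong_explicit(
		locked,
		false,
		true,
		.Acq_Rel,
		.Relaxed,
	)
	return acquired
}

// Releases the flag; pairs with the acquire side of a successful try_lock_flag.
unlock_flag :: #force_inline proc "contextless" (locked: ^bool) {
	intrinsics.atomic_store_explicit(locked, false, .Release)
}

main :: proc() {
	locked: bool
	first := try_lock_flag(&locked)  // flag was free, so this acquires it
	second := try_lock_flag(&locked) // flag is held, so this fails
	unlock_flag(&locked)
	third := try_lock_flag(&locked)  // free again after the release store
	fmt.println(first, second, third)
}
```

A failed attempt leaves the flag untouched, so callers can simply spin with `intrinsics.cpu_relax()` between attempts, as the executor loops do.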
@@ -10,7 +10,6 @@ import "core:sync"
 import "core:thread"

 import b "../basic"
-import "../levsync"

 DEFT_BATCH_SIZE :: 1024 // Number of nodes in each batch
 DEFT_SPIN_LIMIT :: 2_500_000
@@ -19,8 +18,7 @@ Harness :: struct($T: typeid) where intrinsics.type_has_nil(T) {
 	mutex:        sync.Mutex,
 	condition:    sync.Cond,
 	cmd_queue:    q.Queue(T),
-	spin:       bool,
-	lock:       levsync.Spinlock,
+	spin, locked: bool,
 	_pad:         [64 - size_of(uint)]u8, // We want join_count to have its own cache line
 	join_count:   uint, // Number of commands completed since last exec_join
 }
@@ -89,14 +87,14 @@ destroy_executor :: proc(executor: ^Executor($T), allocator := context.allocator
 	// Exit thread loops
 	for &harness in executor.harnesses {
 		for {
-			if levsync.try_lock(&harness.lock) {
+			if try_lock_harness(&harness.locked) {
 				q.push_back(&harness.cmd_queue, nil)
 				if !harness.spin {
 					sync.mutex_lock(&harness.mutex)
 					sync.cond_signal(&harness.condition)
 					sync.mutex_unlock(&harness.mutex)
 				}
-				levsync.unlock(&harness.lock)
+				intrinsics.atomic_store_explicit(&harness.locked, false, .Release)
 				break
 			}
 		}
@@ -110,6 +108,18 @@ destroy_executor :: proc(executor: ^Executor($T), allocator := context.allocator
 	delete(executor.harnesses, allocator)
 }

+// Returns true if the lock was successfully acquired, false otherwise
+try_lock_harness :: #force_inline proc "contextless" (locked: ^bool) -> bool {
+	was_locked, lock_acquired := intrinsics.atomic_compare_exchange_weak_explicit(
+		locked,
+		false,
+		true,
+		.Acq_Rel,
+		.Relaxed,
+	)
+	return lock_acquired
+}
+
 build_task :: proc(
 	$on_command_received: proc(command: $T),
 ) -> (
@@ -130,12 +140,12 @@ build_task :: proc(
 			// Spinning
 			spin_count: uint = 0
 			spin_loop: for {
-				if levsync.try_lock(&harness.lock) {
+				if try_lock_harness(&harness.locked) {
 					if q.len(harness.cmd_queue) > 0 {

 						// Execute command
 						command := q.pop_front(&harness.cmd_queue)
-						levsync.unlock(&harness.lock)
+						intrinsics.atomic_store_explicit(&harness.locked, false, .Release)
 						if command == nil do return
 						on_command_received(command)

@@ -143,7 +153,7 @@ build_task :: proc(
 						intrinsics.atomic_add_explicit(&harness.join_count, 1, .Release)
 					} else {
 						defer intrinsics.cpu_relax()
-						defer levsync.unlock(&harness.lock)
+						defer intrinsics.atomic_store_explicit(&harness.locked, false, .Release)
 						spin_count += 1
 						if spin_count == executor.spin_limit {
 							harness.spin = false
@@ -161,8 +171,8 @@ build_task :: proc(
 				sync.cond_wait(&harness.condition, &harness.mutex)
 				for { 	// Loop to acquire harness lock
 					defer intrinsics.cpu_relax()
-					if levsync.try_lock(&harness.lock) {
-						defer levsync.unlock(&harness.lock)
+					if try_lock_harness(&harness.locked) {
+						defer intrinsics.atomic_store_explicit(&harness.locked, false, .Release)
 						if q.len(harness.cmd_queue) > 0 {
 							harness.spin = true
 							break cond_loop
@@ -189,13 +199,13 @@ exec_command :: proc(executor: ^Executor($T), command: T) {
 			}
 		}
 		harness := &executor.harnesses[executor.harness_index]
-		if levsync.try_lock(&harness.lock) {
+		if try_lock_harness(&harness.locked) {
 			if q.len(harness.cmd_queue) <= executor.cmd_queue_floor {
 				q.push_back(&harness.cmd_queue, command)
 				executor.cmd_queue_floor = q.len(harness.cmd_queue)
 				slave_sleeping := !harness.spin
 				// Must release lock before signalling to avoid race from slave spurious wakeup
-				levsync.unlock(&harness.lock)
+				intrinsics.atomic_store_explicit(&harness.locked, false, .Release)
 				if slave_sleeping {
 					sync.mutex_lock(&harness.mutex)
 					sync.cond_signal(&harness.condition)
@@ -203,7 +213,7 @@ exec_command :: proc(executor: ^Executor($T), command: T) {
 				}
 				break
 			}
-			levsync.unlock(&harness.lock)
+			intrinsics.atomic_store_explicit(&harness.locked, false, .Release)
 		}
 	}
 }
@@ -1,285 +0,0 @@
-package examples
-
-import "core:fmt"
-import "core:mem"
-import "core:os"
-
-import qr ".."
-
-main :: proc() {
-	//----- Tracking allocator ----------------------------------
-	{
-		tracking_temp_allocator := false
-		// Temp
-		track_temp: mem.Tracking_Allocator
-		if tracking_temp_allocator {
-			mem.tracking_allocator_init(&track_temp, context.temp_allocator)
-			context.temp_allocator = mem.tracking_allocator(&track_temp)
-		}
-		// Default
-		track: mem.Tracking_Allocator
-		mem.tracking_allocator_init(&track, context.allocator)
-		context.allocator = mem.tracking_allocator(&track)
-		defer {
-			// Temp allocator
-			if tracking_temp_allocator {
-				if len(track_temp.allocation_map) > 0 {
-					fmt.eprintf("=== %v allocations not freed - temp allocator: ===\n", len(track_temp.allocation_map))
-					for _, entry in track_temp.allocation_map {
-						fmt.eprintf("- %v bytes @ %v\n", entry.size, entry.location)
-					}
-				}
-				if len(track_temp.bad_free_array) > 0 {
-					fmt.eprintf("=== %v incorrect frees - temp allocator: ===\n", len(track_temp.bad_free_array))
-					for entry in track_temp.bad_free_array {
-						fmt.eprintf("- %p @ %v\n", entry.memory, entry.location)
-					}
-				}
-				mem.tracking_allocator_destroy(&track_temp)
-			}
-			// Default allocator
-			if len(track.allocation_map) > 0 {
-				fmt.eprintf("=== %v allocations not freed - main allocator: ===\n", len(track.allocation_map))
-				for _, entry in track.allocation_map {
-					fmt.eprintf("- %v bytes @ %v\n", entry.size, entry.location)
-				}
-			}
-			if len(track.bad_free_array) > 0 {
-				fmt.eprintf("=== %v incorrect frees - main allocator: ===\n", len(track.bad_free_array))
-				for entry in track.bad_free_array {
-					fmt.eprintf("- %p @ %v\n", entry.memory, entry.location)
-				}
-			}
-			mem.tracking_allocator_destroy(&track)
-		}
-	}
-
-	args := os.args
-	if len(args) < 2 {
-		fmt.eprintln("Usage: examples <example_name>")
-		fmt.eprintln("Available examples: basic, variety, segment, mask")
-		os.exit(1)
-	}
-
-	switch args[1] {
-	case "basic": basic()
-	case "variety": variety()
-	case "segment": segment()
-	case "mask": mask()
-	case:
-		fmt.eprintf("Unknown example: %v\n", args[1])
-		fmt.eprintln("Available examples: basic, variety, segment, mask")
-		os.exit(1)
-	}
-}
-
-// Creates a single QR Code, then prints it to the console.
-basic :: proc() {
-	text :: "Hello, world!"
-	ecl :: qr.Ecc.Low
-
-	qrcode: [qr.BUFFER_LEN_MAX]u8
-	ok := qr.encode_auto(text, qrcode[:], ecl)
-	if ok do print_qr(qrcode[:])
-}
-
-// Creates a variety of QR Codes that exercise different features of the library.
-variety :: proc() {
-	qrcode: [qr.BUFFER_LEN_MAX]u8
-
-	{ 	// Numeric mode encoding (3.33 bits per digit)
-		ok := qr.encode_auto("314159265358979323846264338327950288419716939937510", qrcode[:], qr.Ecc.Medium)
-		if ok do print_qr(qrcode[:])
-	}
-
-	{ 	// Alphanumeric mode encoding (5.5 bits per character)
-		ok := qr.encode_auto("DOLLAR-AMOUNT:$39.87 PERCENTAGE:100.00% OPERATIONS:+-*/", qrcode[:], qr.Ecc.High)
-		if ok do print_qr(qrcode[:])
-	}
-
-	{ 	// Unicode text as UTF-8
-		ok := qr.encode_auto(
-			"\xE3\x81\x93\xE3\x82\x93\xE3\x81\xAB\xE3\x81\xA1wa\xE3\x80\x81" +
-			"\xE4\xB8\x96\xE7\x95\x8C\xEF\xBC\x81\x20\xCE\xB1\xCE\xB2\xCE\xB3\xCE\xB4",
-			qrcode[:],
-			qr.Ecc.Quartile,
-		)
-		if ok do print_qr(qrcode[:])
-	}
-
-	{ 	// Moderately large QR Code using longer text (from Lewis Carroll's Alice in Wonderland)
-		ok := qr.encode_auto(
-			"Alice was beginning to get very tired of sitting by her sister on the bank, " +
-			"and of having nothing to do: once or twice she had peeped into the book her sister was reading, " +
-			"but it had no pictures or conversations in it, 'and what is the use of a book,' thought Alice " +
-			"'without pictures or conversations?' So she was considering in her own mind (as well as she could, " +
-			"for the hot day made her feel very sleepy and stupid), whether the pleasure of making a " +
-			"daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly " +
-			"a White Rabbit with pink eyes ran close by her.",
-			qrcode[:],
-			qr.Ecc.High,
-		)
-		if ok do print_qr(qrcode[:])
-	}
-}
-
-// Creates QR Codes with manually specified segments for better compactness.
-segment :: proc() {
-	qrcode: [qr.BUFFER_LEN_MAX]u8
-
-	{ 	// Illustration "silver"
-		silver0 :: "THE SQUARE ROOT OF 2 IS 1."
-		silver1 :: "41421356237309504880168872420969807856967187537694807317667973799"
-
-		// Encode as single text (auto mode selection)
-		{
-			concat :: silver0 + silver1
-			ok := qr.encode_auto(concat, qrcode[:], qr.Ecc.Low)
-			if ok do print_qr(qrcode[:])
-		}
-
-		// Encode as two manual segments (alphanumeric + numeric) for better compactness
-		{
-			seg_buf0: [qr.BUFFER_LEN_MAX]u8
-			seg_buf1: [qr.BUFFER_LEN_MAX]u8
-			segs := [2]qr.Segment{qr.make_alphanumeric(silver0, seg_buf0[:]), qr.make_numeric(silver1, seg_buf1[:])}
-			ok := qr.encode_auto(segs[:], qr.Ecc.Low, qrcode[:])
-			if ok do print_qr(qrcode[:])
-		}
-	}
-
-	{ 	// Illustration "golden"
-		golden0 :: "Golden ratio \xCF\x86 = 1."
-		golden1 :: "6180339887498948482045868343656381177203091798057628621354486227052604628189024497072072041893911374"
-		golden2 :: "......"
-
-		// Encode as single text (auto mode selection)
-		{
-			concat :: golden0 + golden1 + golden2
-			ok := qr.encode_auto(concat, qrcode[:], qr.Ecc.Low)
-			if ok do print_qr(qrcode[:])
-		}
-
-		// Encode as three manual segments (byte + numeric + alphanumeric) for better compactness
-		{
-			golden0_str: string = golden0
-			golden0_bytes := transmute([]u8)golden0_str
-			seg_buf0: [qr.BUFFER_LEN_MAX]u8
-			seg_buf1: [qr.BUFFER_LEN_MAX]u8
-			seg_buf2: [qr.BUFFER_LEN_MAX]u8
-			segs := [3]qr.Segment {
-				qr.make_bytes(golden0_bytes, seg_buf0[:]),
-				qr.make_numeric(golden1, seg_buf1[:]),
-				qr.make_alphanumeric(golden2, seg_buf2[:]),
-			}
-			ok := qr.encode_auto(segs[:], qr.Ecc.Low, qrcode[:])
-			if ok do print_qr(qrcode[:])
-		}
-	}
-
-	{ 	// Illustration "Madoka": kanji, kana, Cyrillic, full-width Latin, Greek characters
-		// Encode as text (auto mode — byte mode)
-		{
-			madoka ::
-				"\xE3\x80\x8C\xE9\xAD\x94\xE6\xB3\x95\xE5" +
-				"\xB0\x91\xE5\xA5\xB3\xE3\x81\xBE\xE3\x81" +
-				"\xA9\xE3\x81\x8B\xE2\x98\x86\xE3\x83\x9E" +
-				"\xE3\x82\xAE\xE3\x82\xAB\xE3\x80\x8D\xE3" +
-				"\x81\xA3\xE3\x81\xA6\xE3\x80\x81\xE3\x80" +
-				"\x80\xD0\x98\xD0\x90\xD0\x98\xE3\x80\x80" +
-				"\xEF\xBD\x84\xEF\xBD\x85\xEF\xBD\x93\xEF" +
-				"\xBD\x95\xE3\x80\x80\xCE\xBA\xCE\xB1\xEF" +
-				"\xBC\x9F"
-			ok := qr.encode_auto(madoka, qrcode[:], qr.Ecc.Low)
-			if ok do print_qr(qrcode[:])
-		}
-
-		// Encode with manual kanji mode (13 bits per character)
-		{
-					//odinfmt: disable
-			kanji_chars :: [29]int{
-				0x0035, 0x1002, 0x0FC0, 0x0AED, 0x0AD7,
-				0x015C, 0x0147, 0x0129, 0x0059, 0x01BD,
-				0x018D, 0x018A, 0x0036, 0x0141, 0x0144,
-				0x0001, 0x0000, 0x0249, 0x0240, 0x0249,
-				0x0000, 0x0104, 0x0105, 0x0113, 0x0115,
-				0x0000, 0x0208, 0x01FF, 0x0008,
-			}
-			//odinfmt: enable
-
-			seg_buf: [qr.BUFFER_LEN_MAX]u8
-			for &b in seg_buf {
-				b = 0
-			}
-
-			seg: qr.Segment
-			seg.mode = .Kanji
-			seg.num_chars = len(kanji_chars)
-			seg.bit_length = 0
-			for ch in kanji_chars {
-				for j := 12; j >= 0; j -= 1 {
-					seg_buf[seg.bit_length >> 3] |= u8(((ch >> uint(j)) & 1)) << uint(7 - (seg.bit_length & 7))
-					seg.bit_length += 1
-				}
-			}
-			seg.data = seg_buf[:(seg.bit_length + 7) / 8]
-
-			segs := [1]qr.Segment{seg}
-			ok := qr.encode_auto(segs[:], qr.Ecc.Low, qrcode[:])
-			if ok do print_qr(qrcode[:])
-		}
-	}
-}
-
-// Creates QR Codes with the same size and contents but different mask patterns.
-mask :: proc() {
-	qrcode: [qr.BUFFER_LEN_MAX]u8
-
-	{ 	// Project Nayuki URL
-		ok: bool
-
-		ok = qr.encode_auto("https://www.nayuki.io/", qrcode[:], qr.Ecc.High)
-		if ok do print_qr(qrcode[:])
-
-		ok = qr.encode_auto("https://www.nayuki.io/", qrcode[:], qr.Ecc.High, mask = qr.Mask.M3)
-		if ok do print_qr(qrcode[:])
-	}
-
-	{ 	// Chinese text as UTF-8
-		text ::
-			"\xE7\xB6\xAD\xE5\x9F\xBA\xE7\x99\xBE\xE7\xA7\x91\xEF\xBC\x88\x57\x69\x6B\x69\x70" +
-			"\x65\x64\x69\x61\xEF\xBC\x8C\xE8\x81\x86\xE8\x81\xBD\x69\x2F\xCB\x8C\x77\xC9\xAA" +
-			"\x6B\xE1\xB5\xBB\xCB\x88\x70\x69\xCB\x90\x64\x69\x2E\xC9\x99\x2F\xEF\xBC\x89\xE6" +
-			"\x98\xAF\xE4\xB8\x80\xE5\x80\x8B\xE8\x87\xAA\xE7\x94\xB1\xE5\x85\xA7\xE5\xAE\xB9" +
-			"\xE3\x80\x81\xE5\x85\xAC\xE9\x96\x8B\xE7\xB7\xA8\xE8\xBC\xAF\xE4\xB8\x94\xE5\xA4" +
-			"\x9A\xE8\xAA\x9E\xE8\xA8\x80\xE7\x9A\x84\xE7\xB6\xB2\xE8\xB7\xAF\xE7\x99\xBE\xE7" +
-			"\xA7\x91\xE5\x85\xA8\xE6\x9B\xB8\xE5\x8D\x94\xE4\xBD\x9C\xE8\xA8\x88\xE7\x95\xAB"
-
-		ok: bool
-
-		ok = qr.encode_auto(text, qrcode[:], qr.Ecc.Medium, mask = qr.Mask.M0)
-		if ok do print_qr(qrcode[:])
-
-		ok = qr.encode_auto(text, qrcode[:], qr.Ecc.Medium, mask = qr.Mask.M1)
-		if ok do print_qr(qrcode[:])
-
-		ok = qr.encode_auto(text, qrcode[:], qr.Ecc.Medium, mask = qr.Mask.M5)
-		if ok do print_qr(qrcode[:])
-
-		ok = qr.encode_auto(text, qrcode[:], qr.Ecc.Medium, mask = qr.Mask.M7)
-		if ok do print_qr(qrcode[:])
-	}
-}
-
-// Prints the given QR Code to the console.
-print_qr :: proc(qrcode: []u8) {
-	size := qr.get_size(qrcode)
-	border :: 4
-	for y in -border ..< size + border {
-		for x in -border ..< size + border {
-			fmt.print("##" if qr.get_module(qrcode, x, y) else "  ")
-		}
-		fmt.println()
-	}
-	fmt.println()
-}
@@ -1,489 +0,0 @@
-package clay
-
-import "core:c"
-
-when ODIN_OS == .Windows {
-	foreign import Clay "windows/clay.lib"
-} else when ODIN_OS == .Linux {
-	foreign import Clay "linux/clay.a"
-} else when ODIN_OS == .Darwin {
-	when ODIN_ARCH == .arm64 {
-		foreign import Clay "macos-arm64/clay.a"
-	} else {
-		foreign import Clay "macos/clay.a"
-	}
-} else when ODIN_ARCH == .wasm32 || ODIN_ARCH == .wasm64p32 {
-	foreign import Clay "wasm/clay.o"
-}
-
-String :: struct {
-	isStaticallyAllocated: c.bool,
-	length: c.int32_t,
-	chars:  [^]c.char,
-}
-
-StringSlice :: struct {
-	length: c.int32_t,
-	chars:  [^]c.char,
-	baseChars:  [^]c.char,
-}
-
-Vector2 :: [2]c.float
-
-Dimensions :: struct {
-	width:  c.float,
-	height: c.float,
-}
-
-Arena :: struct {
-	nextAllocation: uintptr,
-	capacity:       c.size_t,
-	memory:         [^]c.char,
-}
-
-BoundingBox :: struct {
-	x:      c.float,
-	y:      c.float,
-	width:  c.float,
-	height: c.float,
-}
-
-Color :: [4]c.float
-
-CornerRadius :: struct {
-	topLeft:     c.float,
-	topRight:    c.float,
-	bottomLeft:  c.float,
-	bottomRight: c.float,
-}
-
-BorderData :: struct {
-	width: u32,
-	color: Color,
-}
-
-ElementId :: struct {
-	id:       u32,
-	offset:   u32,
-	baseId:   u32,
-	stringId: String,
-}
-
-when ODIN_OS == .Windows {
-	EnumBackingType :: u32
-} else {
-	EnumBackingType :: u8
-}
-
-RenderCommandType :: enum EnumBackingType {
-	None,
-	Rectangle,
-	Border,
-	Text,
-	Image,
-	ScissorStart,
-	ScissorEnd,
-	Custom,
-}
-
-RectangleElementConfig :: struct {
-	color:        Color,
-}
-
-TextWrapMode :: enum EnumBackingType {
-	Words,
-	Newlines,
-	None,
-}
-
-TextAlignment :: enum EnumBackingType {
-	Left,
-	Center,
-	Right,
-}
-
-TextElementConfig :: struct {
-	userData:           rawptr,
-	textColor:          Color,
-	fontId:             u16,
-	fontSize:           u16,
-	letterSpacing:      u16,
-	lineHeight:         u16,
-	wrapMode:           TextWrapMode,
-	textAlignment:      TextAlignment,
-}
-
-AspectRatioElementConfig :: struct {
-	aspectRatio:        f32,
-}
-
-ImageElementConfig :: struct {
-	imageData:        rawptr,
-}
-
-CustomElementConfig :: struct {
-	customData: rawptr,
-}
-
-BorderWidth :: struct {
-	left: u16,
-	right: u16,
-	top: u16,
-	bottom: u16,
-	betweenChildren: u16,
-}
-
-BorderElementConfig :: struct {
-	color: Color,
-	width: BorderWidth,
-}
-
-ClipElementConfig :: struct {
-	horizontal:  bool, // clip overflowing elements on the "X" axis
-	vertical:    bool, // clip overflowing elements on the "Y" axis
-	childOffset: Vector2, // offsets the [X,Y] positions of all child elements, primarily for scrolling containers
-}
-
-FloatingAttachPointType :: enum EnumBackingType {
-	LeftTop,
-	LeftCenter,
-	LeftBottom,
-	CenterTop,
-	CenterCenter,
-	CenterBottom,
-	RightTop,
-	RightCenter,
-	RightBottom,
-}
-
-FloatingAttachPoints :: struct {
-	element: FloatingAttachPointType,
-	parent:  FloatingAttachPointType,
-}
-
-PointerCaptureMode :: enum EnumBackingType {
-	Capture,
-	Passthrough,
-}
-
-FloatingAttachToElement :: enum EnumBackingType {
-	None,
-	Parent,
-	ElementWithId,
-	Root,
-}
-
-FloatingClipToElement :: enum EnumBackingType {
-	None,
-	AttachedParent,
-}
-
-FloatingElementConfig :: struct {
-	offset:             Vector2,
-	expand:             Dimensions,
-	parentId:           u32,
-	zIndex:             i16,
-	attachment:         FloatingAttachPoints,
-	pointerCaptureMode: PointerCaptureMode,
-	attachTo:           FloatingAttachToElement,
-	clipTo: 			FloatingClipToElement,
-}
-
-TextRenderData :: struct {
-	stringContents: StringSlice,
-	textColor: Color,
-	fontId: u16,
-	fontSize: u16,
-	letterSpacing: u16,
-	lineHeight: u16,
-}
-
-RectangleRenderData :: struct {
-	backgroundColor: Color,
-	cornerRadius: CornerRadius,
-}
-
-ImageRenderData :: struct {
-	backgroundColor: Color,
-	cornerRadius: CornerRadius,
-	imageData: rawptr,
-}
-
-CustomRenderData :: struct {
-	backgroundColor: Color,
-	cornerRadius: CornerRadius,
-	customData: rawptr,
-}
-
-BorderRenderData :: struct {
-	color: Color,
-	cornerRadius: CornerRadius,
-	width: BorderWidth,
-}
-
-RenderCommandData :: struct #raw_union {
-	rectangle: RectangleRenderData,
-	text: TextRenderData,
-	image: ImageRenderData,
-	custom: CustomRenderData,
-	border: BorderRenderData,
-}
-
-RenderCommand :: struct {
-	boundingBox:        BoundingBox,
-	renderData:         RenderCommandData,
-	userData:           rawptr,
-	id:                 u32,
-	zIndex:             i16,
-	commandType:        RenderCommandType,
-}
-
-ScrollContainerData :: struct {
-	// Note: This is a pointer to the real internal scroll position, mutating it may cause a change in final layout.
-	// Intended for use with external functionality that modifies scroll position, such as scroll bars or auto scrolling.
-	scrollPosition:            ^Vector2,
-	scrollContainerDimensions: Dimensions,
-	contentDimensions:         Dimensions,
-	config:                    ClipElementConfig,
-	// Indicates whether an actual scroll container matched the provided ID or if the default struct was returned.
-	found:                     bool,
-}
-
-ElementData :: struct {
-	boundingBox: BoundingBox,
-	found:       bool,
-}
-
-PointerDataInteractionState :: enum EnumBackingType {
-	PressedThisFrame,
-	Pressed,
-	ReleasedThisFrame,
-	Released,
-}
-
-PointerData :: struct {
-	position: Vector2,
-	state:    PointerDataInteractionState,
-}
-
-SizingType :: enum EnumBackingType {
-	Fit,
-	Grow,
-	Percent,
-	Fixed,
-}
-
-SizingConstraintsMinMax :: struct {
-	min: c.float,
-	max: c.float,
-}
-
-SizingConstraints :: struct #raw_union {
-	sizeMinMax:  SizingConstraintsMinMax,
-	sizePercent: c.float,
-}
-
-SizingAxis :: struct {
-	// Note: `min` is used for CLAY_SIZING_PERCENT, slightly different to clay.h due to lack of C anonymous unions
-	constraints: SizingConstraints,
-	type:        SizingType,
-}
-
-Sizing :: struct {
-	width:  SizingAxis,
-	height: SizingAxis,
-}
-
-Padding :: struct {
-	left: u16,
-	right: u16,
-	top: u16,
-	bottom: u16,
-}
-
-LayoutDirection :: enum EnumBackingType {
-	LeftToRight,
-	TopToBottom,
-}
-
-LayoutAlignmentX :: enum EnumBackingType {
-	Left,
-	Right,
-	Center,
-}
-
-LayoutAlignmentY :: enum EnumBackingType {
-	Top,
-	Bottom,
-	Center,
-}
-
-ChildAlignment :: struct {
-	x: LayoutAlignmentX,
-	y: LayoutAlignmentY,
-}
-
-LayoutConfig :: struct {
-	sizing:          Sizing,
-	padding:         Padding,
-	childGap:        u16,
-	childAlignment:  ChildAlignment,
-	layoutDirection: LayoutDirection,
-}
-
-ClayArray :: struct($type: typeid) {
-	capacity:      i32,
-	length:        i32,
-	internalArray: [^]type,
-}
-
-ElementDeclaration :: struct {
-	id:              ElementId,
-	layout:          LayoutConfig,
-	backgroundColor: Color,
-	cornerRadius:    CornerRadius,
-	aspectRatio: 	 AspectRatioElementConfig,
-	image:           ImageElementConfig,
-	floating:        FloatingElementConfig,
-	custom:          CustomElementConfig,
-	clip:            ClipElementConfig,
-	border:          BorderElementConfig,
-	userData:        rawptr,
-}
-
-ErrorType :: enum EnumBackingType {
-	TextMeasurementFunctionNotProvided,
-	ArenaCapacityExceeded,
-	ElementsCapacityExceeded,
-	TextMeasurementCapacityExceeded,
-	DuplicateId,
-	FloatingContainerParentNotFound,
-	PercentageOver1,
-	InternalError,
-}
-
-ErrorData :: struct {
-	errorType: ErrorType,
-	errorText: String,
-	userData: rawptr,
-}
-
-ErrorHandler :: struct {
-	handler: proc "c" (errorData: ErrorData),
-	userData: rawptr,
-}
-
-Context :: struct {} // opaque structure, only use as a pointer
-
-@(link_prefix = "Clay_", default_calling_convention = "c")
-foreign Clay {
-	_OpenElement :: proc() ---
-	_CloseElement :: proc() ---
-	MinMemorySize :: proc() -> u32 ---
-	CreateArenaWithCapacityAndMemory :: proc(capacity: c.size_t, offset: [^]u8) -> Arena ---
-	SetPointerState :: proc(position: Vector2, pointerDown: bool) ---
-	Initialize :: proc(arena: Arena, layoutDimensions: Dimensions, errorHandler: ErrorHandler) -> ^Context ---
-	GetCurrentContext :: proc() -> ^Context ---
-	SetCurrentContext :: proc(ctx: ^Context) ---
-	UpdateScrollContainers :: proc(enableDragScrolling: bool, scrollDelta: Vector2, deltaTime: c.float) ---
-	SetLayoutDimensions :: proc(dimensions: Dimensions) ---
-	BeginLayout :: proc() ---
-	EndLayout :: proc() -> ClayArray(RenderCommand) ---
-	GetElementId :: proc(id: String) -> ElementId ---
-	GetElementIdWithIndex :: proc(id: String, index: u32) -> ElementId ---
-	GetElementData :: proc(id: ElementId) -> ElementData ---
-	Hovered :: proc() -> bool ---
-	OnHover :: proc(onHoverFunction: proc "c" (id: ElementId, pointerData: PointerData, userData: rawptr), userData: rawptr) ---
-	PointerOver :: proc(id: ElementId) -> bool ---
-	GetScrollOffset :: proc() -> Vector2 ---
-	GetScrollContainerData :: proc(id: ElementId) -> ScrollContainerData ---
-	SetMeasureTextFunction :: proc(measureTextFunction: proc "c" (text: StringSlice, config: ^TextElementConfig, userData: rawptr) -> Dimensions, userData: rawptr) ---
-	SetQueryScrollOffsetFunction :: proc(queryScrollOffsetFunction: proc "c" (elementId: u32, userData: rawptr) -> Vector2, userData: rawptr) ---
-	RenderCommandArray_Get :: proc(array: ^ClayArray(RenderCommand), index: i32) -> ^RenderCommand ---
-	SetDebugModeEnabled :: proc(enabled: bool) ---
-	IsDebugModeEnabled :: proc() -> bool ---
-	SetCullingEnabled :: proc(enabled: bool) ---
-	GetMaxElementCount :: proc() -> i32 ---
-	SetMaxElementCount :: proc(maxElementCount: i32) ---
-	GetMaxMeasureTextCacheWordCount :: proc() -> i32 ---
-	SetMaxMeasureTextCacheWordCount :: proc(maxMeasureTextCacheWordCount: i32) ---
-	ResetMeasureTextCache :: proc() ---
-}
-
-@(link_prefix = "Clay_", default_calling_convention = "c", private)
-foreign Clay {
-	_ConfigureOpenElement :: proc(config: ElementDeclaration) ---
-	_HashString :: proc(key: String, offset: u32, seed: u32) -> ElementId ---
-	_OpenTextElement :: proc(text: String, textConfig: ^TextElementConfig) ---
-	_StoreTextElementConfig :: proc(config: TextElementConfig) -> ^TextElementConfig ---
-	_GetParentElementId :: proc() -> u32 ---
-}
-
-ConfigureOpenElement :: proc(config: ElementDeclaration) -> bool {
-	_ConfigureOpenElement(config)
-	return true
-}
-
-@(deferred_none = _CloseElement)
-UI :: proc() -> proc (config: ElementDeclaration) -> bool {
-	_OpenElement()
-	return ConfigureOpenElement
-}
-
-Text :: proc($text: string, config: ^TextElementConfig) {
-	wrapped := MakeString(text)
-	wrapped.isStaticallyAllocated = true
-	_OpenTextElement(wrapped, config)
-}
-
-TextDynamic :: proc(text: string, config: ^TextElementConfig) {
-	_OpenTextElement(MakeString(text), config)
-}
-
-TextConfig :: proc(config: TextElementConfig) -> ^TextElementConfig {
-	return _StoreTextElementConfig(config)
-}
-
-PaddingAll :: proc(allPadding: u16) -> Padding {
-	return { left = allPadding, right = allPadding, top = allPadding, bottom = allPadding }
-}
-
-BorderOutside :: proc(width: u16) -> BorderWidth {
-	return {width, width, width, width, 0}
-}
-
-BorderAll :: proc(width: u16) -> BorderWidth {
-	return {width, width, width, width, width}
-}
-
-CornerRadiusAll :: proc(radius: f32) -> CornerRadius {
-	return CornerRadius{radius, radius, radius, radius}
-}
-
-SizingFit :: proc(sizeMinMax: SizingConstraintsMinMax) -> SizingAxis {
-	return SizingAxis{type = SizingType.Fit, constraints = {sizeMinMax = sizeMinMax}}
-}
-
-SizingGrow :: proc(sizeMinMax: SizingConstraintsMinMax) -> SizingAxis {
-	return SizingAxis{type = SizingType.Grow, constraints = {sizeMinMax = sizeMinMax}}
-}
-
-SizingFixed :: proc(size: c.float) -> SizingAxis {
-	return SizingAxis{type = SizingType.Fixed, constraints = {sizeMinMax = {size, size}}}
-}
-
-SizingPercent :: proc(sizePercent: c.float) -> SizingAxis {
-	return SizingAxis{type = SizingType.Percent, constraints = {sizePercent = sizePercent}}
-}
-
-MakeString :: proc(label: string) -> String {
-	return String{chars = raw_data(label), length = cast(c.int)len(label)}
-}
-
-ID :: proc(label: string, index: u32 = 0) -> ElementId {
-	return _HashString(MakeString(label), index, 0)
-}
-
-ID_LOCAL :: proc(label: string, index: u32 = 0) -> ElementId {
-	return _HashString(MakeString(label), index, _GetParentElementId())
-}
@@ -1,6 +0,0 @@
-{
-	"$schema": "https://raw.githubusercontent.com/DanielGavin/ols/master/misc/odinfmt.schema.json",
-	"character_width": 180,
-	"sort_imports": true,
-	"tabs": false
-}
Author	SHA1	Message	Date
Zachary Levy	fd3cd1b6e6	Cleaned up phased_executor test	2026-04-02 18:30:12 -07:00
Zachary Levy	ec54afebb2	Removed using statement from many_bits	2026-04-02 18:26:06 -07:00
Zachary Levy	fa3fee52f6	Added test all task	2026-04-02 18:24:38 -07:00
Zachary Levy	5559ed2e0b	Added phased executor	2026-04-02 18:19:42 -07:00