From f85187eff3485311ab630b0d8f919bca51cab8c5 Mon Sep 17 00:00:00 2001
From: Zachary Levy <zachary@sunforge.is>
Date: Mon, 20 Apr 2026 20:27:40 -0700
Subject: [PATCH 1/5] Clean up memory management

---
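Reviewer note: the occupancy tables this patch adds to draw/README.md all follow from one formula:
occupancy = min(floor(regs_per_SM / regs_per_thread), max_threads_per_SM) / max_threads_per_SM.
Below is a minimal Odin sketch of that arithmetic. The `occupancy` proc and the package name are
hypothetical and not part of the draw package. The model makes the same simplification as the
README tables: it ignores warp granularity and register-allocation granularity.

```odin
package occupancy_sketch

import "core:fmt"

// Hypothetical helper for illustration only; not part of the draw package.
// Simplified model: resident threads are limited either by the register file
// or by the hardware thread cap, whichever is lower.
occupancy :: proc(regs_per_sm, max_threads_per_sm, regs_per_thread: int) -> f64 {
	reg_limited_threads := regs_per_sm / regs_per_thread // integer division rounds down
	resident_threads := min(reg_limited_threads, max_threads_per_sm)
	return f64(resident_threads) / f64(max_threads_per_sm)
}

main :: proc() {
	REGS_PER_SM :: 65536
	CONSUMER_MAX_THREADS :: 1536 // consumer Ampere/Ada, as stated in the README
	VOLTA_MAX_THREADS :: 2048 // Volta/A100, as stated in the README

	reg_counts := [?]int{20, 32, 48, 72}
	for regs in reg_counts {
		consumer := occupancy(REGS_PER_SM, CONSUMER_MAX_THREADS, regs)
		volta := occupancy(REGS_PER_SM, VOLTA_MAX_THREADS, regs)
		fmt.printf("%d regs: consumer %.1f%%, Volta/A100 %.1f%%\n", regs, 100 * consumer, 100 * volta)
	}
}
```

The sketch is intended to reproduce the percentages in the two NVIDIA tables. The register cliff
is the point where `reg_limited_threads` drops below the thread cap.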
 draw/README.md   | 222 ++++++++++++++++++++++++++++++++++++++++++-----
 draw/draw.odin   |   5 +-
 draw/shapes.odin |   6 ++
 draw/text.odin   |   2 +
 4 files changed, 213 insertions(+), 22 deletions(-)

diff --git a/draw/README.md b/draw/README.md
index 5f9225a..1066a7e 100644
--- a/draw/README.md
+++ b/draw/README.md
@@ -81,32 +81,63 @@ shader contains both a 20-register RRect SDF and a 72-register frosted-glass blu
 — even trivial RRects — is allocated 72 registers. This directly reduces **occupancy** (the number of
 warps that can run simultaneously), which reduces the GPU's ability to hide memory latency.
 
-Concrete example on a modern NVIDIA SM with 65,536 registers:
+Concrete occupancy analysis on modern NVIDIA SMs, which have 65,536 32-bit registers and a
+hardware-imposed maximum thread count per SM that varies by architecture (Volta/A100: 2,048;
+consumer Ampere/Ada: 1,536). Occupancy is register-limited only when `65536 / regs_per_thread` falls
+below the hardware thread cap; above that cap, occupancy is 100% regardless of register count.
 
-| Register allocation       | Max concurrent threads | Occupancy |
-| ------------------------- | ---------------------- | --------- |
-| 20 regs (RRect only)      | 3,276                  | ~100%     |
-| 48 regs (+ drop shadow)   | 1,365                  | ~42%      |
-| 72 regs (+ frosted glass) | 910                    | ~28%      |
+On consumer Ampere/Ada GPUs (RTX 30xx/40xx, max 1,536 threads per SM):
 
-For a 4K frame (3840×2160) at 1.5× overdraw (~12.4M fragments), running all fragments at 28%
-occupancy instead of 100% roughly triples fragment shading time. At 4K this is severe: if the main
-pipeline's fragment work at full occupancy takes ~2ms, a single unified shader containing the glass
-branch would push it to ~6ms — consuming 72% of the 8.3ms budget available at 120 FPS and leaving
-almost nothing for CPU work, uploads, and presentation. This is a per-frame multiplier, not a
-per-primitive cost — it applies even when the heavy branch is never taken.
+| Register allocation       | Reg-limited threads | Actual (hw-capped) | Occupancy |
+| ------------------------- | ------------------- | ------------------ | --------- |
+| 20 regs (RRect only)      | 3,276               | 1,536              | 100%      |
+| 32 regs                   | 2,048               | 1,536              | 100%      |
+| 48 regs (+ drop shadow)   | 1,365               | 1,365              | ~89%      |
+| 72 regs (+ frosted glass) | 910                 | 910                | ~59%      |
+
+On Volta/A100 GPUs (max 2,048 threads per SM):
+
+| Register allocation       | Reg-limited threads | Actual (hw-capped) | Occupancy |
+| ------------------------- | ------------------- | ------------------ | --------- |
+| 20 regs (RRect only)      | 3,276               | 2,048              | 100%      |
+| 32 regs                   | 2,048               | 2,048              | 100%      |
+| 48 regs (+ drop shadow)   | 1,365               | 1,365              | ~67%      |
+| 72 regs (+ frosted glass) | 910                 | 910                | ~44%      |
+
+The register cliff — where occupancy begins dropping — starts at ~43 regs/thread on consumer
+Ampere/Ada (65536 / 1536) and ~32 regs/thread on Volta/A100 (65536 / 2048). Below the cliff,
+adding registers has zero occupancy cost.
+
+The impact of reduced occupancy depends on whether the shader is memory-latency-bound (where
+occupancy is critical for hiding latency) or ALU-bound (where it matters less). For the
+backdrop-effects pipeline's frosted-glass shader, which performs multiple dependent texture reads,
+59% occupancy (consumer) or 44% occupancy (Volta) meaningfully reduces the GPU's ability to hide
+texture latency — roughly a 1.7× to 2.3× throughput reduction compared to full occupancy. At 4K with
+1.5× overdraw (~12.4M fragments), if the main pipeline's fragment work at full occupancy takes ~2ms,
+a single unified shader containing the glass branch would push it to ~3.4–4.6ms depending on
+architecture. This is a per-frame multiplier, not a per-primitive cost — it applies even when the
+heavy branch is never taken, because the compiler allocates registers for the worst-case path.
+
+**Note on Apple M3+ GPUs:** Apple's M3 GPU architecture introduces Dynamic Caching (register file
+virtualization), which allocates registers dynamically at runtime based on actual usage rather than
+worst-case declared usage. This significantly reduces the static register-pressure-to-occupancy
+penalty described above. The tier split remains useful on Apple hardware for other reasons (keeping
+the backdrop texture-copy out of the main render pass, isolating blur ALU complexity), but the
+register-pressure argument specifically weakens on M3 and later.
 
 The three-pipeline split groups primitives by register footprint so that:
 
-- Main pipeline (~20 regs): 90%+ of fragments run at near-full occupancy.
-- Effects pipeline (~55 regs): shadow/glow fragments run at moderate occupancy; unavoidable given the
-  blur math complexity.
-- Backdrop-effects pipeline (~75 regs): glass fragments run at low occupancy; also unavoidable, and
-  structurally separated anyway by the texture-copy requirement.
+- Main pipeline (~20 regs): all fragments run at full occupancy on every architecture.
+- Effects pipeline (~48–55 regs): shadow/glow fragments run at 67–89% occupancy depending on
+  architecture; unavoidable given the blur math complexity.
+- Backdrop-effects pipeline (~72–75 regs): glass fragments run at 44–59% occupancy; also
+  unavoidable, and structurally separated anyway by the texture-copy requirement.
 
 This avoids the register-pressure tax of a single unified shader while keeping pipeline count minimal
 (3 vs. Zed GPUI's 7). The effects that drag occupancy down are isolated to the fragments that
-actually need them.
+actually need them. Crucially, all shape kinds within the main pipeline (SDF, tessellated, text)
+cluster at 12–24 registers — well below the register cliff on every architecture — so unifying them
+costs nothing in occupancy.
 
 **Why not per-primitive-type pipelines (GPUI's approach)?** Zed's GPUI uses 7 separate shader pairs:
 quad, shadow, underline, monochrome sprite, polychrome sprite, path, surface. This eliminates all
@@ -160,9 +191,9 @@ in submission order:
   cheaper than the pipeline-switching alternative.
 
 The split we _do_ perform (main / effects / backdrop-effects) is motivated by register-pressure tier
-boundaries where occupancy differences are catastrophic at 4K (see numbers above). Within a tier,
-unified is strictly better by every measure: fewer draw calls, simpler Z-order, lower CPU overhead,
-and negligible GPU-side branching cost.
+boundaries where occupancy drops are significant at 4K (see numbers above). Within a tier, unified is
+strictly better by every measure: fewer draw calls, simpler Z-order, lower CPU overhead, and
+negligible GPU-side branching cost.
 
 **References:**
 
@@ -172,6 +203,16 @@ and negligible GPU-side branching cost.
   https://github.com/zed-industries/zed/blob/cb6fc11/crates/gpui/src/platform/mac/shaders.metal
 - NVIDIA Nsight Graphics 2024.3 documentation on active-threads-per-warp and divergence analysis:
   https://developer.nvidia.com/blog/optimize-gpu-workloads-for-graphics-applications-with-nvidia-nsight-graphics/
+- NVIDIA Ampere GPU Architecture Tuning Guide — SM specs, max warps per SM (48 for cc 8.6, 64 for
+  cc 8.0), register file size (64K), occupancy factors:
+  https://docs.nvidia.com/cuda/ampere-tuning-guide/index.html
+- NVIDIA Ada GPU Architecture Tuning Guide — SM specs, max warps per SM (48 for cc 8.9):
+  https://docs.nvidia.com/cuda/ada-tuning-guide/index.html
+- CUDA Occupancy Calculation walkthrough (register allocation granularity, worked examples):
+  https://leimao.github.io/blog/CUDA-Occupancy-Calculation/
+- Apple M3 GPU architecture — Dynamic Caching (register file virtualization) eliminates static
+  worst-case register allocation, reducing the occupancy penalty for high-register shaders:
+  https://asplos.dev/wiki/m3-chip-explainer/gpu/index.html
 
 ### Why fragment shader branching is safe in this design
 
@@ -539,6 +580,145 @@ changes.
 - Valve's original SDF text rendering paper (SIGGRAPH 2007):
   https://steamcdn-a.akamaihd.net/apps/valve/2007/SIGGRAPH2007_AlphaTestedMagnification.pdf
 
+### Textures
+
+Textures plug into the existing main pipeline — no additional GPU pipeline, no shader rewrite. The
+work is a resource layer (registration, upload, sampling, lifecycle) plus two textured-draw procs
+that route into the existing SDF path.
+
+#### Why draw owns registered textures
+
+A texture's GPU resource (the `^sdl.GPUTexture`, transfer buffer, shader resource view) is created
+and destroyed by draw. The user provides raw bytes and a descriptor at registration time; draw
+uploads synchronously and returns an opaque `Texture_Id` handle. The user can free their CPU-side
+bytes immediately after `register_texture` returns.
+
+This follows the model used by the RAD Debugger's render layer (`src/render/render_core.h` in
+EpicGamesExt/raddebugger, MIT license), where `r_tex2d_alloc` takes `(kind, size, format, data)`
+and returns an opaque handle that the renderer owns and releases. The single-owner model eliminates
+an entire class of lifecycle bugs (double-free, use-after-free across subsystems, unclear cleanup
+responsibility) that dual-ownership designs introduce.
+
+If advanced interop is ever needed (e.g., a future 3D pipeline or compute shader sharing the same
+GPU texture), the clean extension is a borrowed-reference accessor (`get_gpu_texture(id)`) that
+returns the underlying handle without transferring ownership. This is purely additive and does not
+require changing the registration API.
+
+#### Why `Texture_Kind` exists
+
+`Texture_Kind` (Static / Dynamic / Stream) is a driver hint for update frequency, adopted from the
+RAD Debugger's `R_ResourceKind`. It maps directly to SDL3 GPU usage patterns:
+
+- **Static**: uploaded once, never changes. Covers QR codes, decoded PNGs, icons — the 90% case.
+- **Dynamic**: updatable via `update_texture_region`. Covers font atlas growth, procedural updates.
+- **Stream**: frequent full re-uploads. Covers video playback, per-frame procedural generation.
+
+This costs one byte in the descriptor and lets the backend pick optimal memory placement without a
+future API change.
+
+#### Why samplers are per-draw, not per-texture
+
+A sampler describes how to filter and address a texture during sampling — nearest vs bilinear, clamp
+vs repeat. This is a property of the _draw_, not the texture. The same QR code texture should be
+sampled with `Nearest_Clamp` when displayed at native resolution but could reasonably be sampled
+with `Linear_Clamp` in a zoomed-out thumbnail. The same icon atlas might be sampled with
+`Nearest_Clamp` for pixel art or `Linear_Clamp` for smooth scaling.
+
+The RAD Debugger follows this pattern: `R_BatchGroup2DParams` carries `tex_sample_kind` alongside
+the texture handle, chosen per batch group at draw time. We do the same — `Sampler_Preset` is a
+parameter on the draw procs, not a field on `Texture_Desc`.
+
+Internally, draw keeps a small pool of pre-created `^sdl.GPUSampler` objects (one per preset,
+lazily initialized). Sub-batch coalescing keys on `(kind, texture_id, sampler_preset)` — draws
+with the same texture but different samplers produce separate draw calls, which is correct.
+
+#### Textured draw procs
+
+Textured rectangles route through the existing SDF path via `draw.rectangle_texture` and
+`draw.rectangle_texture_corners`, mirroring `draw.rectangle` and `draw.rectangle_corners` exactly —
+same parameters, same naming — with the color parameter replaced by a texture ID plus an optional
+tint.
+
+An earlier iteration of this design considered a separate tessellated `draw.texture` proc for
+"simple" fullscreen quads, on the theory that the tessellated path's lower register count (~16 regs
+vs ~24 for the SDF textured branch) would improve occupancy at large fragment counts. Applying the
+register-pressure analysis from the pipeline-strategy section above shows this is wrong: both 16 and
+24 registers are well below the register cliff (~43 regs on consumer Ampere/Ada, ~32 on Volta/A100),
+so both run at 100% occupancy. The remaining ALU difference (~15 extra instructions for the SDF
+evaluation) amounts to ~20μs at 4K — below noise. Meanwhile, splitting into a separate pipeline
+would add ~1–5μs per pipeline bind on the CPU side per scissor, matching or exceeding the GPU-side
+savings. Within the main tier, unified remains strictly better.
+
+The naming convention follows the existing shape API: `rectangle_texture` and
+`rectangle_texture_corners` sit alongside `rectangle` and `rectangle_corners`, mirroring the
+`rectangle_gradient` / `circle_gradient` pattern where the shape is the primary noun and the
+modifier (gradient, texture) is secondary. This groups related procs together in autocomplete
+(`rectangle_*`) and reads as natural English ("draw a rectangle with a texture").
+
+Future per-shape texture variants (`circle_texture`, `ellipse_texture`, `polygon_texture`) are
+reserved by this naming convention and require only a `Shape_Flag.Textured` bit plus a small
+per-shape UV mapping function in the fragment shader. These are additive.
+
+#### What SDF anti-aliasing does and does not do for textured draws
+
+The SDF path anti-aliases the **shape's outer silhouette** — rounded-corner edges, rotated edges,
+stroke outlines. It does not anti-alias or sharpen the texture content. Inside the shape, fragments
+sample through the chosen `Sampler_Preset`, and image quality is whatever the sampler produces from
+the source texels. A low-resolution texture displayed at a large size shows bilinear blur regardless
+of which draw proc is used. This matches the current text-rendering model, where glyph sharpness
+depends on how closely the display size matches the SDL_ttf atlas's rasterized size.
+
+#### Fit modes are a computation layer, not a renderer concept
+
+Standard image-fit behaviors (stretch, fill/cover, fit/contain, tile, center) are expressed as UV
+sub-region computations on top of the `uv_rect` parameter that both textured-draw procs accept. The
+renderer has no knowledge of fit modes — it samples whatever UV region it is given.
+
+A `fit_params` helper computes the appropriate `uv_rect`, sampler preset, and (for letterbox/fit
+mode) shrunken inner rect from a `Fit_Mode` enum, the target rect, and the texture's pixel size.
+Users who need custom UV control (sprite atlas sub-regions, UV animation, nine-patch slicing) skip
+the helper and compute `uv_rect` directly. This keeps the renderer primitive minimal while making
+the common cases convenient.
+
+#### Deferred release
+
+`unregister_texture` does not immediately release the GPU texture. It queues the slot for release at
+the end of the current frame, after `SubmitGPUCommandBuffer` has handed work to the GPU. This
+prevents a race condition where a texture is freed while the GPU is still sampling from it in an
+already-submitted command buffer. The same deferred-release pattern is applied to `clear_text_cache`
+and `clear_text_cache_entry`, fixing a pre-existing latent bug where destroying a cached
+`^sdl_ttf.Text` mid-frame could free an atlas texture still referenced by in-flight draw batches.
+
+This pattern is standard in production renderers — the RAD Debugger's `r_tex2d_release` queues
+textures onto a free list that is processed in `r_end_frame`, not at the call site.
+
+#### Clay integration
+
+Clay's `RenderCommandType.Image` is handled by dereferencing `imageData: rawptr` as a pointer to a
+`Clay_Image_Data` struct containing a `Texture_Id`, `Fit_Mode`, and tint color. Routing mirrors the
+existing rectangle handling: zero `cornerRadius` dispatches to `draw.rectangle_texture`, nonzero
+dispatches to `draw.rectangle_texture_corners`; both use the SDF path. A `fit_params` call computes
+UVs from the fit mode before dispatch.
+
+#### Deferred features
+
+The following are plumbed in the descriptor or purely additive, but not implemented in phase 1:
+
+- **Mipmaps**: `Texture_Desc.mip_levels` field exists; generation via SDL3 deferred.
+- **Compressed formats**: `Texture_Desc.format` accepts BC/ASTC; upload path deferred.
+- **Render-to-texture**: `Texture_Desc.usage` accepts `.COLOR_TARGET`; render-pass refactor deferred.
+- **3D textures, arrays, cube maps**: `Texture_Desc.type` and `depth_or_layers` fields exist.
+- **Additional samplers**: anisotropic, trilinear, clamp-to-border — additive enum values.
+- **Atlas packing**: internal optimization for sub-batch coalescing; invisible to callers.
+- **Per-shape texture variants**: `circle_texture`, `ellipse_texture`, etc. — reserved by naming.
+
+**References:**
+
+- RAD Debugger render layer (ownership model, deferred release, sampler-at-draw-time):
+  https://github.com/EpicGamesExt/raddebugger — `src/render/render_core.h`, `src/render/d3d11/render_d3d11.c`
+- Casey Muratori, Handmade Hero day 472 — texture handling as a renderer-owned resource concern,
+  atlases as a separate layer above the renderer.
+
 ## 3D rendering
 
 3D pipeline architecture is under consideration and will be documented separately. The current
diff --git a/draw/draw.odin b/draw/draw.odin
index 0ed28b0..0cb0f82 100644
--- a/draw/draw.odin
+++ b/draw/draw.odin
@@ -265,6 +265,7 @@ measure_text_clay :: proc "c" (
 	context = GLOB.odin_context
 	text := string(text.chars[:text.length])
 	c_text := strings.clone_to_cstring(text, context.temp_allocator)
+	defer delete(c_text, context.temp_allocator)
 	width, height: c.int
 	if !sdl_ttf.GetStringSize(get_font(config.fontId, config.fontSize), c_text, 0, &width, &height) {
 		log.panicf("Failed to measure text: %s", sdl.GetError())
@@ -502,6 +503,7 @@ prepare_clay_batch :: proc(
 	mouse_wheel_delta: [2]f32,
 	frame_time: f32 = 0,
 	custom_draw: Custom_Draw = nil,
+	temp_allocator := context.temp_allocator,
 ) {
 	mouse_pos: [2]f32
 	mouse_flags := sdl.GetMouseState(&mouse_pos.x, &mouse_pos.y)
@@ -541,7 +543,8 @@ prepare_clay_batch :: proc(
 		case clay.RenderCommandType.Text:
 			render_data := render_command.renderData.text
 			txt := string(render_data.stringContents.chars[:render_data.stringContents.length])
-			c_text := strings.clone_to_cstring(txt, context.temp_allocator)
+			c_text := strings.clone_to_cstring(txt, temp_allocator)
+			defer delete(c_text, temp_allocator)
 			// Clay render-command IDs are derived via Clay's internal HashNumber (Jenkins-family)
 			// and namespaced with .Clay so they can never collide with user-provided custom text IDs.
 			sdl_text := cache_get_or_update(
diff --git a/draw/shapes.odin b/draw/shapes.odin
index 2b15f25..5a8b929 100644
--- a/draw/shapes.odin
+++ b/draw/shapes.odin
@@ -83,6 +83,7 @@ rectangle_gradient :: proc(
 	temp_allocator := context.temp_allocator,
 ) {
 	vertices := make([]Vertex, 6, temp_allocator)
+	defer delete(vertices, temp_allocator)
 
 	corner_top_left := [2]f32{rect.x, rect.y}
 	corner_top_right := [2]f32{rect.x + rect.width, rect.y}
@@ -115,6 +116,7 @@ circle_sector :: proc(
 
 	vertex_count := segment_count * 3
 	vertices := make([]Vertex, vertex_count, temp_allocator)
+	defer delete(vertices, temp_allocator)
 
 	start_radians := math.to_radians(start_angle)
 	end_radians := math.to_radians(end_angle)
@@ -167,6 +169,7 @@ circle_gradient :: proc(
 
 	vertex_count := segment_count * 3
 	vertices := make([]Vertex, vertex_count, temp_allocator)
+	defer delete(vertices, temp_allocator)
 
 	step_angle := math.TAU / f32(segment_count)
 
@@ -238,6 +241,7 @@ triangle_lines :: proc(
 	temp_allocator := context.temp_allocator,
 ) {
 	vertices := make([]Vertex, 18, temp_allocator)
+	defer delete(vertices, temp_allocator)
 	write_offset := 0
 
 	if !needs_transform(origin, rotation) {
@@ -273,6 +277,7 @@ triangle_fan :: proc(
 	triangle_count := len(points) - 2
 	vertex_count := triangle_count * 3
 	vertices := make([]Vertex, vertex_count, temp_allocator)
+	defer delete(vertices, temp_allocator)
 
 	if !needs_transform(origin, rotation) {
 		for i in 1 ..< len(points) - 1 {
@@ -312,6 +317,7 @@ triangle_strip :: proc(
 	triangle_count := len(points) - 2
 	vertex_count := triangle_count * 3
 	vertices := make([]Vertex, vertex_count, temp_allocator)
+	defer delete(vertices, temp_allocator)
 
 	if !needs_transform(origin, rotation) {
 		for i in 0 ..< triangle_count {
diff --git a/draw/text.odin b/draw/text.odin
index 5ff7265..7400b33 100644
--- a/draw/text.odin
+++ b/draw/text.odin
@@ -139,6 +139,7 @@ text :: proc(
 	temp_allocator := context.temp_allocator,
 ) {
 	c_str := strings.clone_to_cstring(text_string, temp_allocator)
+	defer delete(c_str, temp_allocator)
 
 	sdl_text: ^sdl_ttf.Text
 	cached := false
@@ -180,6 +181,7 @@ measure_text :: proc(
 	allocator := context.temp_allocator,
 ) -> [2]f32 {
 	c_str := strings.clone_to_cstring(text_string, allocator)
+	defer delete(c_str, allocator)
 	width, height: c.int
 	if !sdl_ttf.GetStringSize(get_font(font_id, font_size), c_str, 0, &width, &height) {
 		log.panicf("Failed to measure text: %s", sdl.GetError())
-- 
2.43.0


From a4623a13b576c662dc6477899a17a4d52e5b6bb4 Mon Sep 17 00:00:00 2001
From: Zachary Levy <zachary@sunforge.is>
Date: Tue, 21 Apr 2026 13:01:02 -0700
Subject: [PATCH 2/5] Basic texture support

---
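Reviewer note: the Mali row added to draw/README.md in this patch is a two-tier step function, and
the "~4ms instead of ~2ms" figure follows from it. Below is a minimal Odin sketch of that
reasoning. The procs `mali_occupancy` and `scaled_fragment_ms` are hypothetical names, not part of
draw. The sketch assumes fragment time scales inversely with occupancy, which only holds for
latency-bound shaders.

```odin
package mali_tier_sketch

import "core:fmt"

// Hypothetical helper; encodes the two-tier Mali table from the README:
// 0-32 registers keep the full thread count, 33-64 registers halve it.
mali_occupancy :: proc(regs_per_thread: int) -> f64 {
	assert(regs_per_thread >= 0 && regs_per_thread <= 64, "Mali exposes at most 64 registers per thread")
	return 1.0 if regs_per_thread <= 32 else 0.5
}

// Assumption: fragment time scales inversely with occupancy (latency-bound shader).
scaled_fragment_ms :: proc(full_occupancy_ms: f64, regs_per_thread: int) -> f64 {
	return full_occupancy_ms / mali_occupancy(regs_per_thread)
}

main :: proc() {
	BASE_MS :: 2.0 // the README's assumed main-pipeline fragment cost at full occupancy

	// Split design: main-pipeline fragments compile to ~20 registers.
	fmt.printf("split   (20 regs): %.1f ms\n", scaled_fragment_ms(BASE_MS, 20))
	// Unified design: every fragment pays for the 48-register shadow branch.
	fmt.printf("unified (48 regs): %.1f ms\n", scaled_fragment_ms(BASE_MS, 48))
}
```

Under these assumptions, any register count in the 33-64 tier doubles the main pipeline's
fragment time. This is why the split is drawn at the Mali cliff and not at the desktop cliff.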
 .zed/tasks.json                           |   5 +
 draw/README.md                            | 244 +++++++-----
 draw/draw.odin                            | 197 +++++++---
 draw/draw_qr/draw_qr.odin                 |  78 ++++
 draw/examples/hellope.odin                |   5 +-
 draw/examples/main.odin                   |   5 +-
 draw/examples/textures.odin               | 285 ++++++++++++++
 draw/pipeline_2d_base.odin                |  37 +-
 draw/shaders/generated/base_2d.frag.metal |  95 +++--
 draw/shaders/generated/base_2d.frag.spv   | Bin 17776 -> 19164 bytes
 draw/shaders/generated/base_2d.vert.metal |  23 +-
 draw/shaders/generated/base_2d.vert.spv   | Bin 4716 -> 5008 bytes
 draw/shaders/source/base_2d.frag          |  18 +
 draw/shaders/source/base_2d.vert          |   4 +
 draw/shapes.odin                          | 158 +++++++-
 draw/text.odin                            |   4 +-
 draw/textures.odin                        | 433 ++++++++++++++++++++++
 17 files changed, 1375 insertions(+), 216 deletions(-)
 create mode 100644 draw/draw_qr/draw_qr.odin
 create mode 100644 draw/examples/textures.odin
 create mode 100644 draw/textures.odin

diff --git a/.zed/tasks.json b/.zed/tasks.json
index e08acae..8b14508 100644
--- a/.zed/tasks.json
+++ b/.zed/tasks.json
@@ -70,6 +70,11 @@
     "command": "odin run draw/examples -debug -out=out/debug/draw-examples -- hellope-custom",
     "cwd": "$ZED_WORKTREE_ROOT",
   },
+  {
+    "label": "Run draw textures example",
+    "command": "odin run draw/examples -debug -out=out/debug/draw-examples -- textures",
+    "cwd": "$ZED_WORKTREE_ROOT",
+  },
   {
     "label": "Run qrcode basic example",
     "command": "odin run qrcode/examples -debug -out=out/debug/qrcode-examples -- basic",
diff --git a/draw/README.md b/draw/README.md
index 1066a7e..5eeabf2 100644
--- a/draw/README.md
+++ b/draw/README.md
@@ -47,99 +47,107 @@ primitives and effects can be added to the library without architectural changes
 
 ### Overview: three pipelines
 
-The 2D renderer will use three GPU pipelines, split by **register pressure compatibility** and
-**render-state requirements**:
+The 2D renderer uses three GPU pipelines, split by **register pressure** (main vs effects) and
+**render-pass structure** (everything else vs backdrop):
 
-1. **Main pipeline** — shapes (SDF and tessellated) and text. Low register footprint (~18–22
-   registers per thread). Runs at high GPU occupancy. Handles 90%+ of all fragments in a typical
-   frame.
+1. **Main pipeline** — shapes (SDF and tessellated), text, and textured rectangles. Low register
+   footprint (~18–24 registers per thread). Runs at full GPU occupancy on every architecture.
+   Handles 90%+ of all fragments in a typical frame.
 
 2. **Effects pipeline** — drop shadows, inner shadows, outer glow, and similar ALU-bound blur
    effects. Medium register footprint (~48–60 registers). Each effects primitive includes the base
    shape's SDF so that it can draw both the effect and the shape in a single fragment pass, avoiding
-   redundant overdraw.
+   redundant overdraw. Separated from the main pipeline to protect main-pipeline occupancy on
+   low-end hardware (see register analysis below).
 
-3. **Backdrop-effects pipeline** — frosted glass, refraction, and any effect that samples the current
-   render target as input. High register footprint (~70–80 registers) and structurally requires a
-   `CopyGPUTextureToTexture` from the render target before drawing. Separated both for register
-   pressure and because the texture-copy requirement forces a render-pass-level state change.
+3. **Backdrop pipeline** — frosted glass, refraction, and any effect that samples the current render
+   target as input. Implemented as a multi-pass sequence (downsample, separable blur, composite),
+   where each individual pass has a low-to-medium register footprint (~15–40 registers). Separated
+   from the other pipelines because it structurally requires ending the current render pass and
+   copying the render target before any backdrop-sampling fragment can execute. This is a
+   command-buffer-level boundary that cannot be avoided regardless of shader complexity.
 
 A typical UI frame with no effects uses 1 pipeline bind and 0 switches. A frame with drop shadows
 uses 2 pipelines and 1 switch. A frame with shadows and frosted glass uses all 3 pipelines and 2
-switches plus 1 texture copy. At ~5μs per pipeline bind on modern APIs, worst-case switching overhead
-is under 0.15% of an 8.3ms (120 FPS) frame budget.
+switches plus 1 texture copy. At ~1–5μs per pipeline bind on modern APIs, worst-case switching
+overhead is negligible relative to an 8.3ms (120 FPS) frame budget.
 
 ### Why three pipelines, not one or seven
 
 The natural question is whether we should use a single unified pipeline (fewer state changes, simpler
 code) or many per-primitive-type pipelines (no branching overhead, lean per-shader register usage).
 
-The dominant cost factor is **GPU register pressure**, not pipeline switching overhead or fragment
-shader branching. A GPU shader core has a fixed register pool shared among all concurrent threads. The
-compiler allocates registers pessimistically based on the worst-case path through the shader. If the
-shader contains both a 20-register RRect SDF and a 72-register frosted-glass blur, _every_ fragment
-— even trivial RRects — is allocated 72 registers. This directly reduces **occupancy** (the number of
-warps that can run simultaneously), which reduces the GPU's ability to hide memory latency.
+#### Main/effects split: register pressure
 
-Concrete occupancy analysis on modern NVIDIA SMs, which have 65,536 32-bit registers and a
-hardware-imposed maximum thread count per SM that varies by architecture (Volta/A100: 2,048;
-consumer Ampere/Ada: 1,536). Occupancy is register-limited only when `65536 / regs_per_thread` falls
-below the hardware thread cap; above that cap, occupancy is 100% regardless of register count.
+A GPU shader core has a fixed register pool shared among all concurrent threads. The compiler
+allocates registers pessimistically based on the worst-case path through the shader. If the shader
+contains both a 20-register RRect SDF and a 48-register drop-shadow blur, _every_ fragment — even
+trivial RRects — is allocated 48 registers. This directly reduces **occupancy** (the number of
+warps/wavefronts that can run simultaneously), which reduces the GPU's ability to hide memory
+latency.
 
-On consumer Ampere/Ada GPUs (RTX 30xx/40xx, max 1,536 threads per SM):
+Each GPU architecture has a **register cliff** — a threshold above which occupancy starts dropping.
+Below the cliff, adding registers has zero occupancy cost.
 
-| Register allocation       | Reg-limited threads | Actual (hw-capped) | Occupancy |
-| ------------------------- | ------------------- | ------------------ | --------- |
-| 20 regs (RRect only)      | 3,276               | 1,536              | 100%      |
-| 32 regs                   | 2,048               | 1,536              | 100%      |
-| 48 regs (+ drop shadow)   | 1,365               | 1,365              | ~89%      |
-| 72 regs (+ frosted glass) | 910                 | 910                | ~59%      |
+On consumer Ampere/Ada GPUs (RTX 30xx/40xx, 65,536 regs/SM, max 1,536 threads/SM, cliff at ~43 regs):
 
-On Volta/A100 GPUs (max 2,048 threads per SM):
+| Register allocation     | Reg-limited threads | Actual (hw-capped) | Occupancy |
+| ----------------------- | ------------------- | ------------------ | --------- |
+| 20 regs (main pipeline) | 3,276               | 1,536              | 100%      |
+| 32 regs                 | 2,048               | 1,536              | 100%      |
+| 48 regs (effects)       | 1,365               | 1,365              | ~89%      |
 
-| Register allocation       | Reg-limited threads | Actual (hw-capped) | Occupancy |
-| ------------------------- | ------------------- | ------------------ | --------- |
-| 20 regs (RRect only)      | 3,276               | 2,048              | 100%      |
-| 32 regs                   | 2,048               | 2,048              | 100%      |
-| 48 regs (+ drop shadow)   | 1,365               | 1,365              | ~67%      |
-| 72 regs (+ frosted glass) | 910                 | 910                | ~44%      |
+On Volta/A100 GPUs (65,536 regs/SM, max 2,048 threads/SM, cliff at ~32 regs):
 
-The register cliff — where occupancy begins dropping — starts at ~43 regs/thread on consumer
-Ampere/Ada (65536 / 1536) and ~32 regs/thread on Volta/A100 (65536 / 2048). Below the cliff,
-adding registers has zero occupancy cost.
+| Register allocation     | Reg-limited threads | Actual (hw-capped) | Occupancy |
+| ----------------------- | ------------------- | ------------------ | --------- |
+| 20 regs (main pipeline) | 3,276               | 2,048              | 100%      |
+| 32 regs                 | 2,048               | 2,048              | 100%      |
+| 48 regs (effects)       | 1,365               | 1,365              | ~67%      |
 
-The impact of reduced occupancy depends on whether the shader is memory-latency-bound (where
-occupancy is critical for hiding latency) or ALU-bound (where it matters less). For the
-backdrop-effects pipeline's frosted-glass shader, which performs multiple dependent texture reads,
-59% occupancy (consumer) or 44% occupancy (Volta) meaningfully reduces the GPU's ability to hide
-texture latency — roughly a 1.7× to 2.3× throughput reduction compared to full occupancy. At 4K with
-1.5× overdraw (~12.4M fragments), if the main pipeline's fragment work at full occupancy takes ~2ms,
-a single unified shader containing the glass branch would push it to ~3.4–4.6ms depending on
-architecture. This is a per-frame multiplier, not a per-primitive cost — it applies even when the
-heavy branch is never taken, because the compiler allocates registers for the worst-case path.
+On low-end mobile (ARM Mali Bifrost/Valhall, 64 regs/thread, cliff fixed at 32 regs):
 
-**Note on Apple M3+ GPUs:** Apple's M3 GPU architecture introduces Dynamic Caching (register file
-virtualization), which allocates registers dynamically at runtime based on actual usage rather than
-worst-case declared usage. This significantly reduces the static register-pressure-to-occupancy
-penalty described above. The tier split remains useful on Apple hardware for other reasons (keeping
-the backdrop texture-copy out of the main render pass, isolating blur ALU complexity), but the
-register-pressure argument specifically weakens on M3 and later.
+| Register allocation  | Occupancy                  |
+| -------------------- | -------------------------- |
+| 0–32 regs (main)     | 100% (full thread count)   |
+| 33–64 regs (effects) | ~50% (thread count halves) |
 
-The three-pipeline split groups primitives by register footprint so that:
+Mali's cliff at 32 registers is the binding constraint. On desktop, going from 20 to 48 registers
+costs little on consumer Ampere/Ada (100% to ~89% occupancy) and more on Volta/A100 (~67%); on
+Mali it is a hard 2× throughput reduction. The main/effects split protects 90%+ of a frame's
+fragments (shapes, text, textures) from the effects pipeline's register cost.
 
-- Main pipeline (~20 regs): all fragments run at full occupancy on every architecture.
-- Effects pipeline (~48–55 regs): shadow/glow fragments run at 67–89% occupancy depending on
-  architecture; unavoidable given the blur math complexity.
-- Backdrop-effects pipeline (~72–75 regs): glass fragments run at 44–59% occupancy; also
-  unavoidable, and structurally separated anyway by the texture-copy requirement.
+For the effects pipeline's drop-shadow shader (erf-approximation blur math with several texture
+fetches), 50% occupancy on Mali roughly halves throughput. At 4K with 1.5× overdraw (~12.4M
+fragments), if the main pipeline's fragment work takes ~2ms at full occupancy, a single unified
+shader containing the shadow branch would cost ~4ms on low-end mobile. This multiplier applies even
+when the heavy branch is never taken, because the compiler allocates registers for the worst case.
 
-This avoids the register-pressure tax of a single unified shader while keeping pipeline count minimal
-(3 vs. Zed GPUI's 7). The effects that drag occupancy down are isolated to the fragments that
-actually need them. Crucially, all shape kinds within the main pipeline (SDF, tessellated, text)
-cluster at 12–24 registers — well below the register cliff on every architecture — so unifying them
-costs nothing in occupancy.
+All main-pipeline members (SDF shapes, tessellated geometry, text, textured rectangles) cluster at
+12–24 registers — below the cliff on every architecture — so unifying them costs nothing in
+occupancy.
 
-**Why not per-primitive-type pipelines (GPUI's approach)?** Zed's GPUI uses 7 separate shader pairs:
+**Note on Apple M3+ GPUs:** Apple's M3 introduces Dynamic Caching (register file virtualization),
+which allocates registers at runtime based on actual usage rather than worst-case. This weakens the
+static register-pressure argument on M3 and later, but the split remains useful for isolating blur
+ALU complexity and keeping the backdrop texture-copy out of the main render pass.
+
+#### Backdrop split: render-pass structure
+
+The backdrop pipeline (frosted glass, refraction, mirror surfaces) is separated for a structural
+reason unrelated to register pressure. Before any backdrop-sampling fragment can execute, the current
+render target must be copied to a separate texture via `CopyGPUTextureToTexture`, a
+command-buffer-level operation that requires ending the current render pass. This boundary exists
+regardless of shader complexity and cannot be optimized away.
+
+The backdrop pipeline's individual shader passes (downsample, separable blur, composite) are
+register-light (~15–40 regs each), so merging them into the effects pipeline would cause no occupancy
+problem. But the render-pass boundary makes merging structurally impossible — effects draws happen
+inside the main render pass, backdrop draws happen inside their own bracketed pass sequence.
+
+#### Why not per-primitive-type pipelines (GPUI's approach)
+
+Zed's GPUI uses 7 separate shader pairs:
 quad, shadow, underline, monochrome sprite, polychrome sprite, path, surface. This eliminates all
 branching and gives each shader minimal register usage. Three concrete costs make this approach wrong
 for our use case:
@@ -151,7 +159,7 @@ typical UI frame with 15 scissors and 3–4 primitive kinds per scissor, per-kin
 ~45–60 draw calls and pipeline binds; our unified approach produces ~15–20 draw calls and 1–5
 pipeline binds. At ~5μs each for CPU-side command encoding on modern APIs, per-kind splitting adds
 375–500μs of CPU overhead per frame — **4.5–6% of an 8.3ms (120 FPS) budget** — with no
-compensating GPU-side benefit, because the register-pressure savings within the simple-SDF tier are
+compensating GPU-side benefit, because the register-pressure savings within the simple-SDF range are
 negligible (all members cluster at 12–22 registers).
 
 **Z-order preservation forces the API to expose layers.** With a single pipeline drawing all kinds
@@ -190,8 +198,8 @@ in submission order:
   ~60 boundary warps at ~80 extra instructions each), unified divergence costs ~13μs — still 3.5×
   cheaper than the pipeline-switching alternative.
 
-The split we _do_ perform (main / effects / backdrop-effects) is motivated by register-pressure tier
-boundaries where occupancy drops are significant at 4K (see numbers above). Within a tier, unified is
+The split we _do_ perform (main / effects / backdrop) is motivated by register-pressure boundaries
+and structural render-pass requirements (see analysis above). Within a pipeline, unified is
 strictly better by every measure: fewer draw calls, simpler Z-order, lower CPU overhead, and
 negligible GPU-side branching cost.
 
@@ -483,25 +491,40 @@ Wallace's variant) and vger-rs.
 - Vello's implementation of blurred rounded rectangle as a gradient type:
   https://github.com/linebender/vello/pull/665
 
-### Backdrop-effects pipeline
+### Backdrop pipeline
 
-The backdrop-effects pipeline handles effects that sample the current render target as input: frosted
-glass, refraction, mirror surfaces. It is structurally separated from the effects pipeline for two
-reasons:
+The backdrop pipeline handles effects that sample the current render target as input: frosted glass,
+refraction, mirror surfaces. It is separated from the effects pipeline for a structural reason, not
+register pressure.
 
-1. **Render-state requirement.** Before any backdrop-sampling fragment can run, the current render
-   target must be copied to a separate texture via `CopyGPUTextureToTexture`. This is a command-
-   buffer-level operation that cannot happen mid-render-pass. The copy naturally creates a pipeline
-   boundary.
+**Render-pass boundary.** Before any backdrop-sampling fragment can run, the current render target
+must be copied to a separate texture via `CopyGPUTextureToTexture`. This is a command-buffer-level
+operation that cannot happen mid-render-pass. The copy naturally creates a pipeline boundary that no
+amount of shader optimization can eliminate — it is a fundamental requirement of sampling a surface
+while also writing to it.
 
-2. **Register pressure.** Backdrop-sampling shaders read from a texture with Gaussian kernel weights
-   (multiple texture fetches per fragment), pushing register usage to ~70–80. Including this in the
-   effects pipeline would reduce occupancy for all shadow/glow fragments from ~30% to ~20%, costing
-   measurable throughput on the common case.
+**Multi-pass implementation.** Backdrop effects are implemented as separable multi-pass sequences
+(downsample → horizontal blur → vertical blur → composite), following the standard approach used by
+iOS `UIVisualEffectView`, Android `RenderEffect`, and Flutter's `BackdropFilter`. Each individual
+pass has a low-to-medium register footprint (~15–40 registers), no higher than the effects
+pipeline's (~48). The multi-pass approach avoids the monolithic 70+ register shader that a single-pass
+Gaussian blur would require, making backdrop effects viable on low-end mobile GPUs (including
+Mali-G31 and VideoCore VI) where per-thread register limits are tight.
 
-The backdrop-effects pipeline binds a secondary sampler pointing at the captured backdrop texture. When
-no backdrop effects are present in a frame, this pipeline is never bound and the texture copy never
-happens — zero cost.
+**Bracketed execution.** All backdrop draws in a frame share a single bracketed region of the command
+buffer: end the current render pass, copy the render target, execute all backdrop sub-passes, then
+resume normal drawing. The entry/exit cost (texture copy + render-pass break) is paid once per frame
+regardless of how many backdrop effects are visible. When no backdrop effects are present, the bracket
+is never entered and the texture copy never happens — zero cost.
+
+**Why not split the backdrop sub-passes into separate pipelines?** The individual passes range from
+~15 to ~40 registers, which does cross Mali's 32-register cliff. However, the register-pressure argument
+that justifies the main/effects split does not apply here. The main/effects split protects the
+_common path_ (90%+ of frame fragments) from the uncommon path's register cost. Inside the backdrop
+pipeline there is no common-vs-uncommon distinction — if backdrop effects are active, every sub-pass
+runs; if not, none run. The backdrop pipeline either executes as a complete unit or not at all.
+Additionally, backdrop effects cover a small fraction of the frame's total fragments (~5% at typical
+UI scales), so the occupancy variation within the bracket has negligible impact on frame time.
 
 ### Vertex layout
 
@@ -524,19 +547,21 @@ The `Primitive` struct for SDF shapes lives in the storage buffer, not in vertex
 
 ```
 Primitive :: struct {
-    kind:   Shape_Kind,     //  0: enum u8
-    flags:  Shape_Flags,    //  1: bit_set[Shape_Flag; u8]
-    _pad:   u16,            //  2: reserved
-    bounds: [4]f32,         //  4: min_x, min_y, max_x, max_y
-    color:  Color,          // 20: u8x4
-    _pad2:  [3]u8,          // 24: alignment
-    params: Shape_Params,   // 28: raw union, 32 bytes
+    bounds:     [4]f32,         //  0: min_x, min_y, max_x, max_y
+    color:      Color,          // 16: u8x4, unpacked in shader via unpackUnorm4x8
+    kind_flags: u32,            // 20: (kind as u32) | (flags as u32 << 8)
+    rotation:   f32,            // 24: shader self-rotation in radians
+    _pad:       f32,            // 28: alignment
+    params:     Shape_Params,   // 32: raw union, 32 bytes (two vec4s of shape-specific data)
+    uv_rect:    [4]f32,         // 64: texture UV sub-region (u_min, v_min, u_max, v_max)
 }
-// Total: 60 bytes (padded to 64 for GPU alignment)
+// Total: 80 bytes (std430 aligned)
 ```
 
 `Shape_Params` is a `#raw_union` with named variants per shape kind (`rrect`, `circle`, `segment`,
-etc.), ensuring type safety on the CPU side and zero-cost reinterpretation on the GPU side.
+etc.), ensuring type safety on the CPU side and zero-cost reinterpretation on the GPU side. The
+`uv_rect` field is used by textured SDF primitives (`Shape_Flag.Textured`); non-textured primitives
+leave it zeroed.
 
 ### Draw submission order
 
@@ -547,7 +572,7 @@ Within each scissor region, draws are issued in submission order to preserve the
 2. Bind **main pipeline, tessellated mode** → draw all queued tessellated vertices (non-indexed for
    shapes, indexed for text). Pipeline state unchanged from today.
 3. Bind **main pipeline, SDF mode** → draw all queued SDF primitives (instanced, one draw call).
-4. If backdrop effects are present: copy render target, bind **backdrop-effects pipeline** → draw
+4. If backdrop effects are present: copy render target, bind **backdrop pipeline** → draw
    backdrop primitives.
 
 The exact ordering within a scissor may be refined based on actual Z-ordering requirements. The key
@@ -647,7 +672,7 @@ register-pressure analysis from the pipeline-strategy section above shows this i
 so both run at 100% occupancy. The remaining ALU difference (~15 extra instructions for the SDF
 evaluation) amounts to ~20μs at 4K — below noise. Meanwhile, splitting into a separate pipeline
 would add ~1–5μs per pipeline bind on the CPU side per scissor, matching or exceeding the GPU-side
-savings. Within the main tier, unified remains strictly better.
+savings. Within the main pipeline, unified remains strictly better.
 
 The naming convention follows the existing shape API: `rectangle_texture` and
 `rectangle_texture_corners` sit alongside `rectangle` and `rectangle_corners`, mirroring the
@@ -725,6 +750,35 @@ The following are plumbed in the descriptor but not implemented in phase 1:
 expectation is that 3D rendering will use dedicated pipelines (separate from the 2D pipelines)
 sharing GPU resources (textures, samplers, command buffer lifecycle) with the 2D renderer.
 
+## Multi-window support
+
+The renderer currently assumes a single window via the global `GLOB` state. Multi-window support is
+deferred but anticipated. When revisited, the RAD Debugger's bucket + pass-list model
+(`src/draw/draw.h`, `src/draw/draw.c` in EpicGamesExt/raddebugger) is worth studying as a reference.
+
+RAD separates draw submission from rendering via **buckets**. A `DR_Bucket` is an explicit handle
+that accumulates an ordered list of render passes (`R_PassList`). The user creates a bucket, pushes
+it onto a thread-local stack, issues draw calls (which target the top-of-stack bucket), then submits
+the bucket to a specific window. Multiple buckets can exist simultaneously — one per window, or one
+per UI panel that gets composited into a parent bucket via `dr_sub_bucket`. Implicit draw parameters
+(clip rect, 2D transform, sampler mode, transparency) are managed via push/pop stacks scoped to each
+bucket, so different windows can have independent clip and transform state without interference.
+
+The key properties this gives RAD:
+
+- **Per-window isolation.** Each window builds its own bucket with its own pass list and state stacks.
+  No global contention.
+- **Thread-parallel building.** Each thread has its own draw context and arena. Multiple threads can
+  build buckets concurrently, then submit them to the render backend sequentially.
+- **Compositing.** A pre-built bucket (e.g., a tooltip or overlay) can be injected into another
+  bucket with a transform applied, without rebuilding its draw calls.
+
+For our library, the likely adaptation would be replacing the single `GLOB` with a per-window draw
+context that users create and pass to `begin`/`end`, while keeping the explicit-parameter draw call
+style rather than adopting RAD's implicit state stacks. Texture and sampler resources would remain
+global (shared across windows), with only the per-frame staging buffers and layer/scissor state
+becoming per-context.
+
 ## Building shaders
 
 GLSL shader sources live in `shaders/source/`. Compiled outputs (SPIR-V and Metal Shading Language)
diff --git a/draw/draw.odin b/draw/draw.odin
index 0cb0f82..0fb4934 100644
--- a/draw/draw.odin
+++ b/draw/draw.odin
@@ -63,15 +63,17 @@ Rectangle :: struct {
 }
 
 Sub_Batch_Kind :: enum u8 {
-	Shapes, // non-indexed, white texture, mode 0
+	Shapes, // non-indexed, white texture or user texture, mode 0
 	Text, // indexed, atlas texture, mode 0
-	SDF, // instanced unit quad, white texture, mode 1
+	SDF, // instanced unit quad, white texture or user texture, mode 1
 }
 
 Sub_Batch :: struct {
-	kind:   Sub_Batch_Kind,
-	offset: u32, // Shapes: vertex offset; Text: text_batch index; SDF: primitive index
-	count:  u32, // Shapes: vertex count; Text: always 1; SDF: primitive count
+	kind:       Sub_Batch_Kind,
+	offset:     u32, // Shapes: vertex offset; Text: text_batch index; SDF: primitive index
+	count:      u32, // Shapes: vertex count; Text: always 1; SDF: primitive count
+	texture_id: Texture_Id,
+	sampler:    Sampler_Preset,
 }
 
 Layer :: struct {
@@ -95,35 +97,60 @@ Scissor :: struct {
 GLOB: Global
 
 Global :: struct {
-	odin_context:      runtime.Context,
-	pipeline_2d_base:  Pipeline_2D_Base,
-	text_cache:        Text_Cache,
-	layers:            [dynamic]Layer,
-	scissors:          [dynamic]Scissor,
-	tmp_shape_verts:   [dynamic]Vertex,
-	tmp_text_verts:    [dynamic]Vertex,
-	tmp_text_indices:  [dynamic]c.int,
-	tmp_text_batches:  [dynamic]TextBatch,
-	tmp_primitives:    [dynamic]Primitive,
-	tmp_sub_batches:   [dynamic]Sub_Batch,
-	tmp_uncached_text: [dynamic]^sdl_ttf.Text, // Uncached TTF_Text objects to destroy after end()
-	clay_memory:       [^]u8,
-	msaa_texture:      ^sdl.GPUTexture,
-	curr_layer_index:  uint,
-	max_layers:        int,
-	max_scissors:      int,
-	max_shape_verts:   int,
-	max_text_verts:    int,
-	max_text_indices:  int,
-	max_text_batches:  int,
-	max_primitives:    int,
-	max_sub_batches:   int,
-	dpi_scaling:       f32,
-	msaa_width:        u32,
-	msaa_height:       u32,
-	sample_count:      sdl.GPUSampleCount,
-	clay_z_index:      i16,
-	cleared:           bool,
+	// -- Per-frame staging (hottest — touched by every prepare/upload/clear cycle) --
+	tmp_shape_verts:          [dynamic]Vertex, // Tessellated shape vertices staged for GPU upload.
+	tmp_text_verts:           [dynamic]Vertex, // Text vertices staged for GPU upload.
+	tmp_text_indices:         [dynamic]c.int, // Text index buffer staged for GPU upload.
+	tmp_text_batches:         [dynamic]TextBatch, // Text atlas batch metadata for indexed drawing.
+	tmp_primitives:           [dynamic]Primitive, // SDF primitives staged for GPU storage buffer upload.
+	tmp_sub_batches:          [dynamic]Sub_Batch, // Sub-batch records that drive draw call dispatch.
+	tmp_uncached_text:        [dynamic]^sdl_ttf.Text, // Uncached TTF_Text objects destroyed after end() submits.
+	layers:                   [dynamic]Layer, // Draw layers, each with its own scissor stack.
+	scissors:                 [dynamic]Scissor, // Scissor rects that clip drawing within each layer.
+
+	// -- Per-frame scalars (accessed during prepare and draw_layer) --
+	curr_layer_index:         uint, // Index of the currently active layer.
+	dpi_scaling:              f32, // Window DPI scale factor applied to all pixel coordinates.
+	clay_z_index:             i16, // Tracks z-index for layer splitting during Clay batch processing.
+	cleared:                  bool, // Whether the render target has been cleared this frame.
+
+	// -- Pipeline (accessed every draw_layer call) --
+	pipeline_2d_base:         Pipeline_2D_Base, // The unified 2D GPU pipeline (shaders, buffers, samplers).
+	device:                   ^sdl.GPUDevice, // GPU device handle, stored at init.
+	samplers:                 [SAMPLER_PRESET_COUNT]^sdl.GPUSampler, // Lazily-created sampler objects, one per Sampler_Preset.
+
+	// -- Deferred release (processed once per frame at frame boundary) --
+	pending_texture_releases: [dynamic]Texture_Id, // Deferred GPU texture releases, processed next frame.
+	pending_text_releases:    [dynamic]^sdl_ttf.Text, // Deferred TTF_Text destroys, processed next frame.
+
+	// -- Textures (registration is occasional, binding is per draw call) --
+	texture_slots:            [dynamic]Texture_Slot, // Registered texture slots indexed by Texture_Id.
+	texture_free_list:        [dynamic]u32, // Recycled slot indices available for reuse.
+
+	// -- MSAA (once per frame in end()) --
+	msaa_texture:             ^sdl.GPUTexture, // Intermediate render target for multi-sample resolve.
+	msaa_width:               u32, // Cached width to detect when MSAA texture needs recreation.
+	msaa_height:              u32, // Cached height to detect when MSAA texture needs recreation.
+	sample_count:             sdl.GPUSampleCount, // Sample count chosen at init (._1 means MSAA disabled).
+
+	// -- Clay (once per frame in prepare_clay_batch) --
+	clay_memory:              [^]u8, // Raw memory block backing Clay's internal arena.
+
+	// -- Text (occasional — font registration and text cache lookups) --
+	text_cache:               Text_Cache, // Font registry, SDL_ttf engine, and cached TTF_Text objects.
+
+	// -- Resize tracking (cold — checked once per frame in resize_global) --
+	max_layers:               int, // High-water marks for dynamic array shrink heuristic.
+	max_scissors:             int,
+	max_shape_verts:          int,
+	max_text_verts:           int,
+	max_text_indices:         int,
+	max_text_batches:         int,
+	max_primitives:           int,
+	max_sub_batches:          int,
+
+	// -- Init-only (coldest — set once at init, never written again) --
+	odin_context:             runtime.Context, // Odin context captured at init for use in callbacks.
 }
 
 Init_Options :: struct {
@@ -168,22 +195,30 @@ init :: proc(
 	}
 
 	GLOB = Global {
-		layers            = make([dynamic]Layer, 0, INITIAL_LAYER_SIZE, allocator = allocator),
-		scissors          = make([dynamic]Scissor, 0, INITIAL_SCISSOR_SIZE, allocator = allocator),
-		tmp_shape_verts   = make([dynamic]Vertex, 0, BUFFER_INIT_SIZE, allocator = allocator),
-		tmp_text_verts    = make([dynamic]Vertex, 0, BUFFER_INIT_SIZE, allocator = allocator),
-		tmp_text_indices  = make([dynamic]c.int, 0, BUFFER_INIT_SIZE, allocator = allocator),
-		tmp_text_batches  = make([dynamic]TextBatch, 0, BUFFER_INIT_SIZE, allocator = allocator),
-		tmp_primitives    = make([dynamic]Primitive, 0, BUFFER_INIT_SIZE, allocator = allocator),
-		tmp_sub_batches   = make([dynamic]Sub_Batch, 0, BUFFER_INIT_SIZE, allocator = allocator),
-		tmp_uncached_text = make([dynamic]^sdl_ttf.Text, 0, 16, allocator = allocator),
-		odin_context      = odin_context,
-		dpi_scaling       = sdl.GetWindowDisplayScale(window),
-		clay_memory       = make([^]u8, min_memory_size, allocator = allocator),
-		sample_count      = resolved_sample_count,
-		pipeline_2d_base  = pipeline,
-		text_cache        = text_cache,
+		layers                   = make([dynamic]Layer, 0, INITIAL_LAYER_SIZE, allocator = allocator),
+		scissors                 = make([dynamic]Scissor, 0, INITIAL_SCISSOR_SIZE, allocator = allocator),
+		tmp_shape_verts          = make([dynamic]Vertex, 0, BUFFER_INIT_SIZE, allocator = allocator),
+		tmp_text_verts           = make([dynamic]Vertex, 0, BUFFER_INIT_SIZE, allocator = allocator),
+		tmp_text_indices         = make([dynamic]c.int, 0, BUFFER_INIT_SIZE, allocator = allocator),
+		tmp_text_batches         = make([dynamic]TextBatch, 0, BUFFER_INIT_SIZE, allocator = allocator),
+		tmp_primitives           = make([dynamic]Primitive, 0, BUFFER_INIT_SIZE, allocator = allocator),
+		tmp_sub_batches          = make([dynamic]Sub_Batch, 0, BUFFER_INIT_SIZE, allocator = allocator),
+		tmp_uncached_text        = make([dynamic]^sdl_ttf.Text, 0, 16, allocator = allocator),
+		device                   = device,
+		texture_slots            = make([dynamic]Texture_Slot, 0, 16, allocator = allocator),
+		texture_free_list        = make([dynamic]u32, 0, 16, allocator = allocator),
+		pending_texture_releases = make([dynamic]Texture_Id, 0, 16, allocator = allocator),
+		pending_text_releases    = make([dynamic]^sdl_ttf.Text, 0, 16, allocator = allocator),
+		odin_context             = odin_context,
+		dpi_scaling              = sdl.GetWindowDisplayScale(window),
+		clay_memory              = make([^]u8, min_memory_size, allocator = allocator),
+		sample_count             = resolved_sample_count,
+		pipeline_2d_base         = pipeline,
+		text_cache               = text_cache,
 	}
+
+	// Reserve slot 0 for INVALID_TEXTURE
+	append(&GLOB.texture_slots, Texture_Slot{})
 	log.debug("Window DPI scaling:", GLOB.dpi_scaling)
 	arena := clay.CreateArenaWithCapacityAndMemory(min_memory_size, GLOB.clay_memory)
 	window_width, window_height: c.int
@@ -230,12 +265,23 @@ destroy :: proc(device: ^sdl.GPUDevice, allocator := context.allocator) {
 	if GLOB.msaa_texture != nil {
 		sdl.ReleaseGPUTexture(device, GLOB.msaa_texture)
 	}
+	process_pending_texture_releases()
+	destroy_all_textures()
+	destroy_sampler_pool()
+	for ttf_text in GLOB.pending_text_releases do sdl_ttf.DestroyText(ttf_text)
+	delete(GLOB.pending_text_releases)
 	destroy_pipeline_2d_base(device, &GLOB.pipeline_2d_base)
 	destroy_text_cache()
 }
 
 // Internal
 clear_global :: proc() {
+	// Process deferred texture releases from the previous frame
+	process_pending_texture_releases()
+	// Process deferred text releases from the previous frame
+	for ttf_text in GLOB.pending_text_releases do sdl_ttf.DestroyText(ttf_text)
+	clear(&GLOB.pending_text_releases)
+
 	GLOB.curr_layer_index = 0
 	GLOB.clay_z_index = 0
 	GLOB.cleared = false
@@ -455,15 +501,24 @@ append_or_extend_sub_batch :: proc(
 	kind: Sub_Batch_Kind,
 	offset: u32,
 	count: u32,
+	texture_id: Texture_Id = INVALID_TEXTURE,
+	sampler: Sampler_Preset = .Linear_Clamp,
 ) {
 	if scissor.sub_batch_len > 0 {
 		last := &GLOB.tmp_sub_batches[scissor.sub_batch_start + scissor.sub_batch_len - 1]
-		if last.kind == kind && kind != .Text && last.offset + last.count == offset {
+		if last.kind == kind &&
+		   kind != .Text &&
+		   last.offset + last.count == offset &&
+		   last.texture_id == texture_id &&
+		   last.sampler == sampler {
 			last.count += count
 			return
 		}
 	}
-	append(&GLOB.tmp_sub_batches, Sub_Batch{kind = kind, offset = offset, count = count})
+	append(
+		&GLOB.tmp_sub_batches,
+		Sub_Batch{kind = kind, offset = offset, count = count, texture_id = texture_id, sampler = sampler},
+	)
 	scissor.sub_batch_len += 1
 	layer.sub_batch_len += 1
 }
@@ -554,6 +609,46 @@ prepare_clay_batch :: proc(
 			)
 			prepare_text(layer, Text{sdl_text, {bounds.x, bounds.y}, color_from_clay(render_data.textColor)})
 		case clay.RenderCommandType.Image:
+			render_data := render_command.renderData.image
+			if render_data.imageData == nil do continue
+			img_data := (^Clay_Image_Data)(render_data.imageData)^
+			cr := render_data.cornerRadius
+			radii := [4]f32{cr.topLeft, cr.topRight, cr.bottomRight, cr.bottomLeft}
+
+			// Background color behind the image (Clay allows it)
+			bg := color_from_clay(render_data.backgroundColor)
+			if bg[3] > 0 {
+				if radii == {0, 0, 0, 0} {
+					rectangle(layer, bounds, bg)
+				} else {
+					rectangle_corners(layer, bounds, radii, bg)
+				}
+			}
+
+			// Compute fit UVs
+			uv, sampler, inner := fit_params(img_data.fit, bounds, img_data.texture_id)
+
+			// Draw the image — route by cornerRadius
+			if radii == {0, 0, 0, 0} {
+				rectangle_texture(
+					layer,
+					inner,
+					img_data.texture_id,
+					tint = img_data.tint,
+					uv_rect = uv,
+					sampler = sampler,
+				)
+			} else {
+				rectangle_texture_corners(
+					layer,
+					inner,
+					radii,
+					img_data.texture_id,
+					tint = img_data.tint,
+					uv_rect = uv,
+					sampler = sampler,
+				)
+			}
 		case clay.RenderCommandType.ScissorStart:
 			if bounds.width == 0 || bounds.height == 0 do continue
 
diff --git a/draw/draw_qr/draw_qr.odin b/draw/draw_qr/draw_qr.odin
new file mode 100644
index 0000000..9fb3a0f
--- /dev/null
+++ b/draw/draw_qr/draw_qr.odin
@@ -0,0 +1,78 @@
+package draw_qr
+
+import draw ".."
+import "../../qrcode"
+
+// A registered QR code texture, ready for display via draw.rectangle_texture.
+QR :: struct {
+	texture_id: draw.Texture_Id,
+	size:       int, // modules per side (e.g. 21..177)
+}
+
+// Encode text as a QR code and register the result as an R8 texture.
+// Draw it with Nearest_Clamp sampling to keep module edges sharp (the sampler is chosen at draw time).
+// Returns ok=false if encoding or registration fails.
+@(require_results)
+create_from_text :: proc(
+	text: string,
+	ecl: qrcode.Ecc = .Low,
+	min_version: int = qrcode.VERSION_MIN,
+	max_version: int = qrcode.VERSION_MAX,
+	mask: Maybe(qrcode.Mask) = nil,
+	boost_ecl: bool = true,
+) -> (
+	qr: QR,
+	ok: bool,
+) {
+	qrcode_buf: [qrcode.BUFFER_LEN_MAX]u8
+	encode_ok := qrcode.encode(text, qrcode_buf[:], ecl, min_version, max_version, mask, boost_ecl)
+	if !encode_ok do return {}, false
+	return create(qrcode_buf[:])
+}
+
+// Register an already-encoded QR code buffer as an R8 texture.
+// qrcode_buf must be the output of qrcode.encode (byte 0 = side length, remaining = bit-packed modules).
+@(require_results)
+create :: proc(qrcode_buf: []u8) -> (qr: QR, ok: bool) {
+	size := qrcode.get_size(qrcode_buf)
+	if size == 0 do return {}, false
+
+	// Build R8 pixel buffer: 0 = light, 255 = dark
+	pixels := make([]u8, size * size, context.temp_allocator)
+	for y in 0 ..< size {
+		for x in 0 ..< size {
+			pixels[y * size + x] = 255 if qrcode.get_module(qrcode_buf, x, y) else 0
+		}
+	}
+
+	id, reg_ok := draw.register_texture(
+		draw.Texture_Desc {
+			width = u32(size),
+			height = u32(size),
+			depth_or_layers = 1,
+			type = .D2,
+			format = .R8_UNORM,
+			usage = {.SAMPLER},
+			mip_levels = 1,
+			kind = .Static,
+		},
+		pixels,
+	)
+	if !reg_ok do return {}, false
+
+	return QR{texture_id = id, size = size}, true
+}
+
+// Release the GPU texture.
+destroy :: proc(qr: ^QR) {
+	draw.unregister_texture(qr.texture_id)
+	qr.texture_id = draw.INVALID_TEXTURE
+	qr.size = 0
+}
+
+// Convenience: build a Clay_Image_Data for embedding a QR in Clay layouts.
+// Sets Fit mode to preserve the QR's square aspect ratio. The sampler is not set here;
+// choose Nearest_Clamp via Sampler_Preset at draw time.
+clay_image :: proc(qr: QR, tint: draw.Color = draw.WHITE) -> draw.Clay_Image_Data {
+	return draw.clay_image_data(qr.texture_id, fit = .Fit, tint = tint)
+}
diff --git a/draw/examples/hellope.odin b/draw/examples/hellope.odin
index 08026da..eb945bd 100644
--- a/draw/examples/hellope.odin
+++ b/draw/examples/hellope.odin
@@ -78,10 +78,11 @@ hellope_shapes :: proc() {
 		draw.ellipse(base_layer, {410, 340}, 50, 30, {255, 200, 50, 255}, rotation = spin_angle)
 
 		// Circle orbiting a point (moon orbiting planet)
+		// Convention B: center = pivot point (planet), origin = offset from moon center to pivot.
+		// Moon's visual center at rotation=0: planet_pos - origin = (100, 450) - (0, 40) = (100, 410).
 		planet_pos := [2]f32{100, 450}
-		moon_pos := planet_pos + {0, -40}
 		draw.circle(base_layer, planet_pos, 8, {200, 200, 200, 255}) // planet (stationary)
-		draw.circle(base_layer, moon_pos, 5, {100, 150, 255, 255}, origin = {0, 40}, rotation = spin_angle) // moon orbiting
+		draw.circle(base_layer, planet_pos, 5, {100, 150, 255, 255}, origin = {0, 40}, rotation = spin_angle) // moon orbiting
 
 		// Ring arc rotating in place
 		draw.ring(base_layer, {250, 450}, 15, 30, 0, 270, {100, 100, 220, 255}, rotation = spin_angle)
diff --git a/draw/examples/main.odin b/draw/examples/main.odin
index f8107eb..e3ee109 100644
--- a/draw/examples/main.odin
+++ b/draw/examples/main.odin
@@ -57,7 +57,7 @@ main :: proc() {
 	args := os.args
 	if len(args) < 2 {
 		fmt.eprintln("Usage: examples <example_name>")
-		fmt.eprintln("Available examples: hellope-shapes, hellope-text, hellope-clay, hellope-custom")
+		fmt.eprintln("Available examples: hellope-shapes, hellope-text, hellope-clay, hellope-custom, textures")
 		os.exit(1)
 	}
 
@@ -66,9 +66,10 @@ main :: proc() {
 	case "hellope-custom": hellope_custom()
 	case "hellope-shapes": hellope_shapes()
 	case "hellope-text": hellope_text()
+	case "textures": textures()
 	case:
 		fmt.eprintf("Unknown example: %v\n", args[1])
-		fmt.eprintln("Available examples: hellope-shapes, hellope-text, hellope-clay, hellope-custom")
+		fmt.eprintln("Available examples: hellope-shapes, hellope-text, hellope-clay, hellope-custom, textures")
 		os.exit(1)
 	}
 }
diff --git a/draw/examples/textures.odin b/draw/examples/textures.odin
new file mode 100644
index 0000000..ca53ba3
--- /dev/null
+++ b/draw/examples/textures.odin
@@ -0,0 +1,285 @@
+package examples
+
+import "../../draw"
+import "../../draw/draw_qr"
+import "core:math"
+import "core:os"
+import sdl "vendor:sdl3"
+
+textures :: proc() {
+	if !sdl.Init({.VIDEO}) do os.exit(1)
+	window := sdl.CreateWindow("Textures", 800, 600, {.HIGH_PIXEL_DENSITY})
+	gpu := sdl.CreateGPUDevice(draw.PLATFORM_SHADER_FORMAT, true, nil)
+	if !sdl.ClaimWindowForGPUDevice(gpu, window) do os.exit(1)
+	if !draw.init(gpu, window) do os.exit(1)
+	JETBRAINS_MONO_REGULAR = draw.register_font(JETBRAINS_MONO_REGULAR_RAW)
+
+	FONT_SIZE :: u16(14)
+	LABEL_OFFSET :: f32(8) // gap between item and its label
+
+	// -------------------------------------------------------------------------
+	// Procedural checkerboard texture (8x8, RGBA8)
+	// -------------------------------------------------------------------------
+	checker_size :: 8
+	checker_pixels: [checker_size * checker_size * 4]u8
+	for y in 0 ..< checker_size {
+		for x in 0 ..< checker_size {
+			i := (y * checker_size + x) * 4
+			is_dark := ((x + y) % 2) == 0
+			val: u8 = 40 if is_dark else 220
+			checker_pixels[i + 0] = val // R
+			checker_pixels[i + 1] = val / 2 // G: halved, giving a magenta tint
+			checker_pixels[i + 2] = val // B
+			checker_pixels[i + 3] = 255 // A
+		}
+	}
+	checker_texture, _ := draw.register_texture(
+		draw.Texture_Desc {
+			width = checker_size,
+			height = checker_size,
+			depth_or_layers = 1,
+			type = .D2,
+			format = .R8G8B8A8_UNORM,
+			usage = {.SAMPLER},
+			mip_levels = 1,
+		},
+		checker_pixels[:],
+	)
+	defer draw.unregister_texture(checker_texture)
+
+	// -------------------------------------------------------------------------
+	// Non-square gradient stripe texture (16x8, RGBA8) for fit mode demos
+	// -------------------------------------------------------------------------
+	stripe_w :: 16
+	stripe_h :: 8
+	stripe_pixels: [stripe_w * stripe_h * 4]u8
+	for y in 0 ..< stripe_h {
+		for x in 0 ..< stripe_w {
+			i := (y * stripe_w + x) * 4
+			stripe_pixels[i + 0] = u8(x * 255 / (stripe_w - 1)) // R gradient left→right
+			stripe_pixels[i + 1] = u8(y * 255 / (stripe_h - 1)) // G gradient top→bottom
+			stripe_pixels[i + 2] = 128 // B constant
+			stripe_pixels[i + 3] = 255 // A
+		}
+	}
+	stripe_texture, _ := draw.register_texture(
+		draw.Texture_Desc {
+			width = stripe_w,
+			height = stripe_h,
+			depth_or_layers = 1,
+			type = .D2,
+			format = .R8G8B8A8_UNORM,
+			usage = {.SAMPLER},
+			mip_levels = 1,
+		},
+		stripe_pixels[:],
+	)
+	defer draw.unregister_texture(stripe_texture)
+
+	// -------------------------------------------------------------------------
+	// QR code texture (R8_UNORM — see rendering note below)
+	// -------------------------------------------------------------------------
+	qr, _ := draw_qr.create_from_text("https://odin-lang.org/")
+	defer draw_qr.destroy(&qr)
+
+	spin_angle: f32 = 0
+
+	for {
+		defer free_all(context.temp_allocator)
+		ev: sdl.Event
+		for sdl.PollEvent(&ev) {
+			if ev.type == .QUIT do return
+		}
+		spin_angle += 1
+
+		base_layer := draw.begin({width = 800, height = 600})
+
+		// Background
+		draw.rectangle(base_layer, {0, 0, 800, 600}, {30, 30, 30, 255})
+
+		// =====================================================================
+		// Row 1: Sampler presets (y=30)
+		// =====================================================================
+		ROW1_Y :: f32(30)
+		ITEM_SIZE :: f32(120)
+		COL1 :: f32(30)
+		COL2 :: f32(180)
+		COL3 :: f32(330)
+		COL4 :: f32(480)
+
+		// Nearest (sharp pixel edges)
+		draw.rectangle_texture(
+			base_layer,
+			{COL1, ROW1_Y, ITEM_SIZE, ITEM_SIZE},
+			checker_texture,
+			sampler = .Nearest_Clamp,
+		)
+		draw.text(
+			base_layer,
+			"Nearest",
+			{COL1, ROW1_Y + ITEM_SIZE + LABEL_OFFSET},
+			JETBRAINS_MONO_REGULAR,
+			FONT_SIZE,
+			color = draw.WHITE,
+		)
+
+		// Linear (bilinear blur)
+		draw.rectangle_texture(
+			base_layer,
+			{COL2, ROW1_Y, ITEM_SIZE, ITEM_SIZE},
+			checker_texture,
+			sampler = .Linear_Clamp,
+		)
+		draw.text(
+			base_layer,
+			"Linear",
+			{COL2, ROW1_Y + ITEM_SIZE + LABEL_OFFSET},
+			JETBRAINS_MONO_REGULAR,
+			FONT_SIZE,
+			color = draw.WHITE,
+		)
+
+		// Tiled (4x repeat)
+		draw.rectangle_texture(
+			base_layer,
+			{COL3, ROW1_Y, ITEM_SIZE, ITEM_SIZE},
+			checker_texture,
+			sampler = .Nearest_Repeat,
+			uv_rect = {0, 0, 4, 4},
+		)
+		draw.text(
+			base_layer,
+			"Tiled 4x",
+			{COL3, ROW1_Y + ITEM_SIZE + LABEL_OFFSET},
+			JETBRAINS_MONO_REGULAR,
+			FONT_SIZE,
+			color = draw.WHITE,
+		)
+
+		// =====================================================================
+		// Row 2: QR code, Rounded, Rotating (y=190)
+		// =====================================================================
+		ROW2_Y :: f32(190)
+
+		// QR code (R8_UNORM texture, nearest sampling)
+		// NOTE: R8_UNORM samples as (r, 0, 0, 1) in Metal's default swizzle.
+		// With WHITE tint: dark modules (R=1) → red, light modules (R=0) → black.
+		// The result is a red-on-black QR code. The white bg rect below is
+		// occluded by the fully-opaque texture but kept for illustration.
+		draw.rectangle(base_layer, {COL1, ROW2_Y, ITEM_SIZE, ITEM_SIZE}, {255, 255, 255, 255}) // white bg
+		draw.rectangle_texture(
+			base_layer,
+			{COL1, ROW2_Y, ITEM_SIZE, ITEM_SIZE},
+			qr.texture_id,
+			sampler = .Nearest_Clamp,
+		)
+		draw.text(
+			base_layer,
+			"QR Code",
+			{COL1, ROW2_Y + ITEM_SIZE + LABEL_OFFSET},
+			JETBRAINS_MONO_REGULAR,
+			FONT_SIZE,
+			color = draw.WHITE,
+		)
+
+		// Rounded corners
+		draw.rectangle_texture(
+			base_layer,
+			{COL2, ROW2_Y, ITEM_SIZE, ITEM_SIZE},
+			checker_texture,
+			sampler = .Nearest_Clamp,
+			roundness = 0.3,
+		)
+		draw.text(
+			base_layer,
+			"Rounded",
+			{COL2, ROW2_Y + ITEM_SIZE + LABEL_OFFSET},
+			JETBRAINS_MONO_REGULAR,
+			FONT_SIZE,
+			color = draw.WHITE,
+		)
+
+		// Rotating
+		rot_rect := draw.Rectangle{COL3, ROW2_Y, ITEM_SIZE, ITEM_SIZE}
+		draw.rectangle_texture(
+			base_layer,
+			rot_rect,
+			checker_texture,
+			sampler = .Nearest_Clamp,
+			origin = draw.center_of(rot_rect),
+			rotation = spin_angle,
+		)
+		draw.text(
+			base_layer,
+			"Rotating",
+			{COL3, ROW2_Y + ITEM_SIZE + LABEL_OFFSET},
+			JETBRAINS_MONO_REGULAR,
+			FONT_SIZE,
+			color = draw.WHITE,
+		)
+
+		// =====================================================================
+		// Row 3: Fit modes + Per-corner radii (y=360)
+		// =====================================================================
+		ROW3_Y :: f32(360)
+		FIT_SIZE :: f32(120) // square target rect
+
+		// Stretch
+		uv_s, sampler_s, inner_s := draw.fit_params(.Stretch, {COL1, ROW3_Y, FIT_SIZE, FIT_SIZE}, stripe_texture)
+		draw.rectangle(base_layer, {COL1, ROW3_Y, FIT_SIZE, FIT_SIZE}, {60, 60, 60, 255}) // bg
+		draw.rectangle_texture(base_layer, inner_s, stripe_texture, uv_rect = uv_s, sampler = sampler_s)
+		draw.text(
+			base_layer,
+			"Stretch",
+			{COL1, ROW3_Y + FIT_SIZE + LABEL_OFFSET},
+			JETBRAINS_MONO_REGULAR,
+			FONT_SIZE,
+			color = draw.WHITE,
+		)
+
+		// Fill (center-crop)
+		uv_f, sampler_f, inner_f := draw.fit_params(.Fill, {COL2, ROW3_Y, FIT_SIZE, FIT_SIZE}, stripe_texture)
+		draw.rectangle(base_layer, {COL2, ROW3_Y, FIT_SIZE, FIT_SIZE}, {60, 60, 60, 255})
+		draw.rectangle_texture(base_layer, inner_f, stripe_texture, uv_rect = uv_f, sampler = sampler_f)
+		draw.text(
+			base_layer,
+			"Fill",
+			{COL2, ROW3_Y + FIT_SIZE + LABEL_OFFSET},
+			JETBRAINS_MONO_REGULAR,
+			FONT_SIZE,
+			color = draw.WHITE,
+		)
+
+		// Fit (letterbox)
+		uv_ft, sampler_ft, inner_ft := draw.fit_params(.Fit, {COL3, ROW3_Y, FIT_SIZE, FIT_SIZE}, stripe_texture)
+		draw.rectangle(base_layer, {COL3, ROW3_Y, FIT_SIZE, FIT_SIZE}, {60, 60, 60, 255}) // visible margin bg
+		draw.rectangle_texture(base_layer, inner_ft, stripe_texture, uv_rect = uv_ft, sampler = sampler_ft)
+		draw.text(
+			base_layer,
+			"Fit",
+			{COL3, ROW3_Y + FIT_SIZE + LABEL_OFFSET},
+			JETBRAINS_MONO_REGULAR,
+			FONT_SIZE,
+			color = draw.WHITE,
+		)
+
+		// Per-corner radii
+		draw.rectangle_texture_corners(
+			base_layer,
+			{COL4, ROW3_Y, FIT_SIZE, FIT_SIZE},
+			{20, 0, 20, 0},
+			checker_texture,
+			sampler = .Nearest_Clamp,
+		)
+		draw.text(
+			base_layer,
+			"Per-corner",
+			{COL4, ROW3_Y + FIT_SIZE + LABEL_OFFSET},
+			JETBRAINS_MONO_REGULAR,
+			FONT_SIZE,
+			color = draw.WHITE,
+		)
+
+		draw.end(gpu, window)
+	}
+}
diff --git a/draw/pipeline_2d_base.odin b/draw/pipeline_2d_base.odin
index 7b27ca2..a69facb 100644
--- a/draw/pipeline_2d_base.odin
+++ b/draw/pipeline_2d_base.odin
@@ -35,6 +35,7 @@ Shape_Kind :: enum u8 {
 
 Shape_Flag :: enum u8 {
 	Stroke,
+	Textured,
 }
 
 Shape_Flags :: bit_set[Shape_Flag;u8]
@@ -106,9 +107,10 @@ Primitive :: struct {
 	rotation:   f32, // 24: shader self-rotation in radians (used by RRect, Ellipse)
 	_pad:       f32, // 28: alignment to vec4 boundary
 	params:     Shape_Params, // 32: two vec4s of shape params
+	uv_rect:    [4]f32, // 64: u_min, v_min, u_max, v_max (default {0,0,1,1})
 }
 
-#assert(size_of(Primitive) == 64)
+#assert(size_of(Primitive) == 80)
 
 pack_kind_flags :: #force_inline proc(kind: Shape_Kind, flags: Shape_Flags) -> u32 {
 	return u32(kind) | (u32(transmute(u8)flags) << 8)
@@ -566,6 +568,7 @@ draw_layer :: proc(
 	current_mode: Draw_Mode = .Tessellated
 	current_vert_buf := main_vert_buf
 	current_atlas: ^sdl.GPUTexture
+	current_sampler := sampler
 
 	// Text vertices live after shape vertices in the GPU vertex buffer
 	text_vertex_gpu_base := u32(len(GLOB.tmp_shape_verts))
@@ -584,14 +587,24 @@ draw_layer :: proc(
 					sdl.BindGPUVertexBuffers(render_pass, 0, &sdl.GPUBufferBinding{buffer = main_vert_buf, offset = 0}, 1)
 					current_vert_buf = main_vert_buf
 				}
-				if current_atlas != white_texture {
+				// Pick this batch's texture and sampler; untextured batches (or a missing GPU handle) fall back to white_texture
+				batch_texture: ^sdl.GPUTexture = white_texture
+				batch_sampler: ^sdl.GPUSampler = sampler
+				if batch.texture_id != INVALID_TEXTURE {
+					if bound_texture := texture_gpu_handle(batch.texture_id); bound_texture != nil {
+						batch_texture = bound_texture
+					}
+					batch_sampler = get_sampler(batch.sampler)
+				}
+				if current_atlas != batch_texture || current_sampler != batch_sampler {
 					sdl.BindGPUFragmentSamplers(
 						render_pass,
 						0,
-						&sdl.GPUTextureSamplerBinding{texture = white_texture, sampler = sampler},
+						&sdl.GPUTextureSamplerBinding{texture = batch_texture, sampler = batch_sampler},
 						1,
 					)
-					current_atlas = white_texture
+					current_atlas = batch_texture
+					current_sampler = batch_sampler
 				}
 				sdl.DrawGPUPrimitives(render_pass, batch.count, 1, batch.offset, 0)
 
@@ -632,14 +645,24 @@ draw_layer :: proc(
 					sdl.BindGPUVertexBuffers(render_pass, 0, &sdl.GPUBufferBinding{buffer = unit_quad, offset = 0}, 1)
 					current_vert_buf = unit_quad
 				}
-				if current_atlas != white_texture {
+				// Pick this batch's texture and sampler; untextured batches (or a missing GPU handle) fall back to white_texture
+				batch_texture: ^sdl.GPUTexture = white_texture
+				batch_sampler: ^sdl.GPUSampler = sampler
+				if batch.texture_id != INVALID_TEXTURE {
+					if bound_texture := texture_gpu_handle(batch.texture_id); bound_texture != nil {
+						batch_texture = bound_texture
+					}
+					batch_sampler = get_sampler(batch.sampler)
+				}
+				if current_atlas != batch_texture || current_sampler != batch_sampler {
 					sdl.BindGPUFragmentSamplers(
 						render_pass,
 						0,
-						&sdl.GPUTextureSamplerBinding{texture = white_texture, sampler = sampler},
+						&sdl.GPUTextureSamplerBinding{texture = batch_texture, sampler = batch_sampler},
 						1,
 					)
-					current_atlas = white_texture
+					current_atlas = batch_texture
+					current_sampler = batch_sampler
 				}
 				sdl.DrawGPUPrimitives(render_pass, 6, batch.count, 0, batch.offset)
 			}
diff --git a/draw/shaders/generated/base_2d.frag.metal b/draw/shaders/generated/base_2d.frag.metal
index e03eb46..7a4b934 100644
--- a/draw/shaders/generated/base_2d.frag.metal
+++ b/draw/shaders/generated/base_2d.frag.metal
@@ -25,6 +25,7 @@ struct main0_in
     float4 f_params2 [[user(locn3)]];
     uint f_kind_flags [[user(locn4)]];
     float f_rotation [[user(locn5), flat]];
+    float4 f_uv_rect [[user(locn6), flat]];
 };
 
 static inline __attribute__((always_inline))
@@ -69,6 +70,12 @@ float sdf_stroke(thread const float& d, thread const float& stroke_width)
     return abs(d) - (stroke_width * 0.5);
 }
 
+static inline __attribute__((always_inline))
+float sdf_alpha(thread const float& d, thread const float& soft)
+{
+    return 1.0 - smoothstep(-soft, soft, d);
+}
+
 static inline __attribute__((always_inline))
 float sdCircle(thread const float2& p, thread const float& r)
 {
@@ -127,12 +134,6 @@ float sdSegment(thread const float2& p, thread const float2& a, thread const flo
     return length(pa - (ba * h));
 }
 
-static inline __attribute__((always_inline))
-float sdf_alpha(thread const float& d, thread const float& soft)
-{
-    return 1.0 - smoothstep(-soft, soft, d);
-}
-
 fragment main0_out main0(main0_in in [[stage_in]], texture2d<float> tex [[texture(0)]], sampler texSmplr [[sampler(0)]])
 {
     main0_out out = {};
@@ -169,6 +170,25 @@ fragment main0_out main0(main0_in in [[stage_in]], texture2d<float> tex [[textur
             float param_6 = stroke_px;
             d = sdf_stroke(param_5, param_6);
         }
+        float4 shape_color = in.f_color;
+        if ((flags & 2u) != 0u)
+        {
+            float2 p_for_uv = in.f_local_or_uv;
+            if (in.f_rotation != 0.0)
+            {
+                float2 param_7 = p_for_uv;
+                float param_8 = in.f_rotation;
+                p_for_uv = apply_rotation(param_7, param_8);
+            }
+            float2 local_uv = ((p_for_uv / b) * 0.5) + float2(0.5);
+            float2 uv = mix(in.f_uv_rect.xy, in.f_uv_rect.zw, local_uv);
+            shape_color *= tex.sample(texSmplr, uv);
+        }
+        float param_9 = d;
+        float param_10 = soft;
+        float alpha = sdf_alpha(param_9, param_10);
+        out.out_color = float4(shape_color.xyz, shape_color.w * alpha);
+        return out;
     }
     else
     {
@@ -177,14 +197,14 @@ fragment main0_out main0(main0_in in [[stage_in]], texture2d<float> tex [[textur
             float radius = in.f_params.x;
             soft = fast::max(in.f_params.y, 1.0);
             float stroke_px_1 = in.f_params.z;
-            float2 param_7 = in.f_local_or_uv;
-            float param_8 = radius;
-            d = sdCircle(param_7, param_8);
+            float2 param_11 = in.f_local_or_uv;
+            float param_12 = radius;
+            d = sdCircle(param_11, param_12);
             if ((flags & 1u) != 0u)
             {
-                float param_9 = d;
-                float param_10 = stroke_px_1;
-                d = sdf_stroke(param_9, param_10);
+                float param_13 = d;
+                float param_14 = stroke_px_1;
+                d = sdf_stroke(param_13, param_14);
             }
         }
         else
@@ -197,19 +217,19 @@ fragment main0_out main0(main0_in in [[stage_in]], texture2d<float> tex [[textur
                 float2 p_local_1 = in.f_local_or_uv;
                 if (in.f_rotation != 0.0)
                 {
-                    float2 param_11 = p_local_1;
-                    float param_12 = in.f_rotation;
-                    p_local_1 = apply_rotation(param_11, param_12);
+                    float2 param_15 = p_local_1;
+                    float param_16 = in.f_rotation;
+                    p_local_1 = apply_rotation(param_15, param_16);
                 }
-                float2 param_13 = p_local_1;
-                float2 param_14 = ab;
-                float _560 = sdEllipse(param_13, param_14);
-                d = _560;
+                float2 param_17 = p_local_1;
+                float2 param_18 = ab;
+                float _616 = sdEllipse(param_17, param_18);
+                d = _616;
                 if ((flags & 1u) != 0u)
                 {
-                    float param_15 = d;
-                    float param_16 = stroke_px_2;
-                    d = sdf_stroke(param_15, param_16);
+                    float param_19 = d;
+                    float param_20 = stroke_px_2;
+                    d = sdf_stroke(param_19, param_20);
                 }
             }
             else
@@ -220,10 +240,10 @@ fragment main0_out main0(main0_in in [[stage_in]], texture2d<float> tex [[textur
                     float2 b_1 = in.f_params.zw;
                     float width = in.f_params2.x;
                     soft = fast::max(in.f_params2.y, 1.0);
-                    float2 param_17 = in.f_local_or_uv;
-                    float2 param_18 = a;
-                    float2 param_19 = b_1;
-                    d = sdSegment(param_17, param_18, param_19) - (width * 0.5);
+                    float2 param_21 = in.f_local_or_uv;
+                    float2 param_22 = a;
+                    float2 param_23 = b_1;
+                    d = sdSegment(param_21, param_22, param_23) - (width * 0.5);
                 }
                 else
                 {
@@ -243,16 +263,16 @@ fragment main0_out main0(main0_in in [[stage_in]], texture2d<float> tex [[textur
                         }
                         float ang_start = mod(start_rad, 6.283185482025146484375);
                         float ang_end = mod(end_rad, 6.283185482025146484375);
-                        float _654;
+                        float _710;
                         if (ang_end > ang_start)
                         {
-                            _654 = float((angle >= ang_start) && (angle <= ang_end));
+                            _710 = float((angle >= ang_start) && (angle <= ang_end));
                         }
                         else
                         {
-                            _654 = float((angle >= ang_start) || (angle <= ang_end));
+                            _710 = float((angle >= ang_start) || (angle <= ang_end));
                         }
-                        float in_arc = _654;
+                        float in_arc = _710;
                         if (abs(ang_end - ang_start) >= 6.282185077667236328125)
                         {
                             in_arc = 1.0;
@@ -277,9 +297,9 @@ fragment main0_out main0(main0_in in [[stage_in]], texture2d<float> tex [[textur
                             d = (length(p) * cos(bn)) - radius_1;
                             if ((flags & 1u) != 0u)
                             {
-                                float param_20 = d;
-                                float param_21 = stroke_px_3;
-                                d = sdf_stroke(param_20, param_21);
+                                float param_24 = d;
+                                float param_25 = stroke_px_3;
+                                d = sdf_stroke(param_24, param_25);
                             }
                         }
                     }
@@ -287,10 +307,9 @@ fragment main0_out main0(main0_in in [[stage_in]], texture2d<float> tex [[textur
             }
         }
     }
-    float param_22 = d;
-    float param_23 = soft;
-    float alpha = sdf_alpha(param_22, param_23);
-    out.out_color = float4(in.f_color.xyz, in.f_color.w * alpha);
+    float param_26 = d;
+    float param_27 = soft;
+    float alpha_1 = sdf_alpha(param_26, param_27);
+    out.out_color = float4(in.f_color.xyz, in.f_color.w * alpha_1);
     return out;
 }
-
diff --git a/draw/shaders/generated/base_2d.frag.spv b/draw/shaders/generated/base_2d.frag.spv
index 39179297819cb4e1f6a9a29737cab241a4c3ce03..c1411b31d8b912d85965a6d1cdfbf070e642fbd9 100644
GIT binary patch
literal 19164
zcmZvi2bf(&`Nl8VT@p$NJ@h0%AP@);YA7KDAwf}^QY<W+>@GyoNP(azSrh>k6a+<4
z6h)9GC@Kh%5)=d#!2&8G#e$$nQz=sZzu&p%O(y60pXbgq@B4n=eDlqmIdf+3-Ht(9
z3~senYOU59(wf-Os;;$KgHT#%D_8ZDgO50P-24T-<969~M;+E^wH@^t)*9OCq_)iJ
znK7HH^%V@)P_CnVpR$zlQ_3ThH3v~>uRk4wI_RzK(}h~b>B3_uw)wsL&zLuTX5W~T
z`^WFqztdhwtuETFMm@L2t5eUb@G7nKsOR?{K4;<V-oD<+a~4(YY<qJ5E;hF9+SF;g
z4tQFvuS?yIGo-aKcz*8@eaFx0o4ue2GM*sD*Z_NCYzXe5sBc8wjyJe95<I{6fSEIA
z%$?sJfQhvIHp7<oBfvd0a?GQvHvQc_Gv}VrGll^dW8MN=FU7gqQolL<3#h7g+sJ$S
zyXP;MH|NB@<qgegYi#XyE4RkzSNU|GGNX6F2~gS(X^o@qnLBspsonGDEa+J<W6td5
zGj5FSu($K=Y;CXa?Bi$lRc$+3M^m3vW9zw0hHVytr`7mWu$`%2Og*#4ms8KG@wMXd
zZJ!&!E-3wP7T5X`@bp^0OT2U2=U#Afco5tkZ%6B4@bMJmJVBito&wLW^=GJ^NPV88
zp0}utUlWtKde=(yHK&b-gA->2IB~{;9kc$sffHx4xQ;VL%z*VdR$Tj^37!s7KL<R&
z!kw*4z$ea_?QpQITs>d?-74MvGkcD=YxK_6W%xztLh%`#vv5K8^f@!<wEO6+p2_}N
zv7hhav!HKL71w9AzkBAK={+;M=gjL~cyblDqjd{<doDX$OTf$5Tu19pczdmPRnK{U
z_uQU&J+tbZI$A%$-tN1r+5=)AU&Zcf{nmKxwK$h4ug39vw7Km(;9b?e(Z6g>wtXJg
z$DZmoPd4~p8~mvTf4ae6Xz-U-;9af%!snmRGq-Qq+;z3y(5JgUQ&r{OF=)lSx>n#k
zW$^Y4SJO+Ij@B?VnPYW->}Tq`=k-l*ugA{TCTP`OP>*#}csceF4ZbscUQh3gh1>;W
zPJlP(c<)A=i4A^WgHK(7ceD<LxA&{A){)?5{A2WAb`LvS-7B>3Z1psFZ-e(Y`0NIs
zy8`cMoeJ-%=BJ}|8hBbgSEqxs$@z?&2`}gFtOh@)!7qZ(m_55M&(Fnho~fF5wJw3r
zU(hpeK{qpA_jxH=9}lOB_xEym+FuFAGGjIylKET@U%roawU)FwdpI%cc(>OYF{?I9
z;WK7;_snAxG{&>=@_s)ruXk`lR6Z}E&7aZR*WRbPTCd7ic<wq{ucPIC^d>la>`EQ@
zR&hF7Yk+&$lpWXF;As`^Y;6NB=XGp@kF%ZcvpSz08upzEK84txA=bcm3gZ*|uI#M2
z`<mQKyXTCOyQht6n=89~b2vk)xci*`R2BDpB9Gm(iky_NcU1dNa`6nKt*f^Ayv8Qh
zwlZZMq8V=#?TsT|AEa$0WjJ{crj~0P2_HhOW|6N_V|_-^<EqqR{j?clSgkoXwpo+f
z_UfbXUW+<yY^Pm)cth7_ADcFGwashIJ}qivsqH7ZtViFw!ADZO#@2lIHm$gM>{;`5
z8MSAz_)P_`MA?qm#yx?euSFl*4p3UH#ruZeS@B7e!tbwmYZ-qQ{_u^DF7qEl2iIJE
z^~Ct^ZHKMJ%lsEtytVHJ^mhT+JaTWoRP**2J6m6d>n`65f4S>a&G_nHg}hbpZ@`@|
z@6**_*M;}V_Hx&?n)YkKe3seeu5+~wxeM0=oD+F@g>Oux8zA}_?>n_Y&Hm&*7hc-G
z2bMZ!?Q)+JwanM8ko#Ud=p^mt<8z}u+~-D~@jn2`{5@FP%sKXl&>XMrwfk)GSr*Ud
zN$xt|9Bf<t(|#eE>puK^xX-0r-ygys8Fufa@W<d+4Q_Mq&(rYaV?B)Ej|@9#(r!e-
zRK=#M`g{e&*yie5@JyDul3!MEpApYYvCBO-g}VmsEV$34=cL&6_goaNzh|P7KVESC
zJ^z$;&p##i{1a|}o`1q^|3ZU%{)t_G&p+Yj<JqU=o_oTL=eZ|bf6qN7_skQn|0WIY
z8K<<5XmHOsrF}xd?Qidb8{hLy>F@cb<eqQBZGULNZSNVUv>(&p-38a*^G)gR`KIKa
zZ^HHWd{gqd1vj5l8~n6_+y3-|>+d-y@#LO!O71x)-1eSxO71zQ<eqa%?ir`#o^eX<
zIj7{Fb4tFX!EbMH&p4(3-3{)Ur?fxc;GTC%`>O?aK3*@l<M+H%`g`Unx#ymed-f^0
z=bw^${wcZVpOSn2DY@q#`9AC~?oF5SeEB}Phh0bhHL#lRl!@#}?muer|2kOzrM3U%
zU^U-4iSbRadSbX|sU^l0VB=m*4ByRHg4KM7zf7N3fxTWMM{VDtsJS<ajpO_5+hE^S
za^E*MQhe9Q9pewczFUr>AN_9v>!W^Vt=$YZjyA{ZyHV{8-U#+*zv}L@x7OG_c7S*p
z&yV2ccy5EMO~uFY+yVA-JlbxjjAI=bU!3vW1@>Kc2-%zekHKmdFDr9#PuG@qcZ17z
z_rTRGbVG~fzW1W}{!AbDfz>QtR);Y+zNdZ$F_u@fpVwOH_X~(|O27MSP5Dqp<NW^;
zVoYP$MjyxHJ4xGq^y~ZiR}^#fz3v#z<=3^oI<e%xf%B=pW70MNHkS8BTVg#7HkN0A
z#QGgrU4P$ya&6A}AHc2|ZN3ZTY4=C)Bb1xS({=kOSk1Ge_xUlfm-kuQpD1eHXL0&?
z0&Hyic$`|EKK=~04{c9U%O@~mbNdT;9JOt<B}esF=_7qvr?0<()0gjRd2D|NyOxae
z6t!HN_v#t2_eI;&)ba`S-2VgYcb~?yop$|vMzk5jz2u+Z%P83wo&~G<TwGJn;lJSO
zdG?<NdpU>No};Kahhk%U|6c^#pZ$8KkjM6Ku=iElOVo1b(02a;Z$oVxZOL2RIZR*H
znTwaf*VVqBPvjZnE8udBufo+Hpk$1%fz>m{*TG(nQQQA0YK~ECZ14A*;H<?r!19do
zEpWLOY1&fDS{&qq&t)IlR-&lchd6z7f{pDwc(#(KkCnk$i=MOOS&M_gu0`8uv%PWD
zoyYWLoxWB9>l3~z-1T=hYr*lZ2G>XZyjoiwtiBk(Lu+kl&D92KZ4ESiwV8|OKsEbv
zJ+B4!oS%JfZLpffi%`pa*Fh_N*M+ND_z->h7}i6uuhsdB%Q37EHkNZ^AGX!UoIHza
zH>VqL8V)YsR~x|f$@^+Uus-S;$3|e+VBTLF!`1Y)4{d7RgRCiQ+q?cZ!<Og9v#&h1
zk>Dlx8pm_6JkQN2@MzlGMw>pHP^%}mEx`7F2y?MHcuR_!#S2q4f9}nmg+`<6{}3fP
zY=x#CzBSl>9IyG00qdilchWXs`|^smZLQg7w6V3OYu-uQL2R$TcH=v?YiOt~=K$Z=
z=H$M%1Nai`Ilt@(SM!<oJun{Z<+G=4C(1i0K6~Q4n|1*&rH%VS#=9$;x;gAjEsy_h
zVEx@c<G(wadftnBfN`nbi`oXj#`3;uORPP?#xm!`+6zrx{|VG`ZLaHmz}buTR<CW|
zlYPPFYP*Tl@~oRl;Gy`~Mw>paKXqd}pZkH!J#;c$t$a`H4_D6~dH~qVIo9?zikfpQ
z&X}fvv)&H`%VRqT>^RM7Ds{!{_r%-bz9(#>EqSZ^Y^5)2x%b?@z0Vof!QgV8yc4eG
zdUu^10`_tY+TKN(Pce?zIPP7CfxTDRyAFq|S-db+`5g+^mV0~zxOtD?ji#RU@*c45
z$56Bl)UjMI+7jz1u(8T#^=LG8=jKRixjDOD-V4rpIR-4xyuS}zzPsKJSIfKWSg@D#
zqU{3|HRnZ~K8^z$+y1+$<>_M@IPb0=usrXs>0tL~+i0`Ban!w+>B~BO^?}n@FIXPi
z@nGlPIQ`V}tp5|hBk9XF+VpWfs@vZ6KLcE@{|~~|%JqLDTs`Z57TC)<(l(Q#<{XJL
z4|Bj-|FgmJ*iHg_-^^()wLE#x1H1lhqb+%>J4fluTJBo5Z`W_ewE$eM|C8Zr3n{Mu
zMbuu7LE9;m&rpmbHjeB6Ltxil*8hj$Y8Ed{Req<!wWZxhz-7CS!qqH%$e`WsG>EpW
z^~GTSW+?aN0W@{z;dE+w{LcXEU+&8vLsQRl_HnRz`kZMSsQdOg(`L>-H=h9Kxj7Ro
zkL_%*V>E}esO7PJ66`p%okK0pdHqx1jfrhM+iAC*d-tcojwkQebHQqh$?IeFJNG=e
zF|=hJsCnPLhu$ymn=xI_p9Poq{Q|h!`4sQ_h16c&cWs}eJVh~%IP>~>u=lUL4_`o2
zH;;>`<?;U_SpV`qd<jim|BI>R@&7Vd|MDI96*P7IFQJyl|EpmA%RTUGXzKc3N-dB7
zW#HyI`Z}6=*2Oo#gsj$uwgIrQTo>AMf4>QKU94WepRTCeO~zlFZ7!!)&%ON?*f|Qn
z3an4=>(yX=)Z=pvSfB7~!OmOSTnE-iJ!AMbIAbuTKIU^JwYJR14PeJm-k<NFsmJHL
zwNH7^Z$wj1f8PV!pZ^PEd+qjjJ+-#v{(Z38&8%bpUjG4DE$^9|!CtOIZ8uTWT!-Ss
zxdm*Tf%<v46|D9{O8Qmvc}RQfwEq#<wRCRX{x+~$+AjgC#s3blYbpMBg4N=GJ6J9L
zcY)Qye+)L}E%ms60@g>}=kKS~8Mn53C=XH`w>WX`1v~EW`@mJMb=;r9^-+({&%tGz
zU%<_6N!{jtus-Tn*4i(@#?p3Gy}lm+s}GQ<_se;{8?Np=I|lPL@AUmEaM|~-;bp$R
zf$O85HopazZ61P``92KSNBxRA-`|0ar7ijX9<1KXSG~;Fywmp|z>dkWEv1&n_6WGV
zr+<VymOO`#g7s0)p71BI&xOBLX}_(`_i?bc$0*+aC#k($_u8JIs5zhF<l|V(JvQs)
z_h;}@O4it4z-o(0AU=PEy9Z>B{sz`h{Z{<!$FqsL{ai?G4o^`V!*|Kw!R5Q;8MxZh
z6vy&UYA?s4?H?31<A}4b{0qF4zRG>&IW%?ic$Qio{};gemwVNVXzKbuPc4uCzrp&K
z?~?zZsq6m|wLJd+1?ykFOI}7(&p!7G7?<kZpltwbEZ+^<66-Z^^F8xFG<E%7rItIV
zo-y74=Zx_>Se|?K7TB?6P0_r{VGY`8Gv1rj>S;3wT(;?`+xT~%Hrx0+hkDv{f{hd2
z1=r_va`arXGFUD4!L>bn2;BDhPG}XlF>M!ZRj}>!Rl9(BTMewQHe)#-tAm|~^YM`n
zh3lto?%LGS_nKhur{7zd>soMs(^c2s-*D8tKYq`#4%iswT3Q!PJwEHzKIQscA5A^$
zVK~_J;Cs*Z+LO};VEfHEZbR_>_^IbTvJu!A+R|=gu<f$1Yy#F#-Ld%llUn9#Q?O&n
z`q&I^PLnCx5_bgHw(%QT`?-#^CC^b{V`xv?&A~p8$!QC?e(L7ro}s3{d&HLD^6#h7
za5d*U-@$GLS9gBhH^zXy+&8psO;K~-5GR*y!TI~i-`?c0Z3p(*GADnBljrZJ?ZN*2
zWE*YCTiv}beOb%1XIYnfp?jiy5w*El%MYo!wS0Wdt>t$xNBUbQ-yP{IWAS%Ed2Hjs
z?knc$Z-nxUWoNM8{n<vFKK{<AZhQYm+Xd`*<oWJzSGbzR3lrHa-?MhB{mcDtcQp0n
zv<KLJvQ{R5^-<4vdV7NH%PZPmwPv5u_O3N^$ajDHpxIu3?Z$U(e)p%%@Ba3n_<tHc
z=k8}-{(sc&_zj`<f5XK-=l*ZFbM6{<4DKa6QuN7luoKw4CR4O!tp1;>?X%{02J5T-
zCNbq@F8==F<^NqJm$eJLPL2JaighXGwLZ1~S7P4#P|Qzm?(ZtN_M>Z_{b>()JO1*V
zC!ne446rv?E$8cf!CpRN+9pyQx4DS3b|!(%F@N9h2UfFqIbYrvbJphiHnwq$mpJ=_
z%Q$a?t62t7xTVYDXe;9wFZb+#3NPQA4}=@ny(8;s3RpicbJIsH^K}r|Hts=b^LDUp
zG9G=@(&imt=h!w`7YBpYGso`)t7VQ41$#Ni+76){L2+)xiG3J2V@T}7(bUb~`!9D+
zjA?u0r2V_W8Bf~32TeWg<z;*0#P>+BeI(vd;N)X_xjy>4E{~x&7d|)g_fniU&$im)
z_dc*|-RDNX_fzy$PhTGZr?2F5ESh@G3f*8e%Rn9LII!`wnWN9dag@aE0js$#5_cL{
zJwDUH`jq#y7p|W7dLP)ajiK0Hdu;t+ZKEmrdEYbk<H5$(*Rju_B#yQdDqH(^&kxo%
zpBMccqkGzk6#LMgoM(cY^EC@iJ^Ryaux-`dpXBBK<o8Wp!ztOHHYo6h1>OkkdfS9L
z`_oBa#}vM>;H!X7t$F5R9^Clh^TGC!b+G`fkGl8L_-gS#8EhWmr+~}$i{SdG+unTC
z;{PGAF~dI$*5|@{o=*eoqaL4+fb|LgXwB2+bhtk1IcqKk8%vwNIeHJzpm^WMQWjJ8
zqzq8Z(|OXKK0XF6^ZYnGIT-6qif!`y>nFhasAn$EsxaS8Qs=#PHrO0oU)o(0YUP?3
z3HRERk~Ohefk%K{2cxL-Ub+~Zb>Mx!yuq(+@EaQZ<_2HV;CD6ny$$|ggFoEhPc-;b
z4gO4nKiA-|HFy`3SdMpigO6zNu?@akgHLYoDFq)w-^bQG>*G^s-jnRRp9bSn?Yr7s
zQ`*(DUz`hW-^cbka~+*WQIF4OYM-gp_NC8fDe8H*oe#F1`pfk5IchK8ZQ3rNsJTYO
z$@xOC{pb8}5m?P)?q11TTiSgd?3zovFM!o7WjlSe`EE=9)*0Iu!TOYA`x0C|K9~GI
zK3_&t&-wN%VB4vu-KAh-?@r0QeHE-8pRa+NK9`}XC-&FDwo}g@{td8wYIA;k*3=UB
zo8V^LE6~*Ab0xTW@4tnnp1!XF+fF@WxEgGK+1IWC>!)rmKIdxjzYeSx{_UElkL%(3
zsN3FsK`nWH2kcnNd-PqndVFpKJC@StdvNvi^L?=G)RW5(z~xwOg6pSlF78Qc@&6%M
zE&P_6r;l6V`l#F9JxneBKLR_x@Y}%p4Ak$M+rj#%=U(3dRv(~b&X<DkpxjBZANNMJ
z#QiZ?E&L}nPo8(f^-+({Jq@3q!u3<P5BFQO_}>Rs3;!9o9NW*~jxBxu0<4dEv|obN
z@26xA9snC#oBg`4tEHb`fy;h>4KMrs4ZQ64w{U&b)5k+#<7l&w2dUN4$M3-9n*Du)
z{{de1@d#WW_4M&auyM56$HUZW>Eltb{pL50KY`Wuq_~GZPVMC$s_ik#`YW~fP;ufs
z33iUc{|t71vR3{A)<-?>{J(+;Rs9XBJ-&YfYs<QN0<4zUe+R22&eLGytU=M{IG>_c
zPn&;$%QpXn+h!<5Th_wAz}lXr80Q&kwZwTI?3&K^j4y!IVt)~A-{CKTjgkBDZ?Hb<
z(f$Kgcl{-g|ALLJEwNq(yT;P)6|jElnTuD!j%x%(yJL8cT0L?82QJ6(I$SOGH^Ai>
z-h?}b>;-Ru^-)hQcB)>EVI?$UYfJ1w;BpKdaQ)OXhE6q_IR@>H;We;&;;al-3m*)>
z7@inI;C!lcg0}dr0(L$!&#S`qRnJ&f0~=pk+N}=u*-8ANa6Z+!MqB*W02@Di7~DM5
z-<oiJ)U)2#0vk_Ta#$N|9@-LX9kBaB?!&rp{nQg{J+Qj>!FlsMsOH(@aWv09KEH{%
z0oa(yZ$r3k)#I}f_=dt~W4M0m@!16IK9c#_6war*huRW<GqCNoo1f=XwZs_-F87g9
zaD6f_n}hXHPhVSr`Be8>dtz+~E@O>`m$9~j>!Y4nTZ8#j_f>oTj^3u$H^QdR7)r*p
zE!Z6NaZH}E)sn|{;4+Wx;pUP1I1a3jdh*x-%%>WY_Qcu|T*lf7Ud9>^*GE0Eb_VmQ
z#-u%C+NIVVlRo8`c7>aRK90%16V#H&?qJ86eQ*!3TI>_Rw#gXx1nZ+7pS{5DLz%~Y
z;A(qQ(rzNyw%XEeU$A>f+D(GHX4SJ!_5<5ioB8{9jap*t4>tesw}H!i4}j~Vo_-Dl
ztLNP?8LXBVQ^EEVeo)QR{_Svm)Z_CGuyMi<u6f$L6RwYX#`Z3-v9#qL9|Go6J#X3_
z^Au|J#6A>U_H`IspV$uvn}6p12(Uit>F3>GK2<;3v+ukIY<%sBcO=+7Gh;al?tM~^
z&(UCWNStH9`l)BU?*$uQTiU%3T<(SMhwG=FSRVkZmwTaFxfec8KE@nO-Hpw6PWm_w
z-bD%TY4B+^&-kXp^-<3p_JRk9sV#HZ2j)}tr9Iyv_Ji$HyD`n_c(8eC^PV0{t(Kfm
z0IP-10B=S+`?viE;rghj{}aJ{s_$B~?}zUy@R?xSPNihbv(VJzGaGDwrOzC=dd55#
zY&-Szc@mgUwV!FvoXrE9k9NQFk(b~3jHZp(<`lp4G0!auyd~Io+E&!bc_A^>vp1a#
zSBw3W+8(|LZp?fib}F1t_4|o!_17ok`w(~lyME>@FLT}on^&3h*6`#%25ipTQYVLx
z5JR8L*GJ)esy$m<{7whkKJzyKS6fU;erJGfug(1A$<H-0j^Z_zVopP-x2v)9yFJDH
zcA(DpXlD|`w&Ca0Jb8TrZoAC+S#Y(C=WH;a>Wr?<wQNlF#P}q*jPWVBT4H<}%%?iD
zYcqy<s3*p`VB?0L2i7P2Gc`}1pM|T%em>atc_u#xHl}f+T?n?FzG@dxUqsPYo3Wg$
W&x4)U^Xq5v3vm6^y)Sa_%l`q6m_iEx

literal 17776
zcmZvi2bf+})rG%gW|APGmrw)}iUb3ZCcPv93@Aub0Y!#MCKE;`nUF%UkwFkdK|xd$
z8;T%JKtK=>2_Oh6f`urE1}q4Q2q+~q-}ioZC5QX}_nGsYwbtHepMA<b_sk?6!#5e(
zY7J|x*c#Ou+0m+=)mp<*T4*Dxdg}g%?muzf{N9P%zkO>RR&KQ&^;xAgy46W-nbp%j
zo2vCS43|@`q+CzAo$^!4!<1ENt1q2`I_Rlw??NqObm6fQwt2mK_0OF?Fk}4jecSHZ
zx6Q6dtuETFNIj><qp9asc!kzl)bn}|8C)>CcSi4&!4s=?wmrUYdmGz!b?UTT13azP
z*Q9R88P!@JJg@iA88c_im_5G;vMoW3u@3gcSQp$wQD2X`9dBf7EO=h;J_7^&bLO=N
zU?Oe54Y8&DM&KSAIp)o(HhtYa19Oh)8P9-=F>ivcm*QM)s^4JWe5$J57V_S{?s@a)
z4jwyWc|&vB99z5Hh}L-hDxdBX`g`Xe1Eu|_)<o)_IdcY1>Yh6|zh{2`;OylyZj3Fl
zxAX06ZKdz*nFBMbwjHg*sgJ9%^&BR{HVeShYJ3vd&eYGR9;op})U#@Qg?QVx&(&Zz
zlm0h~YyB4R^jg0|yj|Pp9&mDa0NfsLM{6;7CdD|9QYVMU!SibUNoprjpQowkp4i4q
z#U!rYHH^Ln+jtB(aW(=c&IGVy)_(_Z;!F|Oai)qHus%nLYyZ>0(;@1g0?(^(XX^s+
zvHi0h4z>~1yVcjN(%m=EGt;ioJ6jjx7oiKq_iu2){O;+41B2~8I;;1wuU72m`}oYC
zabgwM_q4BjU~qcRK=<I>?ghtJaXVT!p||I<vvmu2`JU@&ErPfAdRO(1_jS+dncFj~
z&Z(pI6YTB2yQ=p;_HC=!U9I05ue}%NFy+-aevdY%od>+DIy3r~?a8*!5`FBcZu3}!
z|Fyv%Z}2A?{Fw%SuEAf1&+X~$U%<%BY1r`pn|CK#nbU{{U%A0oeFNt_fw#|quGWU&
z=Ddv4f7!frwl;r5`_9%D4ZdZAPi*iV8hocW@Q&7Gcu&0t_6AR@=XxqQ2Z#Ia0C+j>
z0~`F{2LB+ufA;Jd*^@`Z*{3z{YE6gFo8L2cem65-_t}RwgAK3Zz8!$4{U8`i|7>1z
z^Em;&{A}uKo!aJ{i^QzsonC9itlFFj@1Nb>Gna$K7}vwg^}bPFpNfR2d~QaY*WWv%
z{qA?Q7RleRmpfW_qUDUg3!HQ79(}7g9jyn!Jsfn7>ml&83U{`i0hjaoT!TMvJHHWi
zJ})%vFBN<$vHd2mjPF#&C-(c(S#!?{xsP_g+a>p$7}qvecKOC|x~RBkhyGL*_j@mo
z-EY6Vqq28Y?{sqUPN1!evIZ?|GY(IE#cP8`P>-dIA-9p#a&2Sbqo~y^@)c^V&p4vK
ziCV0mHe;+(YmVPGt5VxueH`AaQKyaVw5yM4=-TXKgNClQajn^>MQsAL{Un#Q=z9nF
zSc=btn(x@A6*rGvYQ83;_6`%jeZj*hTN2y2$58aO=wsU<N~^VS&+tVRpEN1_zKXY&
z@u%R6uYF{h|9CpM{IbjTi2rVO*jl*Ee__R2d#*!&=Yq{6Yx9+wx5wDo`YK#^`L6iO
z-HU3*SN{g&^@4vJ?tHmUmx0|Eu9NNM?p-zQSAh9hW|zB{)z)Pdt^znG^708^pGa3j
z^fTTywL#7P<nDbR?bm^&j#<0hcR?-lbu;ANXZJf!yZQKTXb<<@kZ1h&Lo$C4)HZXD
z{XsOxYkTd!Km07S?@nX)b^MQj`)(zlA^5|q+%qZs68NPf+uZrT2A=+`7gzrKP4c~Y
zrMB7s4n&1keNk08y@+C8=H}X8K*`+5FD$t4gm;VB<=!vCUGqf+_Z{>;5xf50AHwzb
z?oje21=rtuLTUG&P;&1H;r8b}A>8)QG`RPK(*AOT4`YI2H^0t;8_zpJ>~il2CHIaH
zZhP+t;re?|2-iNY!M!8IuD^GLaQ(d_l-xT)xc=S|O70yY-1d_jeD8wW-n&BS?_Htf
z-W5vjT_N1|-W5uIbc1_WDDB=8O71<O<lYrZ?p>ker#86vgwpOkq2y;bxc7$A?!BSp
z-Wf{1sNl}$odtLN-Wf`N?+qpQ?oe{?4<+{wQF8APCHD?da_<l&_YNW7o%h+Z;p^;g
z&+EH+%jMq$t9fSc!3p5|t``4qf%U(=_P+?M=9!)t-vO&9hG&FYVq6S1?q$UA+jj|A
z&2Quj^m!@RXDK;q`z}Szvq)?lzZc&F`z?@ro?lDxESEdRAAmi(52GLbZvg9~ep;>F
z2sVy3$LqI8?Nv@-`?Ftl&$XLt?3p%1yo~2Z@Nzu2!qxW0$MM_-_HjJgeoT3uVtjGN
za|hUO$$?~V{&#}aEIwA|;u)$f?d}4X?e2!FSy*mbEcd+!&F@qCxEHKu@v%CXx$!&k
zGl;Q#qW!$qO21z~j8ppES8K`#F&gLpmk?tb!#4Uj9=`$FCeyFqzh6<z(eI{XG?!o3
z`e<Uwe*@>I`rC)LA+WJr8*PcT7;G%Rv5ECNu)6+!Z{*sX^M}Ch8Et-h<Z1T@@WYfF
z$kToM2w2U#nd|%~u#fAk?T-{S*IArC9t9iQK9*3+)5o8|_Mz=DYWdEL*xddCo=9yQ
zZOKvnCHhET*6HhS;PmCUQ6Agh!R{sFJWegw=2|@oc3reRK`r08p8J1*{hOTeY^Pm6
z-w|!b@GSW!_(Dp~g{Q!3z89C*bNDZ~dfxqigMFMsZBJ9woI|m(UH@mn_GiC-!{xF4
z2kg3PdyZP}9NO-`;4P?aqb+%>JBR7ZT7Da~efz#<TrYsj_x(k<+Wpq_xs=++F=%^<
zvIZ}ram3l1uYhl-IQHcJDqP*XUZ$4E|244w&R_grhpXqkpo?Yig|;DljdcrHTVf3d
z8_S#%s{>74fA2YRZSK7;aNY~=Jo2;~2{u>TRreow=6e))CED9Yn?A0ay6xR3D}c*=
z@+P=ixldMvt7o6A1om-`wT-5zImhDUwhB1=#JiY0w$;Fn)117Q$&>f$VE2h_v?XtK
z_eJ`$mb>Qm?K)>%Yl6#tvKCy;v&eli2JGV)w5?6qm0}#RaeNoo1-n)`v)6;GS$r^6
z`FWSrmNi}<++5=~qp4@VYyh_Xc#5{6I+puITViblHdc9#jzv><ZoIe3&Ds63F*y5W
z99W)t-vs<Hv2u=X3RlZH`WCQ{^P+7tikkBxP9NjJ#<qX&#q#v=R<M0&^NuXfIXVIC
zIcghi`WQ#uwM<{u>1!)+`tpt~kL_(>=iWHpx8>PKTZ6~amu<A^<9<}Pz59O~aJm1t
zg{zhOe>=E(_W#?#KF*Q0?I~)`kvQ|PBRKni2e3T0ox!e~Ir)2lJbCW|cK_Q(Tk=+S
zj?$O4+`Vky?%#}SH*mTC_kgSU`>gwa619(G(6%S#Fp6=+#&Q2o0lW9I|M!BcS$r^6
z`AvpvOS^r*WxIF4)hzswLA%}F5N+A(`+~hQl;`q(XzI?xRBCzr-wD>gJeS{vrk?j~
zf3SJ_o@pDZ`}RH4X3oAh?*`|+IRGq=?I5sYG=~GJ<*^+Cb{yIcrk3Y>eJFT+VjItP
z+HL3e@I7G1<9%5EUa;Ci^7=%ry$@~-ZP^EEuDfgKdbw`KbUz;sF4z72aJ3^SuKSVH
zKCZjA4^Xb67)P9WJqqmlmFv)rrfweoE-8<H4_N<l9j2kF>wh%0JpR34{mXCQ3^aB9
zr&G)0KNGBfc?KSXrmlY<wLJd);O0L15Sn`S#j#*QR{KKR5ZGAm3vF57Sz!0Y==vTr
zyKXlHe{HrIpjOY?&H+0|;m3jX$-2%3>!Ti@d0>6Q=YyTMv{?YwM?GUW9-J{4Qy=pg
zq}G=CI1%g^%Jn%3O+7vzu6@ci{|K6T`uix@{`@yu+iSPK6R5Q%_m6?qZmjQ`9|x=D
zJF^h%<37}OGDXdOC{CPHz{VM>--l0t)rKhPSIze!?XA=PG_ZT=th)Ut!D?xLDp)Q4
zXMo*H@&6Q9E&iv2)#CqYuv++Mz{b3(9`{*bebjybK1ZE#Yx^wa%M`~gPMmYVjywEZ
zuyJp#<9;5jk9vI01D9>S05`W=>NZ~l>!W^2t$hh>ENz$8`}=&b`Vfh_Ue5E`aCPU|
zF_^D;r|++T%f2swm-&7bu8(@!d<|T-`8vGJ_Zx71)Gw~{{U+F0+LG^uVD)Cc>Sey>
zoxZ;Xc1(`#Olo;--v*a!dJ){Q<URZjSReJA2^WKXFZ^4v_N>!)!P+jNxc-+>`?&A5
zT}n}NKE=t$v6y>o*2(X3@R^kCu`9r83rQe8SHe95GDqJ7>!*G*e)i+tMBRRlq&A1E
zsEy&b<Z5vFE%`oN?HY<>xsKY$v1q%NqGlX%&Xpg4Z>O*FT)6>F-8`<RmdF2xVExOp
z>LxUG{cog}$Nv_v{^hsiM`-H$-%Kr!|E*yC%Wuh#(bRL!-3G>``Zj1A0vpS3gSN!F
z9o+oR+<~U9{~~I+bLt)AC*a&M?gY!TW_N=fTlUmX!RFw<?P)XKUDWDnb1%4T^E0?@
z{Jl?`ZSJ8~Pn(~EjT8P0us$b~qxX>ez-qDovbKlc54Zh6ypxXU0kAP`7wuPI+v%%z
zE=m0wtgkj>+4eVJ=i!_>)^EZ3shhhtwe<Zvu<LmS{^t67@M4O({tr^Cxjz2Q_hGOx
z%DwalxO#jZseQ`*`A0PM?1w*romanmw%4AVmVoUy_qa#F{yU(0z9Wx;jiD{={tUKV
z&XvD_^;35&4^gXSuKo&kEZHA_1Dn$ninhdk9BkY8{T-~I`$${zd;)9??P>cY*!M9x
z{R6C@x;c4fsOj$+@lSC1`_of!HRpTT-<ja*&adajzrjA98~*}7O;PjQ5GR*s!TJ5^
z8L&LI|A2kB%;`C5d47NTFSz<U6ScPFt?pTuzO3cfQQNn5c@`d7^GUVcTKkxqTg#U+
zC$_hi-&S+$<ohDN8OsZ{r^L1t?73o|FM;J5%gf-+m@C_8)8~0=b=&(p+N<CzX@4O9
zvil17HHw<W2NPK>KeJwk=wF_Hgl(xOr(tmW$zB=mFKiU`{5`q@Y+pXnI%~~7qjlAq
zuK9cP2sGR4uif~L?Q$Aw^WOk=qWJHkzUQ82KK}oM9q}7Q?f+R6`=0xM7F`4PxMT2a
z*_xtH-h*wx<~4<)Eo1fnPuV_uemk(f>aP(~UgqN8Uw!<44asHo0<TeH|IO3?18H7s
zQ~U3@=Dj<`{N(2T?t*JSyyiKdc7nI#FVA^rH1*s8b_1*Be!VBy$9GKI9u&uIF5>K+
zNnmr#@As3zY8D^o%XKklZSHSl8^?HwvlqCGvo~DLGMvJaE{mhBjAOj4**+Csem1`Y
zZd}if?5C+<{d~+#AGOTaeqh^p2BpnA!M4eG^ifNjcY&Q_+hkwt4_40{9{^U%93KSs
zagMbeNI8_^+=vtVU~tBe*oUC0o4@NXcTS9Hd*h`2d%zh_+P@b~J?-UXd*j6SePH`Y
zyu-lB$M$l4^mkt#L2)j8Z{+W%IB(u<wZ-oPVE4N3jebW`^i@w^9|Whb<Z~37dhQC{
zU^UB79qVYY@wAzv@5IrR#O(pAxi1oT8dyC()4}?bYuXD}&v$(W*s+bL*j{^VePC^y
zQS@`&GxnKaW9#eK`zeW|?U>5e{&$ZL)i&Q3{T!ob+OZV-(4L$Jz|HxZg{GeKX*Srl
z>Yh*X@_h2|O+I5NIiJ=k@VW(F5A1$>Gj-0V<G_w7d_loi0H0Lz%*9-|@x$kV?IZhQ
zK3E@h*V6cE@jo7H9^of|%l0S2^-;II`KZPJ!(d~Ee*~=0dG$Ph6s(VWd_D%&C;a0z
zPn(nB`l#owxe#nDZKFBKUBgo-uKNVaLdq_bA&PlAPukPRC%|Q%r^1tiu}-7d=Ir_%
z_#{{#^~~kz73Oai)cLNR0X7Hsmv;ArTDd32!hJTNWKV2Z;EllUgK^aPE}aj~K5*SH
zYVa!>{OSh3vB7U?@H-m(o(6xQ!5269qYeIegFo5ePdE6|2Jd1L%khqB@QoULLWA$n
z;8PlWYQaa*_fa*^{`fSSYm#&KGhke*b61;tO1pZ_i?hJ(b!_i5_tDuD_4s_Y_Su)(
zzV!JVMLplPbHKJ!e}R5JPwnHkP20H?HTQ@(IiCl%|J)zG09Lb@yHE1gmUdqRyXVsG
zOJFri*-jsAe%sQ&b;kB(us-G3z5-W|&jtUF&sWjZbHDu>*mmk^_jRzbccf(Az5!N`
z&o{wMp9|5{6Z>0W+o|Ua|2Eh@wK+e&Yif!69dI-5#c1mBxdhx?`|qNur|(O_wo}g-
zE(6<N&b7<I`l*|X@3~t1uLP@wf3N20<0`m5>bCb>P)nZIfE`P@M&F04$LCtGV<~;E
zgR7^X>%q2DPcA<Imt(mBuAjQOcqXaE|A%0;@SAF$K5mBVqi%c8Ftzyq2<-U6Zw2c!
zRDWxJ4Aw_IYkeD7eTb4dza4xVWf8@GJR8*#_fD`{_)lt{Jnw?*qaL5T8$LgU>!)rX
zp0{f8zZa|){xfhnwx7ctTl)M3SReIhzXYq_N68%A4>q<o`}JH`OFzE?m;L@4UiSMN
zc-il7;rghjj|aiV(PkeHP^+bn-+{|L`}+oe2wwK_FkBz?^zjF<akSaTVrsSY@d(&{
z^KTq~1gq^r@eEx;?c*7$?N5}ohqcd8apF7%c8<dT40e99SN;OlM?K&Czk&%>{SK-<
zzJCL2%f5OPtd`h+2dgE{6JX=4Owr~zAE#DNn}2}IHvfd%W+jTY?1g`UwLL{K&Xd$?
ziSuu;dpdt-d<Lu*`?FyC4u1}8jI76h!1}01`!87C{g*tR2OC>kV!Z%%kEPv<VExoH
z7cYSw*G3fWj^Sx)^~8A@T#n%txLWM5g3B?y26qfO3tk87qn=#sRJ|O-Ff?OpOYGs`
zats}C{nRstPBof22JMbvDOf#mMu64AN5apCC&nl^Kh-@!Tl`i4J0F?nH^KE)&sbIj
z8(&-6jRyPfB>qZpeyV$ow)m|KHh%ajaPv%mtHSkB&wgJGY&>nrVRf*1XiKa$z@7(L
zhc)5)sVCN2V0G8QdGkJ~=G|inns*=H-^5%8Y|P}hF5I^2@mUXib>Xu<TtD^rycz5{
zlKI&H&QG<5+7f?5u<f;*pZ8O>#2E`N&yjI(eKIc_gY{8QUz>pWsn%M1Vr>d8V{Ha6
zW4#5gk9uNl4(6v?SMB*ddW%|L51T&YDH+pS!RDZkWAcuzmOQoumw9XjH;=62M6f>U
z$>VKceyTBPPpqxMWvp%BWvp%C`lu(?c3^(0F=@}3wy$-^q)$1fx5LdrAIIeH32Mn>
zN3i3}Ik*#8E%u$kw#gWG0qdh4pIyP8Lz&0j;cB~4(ryp1ZMCJ{o?y?Aw3`HX&#Gsi
zOa|LloB8{Djap*t1vdZiy}@O^`@r>4Pe1PftLNJ=1+11B`-1H!e7~Bf{X60MsK@7B
zVB>`EU-Ps%0IrXE#`bQov9x844+Qg5y>Hqb^Hgf}#6AdI_H{5^pV$upn}6p1P_RDg
z>E}IQeyV=7=iGTO*!bEL?|op;%#7tQxa*`IpToiCkT^$x^;6Gy-w!svwzT^IxI7Dw
zgzKlCSRVwdmuI0`c@{1qA7hTB?#AXfCw&|Z@1lhFH2AcdXMEG)`lx3Pd%;7*)RsA%
z0p_RbOMCu?*ax;x?Zz~xnPBtM=9(Twt(KgR0jq`ggEyp|{oDRSaDCL%|FK|xs=u{p
zpN#Jc@By%G_oZaav(VJzGaGDwrOzN-J!75&ww-$VJPyoHb)IR@oXrKBk9Pm&BQO8v
zvl(rCHm3MDAM@O#z?*{oroDwaIWHiFdd{Zf;cBs;P}{>#gc~z|4?79YPxbF7w$)#s
zjPJwXA?*5@v%JiC3v51R&YQ!N|9G%Dzm+;Ue2f_SWWGKQ=chWewZ-pbu<bK{LvXc)
zl;n2`*!J4YPoDhT6B8*u6Da01ih9c$JHJ~|%<pZ~`8(Qa#ISAnr)r+OJ_)y7=KOTH
zTE=q*n4jv7uFbt{O!dS#6I{mlG+Zq)J_F{by0dFDhIyzb##vzFhMx`AC;YQDPoAHH
ztHpi}*!FoRKMyvhaiX0Eww=Ce=Td)xqOUe%Iagl<JFn-|@8Xx>`l-7va@Xa5^Xk~f

diff --git a/draw/shaders/generated/base_2d.vert.metal b/draw/shaders/generated/base_2d.vert.metal
index b24ba01..75fa3b4 100644
--- a/draw/shaders/generated/base_2d.vert.metal
+++ b/draw/shaders/generated/base_2d.vert.metal
@@ -19,6 +19,7 @@ struct Primitive
     float _pad;
     float4 params;
     float4 params2;
+    float4 uv_rect;
 };
 
 struct Primitive_1
@@ -30,6 +31,7 @@ struct Primitive_1
     float _pad;
     float4 params;
     float4 params2;
+    float4 uv_rect;
 };
 
 struct Primitives
@@ -45,6 +47,7 @@ struct main0_out
     float4 f_params2 [[user(locn3)]];
     uint f_kind_flags [[user(locn4)]];
     float f_rotation [[user(locn5)]];
+    float4 f_uv_rect [[user(locn6)]];
     float4 gl_Position [[position]];
 };
 
@@ -55,7 +58,7 @@ struct main0_in
     float4 v_color [[attribute(2)]];
 };
 
-vertex main0_out main0(main0_in in [[stage_in]], constant Uniforms& _12 [[buffer(0)]], const device Primitives& _72 [[buffer(1)]], uint gl_InstanceIndex [[instance_id]])
+vertex main0_out main0(main0_in in [[stage_in]], constant Uniforms& _12 [[buffer(0)]], const device Primitives& _74 [[buffer(1)]], uint gl_InstanceIndex [[instance_id]])
 {
     main0_out out = {};
     if (_12.mode == 0u)
@@ -66,18 +69,20 @@ vertex main0_out main0(main0_in in [[stage_in]], constant Uniforms& _12 [[buffer
         out.f_params2 = float4(0.0);
         out.f_kind_flags = 0u;
         out.f_rotation = 0.0;
+        out.f_uv_rect = float4(0.0, 0.0, 1.0, 1.0);
         out.gl_Position = _12.projection * float4(in.v_position * _12.dpi_scale, 0.0, 1.0);
     }
     else
     {
         Primitive p;
-        p.bounds = _72.primitives[int(gl_InstanceIndex)].bounds;
-        p.color = _72.primitives[int(gl_InstanceIndex)].color;
-        p.kind_flags = _72.primitives[int(gl_InstanceIndex)].kind_flags;
-        p.rotation = _72.primitives[int(gl_InstanceIndex)].rotation;
-        p._pad = _72.primitives[int(gl_InstanceIndex)]._pad;
-        p.params = _72.primitives[int(gl_InstanceIndex)].params;
-        p.params2 = _72.primitives[int(gl_InstanceIndex)].params2;
+        p.bounds = _74.primitives[int(gl_InstanceIndex)].bounds;
+        p.color = _74.primitives[int(gl_InstanceIndex)].color;
+        p.kind_flags = _74.primitives[int(gl_InstanceIndex)].kind_flags;
+        p.rotation = _74.primitives[int(gl_InstanceIndex)].rotation;
+        p._pad = _74.primitives[int(gl_InstanceIndex)]._pad;
+        p.params = _74.primitives[int(gl_InstanceIndex)].params;
+        p.params2 = _74.primitives[int(gl_InstanceIndex)].params2;
+        p.uv_rect = _74.primitives[int(gl_InstanceIndex)].uv_rect;
         float2 corner = in.v_position;
         float2 world_pos = mix(p.bounds.xy, p.bounds.zw, corner);
         float2 center = (p.bounds.xy + p.bounds.zw) * 0.5;
@@ -87,8 +92,8 @@ vertex main0_out main0(main0_in in [[stage_in]], constant Uniforms& _12 [[buffer
         out.f_params2 = p.params2;
         out.f_kind_flags = p.kind_flags;
         out.f_rotation = p.rotation;
+        out.f_uv_rect = p.uv_rect;
         out.gl_Position = _12.projection * float4(world_pos * _12.dpi_scale, 0.0, 1.0);
     }
     return out;
 }
-
diff --git a/draw/shaders/generated/base_2d.vert.spv b/draw/shaders/generated/base_2d.vert.spv
index c318fc2f5e7d89cff97175f6bdbe615b0a6f3218..ca08cba758c5e0e0184d61dc6f3b8bc35780228d 100644
GIT binary patch
literal 5008
zcmaKu33pRf5XT?14G4%RvZzqgR;+-ctb(W%Sqh3;L~u8RCaHlmsp$gl;J%``uV28A
z;g@nb$8-Gs-g{GGj>pUS_s;z9%-or|_rB1xaA;AIEJ&6ni<4iHY+Ro#gh}8E={$Pu
z#IY^YGnFkncHX1K@}ws<)aJV6`c&@a?_{~&R9Opd2K}H22Ehg}3^syIU@O=Kj)5Nj
zEoA<UfPG0X|E|XBgIR)pvRSXS+LP0WWK~Ex<*9b-Y;}C5-eTwA&8sq1FHMh^8`ac4
zi{T!Vt%?*;dy-`e4EHqWT4}u1Xth(_{O9KNLRw3$)Myd1)M}Sz=W@SdYL}|`^`*7e
zN>k-_Im=bf+JoDc`W4b%)Jo^-%}S}(C{LtyX={IKrFLtkoYf>}ziMn*YEx<Me>d|)
zqcm1+pQ^TJsuxhx`sE$2wWjN7kV3L3k+&}?UvJKwsK1)_A=c~-YX=(jse|?DnR0Wy
zO1#X^({^pP(U@Pef1aVk96igq(o}4jatGkX+Vx4Yn2U3ebDU|-HY-$Jz*5fKvhHVf
z$eB}Thdf{T9&oioS7Uf@XnlogK5id!oJQ<bnCG_5+wlmJ=&-dN(gs-DVdm~S<Q$!K
z$aU8t_g{6$|5qLI-F3)y*CAH`S#4Pza{sQQFF67?pL^QxLdrq&cJi}<x8{c>8GS_Y
zU2^6}o7r7B+N@-E%(rruweeQFS#4+iX`r8PwHp=BKlSfTX0@p{XS5k5wr|)z9Km1N
zVNFN)8@jo#=;DUZ9bspG<wiO<<H{Wbqr~@J4**B#gUI@-`_}z7In?7@WgK<aMA#R*
zbYr<5_8W5D^R!mKbIVh?H<h*ZOkMMR$C|Qh_QADH9c#+2dB(2!e%;SL#2a6V?g-9&
z<(z}lN6y@WQ<lqe>P_FSdpWf69y#o(BRJ#A#TjX5e7PGz*co5WJ_qMsa_c%c>y|5a
zaMmm54^8-bFXTpm=QGNz-B_ykU>EiErZVdM-mW?Hci78^na#~u$^*a=`e3G}y6?bV
zZ3Zi;)_Mns(Zj5~8p&LWU=8uiRo!~m@^>+_ocb#MUc+1jj{f}aYokxL)|>8U>?L3?
zKJM?GmNUk+%tf#OY|eG>l<P(G>vGw6t~=iV95-UMl3BYo%+_Ik?xSvB<Xpdr**#)i
z-M1oVt)8WM$G+>4^_4fCy7wyL-HNPltg9DuF7_Wp*WSF1Jp>$gl7)HP2IM*szP>|!
z&HeUVwszOmhjY&LJ961QTvy+ia}m!sXuQb9JCGz7_eCyuqD3CAWytz{7kXA}Z?Xm1
zedPL)e81Kr=bA7l@4WlDZVu}H#zqdFyZ?W1XuqA=-g|$vx1atFyKmf!-MOy)9%Sv^
zx1ZZ}fZ06kk&qX10sRi<y7u}V&h`KETTj-b*lF93e1uuuct@GdL(aU_eTQ<cKf-Jc
z*6X^u?@-S5$C$M>w)zvyo%x>3b@PipJd11}wSO*Ox7JvH5!p8ydKua8Q|J}sPWuUT
z_cUHTm%Z28Yv=zD8T)n-#axZ;Fo!>2jBRc9_7E_Zy0Ol7>5VSk-&XDS6UDc_0{DGc
z4jv$eb^E_^`CqZ|z59KZ^E>c6S*&Ju{Dr-1tH49R+?0*u9=8DB{Db)TZBTd3S_eQ8
zxaK=oR~}@(Eth?hArB+Fzcw40hjQ6>=3BBJ@AM8}J>GTq@_z3G^0D{5$li5*yw~>u
zeZ245cm{ic{AFy+_kN(C{BbyK9su%=;2uPlTL8xBf_>Nrv~|tAed||<;o1l=KW)s%
z@92Ku+qDMkRUZL=(qDUTpY6A~de0vP$H0EF@_s)Kd=vV+rrlxYG2oiHy7%#%V@mga
zg82mSPQ~8p+UeuncoG=L`powv^C_T@{WD*6?ZW3NWPKKb@Oc_pAMc$$>e}gJZ=V7B
zc<1$bj`?|@kMB_*b?w6E1!R4EFX2-{*2g!hkGgjH*n`tRpFW_EJvamODFc1fPlNCo
zN7lzY!>5X@PX*|su3h-lko9rT@OcSYp9!Fkx^{6-&mntHUGrV5hwmh^zMhlz#%%)m
z$I;d?w{lM25oczNQ=r@b<(v=yHoBbf?0P`tHjQjf%Xne#H3N*l1-P%i+L(KT+4ac%
zJaYHm7vSV$?^lrJgrhU}SJAuuUxSkm|JRY_MC5*n^EKBufbq47_x)zh&oP_xEVG<B
zd&l1b?ztP-1K+|$Ais>8VE-=x^Be&efpOL{%SYedMz$ZJzk}>v@8r3?i>!^jd+%X>
z4|IQ@-p~2S{{wV=<=wZRSuT41A+q_GfqR?pM?hZt!CVjfkI}WSfUy4rSzh}M%<9pf
zPm#6vJ!@}&J_GXF59hk}zU9k6d*8D51?JCzy!IP&J@WYiUHb_T`CLJk*M1YTy7t?d
zzXbMd8_<3+^H)G#-FxsgXaIfe^EW`BIAe4AHs_<y-=XU#AK#ttk>wr1{eUdzeTd(I
sACa|nZ3t`zSJ>OL_z9SwHs-U2`Dd^dSc5gHYuEWZup0eucG(U70ijBNdjJ3c

literal 4716
zcmZ{liFQ;)5QZ<ANkl<hki{(yi4hS+WD`*!0fe9^LEN`7$xPzFWF|~P+!e%q!QF@O
zF?=bPb3DiI>w7DVbFg#%>8ii0s;jH3Z%Tc`*JRm>Y*n^0`!&nQO<5mI1~-uAV<%3Z
z*tW1(+qP@>b}iOqrDUkh4cU!JF7rD#*=VV31h;`fPys_=1Z)Odz&5ZGJON7l^)dfW
zu>P#f-<3H1FxR0!*J{kPJ97(&WHgXC<@rwgOnrK>(Pr1+&8arus4h%THtWegkKrD3
z?V1!(OWA4#hI^XtOm({5Y<H4w?n}%1Ksv`vwb>?QwcV+nUn=}6$u6n*^`~>rROcr<
zlX<Rko;$c5$!{RlVWxVv(W+Hvnv=8XT-sWjnQEuKIGLZRpHtlrH`}aE)H~1DJB#(V
zQPWxF9j>((8p&ZGyC&INgTyyli>Dj!rZdNybztpqvoU|9v9LJVnywQs_w!a=JKt<B
zuURYa%VCb*+fsEtwoKd@+(f4_M;1#_b2-OU`+Td$2?wy0Gq=3b`8njwsk<89rM&rE
zsm7HU_6TLJF)hdKM~=$JUIWYA4t6hPht+}SIKteUvz((lXSv>-<*t^q{MB-nFN5x!
z<$80L`)|(u*-^OVJX0M95(iEH7d40XeB+Xge(uw^#a$k6<#&9%Rm*$Fe)o>1+nrXu
zlh<Mv{jGMVS@T|#e>v4@y53sU<`}Vkhn{&He`SaHkMkQkACmMn=;nqsWCUj|<wm<W
z<H~vdW5oAuZUT<bhmiGE_r3aUaj3_)#yIM(iLkHq=*DtA>_-aSyYsAmr`9C7oMdgi
zKi7Ptv8L>rb#To$X-v-=?{gKpBRF%Ca}G`)Irk4vSuWqVoZg*#Ikd4>4(sL!&bV@Z
zm4h?B+$~+4@#U;#*twV79bKGfldE)b_Nbiy46OI#%yOeiFQxu0A=@kVuOs#eU4Msl
zvu|BDmhvXx2z@A5lWt#Im#tte)%I+gh~a+9>ygZ*0yYrOo>cd28~I(yET_JX-)osG
zz%f|#vNrnUYvr_`vF-Cm@o|59RL&UJGgrV0u(i<b5!Wl|Hx#n*Tz9@1IBvmeEwgqT
zm_3L2xsSSak#qf4X7`A7b$egVvwCml9sAyitgpQB)a{{&cNen0v94YzxY&OPU3>F3
z_AqeVPZs8JH<0T_`1-E&HTQc8*|WQ@zPaFBzqgRx!*%s71s65D4?WMfobF#+xj{1X
zy_t`ky&;VI0JAx`Zd`T$pUK5tJ&5ek-k!F8_JH=*WG8Unu-{$i+V4TuKHoR5(LVGn
zyKr!jn;t_lFV8Ks6LP{BhYDF+eU23R|M}cW_P#mo29b|4s~hVWv-!xGzxsH=xqcjZ
z1+ZsbSNBcHxqgCKe`BjpFn8yAy3oxn>hU6S-1AGtx@U~_SCPwIdKJ09OP@m4-u-IK
z?q^TxFMkd&4(`Fr+>GUL&p%;|<yj6f9|roX8>`-<&-Uo^g+4|U-|3COx9$7gPYlm_
z6OjK68{ewmQ90l6A7rtf+3`2_uB`*3z}%FLV@|gN-|GQ<eAnu(dDcy!0$lTrsw)pM
z-(AQfUGipR_t$0%^DyuX+nFB%p25C0zP-Ey$j5%WknL-I?BR!jKK8aY)_pIKzl@DJ
zJ_7WUKLw}Fqd?vf++)acE5HON_w4(Dwyv40Z{!DJxOM=TpEl;>ckdwZoq7h(sIJ{#
z)YN)g+fin7wU?g+_VE~5*~3o(`(A(7v^&E5G;qya-TP$0F{QmHm`?%wDE3y@P9OW>
z8DJdGXTH9PXMsM}&wSOj3!mqZ_2~oQ^E|RXYk)rL+Ua9$UjX`)fIhyhmw`UMIepZ%
z3!hhz_3@2_&uhr~ybAPD*G?a6@H)`PcchOscmwF;Th&KhyYQJr*2g@<XBt_bDWH$K
zcF}V)$o8CTzDxD+eG^$<Yp%U{HGup{H2d#N!O1(K{^olY^!hgoKK$p<<%IW80+Cw_
z*_>AMTHLD*jK2-IufE!t`z*8Tk$VTZckcx_`Ph3ASxz{*b3c#X>%RmiAO3G4%ZbSS
zB6nr3Zv*3N6L0mMf<MP>&hyN2=6ss@UErR(fi<wN-vjcixmD}`J}}Qw;F@tZGRsHZ
zK0vk}p?`?%UYCm8K0?+;-o5uQUjVB?>}?NS1Q$T$|1q+@^6opxEEhGuglzuyr+b_4
zCqQ2Np+XP)PtmoX0%89dvb^>q%<55}&ylrX1GKk3UjTXSHy670zRNFx_P$H)J=0e}
zUi&SDuKjl8uYom)eqV|FO~LE$y?zU3fj-vlGU%<_cLg7H`yO3C`S>RMfGqC_?nh)f
w?>D~DKOt-D+A!Fc^!yv~GqU+<V?G<0e*wM^&)^x=wd?*yuSfrfUG{>10j14jZvX%Q

diff --git a/draw/shaders/source/base_2d.frag b/draw/shaders/source/base_2d.frag
index e6af939..cf301d5 100644
--- a/draw/shaders/source/base_2d.frag
+++ b/draw/shaders/source/base_2d.frag
@@ -7,6 +7,7 @@ layout(location = 2) in vec4 f_params;
 layout(location = 3) in vec4 f_params2;
 layout(location = 4) flat in uint f_kind_flags;
 layout(location = 5) flat in float f_rotation;
+layout(location = 6) flat in vec4 f_uv_rect;
 
 // --- Output ---
 layout(location = 0) out vec4 out_color;
@@ -130,6 +131,23 @@ void main() {
 
         d = sdRoundedBox(p_local, b, r);
         if ((flags & 1u) != 0u) d = sdf_stroke(d, stroke_px);
+
+        // Texture sampling for textured SDF primitives
+        vec4 shape_color = f_color;
+        if ((flags & 2u) != 0u) {
+            // Compute UV from local position and half_size
+            vec2 p_for_uv = f_local_or_uv;
+            if (f_rotation != 0.0) {
+                p_for_uv = apply_rotation(p_for_uv, f_rotation);
+            }
+            vec2 local_uv = p_for_uv / b * 0.5 + 0.5;
+            vec2 uv = mix(f_uv_rect.xy, f_uv_rect.zw, local_uv);
+            shape_color *= texture(tex, uv);
+        }
+
+        float alpha = sdf_alpha(d, soft);
+        out_color = vec4(shape_color.rgb, shape_color.a * alpha);
+        return;
     }
     else if (kind == 2u) {
         // Circle — rotationally symmetric, no rotation needed
diff --git a/draw/shaders/source/base_2d.vert b/draw/shaders/source/base_2d.vert
index e72aa3b..a43b51f 100644
--- a/draw/shaders/source/base_2d.vert
+++ b/draw/shaders/source/base_2d.vert
@@ -12,6 +12,7 @@ layout(location = 2) out vec4 f_params;
 layout(location = 3) out vec4 f_params2;
 layout(location = 4) flat out uint f_kind_flags;
 layout(location = 5) flat out float f_rotation;
+layout(location = 6) flat out vec4 f_uv_rect;
 
 // ---------- Uniforms (single block — avoids spirv-cross reordering on Metal) ----------
 layout(set = 1, binding = 0) uniform Uniforms {
@@ -29,6 +30,7 @@ struct Primitive {
     float _pad; // 28-31: alignment padding
     vec4 params; // 32-47: shape params part 1
     vec4 params2; // 48-63: shape params part 2
+    vec4 uv_rect; // 64-79: u_min, v_min, u_max, v_max
 };
 
 layout(std430, set = 0, binding = 0) readonly buffer Primitives {
@@ -45,6 +47,7 @@ void main() {
         f_params2 = vec4(0.0);
         f_kind_flags = 0u;
         f_rotation = 0.0;
+        f_uv_rect = vec4(0.0, 0.0, 1.0, 1.0);
 
         gl_Position = projection * vec4(v_position * dpi_scale, 0.0, 1.0);
     } else {
@@ -61,6 +64,7 @@ void main() {
         f_params2 = p.params2;
         f_kind_flags = p.kind_flags;
         f_rotation = p.rotation;
+        f_uv_rect = p.uv_rect;
 
         gl_Position = projection * vec4(world_pos * dpi_scale, 0.0, 1.0);
     }
diff --git a/draw/shapes.odin b/draw/shapes.odin
index 5a8b929..cca0140 100644
--- a/draw/shapes.odin
+++ b/draw/shapes.odin
@@ -68,6 +68,19 @@ emit_rectangle :: proc(x, y, width, height: f32, color: Color, vertices: []Verte
 	vertices[offset + 5] = solid_vertex({x, y + height}, color)
 }
 
+@(private = "file")
+prepare_sdf_primitive_textured :: proc(
+	layer: ^Layer,
+	prim: Primitive,
+	texture_id: Texture_Id,
+	sampler: Sampler_Preset,
+) {
+	offset := u32(len(GLOB.tmp_primitives))
+	append(&GLOB.tmp_primitives, prim)
+	scissor := &GLOB.scissors[layer.scissor_start + layer.scissor_len - 1]
+	append_or_extend_sub_batch(scissor, layer, .SDF, offset, 1, texture_id, sampler)
+}
+
 // ----- Drawing functions ----
 
 pixel :: proc(layer: ^Layer, pos: [2]f32, color: Color) {
@@ -358,17 +371,20 @@ triangle_strip :: proc(
 
 // ----- SDF drawing functions ----
 
-// Compute new center position after rotating a center-parametrized shape
-// around a pivot point. The pivot is at (center + origin) in world space.
+// Compute the visual center of a center-parametrized shape after applying
+// Convention B origin semantics: `center` is where the origin-point lands in
+// world space; the visual center is offset by -origin and then rotated around
+// the landing point.
+//   visual_center = center + R(θ) · (-origin)
+// When θ=0: visual_center = center - origin (pure positioning shift).
+// When origin={0,0}: visual_center = center (no change).
 @(private = "file")
 compute_pivot_center :: proc(center: [2]f32, origin: [2]f32, rotation_deg: f32) -> [2]f32 {
 	if origin == {0, 0} do return center
 	theta := math.to_radians(rotation_deg)
 	cos_angle, sin_angle := math.cos(theta), math.sin(theta)
-	// pivot = center + origin; new_center = pivot + R(θ) * (center - pivot)
 	return(
 		center +
-		origin +
 		{cos_angle * (-origin.x) - sin_angle * (-origin.y), sin_angle * (-origin.x) + cos_angle * (-origin.y)} \
 	)
 }
@@ -384,6 +400,13 @@ rotated_aabb_half_extents :: proc(half_width, half_height, rotation_radians: f32
 // Draw a filled rectangle via SDF (analytical anti-aliasing at all orientations).
 // `roundness` is a 0–1 fraction controlling uniform corner rounding — 0 is sharp, 1 is fully rounded.
 // For per-corner pixel-precise rounding, use `rectangle_corners` instead.
+//
+// Origin semantics:
+//   `origin` is a local offset from the rect's top-left corner that selects both the positioning
+//   anchor and the rotation pivot. `rect.x, rect.y` specifies where that anchor point lands in
+//   world space. When `origin = {0, 0}` (default), `rect.x, rect.y` is the top-left corner.
+//   When `origin = center_of_rectangle(rect)`, `rect.x, rect.y` is the visual center.
+//   Rotation always occurs around the anchor point.
 rectangle :: proc(
 	layer: ^Layer,
 	rect: Rectangle,
@@ -400,6 +423,7 @@ rectangle :: proc(
 // Draw a stroked rectangle via SDF (analytical anti-aliasing at all orientations).
 // `roundness` is a 0–1 fraction controlling uniform corner rounding — 0 is sharp, 1 is fully rounded.
 // For per-corner pixel-precise rounding, use `rectangle_corners_lines` instead.
+// Origin semantics: see `rectangle`.
 rectangle_lines :: proc(
 	layer: ^Layer,
 	rect: Rectangle,
@@ -415,6 +439,7 @@ rectangle_lines :: proc(
 }
 
 // Draw a rectangle with per-corner rounding radii via SDF.
+// Origin semantics: see `rectangle`.
 rectangle_corners :: proc(
 	layer: ^Layer,
 	rect: Rectangle,
@@ -436,12 +461,12 @@ rectangle_corners :: proc(
 	half_width := rect.width * 0.5
 	half_height := rect.height * 0.5
 	rotation_radians: f32 = 0
-	center_x := rect.x + half_width
-	center_y := rect.y + half_height
+	center_x := rect.x + half_width - origin.x
+	center_y := rect.y + half_height - origin.y
 
 	if needs_transform(origin, rotation) {
 		rotation_radians = math.to_radians(rotation)
-		transform := build_pivot_rotation({rect.x, rect.y}, origin, rotation)
+		transform := build_pivot_rotation({rect.x + origin.x, rect.y + origin.y}, origin, rotation)
 		new_center := apply_transform(transform, {half_width, half_height})
 		center_x = new_center.x
 		center_y = new_center.y
@@ -480,6 +505,7 @@ rectangle_corners :: proc(
 }
 
 // Draw a stroked rectangle with per-corner rounding radii via SDF.
+// Origin semantics: see `rectangle`.
 rectangle_corners_lines :: proc(
 	layer: ^Layer,
 	rect: Rectangle,
@@ -502,12 +528,12 @@ rectangle_corners_lines :: proc(
 	half_width := rect.width * 0.5
 	half_height := rect.height * 0.5
 	rotation_radians: f32 = 0
-	center_x := rect.x + half_width
-	center_y := rect.y + half_height
+	center_x := rect.x + half_width - origin.x
+	center_y := rect.y + half_height - origin.y
 
 	if needs_transform(origin, rotation) {
 		rotation_radians = math.to_radians(rotation)
-		transform := build_pivot_rotation({rect.x, rect.y}, origin, rotation)
+		transform := build_pivot_rotation({rect.x + origin.x, rect.y + origin.y}, origin, rotation)
 		new_center := apply_transform(transform, {half_width, half_height})
 		center_x = new_center.x
 		center_y = new_center.y
@@ -545,7 +571,114 @@ rectangle_corners_lines :: proc(
 	prepare_sdf_primitive(layer, prim)
 }
 
+// Draw a rectangle with a texture fill via SDF. Supports rounded corners via `roundness`,
+// rotation, and analytical anti-aliasing on the shape silhouette.
+// Origin semantics: see `rectangle`.
+rectangle_texture :: proc(
+	layer: ^Layer,
+	rect: Rectangle,
+	id: Texture_Id,
+	tint: Color = WHITE,
+	uv_rect: Rectangle = {0, 0, 1, 1},
+	sampler: Sampler_Preset = .Linear_Clamp,
+	roundness: f32 = 0,
+	origin: [2]f32 = {0, 0},
+	rotation: f32 = 0,
+	soft_px: f32 = 1.0,
+) {
+	cr := min(rect.width, rect.height) * clamp(roundness, 0, 1) * 0.5
+	rectangle_texture_corners(
+		layer,
+		rect,
+		{cr, cr, cr, cr},
+		id,
+		tint,
+		uv_rect,
+		sampler,
+		origin,
+		rotation,
+		soft_px,
+	)
+}
+
+// Draw a rectangle with a texture fill and per-corner rounding radii via SDF.
+// Origin semantics: see `rectangle`.
+rectangle_texture_corners :: proc(
+	layer: ^Layer,
+	rect: Rectangle,
+	radii: [4]f32,
+	id: Texture_Id,
+	tint: Color = WHITE,
+	uv_rect: Rectangle = {0, 0, 1, 1},
+	sampler: Sampler_Preset = .Linear_Clamp,
+	origin: [2]f32 = {0, 0},
+	rotation: f32 = 0,
+	soft_px: f32 = 1.0,
+) {
+	max_radius := min(rect.width, rect.height) * 0.5
+	top_left := clamp(radii[0], 0, max_radius)
+	top_right := clamp(radii[1], 0, max_radius)
+	bottom_right := clamp(radii[2], 0, max_radius)
+	bottom_left := clamp(radii[3], 0, max_radius)
+
+	padding := soft_px / GLOB.dpi_scaling
+	dpi_scale := GLOB.dpi_scaling
+
+	half_width := rect.width * 0.5
+	half_height := rect.height * 0.5
+	rotation_radians: f32 = 0
+	center_x := rect.x + half_width - origin.x
+	center_y := rect.y + half_height - origin.y
+
+	if needs_transform(origin, rotation) {
+		rotation_radians = math.to_radians(rotation)
+		transform := build_pivot_rotation({rect.x + origin.x, rect.y + origin.y}, origin, rotation)
+		new_center := apply_transform(transform, {half_width, half_height})
+		center_x = new_center.x
+		center_y = new_center.y
+	}
+
+	bounds_half_width, bounds_half_height := half_width, half_height
+	if rotation_radians != 0 {
+		expanded := rotated_aabb_half_extents(half_width, half_height, rotation_radians)
+		bounds_half_width = expanded.x
+		bounds_half_height = expanded.y
+	}
+
+	prim := Primitive {
+		bounds     = {
+			center_x - bounds_half_width - padding,
+			center_y - bounds_half_height - padding,
+			center_x + bounds_half_width + padding,
+			center_y + bounds_half_height + padding,
+		},
+		color      = tint,
+		kind_flags = pack_kind_flags(.RRect, {.Textured}),
+		rotation   = rotation_radians,
+		uv_rect    = {uv_rect.x, uv_rect.y, uv_rect.width, uv_rect.height}, // shader reads .zw as (u_max, v_max), so width/height carry max UVs here
+	}
+	prim.params.rrect = RRect_Params {
+		half_size = {half_width * dpi_scale, half_height * dpi_scale},
+		radii     = {
+			top_right * dpi_scale,
+			bottom_right * dpi_scale,
+			top_left * dpi_scale,
+			bottom_left * dpi_scale,
+		},
+		soft_px   = soft_px,
+		stroke_px = 0,
+	}
+	prepare_sdf_primitive_textured(layer, prim, id, sampler)
+}
+
 // Draw a filled circle via SDF.
+//
+// Origin semantics (Convention B):
+//   `origin` is a local offset from the shape's center that selects both the positioning anchor
+//   and the rotation pivot. The `center` parameter specifies where that anchor point lands in
+//   world space. When `origin = {0, 0}` (default), `center` is the visual center.
+//   When `origin = {r, 0}`, the point `r` pixels to the right of the shape center lands at
+//   `center`, shifting the shape left by `r`.
 circle :: proc(
 	layer: ^Layer,
 	center: [2]f32,
@@ -582,6 +715,7 @@ circle :: proc(
 }
 
 // Draw a stroked circle via SDF.
+// Origin semantics: see `circle`.
 circle_lines :: proc(
 	layer: ^Layer,
 	center: [2]f32,
@@ -619,6 +753,7 @@ circle_lines :: proc(
 }
 
 // Draw a filled ellipse via SDF.
+// Origin semantics: see `circle`.
 ellipse :: proc(
 	layer: ^Layer,
 	center: [2]f32,
@@ -665,6 +800,7 @@ ellipse :: proc(
 }
 
 // Draw a stroked ellipse via SDF.
+// Origin semantics: see `circle`.
 ellipse_lines :: proc(
 	layer: ^Layer,
 	center: [2]f32,
@@ -715,6 +851,7 @@ ellipse_lines :: proc(
 }
 
 // Draw a filled ring arc via SDF.
+// Origin semantics: see `circle`.
 ring :: proc(
 	layer: ^Layer,
 	center: [2]f32,
@@ -757,6 +894,7 @@ ring :: proc(
 }
 
 // Draw stroked ring arc outlines via SDF.
+// Origin semantics: see `circle`.
 ring_lines :: proc(
 	layer: ^Layer,
 	center: [2]f32,
diff --git a/draw/text.odin b/draw/text.odin
index 7400b33..0a741b3 100644
--- a/draw/text.odin
+++ b/draw/text.odin
@@ -246,7 +246,7 @@ bottom_right_of_text :: proc(text_string: string, font_id: Font_Id, font_size: u
 // After calling this, subsequent text draws with an `id` will re-create their cache entries.
 clear_text_cache :: proc() {
 	for _, sdl_text in GLOB.text_cache.cache {
-		sdl_ttf.DestroyText(sdl_text)
+		append(&GLOB.pending_text_releases, sdl_text)
 	}
 	clear(&GLOB.text_cache.cache)
 }
@@ -259,7 +259,7 @@ clear_text_cache_entry :: proc(id: u32) {
 	key := Cache_Key{id, .Custom}
 	sdl_text, ok := GLOB.text_cache.cache[key]
 	if ok {
-		sdl_ttf.DestroyText(sdl_text)
+		append(&GLOB.pending_text_releases, sdl_text)
 		delete_key(&GLOB.text_cache.cache, key)
 	}
 }
diff --git a/draw/textures.odin b/draw/textures.odin
new file mode 100644
index 0000000..64f636d
--- /dev/null
+++ b/draw/textures.odin
@@ -0,0 +1,433 @@
+package draw
+
+import "core:log"
+import "core:mem"
+import sdl "vendor:sdl3"
+
+// ---------------------------------------------------------------------------
+// Texture types
+// ---------------------------------------------------------------------------
+
+Texture_Id :: distinct u32
+INVALID_TEXTURE :: Texture_Id(0) // Slot 0 is reserved/unused
+
+Texture_Kind :: enum u8 {
+	Static, // Uploaded once, never changes (QR codes, decoded PNGs, icons)
+	Dynamic, // Updatable via update_texture_region
+	Stream, // Frequent full re-uploads (video, procedural)
+}
+
+Sampler_Preset :: enum u8 {
+	Nearest_Clamp,
+	Linear_Clamp,
+	Nearest_Repeat,
+	Linear_Repeat,
+}
+
+SAMPLER_PRESET_COUNT :: 4
+
+Fit_Mode :: enum u8 {
+	Stretch, // Fill rect, may distort aspect ratio (default)
+	Fit, // Preserve aspect, letterbox (may leave margins)
+	Fill, // Preserve aspect, center-crop (may crop edges)
+	Tile, // Repeat at native texture size
+	Center, // 1:1 pixel size, centered, no scaling
+}
+
+Texture_Desc :: struct {
+	width:           u32,
+	height:          u32,
+	depth_or_layers: u32,
+	type:            sdl.GPUTextureType,
+	format:          sdl.GPUTextureFormat,
+	usage:           sdl.GPUTextureUsageFlags,
+	mip_levels:      u32,
+	kind:            Texture_Kind,
+}
+
+// Internal slot — not exported.
+@(private)
+Texture_Slot :: struct {
+	gpu_texture: ^sdl.GPUTexture,
+	desc:        Texture_Desc,
+	generation:  u32,
+}
+
+// State stored in GLOB
+// This file references:
+//   GLOB.device                 : ^sdl.GPUDevice
+//   GLOB.texture_slots          : [dynamic]Texture_Slot
+//   GLOB.texture_free_list      : [dynamic]u32
+//   GLOB.pending_texture_releases : [dynamic]Texture_Id
+//   GLOB.samplers               : [SAMPLER_PRESET_COUNT]^sdl.GPUSampler
+
+// ---------------------------------------------------------------------------
+// Clay integration type
+// ---------------------------------------------------------------------------
+
+Clay_Image_Data :: struct {
+	texture_id: Texture_Id,
+	fit:        Fit_Mode,
+	tint:       Color,
+}
+
+clay_image_data :: proc(id: Texture_Id, fit: Fit_Mode = .Stretch, tint: Color = WHITE) -> Clay_Image_Data {
+	return {texture_id = id, fit = fit, tint = tint}
+}
+
+// ---------------------------------------------------------------------------
+// Registration
+// ---------------------------------------------------------------------------
+
+// Register a texture. Draw owns the GPU resource and releases it on unregister.
+// `data` is tightly-packed row-major bytes matching desc.format.
+// The caller may free `data` immediately after this proc returns.
+@(require_results)
+register_texture :: proc(desc: Texture_Desc, data: []u8) -> (id: Texture_Id, ok: bool) {
+	device := GLOB.device
+	if device == nil {
+		log.error("register_texture called before draw.init()")
+		return INVALID_TEXTURE, false
+	}
+
+	assert(desc.width > 0, "Texture_Desc.width must be > 0")
+	assert(desc.height > 0, "Texture_Desc.height must be > 0")
+	assert(desc.depth_or_layers > 0, "Texture_Desc.depth_or_layers must be > 0")
+	assert(desc.mip_levels > 0, "Texture_Desc.mip_levels must be > 0")
+	assert(desc.usage != {}, "Texture_Desc.usage must not be empty (e.g. {.SAMPLER})")
+
+	// Create the GPU texture
+	gpu_texture := sdl.CreateGPUTexture(
+		device,
+		sdl.GPUTextureCreateInfo {
+			type = desc.type,
+			format = desc.format,
+			usage = desc.usage,
+			width = desc.width,
+			height = desc.height,
+			layer_count_or_depth = desc.depth_or_layers,
+			num_levels = desc.mip_levels,
+			sample_count = ._1,
+		},
+	)
+	if gpu_texture == nil {
+		log.errorf("Failed to create GPU texture (%dx%d): %s", desc.width, desc.height, sdl.GetError())
+		return INVALID_TEXTURE, false
+	}
+
+	// Upload pixel data via a transfer buffer
+	if len(data) > 0 {
+		data_size := u32(len(data))
+		transfer := sdl.CreateGPUTransferBuffer(
+			device,
+			sdl.GPUTransferBufferCreateInfo{usage = .UPLOAD, size = data_size},
+		)
+		if transfer == nil {
+			log.errorf("Failed to create texture transfer buffer: %s", sdl.GetError())
+			sdl.ReleaseGPUTexture(device, gpu_texture)
+			return INVALID_TEXTURE, false
+		}
+		defer sdl.ReleaseGPUTransferBuffer(device, transfer)
+
+		mapped := sdl.MapGPUTransferBuffer(device, transfer, false)
+		if mapped == nil {
+			log.errorf("Failed to map texture transfer buffer: %s", sdl.GetError())
+			sdl.ReleaseGPUTexture(device, gpu_texture)
+			return INVALID_TEXTURE, false
+		}
+		mem.copy(mapped, raw_data(data), int(data_size))
+		sdl.UnmapGPUTransferBuffer(device, transfer)
+
+		cmd_buffer := sdl.AcquireGPUCommandBuffer(device)
+		if cmd_buffer == nil {
+			log.errorf("Failed to acquire command buffer for texture upload: %s", sdl.GetError())
+			sdl.ReleaseGPUTexture(device, gpu_texture)
+			return INVALID_TEXTURE, false
+		}
+		copy_pass := sdl.BeginGPUCopyPass(cmd_buffer)
+		sdl.UploadToGPUTexture(
+			copy_pass,
+			sdl.GPUTextureTransferInfo{transfer_buffer = transfer},
+			sdl.GPUTextureRegion{texture = gpu_texture, w = desc.width, h = desc.height, d = desc.depth_or_layers},
+			false,
+		)
+		sdl.EndGPUCopyPass(copy_pass)
+		if !sdl.SubmitGPUCommandBuffer(cmd_buffer) {
+			log.errorf("Failed to submit texture upload: %s", sdl.GetError())
+			sdl.ReleaseGPUTexture(device, gpu_texture)
+			return INVALID_TEXTURE, false
+		}
+	}
+
+	// Allocate a slot (reuse from free list or append)
+	slot_index: u32
+	if len(GLOB.texture_free_list) > 0 {
+		slot_index = pop(&GLOB.texture_free_list)
+		GLOB.texture_slots[slot_index] = Texture_Slot {
+			gpu_texture = gpu_texture,
+			desc        = desc,
+			generation  = GLOB.texture_slots[slot_index].generation + 1,
+		}
+	} else {
+		slot_index = u32(len(GLOB.texture_slots))
+		append(&GLOB.texture_slots, Texture_Slot{gpu_texture = gpu_texture, desc = desc, generation = 1})
+	}
+
+	return Texture_Id(slot_index), true
+}
+
+// Queue a texture for release at the end of the current frame.
+// The GPU resource is not freed immediately — see "Deferred release" in the README.
+unregister_texture :: proc(id: Texture_Id) {
+	if id == INVALID_TEXTURE do return
+	append(&GLOB.pending_texture_releases, id)
+}
+
+// Re-upload a sub-region of a Dynamic texture.
+update_texture_region :: proc(id: Texture_Id, region: Rectangle, data: []u8) {
+	if id == INVALID_TEXTURE do return
+	slot := &GLOB.texture_slots[u32(id)]
+	if slot.gpu_texture == nil do return
+
+	device := GLOB.device
+	data_size := u32(len(data))
+	if data_size == 0 do return
+
+	transfer := sdl.CreateGPUTransferBuffer(
+		device,
+		sdl.GPUTransferBufferCreateInfo{usage = .UPLOAD, size = data_size},
+	)
+	if transfer == nil {
+		log.errorf("Failed to create transfer buffer for texture region update: %s", sdl.GetError())
+		return
+	}
+	defer sdl.ReleaseGPUTransferBuffer(device, transfer)
+
+	mapped := sdl.MapGPUTransferBuffer(device, transfer, false)
+	if mapped == nil {
+		log.errorf("Failed to map transfer buffer for texture region update: %s", sdl.GetError())
+		return
+	}
+	mem.copy(mapped, raw_data(data), int(data_size))
+	sdl.UnmapGPUTransferBuffer(device, transfer)
+
+	cmd_buffer := sdl.AcquireGPUCommandBuffer(device)
+	if cmd_buffer == nil {
+		log.errorf("Failed to acquire command buffer for texture region update: %s", sdl.GetError())
+		return
+	}
+	copy_pass := sdl.BeginGPUCopyPass(cmd_buffer)
+	sdl.UploadToGPUTexture(
+		copy_pass,
+		sdl.GPUTextureTransferInfo{transfer_buffer = transfer},
+		sdl.GPUTextureRegion {
+			texture = slot.gpu_texture,
+			x = u32(region.x),
+			y = u32(region.y),
+			w = u32(region.width),
+			h = u32(region.height),
+			d = 1,
+		},
+		false,
+	)
+	sdl.EndGPUCopyPass(copy_pass)
+	if !sdl.SubmitGPUCommandBuffer(cmd_buffer) {
+		log.errorf("Failed to submit texture region update: %s", sdl.GetError())
+	}
+}
+
+// ---------------------------------------------------------------------------
+// Accessors
+// ---------------------------------------------------------------------------
+
+texture_size :: proc(id: Texture_Id) -> [2]u32 {
+	if id == INVALID_TEXTURE do return {0, 0}
+	slot := &GLOB.texture_slots[u32(id)]
+	return {slot.desc.width, slot.desc.height}
+}
+
+texture_format :: proc(id: Texture_Id) -> sdl.GPUTextureFormat {
+	if id == INVALID_TEXTURE do return .INVALID
+	return GLOB.texture_slots[u32(id)].desc.format
+}
+
+texture_kind :: proc(id: Texture_Id) -> Texture_Kind {
+	if id == INVALID_TEXTURE do return .Static
+	return GLOB.texture_slots[u32(id)].desc.kind
+}
+
+// Internal: get the raw GPU texture pointer for binding during draw.
+@(private)
+texture_gpu_handle :: proc(id: Texture_Id) -> ^sdl.GPUTexture {
+	if id == INVALID_TEXTURE do return nil
+	idx := u32(id)
+	if idx >= u32(len(GLOB.texture_slots)) do return nil
+	return GLOB.texture_slots[idx].gpu_texture
+}
+
+// ---------------------------------------------------------------------------
+// Deferred release (called from draw.end / clear_global)
+// ---------------------------------------------------------------------------
+
+@(private)
+process_pending_texture_releases :: proc() {
+	device := GLOB.device
+	for id in GLOB.pending_texture_releases {
+		idx := u32(id)
+		if idx >= u32(len(GLOB.texture_slots)) do continue
+		slot := &GLOB.texture_slots[idx]
+		if slot.gpu_texture != nil {
+			sdl.ReleaseGPUTexture(device, slot.gpu_texture)
+			slot.gpu_texture = nil
+		}
+		slot.generation += 1
+		append(&GLOB.texture_free_list, idx)
+	}
+	clear(&GLOB.pending_texture_releases)
+}
+
+// ---------------------------------------------------------------------------
+// Sampler pool
+// ---------------------------------------------------------------------------
+
+@(private)
+get_sampler :: proc(preset: Sampler_Preset) -> ^sdl.GPUSampler {
+	idx := int(preset)
+	if GLOB.samplers[idx] != nil do return GLOB.samplers[idx]
+
+	// Lazily create
+	min_filter, mag_filter: sdl.GPUFilter
+	address_mode: sdl.GPUSamplerAddressMode
+
+	switch preset {
+	case .Nearest_Clamp:
+		min_filter = .NEAREST; mag_filter = .NEAREST; address_mode = .CLAMP_TO_EDGE
+	case .Linear_Clamp:
+		min_filter = .LINEAR; mag_filter = .LINEAR; address_mode = .CLAMP_TO_EDGE
+	case .Nearest_Repeat:
+		min_filter = .NEAREST; mag_filter = .NEAREST; address_mode = .REPEAT
+	case .Linear_Repeat:
+		min_filter = .LINEAR; mag_filter = .LINEAR; address_mode = .REPEAT
+	}
+
+	sampler := sdl.CreateGPUSampler(
+		GLOB.device,
+		sdl.GPUSamplerCreateInfo {
+			min_filter = min_filter,
+			mag_filter = mag_filter,
+			mipmap_mode = .LINEAR,
+			address_mode_u = address_mode,
+			address_mode_v = address_mode,
+			address_mode_w = address_mode,
+		},
+	)
+	if sampler == nil {
+		log.errorf("Failed to create sampler preset %v: %s", preset, sdl.GetError())
+		return GLOB.pipeline_2d_base.sampler // fallback to existing default sampler
+	}
+
+	GLOB.samplers[idx] = sampler
+	return sampler
+}
+
+// Internal: destroy all sampler pool entries. Called from draw.destroy().
+@(private)
+destroy_sampler_pool :: proc() {
+	device := GLOB.device
+	for &s in GLOB.samplers {
+		if s != nil {
+			sdl.ReleaseGPUSampler(device, s)
+			s = nil
+		}
+	}
+}
+
+// Internal: destroy all registered textures. Called from draw.destroy().
+@(private)
+destroy_all_textures :: proc() {
+	device := GLOB.device
+	for &slot in GLOB.texture_slots {
+		if slot.gpu_texture != nil {
+			sdl.ReleaseGPUTexture(device, slot.gpu_texture)
+			slot.gpu_texture = nil
+		}
+	}
+	delete(GLOB.texture_slots)
+	delete(GLOB.texture_free_list)
+	delete(GLOB.pending_texture_releases)
+}
+
+// ---------------------------------------------------------------------------
+// Fit mode helper
+// ---------------------------------------------------------------------------
+
+// Compute UV rect, recommended sampler, and inner rect for a given fit mode.
+// `rect` is the target drawing area; `texture_id` identifies the texture whose
+// pixel dimensions are looked up via texture_size().
+// For Fit mode, `inner_rect` is smaller than `rect` (centered). For all other modes, `inner_rect == rect`.
+fit_params :: proc(
+	fit: Fit_Mode,
+	rect: Rectangle,
+	texture_id: Texture_Id,
+) -> (
+	uv_rect: Rectangle,
+	sampler: Sampler_Preset,
+	inner_rect: Rectangle,
+) {
+	size := texture_size(texture_id)
+	texture_width := f32(size.x)
+	texture_height := f32(size.y)
+	rect_width := rect.width
+	rect_height := rect.height
+	inner_rect = rect
+
+	if texture_width == 0 || texture_height == 0 || rect_width == 0 || rect_height == 0 {
+		return {0, 0, 1, 1}, .Linear_Clamp, inner_rect
+	}
+
+	texture_aspect := texture_width / texture_height
+	rect_aspect := rect_width / rect_height
+
+	switch fit {
+	case .Stretch: return {0, 0, 1, 1}, .Linear_Clamp, inner_rect
+
+	case .Fill: if texture_aspect > rect_aspect {
+				// Texture wider than rect — crop sides
+				scale := rect_aspect / texture_aspect
+				margin := (1 - scale) * 0.5
+				return {margin, 0, 1 - margin, 1}, .Linear_Clamp, inner_rect
+			} else {
+				// Texture taller than rect — crop top/bottom
+				scale := texture_aspect / rect_aspect
+				margin := (1 - scale) * 0.5
+				return {0, margin, 1, 1 - margin}, .Linear_Clamp, inner_rect
+			}
+
+	case .Fit:
+		// Preserve aspect, fit inside rect. Returns a shrunken inner_rect.
+		if texture_aspect > rect_aspect {
+			// Image wider — letterbox top/bottom
+			fit_height := rect_width / texture_aspect
+			padding := (rect_height - fit_height) * 0.5
+			inner_rect = Rectangle{rect.x, rect.y + padding, rect_width, fit_height}
+		} else {
+			// Image taller — letterbox left/right
+			fit_width := rect_height * texture_aspect
+			padding := (rect_width - fit_width) * 0.5
+			inner_rect = Rectangle{rect.x + padding, rect.y, fit_width, rect_height}
+		}
+		return {0, 0, 1, 1}, .Linear_Clamp, inner_rect
+
+	case .Tile:
+		uv_width := rect_width / texture_width
+		uv_height := rect_height / texture_height
+		return {0, 0, uv_width, uv_height}, .Linear_Repeat, inner_rect
+
+	case .Center:
+		u_half := rect_width / (2 * texture_width)
+		v_half := rect_height / (2 * texture_height)
+		return {0.5 - u_half, 0.5 - v_half, 0.5 + u_half, 0.5 + v_half}, .Nearest_Clamp, inner_rect
+	}
+
+	return {0, 0, 1, 1}, .Linear_Clamp, inner_rect
+}
-- 
2.43.0


From ba522fa051143ce15a1371918ee043e902ac07e2 Mon Sep 17 00:00:00 2001
From: Zachary Levy <zachary@sunforge.is>
Date: Tue, 21 Apr 2026 15:35:55 -0700
Subject: [PATCH 3/5] QR code improvements

---
 draw/draw_qr/draw_qr.odin   | 216 +++++++++++++++++++++++++++---------
 draw/examples/textures.odin |  12 +-
 qrcode/generate.odin        | 128 +++++++++++++--------
 3 files changed, 247 insertions(+), 109 deletions(-)

diff --git a/draw/draw_qr/draw_qr.odin b/draw/draw_qr/draw_qr.odin
index 9fb3a0f..e5b1d84 100644
--- a/draw/draw_qr/draw_qr.odin
+++ b/draw/draw_qr/draw_qr.odin
@@ -3,76 +3,188 @@ package draw_qr
 import draw ".."
 import "../../qrcode"
 
-// A registered QR code texture, ready for display via draw.rectangle_texture.
-QR :: struct {
-	texture_id: draw.Texture_Id,
-	size:       int, // modules per side (e.g. 21..177)
+// -----------------------------------------------------------------------------
+// Layer 1 — pure: encoded QR buffer → RGBA pixels + descriptor
+// -----------------------------------------------------------------------------
+
+// Returns the number of bytes to_texture will write for the given encoded
+// QR buffer. Equivalent to size*size*4 where size = qrcode.get_size(qrcode_buf).
+texture_size :: #force_inline proc(qrcode_buf: []u8) -> int {
+	size := qrcode.get_size(qrcode_buf)
+	return size * size * 4
 }
 
-// Encode text as a QR code and register the result as an R8 texture.
-// The texture uses Nearest_Clamp sampling by default (sharp module edges).
-// Returns ok=false if encoding or registration fails.
+// Decodes an encoded QR buffer into tightly-packed RGBA pixel data written to
+// texture_buf. No allocations, no GPU calls. Returns the Texture_Desc the
+// caller should pass to draw.register_texture alongside texture_buf.
+//
+// Returns ok=false when:
+//   - qrcode_buf is invalid (qrcode.get_size returns 0).
+//   - texture_buf is smaller than texture_size(qrcode_buf).
 @(require_results)
-create_from_text :: proc(
+to_texture :: proc(
+	qrcode_buf: []u8,
+	texture_buf: []u8,
+	dark: draw.Color = draw.BLACK,
+	light: draw.Color = draw.WHITE,
+) -> (
+	desc: draw.Texture_Desc,
+	ok: bool,
+) {
+	size := qrcode.get_size(qrcode_buf)
+	if size == 0 do return {}, false
+	if len(texture_buf) < size * size * 4 do return {}, false
+
+	for y in 0 ..< size {
+		for x in 0 ..< size {
+			i := (y * size + x) * 4
+			c := dark if qrcode.get_module(qrcode_buf, x, y) else light
+			texture_buf[i + 0] = c[0]
+			texture_buf[i + 1] = c[1]
+			texture_buf[i + 2] = c[2]
+			texture_buf[i + 3] = c[3]
+		}
+	}
+
+	return draw.Texture_Desc {
+			width = u32(size),
+			height = u32(size),
+			depth_or_layers = 1,
+			type = .D2,
+			format = .R8G8B8A8_UNORM,
+			usage = {.SAMPLER},
+			mip_levels = 1,
+			kind = .Static,
+		},
+		true
+}
+
+// -----------------------------------------------------------------------------
+// Layer 2 — raw: pre-encoded QR buffer → registered GPU texture
+// -----------------------------------------------------------------------------
+
+// Allocates pixel buffer via temp_allocator, decodes qrcode_buf into it, and
+// registers with the GPU. The pixel allocation is freed before return.
+//
+// Returns ok=false when:
+//   - qrcode_buf is invalid (qrcode.get_size returns 0).
+//   - temp_allocator fails to allocate the pixel buffer.
+//   - GPU texture registration fails.
+@(require_results)
+register_texture_from_raw :: proc(
+	qrcode_buf: []u8,
+	dark: draw.Color = draw.BLACK,
+	light: draw.Color = draw.WHITE,
+	temp_allocator := context.temp_allocator,
+) -> (
+	texture: draw.Texture_Id,
+	ok: bool,
+) {
+	tex_size := texture_size(qrcode_buf)
+	if tex_size == 0 do return draw.INVALID_TEXTURE, false
+
+	pixels, alloc_err := make([]u8, tex_size, temp_allocator)
+	if alloc_err != nil do return draw.INVALID_TEXTURE, false
+	defer delete(pixels, temp_allocator)
+
+	desc := to_texture(qrcode_buf, pixels, dark, light) or_return
+	return draw.register_texture(desc, pixels)
+}
+
+// -----------------------------------------------------------------------------
+// Layer 3 — text → registered GPU texture
+// -----------------------------------------------------------------------------
+
+// Encodes text as a QR Code and registers the result as an RGBA texture.
+//
+// Returns ok=false when:
+//   - temp_allocator fails to allocate.
+//   - The text cannot fit in any version within [min_version, max_version] at the given ECL.
+//   - GPU texture registration fails.
+@(require_results)
+register_texture_from_text :: proc(
 	text: string,
 	ecl: qrcode.Ecc = .Low,
 	min_version: int = qrcode.VERSION_MIN,
 	max_version: int = qrcode.VERSION_MAX,
 	mask: Maybe(qrcode.Mask) = nil,
 	boost_ecl: bool = true,
+	dark: draw.Color = draw.BLACK,
+	light: draw.Color = draw.WHITE,
+	temp_allocator := context.temp_allocator,
 ) -> (
-	qr: QR,
+	texture: draw.Texture_Id,
 	ok: bool,
 ) {
-	qrcode_buf: [qrcode.BUFFER_LEN_MAX]u8
-	encode_ok := qrcode.encode(text, qrcode_buf[:], ecl, min_version, max_version, mask, boost_ecl)
-	if !encode_ok do return {}, false
-	return create(qrcode_buf[:])
+	qrcode_buf, alloc_err := make([]u8, qrcode.buffer_len_for_version(max_version), temp_allocator)
+	if alloc_err != nil do return draw.INVALID_TEXTURE, false
+	defer delete(qrcode_buf, temp_allocator)
+
+	qrcode.encode_auto(
+		text,
+		qrcode_buf,
+		ecl,
+		min_version,
+		max_version,
+		mask,
+		boost_ecl,
+		temp_allocator,
+	) or_return
+
+	return register_texture_from_raw(qrcode_buf, dark, light, temp_allocator)
 }
 
-// Register an already-encoded QR code buffer as an R8 texture.
-// qrcode_buf must be the output of qrcode.encode (byte 0 = side length, remaining = bit-packed modules).
+// -----------------------------------------------------------------------------
+// Layer 4 — binary → registered GPU texture
+// -----------------------------------------------------------------------------
+
+// Encodes arbitrary binary data as a QR Code and registers the result as an RGBA texture.
+//
+// Returns ok=false when:
+//   - temp_allocator fails to allocate.
+//   - The payload cannot fit in any version within [min_version, max_version] at the given ECL.
+//   - GPU texture registration fails.
 @(require_results)
-create :: proc(qrcode_buf: []u8) -> (qr: QR, ok: bool) {
-	size := qrcode.get_size(qrcode_buf)
-	if size == 0 do return {}, false
+register_texture_from_binary :: proc(
+	bin_data: []u8,
+	ecl: qrcode.Ecc = .Low,
+	min_version: int = qrcode.VERSION_MIN,
+	max_version: int = qrcode.VERSION_MAX,
+	mask: Maybe(qrcode.Mask) = nil,
+	boost_ecl: bool = true,
+	dark: draw.Color = draw.BLACK,
+	light: draw.Color = draw.WHITE,
+	temp_allocator := context.temp_allocator,
+) -> (
+	texture: draw.Texture_Id,
+	ok: bool,
+) {
+	qrcode_buf, alloc_err := make([]u8, qrcode.buffer_len_for_version(max_version), temp_allocator)
+	if alloc_err != nil do return draw.INVALID_TEXTURE, false
+	defer delete(qrcode_buf, temp_allocator)
 
-	// Build R8 pixel buffer: 0 = light, 255 = dark
-	pixels := make([]u8, size * size, context.temp_allocator)
-	for y in 0 ..< size {
-		for x in 0 ..< size {
-			pixels[y * size + x] = 255 if qrcode.get_module(qrcode_buf, x, y) else 0
-		}
-	}
+	qrcode.encode_auto(
+		bin_data,
+		qrcode_buf,
+		ecl,
+		min_version,
+		max_version,
+		mask,
+		boost_ecl,
+		temp_allocator,
+	) or_return
 
-	id, reg_ok := draw.register_texture(
-		draw.Texture_Desc {
-			width = u32(size),
-			height = u32(size),
-			depth_or_layers = 1,
-			type = .D2,
-			format = .R8_UNORM,
-			usage = {.SAMPLER},
-			mip_levels = 1,
-			kind = .Static,
-		},
-		pixels,
-	)
-	if !reg_ok do return {}, false
-
-	return QR{texture_id = id, size = size}, true
+	return register_texture_from_raw(qrcode_buf, dark, light, temp_allocator)
 }
 
-// Release the GPU texture.
-destroy :: proc(qr: ^QR) {
-	draw.unregister_texture(qr.texture_id)
-	qr.texture_id = draw.INVALID_TEXTURE
-	qr.size = 0
-}
+// -----------------------------------------------------------------------------
+// Clay integration helper
+// -----------------------------------------------------------------------------
 
-// Convenience: build a Clay_Image_Data for embedding a QR in Clay layouts.
-// Uses Nearest_Clamp sampling (set via Sampler_Preset at draw time, not here) and Fit mode
-// to preserve the QR's square aspect ratio.
-clay_image :: proc(qr: QR, tint: draw.Color = draw.WHITE) -> draw.Clay_Image_Data {
-	return draw.clay_image_data(qr.texture_id, fit = .Fit, tint = tint)
+// Default fit=.Fit preserves the QR's square aspect; override as needed.
+clay_image :: #force_inline proc(
+	texture: draw.Texture_Id,
+	tint: draw.Color = draw.WHITE,
+) -> draw.Clay_Image_Data {
+	return draw.clay_image_data(texture, fit = .Fit, tint = tint)
 }
diff --git a/draw/examples/textures.odin b/draw/examples/textures.odin
index ca53ba3..a89be7d 100644
--- a/draw/examples/textures.odin
+++ b/draw/examples/textures.odin
@@ -79,8 +79,8 @@ textures :: proc() {
 	// -------------------------------------------------------------------------
 	// QR code texture (R8_UNORM — see rendering note below)
 	// -------------------------------------------------------------------------
-	qr, _ := draw_qr.create_from_text("https://odin-lang.org/")
-	defer draw_qr.destroy(&qr)
+	qr_texture, _ := draw_qr.register_texture_from_text("https://x.com/miiilato/status/1880241066471051443")
+	defer draw.unregister_texture(qr_texture)
 
 	spin_angle: f32 = 0
 
@@ -161,16 +161,12 @@ textures :: proc() {
 		// =====================================================================
 		ROW2_Y :: f32(190)
 
-		// QR code (R8_UNORM texture, nearest sampling)
-		// NOTE: R8_UNORM samples as (r, 0, 0, 1) in Metal's default swizzle.
-		// With WHITE tint: dark modules (R=1) → red, light modules (R=0) → black.
-		// The result is a red-on-black QR code. The white bg rect below is
-		// occluded by the fully-opaque texture but kept for illustration.
+		// QR code (RGBA texture with baked colors, nearest sampling)
 		draw.rectangle(base_layer, {COL1, ROW2_Y, ITEM_SIZE, ITEM_SIZE}, {255, 255, 255, 255}) // white bg
 		draw.rectangle_texture(
 			base_layer,
 			{COL1, ROW2_Y, ITEM_SIZE, ITEM_SIZE},
-			qr.texture_id,
+			qr_texture,
 			sampler = .Nearest_Clamp,
 		)
 		draw.text(
diff --git a/qrcode/generate.odin b/qrcode/generate.odin
index 9014b56..8261021 100644
--- a/qrcode/generate.odin
+++ b/qrcode/generate.odin
@@ -117,7 +117,7 @@ NUM_ERROR_CORRECTION_BLOCKS := [4][41]i8{
 //   - The text cannot fit in any version within [min_version, max_version] at the given ECL.
 //   - The encoded segment data exceeds the buffer capacity.
 @(require_results)
-encode_text_explicit_temp :: proc(
+encode_text_manual :: proc(
 	text: string,
 	temp_buffer, qrcode: []u8,
 	ecl: Ecc,
@@ -130,7 +130,7 @@ encode_text_explicit_temp :: proc(
 ) {
 	text_len := len(text)
 	if text_len == 0 {
-		return encode_segments_advanced_explicit_temp(
+		return encode_segments_advanced_manual(
 			nil,
 			ecl,
 			min_version,
@@ -162,7 +162,7 @@ encode_text_explicit_temp :: proc(
 			seg.data = temp_buffer[:text_len]
 		}
 		segs := [1]Segment{seg}
-		return encode_segments_advanced_explicit_temp(
+		return encode_segments_advanced_manual(
 			segs[:],
 			ecl,
 			min_version,
@@ -211,13 +211,9 @@ encode_text_auto :: proc(
 		return false
 	}
 	defer delete(temp_buffer, temp_allocator)
-	return encode_text_explicit_temp(text, temp_buffer, qrcode, ecl, min_version, max_version, mask, boost_ecl)
+	return encode_text_manual(text, temp_buffer, qrcode, ecl, min_version, max_version, mask, boost_ecl)
 }
 
-encode_text :: proc {
-	encode_text_explicit_temp,
-	encode_text_auto,
-}
 
 // Encodes arbitrary binary data to a QR Code using byte mode.
 //
@@ -234,7 +230,7 @@ encode_text :: proc {
 // Returns ok=false when:
 //   - The payload cannot fit in any version within [min_version, max_version] at the given ECL.
 @(require_results)
-encode_binary :: proc(
+encode_binary_manual :: proc(
 	data_and_temp: []u8,
 	data_len: int,
 	qrcode: []u8,
@@ -256,7 +252,7 @@ encode_binary :: proc(
 	seg.num_chars = data_len
 	seg.data = data_and_temp[:data_len]
 	segs := [1]Segment{seg}
-	return encode_segments_advanced(
+	return encode_segments_advanced_manual(
 		segs[:],
 		ecl,
 		min_version,
@@ -268,6 +264,55 @@ encode_binary :: proc(
 	)
 }
 
+// Encodes arbitrary binary data to a QR Code using byte mode,
+// automatically allocating and freeing the temp buffer.
+//
+// Parameters:
+//   bin_data       - [in]  Payload bytes (aliased by the internal segment; not modified).
+//   qrcode         - [out] On success, contains the encoded QR Code. On failure, qrcode[0] is
+//                          set to 0.
+//   temp_allocator - Allocator used for the internal scratch buffer. Freed before return.
+//
+// qrcode must have length >= buffer_len_for_version(max_version).
+//
+// Returns ok=false when:
+//   - The payload cannot fit in any version within [min_version, max_version] at the given ECL.
+//   - The temp_allocator fails to allocate.
+@(require_results)
+encode_binary_auto :: proc(
+	bin_data: []u8,
+	qrcode: []u8,
+	ecl: Ecc,
+	min_version: int = VERSION_MIN,
+	max_version: int = VERSION_MAX,
+	mask: Maybe(Mask) = nil,
+	boost_ecl: bool = true,
+	temp_allocator := context.temp_allocator,
+) -> (
+	ok: bool,
+) {
+	seg: Segment
+	seg.mode = .Byte
+	seg.bit_length = calc_segment_bit_length(.Byte, len(bin_data))
+	if seg.bit_length == LENGTH_OVERFLOW {
+		qrcode[0] = 0
+		return false
+	}
+	seg.num_chars = len(bin_data)
+	seg.data = bin_data
+	segs := [1]Segment{seg}
+	return encode_segments_advanced_auto(
+		segs[:],
+		ecl,
+		min_version,
+		max_version,
+		mask,
+		boost_ecl,
+		qrcode,
+		temp_allocator,
+	)
+}
+
 // Encodes the given segments to a QR Code using default parameters
 // (VERSION_MIN..VERSION_MAX, auto mask, boost ECL).
 //
@@ -282,17 +327,8 @@ encode_binary :: proc(
 // Returns ok=false when:
 //   - The total segment data exceeds the capacity of version 40 at the given ECL.
 @(require_results)
-encode_segments_explicit_temp :: proc(segs: []Segment, ecl: Ecc, temp_buffer, qrcode: []u8) -> (ok: bool) {
-	return encode_segments_advanced_explicit_temp(
-		segs,
-		ecl,
-		VERSION_MIN,
-		VERSION_MAX,
-		nil,
-		true,
-		temp_buffer,
-		qrcode,
-	)
+encode_segments_manual :: proc(segs: []Segment, ecl: Ecc, temp_buffer, qrcode: []u8) -> (ok: bool) {
+	return encode_segments_advanced_manual(segs, ecl, VERSION_MIN, VERSION_MAX, nil, true, temp_buffer, qrcode)
 }
 
 // Encodes segments to a QR Code using default parameters, automatically allocating the temp buffer.
@@ -328,13 +364,9 @@ encode_segments_auto :: proc(
 		return false
 	}
 	defer delete(temp_buffer, temp_allocator)
-	return encode_segments_explicit_temp(segs, ecl, temp_buffer, qrcode)
+	return encode_segments_manual(segs, ecl, temp_buffer, qrcode)
 }
 
-encode_segments :: proc {
-	encode_segments_explicit_temp,
-	encode_segments_auto,
-}
 
 // Encodes the given segments to a QR Code with full control over version range, mask, and ECL boosting.
 //
@@ -353,7 +385,7 @@ encode_segments :: proc {
 //   - The total segment data exceeds the capacity of every version in [min_version, max_version]
 //     at the given ECL.
 @(require_results)
-encode_segments_advanced_explicit_temp :: proc(
+encode_segments_advanced_manual :: proc(
 	segs: []Segment,
 	ecl: Ecc,
 	min_version, max_version: int,
@@ -490,7 +522,7 @@ encode_segments_advanced_auto :: proc(
 		return false
 	}
 	defer delete(temp_buffer, temp_allocator)
-	return encode_segments_advanced_explicit_temp(
+	return encode_segments_advanced_manual(
 		segs,
 		ecl,
 		min_version,
@@ -502,18 +534,17 @@ encode_segments_advanced_auto :: proc(
 	)
 }
 
-encode_segments_advanced :: proc {
-	encode_segments_advanced_explicit_temp,
-	encode_segments_advanced_auto,
+encode_manual :: proc {
+	encode_text_manual,
+	encode_binary_manual,
+	encode_segments_manual,
+	encode_segments_advanced_manual,
 }
 
-encode :: proc {
-	encode_text_explicit_temp,
+encode_auto :: proc {
 	encode_text_auto,
-	encode_binary,
-	encode_segments_explicit_temp,
+	encode_binary_auto,
 	encode_segments_auto,
-	encode_segments_advanced_explicit_temp,
 	encode_segments_advanced_auto,
 }
 
@@ -981,7 +1012,7 @@ min_buffer_size :: proc {
 	min_buffer_size_segments,
 }
 
-// Text path: auto-selects numeric/alphanumeric/byte mode the same way encode_text does.
+// Text path: auto-selects numeric/alphanumeric/byte mode the same way encode_text_manual does.
 //
 // Returns ok=false when:
 //   - The text exceeds QR Code capacity for every version in the range at the given ECL.
@@ -1162,7 +1193,6 @@ calc_segment_buffer_size :: proc(mode: Mode, num_chars: int) -> int {
 	return (temp + 7) / 8
 }
 
-@(private)
 calc_segment_bit_length :: proc(mode: Mode, num_chars: int) -> int {
 	if num_chars < 0 || num_chars > 32767 {
 		return LENGTH_OVERFLOW
@@ -2487,7 +2517,7 @@ test_min_buffer_size_text :: proc(t: ^testing.T) {
 		testing.expect(t, planned > 0)
 		qrcode: [BUFFER_LEN_MAX]u8
 		temp: [BUFFER_LEN_MAX]u8
-		ok := encode_text(text, temp[:], qrcode[:], Ecc.Low)
+		ok := encode_text_manual(text, temp[:], qrcode[:], Ecc.Low)
 		testing.expect(t, ok)
 		actual_version_size := get_size(qrcode[:])
 		actual_buf_len := buffer_len_for_version((actual_version_size - 17) / 4)
@@ -2538,7 +2568,7 @@ test_min_buffer_size_binary :: proc(t: ^testing.T) {
 	testing.expect(t, size > 0)
 	testing.expect(t, size <= buffer_len_for_version(2))
 
-	// Verify agreement with encode_binary
+	// Verify agreement with encode_binary_manual
 	{
 		data_len :: 100
 		planned, planned_ok := min_buffer_size(data_len, .Medium)
@@ -2549,7 +2579,7 @@ test_min_buffer_size_binary :: proc(t: ^testing.T) {
 		for i in 0 ..< data_len {
 			dat[i] = u8(i)
 		}
-		ok := encode_binary(dat[:], data_len, qrcode[:], .Medium)
+		ok := encode_binary_manual(dat[:], data_len, qrcode[:], .Medium)
 		testing.expect(t, ok)
 		actual_version_size := get_size(qrcode[:])
 		actual_buf_len := buffer_len_for_version((actual_version_size - 17) / 4)
@@ -2609,7 +2639,7 @@ test_min_buffer_size_segments :: proc(t: ^testing.T) {
 		// Verify against actual encode
 		qrcode: [BUFFER_LEN_MAX]u8
 		temp: [BUFFER_LEN_MAX]u8
-		ok := encode_segments(segs[:], Ecc.Low, temp[:], qrcode[:])
+		ok := encode_segments_manual(segs[:], Ecc.Low, temp[:], qrcode[:])
 		testing.expect(t, ok)
 		actual_version_size := get_size(qrcode[:])
 		actual_buf_len := buffer_len_for_version((actual_version_size - 17) / 4)
@@ -2631,7 +2661,7 @@ test_encode_text_auto :: proc(t: ^testing.T) {
 		text :: "Hello, world!"
 		qr_explicit: [BUFFER_LEN_MAX]u8
 		temp: [BUFFER_LEN_MAX]u8
-		ok_explicit := encode_text_explicit_temp(text, temp[:], qr_explicit[:], .Low)
+		ok_explicit := encode_text_manual(text, temp[:], qr_explicit[:], .Low)
 		testing.expect(t, ok_explicit)
 
 		qr_auto: [BUFFER_LEN_MAX]u8
@@ -2650,7 +2680,7 @@ test_encode_text_auto :: proc(t: ^testing.T) {
 		text :: "314159265358979323846264338327950288419716939937510"
 		qr_explicit: [BUFFER_LEN_MAX]u8
 		temp: [BUFFER_LEN_MAX]u8
-		ok_explicit := encode_text_explicit_temp(text, temp[:], qr_explicit[:], .Medium)
+		ok_explicit := encode_text_manual(text, temp[:], qr_explicit[:], .Medium)
 		testing.expect(t, ok_explicit)
 
 		qr_auto: [BUFFER_LEN_MAX]u8
@@ -2669,7 +2699,7 @@ test_encode_text_auto :: proc(t: ^testing.T) {
 		text :: "HELLO WORLD"
 		qr_explicit: [BUFFER_LEN_MAX]u8
 		temp: [BUFFER_LEN_MAX]u8
-		ok_explicit := encode_text_explicit_temp(text, temp[:], qr_explicit[:], .Quartile)
+		ok_explicit := encode_text_manual(text, temp[:], qr_explicit[:], .Quartile)
 		testing.expect(t, ok_explicit)
 
 		qr_auto: [BUFFER_LEN_MAX]u8
@@ -2695,7 +2725,7 @@ test_encode_text_auto :: proc(t: ^testing.T) {
 		text :: "https://www.nayuki.io/"
 		qr_explicit: [BUFFER_LEN_MAX]u8
 		temp: [BUFFER_LEN_MAX]u8
-		ok_explicit := encode_text_explicit_temp(text, temp[:], qr_explicit[:], .High, mask = .M3)
+		ok_explicit := encode_text_manual(text, temp[:], qr_explicit[:], .High, mask = .M3)
 		testing.expect(t, ok_explicit)
 
 		qr_auto: [BUFFER_LEN_MAX]u8
@@ -2732,7 +2762,7 @@ test_encode_segments_auto :: proc(t: ^testing.T) {
 
 		qr_explicit: [BUFFER_LEN_MAX]u8
 		temp: [BUFFER_LEN_MAX]u8
-		ok_explicit := encode_segments_explicit_temp(segs[:], .Low, temp[:], qr_explicit[:])
+		ok_explicit := encode_segments_manual(segs[:], .Low, temp[:], qr_explicit[:])
 		testing.expect(t, ok_explicit)
 
 		qr_auto: [BUFFER_LEN_MAX]u8
@@ -2764,7 +2794,7 @@ test_encode_segments_advanced_auto :: proc(t: ^testing.T) {
 
 		qr_explicit: [BUFFER_LEN_MAX]u8
 		temp: [BUFFER_LEN_MAX]u8
-		ok_explicit := encode_segments_advanced_explicit_temp(
+		ok_explicit := encode_segments_advanced_manual(
 			segs[:],
 			.Medium,
 			VERSION_MIN,
@@ -2795,7 +2825,7 @@ test_encode_segments_advanced_auto :: proc(t: ^testing.T) {
 
 		qr_explicit: [BUFFER_LEN_MAX]u8
 		temp: [BUFFER_LEN_MAX]u8
-		ok_explicit := encode_segments_advanced_explicit_temp(
+		ok_explicit := encode_segments_advanced_manual(
 			segs[:],
 			.High,
 			1,
-- 
2.43.0


From 7650b90d911dbe8cda13db1c522a89df4075a95d Mon Sep 17 00:00:00 2001
From: Zachary Levy <zachary@sunforge.is>
Date: Tue, 21 Apr 2026 16:09:40 -0700
Subject: [PATCH 4/5] Comment cleanup

---
 draw/draw_qr/draw_qr.odin | 20 --------------------
 1 file changed, 20 deletions(-)

diff --git a/draw/draw_qr/draw_qr.odin b/draw/draw_qr/draw_qr.odin
index e5b1d84..3567092 100644
--- a/draw/draw_qr/draw_qr.odin
+++ b/draw/draw_qr/draw_qr.odin
@@ -3,10 +3,6 @@ package draw_qr
 import draw ".."
 import "../../qrcode"
 
-// -----------------------------------------------------------------------------
-// Layer 1 — pure: encoded QR buffer → RGBA pixels + descriptor
-// -----------------------------------------------------------------------------
-
 // Returns the number of bytes to_texture will write for the given encoded
 // QR buffer. Equivalent to size*size*4 where size = qrcode.get_size(qrcode_buf).
 texture_size :: #force_inline proc(qrcode_buf: []u8) -> int {
@@ -59,10 +55,6 @@ to_texture :: proc(
 		true
 }
 
-// -----------------------------------------------------------------------------
-// Layer 2 — raw: pre-encoded QR buffer → registered GPU texture
-// -----------------------------------------------------------------------------
-
 // Allocates pixel buffer via temp_allocator, decodes qrcode_buf into it, and
 // registers with the GPU. The pixel allocation is freed before return.
 //
@@ -91,10 +83,6 @@ register_texture_from_raw :: proc(
 	return draw.register_texture(desc, pixels)
 }
 
-// -----------------------------------------------------------------------------
-// Layer 3 — text → registered GPU texture
-// -----------------------------------------------------------------------------
-
 // Encodes text as a QR Code and registers the result as an RGBA texture.
 //
 // Returns ok=false when:
@@ -134,10 +122,6 @@ register_texture_from_text :: proc(
 	return register_texture_from_raw(qrcode_buf, dark, light, temp_allocator)
 }
 
-// -----------------------------------------------------------------------------
-// Layer 4 — binary → registered GPU texture
-// -----------------------------------------------------------------------------
-
 // Encodes arbitrary binary data as a QR Code and registers the result as an RGBA texture.
 //
 // Returns ok=false when:
@@ -177,10 +161,6 @@ register_texture_from_binary :: proc(
 	return register_texture_from_raw(qrcode_buf, dark, light, temp_allocator)
 }
 
-// -----------------------------------------------------------------------------
-// Clay integration helper
-// -----------------------------------------------------------------------------
-
 // Default fit=.Fit preserves the QR's square aspect; override as needed.
 clay_image :: #force_inline proc(
 	texture: draw.Texture_Id,
-- 
2.43.0


From ea19b83ba4054448c76290623b16d0308e7a4dcf Mon Sep 17 00:00:00 2001
From: Zachary Levy <zachary@sunforge.is>
Date: Tue, 21 Apr 2026 16:16:51 -0700
Subject: [PATCH 5/5] Cleanup

---
 draw/draw_qr/draw_qr.odin     |   5 +
 draw/examples/textures.odin   |  31 ++---
 draw/textures.odin            | 251 ++++++++++++++++------------------
 qrcode/examples/examples.odin |  80 ++++-------
 qrcode/generate.odin          | 115 +++++++---------
 5 files changed, 211 insertions(+), 271 deletions(-)

diff --git a/draw/draw_qr/draw_qr.odin b/draw/draw_qr/draw_qr.odin
index 3567092..91cf532 100644
--- a/draw/draw_qr/draw_qr.odin
+++ b/draw/draw_qr/draw_qr.odin
@@ -161,6 +161,11 @@ register_texture_from_binary :: proc(
 	return register_texture_from_raw(qrcode_buf, dark, light, temp_allocator)
 }
 
+register_texture_from :: proc {
+	register_texture_from_text,
+	register_texture_from_binary,
+}
+
 // Default fit=.Fit preserves the QR's square aspect; override as needed.
 clay_image :: #force_inline proc(
 	texture: draw.Texture_Id,
diff --git a/draw/examples/textures.odin b/draw/examples/textures.odin
index a89be7d..b49b33a 100644
--- a/draw/examples/textures.odin
+++ b/draw/examples/textures.odin
@@ -2,7 +2,6 @@ package examples
 
 import "../../draw"
 import "../../draw/draw_qr"
-import "core:math"
 import "core:os"
 import sdl "vendor:sdl3"
 
@@ -17,9 +16,8 @@ textures :: proc() {
 	FONT_SIZE :: u16(14)
 	LABEL_OFFSET :: f32(8) // gap between item and its label
 
-	// -------------------------------------------------------------------------
-	// Procedural checkerboard texture (8x8, RGBA8)
-	// -------------------------------------------------------------------------
+	//----- Texture registration ----------------------------------
+
 	checker_size :: 8
 	checker_pixels: [checker_size * checker_size * 4]u8
 	for y in 0 ..< checker_size {
@@ -47,9 +45,6 @@ textures :: proc() {
 	)
 	defer draw.unregister_texture(checker_texture)
 
-	// -------------------------------------------------------------------------
-	// Non-square gradient stripe texture (16x8, RGBA8) for fit mode demos
-	// -------------------------------------------------------------------------
 	stripe_w :: 16
 	stripe_h :: 8
 	stripe_pixels: [stripe_w * stripe_h * 4]u8
@@ -76,14 +71,13 @@ textures :: proc() {
 	)
 	defer draw.unregister_texture(stripe_texture)
 
-	// -------------------------------------------------------------------------
-	// QR code texture (R8_UNORM — see rendering note below)
-	// -------------------------------------------------------------------------
-	qr_texture, _ := draw_qr.register_texture_from_text("https://x.com/miiilato/status/1880241066471051443")
+	qr_texture, _ := draw_qr.register_texture_from("https://x.com/miiilato/status/1880241066471051443")
 	defer draw.unregister_texture(qr_texture)
 
 	spin_angle: f32 = 0
 
+	//----- Draw loop ----------------------------------
+
 	for {
 		defer free_all(context.temp_allocator)
 		ev: sdl.Event
@@ -97,9 +91,8 @@ textures :: proc() {
 		// Background
 		draw.rectangle(base_layer, {0, 0, 800, 600}, {30, 30, 30, 255})
 
-		// =====================================================================
-		// Row 1: Sampler presets (y=30)
-		// =====================================================================
+		//----- Row 1: Sampler presets (y=30) ----------------------------------
+
 		ROW1_Y :: f32(30)
 		ITEM_SIZE :: f32(120)
 		COL1 :: f32(30)
@@ -156,9 +149,8 @@ textures :: proc() {
 			color = draw.WHITE,
 		)
 
-		// =====================================================================
-		// Row 2: QR code, Rounded, Rotating (y=190)
-		// =====================================================================
+		//----- Row 2: QR code, Rounded, Rotating (y=190) ----------------------------------
+
 		ROW2_Y :: f32(190)
 
 		// QR code (RGBA texture with baked colors, nearest sampling)
@@ -214,9 +206,8 @@ textures :: proc() {
 			color = draw.WHITE,
 		)
 
-		// =====================================================================
-		// Row 3: Fit modes + Per-corner radii (y=360)
-		// =====================================================================
+		//----- Row 3: Fit modes + Per-corner radii (y=360) ----------------------------------
+
 		ROW3_Y :: f32(360)
 		FIT_SIZE :: f32(120) // square target rect
 
diff --git a/draw/textures.odin b/draw/textures.odin
index 64f636d..b9e5b31 100644
--- a/draw/textures.odin
+++ b/draw/textures.odin
@@ -4,10 +4,6 @@ import "core:log"
 import "core:mem"
 import sdl "vendor:sdl3"
 
-// ---------------------------------------------------------------------------
-// Texture types
-// ---------------------------------------------------------------------------
-
 Texture_Id :: distinct u32
 INVALID_TEXTURE :: Texture_Id(0) // Slot 0 is reserved/unused
 
@@ -61,10 +57,6 @@ Texture_Slot :: struct {
 //   GLOB.pending_texture_releases : [dynamic]Texture_Id
 //   GLOB.samplers               : [SAMPLER_PRESET_COUNT]^sdl.GPUSampler
 
-// ---------------------------------------------------------------------------
-// Clay integration type
-// ---------------------------------------------------------------------------
-
 Clay_Image_Data :: struct {
 	texture_id: Texture_Id,
 	fit:        Fit_Mode,
@@ -75,9 +67,9 @@ clay_image_data :: proc(id: Texture_Id, fit: Fit_Mode = .Stretch, tint: Color =
 	return {texture_id = id, fit = fit, tint = tint}
 }
 
-// ---------------------------------------------------------------------------
-// Registration
-// ---------------------------------------------------------------------------
+// ---------------------------------------------------------------------------------------------------------------------
+// ----- Registration -------------
+// ---------------------------------------------------------------------------------------------------------------------
 
 // Register a texture. Draw owns the GPU resource and releases it on unregister.
 // `data` is tightly-packed row-major bytes matching desc.format.
@@ -236,130 +228,9 @@ update_texture_region :: proc(id: Texture_Id, region: Rectangle, data: []u8) {
 	}
 }
 
-// ---------------------------------------------------------------------------
-// Accessors
-// ---------------------------------------------------------------------------
-
-texture_size :: proc(id: Texture_Id) -> [2]u32 {
-	if id == INVALID_TEXTURE do return {0, 0}
-	slot := &GLOB.texture_slots[u32(id)]
-	return {slot.desc.width, slot.desc.height}
-}
-
-texture_format :: proc(id: Texture_Id) -> sdl.GPUTextureFormat {
-	if id == INVALID_TEXTURE do return .INVALID
-	return GLOB.texture_slots[u32(id)].desc.format
-}
-
-texture_kind :: proc(id: Texture_Id) -> Texture_Kind {
-	if id == INVALID_TEXTURE do return .Static
-	return GLOB.texture_slots[u32(id)].desc.kind
-}
-
-// Internal: get the raw GPU texture pointer for binding during draw.
-@(private)
-texture_gpu_handle :: proc(id: Texture_Id) -> ^sdl.GPUTexture {
-	if id == INVALID_TEXTURE do return nil
-	idx := u32(id)
-	if idx >= u32(len(GLOB.texture_slots)) do return nil
-	return GLOB.texture_slots[idx].gpu_texture
-}
-
-// ---------------------------------------------------------------------------
-// Deferred release (called from draw.end / clear_global)
-// ---------------------------------------------------------------------------
-
-@(private)
-process_pending_texture_releases :: proc() {
-	device := GLOB.device
-	for id in GLOB.pending_texture_releases {
-		idx := u32(id)
-		if idx >= u32(len(GLOB.texture_slots)) do continue
-		slot := &GLOB.texture_slots[idx]
-		if slot.gpu_texture != nil {
-			sdl.ReleaseGPUTexture(device, slot.gpu_texture)
-			slot.gpu_texture = nil
-		}
-		slot.generation += 1
-		append(&GLOB.texture_free_list, idx)
-	}
-	clear(&GLOB.pending_texture_releases)
-}
-
-// ---------------------------------------------------------------------------
-// Sampler pool
-// ---------------------------------------------------------------------------
-
-@(private)
-get_sampler :: proc(preset: Sampler_Preset) -> ^sdl.GPUSampler {
-	idx := int(preset)
-	if GLOB.samplers[idx] != nil do return GLOB.samplers[idx]
-
-	// Lazily create
-	min_filter, mag_filter: sdl.GPUFilter
-	address_mode: sdl.GPUSamplerAddressMode
-
-	switch preset {
-	case .Nearest_Clamp:
-		min_filter = .NEAREST; mag_filter = .NEAREST; address_mode = .CLAMP_TO_EDGE
-	case .Linear_Clamp:
-		min_filter = .LINEAR; mag_filter = .LINEAR; address_mode = .CLAMP_TO_EDGE
-	case .Nearest_Repeat:
-		min_filter = .NEAREST; mag_filter = .NEAREST; address_mode = .REPEAT
-	case .Linear_Repeat:
-		min_filter = .LINEAR; mag_filter = .LINEAR; address_mode = .REPEAT
-	}
-
-	sampler := sdl.CreateGPUSampler(
-		GLOB.device,
-		sdl.GPUSamplerCreateInfo {
-			min_filter = min_filter,
-			mag_filter = mag_filter,
-			mipmap_mode = .LINEAR,
-			address_mode_u = address_mode,
-			address_mode_v = address_mode,
-			address_mode_w = address_mode,
-		},
-	)
-	if sampler == nil {
-		log.errorf("Failed to create sampler preset %v: %s", preset, sdl.GetError())
-		return GLOB.pipeline_2d_base.sampler // fallback to existing default sampler
-	}
-
-	GLOB.samplers[idx] = sampler
-	return sampler
-}
-
-// Internal: destroy all sampler pool entries. Called from draw.destroy().
-@(private)
-destroy_sampler_pool :: proc() {
-	device := GLOB.device
-	for &s in GLOB.samplers {
-		if s != nil {
-			sdl.ReleaseGPUSampler(device, s)
-			s = nil
-		}
-	}
-}
-
-// Internal: destroy all registered textures. Called from draw.destroy().
-@(private)
-destroy_all_textures :: proc() {
-	device := GLOB.device
-	for &slot in GLOB.texture_slots {
-		if slot.gpu_texture != nil {
-			sdl.ReleaseGPUTexture(device, slot.gpu_texture)
-			slot.gpu_texture = nil
-		}
-	}
-	delete(GLOB.texture_slots)
-	delete(GLOB.texture_free_list)
-	delete(GLOB.pending_texture_releases)
-}
-
-// ---------------------------------------------------------------------------
-// Fit mode helper
-// ---------------------------------------------------------------------------
+// ---------------------------------------------------------------------------------------------------------------------
+// ----- Helpers -------------
+// ---------------------------------------------------------------------------------------------------------------------
 
 // Compute UV rect, recommended sampler, and inner rect for a given fit mode.
 // `rect` is the target drawing area; `texture_id` identifies the texture whose
@@ -431,3 +302,113 @@ fit_params :: proc(
 
 	return {0, 0, 1, 1}, .Linear_Clamp, inner_rect
 }
+
+texture_size :: proc(id: Texture_Id) -> [2]u32 {
+	if id == INVALID_TEXTURE do return {0, 0}
+	slot := &GLOB.texture_slots[u32(id)]
+	return {slot.desc.width, slot.desc.height}
+}
+
+texture_format :: proc(id: Texture_Id) -> sdl.GPUTextureFormat {
+	if id == INVALID_TEXTURE do return .INVALID
+	return GLOB.texture_slots[u32(id)].desc.format
+}
+
+texture_kind :: proc(id: Texture_Id) -> Texture_Kind {
+	if id == INVALID_TEXTURE do return .Static
+	return GLOB.texture_slots[u32(id)].desc.kind
+}
+
+// Internal: get the raw GPU texture pointer for binding during draw.
+@(private)
+texture_gpu_handle :: proc(id: Texture_Id) -> ^sdl.GPUTexture {
+	if id == INVALID_TEXTURE do return nil
+	idx := u32(id)
+	if idx >= u32(len(GLOB.texture_slots)) do return nil
+	return GLOB.texture_slots[idx].gpu_texture
+}
+
+// Deferred release (called from draw.end / clear_global)
+@(private)
+process_pending_texture_releases :: proc() {
+	device := GLOB.device
+	for id in GLOB.pending_texture_releases {
+		idx := u32(id)
+		if idx >= u32(len(GLOB.texture_slots)) do continue
+		slot := &GLOB.texture_slots[idx]
+		if slot.gpu_texture != nil {
+			sdl.ReleaseGPUTexture(device, slot.gpu_texture)
+			slot.gpu_texture = nil
+		}
+		slot.generation += 1
+		append(&GLOB.texture_free_list, idx)
+	}
+	clear(&GLOB.pending_texture_releases)
+}
+
+@(private)
+get_sampler :: proc(preset: Sampler_Preset) -> ^sdl.GPUSampler {
+	idx := int(preset)
+	if GLOB.samplers[idx] != nil do return GLOB.samplers[idx]
+
+	// Lazily create
+	min_filter, mag_filter: sdl.GPUFilter
+	address_mode: sdl.GPUSamplerAddressMode
+
+	switch preset {
+	case .Nearest_Clamp:
+		min_filter = .NEAREST; mag_filter = .NEAREST; address_mode = .CLAMP_TO_EDGE
+	case .Linear_Clamp:
+		min_filter = .LINEAR; mag_filter = .LINEAR; address_mode = .CLAMP_TO_EDGE
+	case .Nearest_Repeat:
+		min_filter = .NEAREST; mag_filter = .NEAREST; address_mode = .REPEAT
+	case .Linear_Repeat:
+		min_filter = .LINEAR; mag_filter = .LINEAR; address_mode = .REPEAT
+	}
+
+	sampler := sdl.CreateGPUSampler(
+		GLOB.device,
+		sdl.GPUSamplerCreateInfo {
+			min_filter = min_filter,
+			mag_filter = mag_filter,
+			mipmap_mode = .LINEAR,
+			address_mode_u = address_mode,
+			address_mode_v = address_mode,
+			address_mode_w = address_mode,
+		},
+	)
+	if sampler == nil {
+		log.errorf("Failed to create sampler preset %v: %s", preset, sdl.GetError())
+		return GLOB.pipeline_2d_base.sampler // fallback to existing default sampler
+	}
+
+	GLOB.samplers[idx] = sampler
+	return sampler
+}
+
+// Internal: destroy all sampler pool entries. Called from draw.destroy().
+@(private)
+destroy_sampler_pool :: proc() {
+	device := GLOB.device
+	for &s in GLOB.samplers {
+		if s != nil {
+			sdl.ReleaseGPUSampler(device, s)
+			s = nil
+		}
+	}
+}
+
+// Internal: destroy all registered textures. Called from draw.destroy().
+@(private)
+destroy_all_textures :: proc() {
+	device := GLOB.device
+	for &slot in GLOB.texture_slots {
+		if slot.gpu_texture != nil {
+			sdl.ReleaseGPUTexture(device, slot.gpu_texture)
+			slot.gpu_texture = nil
+		}
+	}
+	delete(GLOB.texture_slots)
+	delete(GLOB.texture_free_list)
+	delete(GLOB.pending_texture_releases)
+}
diff --git a/qrcode/examples/examples.odin b/qrcode/examples/examples.odin
index 4db3d59..fabca9a 100644
--- a/qrcode/examples/examples.odin
+++ b/qrcode/examples/examples.odin
@@ -73,57 +73,32 @@ main :: proc() {
 	}
 }
 
-// -------------------------------------------------------------------------------------------------
-// Utilities
-// -------------------------------------------------------------------------------------------------
-
-// Prints the given QR Code to the console.
-print_qr :: proc(qrcode: []u8) {
-	size := qr.get_size(qrcode)
-	border :: 4
-	for y in -border ..< size + border {
-		for x in -border ..< size + border {
-			fmt.print("##" if qr.get_module(qrcode, x, y) else "  ")
-		}
-		fmt.println()
-	}
-	fmt.println()
-}
-
-// -------------------------------------------------------------------------------------------------
-// Demo: Basic
-// -------------------------------------------------------------------------------------------------
-
 // Creates a single QR Code, then prints it to the console.
 basic :: proc() {
 	text :: "Hello, world!"
 	ecl :: qr.Ecc.Low
 
 	qrcode: [qr.BUFFER_LEN_MAX]u8
-	ok := qr.encode(text, qrcode[:], ecl)
+	ok := qr.encode_auto(text, qrcode[:], ecl)
 	if ok do print_qr(qrcode[:])
 }
 
-// -------------------------------------------------------------------------------------------------
-// Demo: Variety
-// -------------------------------------------------------------------------------------------------
-
 // Creates a variety of QR Codes that exercise different features of the library.
 variety :: proc() {
 	qrcode: [qr.BUFFER_LEN_MAX]u8
 
 	{ 	// Numeric mode encoding (3.33 bits per digit)
-		ok := qr.encode("314159265358979323846264338327950288419716939937510", qrcode[:], qr.Ecc.Medium)
+		ok := qr.encode_auto("314159265358979323846264338327950288419716939937510", qrcode[:], qr.Ecc.Medium)
 		if ok do print_qr(qrcode[:])
 	}
 
 	{ 	// Alphanumeric mode encoding (5.5 bits per character)
-		ok := qr.encode("DOLLAR-AMOUNT:$39.87 PERCENTAGE:100.00% OPERATIONS:+-*/", qrcode[:], qr.Ecc.High)
+		ok := qr.encode_auto("DOLLAR-AMOUNT:$39.87 PERCENTAGE:100.00% OPERATIONS:+-*/", qrcode[:], qr.Ecc.High)
 		if ok do print_qr(qrcode[:])
 	}
 
 	{ 	// Unicode text as UTF-8
-		ok := qr.encode(
+		ok := qr.encode_auto(
 			"\xE3\x81\x93\xE3\x82\x93\xE3\x81\xAB\xE3\x81\xA1wa\xE3\x80\x81" +
 			"\xE4\xB8\x96\xE7\x95\x8C\xEF\xBC\x81\x20\xCE\xB1\xCE\xB2\xCE\xB3\xCE\xB4",
 			qrcode[:],
@@ -133,7 +108,7 @@ variety :: proc() {
 	}
 
 	{ 	// Moderately large QR Code using longer text (from Lewis Carroll's Alice in Wonderland)
-		ok := qr.encode(
+		ok := qr.encode_auto(
 			"Alice was beginning to get very tired of sitting by her sister on the bank, " +
 			"and of having nothing to do: once or twice she had peeped into the book her sister was reading, " +
 			"but it had no pictures or conversations in it, 'and what is the use of a book,' thought Alice " +
@@ -148,10 +123,6 @@ variety :: proc() {
 	}
 }
 
-// -------------------------------------------------------------------------------------------------
-// Demo: Segment
-// -------------------------------------------------------------------------------------------------
-
 // Creates QR Codes with manually specified segments for better compactness.
 segment :: proc() {
 	qrcode: [qr.BUFFER_LEN_MAX]u8
@@ -163,7 +134,7 @@ segment :: proc() {
 		// Encode as single text (auto mode selection)
 		{
 			concat :: silver0 + silver1
-			ok := qr.encode(concat, qrcode[:], qr.Ecc.Low)
+			ok := qr.encode_auto(concat, qrcode[:], qr.Ecc.Low)
 			if ok do print_qr(qrcode[:])
 		}
 
@@ -172,7 +143,7 @@ segment :: proc() {
 			seg_buf0: [qr.BUFFER_LEN_MAX]u8
 			seg_buf1: [qr.BUFFER_LEN_MAX]u8
 			segs := [2]qr.Segment{qr.make_alphanumeric(silver0, seg_buf0[:]), qr.make_numeric(silver1, seg_buf1[:])}
-			ok := qr.encode(segs[:], qr.Ecc.Low, qrcode[:])
+			ok := qr.encode_auto(segs[:], qr.Ecc.Low, qrcode[:])
 			if ok do print_qr(qrcode[:])
 		}
 	}
@@ -185,7 +156,7 @@ segment :: proc() {
 		// Encode as single text (auto mode selection)
 		{
 			concat :: golden0 + golden1 + golden2
-			ok := qr.encode(concat, qrcode[:], qr.Ecc.Low)
+			ok := qr.encode_auto(concat, qrcode[:], qr.Ecc.Low)
 			if ok do print_qr(qrcode[:])
 		}
 
@@ -201,7 +172,7 @@ segment :: proc() {
 				qr.make_numeric(golden1, seg_buf1[:]),
 				qr.make_alphanumeric(golden2, seg_buf2[:]),
 			}
-			ok := qr.encode(segs[:], qr.Ecc.Low, qrcode[:])
+			ok := qr.encode_auto(segs[:], qr.Ecc.Low, qrcode[:])
 			if ok do print_qr(qrcode[:])
 		}
 	}
@@ -219,7 +190,7 @@ segment :: proc() {
 				"\xEF\xBD\x84\xEF\xBD\x85\xEF\xBD\x93\xEF" +
 				"\xBD\x95\xE3\x80\x80\xCE\xBA\xCE\xB1\xEF" +
 				"\xBC\x9F"
-			ok := qr.encode(madoka, qrcode[:], qr.Ecc.Low)
+			ok := qr.encode_auto(madoka, qrcode[:], qr.Ecc.Low)
 			if ok do print_qr(qrcode[:])
 		}
 
@@ -254,16 +225,12 @@ segment :: proc() {
 			seg.data = seg_buf[:(seg.bit_length + 7) / 8]
 
 			segs := [1]qr.Segment{seg}
-			ok := qr.encode(segs[:], qr.Ecc.Low, qrcode[:])
+			ok := qr.encode_auto(segs[:], qr.Ecc.Low, qrcode[:])
 			if ok do print_qr(qrcode[:])
 		}
 	}
 }
 
-// -------------------------------------------------------------------------------------------------
-// Demo: Mask
-// -------------------------------------------------------------------------------------------------
-
 // Creates QR Codes with the same size and contents but different mask patterns.
 mask :: proc() {
 	qrcode: [qr.BUFFER_LEN_MAX]u8
@@ -271,10 +238,10 @@ mask :: proc() {
 	{ 	// Project Nayuki URL
 		ok: bool
 
-		ok = qr.encode("https://www.nayuki.io/", qrcode[:], qr.Ecc.High)
+		ok = qr.encode_auto("https://www.nayuki.io/", qrcode[:], qr.Ecc.High)
 		if ok do print_qr(qrcode[:])
 
-		ok = qr.encode("https://www.nayuki.io/", qrcode[:], qr.Ecc.High, mask = qr.Mask.M3)
+		ok = qr.encode_auto("https://www.nayuki.io/", qrcode[:], qr.Ecc.High, mask = qr.Mask.M3)
 		if ok do print_qr(qrcode[:])
 	}
 
@@ -290,16 +257,29 @@ mask :: proc() {
 
 		ok: bool
 
-		ok = qr.encode(text, qrcode[:], qr.Ecc.Medium, mask = qr.Mask.M0)
+		ok = qr.encode_auto(text, qrcode[:], qr.Ecc.Medium, mask = qr.Mask.M0)
 		if ok do print_qr(qrcode[:])
 
-		ok = qr.encode(text, qrcode[:], qr.Ecc.Medium, mask = qr.Mask.M1)
+		ok = qr.encode_auto(text, qrcode[:], qr.Ecc.Medium, mask = qr.Mask.M1)
 		if ok do print_qr(qrcode[:])
 
-		ok = qr.encode(text, qrcode[:], qr.Ecc.Medium, mask = qr.Mask.M5)
+		ok = qr.encode_auto(text, qrcode[:], qr.Ecc.Medium, mask = qr.Mask.M5)
 		if ok do print_qr(qrcode[:])
 
-		ok = qr.encode(text, qrcode[:], qr.Ecc.Medium, mask = qr.Mask.M7)
+		ok = qr.encode_auto(text, qrcode[:], qr.Ecc.Medium, mask = qr.Mask.M7)
 		if ok do print_qr(qrcode[:])
 	}
 }
+
+// Prints the given QR Code to the console.
+print_qr :: proc(qrcode: []u8) {
+	size := qr.get_size(qrcode)
+	border :: 4
+	for y in -border ..< size + border {
+		for x in -border ..< size + border {
+			fmt.print("##" if qr.get_module(qrcode, x, y) else "  ")
+		}
+		fmt.println()
+	}
+	fmt.println()
+}
diff --git a/qrcode/generate.odin b/qrcode/generate.odin
index 8261021..0bf3b0d 100644
--- a/qrcode/generate.odin
+++ b/qrcode/generate.odin
@@ -2,10 +2,30 @@ package qrcode
 
 import "core:slice"
 
+VERSION_MIN :: 1
+VERSION_MAX :: 40
 
-// -------------------------------------------------------------------------------------------------
-// Types
-// -------------------------------------------------------------------------------------------------
+// The worst-case number of bytes needed to store one QR Code, up to and including version 40.
+BUFFER_LEN_MAX :: 3918 // buffer_len_for_version(VERSION_MAX)
+
+// Returns the number of bytes needed to store any QR Code up to and including the given version.
+buffer_len_for_version :: #force_inline proc(n: int) -> int {
+	size := n * 4 + 17
+	return (size * size + 7) / 8 + 1
+}
+
+@(private)
+LENGTH_OVERFLOW :: -1
+@(private)
+REED_SOLOMON_DEGREE_MAX :: 30
+@(private)
+PENALTY_N1 :: 3
+@(private)
+PENALTY_N2 :: 3
+@(private)
+PENALTY_N3 :: 40
+@(private)
+PENALTY_N4 :: 10
 
 // The error correction level in a QR Code symbol.
 Ecc :: enum {
@@ -44,39 +64,6 @@ Segment :: struct {
 	bit_length: int,
 }
 
-// -------------------------------------------------------------------------------------------------
-// Constants
-// -------------------------------------------------------------------------------------------------
-
-VERSION_MIN :: 1
-VERSION_MAX :: 40
-
-// The worst-case number of bytes needed to store one QR Code, up to and including version 40.
-BUFFER_LEN_MAX :: 3918 // buffer_len_for_version(VERSION_MAX)
-
-// Returns the number of bytes needed to store any QR Code up to and including the given version.
-buffer_len_for_version :: #force_inline proc(n: int) -> int {
-	size := n * 4 + 17
-	return (size * size + 7) / 8 + 1
-}
-
-// -------------------------------------------------------------------------------------------------
-// Private constants
-// -------------------------------------------------------------------------------------------------
-
-@(private)
-LENGTH_OVERFLOW :: -1
-@(private)
-REED_SOLOMON_DEGREE_MAX :: 30
-@(private)
-PENALTY_N1 :: 3
-@(private)
-PENALTY_N2 :: 3
-@(private)
-PENALTY_N3 :: 40
-@(private)
-PENALTY_N4 :: 10
-
 //odinfmt: disable
 // For generating error correction codes. Index 0 is padding (set to illegal value).
 @(private)
@@ -96,10 +83,9 @@ NUM_ERROR_CORRECTION_BLOCKS := [4][41]i8{
 }
 //odinfmt: enable
 
-
-// -------------------------------------------------------------------------------------------------
-// Encode procedures
-// -------------------------------------------------------------------------------------------------
+// ---------------------------------------------------------------------------------------------------------------------
+// ----- Encode Procedures ------------------------
+// ---------------------------------------------------------------------------------------------------------------------
 
 // Encodes the given text string to a QR Code, automatically selecting
 // numeric, alphanumeric, or byte mode based on content.
@@ -548,9 +534,10 @@ encode_auto :: proc {
 	encode_segments_advanced_auto,
 }
 
-// -------------------------------------------------------------------------------------------------
-// Error correction code generation
-// -------------------------------------------------------------------------------------------------
+
+// ---------------------------------------------------------------------------------------------------------------------
+// ----- Error Correction Code Generation ------------------------
+// ---------------------------------------------------------------------------------------------------------------------
 
 // Appends error correction bytes to each block of data, then interleaves bytes from all blocks.
 @(private)
@@ -618,10 +605,6 @@ get_num_raw_data_modules :: proc(ver: int) -> int {
 	return result
 }
 
-// -------------------------------------------------------------------------------------------------
-// Reed-Solomon ECC generator
-// -------------------------------------------------------------------------------------------------
-
 @(private)
 reed_solomon_compute_divisor :: proc(degree: int, result: []u8) {
 	assert(1 <= degree && degree <= REED_SOLOMON_DEGREE_MAX, "reed-solomon degree out of range")
@@ -668,9 +651,9 @@ reed_solomon_multiply :: proc(x, y: u8) -> u8 {
 	return z
 }
 
-// -------------------------------------------------------------------------------------------------
-// Drawing function modules
-// -------------------------------------------------------------------------------------------------
+// ---------------------------------------------------------------------------------------------------------------------
+// ----- Drawing Function Modules ------------------------
+// ---------------------------------------------------------------------------------------------------------------------
 
 // Clears the QR Code grid and marks every function module as dark.
 @(private)
@@ -816,9 +799,9 @@ fill_rectangle :: proc(left, top, width, height: int, qrcode: []u8) {
 	}
 }
 
-// -------------------------------------------------------------------------------------------------
-// Drawing data modules and masking
-// -------------------------------------------------------------------------------------------------
+// ---------------------------------------------------------------------------------------------------------------------
+// ----- Drawing Data Modules and Masking ------------------------
+// ---------------------------------------------------------------------------------------------------------------------
 
 @(private)
 draw_codewords :: proc(data: []u8, data_len: int, qrcode: []u8) {
@@ -996,9 +979,9 @@ finder_penalty_add_history :: proc(current_run_length: int, run_history: ^[7]int
 	run_history[0] = current_run_length
 }
 
-// -------------------------------------------------------------------------------------------------
-// Basic QR Code information
-// -------------------------------------------------------------------------------------------------
+// ---------------------------------------------------------------------------------------------------------------------
+// ----- Basic QR Code Information ------------------------
+// ---------------------------------------------------------------------------------------------------------------------
 
 // Returns the minimum buffer size (in bytes) needed for both temp_buffer and qrcode
 // to encode the given content at the given ECC level within the given version range.
@@ -1158,9 +1141,9 @@ get_bit :: #force_inline proc(x: int, i: uint) -> bool {
 	return ((x >> i) & 1) != 0
 }
 
-// -------------------------------------------------------------------------------------------------
-// Segment handling
-// -------------------------------------------------------------------------------------------------
+// ---------------------------------------------------------------------------------------------------------------------
+// ----- Segment Handling ------------------------
+// ---------------------------------------------------------------------------------------------------------------------
 
 // Tests whether the given string can be encoded in numeric mode.
 is_numeric :: proc(text: string) -> bool {
@@ -1349,11 +1332,11 @@ make_eci :: proc(assign_val: int, buf: []u8) -> Segment {
 	return result
 }
 
-// -------------------------------------------------------------------------------------------------
-// Private helpers
-// -------------------------------------------------------------------------------------------------
+// ---------------------------------------------------------------------------------------------------------------------
+// ----- Helpers ------------------------
+// ---------------------------------------------------------------------------------------------------------------------
 
-@(private)
+// Internal
 append_bits_to_buffer :: proc(val: uint, num_bits: int, buffer: []u8, bit_len: ^int) {
 	assert(0 <= num_bits && num_bits <= 16 && val >> uint(num_bits) == 0, "invalid bit count or value overflow")
 	for i := num_bits - 1; i >= 0; i -= 1 {
@@ -1362,7 +1345,7 @@ append_bits_to_buffer :: proc(val: uint, num_bits: int, buffer: []u8, bit_len: ^
 	}
 }
 
-@(private)
+// Internal
 get_total_bits :: proc(segs: []Segment, version: int) -> int {
 	result := 0
 	for &seg in segs {
@@ -1384,7 +1367,7 @@ get_total_bits :: proc(segs: []Segment, version: int) -> int {
 	return result
 }
 
-@(private)
+// Internal
 num_char_count_bits :: proc(mode: Mode, version: int) -> int {
 	assert(VERSION_MIN <= version && version <= VERSION_MAX, "version out of bounds")
 	i := (version + 7) / 17
@@ -1406,8 +1389,8 @@ num_char_count_bits :: proc(mode: Mode, version: int) -> int {
 	unreachable()
 }
 
+// Internal
 // Returns the index of c in the alphanumeric charset (0-44), or -1 if not found.
-@(private)
 alphanumeric_index :: proc(c: u8) -> int {
 	switch c {
 	case '0' ..= '9': return int(c - '0')
-- 
2.43.0